Hi, I get the following error when installing apex with --cuda_ext.
Environment info:
CUDA version: 9.1.85
CUDNN version: 7102
Kernel: 4.4.0-124-generic
Python: 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16) [GCC 7.3.0]
PyTorch: 0.4.0
Numpy: 1.14.3
Detailed logs:
(env) I have no name!@0c946469404d:/scratch/Chong_dxxz_Projects/Gitlab/Sparsity-NVIDIA/Libs/Apex$ python setup.py install --cuda_ext
torch.__version__ = 0.4.0
running install
running bdist_egg
running egg_info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
reading manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'fused_adam_cuda' extension
gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/scratch/Chong_dxxz_Projects/Gitlab/Sparsity-NVIDIA/env/lib/python3.6/site-packages/torch/lib/include -I/scratch/Chong_dxxz_Projects/Gitlab/Sparsity-NVIDIA/env/lib/python3.6/site-packages/torch/lib/include/TH -I/scratch/Chong_dxxz_Projects/Gitlab/Sparsity-NVIDIA/env/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.6m -c apex/optimizers/csrc/fused_adam_cuda.cpp -o build/temp.linux-x86_64-3.6/apex/optimizers/csrc/fused_adam_cuda.o -O3 -DTORCH_EXTENSION_NAME=fused_adam_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
apex/optimizers/csrc/fused_adam_cuda.cpp:1:29: fatal error: torch/extension.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
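The fatal error in the log means the compiler did not find torch/extension.h under any of the directories passed with -I. A minimal sketch for checking this by hand follows; `find_header` is a hypothetical helper written for illustration (not part of apex or PyTorch), and the directories in the usage section are placeholders to be replaced with the -I paths from your own gcc command.

```python
import os
from typing import Iterable, Optional


def find_header(include_dirs: Iterable[str], header: str) -> Optional[str]:
    """Return the first full path at which `header` exists, or None.

    `header` is written the way it appears in an #include line,
    for example "torch/extension.h".
    """
    for directory in include_dirs:
        candidate = os.path.join(directory, *header.split("/"))
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Placeholder directories (assumption): substitute the -I paths
    # printed in your own build log.
    dirs = [
        "/path/to/site-packages/torch/lib/include",
        "/usr/local/cuda/include",
    ]
    found = find_header(dirs, "torch/extension.h")
    print(found if found else "torch/extension.h not found in any include directory")
```

If the helper returns None for every include directory, the installed PyTorch does not ship that header and the extension cannot be compiled against it.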
PyTorch's C++ APIs went through many backward-incompatible changes in the last few months. Unfortunately, if you want to use apex APIs that require building C/C++ extensions, you need a relatively new version of PyTorch (nightly packages or a build from source). If you prefer to stick with PyTorch 0.4, you won't be able to use APIs that need compiled extensions, such as the fused Adam optimizer.
Yes, because PyTorch's C++ API was (very) unstable and in flux prior to PyTorch 1.0, we only support cpp and cuda extensions for 1.0 and later. If you're set up with nvidia-docker, you could, for example, build Apex with cpp and cuda extensions within either an NVIDIA NGC container (18.10 or 18.11) or the official PyTorch 1.0 container.
(substitute MY_IMAGE = nvcr.io/nvidia/pytorch:18.10-py3 or nvcr.io/nvidia/pytorch:18.11-py3 or pytorch/pytorch:nightly-devel-cuda9.2-cudnn7)
$ docker pull MY_IMAGE
$ docker run --runtime=nvidia -it --rm --ipc=host -v /data/on/bare/metal:/data/in/container MY_IMAGE
...
# git clone https://github.com/NVIDIA/apex.git
# cd apex
# python setup.py install --cpp_ext --cuda_ext
and it should work.
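Before running the build, it can help to check whether the installed PyTorch meets the 1.0 cutoff mentioned in this thread. The sketch below is only illustrative: `supports_compiled_extensions` is a hypothetical helper, and the default minimum of (1, 0) is taken from the statement above. Pre-release or container builds may report a different version string, so treat a negative result as a hint rather than a verdict.

```python
import re
from typing import Tuple


def supports_compiled_extensions(version: str, minimum: Tuple[int, int] = (1, 0)) -> bool:
    """Return True if `version` is at least `minimum` (major, minor).

    Only the leading "major.minor" part is parsed, so suffixes such as
    ".dev20181116" or "a0+abc123" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = supports_compiled_extensions(torch.__version__)
        verdict = "new enough for --cpp_ext/--cuda_ext" if ok else "likely too old for --cpp_ext/--cuda_ext"
        print(torch.__version__, "->", verdict)
```

With the 0.4.0 install from the original report, the helper returns False, which matches the build failure shown in the log.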
Thanks~~~