Hi, I'm using detectron2 on a computing cluster, so the code runs on whichever GPUs I am allocated. detectron2 was installed successfully and I'm able to import it from Python.
However, I get the following error on certain (most) GPUs:
RuntimeError: CUDA error: no kernel image is available for execution on the device (ROIAlign_forward_cuda at /network/home/guptagun/od/detectron2_repo/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:361)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f2459bbc687 in /network/home/guptagun/anaconda3/envs/detectron/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: detectron2::ROIAlign_forward_cuda(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xa24 (0x7f23f419189c in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: detectron2::ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xb6 (0x7f23f4132f66 in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x4ec8f (0x7f23f4144c8f in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x49750 (0x7f23f413f750 in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
<omitting python frames>
frame #9: THPFunction_apply(_object*, _object*) + 0x8d6 (0x7f245a4abe96 in /network/home/guptagun/anaconda3/envs/detectron/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
On some GPUs (one of them being a GeForce GTX), however, the code runs as expected.
I was trying to run demo.py with:
python detectron2_repo/demo/demo.py --config-file detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input ./leftImg8bit.png --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
Output of python -m detectron2.utils.collect_env:
------------------------ --------------------------------------------------
sys.platform linux
Python 3.7.5 (default, Oct 25 2019, 15:51:11) [GCC 7.3.0]
Numpy 1.15.4
Detectron2 Compiler GCC 7.4
Detectron2 CUDA Compiler 10.0
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.3.0
PyTorch Debug Build False
torchvision 0.4.1a0+d94043a
CUDA available True
GPU 0 GeForce GTX TITAN X
CUDA_HOME None
Pillow 5.3.0
cv2 4.1.0
------------------------ --------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
When I built detectron2 using: python setup.py build develop
TORCH_CUDA_ARCH_LIST was empty, so shouldn't it have been compiled for all architectures? (according to https://github.com/facebookresearch/detectron2/issues/62#issuecomment-549432420)
What can I do when compiling so that I can use detectron2 on most GPUs, or is this an issue with the compute node I'm using?
Thanks,
Gunshi
Detectron2 CUDA Compiler 10.0
- CUDA Runtime 10.1
Your CUDA versions do not match, and that is not supported.
Hi, I have actually tried loading both the cuda/10.0 and cuda/10.1 modules one by one before running the demo.py command and I still get the error. Are you saying I should install PyTorch and then detectron2 again, but after loading CUDA 10.0 specifically?
In general, you should check python -m detectron2.utils.collect_env to see whether the mismatch has been fixed before running detectron2.
You need to install the PyTorch build that matches your CUDA module and recompile detectron2 afterwards.
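The version check described above can be done mechanically. The following is a minimal sketch, assuming the report uses the layout shown in this thread (a "Detectron2 CUDA Compiler" row and a "CUDA Runtime" line under "PyTorch built with"). The helper names find_cuda_versions and versions_match are hypothetical and not part of detectron2.

```python
import re

def find_cuda_versions(report):
    """Extract CUDA versions from collect_env-style text (layout assumed from this thread)."""
    patterns = {
        "detectron2_compiler": r"(?i)detectron2 CUDA compiler\s+(\d+\.\d+)",
        "pytorch_runtime": r"CUDA Runtime\s+(\d+\.\d+)",
    }
    found = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, report)
        if match:
            found[name] = match.group(1)
    return found

def versions_match(report):
    """True only if both versions were found and they are identical."""
    versions = find_cuda_versions(report)
    return len(versions) == 2 and len(set(versions.values())) == 1

# Two lines copied from the first report in this thread.
sample = """
Detectron2 CUDA Compiler 10.0
PyTorch built with:
- CUDA Runtime 10.1
"""
print(find_cuda_versions(sample))
print(versions_match(sample))
```

For the first report in this thread the check reports a mismatch (10.0 vs 10.1). Later detectron2 versions label the row differently, so the patterns may need adjusting.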
@ppwwyyxx I am facing the exact same issue, and my PyTorch and detectron2 are compiled with exactly the same CUDA version. Also, I am facing this issue when I try to run detectron2 on a different GPU than the one I used to compile it. Here I compiled on a TITAN X GPU, so it doesn't work on a TITAN RTX or other GPUs. Note that I haven't installed using pip, as I am modifying the codebase (only Python files, not touching any CUDA implementation) for my research; I'm not sure whether that has any effect. Here is the output of python -m detectron2.utils.collect_env. Could you try installing on one GPU and testing on another, to see whether this is a general issue or I messed something up?
------------------------ --------------------------------------------------
sys.platform linux
Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Numpy 1.17.2
Detectron2 Compiler GCC 7.4
Detectron2 CUDA Compiler 10.0
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.3.0
PyTorch Debug Build False
torchvision 0.4.1a0+d94043a
CUDA available True
GPU 0 TITAN V
CUDA_HOME /ai/apps/cuda/10.0
NVCC Cuda compilation tools, release 10.0, V10.0.130
Pillow 6.2.0
cv2 4.1.0
------------------------ --------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.0
- NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
I am facing this issue when I try to run detectron2 on a different GPU than the one I used to compile it.
This is answered in https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md#common-installation-issues
@ppwwyyxx can you tell me if there is any way to set TORCH_CUDA_ARCH_LIST such that it works across all GPUs which support a particular cuda version? I see this variable set in Dockerfile for a set of GPUs, is there any command to set it for all? Like TORCH_CUDA_ARCH_LIST=All, rather than specifying GPUs TORCH_CUDA_ARCH_LIST="Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
This is a pytorch question and you can refer to https://github.com/pytorch/pytorch/issues/18781
It does not seem like "All" works for extension compilation at the moment.
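Since "All" is not available, the list has to be spelled out. Here is a small sketch of building it from the compute capabilities of the GPUs you expect to run on. make_arch_list is a hypothetical helper, and the capabilities below are only examples. The optional "+PTX" suffix on the highest entry is, as far as I know, accepted by PyTorch's extension builder and embeds PTX so that newer GPUs can JIT-compile the kernels.

```python
def make_arch_list(capabilities, add_ptx=True):
    """Format (major, minor) pairs as a semicolon-separated arch list."""
    unique = sorted(set(capabilities))  # drop duplicates, lowest first
    entries = ["{}.{}".format(major, minor) for major, minor in unique]
    if add_ptx and entries:
        # Assumption: "+PTX" on the newest arch gives forward compatibility.
        entries[-1] += "+PTX"
    return ";".join(entries)

# Example capabilities: 5.2 (GTX TITAN X), 7.0, 7.5 (TITAN RTX).
print(make_arch_list([(7, 5), (5, 2), (7, 0), (5, 2)]))
```

On a node with a GPU, torch.cuda.get_device_capability() returns the (major, minor) pair for the current device, which can be collected per node and fed into a helper like this.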
I have the same problem trying to execute the following code:
>>> import torch
>>> device = torch.device("cuda")
>>> torch.rand(10).to(device)
RuntimeError: CUDA error: no kernel image is available for execution on the device
I solved the problem by changing the driver from the open source one to the NVIDIA proprietary one (Ubuntu 18.04):
before: Using NVIDIA driver metapackage from nvidia-driver-410 (open source)
after: Using NVIDIA driver metapackage from nvidia-driver-390 (proprietary)
>>> torch.rand(10).to(device)
tensor([0.9129, 0.8937, 0.7499, 0.5510, 0.5670, 0.9313, 0.3335, 0.4019, 0.2288,
0.4771])
Hope it helps.

Hi @ppwwyyxx! I am having the same problem and the solution from Common Installation Issues didn't help.
I am running the code on Tesla V100 and getting an error. Running on Tesla K80 didn't produce such an error.
The error:
RuntimeError: CUDA error: no kernel image is available for execution on the device (ROIAlign_forward_cuda at /home/veronica/detectron2/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:364)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fadadba1193 in /home/veronica/dirs/detectron2/env2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: detectron2::ROIAlign_forward_cuda(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0x9f4 (0x7fada3f24f2d in /home/veronica/dirs/detectron2/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: detectron2::ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0x9c (0x7fada3ea436c in /home/veronica/dirs/detectron2/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x55465 (0x7fada3eb5465 in /home/veronica/dirs/detectron2/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x555fe (0x7fada3eb55fe in /home/veronica/dirs/detectron2/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x4fe33 (0x7fada3eafe33 in /home/veronica/dirs/detectron2/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
Output of python -m detectron2.utils.collect_env:
------------------------ ----------------------------------------------------------------------------------
sys.platform linux
Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
numpy 1.18.1
detectron2 0.1 @/home/veronica/dirs/detectron2/detectron2
detectron2 compiler GCC 6.3
detectron2 CUDA compiler 10.1
detectron2 arch flags sm_37
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.4.0 @/home/veronica/dirs/detectron2/env2/lib/python3.7/site-packages/torch
PyTorch debug build False
CUDA available True
GPU 0 Tesla V100-SXM2-16GB
CUDA_HOME /usr/local/cuda
NVCC Cuda compilation tools, release 10.1, V10.1.243
TORCH_CUDA_ARCH_LIST 6.0;6.1;6.2;7.0;7.5
Pillow 7.0.0
torchvision 0.5.0 @/home/veronica/dirs/detectron2/env2/lib/python3.7/site-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
cv2 4.2.0
PyTorch built with:
"Detectron2 CUDA Compiler", "CUDA_HOME", "PyTorch built with - CUDA" all contain cuda libraries of the same version.
I also tried running export TORCH_CUDA_ARCH_LIST=6.0,7.0 & python train_net.py, but it didn't help; I got the same error.
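You need to rebuild detectron2 after running export TORCH_CUDA_ARCH_LIST="6.0;7.0" (quote the value so the shell does not treat ";" as a command separator). Setting the variable only when running train_net.py has no effect on an already compiled extension.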
Or build on the machine where you run detectron2.
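One way to avoid shell quoting problems when rebuilding is to pass the variable through the process environment from Python. This is only a sketch: rebuild_invocation is a hypothetical helper, "detectron2" is assumed to be the path of your local clone, and the pip command is the one used elsewhere in this thread.

```python
import os
import sys

def rebuild_invocation(repo_dir, arch_list):
    """Return the (command, environment) pair for rebuilding the extension."""
    env = dict(os.environ)                    # copy, so the parent env is untouched
    env["TORCH_CUDA_ARCH_LIST"] = arch_list   # no shell involved, so ";" is safe
    cmd = [sys.executable, "-m", "pip", "install", "-e", repo_dir]
    return cmd, env

cmd, env = rebuild_invocation("detectron2", "6.0;7.0")
print(cmd[1:], env["TORCH_CUDA_ARCH_LIST"])

# To actually rebuild (not executed here):
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

Old build artifacts in the clone (the build/ directory and compiled .so files) may need to be removed first, otherwise the previously compiled kernels can be reused.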
@ppwwyyxx
Hi, I am new to using Pytorch. I am facing the above error. My system specifications are given below:
OS : Ubuntu 16.04
CUDA version - 10.1
Device count - 1
Device name - GeForce GT 720
Device capability - (3,5)
Pytorch version - 1.4.0
Can someone guide me as to how to resolve this issue?
Thanks in advance.
All information about such issues is given in https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md#common-installation-issues
What does the error even mean? What is actually going wrong?
RuntimeError: CUDA error: no kernel image is available for execution on the device
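As I understand it, the error means that the compiled extension contains no GPU code the current device can execute. Below is a simplified model of NVIDIA's compatibility rules: binary code built for sm_XY runs on devices with the same major version and an equal or higher minor version, while embedded PTX (compute_XY) can be JIT-compiled on any device of capability X.Y or newer. parse_flag and can_run are hypothetical helpers, and bare entries such as "7.5" are assumed to mean binary code only.

```python
def parse_flag(flag):
    """Turn 'sm_75', 'compute_50' or '7.5' into (kind, (major, minor))."""
    flag = flag.strip()
    if "_" in flag:
        kind, digits = flag.split("_")
        return kind, (int(digits[:-1]), int(digits[-1]))
    major, minor = flag.split(".")
    return "sm", (int(major), int(minor))  # assumption: bare numbers mean binary code

def can_run(device_cap, flags):
    """Check whether any compiled arch flag covers the device capability."""
    for flag in flags:
        kind, built = parse_flag(flag)
        if kind == "compute" and device_cap >= built:
            return True  # PTX can be JIT-compiled for newer devices
        if kind == "sm" and device_cap[0] == built[0] and device_cap[1] >= built[1]:
            return True  # binary code: same major, minor at least as high
    return False

# Build with only sm_37, run on a capability 7.0 device (the V100 report above).
print(can_run((7, 0), ["sm_37"]))
# Arch flags "3.5, 5.0, 6.0, 7.0, 7.5" on a capability 5.2 device.
print(can_run((5, 2), ["3.5", "5.0", "6.0", "7.0", "7.5"]))
```

Under this model the first case has no usable kernel image, which matches the error. The second case is covered by the 5.0 entry even though 5.2 is not listed explicitly.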
Hi @veronikayurchuk, did you solve your problem? I'm facing the same problem now. Could you give me some advice? Thanks!
You need to rebuild detectron2 with
export TORCH_CUDA_ARCH_LIST=6.0,7.0.
Or build on the machine where you run detectron2.
Thank you for your answer.
My setup is
(1) GPU 0,1 GeForce GTX TITAN X (arch=5.2)
(2) GPU 0,1,2,3 TITAN RTX (arch=7.5)
(1) and (2) share the same conda env,
but the problem occurs only on (1).
I guess the cause is the difference in architecture.
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5: there is no arch=5.2 for (1).
So I tried export TORCH_CUDA_ARCH_LIST=5.2,7.5 & python -m pip install -e detectron2
but I got the message below. Could you recommend a solution?
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/data2/detectron2/setup.py", line 222, in <module>
cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/setuptools/__init__.py", line 144, in setup
return distutils.core.setup(**attrs)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/setuptools/command/develop.py", line 38, in run
self.install_for_development()
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/setuptools/command/develop.py", line 140, in install_for_development
self.run_command('build_ext')
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 87, in run
_build_ext.run(self)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 580, in build_extensions
build_ext.build_extensions(self)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 208, in build_extension
_build_ext.build_extension(self, ext)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 411, in unix_wrap_ninja_compile
cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 336, in unix_cuda_flags
cflags + _get_cuda_arch_flags(cflags))
File "/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1316, in _get_cuda_arch_flags
raise ValueError("Unknown CUDA arch ({}) or GPU not supported".format(arch))
ValueError: Unknown CUDA arch (5.2,7.5) or GPU not supported
(1) GPU 0,1 GeForce GTX TITAN X (arch=5.2)
sys.platform linux
Python 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) [GCC 7.3.0]
numpy 1.19.1
detectron2 0.2.1 @/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/detectron2
Compiler GCC 7.5
CUDA compiler CUDA 10.2
detectron2 arch flags 7.5
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.5.0 @/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0,1 GeForce GTX TITAN X (arch=5.2)
CUDA_HOME /usr/local/cuda-10.1
Pillow 7.1.2
torchvision 0.6.0a0+82fd1c8 @/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5
fvcore 0.1.2.post20201013
cv2 3.4.2
PyTorch built with:
(2) GPU 0,1,2,3 TITAN RTX (arch=7.5)
sys.platform linux
Python 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) [GCC 7.3.0]
numpy 1.19.1
detectron2 0.2.1 @/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/detectron2
Compiler GCC 7.5
CUDA compiler CUDA 10.2
detectron2 arch flags 7.5
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.5.0 @/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0,1,2,3 TITAN RTX (arch=7.5)
CUDA_HOME /usr/local/cuda-10.1
Pillow 7.1.2
torchvision 0.6.0a0+82fd1c8 @/home/miruware/anaconda3/envs/blend/lib/python3.6/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5
fvcore 0.1.2.post20201013
cv2 3.4.2
PyTorch built with:
TORCH_CUDA_ARCH_LIST should be separated by ";" not ",". It was a typo.
TORCH_CUDA_ARCH_LIST should be separated by ";" not ",". It was a typo.
Thank you very much!! You saved my life!!!
The command below works:
export TORCH_CUDA_ARCH_LIST=7.5\;5.2
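The separator mistake is easy to catch before starting a long build. Here is a rough sketch of such a check. check_arch_list is a hypothetical helper that only understands numeric entries such as "7.5" or "7.5+PTX"; named architectures like "Pascal" would be flagged even though PyTorch may accept them.

```python
import re

# Numeric entries only, optionally followed by "+PTX" (a deliberate simplification).
ENTRY = re.compile(r"^\d+\.\d+(\+PTX)?$")

def check_arch_list(value):
    """Return a list of problems found in a TORCH_CUDA_ARCH_LIST-style string."""
    problems = []
    if "," in value:
        problems.append("use ';' instead of ',' between entries")
    for entry in value.replace(",", ";").split(";"):
        if not ENTRY.match(entry.strip()):
            problems.append("unrecognized entry: {!r}".format(entry))
    return problems

print(check_arch_list("5.2,7.5"))
print(check_arch_list("7.5;5.2"))
```

In a shell, the semicolon also has to be escaped (as in the command above) or the whole value quoted, for example "7.5;5.2"; otherwise the shell ends the export at the ";".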