Mmdetection: ImportError: ./mmdetection/mmdet/ops/nms/gpu_nms.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaRegisterFatBinaryEnd

Created on 10 Mar 2019  ·  23 Comments  ·  Source: open-mmlab/mmdetection

Hi, I'm facing this problem when training:

alexander@alexander-desktop:~/Code/Projects/mmdetection$ ./tools/dist_train.sh configs/retinanet_r101_fpn_1x.py 1
Traceback (most recent call last):
File "./tools/train.py", line 7, in
from mmdet.datasets import get_dataset
File "/home/alexander/Code/Projects/mmdetection/mmdet/datasets/__init__.py", line 1, in
from .custom import CustomDataset
File "/home/alexander/Code/Projects/mmdetection/mmdet/datasets/custom.py", line 11, in
from .extra_aug import ExtraAugmentation
File "/home/alexander/Code/Projects/mmdetection/mmdet/datasets/extra_aug.py", line 5, in
from mmdet.core.evaluation.bbox_overlaps import bbox_overlaps
File "/home/alexander/Code/Projects/mmdetection/mmdet/core/__init__.py", line 6, in
from .post_processing import * # noqa: F401, F403
File "/home/alexander/Code/Projects/mmdetection/mmdet/core/post_processing/__init__.py", line 1, in
from .bbox_nms import multiclass_nms
File "/home/alexander/Code/Projects/mmdetection/mmdet/core/post_processing/bbox_nms.py", line 3, in
from mmdet.ops.nms import nms_wrapper
File "/home/alexander/Code/Projects/mmdetection/mmdet/ops/__init__.py", line 5, in
from .nms import nms, soft_nms
File "/home/alexander/Code/Projects/mmdetection/mmdet/ops/nms/__init__.py", line 1, in
from .nms_wrapper import nms, soft_nms
File "/home/alexander/Code/Projects/mmdetection/mmdet/ops/nms/nms_wrapper.py", line 4, in
from .gpu_nms import gpu_nms
ImportError: /home/alexander/Code/Projects/mmdetection/mmdet/ops/nms/gpu_nms.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaRegisterFatBinaryEnd

I'm using CUDA 10.1, PyTorch 1.0.1.post2, and Python 3.6 on Ubuntu 18.04.
Everything compiled fine during installation.


Most helpful comment

Just solved the same issue. Check the compatibility of your PyTorch and CUDA versions.

All 23 comments

Just solved the same issue. Check the compatibility of your PyTorch and CUDA versions.
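For a concrete way to do that check: `torch.version.cuda` reports the CUDA version the installed PyTorch wheel was built against, and `nvcc -V` reports the toolkit on PATH; the two should agree. A minimal sketch of the comparison (the helper names are mine, not from PyTorch or mmdetection):

```python
import re

def extract_cuda_release(nvcc_output):
    """Pull the 'release X.Y' version out of `nvcc -V` output."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    return m.group(1) if m else None

def versions_match(torch_cuda, nvcc_release):
    """PyTorch must be built against the same major.minor CUDA it runs on."""
    return torch_cuda is not None and torch_cuda == nvcc_release

# Usage (requires torch and nvcc to be installed):
#   import subprocess, torch
#   out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
#   print(versions_match(torch.version.cuda, extract_cuda_release(out)))
```

If this returns False, extensions compiled by nvcc will link against a different libcudart than the one PyTorch loads at run time.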

Hi, thank you for sharing.
I also have this problem. Does it work in a CUDA 10 and PyTorch 1.0 environment? And what are your CUDA and PyTorch versions?
Thank you!

I'm using CUDA 10.1 and PyTorch 1.0.

As a reference, we have tried mmdetection on CUDA 9.0/9.2/10.0 with PyTorch 1.0 and CUDA 9.0/9.2 with PyTorch 0.4.1.

Hello, @hellock, @donglao, @Hesene, I've met a similar problem:
File "./mmdetection/mmdet/ops/dcn/__init__.py", line 1, in
from .functions.deform_conv import deform_conv, modulated_deform_conv
File "./mmdetection/mmdet/ops/dcn/functions/deform_conv.py", line 5, in
from .. import deform_conv_cuda
ImportError: ./mmdetection/mmdet/ops/dcn/deform_conv_cuda.cpython-35m-x86_64-linux-gnu.so: undefined symbol: __cudaRegisterFatBinaryEnd

I'm using CUDA 8.0 and PyTorch 1.0; the GCC version is 5.4.0.
I wonder if you have tried mmdetection under this configuration.
Thanks a lot!

I've met the same issue.
I'm using CUDA 10.1, PyTorch 1.0.1.post2, and Python 3.6 on Ubuntu 18.04, too.
Note that CUDA 8.0 is not available for Ubuntu 18.04. I've tried compiling and installing PyTorch 1.0.1.post2 from source with CUDA 10.1, and the error "undefined symbol: __cudaRegisterFatBinaryEnd" still occurred.

I've also tried CUDA 9.0 with PyTorch 1.0.1.post2 and got the error "undefined symbol: __cudaPopCallConfiguration". Any tips?

what is your gcc version?

gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0

I compiled it successfully under gcc 5.4 and CUDA 9.

I'll try gcc 5.4 tomorrow.
Have you tried CUDA 10.0 or CUDA 10.1? The only related code I could find is at:
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGCUDANV.cpp

where line 699 defines the creation of the function "__cudaRegisterFatBinaryEnd" and line 292 defines the creation of the function "__cudaPopCallConfiguration".

As for __cudaRegisterFatBinaryEnd, I found that clang defines a new feature for CUDA 10.1 at:
https://clang.llvm.org/doxygen/include_2clang_2Basic_2Cuda_8h_source.html#l00108

 //  Various SDK-dependent features that affect CUDA compilation
 enum class CudaFeature {
   // CUDA-9.2+ uses a new API for launching kernels.
   CUDA_USES_NEW_LAUNCH,
   // CUDA-10.1+ needs explicit end of GPU binary registration.
   CUDA_USES_FATBIN_REGISTER_END,
 };

And CUDA_USES_FATBIN_REGISTER_END is checked in line 663 of
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGCUDANV.cpp#L663:

// Call __cudaRegisterFatBinaryEnd(Handle) if this CUDA version needs it.
     if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),
                            CudaFeature::CUDA_USES_FATBIN_REGISTER_END)) {
       // void __cudaRegisterFatBinaryEnd(void **);
       llvm::FunctionCallee RegisterFatbinEndFunc = CGM.CreateRuntimeFunction(
           llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
           "__cudaRegisterFatBinaryEnd");
       CtorBuilder.CreateCall(RegisterFatbinEndFunc, RegisterFatbinCall);
     }
   }

Is anything mismatched between CUDA 10.1 and PyTorch?
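Those two clang features line up with the two undefined symbols reported in this thread, which suggests a quick way to interpret the error. A sketch, with the version thresholds taken from the clang source quoted above (the helper function and its messages are illustrative only, not part of any tool):

```python
# Minimum CUDA version whose runtime exports each symbol, per the clang
# CudaFeature table quoted above.
SYMBOL_MIN_CUDA = {
    "__cudaPopCallConfiguration": (9, 2),   # CUDA_USES_NEW_LAUNCH
    "__cudaRegisterFatBinaryEnd": (10, 1),  # CUDA_USES_FATBIN_REGISTER_END
}

def diagnose(undefined_symbol, runtime_cuda):
    """Interpret an 'undefined symbol' ImportError: the extension was built
    by a CUDA toolchain new enough to emit `undefined_symbol`, but the
    runtime libcudart (`runtime_cuda`, a (major, minor) tuple) predates it."""
    needed = SYMBOL_MIN_CUDA.get(undefined_symbol)
    if needed is None:
        return "unknown symbol; likely a different kind of ABI mismatch"
    if runtime_cuda < needed:
        return ("built with CUDA >= %d.%d but running against %d.%d; "
                "rebuild against the runtime CUDA" % (*needed, *runtime_cuda))
    return "runtime CUDA is new enough; check LD_LIBRARY_PATH instead"
```

For example, `diagnose("__cudaRegisterFatBinaryEnd", (10, 0))` points at a rebuild, matching the fix that eventually worked in this thread.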

@ruiyuanlu I tried CUDA 10 and gcc 7.x before, which is compatible.

Well, I've solved this issue on my machine using PyTorch 1.1.0 (the latest version on GitHub).

gcc 5.x doesn't help, because some compile options in the CMakeLists.txt of PyTorch 1.1.0 are not supported by gcc 5.x, while gcc 7.x compiles PyTorch fine.

It seems that CUDA 10.0 is slightly different from CUDA 10.1. This issue is caused by a mismatch between the CUDA version PyTorch was compiled with and the CUDA version at run time. Other errors can occur with other mismatched run-time CUDA versions; for example, _"undefined symbol: __cudaPopCallConfiguration"_ can occur with an earlier CUDA version. Thus, my solution is to recompile PyTorch to match the run-time CUDA version. Changing the run-time CUDA version might also work, but I didn't test that. Here is how I fixed it.

(Ubuntu 18.04 only)

1. Uninstall the existing PyTorch:

pip uninstall torch #  conda uninstall pytorch, if you use conda

2. Install CUDA 10.0 (optional)

This step is optional; any other CUDA version should be fine, as long as the compile-time CUDA version matches the run-time version.

Follow the instructions for the runfile installer here:

Then check nvcc version:

nvcc -V

The output should be something like (release 10.0):

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Note that a symlink is needed after the CUDA installation:

sudo rm -f /usr/local/cuda # optional, only if you already have this symlink
sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda

Then add these paths to your ~/.bashrc file. They will be used when compiling PyTorch.

export CUDA_HOME=/usr/local/cuda
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export LIBRARY_PATH="$LIBRARY_PATH:/usr/local/cuda/lib64"

Use source to make sure the paths above are loaded:

source ~/.bashrc

3. Compile pytorch

The instructions can be found here, but some details might be different.

Note that mkl=2019.3 is required. Details can be found in this issue.

conda install numpy pyyaml mkl=2019.3 mkl-include setuptools cmake cffi typing
conda install -c pytorch magma-cuda100 # optional step
# clone the pytorch source code
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
make clean # make clean is needed in my case
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
sudo python setup.py install # sudo is needed in my case.

After all the steps aforementioned, it finally works.
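Once the rebuild finishes, the match can be verified end to end. A hedged Python sketch (the function name is mine; each probe degrades to None when torch or nvcc is missing, so it is safe to run anywhere):

```python
import re
import shutil
import subprocess

def installed_cuda_pair():
    """Best-effort probe (sketch): return (torch_cuda, nvcc_release) as
    strings like "10.0", with None for any component that is absent."""
    try:
        import torch  # present only once the rebuild succeeded
        torch_cuda = torch.version.cuda
    except ImportError:
        torch_cuda = None
    nvcc_release = None
    if shutil.which("nvcc"):
        out = subprocess.run(["nvcc", "-V"],
                             capture_output=True, text=True).stdout
        m = re.search(r"release (\d+\.\d+)", out)
        nvcc_release = m.group(1) if m else None
    return torch_cuda, nvcc_release
```

The two values should be equal, e.g. ("10.0", "10.0"); a mismatch here is exactly the condition that produced the undefined-symbol ImportError.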

I've met the same issue.
I'm using CUDA 10.1, PyTorch 1.0.1.post2, and Python 3.6 on Ubuntu 18.04, too.
Note that CUDA 8.0 is not available for Ubuntu 18.04. I've tried compiling and installing PyTorch 1.0.1.post2 from source with CUDA 10.1, and the error "undefined symbol: __cudaRegisterFatBinaryEnd" still occurred.

I've also tried CUDA 9.0 with PyTorch 1.0.1.post2 and got the error "undefined symbol: __cudaPopCallConfiguration". Any tips?

Have you solved this problem? Can you help me?

I've already solved this problem by recompiling PyTorch, and the solution has been posted above. Just follow my steps; it works on my machine. If it doesn't work for you, please share more details.

@ruiyuanlu Do I need to uninstall CUDA 10.1 and install CUDA 10.0? I was using CUDA 10.1 on Ubuntu 16.04 before.

@wyhcqq Not really. I didn't test CUDA 10.1 on my machine, but in my experience CUDA 10.1 is fine; just make sure the compile-time CUDA version matches the run-time version.

I have the same problem. My versions are PyTorch 1.1, CUDA 9.1, gcc 5.4. Must I switch CUDA 9.1 to 9.0 or another version?

I encountered the same problem when running the following example:
python tools/test.py configs/faster_rcnn_r50_fpn_1x.py checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth --show

I have tried CUDA 10.0/10.1 with PyTorch 1.3.0/1.2.0 and torchvision 0.4.1/0.4.0, and the gcc version is 8.0. Disappointingly, I'm still stuck on this problem. Next I want to try Docker.


Finally, I solved this problem by using the official Dockerfile in this project.

I got the same issue and resolved it by changing the CUDA version.

I found this helpful:
https://towardsdatascience.com/how-to-get-cuda-9-2-backend-for-pytorch-0-4-1-on-google-colab-57eb12aae27f

This link gives the steps to change the CUDA version for PyTorch 0.4.1:

!git clone https://gist.github.com/f7b7c7758a46da49f84bc68b47997d69.git
%cd f7b7c7758a46da49f84bc68b47997d69/
!bash pytorch041_cuda92_colab.sh

Hello, I have installed the CUDA 10.2 driver, but there is no CUDA 10.2 build on the PyTorch official website. What should I do in this case?

@qinhongtju The prebuilt PyTorch package does not support CUDA 10.2, you may try compiling from source.

I had a similar problem with my model, but I had the luxury of having it mostly working on my local system with my puny GPU (4 GB) while it failed on the school's GPU cluster (4 x V100, 32 GB). The school's cluster operates through a restricted nvidia-docker platform, so the choice of setup is not entirely yours.

The key is looking at the symbols exported by the dynamic libraries. CUDA finds its runtime libraries via the LD_LIBRARY_PATH environment variable, and the missing symbol is exported by libcudart.so. Use readelf -sW to inspect the symbol tables. On my system, it looks like this:

echo $LD_LIBRARY_PATH
/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.0/lib64

cd /usr/local/cuda-10.0/lib64

readelf -sW *.so | grep FatBinaryEnd
390: 0000000000010ab0 37 FUNC GLOBAL DEFAULT 11 __cudaRegisterFatBinaryEnd@@libcudart.so.10.0

Here, the __cudaRegisterFatBinaryEnd symbol is exported by libcudart.so.10.0. So just make sure this library is covered by LD_LIBRARY_PATH, either by editing the path definition or by copying/linking the library file into an existing path.
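That manual inspection can be scripted by parsing the `readelf -sW` output for the versioned symbol; a small sketch (the function name is mine), using the symbol-table line shown above:

```python
import re

def exporting_library(readelf_output, symbol):
    """Given `readelf -sW` output, return the versioned library name
    (e.g. 'libcudart.so.10.0') that exports `symbol`, or None."""
    pattern = re.compile(
        r"FUNC\s+GLOBAL\s+DEFAULT\s+\d+\s+" + re.escape(symbol) + r"@@(\S+)")
    m = pattern.search(readelf_output)
    return m.group(1) if m else None

line = ("390: 0000000000010ab0 37 FUNC GLOBAL DEFAULT 11 "
        "__cudaRegisterFatBinaryEnd@@libcudart.so.10.0")
print(exporting_library(line, "__cudaRegisterFatBinaryEnd"))  # libcudart.so.10.0
```

Feed it the concatenated `readelf -sW *.so` output from each directory on LD_LIBRARY_PATH to find out which, if any, libcudart provides the missing symbol.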

I also met this problem.

ImportError: /home/win/anaconda3/envs/alphapose/lib/python3.6/site-packages/alphapose-0.3.0+36e7721-py3.6-linux-x86_64.egg/alphapose/utils/roi_align/roi_align_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19NonVariableTypeMode10is_enabledEv
I think my CUDA and PyTorch versions match: CUDA 10.0 and PyTorch 1.1. What can I do? Please help me.

