Maskrcnn-benchmark: ImportError: .../maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

Created on 28 Oct 2018 · 10 Comments · Source: facebookresearch/maskrcnn-benchmark

❓ Questions and Help

Traceback (most recent call last):
  File "webcam.py", line 6, in <module>
    from predictor import COCODemo
  File "/home/laonb/github/maskrcnn-benchmark/demo/predictor.py", line 6, in <module>
    from maskrcnn_benchmark.modeling.detector import build_detection_model
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/__init__.py", line 2, in <module>
    from .detectors import build_detection_model
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/detectors.py", line 2, in <module>
    from .generalized_rcnn import GeneralizedRCNN
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 11, in <module>
    from ..backbone import build_backbone
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/__init__.py", line 2, in <module>
    from .backbone import build_backbone
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/backbone.py", line 7, in <module>
    from . import resnet
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/resnet.py", line 19, in <module>
    from maskrcnn_benchmark.layers import FrozenBatchNorm2d
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
    from .nms import nms
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: /home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

laonb

All 10 comments

Environment:
Ubuntu 16.04
PyTorch 1.0.0.dev20181027
CUDA 9.0
cuDNN 7.1.4.18

laonb on 28 Oct 2018

Hi,
I've never seen this error before.
From looking around on the internet, some reports trace similar errors to broken or conflicting installations on the system, see e.g. https://github.com/uber/horovod/issues/274

Can you try running

ldd maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so

from the maskrcnn-benchmark folder?
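
For anyone who wants to check the ldd output programmatically, here is a minimal Python sketch that pulls out which file each CUDA runtime entry resolves to. The function names and the sample path are hypothetical, chosen only for illustration; the parsing assumes the usual "name => /path (address)" layout that ldd prints for resolved libraries.

```python
import re

# Matches resolved entries such as
# "libcudart.so.9.0 => /path/to/libcudart.so.9.0 (0x...)".
# Entries without an absolute path after "=>" are skipped.
_RESOLVED = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def resolved_libraries(ldd_output):
    """Map each library name in `ldd` output to the path it resolved to."""
    resolved = {}
    for line in ldd_output.splitlines():
        match = _RESOLVED.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def cuda_runtime_paths(ldd_output):
    """Keep only the CUDA runtime entries (names starting with libcudart)."""
    return {
        name: path
        for name, path in resolved_libraries(ldd_output).items()
        if name.startswith("libcudart")
    }


# Illustrative input in the usual `ldd` format; the paths are hypothetical.
SAMPLE = """\
linux-vdso.so.1 => (0x00007ffd00000000)
libcudart.so.9.0 => /opt/example/lib/libcudart.so.9.0 (0x00007f0000000000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0000100000)
"""

if __name__ == "__main__":
    for name, path in cuda_runtime_paths(SAMPLE).items():
        print(name, "->", path)
```

If the reported path points into a different CUDA or conda installation than the one used to compile the extension, that is a hint (not proof) of a version clash.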

fmassa on 28 Oct 2018

👍1

@fmassa I ran ldd:
laonb@LAONB-GPU:~/github/maskrcnn-benchmark/maskrcnn_benchmark$ ldd _C.cpython-36m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffde48d5000)
libcudart.so.9.0 => /home/laonb/anaconda3/lib/libcudart.so.9.0 (0x00007f2ae6743000)
libstdc++.so.6 => /home/laonb/anaconda3/lib/libstdc++.so.6 (0x00007f2ae6409000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2ae6100000)
libgcc_s.so.1 => /home/laonb/anaconda3/lib/libgcc_s.so.1 (0x00007f2ae5eee000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2ae5cd1000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2ae5907000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2ae6c1b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2ae5703000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f2ae54fb000)

I then reran python webcam.py --min-image-size 800.
The error is the same as before:
Traceback (most recent call last):
  File "webcam.py", line 6, in <module>
    from predictor import COCODemo
  File "/home/laonb/github/maskrcnn-benchmark/demo/predictor.py", line 6, in <module>
    from maskrcnn_benchmark.modeling.detector import build_detection_model
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/__init__.py", line 2, in <module>
    from .detectors import build_detection_model
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/detectors.py", line 2, in <module>
    from .generalized_rcnn import GeneralizedRCNN
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 11, in <module>
    from ..backbone import build_backbone
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/__init__.py", line 2, in <module>
    from .backbone import build_backbone
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/backbone.py", line 7, in <module>
    from . import resnet
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/resnet.py", line 19, in <module>
    from maskrcnn_benchmark.layers import FrozenBatchNorm2d
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
    from .nms import nms
  File "/home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: /home/laonb/github/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

laonb on 29 Oct 2018

Could you please copy and paste the output from the environment collection script from PyTorch (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

PyTorch Version (e.g., 1.0):
OS (e.g., Linux):
How you installed PyTorch (conda, pip, source):
Build command you used (if compiling from source):
Python version:
CUDA/cuDNN version:
GPU models and configuration:
Any other relevant information:

I think there might be a clash between multiple CUDA versions on your machine.

fmassa on 29 Oct 2018

😄1 👍1

@laonb this happens because you have two conflicting versions of CUDA on your machine.

The output of:

nvcc --version

and the output of:

conda list |grep cuda

Together, these two outputs will determine the answer.
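
As a rough illustration of that comparison, the Python sketch below extracts the release number from nvcc --version output and the cudatoolkit version from conda list output, then checks whether their major.minor parts agree. The function names are hypothetical, and the sample strings are illustrative inputs in the format those tools print, not output from any particular machine.

```python
import re


def nvcc_release(nvcc_output):
    """Return the 'X.Y' after the word 'release' in `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def conda_cudatoolkit_version(conda_list_output):
    """Return the version column of the cudatoolkit row in `conda list` output."""
    for line in conda_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "cudatoolkit":
            return fields[1]
    return None


def toolkits_match(nvcc_output, conda_list_output):
    """True or False when both versions were found, None when either is missing."""
    system = nvcc_release(nvcc_output)
    conda = conda_cudatoolkit_version(conda_list_output)
    if system is None or conda is None:
        return None
    # Compare only major.minor, e.g. "9.0" against "9.0".
    return system.split(".")[:2] == conda.split(".")[:2]


# Illustrative inputs only (hypothetical machine with mismatched toolkits).
NVCC_SAMPLE = "Cuda compilation tools, release 9.2, V9.2.148"
CONDA_SAMPLE = "cudatoolkit    9.0    h13b8566_0    defaults"

if __name__ == "__main__":
    print("system nvcc:", nvcc_release(NVCC_SAMPLE))
    print("conda cudatoolkit:", conda_cudatoolkit_version(CONDA_SAMPLE))
    print("match:", toolkits_match(NVCC_SAMPLE, CONDA_SAMPLE))
```

Note that matching version numbers do not rule out a conflict on their own; other stale or duplicated installations in the same environment can still be picked up at build or load time.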

soumith on 29 Oct 2018

👍10

@fmassa
laonb@LAONB-GPU:~/github$ python collect_env.py
Collecting environment information...
PyTorch version: 1.0.0.dev20181027
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 396.44
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.4
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a

Versions of relevant libraries:
[pip] Could not collect
[conda] pytorch 0.4.0 py36hdf912b8_0 defaults
[conda] pytorch-nightly 1.0.0.dev20181027 py3.6_cuda9.0.176_cudnn7.1.2_0 pytorch
[conda] torchvision 0.2.1 py36_1 pytorch
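
One thing worth checking in output like the above is whether more than one PyTorch package is listed in the same environment (here both pytorch 0.4.0 and pytorch-nightly appear). The thread does not confirm that this was the cause, so treat the following Python sketch as an assumption-laden helper rather than a diagnosis. The function names and the set of package names counted as "PyTorch" are hypothetical choices for illustration.

```python
# Package names treated as PyTorch builds; this set is an assumption.
PYTORCH_NAMES = {"pytorch", "pytorch-nightly"}


def conda_packages(collect_env_output):
    """Return (name, version) pairs from the "[conda] ..." lines."""
    packages = []
    for line in collect_env_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "[conda]":
            packages.append((fields[1], fields[2]))
    return packages


def pytorch_builds(collect_env_output):
    """Return only the entries whose package name is a PyTorch build."""
    return [
        (name, version)
        for name, version in conda_packages(collect_env_output)
        if name in PYTORCH_NAMES
    ]


# The three "[conda]" lines reported in this thread.
SAMPLE = """\
[pip] Could not collect
[conda] pytorch 0.4.0 py36hdf912b8_0 defaults
[conda] pytorch-nightly 1.0.0.dev20181027 py3.6_cuda9.0.176_cudnn7.1.2_0 pytorch
[conda] torchvision 0.2.1 py36_1 pytorch
"""

if __name__ == "__main__":
    builds = pytorch_builds(SAMPLE)
    for name, version in builds:
        print(name, version)
    if len(builds) > 1:
        print("More than one PyTorch package is installed in this environment.")
```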

laonb on 29 Oct 2018

@soumith

laonb@LAONB-GPU:~/github$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
laonb@LAONB-GPU:~/github$ conda list |grep cuda
cudatoolkit               9.0                  h13b8566_0    defaults
cudnn                     7.1.2                 cuda9.0_0    defaults
nccl                      1.3.5                 cuda9.0_0    defaults
pytorch-nightly           1.0.0.dev20181027 py3.6_cuda9.0.176_cudnn7.1.2_0    pytorch

laonb on 29 Oct 2018

I reinstalled from scratch, and it runs successfully in a newly created conda env.
The earlier failure must have been caused by a conflict.

laonb on 29 Oct 2018

❤2 🎉1

I faced a similar problem while importing predictor.
>>> import predictor
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/demo/predictor.py", line 6, in <module>
    from maskrcnn_benchmark.modeling.detector import build_detection_model
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/__init__.py", line 2, in <module>
    from .detectors import build_detection_model
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/detectors.py", line 2, in <module>
    from .generalized_rcnn import GeneralizedRCNN
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 11, in <module>
    from ..backbone import build_backbone
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/__init__.py", line 2, in <module>
    from .backbone import build_backbone
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/backbone.py", line 7, in <module>
    from maskrcnn_benchmark.modeling.make_layers import conv_with_kaiming_uniform
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/modeling/make_layers.py", line 10, in <module>
    from maskrcnn_benchmark.layers import Conv2d
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/layers/__init__.py", line 9, in <module>
    from .nms import nms
  File "/home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: /home/abhinav.anand/maskr-cnn/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c1019UndefinedTensorImpl10_singletonE

I am running this on a shared server and cannot install conda from scratch, although I did create a new environment as described in INSTALL.md. Please let me know what I can do here.
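
To see what the missing symbol refers to, it helps to demangle it. The Python sketch below is a deliberately tiny decoder that handles only simple nested names of the form _ZN<len><name>...E under the Itanium C++ ABI mangling scheme (no templates, no function signatures); the function name is hypothetical. A full tool such as c++filt covers the general case.

```python
def demangle_nested_name(symbol):
    """Decode a simple Itanium-mangled nested name like _ZN3foo3barE.

    Only plain length-prefixed identifiers are supported; anything else
    (templates, parameter types, operators) raises ValueError.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested mangled name: %r" % symbol)
    pos = 3
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        if end == pos:
            raise ValueError("unsupported mangling at offset %d" % pos)
        length = int(symbol[pos:end])
        name = symbol[end:end + length]
        if len(name) != length:
            raise ValueError("truncated identifier at offset %d" % end)
        parts.append(name)
        pos = end + length
    if pos >= len(symbol):
        raise ValueError("missing terminating 'E'")
    return "::".join(parts)


if __name__ == "__main__":
    print(demangle_nested_name("_ZN3c1019UndefinedTensorImpl10_singletonE"))
```

The decoded name lives in the c10 namespace, which is part of PyTorch's core library. That suggests (as an assumption, not something confirmed in this thread) that the compiled _C extension was built against a different PyTorch than the one being imported, in which case removing the old build directory and rebuilding the extension inside the active environment is worth trying.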

member123456 on 23 Mar 2019

@member123456 I faced the same problem.
Have you solved it? How can I find out what's wrong?
Thanks a lot!