Maskrcnn-benchmark: ImportError: /data/repos/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at18SparseCUDATensorIdEv

Created on 28 Nov 2018 · 16 Comments · Source: facebookresearch/maskrcnn-benchmark

❓ Questions and Help

I have installed according to directions from INSTALL.md, running on the latest nightly build of PyTorch 1.0, inside a fresh, new conda environment called maskrcnn_benchmark.

I get the following error when running python tools/train_net.py:

(maskrcnn_benchmark) user_name@server_name: /data/repos/maskrcnn-benchmark$ python tools/train_net.py
Traceback (most recent call last):
  File "train_net.py", line 18, in <module>
    from maskrcnn_benchmark.engine.inference import inference
  File "/data/repos/maskrcnn-benchmark/maskrcnn_benchmark/engine/inference.py", line 10, in <module>
    from maskrcnn_benchmark.data.datasets.evaluation import evaluate
  File "/data/repos/maskrcnn-benchmark/maskrcnn_benchmark/data/datasets/evaluation/__init__.py", line 3, in <module>
    from .coco import coco_evaluation
  File "/data/repos/maskrcnn-benchmark/maskrcnn_benchmark/data/datasets/evaluation/coco/__init__.py", line 1, in <module>
    from .coco_eval import do_coco_evaluation
  File "/data/repos/maskrcnn-benchmark/maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py", line 10, in <module>
    from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
  File "/data/repos/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 6, in <module>
    from maskrcnn_benchmark.layers import nms as _box_nms
  File "/data/repos/maskrcnn-benchmark/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
    from .nms import nms
  File "/data/repos/maskrcnn-benchmark/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: /data/repos/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at18SparseCUDATensorIdEv

How to fix this?
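
One way to read the error: the undefined symbol is a mangled C++ name (Itanium ABI, as used by GCC on Linux). The sketch below decodes only the leading length-prefixed name components, which is enough to see which library the missing symbol belongs to. The helper name leading_components is made up for illustration and is not a full demangler; a tool such as c++filt does that properly.

```python
from typing import List


def leading_components(mangled: str) -> List[str]:
    """Return the length-prefixed name parts of a nested Itanium-ABI symbol.

    Illustrative helper, not a full demangler: it stops at the first
    component that is not a plain <length><name> pair.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        return []
    parts = []
    i = len(prefix)
    while i < len(mangled) and mangled[i].isdigit():
        # Read the decimal length, then take that many characters as the name.
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1
        length = int(mangled[i:j])
        parts.append(mangled[j:j + length])
        i = j + length
    return parts


symbol = "_ZN2at18SparseCUDATensorIdEv"  # copied from the ImportError above
print("::".join(leading_components(symbol)))  # at::SparseCUDATensorId
```

The at namespace belongs to ATen, PyTorch's tensor library, so the compiled _C extension is asking for a PyTorch function that the PyTorch it was loaded with does not export.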

Most helpful comment

This problem results from the nvcc version. You can fix it by first uninstalling the existing packages:
conda uninstall pytorch
conda uninstall pytorch-nightly
conda uninstall torch
conda uninstall torchvision
Uninstall torchvision as well, because torchvision 0.3.0 has a problem. Then run:
conda install -c pytorch pytorch-nightly torchvision cudatoolkit=10.0
pip install torchvision==0.2.2
After that it ran successfully!

All 16 comments

Hi,

This is a duplicate of #119

You are probably not using PyTorch 1.0, and there is an old PyTorch version lying around on your machine that is conflicting.

Try running the following in your interpreter:

import torch
print(torch.__version__)

and it will probably show that you are not using 1.0
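
Besides printing torch.__version__, it can help to see which interpreter is running and what its package metadata reports, without importing the packages themselves. A minimal sketch using only the standard library (importlib.metadata, Python 3.8+); the function name installed_versions is illustrative and not part of PyTorch:

```python
import sys
from importlib import metadata


def installed_versions(names=("torch", "torchvision")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Shows which Python is active, then what that Python can see.
print(sys.executable)
print(installed_versions())
```

If the interpreter path is not inside the expected conda environment, or the reported version differs from the one the extension was built with, that points to the kind of conflict described above.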

No, that is incorrect. We are running this on an AWS AMI image that comes preloaded with several Anaconda virtual environments (e.g. Tensorflow, PyTorch, MXNet, etc), but everything is contained within a conda virtual environment:

$ source activate maskrcnn_benchmark
(maskrcnn_benchmark) $ python
>>> import torch
>>> torch.__version__
'1.0.0.dev20181124'

So I can confirm it is definitely running PyTorch 1.0.

Also, nvcc -V shows it is on CUDA 9.0:

(maskrcnn_benchmark) user_name@gpunn02:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Same problem here. I am using PyTorch 1.0.0a0+11ef519, built from source.

Update:
I solved it by removing the build folder and rebuilding.

@cpoptic you have probably built maskrcnn-benchmark using a previous version of PyTorch.
You need to remove the build folder and recompile it with the right PyTorch version.
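
The cleanup step can be sketched as follows. This is a hedged example, not the project's own tooling: the function name remove_stale_build is hypothetical, and it assumes the build output consists of a top-level build/ directory plus the compiled _C*.so file inside the package directory, which matches the file named in the ImportError. It defaults to a dry run that only lists what would be deleted.

```python
import shutil
from pathlib import Path


def remove_stale_build(repo_root, package="maskrcnn_benchmark", dry_run=True):
    """List (and, if dry_run is False, delete) build output under repo_root.

    Targets the top-level build/ directory and compiled _C*.so files
    inside the package directory.
    """
    root = Path(repo_root)
    targets = []
    build_dir = root / "build"
    if build_dir.is_dir():
        targets.append(build_dir)
    package_dir = root / package
    if package_dir.is_dir():
        targets.extend(sorted(package_dir.glob("_C*.so")))
    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
    return targets


# Dry run from the repository root: prints the paths without deleting them.
for path in remove_stale_build("."):
    print("would remove:", path)
```

After cleaning, rerun the build step from INSTALL.md inside the environment that has the intended PyTorch version.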

Hello,

I have an issue running the training after successfully installing the repo.
Here are the command line and the resulting error message.

Thanks in advance,
Raouf

python -m pointnet2.train.train_cls

Traceback (most recent call last):
  File "/home/raouf/workspace/gitprojects/Pointnet2_PyTorch/pointnet2/utils/pointnet2_utils.py", line 20, in <module>
    import pointnet2._ext as _ext
ImportError: /home/raouf/workspace/gitprojects/Pointnet2_PyTorch/pointnet2/_ext.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/raouf/anaconda3/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/raouf/anaconda3/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/raouf/workspace/gitprojects/Pointnet2_PyTorch/pointnet2/__init__.py", line 17, in <module>
    from pointnet2 import utils
  File "/home/raouf/workspace/gitprojects/Pointnet2_PyTorch/pointnet2/utils/__init__.py", line 8, in <module>
    from . import pointnet2_utils
  File "/home/raouf/workspace/gitprojects/Pointnet2_PyTorch/pointnet2/utils/pointnet2_utils.py", line 24, in <module>
    "Could not import _ext module.\n"
ImportError: Could not import _ext module.
Please see the setup instructions in the README: https://github.com/erikwijmans/Pointnet2_PyTorch/blob/master/README.rst

I have a problem similar to the one in this topic:
from maskrcnn_benchmark import _C
ImportError: /home/s/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c1019UndefinedTensorImpl10_singletonE
The error occurs when running the single-GPU training command. I have removed and reinstalled it more than 10 times and it still does not work. This is the output of collect_env.py:
Collecting environment information...
PyTorch version: 1.0.1.post2
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.2 LTS
GCC version: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.105
GPU models and configuration: GPU 0: TITAN Xp
Nvidia driver version: 415.27
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.16.2
[pip3] torch==1.0.1.post2
[pip3] torchvision==0.2.2.post3
[conda] blas 1.0 mkl
[conda] mkl 2019.1 144
[conda] mkl_fft 1.0.10 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch 1.0.1 py3.7_cuda10.0.130_cudnn7.4.2_2 pytorch
[conda] pytorch-nightly 1.0.0.dev20190316 py3.7_cuda10.0.130_cudnn7.4.2_0 pytorch
[conda] torchvision 0.2.2 py_3 pytorch

Does anyone have any idea what else I can do?
Thanks in advance.

This problem results from the nvcc version. You can fix it by first uninstalling the existing packages:
conda uninstall pytorch
conda uninstall pytorch-nightly
conda uninstall torch
conda uninstall torchvision
Uninstall torchvision as well, because torchvision 0.3.0 has a problem. Then run:
conda install -c pytorch pytorch-nightly torchvision cudatoolkit=10.0
pip install torchvision==0.2.2
After that it ran successfully!

Reinstall, but change the install line to:
conda install -c pytorch pytorch-nightly=1.0.0 torchvision=0.2.2 cudatoolkit=9.0

I tried this:
conda install -c pytorch pytorch-nightly=1.0.0 torchvision=0.2.2 cudatoolkit=9.0

but I still have the same problem:

ImportError: $/maskrcnn/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_30E

Try Python 3.6.5.

Does the following error mean I need to uninstall CUDA 10.0 and reinstall CUDA 9.0?

/maskrcnn_benchmark/layers/nms.py", line 3, in
from maskrcnn_benchmark import _C
ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory
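
This error differs from the undefined-symbol one: the dynamic loader cannot find the CUDA 9.0 runtime library that the compiled _C extension was linked against. A quick way to check which CUDA runtime libraries the loader can find from the current environment is sketched below. The helper name can_load is illustrative, and the 10.0 file name is an assumption based on the naming of the 9.0 library in the error.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


for name in ("libcudart.so.9.0", "libcudart.so.10.0"):
    print(name, "loadable" if can_load(name) else "not found")
```

If only the 10.0 runtime loads, rebuilding the extension against the installed CUDA (removing the build folder first, as suggested earlier in the thread) may be simpler than reinstalling CUDA 9.0. That is an inference from the error message, not something confirmed in this thread.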

Thanks @futureisatyourhand, it worked for me :)

Why are there so many N/A values? What does it mean?

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A

OS: Arch Linux
GCC version: (GCC) 4.8.5
CMake version: Could not collect

Python version: 3.7
Is CUDA available: N/A
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti

Nvidia driver version: 440.31
cuDNN version: /usr/lib/libcudnn.so.7.6.4

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.3.1
[pip3] torchvision==0.4.2
[conda] _pytorch_select 0.2 gpu_0
[conda] blas 1.0 mkl
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] pytorch 1.2.0 cuda100py37h938c94c_0
[conda] pytorch-nightly 1.0.0.dev20190328 py3.7_cuda10.0.130_cudnn7.4.2_0
[conda] torchvision 0.4.0 cuda100py37hecfc37a_0
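
The report above lists three different PyTorch builds in the same environment: pip torch 1.3.1, conda pytorch 1.2.0, and conda pytorch-nightly 1.0.0.dev20190328. To my understanding, collect_env.py prints N/A for the PyTorch fields when it cannot import torch at all, which would be consistent with such a mix. A minimal sketch for spotting this in a pasted report; the function name and the regular expression are illustrative and not part of collect_env.py:

```python
import re
from collections import defaultdict


def torch_versions(report):
    """Collect versions listed for torch-like packages in collect_env output.

    Recognizes lines such as '[pip3] torch==1.3.1' and
    '[conda] pytorch 1.2.0 cuda100py37h938c94c_0'.
    """
    names = {"torch", "pytorch", "pytorch-nightly"}
    pattern = re.compile(r"\[(?:pip3?|conda)\]\s+([\w.-]+?)(?:==|\s+)(\S+)")
    versions = defaultdict(set)
    for line in report.splitlines():
        match = pattern.match(line.strip())
        if match and match.group(1) in names:
            versions[match.group(1)].add(match.group(2))
    return {name: sorted(found) for name, found in versions.items()}


sample = """\
[pip3] torch==1.3.1
[conda] pytorch 1.2.0 cuda100py37h938c94c_0
[conda] pytorch-nightly 1.0.0.dev20190328 py3.7_cuda10.0.130_cudnn7.4.2_0
"""
found = torch_versions(sample)
distinct = {version for versions in found.values() for version in versions}
print(found)
print("conflicting installs" if len(distinct) > 1 else "single version")
```

More than one distinct version suggests uninstalling all of them and installing a single build before recompiling the extension, in line with the uninstall-then-reinstall advice given in this thread.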

worked for me!!!

You're welcome. Many of PyTorch's libraries are unstable and not very compatible.