Mmdetection: RuntimeError: all tensors must be on devices[0]

Created on 18 Feb 2020 · 5 Comments · Source: open-mmlab/mmdetection

Hello, I successfully installed and compiled mmdet, and I can train a model on 1 GPU.
However, a runtime error occurs when I try to train the model on multiple GPUs.

The error is:

Traceback (most recent call last):
  File "tools/train.py", line 126, in <module>
    main()
  File "tools/train.py", line 122, in main
    timestamp=timestamp)
  File "/home/ubuntu/eff_panoptic/mmdet/apis/train.py", line 125, in train_detector
    timestamp=timestamp)
  File "/home/ubuntu/eff_panoptic/mmdet/apis/train.py", line 230, in _dist_train
    model = MMDistributedDataParallel(model.cuda())
  File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 305, in __init__
    self._ddp_init_helper()
  File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 323, in _ddp_init_helper
    self._module_copies = replicate(self.module, self.device_ids, detach=True)
  File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/replicate.py", line 88, in replicate
    param_copies = _broadcast_coalesced_reshape(params, devices, detach)
  File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/replicate.py", line 67, in _broadcast_coalesced_reshape
    return comm.broadcast_coalesced(tensors, devices)
  File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/cuda/comm.py", line 39, in broadcast_coalesced
    return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
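The traceback ends inside DDP's in-process `replicate` step, which is reached from `MMDistributedDataParallel(model.cuda())` with no `device_ids` argument. A torch-free sketch of that precondition can help explain why 1 GPU works and 8 do not. It rests on two assumptions not stated in the thread: each launched process pins itself to GPU `rank % num_visible_gpus`, and PyTorch 1.4's DDP falls back to every visible GPU when `device_ids` is omitted. The function names are hypothetical.

```python
def ddp_device_ids(num_visible_gpus, device_ids=None):
    # Assumed PyTorch 1.4 behaviour: no device_ids -> every visible GPU.
    if device_ids is None:
        return list(range(num_visible_gpus))
    return list(device_ids)


def replicate_would_fail(rank, num_visible_gpus, device_ids=None):
    # Assumed launcher behaviour: each process pins GPU rank % num_visible_gpus,
    # so model.cuda() places this process's parameters on that GPU.
    model_gpu = rank % num_visible_gpus
    devices = ddp_device_ids(num_visible_gpus, device_ids)
    # In-process replication (the replicate() call in the traceback) only
    # happens when a single process is handed more than one device.
    if len(devices) <= 1:
        return False
    # broadcast_coalesced requires every tensor to already sit on devices[0].
    return model_gpu != devices[0]


# Eight processes on the eight-GPU machine from the report, device_ids omitted:
failing = [r for r in range(8) if replicate_would_fail(r, 8)]
print(failing)  # [1, 2, 3, 4, 5, 6, 7]
```

Under this model, passing a one-element list such as `device_ids=[torch.cuda.current_device()]` to `DistributedDataParallel` keeps each process on its own GPU and skips the replicate step. That is the general one-GPU-per-process pattern, not necessarily the exact change that resolved this thread; whether the newer mmdetection passes `device_ids` this way is worth confirming in its `mmdet/apis/train.py`.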

My environment looks like:

sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.105
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.3.1
MMDetection: 1.0rc1+d7f86ee
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.1
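Since the suggestions in this thread hinge on versions (mmcv 0.2.15, then upgrading both packages), it can help to pull the version fields out of a report like the one above and compare them numerically rather than as strings. A minimal stdlib sketch: `parse_env_report` and `version_tuple` are hypothetical helpers, and dropping local suffixes such as `rc1+d7f86ee` is a simplification.

```python
import re


def parse_env_report(text):
    """Map top-level 'Key: value' lines of an env report to a dict."""
    info = {}
    for line in text.splitlines():
        # Indented lines (e.g. the "PyTorch built with" bullets) are skipped.
        if line.startswith((" ", "\t")):
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def version_tuple(version):
    """'1.0rc1+d7f86ee' -> (1, 0): keep only the leading numeric components."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # stop at suffixes like 'rc1+d7f86ee'
            break
    return tuple(parts)


report = """PyTorch: 1.4.0
MMCV: 0.3.1
MMDetection: 1.0rc1+d7f86ee"""
info = parse_env_report(report)
print(version_tuple(info["MMCV"]) > version_tuple("0.2.15"))
```

Comparing tuples avoids string-ordering surprises such as `"0.10.0" < "0.9.0"`.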


All 5 comments

What is your running script or command?

I used the usual command:

tools/dist_train.sh [config_file] [num_gpus]

Try using mmcv 0.2.15.

You are not using the latest code. Please upgrade to the latest mmdetection.

Thanks, I resolved the issue by upgrading both mmdetection and mmcv to the latest version.
