Mmdetection: CUDNN_STATUS_NOT_SUPPORTED

Created on 18 Feb 2020 · 14 Comments · Source: open-mmlab/mmdetection

When I run libra_faster_rcnn_r101_fpn_1x, an error is reported: "RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."

Most helpful comment

Modify the code at
https://github.com/open-mmlab/mmdetection/blob/c0ac99eff015c108b34a9f80e3ff59b106dbc62e/mmdet/models/plugins/non_local.py#L110 as follows:

y = y.permute(0, 2, 1).contiguous().reshape(n, self.inter_channels, h, w)
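
To see what the added .contiguous() call changes, here is a minimal CPU-only sketch. The sizes and the tensor y are made-up stand-ins for the non-local block's intermediate values; they are assumptions for illustration and are not taken from the mmdetection source.

```python
import torch

# Hypothetical sizes; in the real module these come from the input feature map.
n, inter_channels, h, w = 2, 4, 3, 5

# Stand-in for the attention output, shape (n, h*w, inter_channels).
y = torch.arange(n * h * w * inter_channels, dtype=torch.float32)
y = y.reshape(n, h * w, inter_channels)

# Without .contiguous(): permute only swaps strides, and this reshape
# can be expressed as a view, so the memory layout stays permuted.
before = y.permute(0, 2, 1).reshape(n, inter_channels, h, w)

# With .contiguous(): copy into row-major order first, then reshape.
after = y.permute(0, 2, 1).contiguous().reshape(n, inter_channels, h, w)

same_values = torch.equal(before, after)
print(before.is_contiguous(), after.is_contiguous(), same_values)
```

This should print False True True: both forms hold the same values, but only the second one is laid out contiguously in memory.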

All 14 comments

@zuhaoran Hi, can you run python mmdet/utils/collect_env.py to collect your environment information and paste it here? I have not encountered this error before, so we need to find what causes it. Thanks!
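
If collect_env.py cannot be run for some reason, a rough subset of the same information can be read from PyTorch directly. This is only a sketch, not what the script itself does; the dictionary keys are illustrative labels chosen here.

```python
import platform
import sys

import torch

# Keys are illustrative labels, not the exact output format of collect_env.py.
env = {
    "sys.platform": sys.platform,
    "Python": platform.python_version(),
    "PyTorch": torch.__version__,
    "CUDA available": torch.cuda.is_available(),
    "CUDA (PyTorch build)": torch.version.cuda,   # None on CPU-only builds
    "cuDNN": torch.backends.cudnn.version(),      # None if cuDNN is unavailable
}

for key, value in env.items():
    print(f"{key}: {value}")
```

The cuDNN and CUDA versions are the most relevant entries for this error, since they determine which kernels PyTorch can dispatch to.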

@shwoo93 Thank you for your answer

@OceanPang Sorry, I haven't used this code.

@zuhaoran We just want to confirm the source of the bug. It would be great if you can run the code and paste your env info here. Thanks!

@OceanPang I tried to run it, but it didn't work.

I ran into the same problem. Here is my environment info; I hope it helps.
sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0,1,2,3,4,5,6,7,8,9: GeForce RTX 2080 Ti
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.0
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.3.2
MMDetection: 1.1.0+639f934
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.0

I hit the same CUDNN_STATUS_NOT_SUPPORTED error when running libra_faster_rcnn_r50_fpn_1x.py and libra_retina_rcnn_r50_fpn_1x.py.
sys.platform: ubuntu 18.04
Python: 3.7.6
CUDA: 10.1
cudnn: 7.6.5
PyTorch: 1.4.0
GPU: 0,1,2,3,4,5,6,7 Tesla V100-SXM2
I ran cascade_rcnn_r50_fpn_1x.py successfully with the same environment.

I built MMCV and MMDetection on Mar 11 with the latest code from the master branch.

@shwoo93 Thank you for your answer. It works.

I also ran into this error, but I could not locate the plugins folder in mmdetection/mmdet/models.
Edit: Found it.
It is in the mmdet/ops folder.

@shwoo93 Thank you. It solved my problem. Could you please tell me how you located the error?

@shwoo93 Thank you for your answer!
@OceanPang I think that's weird, since reshape should automatically make the tensor contiguous. How is this related to the cuDNN error?
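
One possible explanation, which nobody in this thread confirmed: torch.reshape returns a view whenever the requested shape is compatible with the existing strides, and only falls back to a copy otherwise. Splitting the last dimension of a permuted tensor, as in (n, c, h*w) to (n, c, h, w), never needs a copy, so the result would stay non-contiguous. The sketch below uses hypothetical small sizes and compares data pointers to tell a view from a copy.

```python
import torch

# Hypothetical sizes: n=2, h*w=6 (h=2, w=3), c=4.
x = torch.arange(48, dtype=torch.float32).reshape(2, 6, 4)
p = x.permute(0, 2, 1)            # shape (2, 4, 6), strides (24, 1, 4)

# Splitting one dimension is always stride-compatible, so reshape returns
# a view that shares storage with x and is still non-contiguous.
split = p.reshape(2, 4, 2, 3)
split_is_view = split.data_ptr() == x.data_ptr()

# Merging the two permuted dimensions cannot be described by strides,
# so here reshape has to make a contiguous copy.
merged = p.reshape(2, 24)
merged_is_view = merged.data_ptr() == x.data_ptr()

print(split_is_view, split.is_contiguous())
print(merged_is_view, merged.is_contiguous())
```

This should print True False and then False True. If that is what happens in the non-local block, the tensor would reach the next layer in a non-contiguous layout unless .contiguous() is called first, which would match the hint in the error message.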

Well, with PyTorch 1.6, reducing the batch_size works for me. This seems to be a PyTorch bug.
