Maskrcnn-benchmark: RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /home/username/github/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103

Created on 11 Nov 2018 · 5 Comments · Source: facebookresearch/maskrcnn-benchmark

Hello, when I run Mask_R-CNN_demo.ipynb in 'cuda' mode, everything goes well until the last cell. The error happens when running predictions = coco_demo.run_on_opencv_image(image).

To find the cause, I tried the following, which left me even more confused.
The output of collect_env.py is:

Collecting environment information...
PyTorch version: 1.0.0.dev20181106
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 6.4.0-17ubuntu1) 6.4.0 20180424
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration: 
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti

Nvidia driver version: 390.67
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a

Versions of relevant libraries:
[pip3] numpy (1.15.1)
[pip3] torch (0.4.1)
[conda] Could not collect
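The two version lines that matter here can be compared mechanically. The sketch below parses the "Key: value" lines of collect_env.py output and compares the CUDA version PyTorch was built with against the "CUDA runtime version" line. As far as I can tell, collect_env.py fills that second field from the system `nvcc --version`, so it describes the toolkit used to compile maskrcnn-benchmark's extension rather than the runtime bundled inside the PyTorch binary; treat that as an assumption. The helper names (`parse_collect_env`, `major_minor`, `build_vs_runtime`) are hypothetical.

```python
import re


def parse_collect_env(text):
    """Parse the 'Key: value' lines of collect_env.py output into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so values that contain colons survive.
        if ": " in line:
            key, value = line.split(": ", 1)
            info[key.strip()] = value.strip()
    return info


def major_minor(version):
    """Reduce a version such as '9.0.176' to (9, 0); None if there is no version."""
    match = re.match(r"(\d+)\.(\d+)", version or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def build_vs_runtime(info):
    """Compare the CUDA version PyTorch was built with against the system toolkit.

    Assumption: collect_env.py reads 'CUDA runtime version' from `nvcc --version`.
    """
    build = major_minor(info.get("CUDA used to build PyTorch"))
    runtime = major_minor(info.get("CUDA runtime version"))
    return build, runtime, build is not None and build == runtime
```

Fed the report above, `build_vs_runtime` returns `((9, 0), (9, 1), False)`: PyTorch was built against CUDA 9.0 while the system toolkit is 9.1. For the second report further down (after the downgrade), both fields read 9.0.176 and it returns `((9, 0), (9, 0), True)`.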

I also ran ./deviceQuery in NVIDIA_CUDA-9.1_Samples/bin/x86_64/linux/release, and it passes. More concretely:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 2
Result = PASS
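deviceQuery gets its numbers from `cudaDriverGetVersion` and `cudaRuntimeGetVersion`, and the runtime it reports is the one the sample itself was built with (9.1 here), not necessarily the one loaded by PyTorch or by the compiled maskrcnn-benchmark extension. A minimal sketch of the driver half, assuming Linux and calling the driver API's `cuDriverGetVersion` in `libcuda` through `ctypes`; the helper names are hypothetical:

```python
import ctypes
import ctypes.util


def decode_cuda_version(value):
    """CUDA encodes versions as 1000 * major + 10 * minor (e.g. 9010 -> (9, 1))."""
    return value // 1000, (value % 1000) // 10


def query_driver_version():
    """Ask the installed driver which CUDA version it supports.

    Returns (major, minor), or None if libcuda cannot be loaded or queried.
    Assumption: Linux, where the driver library is usually libcuda.so.1.
    """
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        lib = ctypes.CDLL(name)
        get_version = lib.cuDriverGetVersion
    except (OSError, AttributeError):
        return None
    version = ctypes.c_int(0)
    # cuDriverGetVersion does not need cuInit and returns 0 (CUDA_SUCCESS) on success.
    if get_version(ctypes.byref(version)) != 0:
        return None
    return decode_cuda_version(version.value)
```

On the machine above this should report `(9, 1)`, matching deviceQuery's "CUDA Driver Version = 9.1"; the open question is which runtime the failing process loads, which deviceQuery does not answer.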

All the tests above show that the CUDA runtime version matches the CUDA driver version, so I don't know where this runtime error comes from or how to fix it.

Thank you.

All 5 comments

From searching for this error online, it might be a conflict between CUDA versions and drivers:
https://github.com/torch/cutorch/issues/809
https://devtalk.nvidia.com/default/topic/1028320/cuda-driver-version-is-insufficient-for-cuda-runtime-version/?offset=6

I'm not sure what the best solution is here, but maybe downgrading to CUDA 9.0 or upgrading to CUDA 9.2 would fix it?
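Whether a given driver can run a given CUDA release is a table lookup. The sketch below hard-codes minimum Linux x86_64 driver versions as I recall them from the "CUDA Toolkit and Corresponding Driver Versions" table in NVIDIA's CUDA release notes (the 9.2 entry is the initial 9.2.88 release); verify them against the current table before relying on them. `MIN_LINUX_DRIVER`, `parse_driver` and `driver_supports` are hypothetical names.

```python
# Minimum Linux x86_64 driver per CUDA release, recalled from NVIDIA's release
# notes table (assumption: double-check these values against the official table).
MIN_LINUX_DRIVER = {
    (9, 0): (384, 81),
    (9, 1): (390, 46),
    (9, 2): (396, 26),
    (10, 0): (410, 48),
}


def parse_driver(version):
    """Turn a driver string such as '390.67' into (390, 67)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def driver_supports(driver_version, cuda_major_minor):
    """True/False if the driver is new enough, None if the release is not in the table."""
    needed = MIN_LINUX_DRIVER.get(cuda_major_minor)
    if needed is None:
        return None
    # Tuples compare element by element, so (390, 67) >= (384, 81) works as expected.
    return parse_driver(driver_version) >= needed
```

With the reported driver 390.67, this says CUDA 9.0 and 9.1 are both covered but 9.2 is not, so moving to a CUDA 9.2 build would presumably also require a driver upgrade.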

I downgraded to CUDA 9.0, but it still didn't help:

Collecting environment information...
PyTorch version: 1.0.0.dev20181106
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 6.4.0-17ubuntu1) 6.4.0 20180424
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: 
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti

Nvidia driver version: 390.67
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
/usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1
/usr/local/cuda-9.0/lib64/libcudnn_static.a

Versions of relevant libraries:
[pip3] numpy (1.15.1)
[pip3] torch (0.4.1)
[conda] pytorch-nightly           1.0.0.dev20181106 py3.7_cuda9.0.176_cudnn7.1.2_0    pytorch
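This report lists two PyTorch installs: torch 0.4.1 from pip3 and pytorch-nightly from conda. The pip3 one may belong to a different interpreter than the conda environment seen in the traceback, but it is worth ruling out. A small sketch that pulls every torch build out of the "Versions of relevant libraries" section; the regex assumes the "[manager] name version" layout shown above, and `torch_installs` is a hypothetical helper:

```python
import re

# Assumed line layout: "[pip3] torch (0.4.1)" or "[conda] pytorch-nightly 1.0.0.dev... build channel"
LIB_LINE = re.compile(r"\[(pip3?|conda)\]\s+(\S+)\s+\(?([^\s)]+)")
# Core PyTorch package names only (torch, torch-nightly, pytorch, pytorch-nightly), not torchvision.
TORCH_NAME = re.compile(r"(py)?torch(-nightly)?")


def torch_installs(text):
    """Return (manager, package, version) for every PyTorch build listed by collect_env.py."""
    found = []
    for line in text.splitlines():
        match = LIB_LINE.match(line.strip())
        if match and TORCH_NAME.fullmatch(match.group(2)):
            found.append(match.groups())
    return found
```

For the report above it returns both entries, `("pip3", "torch", "0.4.1")` and `("conda", "pytorch-nightly", "1.0.0.dev20181106")`; more than one entry is a hint to clean up before reinstalling.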

Here is the traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-51-17ba477dc913> in <module>
      1 # compute predictions
----> 2 predictions = coco_demo.run_on_opencv_image(image)
      3 imshow(predictions)
      4 get_ipython().system('nvcc -V')

~/github/maskrcnn-benchmark/demo/predictor.py in run_on_opencv_image(self, image)
    167                 the BoxList via `prediction.fields()`
    168         """
--> 169         predictions = self.compute_prediction(image)
    170         top_predictions = self.select_top_predictions(predictions)
    171 

~/github/maskrcnn-benchmark/demo/predictor.py in compute_prediction(self, original_image)
    198         # compute predictions
    199         with torch.no_grad():
--> 200             predictions = self.model(image_list)
    201         predictions = [o.to(self.cpu_device) for o in predictions]
    202 

~/anaconda2/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py in forward(self, images, targets)
     48         images = to_image_list(images)
     49         features = self.backbone(images.tensors)
---> 50         proposals, proposal_losses = self.rpn(images, features, targets)
     51         if self.roi_heads:
     52             x, result, detector_losses = self.roi_heads(features, proposals, targets)

~/anaconda2/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py in forward(self, images, features, targets)
     94             return self._forward_train(anchors, objectness, rpn_box_regression, targets)
     95         else:
---> 96             return self._forward_test(anchors, objectness, rpn_box_regression)
     97 
     98     def _forward_train(self, anchors, objectness, rpn_box_regression, targets):

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py in _forward_test(self, anchors, objectness, rpn_box_regression)
    120 
    121     def _forward_test(self, anchors, objectness, rpn_box_regression):
--> 122         boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
    123         if self.cfg.MODEL.RPN_ONLY:
    124             # For end-to-end models, the RPN proposals are an intermediate state

~/anaconda2/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py in forward(self, anchors, objectness, box_regression, targets)
    136         anchors = list(zip(*anchors))
    137         for a, o, b in zip(anchors, objectness, box_regression):
--> 138             sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
    139 
    140         boxlists = list(zip(*sampled_boxes))

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py in forward_for_single_feature_map(self, anchors, objectness, box_regression)
    116                 self.nms_thresh,
    117                 max_proposals=self.post_nms_top_n,
--> 118                 score_field="objectness",
    119             )
    120             result.append(boxlist)

~/github/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py in boxlist_nms(boxlist, nms_thresh, max_proposals, score_field)
     25     boxes = boxlist.bbox
     26     score = boxlist.get_field(score_field)
---> 27     keep = _box_nms(boxes, score, nms_thresh)
     28     if max_proposals > 0:
     29         keep = keep[: max_proposals]

RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /home/tianchuzhang/github/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103
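The traceback ends in boxlist_nms calling _box_nms, which fails at nms.cu:103, inside maskrcnn-benchmark's own compiled CUDA extension rather than in PyTorch itself. Extensions built with `torch.utils.cpp_extension.CUDAExtension` are compiled by the system nvcc and, as far as I know, linked against the libcudart from that toolkit, so it is worth checking which runtime the built shared object actually resolves to (and whether it was rebuilt after the downgrade to CUDA 9.0). A Linux-only sketch using `ldd`; the extension's file name (something like maskrcnn_benchmark/_C*.so) is an assumption about the build layout, and the helper names are hypothetical:

```python
import re
import subprocess

# Matches ldd lines such as "libcudart.so.9.1 => /usr/local/cuda/lib64/libcudart.so.9.1 (0x...)"
# and also "libcudart.so.9.0 => not found".
CUDART_LINE = re.compile(r"\s*(libcudart\S*)\s+=>\s+(not found|\S+)")


def cudart_from_ldd(ldd_output):
    """Return {soname: resolved path or 'not found'} for every libcudart entry in ldd output."""
    libs = {}
    for line in ldd_output.splitlines():
        match = CUDART_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def cudart_for(shared_object):
    """Run ldd (Linux only) on a compiled extension and report which libcudart it loads."""
    result = subprocess.run(["ldd", shared_object], capture_output=True, text=True, check=True)
    return cudart_from_ldd(result.stdout)
```

Call `cudart_for(...)` with the path of the extension found under your checkout; an entry pointing at a different CUDA version than the one you expect suggests rebuilding the extension against the current toolkit.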

Have you tried removing all torch related libs and then reinstall with
conda install pytorch-nightly cuda92 -c pytorch ?
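Before reinstalling as suggested above, it helps to see every torch-related distribution the current interpreter can find, so none is left behind. A sketch using `importlib.metadata` (Python 3.8+; on 3.7 the importlib_metadata backport offers the same API); the name filter and the helper names are hypothetical:

```python
import re
from importlib import metadata  # Python 3.8+; on 3.7 use the importlib_metadata backport

# Assumption: "torch-related" means the name starts with "torch" or "pytorch"
# (torch, torchvision, pytorch-nightly, ...).
TORCH_LIKE = re.compile(r"^(py)?torch", re.IGNORECASE)


def torch_related(names):
    """Filter distribution names down to torch-related ones, sorted and de-duplicated."""
    return sorted({name for name in names if name and TORCH_LIKE.match(name)})


def installed_torch_related():
    """Torch-related distributions visible to *this* interpreter only."""
    return torch_related(dist.metadata["Name"] for dist in metadata.distributions())
```

Each name it returns can then be removed with pip uninstall or conda uninstall, depending on which tool installed it. Run it from the same environment the notebook kernel uses, since other interpreters (such as the system pip3) keep their own package lists.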

I'd try doing what @lanpa mentioned. I don't really have any other suggestions :-/
