Hello, when I run Mask_R-CNN_demo.ipynb in 'cuda' mode, everything goes well until the last cell. The error occurs when running predictions = coco_demo.run_on_opencv_image(image).
To find the cause, I tried the following, which only left me more confused.
Running collect_env.py gives the following information:
Collecting environment information...
PyTorch version: 1.0.0.dev20181106
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 6.4.0-17ubuntu1) 6.4.0 20180424
CMake version: version 3.10.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
Nvidia driver version: 390.67
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
Versions of relevant libraries:
[pip3] numpy (1.15.1)
[pip3] torch (0.4.1)
[conda] Could not collect
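A report like the one above can be checked mechanically for version disagreements. The sketch below is a rough illustration only: the helper names (`parse_env_report`, `find_mismatches`) are hypothetical, and it assumes the report keeps the exact field labels shown in this thread. It compares the CUDA version PyTorch was built with against the toolkit version the report labels "CUDA runtime version", and the reported PyTorch version against the pip-listed `torch`.

```python
import re

# Regexes for the field labels as they appear in the report pasted in this thread.
PATTERNS = {
    "pytorch": r"^PyTorch version:\s*(\S+)",
    "build_cuda": r"^CUDA used to build PyTorch:\s*(\S+)",
    "runtime_cuda": r"^CUDA runtime version:\s*(\S+)",
    "driver": r"^Nvidia driver version:\s*(\S+)",
    "pip_torch": r"^\[pip3?\] torch \(([^)]+)\)",
}


def parse_env_report(text):
    """Extract the version fields we care about from the report text."""
    fields = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            fields[key] = match.group(1)
    return fields


def major_minor(version):
    """Reduce '9.0.176' to '9.0' so patch-level differences are ignored."""
    return ".".join(version.split(".")[:2])


def find_mismatches(fields):
    """Return human-readable notes for each version pair that disagrees."""
    problems = []
    build, runtime = fields.get("build_cuda"), fields.get("runtime_cuda")
    if build and runtime and major_minor(build) != major_minor(runtime):
        problems.append(f"PyTorch built with CUDA {build}, toolkit reports {runtime}")
    pytorch, pip_torch = fields.get("pytorch"), fields.get("pip_torch")
    if pytorch and pip_torch and pytorch != pip_torch:
        problems.append(f"PyTorch reports {pytorch}, pip lists torch {pip_torch}")
    return problems


# Excerpt of the report from this thread.
REPORT = """PyTorch version: 1.0.0.dev20181106
CUDA used to build PyTorch: 9.0.176
CUDA runtime version: 9.1.85
Nvidia driver version: 390.67
[pip3] numpy (1.15.1)
[pip3] torch (0.4.1)
"""

for note in find_mismatches(parse_env_report(REPORT)):
    print(note)
```

On this report the sketch flags both pairs: a 9.0 build next to a 9.1 toolkit, and a pip-installed torch 0.4.1 next to the 1.0.0 nightly. Neither is proof of the cause, but both are worth ruling out.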
I also ran ./deviceQuery in NVIDIA_CUDA-9.1_Samples/bin/x86_64/linux/release, and it gives a PASS result. More concretely:
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 2
Result = PASS
All the tests above show that the CUDA runtime version matches the CUDA driver version, so I don't know where this runtime error comes from or how to fix it.
Thank you.
From searching for this error online, it might be a conflict between the CUDA version and the driver:
https://github.com/torch/cutorch/issues/809
https://devtalk.nvidia.com/default/topic/1028320/cuda-driver-version-is-insufficient-for-cuda-runtime-version/?offset=6
I'm not sure what the best solution would be here, but maybe downgrading to CUDA 9.0 or updating to CUDA 9.2 would fix it?
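Whether a given CUDA version can work at all depends on the installed driver. As a minimal sketch, the table below holds the minimum Linux driver versions as I recall them from NVIDIA's CUDA release notes. Treat these numbers as an assumption and check them against the official compatibility table before acting on them. The names `MIN_LINUX_DRIVER` and `driver_supports` are made up for this example.

```python
# ASSUMPTION: minimum Linux driver per CUDA toolkit, recalled from NVIDIA's
# CUDA release notes. Verify against the official table before relying on it.
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "9.1": (390, 46),
    "9.2": (396, 26),
}


def parse_version(text):
    """Turn '390.67' into (390, 67) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver, cuda):
    """True if the driver version meets the table's minimum for this CUDA version."""
    return parse_version(driver) >= MIN_LINUX_DRIVER[cuda]


# Driver version taken from the environment report in this thread.
for cuda in sorted(MIN_LINUX_DRIVER):
    print(cuda, driver_supports("390.67", cuda))
```

If those minimums hold, driver 390.67 would be new enough for CUDA 9.0 and 9.1 but not for 9.2, so moving to 9.2 would probably also require a driver update.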
I downgraded to CUDA 9.0, but it still doesn't help:
Collecting environment information...
PyTorch version: 1.0.0.dev20181106
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 6.4.0-17ubuntu1) 6.4.0 20180424
CMake version: version 3.10.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
Nvidia driver version: 390.67
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
/usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1
/usr/local/cuda-9.0/lib64/libcudnn_static.a
Versions of relevant libraries:
[pip3] numpy (1.15.1)
[pip3] torch (0.4.1)
[conda] pytorch-nightly 1.0.0.dev20181106 py3.7_cuda9.0.176_cudnn7.1.2_0 pytorch
Here is the traceback:
RuntimeError Traceback (most recent call last)
<ipython-input-51-17ba477dc913> in <module>
1 # compute predictions
----> 2 predictions = coco_demo.run_on_opencv_image(image)
3 imshow(predictions)
4 get_ipython().system('nvcc -V')
~/github/maskrcnn-benchmark/demo/predictor.py in run_on_opencv_image(self, image)
167 the BoxList via `prediction.fields()`
168 """
--> 169 predictions = self.compute_prediction(image)
170 top_predictions = self.select_top_predictions(predictions)
171
~/github/maskrcnn-benchmark/demo/predictor.py in compute_prediction(self, original_image)
198 # compute predictions
199 with torch.no_grad():
--> 200 predictions = self.model(image_list)
201 predictions = [o.to(self.cpu_device) for o in predictions]
202
~/anaconda2/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)
~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py in forward(self, images, targets)
48 images = to_image_list(images)
49 features = self.backbone(images.tensors)
---> 50 proposals, proposal_losses = self.rpn(images, features, targets)
51 if self.roi_heads:
52 x, result, detector_losses = self.roi_heads(features, proposals, targets)
~/anaconda2/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)
~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py in forward(self, images, features, targets)
94 return self._forward_train(anchors, objectness, rpn_box_regression, targets)
95 else:
---> 96 return self._forward_test(anchors, objectness, rpn_box_regression)
97
98 def _forward_train(self, anchors, objectness, rpn_box_regression, targets):
~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py in _forward_test(self, anchors, objectness, rpn_box_regression)
120
121 def _forward_test(self, anchors, objectness, rpn_box_regression):
--> 122 boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
123 if self.cfg.MODEL.RPN_ONLY:
124 # For end-to-end models, the RPN proposals are an intermediate state
~/anaconda2/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)
~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py in forward(self, anchors, objectness, box_regression, targets)
136 anchors = list(zip(*anchors))
137 for a, o, b in zip(anchors, objectness, box_regression):
--> 138 sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
139
140 boxlists = list(zip(*sampled_boxes))
~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py in forward_for_single_feature_map(self, anchors, objectness, box_regression)
116 self.nms_thresh,
117 max_proposals=self.post_nms_top_n,
--> 118 score_field="objectness",
119 )
120 result.append(boxlist)
~/github/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py in boxlist_nms(boxlist, nms_thresh, max_proposals, score_field)
25 boxes = boxlist.bbox
26 score = boxlist.get_field(score_field)
---> 27 keep = _box_nms(boxes, score, nms_thresh)
28 if max_proposals > 0:
29 keep = keep[: max_proposals]
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /home/tianchuzhang/github/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103
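The traceback ends in the repository's own compiled extension (csrc/cuda/nms.cu), not in a built-in PyTorch operation. One way to narrow things down is to check whether PyTorch can run a trivial CUDA operation by itself. The following is only a sketch; `cuda_smoke_test` is a hypothetical helper, and it simply reports that torch is missing if it cannot be imported.

```python
def cuda_smoke_test():
    """Report whether PyTorch alone can run a small CUDA operation."""
    report = {"torch_importable": False}
    try:
        import torch
    except ImportError:
        return report

    report["torch_importable"] = True
    report["torch_version"] = torch.__version__
    report["torch_file"] = torch.__file__          # which installation is imported
    report["built_with_cuda"] = torch.version.cuda  # CUDA version of this build
    report["cuda_available"] = torch.cuda.is_available()

    if report["cuda_available"]:
        try:
            # A tiny allocation and reduction on the GPU.
            x = torch.ones(4, device="cuda")
            report["cuda_op_ok"] = float(x.sum().item()) == 4.0
        except RuntimeError as err:
            report["cuda_op_ok"] = False
            report["cuda_error"] = str(err)
    return report


for key, value in cuda_smoke_test().items():
    print(f"{key}: {value}")
```

If this passes while the demo still fails, the problem is more likely in how the extension was built (for example, against a different CUDA toolkit than the one PyTorch ships with) than in PyTorch itself. In that case, rebuilding the extension after fixing the environment may be worth trying.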
Have you tried removing all torch-related libs and then reinstalling with
conda install pytorch-nightly cuda92 -c pytorch?
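Before removing anything, it may help to see which torch-related packages the active interpreter can actually see. This is a small sketch that assumes Python 3.8 or newer for `importlib.metadata` (the environment in this thread uses 3.7, where the `importlib_metadata` backport would be needed). The helper names are hypothetical.

```python
from importlib import metadata


def is_torch_related(name):
    """Loose filter: any distribution whose name contains 'torch'."""
    return "torch" in name.lower()


def installed_torch_packages():
    """List (name, version) pairs of torch-related distributions in this environment."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and is_torch_related(name):
            found.add((name, dist.version))
    return sorted(found)


for name, version in installed_torch_packages():
    print(name, version)
```

More than one entry for torch itself, such as a pip-installed 0.4.1 alongside a conda nightly, would suggest that leftover installations still need removing.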
I'd try doing what @lanpa mentioned. I don't really have any other suggestions :-/