Thanks for the great implementation of Mask R-CNN.
I am running the code following the demo in ~/maskrcnn-benchmark/demo/Mask_R-CNN_demo.ipynb.
It works well when I change cfg.merge_from_list(["MODEL.DEVICE", "cpu"]) to cfg.merge_from_list(["MODEL.DEVICE", "cuda"]) to test in a GPU environment. However, when I use cfg.merge_from_list(["MODEL.DEVICE", "cuda: 2"]) to run on a specific GPU index, the following error occurs.
THCudaCheck FAIL file=~/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu line=103 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
File "mytest.py", line 56, in <module>
predictions = coco_demo.run_on_opencv_image(image)
File "~/maskrcnn-benchmark/demo/predictor.py", line 167, in run_on_opencv_image
predictions = self.compute_prediction(image)
File "~/maskrcnn-benchmark/demo/predictor.py", line 198, in compute_prediction
predictions = self.model(image_list)
File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "~/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
proposals, proposal_losses = self.rpn(images, features, targets)
File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "~/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 96, in forward
return self._forward_test(anchors, objectness, rpn_box_regression)
File "~/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 122, in _forward_test
boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "~/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 138, in forward
sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
File "~/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 118, in forward_for_single_feature_map
score_field="objectness",
File "~/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 27, in boxlist_nms
keep = _box_nms(boxes, score, nms_thresh)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at ~/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103
Is there anything I've done wrong? How can I solve this problem? Thanks for your help!
Thanks for the bug report.
I believe I'm allocating memory in the default device in https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/csrc/cuda/nms.cu#L89 , which makes it not work in your case.
As a quick workaround, can you call torch.cuda.set_device(2) at the beginning of the notebook?
Then you can just specify ["MODEL.DEVICE", "cuda"], and that should be enough, I believe.
I'll send a proper fix later today.
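Put together, the workaround in the notebook would look something like the sketch below. This is only an illustration of the suggestion above; the config file path is taken from the demo directory and may differ in your setup.

```python
import torch
from maskrcnn_benchmark.config import cfg
from predictor import COCODemo  # lives in maskrcnn-benchmark/demo/

# Make GPU 2 the current CUDA device *before* building the model, so any
# kernel that allocates on the default device (like the NMS kernel above)
# ends up on the same GPU as its inputs.
torch.cuda.set_device(2)

# Adjust to whichever config the demo notebook loads.
cfg.merge_from_file("../configs/caffe2/e2e_mask_rcnn_R_50_FPN_1x_caffe2.yaml")
# Plain "cuda" now resolves to the device selected above.
cfg.merge_from_list(["MODEL.DEVICE", "cuda"])

coco_demo = COCODemo(cfg, min_image_size=800, confidence_threshold=0.7)
```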
It works well when torch.cuda.set_device(...) is employed. Thanks for your help!
Also running into this issue. The workaround works great for a single GPU, but is it possible to make it work with multiple GPUs? I'd like to be able to use, e.g., devices 3 to 8, or 2, 4 and 6 (we have shared 8-GPU machines).
This works for multiple GPUs as well.
You should launch the notebook with CUDA_VISIBLE_DEVICES=2,3 jupyter notebook to restrict the code to GPUs 2 and 3.
But indeed the right way of fixing it is to fix the CUDA kernel. I can try fixing it next week.
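One thing to keep in mind: CUDA_VISIBLE_DEVICES renumbers the visible GPUs from zero, so inside the process you address them as cuda:0, cuda:1, ... regardless of their physical indices. A small sketch (the helper function is made up, purely to illustrate the renumbering):

```python
import os

# With CUDA_VISIBLE_DEVICES="2,4,6", physical GPU 2 appears to the
# process as cuda:0, GPU 4 as cuda:1, and GPU 6 as cuda:2.
def visible_to_physical(visible_index):
    # Hypothetical helper, only to illustrate the renumbering.
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    physical = [int(i) for i in value.split(",") if i.strip()]
    return physical[visible_index] if physical else visible_index

os.environ["CUDA_VISIBLE_DEVICES"] = "2,4,6"
print(visible_to_physical(0))  # physical GPU 2
print(visible_to_physical(2))  # physical GPU 6
```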