I am trying to build and run the repo, and on running I get this runtime error:
```
2019-02-21 02:14:18,430 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 174, in <module>
    main()
  File "tools/train_net.py", line 167, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 66, in do_train
    loss_dict = model(images, targets)
  File "/raid/sanjay/.conda/envs/nightly/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/raid/sanjay/.conda/envs/nightly/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 159, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 175, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "/raid/sanjay/.conda/envs/nightly/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 138, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 118, in forward_for_single_feature_map
    score_field="objectness",
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 27, in boxlist_nms
    keep = _box_nms(boxes, score, nms_thresh)
RuntimeError: Not compiled with GPU support (nms at /raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/csrc/nms.h:22)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fd5399b58b5 in /raid/sanjay/.conda/envs/nightly/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: nms(at::Tensor const&, at::Tensor const&, float) + 0xd4 (0x7fd52d3313a4 in /raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x14ebf (0x7fd52d33debf in /raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x11d55 (0x7fd52d33ad55 in /raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
```
I am using the nightly PyTorch build (installed with `conda install -c pytorch pytorch-nightly cuda92`). I downloaded both the nightly PyTorch build and the maskrcnn-benchmark repo today (2/21).
Current versions are:
```
$ python -c "import torch; print(torch.__version__); print(torch.version.cuda)"
1.0.0.dev20190201
9.0.176
$ conda list | grep torch
cuda92                1.0                                            0    pytorch
libtorch              0.1.12                                   nomkl_0
pytorch-ignite        0.1.2                                      <pip>
pytorch-nightly       1.0.0.dev20190201   py3.6_cuda9.0.176_cudnn7.4.1_0    pytorch
torch                 1.0.1.post2                                <pip>
torchvision-nightly   0.2.1                                      <pip>
```
```
$ git log -1
commit b23eee0cb72af70f4e4a72e73537f0884cfd1cff
Author: Stzpz <[email protected]>
Date:   Wed Feb 20 07:47:10 2019 -0800

    Supported FBNet architecture. (#463)
```
I have seen other closed issues about this problem and tried to follow the solutions suggested there, but I am still hitting this error. I would appreciate any help. Thanks!
Could you please tell us the output of `python -c "import torch; from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME); print(torch.cuda.is_available())"`?
It should be something like:
/usr/local/cuda
True
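For anyone curious what that one-liner is actually checking: `torch.utils.cpp_extension` locates the toolkit with a small heuristic (explicit `CUDA_HOME`/`CUDA_PATH` environment variables first, then `nvcc`'s install prefix, then `/usr/local/cuda`). A rough shell sketch of that lookup, not the actual torch code:

```shell
# Approximate the CUDA_HOME lookup performed by torch.utils.cpp_extension:
# explicit environment variables win, then nvcc's install prefix, then the
# conventional /usr/local/cuda symlink. Prints nothing if no toolkit is found.
find_cuda_home() {
  if [ -n "$CUDA_HOME" ]; then
    echo "$CUDA_HOME"
  elif [ -n "$CUDA_PATH" ]; then
    echo "$CUDA_PATH"
  elif command -v nvcc >/dev/null 2>&1; then
    # nvcc lives in <prefix>/bin/nvcc, so strip two path components.
    dirname "$(dirname "$(command -v nvcc)")"
  elif [ -d /usr/local/cuda ]; then
    echo /usr/local/cuda
  fi
}
find_cuda_home
```

If this prints nothing (and the Python one-liner prints `None`), the extension silently builds CPU-only, which is exactly the state that produces the `Not compiled with GPU support` error later.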
Ah! Your question helped me realize my mistake. I was not running `python setup.py` with the `CUDA_VISIBLE_DEVICES` environment variable set, so the CUDA code was not being compiled for any GPU. Problem fixed. Thanks for your help!
Thanks @LeviViana for helping figure out the reason!
Can you tell me how to run `python setup.py` with the `CUDA_VISIBLE_DEVICES` variable?
On the command line I just ran
$ CUDA_VISIBLE_DEVICES=0 python setup.py <options>
with a 0 because when I ran $ nvidia-smi it showed the GPU's ID as 0 (and with the options from the INSTALL.md instructions). This fixed the issue for me, because running
$ python -c "import torch; print(torch.cuda.is_available())"
printed False, but running
$ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.is_available())"
printed True.
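To spell the fix out for anyone landing here later, the steps above amount to: wipe the stale CPU-only build, then recompile with a GPU visible. A minimal sketch, assuming you are in the maskrcnn-benchmark checkout and using the `build develop` options from INSTALL.md:

```shell
# Run from the maskrcnn-benchmark root; guarded so it is a no-op elsewhere.
status="skipped (no setup.py in the current directory)"
if [ -f setup.py ]; then
  # Remove stale build artifacts so the CUDA extension is recompiled from scratch.
  rm -rf build/
  # Expose GPU 0 (check the ID with nvidia-smi) while compiling.
  CUDA_VISIBLE_DEVICES=0 python setup.py build develop && status="rebuilt with CUDA"
fi
echo "$status"
```

Afterwards, the `CUDA_HOME` / `torch.cuda.is_available()` one-liner from earlier in the thread should print a real path and True before you retrain.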
Thanks @sanjmohan
I'm getting the same error even after setting CUDA_VISIBLE_DEVICES. Also, `$ python -c "import torch; print(torch.cuda.is_available())"` returns True for me.
Can you please help me with this?
Hey, not sure, but I have the same issue here:
```
(maskrcnn_benchmark) ox1d0@Nexus001:~/maskrcnn-benchmark/demo$ python -c "import torch; print(torch.__version__); print(torch.version.cuda)"
1.3.0
10.1.243
(maskrcnn_benchmark) ox1d0@Nexus001:~/maskrcnn-benchmark/demo$ python -c "import torch; from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME); print(torch.cuda.is_available())"
/usr/local/cuda-10.0
True
(maskrcnn_benchmark) ox1d0@Nexus001:~/maskrcnn-benchmark/demo$ nvidia-smi
Thu Jan 23 13:04:01 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:07:00.0  On |                  N/A |
|  0%   41C    P8    N/A / 120W |    110MiB /  1999MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1019      G   /usr/lib/xorg/Xorg                           107MiB |
+-----------------------------------------------------------------------------+
(maskrcnn_benchmark) ox1d0@Nexus001:~/maskrcnn-benchmark/demo$ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.is_available())"
True
```
How can I recompile maskrcnn-benchmark with CUDA enabled?
> Ah! Your question helped me realize my mistake. I was not running `python setup.py` with the `CUDA_VISIBLE_DEVICES` environment variable set, so the CUDA code was not being compiled for any GPU. Problem fixed. Thanks for your help!
That's quite helpful! I also couldn't compile before, because `CUDA_HOME` couldn't be found on my computer, so I reinstalled my NVIDIA driver. It works!!