Detectron2: RuntimeError: No such operator torchvision::nms

Created on 16 Oct 2019 · 6 Comments · Source: facebookresearch/detectron2

Hi all, I'm trying to train a model using detectron2.
I compiled and installed PyTorch '1.3.0a0+e367f60' and torchvision '0.5.0a0+da89dad', both built from source.

When I run this command from https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md
python tools/train_net.py --num-gpus 8 \
--config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml
the following error occurs:

Traceback (most recent call last):
File "tools/train_net.py", line 161, in
args=(args,),
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/launch.py", line 49, in launch
daemon=False,
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:

-- Process 5 terminated with the following error:
Traceback (most recent call last):
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, args)
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/launch.py", line 84, in _distributed_worker
main_func(args)
File "/home/users/gaosiqi/download/detectron2/tools/train_net.py", line 149, in main
return trainer.train()
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/defaults.py", line 329, in train
super().train(self.start_iter, self.max_iter)
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/train_loop.py", line 212, in run_step
loss_dict = self.model(data)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 545, in __call__
result = self.forward(*input, **kwargs)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 545, in __call__
result = self.forward(*input, **kwargs)
File "/home/users/gaosiqi/download/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 82, in forward
proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 545, in __call__
result = self.forward(*input, **kwargs)
File "/home/users/gaosiqi/download/detectron2/detectron2/modeling/proposal_generator/rpn.py", line 179, in forward
self.training,
File "/home/users/gaosiqi/download/detectron2/detectron2/modeling/proposal_generator/rpn_outputs.py", line 136, in find_top_rpn_proposals
keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
File "/home/users/gaosiqi/download/detectron2/detectron2/layers/nms.py", line 17, in batched_nms
return box_ops.batched_nms(boxes, scores, idxs, iou_threshold)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torchvision-0.5.0a0+da89dad-py3.7-linux-x86_64.egg/torchvision/ops/boxes.py", line 70, in batched_nms
keep = nms(boxes_for_nms, scores, iou_threshold)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torchvision-0.5.0a0+da89dad-py3.7-linux-x86_64.egg/torchvision/ops/boxes.py", line 31, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/_ops.py", line 61, in __getattr__
op = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator torchvision::nms

Environment:
CentOS 6.3
CUDA 10.0
cuDNN 7.0.3
PyTorch '1.3.0a0+e367f60'
torchvision '0.5.0a0+da89dad'
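
A minimal sketch for collecting the details above and checking whether the compiled operator is visible, assuming a Python environment where torch and torchvision may or may not import cleanly. The function name report_env is hypothetical; it only reads the version attributes and performs the same torch.ops lookup that fails in the traceback.

```python
import importlib


def report_env():
    """Return torch/torchvision versions and whether torchvision::nms is registered."""
    info = {"torch": None, "torchvision": None, "nms_registered": False, "error": None}
    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        # Importing torchvision is what loads its compiled extension.
        torchvision = importlib.import_module("torchvision")
        info["torchvision"] = torchvision.__version__
        # This lookup goes through torch.ops.__getattr__, the call that raised
        # "No such operator torchvision::nms" in the traceback above. Older torch
        # raises RuntimeError here; newer releases may raise AttributeError.
        torch.ops.torchvision.nms
        info["nms_registered"] = True
    except (ImportError, OSError, RuntimeError, AttributeError) as exc:
        info["error"] = f"{type(exc).__name__}: {exc}"
    return info


if __name__ == "__main__":
    for key, value in report_env().items():
        print(f"{key}: {value}")
```

If nms_registered is False while both versions print, the torchvision extension either failed to build or was built against a different torch than the one being imported.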

Labels: upstream issues

Source: githubgsq

All 6 comments

I uninstalled torchvision, checked out v0.4.0, and re-ran the setup; now the error is gone!
torchvision: '0.4.0a0+d31eafa'.

githubgsq on 16 Oct 2019
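
One way to confirm that a rebuilt torchvision actually provides the operator is to call batched_nms directly, outside detectron2. This is a sketch: nms_smoke_test and the three toy boxes are made up for illustration, and it imports batched_nms from torchvision.ops.boxes, the module detectron2's layers/nms.py delegates to in the traceback. Boxes 0 and 1 share a class and overlap with an IoU of about 0.68, so with a 0.5 threshold box 1 should be suppressed.

```python
def nms_smoke_test(device: str = "cpu"):
    """Run torchvision's batched_nms on three toy boxes.

    Returns the kept indices, or None if torch/torchvision are not installed.
    A broken extension surfaces as the original RuntimeError.
    """
    try:
        import torch
        from torchvision.ops.boxes import batched_nms
    except ImportError:
        return None
    boxes = torch.tensor(
        [[0.0, 0.0, 10.0, 10.0],     # box 0, class 0
         [1.0, 1.0, 11.0, 11.0],     # box 1, class 0, IoU with box 0 = 81/119
         [20.0, 20.0, 30.0, 30.0]],  # box 2, class 1, no overlap
        device=device,
    )
    scores = torch.tensor([0.9, 0.8, 0.7], device=device)
    idxs = torch.tensor([0, 0, 1], device=device)  # per-box class ids
    keep = batched_nms(boxes, scores, idxs, 0.5)   # iou_threshold = 0.5
    return keep.tolist()


if __name__ == "__main__":
    print(nms_smoke_test())
```

On a working install this should print [0, 2]; passing device="cuda" exercises the GPU kernel as well.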

@fmassa I see the same error after downloading and installing:

pip install torch-1.4.0.dev20191017+cu100-cp37-cp37m-linux_x86_64.whl torchvision-0.5.0.dev20191017+cu100-cp37-cp37m-linux_x86_64.whl

ppwwyyxx on 17 Oct 2019

@ppwwyyxx thanks for the heads up. I'll try fixing it tomorrow morning Paris time. Can you open an issue in TorchVision?

fmassa on 17 Oct 2019

I'm facing this issue with torch 1.5.0 and torchvision 0.6.0a0+82fd1c8; downgrading to torch 1.4.0 and torchvision 0.5.0 did not help.

What's the current solution?

lgvaz on 29 Apr 2020

I started from a clean environment and installed PyTorch with

conda install pytorch==1.4.0 torchvision cudatoolkit=10.1 -c pytorch

which left me with torchvision 0.5.0. The issue seems to be gone now.

I'm not exactly sure why; maybe I did something wrong when downgrading torch earlier.

I don't know if it matters, but I'm on Python 3.8.2.

lgvaz on 29 Apr 2020
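
The fixes in this thread amount to installing a torch/torchvision pair that were released together. A minimal sketch of that check, assuming the pairings from the compatibility table in torchvision's README (only the releases mentioned here are listed; KNOWN_PAIRS, base_version and is_known_pair are hypothetical names). Source builds such as '1.3.0a0+e367f60' are pre-release snapshots, so stripping the suffix makes the check only approximate for them; it will, for instance, not recognize the 1.3.0a0 + 0.4.0 combination that worked for githubgsq.

```python
import re

# Release pairings (torch -> torchvision) from torchvision's README,
# limited to the versions that come up in this thread.
KNOWN_PAIRS = {
    "1.2.0": "0.4.0",
    "1.3.0": "0.4.1",
    "1.3.1": "0.4.2",
    "1.4.0": "0.5.0",
    "1.5.0": "0.6.0",
}


def base_version(version: str) -> str:
    """Drop local build tags and pre-release/dev suffixes: '1.3.0a0+e367f60' -> '1.3.0'."""
    public = version.split("+", 1)[0]
    match = re.match(r"\d+(?:\.\d+)*", public)
    return match.group(0) if match else public


def expected_torchvision(torch_version: str):
    """Return the torchvision release paired with a torch release, or None if unlisted."""
    return KNOWN_PAIRS.get(base_version(torch_version))


def is_known_pair(torch_version: str, torchvision_version: str) -> bool:
    return expected_torchvision(torch_version) == base_version(torchvision_version)


if __name__ == "__main__":
    print(is_known_pair("1.5.0", "0.6.0a0+82fd1c8"))
    print(expected_torchvision("1.4.0"))
```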

@lgvaz Thanks for posting your fix. I had the same problem, and downgrading to torchvision 0.5.0 fixed it for me. Note that everything worked on CPU: I only hit this error when the cudatoolkit installation was active in the environment (Python 3.7.x).
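
The last comment ties the error to having cudatoolkit active. One possible cause, and this is an assumption rather than something confirmed in the thread, is that torch and torchvision were built against different CUDA versions. A sketch that reports both side by side; cuda_build_info is a hypothetical helper, and torchvision.version.cuda is not recorded by every release, hence the getattr fallback.

```python
def cuda_build_info():
    """Report the CUDA versions torch and torchvision were built against."""
    try:
        import torch
        import torchvision
    except (ImportError, OSError, RuntimeError) as exc:
        # Some torchvision releases refuse to import on a CUDA mismatch.
        return {"error": f"{type(exc).__name__}: {exc}"}
    tv_version_module = getattr(torchvision, "version", None)
    return {
        # None means a CPU-only torch build.
        "torch_cuda": torch.version.cuda,
        # None if torchvision is CPU-only or this release does not record it.
        "torchvision_cuda": getattr(tv_version_module, "cuda", None),
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(cuda_build_info())
```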