Detectron2: RuntimeError: No such operator torchvision::nms

Created on 16 Oct 2019 · 6 Comments · Source: facebookresearch/detectron2

Hi all, I'm trying to train a model using detectron2.
I built and installed PyTorch '1.3.0a0+e367f60' and torchvision '0.5.0a0+da89dad', both from source.

When I run this command from the Getting Started guide (https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md):
python tools/train_net.py --num-gpus 8 \
--config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml
an error occurred as follows:

Traceback (most recent call last):
File "tools/train_net.py", line 161, in <module>
args=(args,),
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/launch.py", line 49, in launch
daemon=False,
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:

-- Process 5 terminated with the following error:
Traceback (most recent call last):
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/launch.py", line 84, in _distributed_worker
main_func(*args)
File "/home/users/gaosiqi/download/detectron2/tools/train_net.py", line 149, in main
return trainer.train()
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/defaults.py", line 329, in train
super().train(self.start_iter, self.max_iter)
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/home/users/gaosiqi/download/detectron2/detectron2/engine/train_loop.py", line 212, in run_step
loss_dict = self.model(data)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 545, in __call__
result = self.forward(*input, **kwargs)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 545, in __call__
result = self.forward(*input, **kwargs)
File "/home/users/gaosiqi/download/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 82, in forward
proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 545, in __call__
result = self.forward(*input, **kwargs)
File "/home/users/gaosiqi/download/detectron2/detectron2/modeling/proposal_generator/rpn.py", line 179, in forward
self.training,
File "/home/users/gaosiqi/download/detectron2/detectron2/modeling/proposal_generator/rpn_outputs.py", line 136, in find_top_rpn_proposals
keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
File "/home/users/gaosiqi/download/detectron2/detectron2/layers/nms.py", line 17, in batched_nms
return box_ops.batched_nms(boxes, scores, idxs, iou_threshold)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torchvision-0.5.0a0+da89dad-py3.7-linux-x86_64.egg/torchvision/ops/boxes.py", line 70, in batched_nms
keep = nms(boxes_for_nms, scores, iou_threshold)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torchvision-0.5.0a0+da89dad-py3.7-linux-x86_64.egg/torchvision/ops/boxes.py", line 31, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
File "/home/users/gaosiqi/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/_ops.py", line 61, in __getattr__
op = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator torchvision::nms

Environment:
CentOS 6.3
CUDA 10.0
cuDNN 7.0.3
PyTorch '1.3.0a0+e367f60'
torchvision '0.5.0a0+da89dad'

All 6 comments

I uninstalled torchvision, checked out v0.4.0, and rebuilt it. The error is gone now!
torchvision: '0.4.0a0+d31eafa'.
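
One reading of this fix is that the torchvision build has to match the PyTorch release it was developed against. Below is a minimal sketch of that check, assuming the release pairings torch 1.3 / torchvision 0.4, 1.4 / 0.5, and 1.5 / 0.6. The table is not exhaustive, and `KNOWN_PAIRS`, `major_minor`, and `is_known_pair` are hypothetical names used only for illustration.

```python
import re

# Assumed release pairings (torch major.minor -> torchvision major.minor).
# Not exhaustive; extend it from the official compatibility table as needed.
KNOWN_PAIRS = {(1, 3): (0, 4), (1, 4): (0, 5), (1, 5): (0, 6)}


def major_minor(version):
    """Extract (major, minor) from strings such as '1.3.0a0+e367f60'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_known_pair(torch_version, torchvision_version):
    """Return True/False for a known torch version, None if it is not in the table."""
    expected = KNOWN_PAIRS.get(major_minor(torch_version))
    if expected is None:
        return None
    return major_minor(torchvision_version) == expected


# The setup from the original report, then the setup after the fix.
print(is_known_pair("1.3.0a0+e367f60", "0.5.0a0+da89dad"))
print(is_known_pair("1.3.0a0+e367f60", "0.4.0a0+d31eafa"))
```

A matching pair is not a guarantee: other reports in this thread hit the same error with development builds whose version numbers do line up, so treat a `True` result as a first sanity check only.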

@fmassa I see the same error after downloading and installing:

pip install torch-1.4.0.dev20191017+cu100-cp37-cp37m-linux_x86_64.whl torchvision-0.5.0.dev20191017+cu100-cp37-cp37m-linux_x86_64.whl

@ppwwyyxx thanks for the heads up. I'll try fixing it tomorrow morning Paris time. Can you open an issue in TorchVision?

I'm facing this issue with torch 1.5.0 and torchvision 0.6.0a0+82fd1c8. Downgrading to torch 1.4.0 and torchvision 0.5.0 did not help.

What's the current solution?

I started from a clean env and installed PyTorch with conda install pytorch==1.4.0 torchvision cudatoolkit=10.1 -c pytorch, which left me with torchvision 0.5.0. The issue seems to be gone now.

I'm not exactly sure why; maybe I did something wrong when downgrading torch before.

I don't know if it matters, but I'm on Python 3.8.2.
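
After rebuilding an environment like this, it can help to confirm which versions are actually installed. A possible explanation for a failed downgrade, not confirmed in this thread, is that an older build was still present. Below is a minimal sketch using only the standard library (Python 3.8+); `installed_versions` is a hypothetical helper, not part of PyTorch or detectron2.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision")):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```

This reads package metadata without importing the packages, so it still works when importing torchvision itself is what fails.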

@lgvaz Thanks for posting your fix. I had this same problem and downgrading to torchvision 0.5.0 fixed it for me. I'll note that I had this working under CPU: I only encountered this error when I had the cudatoolkit installation active in the environment (Python 3.7.x).
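
A quick way to test for this error before launching a full training run is to look up the operator directly, since `torch.ops.torchvision.nms` is the lookup that fails in the traceback. This is a sketch under that assumption; `nms_operator_status` is a hypothetical helper name, and it catches `AttributeError` as well because newer PyTorch versions report a missing operator that way.

```python
def nms_operator_status():
    """Report whether torchvision's compiled nms operator is registered."""
    try:
        import torch
        import torchvision  # noqa: F401  (importing it registers the compiled ops)
    except ImportError:
        return "not-installed"
    try:
        # Attribute access alone triggers the operator lookup.
        torch.ops.torchvision.nms
    except (RuntimeError, AttributeError):
        return "missing"
    return "registered"


print(nms_operator_status())
```

If this prints "missing", the installed torchvision has no compiled ops that the installed PyTorch can see, and reinstalling a matching pair in a clean environment, as described in the comments above, is the fix reported here.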
