Vision: FasterRCNN to ONNX model

Created on 30 Dec 2019 · 24 comments · Source: pytorch/vision

Hi there,
I tried to convert a Faster R-CNN model to ONNX format, following the instructions in test/test_onnx.py (https://github.com/pytorch/vision/blob/master/test/test_onnx.py).

Here is my code:

```python
import io
import cv2
import torch
from PIL import Image
from torchvision import models
from torchvision.transforms.functional import to_tensor

model = models.detection.faster_rcnn.fasterrcnn_resnet50_fpn(pretrained=True, min_size=800, max_size=1333)
image = cv2.imread("test.jpg")
image = cv2.resize(image, (1333, 800))
image1 = Image.fromarray(cv2.cvtColor(image.copy(), cv2.COLOR_BGR2RGB))
image_tensor = to_tensor(image1)
model.eval()
onnx_io = io.BytesIO()
# _onnx_opset_version is defined in test/test_onnx.py
torch.onnx.export(model, [image_tensor], "faster_rcnn.onnx", do_constant_folding=True, opset_version=_onnx_opset_version)
```

The export succeeds with the code above. However, when I move the tensor and the model to CUDA with .to(device), the export fails at line 359 in `_get_top_n_idx` (`r.append(top_n_idx + offset)`) with `RuntimeError: expected device cuda:0 but got device cpu`.
I don't know how to solve it.

Please help me with that.

Cheers!

Labels: bug, models, onnx, object detection

Most helpful comment

I replaced the dummy input:

`input_data = [torch.rand((3, 600, 600), device=cpu_device)]`

with:

`input_data = [torch.randn((3, 600, 600), device=cpu_device)]`

and it worked.

This issue may be related to "Export object detection model to ONNX: empty output by ONNX inference".

All 24 comments

Hi,

Sorry for the delay in replying.

My advice would be to make sure you move your inputs and model to CUDA before exporting to ONNX; this is the safest way.

So it would look something like:

```python
model = models.detection.faster_rcnn.fasterrcnn_resnet50_fpn(pretrained=True, min_size=800, max_size=1333)
image = cv2.imread("test.jpg")
image = cv2.resize(image, (1333, 800))
image1 = Image.fromarray(cv2.cvtColor(image.copy(), cv2.COLOR_BGR2RGB))
image_tensor = to_tensor(image1)
model.eval()
model.cuda()
image_tensor = image_tensor.cuda()
# just to be safe, run it once to initialize all buffers
out = model([image_tensor])
# now export it
onnx_io = io.BytesIO()
torch.onnx.export(model, [image_tensor], "faster_rcnn.onnx", do_constant_folding=True, opset_version=_onnx_opset_version)
```

Let me know if you still have issues.

I also get this error with both FasterRCNN and MaskRCNN, and I'm sure that the model and input tensor are on the GPU. I also run the model once before exporting. Exporting with device = 'cpu' works. The problem is not specific to ONNX export; the error also appears when simply trying to torch.jit.trace the model.

```python
import sys
import numpy as np
import torch
import torchvision
from PIL import Image
from torchvision import transforms

img = Image.open(sys.argv[1]).convert('RGB')
img = np.array(img)

device = 'cuda'
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True, min_size=800, max_size=800)
model.eval()
model.to(device)

img_ = transforms.ToTensor()(img)
img_ = img_.to(device)

out = model([img_])

torch.onnx.export(model, ([img_],), "/tmp/mask_rcnn.onnx", verbose=True, do_constant_folding=True, opset_version=11)
```
  File "segment_image.py", line 119, in <module>
    torch.onnx.export(model, ([img_],), "/tmp/mask_rcnn.onnx", verbose=True, do_constant_folding=True, opset_version=11)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/__init__.py", line 156, in export
    custom_opsets)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/utils.py", line 67, in export
    custom_opsets=custom_opsets)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/utils.py", line 466, in _export
    fixed_batch_size=fixed_batch_size)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/utils.py", line 319, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/opt/env/lib/python3.7/site-packages/torch/onnx/utils.py", line 276, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, _force_outplace=False, _return_inputs_states=True)
  File "/opt/env/lib/python3.7/site-packages/torch/jit/__init__.py", line 282, in _get_trace_graph
    outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torch/jit/__init__.py", line 365, in forward
    self._force_outplace,
  File "/opt/env/lib/python3.7/site-packages/torch/jit/__init__.py", line 352, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 537, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 523, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 70, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 537, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 523, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/env/lib/python3.7/site-packages/torchvision/models/detection/rpn.py", line 472, in forward
    boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
  File "/opt/env/lib/python3.7/site-packages/torchvision/models/detection/rpn.py", line 379, in filter_proposals
    top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
  File "/opt/env/lib/python3.7/site-packages/torchvision/models/detection/rpn.py", line 359, in _get_top_n_idx
    r.append(top_n_idx + offset)
RuntimeError: expected device cuda:0 but got device cpu
>>> import torchvision; torchvision.__version__
'0.5.0.dev20200108+cu100'
>>> import torch; torch.__version__
'1.5.0.dev20200109+cu100'

@janstrohbeck Thanks for the detailed report!

After digging a bit further, there seem to be a couple of issues. The first is that torch.onnx.operators.shape_as_tensor doesn't take the device of the original tensor into account, so that https://github.com/pytorch/vision/blob/61763fa955ef74077a1d3e1aa5da36f7c606943a/torchvision/models/detection/rpn.py#L21 is always a CPU tensor. The second is that once we fix the above, we also need to fix https://github.com/pytorch/vision/blob/61763fa955ef74077a1d3e1aa5da36f7c606943a/torchvision/models/detection/rpn.py#L24-L26 to use the device of the original tensor.

@lara-hdr do you think we should change shape_as_tensor in PyTorch ONNX to take the device of the original tensor into account as well? Otherwise, we can just add casts in the model right away as a workaround.
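For illustration, the cast workaround would be along these lines (just a sketch with a hypothetical helper, not the actual patch):

```python
import torch

def add_offset(top_n_idx, offset):
    # Hypothetical helper showing the idea: during ONNX tracing, `offset`
    # can end up as a CPU tensor even when `top_n_idx` lives on CUDA,
    # so cast it to the same device before adding.
    if isinstance(offset, torch.Tensor):
        offset = offset.to(top_n_idx.device)
    return top_n_idx + offset
```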

@janstrohbeck @Finniu in the meantime, please convert the model to CPU before exporting to ONNX.

I'm not sure whether this is coincidental, but I successfully exported the model to ONNX while it was on the CPU. When serving the ONNX model with a TensorRT server, the model mostly evaluates on the CPU even though the server supposedly loads it onto the GPU. I know this because, while evaluating, my CPU usage goes to almost 100% while my GPU utilization remains below 10%.

Could this be related? Without understanding too much about how torch.onnx.export works, it's unclear to me whether evaluating the model on the CPU during tracing leads to the ONNX model executing on the CPU.

@nikhilshinday I don't know the answer to your question, maybe @lara-hdr knows it?

@nikhilshinday, torch.onnx.export() does not record whether the model was on CPU or GPU when exported, and the exported ONNX model should run on whichever device you specify, regardless of where it ran during export.
I am not sure why CPU utilization goes up when you load the ONNX model on GPU; do you know whether the engine you are using to run the ONNX model fully supports running it on GPU?

Hi there, any updates?

Although exporting the model in GPU mode fails, exporting it in CPU mode and then loading it into the GPU-enabled ONNX Runtime (via the onnxruntime-gpu PyPI package) works just fine.
I'm using torch==1.5.0 and torchvision==0.6.0.

> Although exporting the model in GPU mode fails, exporting it in CPU mode and then loading it into the GPU-enabled ONNX Runtime (via the onnxruntime-gpu PyPI package) works just fine. I'm using torch==1.5.0 and torchvision==0.6.0.

@raviv Thanks, I will try. Another question is have you tried to convert the onnx model to tensorrt?

@Finniu No, I don't use tensorrt.

BTW, if you want to export maskrcnn_resnet50_fpn so that it accepts any input size, do:

```python
dynamic_axes = {'input': [0, 2, 3], 'output': [0, 2, 3]}
torch.onnx.export(net, ..., dynamic_axes=dynamic_axes)
```
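A fuller version of that call might look like this (a sketch only; the dummy input size and file name are placeholders, and the axis names must match input_names/output_names):

```python
import torch
import torchvision

net = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()
dummy = [torch.randn(3, 800, 800)]  # one 3-channel image; the size is a placeholder

dynamic_axes = {'input': [0, 2, 3], 'output': [0, 2, 3]}
torch.onnx.export(net, dummy, 'mask_rcnn.onnx',
                  input_names=['input'], output_names=['output'],
                  opset_version=11, dynamic_axes=dynamic_axes)
```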

Exporting Faster R-CNN model:

```python
...
device = torch.device('cuda')

model.to(device)
input = torch.randn((1, 3, 600, 600), device=device)

torch.onnx.export(model, input, 'model.onnx', opset_version=11)
...
```

Error:

```
/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/rpn.py in _get_top_n_idx(self, objectness, num_anchors_per_level)
    372     pre_nms_top_n = min(self.pre_nms_top_n(), num_anchors)
    373     _, top_n_idx = ob.topk(pre_nms_top_n, dim=1)
--> 374     r.append(top_n_idx + offset)
    375     offset += num_anchors
    376     return torch.cat(r, dim=1)

RuntimeError: expected device cuda:0 but got device cpu
```
I exported successfully in CPU mode. GPU not supported?

@danilopeixoto This error has been reported above.
My solution was to export using CPU.
At inference time you can use CPU, GPU, TensorRT, etc., depending on the ONNX runtime you use.
I'm using this one, https://microsoft.github.io/onnxruntime/, and am very happy with it.
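In case it's useful, picking the GPU at inference time looks roughly like this with onnxruntime-gpu installed (a sketch; the file name is a placeholder):

```python
import onnxruntime as ort

# Ask for the CUDA execution provider first and fall back to CPU if it
# is unavailable (requires the onnxruntime-gpu package).
sess = ort.InferenceSession("faster_rcnn.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
print(sess.get_providers())  # confirms which providers are actually active
```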

Hello Team,

I am trying to convert Faster R-CNN to ONNX. I was able to export to ONNX successfully, but I am not able to run inference on any image. I also tried exporting the model with a dynamic input size for the image, still with no luck.

I can't find clear instructions on what the image data input to the model should be, and I think I am messing something up in the input to the model.

Below is the code I am trying to implement to export and infer.

**This piece of code is adapted from the test_onnx.py file:**

```python
import torch

def get_image_from_url(url, size=None):
    import requests
    from PIL import Image
    from io import BytesIO
    from torchvision import transforms as T

    data = requests.get(url)
    image = Image.open(BytesIO(data.content)).convert("RGB")

    if size is None:
        size = (300, 200)
    image = image.resize(size, Image.BILINEAR)

    transform = T.Compose([T.ToTensor()])  # define the PyTorch transform
    return transform(image)

def get_test_images():
    image_url = "http://farm3.staticflickr.com/2469/3915380994_2e611b1779_z.jpg"
    image = get_image_from_url(url=image_url, size=(100, 320))

    image_url2 = "https://pytorch.org/tutorials/_static/img/tv_tutorial/tv_image05.png"
    image2 = get_image_from_url(url=image_url2, size=(250, 380))

    images = image
    test_images = [image2]
    return images, test_images

images, test_images = get_test_images()

dummy_input = torch.randn(1, 3, 224, 224)

model_name = r"fasterrcnn_resnet50_fpn_dynamic_try4_with_image_input"
final_path = model_name + ".onnx"
dynamic_axes = {'input': [0, 2, 3], 'output': [0, 2, 3]}
# `model` is the fasterrcnn_resnet50_fpn instance created earlier
torch.onnx.export(model, images.unsqueeze(0), final_path,
                  do_constant_folding=True, opset_version=11,
                  dynamic_axes=dynamic_axes, input_names=['input'], output_names=['output'])
```

Below is the code I am using to run inference with the ONNX model.

```python
import os
import onnx
import onnxruntime as rt

folder = "my path"  # placeholder

model_name = r"fasterrcnn_resnet50_fpn_dynamic_try4_with_image_input.onnx"
final_path = os.path.join(folder, model_name)

# Load the ONNX model
model_onnx = onnx.load(final_path)

# Check that the IR is well formed
onnx.checker.check_model(model_onnx)

# Print a human-readable representation of the graph
print(onnx.helper.printable_graph(model_onnx.graph))

sess = rt.InferenceSession(final_path)

input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

pred = sess.run([output_name], {input_name: images})
```

This crashes and restarts my kernel, and I don't know why.
I think I am messing up the input type and dimensions.
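For what it's worth, my current guess is that ONNX Runtime wants NumPy arrays rather than torch tensors, so the call may need to look more like this (a sketch based on my export above, which used a batched 4-D input named 'input'):

```python
import numpy as np

# ONNX Runtime expects NumPy arrays, not torch tensors; the model was
# exported with a 4-D (batched) input, so add the batch dimension too.
input_array = images.unsqueeze(0).numpy().astype(np.float32)
pred = sess.run([output_name], {input_name: input_array})
```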

Could you please help me get this ONNX model up and running!

Attached is the graph log for the converted model.
Let me know if I am missing something in the conversion procedure as well.
torchvision_frcnn_try4_dynamic_onnx_log.docx

Thanks a lot!!

The torchvision Faster R-CNN model does not support dynamic input shapes, according to the documentation:

> Faster R-CNN is exportable to ONNX for a fixed batch size with inputs images of fixed size.

@danilopeixoto Dynamic shape support should now work for ONNX if you use a very recent torchvision nightly.

Hey @danilopeixoto, @fmassa
Thank you for the suggestions.
But I am still not able to get any output, either for a fixed or a dynamic input image.

Could you please have a look at the code and let me know where I am going wrong?

Also, I tried to run test_faster_rcnn from the latest test_onnx.py (here) file and got the following error.
As I am a newbie here, I don't know exactly what this error means:

log:

```
>>> test_object = test_onnx.ONNXExporterTester()
>>> test_object.test_faster_rcnn()
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\nn\functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
..\torch\csrc\utils\python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated:
        nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
        nonzero(Tensor input, *, bool as_tuple)
..\aten\src\ATen\native\BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torchvision\models\detection\rpn.py:164: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(image_size[1] / g[1], dtype=torch.int64, device=device)] for g in grid_sizes]
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\tensor.py:467: RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  'incorrect results).', category=RuntimeWarning)
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torchvision\ops\boxes.py:117: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  boxes_x = torch.min(boxes_x, torch.tensor(width, dtype=boxes.dtype, device=boxes.device))
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torchvision\ops\boxes.py:119: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  boxes_y = torch.min(boxes_y, torch.tensor(height, dtype=boxes.dtype, device=boxes.device))
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torchvision\models\detection\transform.py:217: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  for s, s_orig in zip(new_size, original_size)
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\onnx\symbolic_opset9.py:2115: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  "If indices include negative values, the exported graph will produce incorrect results.")
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\onnx\utils.py:915: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input images_tensors
  'Automatically generated names will be applied to each dynamic axes of input {}'.format(key))
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\onnx\utils.py:915: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input outputs
  'Automatically generated names will be applied to each dynamic axes of input {}'.format(key))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\test_onnx.py", line 357, in test_faster_rcnn
    tolerate_small_mismatch=True)
  File "C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\test_onnx.py", line 49, in run_model
    self.ort_validate(onnx_io, test_inputs, test_ouputs, tolerate_small_mismatch)
  File "C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\test_onnx.py", line 71, in ort_validate
    torch.testing.assert_allclose(outputs[i], ort_outs[i], rtol=1e-03, atol=1e-05)
  File "C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\Lib\site-packages\torch\testing\__init__.py", line 24, in assert_allclose
    expected = expected.expand_as(actual)
RuntimeError: The expanded size of the tensor (52) must match the existing size (54) at non-singleton dimension 0.  Target sizes: [52, 4].  Tensor sizes: [54, 4]
```

Thanks a lot!

cc @neginraoof if you could have a look

Hi @fmassa, is there any tool I can use to convert the Faster R-CNN ONNX model to TensorRT? I have tried onnx-tensorrt, but it failed when converting NMS. Thanks

Hi. I am also experiencing the original `RuntimeError: expected device cuda:0 but got device cpu` when JIT tracing any RCNN model (on the same line as the OP). Strangely, the first iteration through `for ob in objectness.split(num_anchors_per_level, 1)` succeeds, but the second one fails.

I am working on a JIT-related project that requires the model to be traced on the GPU, so the export-on-CPU workaround does not apply to me. I am not concerned with ONNX right now, only TorchScript. Is there a timeline for a fix? Even guidance on how to fix this myself would be appreciated. @fmassa
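For anyone else stuck here, one avenue I'm considering is torch.jit.script instead of tracing, since scripting compiles the model from source and doesn't record a device-dependent trace (a sketch, assuming a torchvision version recent enough to make the detection models scriptable):

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval().cuda()
# Scripting avoids running the model during conversion, so there is no
# trace that can bake in the wrong device.
scripted = torch.jit.script(model)
```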

Hi,

I was able to export the model to ONNX and it was working fine, but now only empty detections are returned. I've tried downgrading the package versions, changing the opset, and checking the code for changes.

Model inference using PyTorch still works.

Has anyone experienced this issue exporting Faster RCNN model to ONNX?

I replaced the dummy input:

`input_data = [torch.rand((3, 600, 600), device=cpu_device)]`

with:

`input_data = [torch.randn((3, 600, 600), device=cpu_device)]`

and it worked.

This issue may be related to "Export object detection model to ONNX: empty output by ONNX inference".

> I replaced the dummy input `input_data = [torch.rand((3, 600, 600), device=cpu_device)]` with `input_data = [torch.randn((3, 600, 600), device=cpu_device)]` and it worked.

Hi @danilopeixoto, do I understand correctly that your only change was from `rand` to `randn`?
I am experiencing the same issue.

@FraPochetti Yes, that was the only change in the code.

@danilopeixoto Thanks! Do you happen to have the code snippet you used, by any chance?
(As you probably did yourself) I have tried a ton of things and nothing is really working, yours included, unfortunately.
Maybe I am doing something really silly somewhere else.
