Yolov5: ONNX Export as an INT32 matrix

Created on 1 Jul 2020 · 31 comments · Source: ultralytics/yolov5

🚀 Feature


When exporting an ONNX model, add the option to use INT32 matrices instead of INT64, as INT64 is not supported by OpenCV.

Motivation


I exported the model as an ONNX model, then I tried to import it in OpenCV 4.2.0 using cv2.dnn.readNetFromONNX(model_path) and I got the following error:

self.model = cv2.dnn.readNetFromONNX(model_path)
cv2.error: OpenCV(4.2.0) /io/opencv/modules/dnn/src/onnx/onnx_importer.cpp:101: error: (-211:One of the arguments' values is out of range) Input is out of OpenCV 32S range in function 'convertInt64ToInt32'

Pitch


I want a parameter when exporting to be able to select INT32 matrices.

Alternatives


An alternative solution could be to always use INT32 instead of INT64.

Additional context


https://github.com/opencv/opencv/issues/14830

Stale enhancement


All 31 comments

Hello @edurenye, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@edurenye thanks for the comment. There is no specific requirement for int64 anywhere in yolov5, so I don't know exactly where these variables originate or why their datatype is as such. If you debug this and find a solution then please let us know.

A hint I have is that it probably has nothing to do with ONNX; it's probably a pytorch variable that is int64 for some reason and gets carried over into ONNX.

Thanks @glenn-jocher. My guess is that it comes from this issue https://github.com/pytorch/pytorch/issues/7870; we have to find where the LongTensor is being used and force it to int32 using dtype=torch.int, as specified in that thread.

Hi @glenn-jocher I tried to add the dtype=torch.int without luck, see https://github.com/ultralytics/yolov5/compare/master...edurenye:remove_int64
I still get:

Model Summary: 140 layers, 7.26e+06 parameters, 6.61683e+06 gradients
graph torch-jit-export (
  %images[FLOAT, 1x3x416x416]
) initializers (
  %483[INT64, 1]
  %484[INT64, 1]
  %485[INT64, 1]
  %486[INT64, 1]
  %487[INT64, 1]
  %488[INT64, 1]
  %model.0.conv.conv.bias[FLOAT, 32]
...

As in #225, I wonder where those parameters come from.

Hi @glenn-jocher
Our issue is the same as here https://github.com/pytorch/pytorch/issues/16218

So I used this code:

layer_names = []
for name, param_tensor in model.state_dict().items():
    if param_tensor.dtype == torch.int64:
        new_param = param_tensor.int()  # cast the INT64 buffer down to INT32
        rsetattr(model, name, new_param)  # recursive setattr over the dotted name
        layer_names.append(name)
print(layer_names)

To find the Tensors that are INT64 and transform them to INT32 in the export.py file.
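For reference, the rsetattr helper used above is not defined anywhere in this thread; a common implementation (an assumption on my part, not code from the repo) resolves the dotted name with functools.reduce:

```python
import functools

def rsetattr(obj, attr, val):
    """Set a dotted attribute path such as 'model.0.conv.bn.num_batches_tracked'."""
    pre, _, post = attr.rpartition('.')
    # walk down to the parent object, then set the final attribute on it
    target = functools.reduce(getattr, pre.split('.'), obj) if pre else obj
    setattr(target, post, val)
```

This also works on nn.Module hierarchies because getattr on a Sequential resolves numeric child names like '0'.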

And it gave me the following list of Tensors:

['model.0.conv.bn.num_batches_tracked', 'model.1.bn.num_batches_tracked', 'model.2.cv1.bn.num_batches_tracked', 
'model.2.cv4.bn.num_batches_tracked', 'model.2.bn.num_batches_tracked', 'model.2.m.0.cv1.bn.num_batches_tracked', 
'model.2.m.0.cv2.bn.num_batches_tracked', 'model.3.bn.num_batches_tracked', 'model.4.cv1.bn.num_batches_tracked', 
'model.4.cv4.bn.num_batches_tracked', 'model.4.bn.num_batches_tracked', 'model.4.m.0.cv1.bn.num_batches_tracked', 
'model.4.m.0.cv2.bn.num_batches_tracked', 'model.4.m.1.cv1.bn.num_batches_tracked', 
'model.4.m.1.cv2.bn.num_batches_tracked', 'model.4.m.2.cv1.bn.num_batches_tracked', 
'model.4.m.2.cv2.bn.num_batches_tracked', 'model.5.bn.num_batches_tracked', 'model.6.cv1.bn.num_batches_tracked', 
'model.6.cv4.bn.num_batches_tracked', 'model.6.bn.num_batches_tracked', 'model.6.m.0.cv1.bn.num_batches_tracked', 
'model.6.m.0.cv2.bn.num_batches_tracked', 'model.6.m.1.cv1.bn.num_batches_tracked', 
'model.6.m.1.cv2.bn.num_batches_tracked', 'model.6.m.2.cv1.bn.num_batches_tracked', 
'model.6.m.2.cv2.bn.num_batches_tracked', 'model.7.bn.num_batches_tracked', 'model.8.cv1.bn.num_batches_tracked', 
'model.8.cv2.bn.num_batches_tracked', 'model.9.cv1.bn.num_batches_tracked', 'model.9.cv4.bn.num_batches_tracked', 
'model.9.bn.num_batches_tracked', 'model.9.m.0.cv1.bn.num_batches_tracked', 'model.9.m.0.cv2.bn.num_batches_tracked', 
'model.10.bn.num_batches_tracked', 'model.13.cv1.bn.num_batches_tracked', 'model.13.cv4.bn.num_batches_tracked', 
'model.13.bn.num_batches_tracked', 'model.13.m.0.cv1.bn.num_batches_tracked', 
'model.13.m.0.cv2.bn.num_batches_tracked', 'model.14.bn.num_batches_tracked', 'model.17.cv1.bn.num_batches_tracked', 
'model.17.cv4.bn.num_batches_tracked', 'model.17.bn.num_batches_tracked', 'model.17.m.0.cv1.bn.num_batches_tracked', 
'model.17.m.0.cv2.bn.num_batches_tracked', 'model.19.bn.num_batches_tracked', 'model.21.cv1.bn.num_batches_tracked', 
'model.21.cv4.bn.num_batches_tracked', 'model.21.bn.num_batches_tracked', 'model.21.m.0.cv1.bn.num_batches_tracked', 
'model.21.m.0.cv2.bn.num_batches_tracked', 'model.23.bn.num_batches_tracked', 'model.25.cv1.bn.num_batches_tracked', 
'model.25.cv4.bn.num_batches_tracked', 'model.25.bn.num_batches_tracked', 'model.25.m.0.cv1.bn.num_batches_tracked', 
'model.25.m.0.cv2.bn.num_batches_tracked']

As you can see, the problem is num_batches_tracked as well as in this issue: https://github.com/pytorch/pytorch/issues/16218
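This is easy to confirm in isolation: every BatchNorm layer carries an INT64 num_batches_tracked buffer by default. A minimal sketch, assuming a stock PyTorch install:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(32)
# pytorch stores this running counter as int64 (torch.long)
print(bn.num_batches_tracked.dtype)  # torch.int64
```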

Looks like the transformation worked, because afterwards I checked the Tensors again and there were no INT64 Tensors, but when I did the export to ONNX I got the following error:

ONNX export failed.

Is there a way to have a more verbose output?

I found that I had something later that was breaking the code; I fixed it and now it 'works'.
Well, it exports it as ONNX, but I still get the same output, as if I had not done the transformation:

Fusing layers...
Model Summary: 140 layers, 7.26e+06 parameters, 6.61683e+06 gradients
graph torch-jit-export (
  %images[FLOAT, 1x3x416x416]
) initializers (
  %483[INT64, 1]
  %484[INT64, 1]
  %485[INT64, 1]
  %486[INT64, 1]
  %487[INT64, 1]
  %488[INT64, 1]
  %model.0.conv.conv.bias[FLOAT, 32]

Any further ideas?

@edurenye ok. These are just batch-norm statistics. It makes sense then that pytorch maintains them as int64's to decrease the chance of them overflowing. I would say since there is no ONNX error (the export process runs a full suite of checks), then this is just between ONNX and opencv.

https://github.com/ultralytics/yolov5/blob/dfd63de20a92152e53ed13346322fe91369de240/models/export.py#L49-L53

Well according to https://github.com/opencv/opencv/issues/14830#issuecomment-503279466 OpenCV tries to convert it to INT32 but it fails because it is out of the range.

Lots of people like me will try to use it on the edge with OpenCV and OpenVINO because of how light your model is, which makes it ideal for those use cases.

Can't I just remove those tensors somehow when exporting to ONNX?

@edurenye yes, I agree export should be easier, but this is the current state of affairs. The ONNX guys will say they are doing their job correctly, as will the opencv guys and the pytorch guys, and they are all technically correct since their responsibilities don't really extend past their packages, and all the packages are working correctly in their standalone capacities.

By the way, you can get a verbose export by setting the verbose flag:
https://github.com/ultralytics/yolov5/blob/e74ccb2985ea747e1d4a2d92cad5f4f7738fb54f/models/export.py#L42-L48

@edurenye just realized that there are only 6 int64's in your onnx export, but there are many more batchnorm values. The 6 int64's must originate somewhere else.

Yes @glenn-jocher, because the code should turn them into int32, but they are still there.

I used the debugger and found their origin: they are used in the 'Concat' operations. The following is the end of the graph:

  %468 : Tensor = onnx::Unsqueeze[axes=[0]](%459)
  %471 : Tensor = onnx::Unsqueeze[axes=[0]](%462)
  %472 : Tensor = onnx::Unsqueeze[axes=[0]](%465)
  %473 : Tensor = onnx::Concat[axis=0](%468, %480, %481, %471, %472)
  %474 : Float(1:89232, 3:29744, 11:2704, 52:52, 52:1) = onnx::Reshape(%384, %473) # /usr/src/app/models/yolo.py:26:0
  %475 : Float(1:89232, 3:29744, 52:572, 52:11, 11:1) = onnx::Transpose[perm=[0, 1, 3, 4, 2]](%474) # /usr/src/app/models/yolo.py:26:0
  return (%output, %456, %475)

The 'Concat' has 5 inputs: 3 are the outputs from the 'Unsqueeze' ops and the other 2 are these INT64 values. There are 3 of these blocks of layers, so that makes the 6 INT64 parameters.

Here is an image using Netron:
[screenshot: end_of_graph]

I guess this is part of the 'Detect' block, right?

So basically, I was not seeing the INT64s, because it is the transformation of the model to ONNX that generates them, if I understand this correctly. So I should somehow turn the INT64s into INT32s after exporting the model to ONNX. But I don't really know how to do that...

BTW the 2 INT64 values are the same in all 3 cases; their values are 3 and 11:
[screenshot: details_of_concat]

@edurenye I pushed a few updates to export.py today for improved introspection and incremented to opset 12. The initial int64's no longer appear in the results, though I believe the batchnorm int64's remain. You might want to git pull and see where the updates put you.

Thanks @glenn-jocher, but same result. I'll try a different approach: I'll try to use something different from OpenCV, but I don't really know how to use OpenVINO with anything else; that is why I wanted to use the model with OpenCV.

Thanks for your help, I think the problem is more on the plate of the OpenCV guys.

@edurenye oh that's too bad. My output looks like this now; the original list of 6 int64's no longer appears:

cd yolov5
export PYTHONPATH="$PWD"  # add path
python models/export.py --weights yolov5s.pt --img 640 --batch 1  # export

Output is:

Namespace(batch_size=1, img_size=[640, 640], weights='yolov5s.pt')
TorchScript export success, saved as yolov5s.torchscript  # <-------- TorchScript exports first

Fusing layers...
Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients
graph torch-jit-export (
  %images[FLOAT, 1x3x640x640]
) initializers (
  %model.0.conv.conv.bias[FLOAT, 32]
  %model.0.conv.conv.weight[FLOAT, 32x12x3x3]
  %model.1.conv.bias[FLOAT, 64]
  %model.1.conv.weight[FLOAT, 64x32x3x3]
  %model.10.conv.bias[FLOAT, 256]
  %model.10.conv.weight[FLOAT, 256x512x1x1]
...
  %650 = Gather[axis = 0](%648, %649)
  %653 = Unsqueeze[axes = [0]](%644)
  %656 = Unsqueeze[axes = [0]](%647)
  %657 = Unsqueeze[axes = [0]](%650)
  %658 = Concat[axis = 0](%653, %665, %666, %656, %657)
  %659 = Reshape(%569, %658)
  %660 = Transpose[perm = [0, 1, 3, 4, 2]](%659)
  return %output, %641, %660
}
ONNX export success, saved as yolov5s.onnx  # <-------- ONNX exports second
View with https://github.com/lutzroeder/netron

Strange, I tried again and I still have them :thinking:
What types do the Tensors %665 and %666 have now in your exported model?

I'm using docker, but inside I'm using my local code as a volume (so I can have the latest version from master). Might it have something to do with the container, the versions of something? I'm building the Dockerfile from this repo; I just uncommented this line:

#RUN pip install -r requirements.txt

Why is it commented in the Dockerfile?

I don't have much to add unfortunately, but I'm having the same issue: running in Google Colab and getting the same error while importing into OpenCV.

@TrInsanity are you seeing the same output as https://github.com/ultralytics/yolov5/issues/250#issuecomment-653102358 at least?


Namespace(batch_size=1, img_size=[640, 640], weights='./weights/last_weights.pt')
TorchScript export success, saved as ./weights/last_weights.torchscript
Fusing layers...
Model Summary: 236 layers, 4.74077e+07 parameters, 4.48868e+07 gradients
graph torch-jit-export (
  %images[FLOAT, 1x3x640x640]
) initializers (
  %model.0.conv.conv.bias[FLOAT, 64]
  %model.0.conv.conv.weight[FLOAT, 64x12x3x3]
  %model.1.conv.bias[FLOAT, 128]
  %model.1.conv.weight[FLOAT, 128x64x3x3]
  %model.10.conv.bias[FLOAT, 512]
  %model.10.conv.weight[FLOAT, 512x1024x1x1]
  ...
  %output = Transpose[perm = [0, 1, 3, 4, 2]](%649)
  %651 = Shape(%606)
  %652 = Constant[value = <Scalar Tensor []>]()
  %653 = Gather[axis = 0](%651, %652)
  %654 = Shape(%606)
  %655 = Constant[value = <Scalar Tensor []>]()
  %656 = Gather[axis = 0](%654, %655)
  %657 = Shape(%606)
  %658 = Constant[value = <Scalar Tensor []>]()
  %659 = Gather[axis = 0](%657, %658)
  %660 = Constant[value = <Scalar Tensor []>]()
  %661 = Constant[value = <Scalar Tensor []>]()
  %662 = Unsqueeze[axes = [0]](%653)
  %663 = Unsqueeze[axes = [0]](%660)
  %664 = Unsqueeze[axes = [0]](%661)
  %665 = Unsqueeze[axes = [0]](%656)
  %666 = Unsqueeze[axes = [0]](%659)
  %667 = Concat[axis = 0](%662, %663, %664, %665, %666)
  %668 = Reshape(%606, %667)
  %669 = Transpose[perm = [0, 1, 3, 4, 2]](%668)
  %670 = Shape(%581)
  %671 = Constant[value = <Scalar Tensor []>]()
  %672 = Gather[axis = 0](%670, %671)
  %673 = Shape(%581)
  %674 = Constant[value = <Scalar Tensor []>]()
  %675 = Gather[axis = 0](%673, %674)
  %676 = Shape(%581)
  %677 = Constant[value = <Scalar Tensor []>]()
  %678 = Gather[axis = 0](%676, %677)
  %679 = Constant[value = <Scalar Tensor []>]()
  %680 = Constant[value = <Scalar Tensor []>]()
  %681 = Unsqueeze[axes = [0]](%672)
  %682 = Unsqueeze[axes = [0]](%679)
  %683 = Unsqueeze[axes = [0]](%680)
  %684 = Unsqueeze[axes = [0]](%675)
  %685 = Unsqueeze[axes = [0]](%678)
  %686 = Concat[axis = 0](%681, %682, %683, %684, %685)
  %687 = Reshape(%581, %686)
  %688 = Transpose[perm = [0, 1, 3, 4, 2]](%687)
  return %output, %669, %688
}
ONNX export success, saved as ./weights/last_weights.onnx
View with https://github.com/lutzroeder/netron

This is the output from the export command. Sorry, I'm not sure what I'm looking for, but it looks similar to https://github.com/ultralytics/yolov5/issues/250#issuecomment-653102358

Edit: There are a few examples of lines with INT64, all num_batches_tracked:
%model.9.bn.num_batches_tracked[INT64, scalar]

@TrInsanity looks good, you are seeing the same thing then. The values you see there are batchnorm statistics which pytorch tracks in int64's to reduce the risk of overflow from very high numbers. @edurenye had a loop above to change these to int32 that may or may not address your problem.

After using export.py to get the .onnx file, I use

model = cv2.dnn.readNetFromONNX(….onnx)

and this error happens:

cv2.error: OpenCV(4.2.0) /io/opencv/modules/dnn/src/onnx/onnx_importer.cpp:134: error: (-215:Assertion failed) !field.empty() in function 'getMatFromTensor'

I'm having the same issues. @edurenye what did you have as the rsetattr function?

Hi @THINK989, that error looks completely unrelated to the one in this issue, as the INT64 error happens in the 'Concat' layer.

Hello @edurenye

Yes, sorry about that. I got confirmation from OpenVino that the model is unsupported. I have deleted the comment.

Any updates on this issue or how to resolve them? I'm having the exact same problem.

It will be great if this is resolved; this is the only good option to run Yolov5 on the edge.

@edurenye Did you manage to solve this issue?

No, I just avoided exporting in my project and had to use PyTorch on the edge, which was not nice, but it is what I had to do.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

