YOLOv3: How to visualize the output of an ONNX-exported model?

Created on 10 Sep 2019 · 21 Comments · Source: ultralytics/yolov3

I trained the model on the VOC dataset and it works fine. Now I have exported the model to ONNX and want to build a TensorRT engine from the exported model.

        elif ONNX_EXPORT:
            output = torch.cat(output, 1)  # cat 3 layers 85 x (507, 2028, 8112) to 85 x 10647
            nc = self.module_list[self.yolo_layers[0]].nc  # number of classes
            return output[5:5 + nc].t(), output[:4].t()  # ONNX scores, boxes

But when it comes to drawing the prediction boxes, I am confused by the ONNX output format.
How do I run NMS on these two output tensors? I tried concatenating them and passing the result to non_max_suppression, but I get no detections. Could someone provide an example of how to process the output and draw detection boxes from these two tensors? @glenn-jocher Thank you!
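
For reference, the row counts in the code comment above (507, 2028, 8112, total 10647) can be reproduced with a little arithmetic. This is only a sketch: the 416x416 input size, the strides of 32/16/8 and the 3 anchors per grid cell are the usual YOLOv3 settings and are assumed here, not stated in the question.

```python
# Assumed values: 416x416 input, three YOLO layers with strides 32/16/8,
# and 3 anchors per grid cell (usual YOLOv3 settings, not stated in the thread).
img_size = 416
strides = (32, 16, 8)
anchors_per_cell = 3

# Each layer predicts one row per anchor per grid cell.
rows_per_layer = [anchors_per_cell * (img_size // s) ** 2 for s in strides]
total_rows = sum(rows_per_layer)

print(rows_per_layer)  # [507, 2028, 8112]
print(total_rows)      # 10647
```

With 80 classes, each row then holds 4 box values, 1 objectness value and 80 class values, which gives the 85 in the comment.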

Sephirot1st

👍4

Most helpful comment

@monocongo - I am also interested in training YOLOv3 on my own data and then running it with TensorRT. From your posts in this issue and another one, it looks like there is no simple or well-documented process. I am wondering whether you have managed to get clarity on the steps involved. I would appreciate any insights you have.

priya-dwivedi on 1 Oct 2019

👍2

All 21 comments

I downloaded the yolov3-tiny weights, exported the model to ONNX, and then extracted the output tensors of the YOLO layers:

Pytorch model:
tensor([[[2.23289e+01, 2.39876e+01, 6.23362e+01,  ..., 2.82642e-03, 8.05851e-04, 6.14768e-04],
         [4.39845e+01, 2.45797e+01, 6.13970e+01,  ..., 6.47999e-03, 2.88799e-03, 2.90063e-03],
         [7.74273e+01, 2.18662e+01, 7.33655e+01,  ..., 6.64331e-03, 2.38184e-03, 6.70278e-04],
         ...,
         [3.29765e+02, 3.99280e+02, 2.80741e+02,  ..., 5.06803e-04, 3.05751e-04, 1.04740e-04],
         [3.60454e+02, 3.99895e+02, 2.81256e+02,  ..., 3.78499e-03, 9.65613e-04, 4.63651e-04],
         [4.01170e+02, 4.01764e+02, 2.98211e+02,  ..., 1.62020e-02, 3.41212e-03, 3.65535e-03]]], device='cuda:0') torch.Size([1, 507, 85])

tensor([[[8.52021e+00, 5.94188e+00, 1.88654e+01,  ..., 6.57246e-04, 2.18668e-04, 2.22751e-04],
         [2.22569e+01, 5.04309e+00, 2.80985e+01,  ..., 2.69663e-04, 1.32480e-04, 5.18955e-05],
         [3.80070e+01, 4.84328e+00, 2.73324e+01,  ..., 2.15795e-03, 4.77015e-04, 7.30404e-05],
         ...,
         [3.75732e+02, 4.07327e+02, 8.67790e+01,  ..., 2.74969e-03, 2.55870e-03, 7.53397e-04],
         [3.89625e+02, 4.07777e+02, 7.81029e+01,  ..., 4.46527e-03, 3.85391e-03, 1.18331e-03],
         [4.09011e+02, 4.07866e+02, 7.52183e+01,  ..., 7.46385e-03, 9.41188e-03, 2.10419e-03]]], device='cuda:0') torch.Size([1, 2028, 85])


Onnx exported model:
tensor([[5.36753e-02, 5.76624e-02, 1.49847e-01,  ..., 1.99781e-08, 5.68447e-09, 4.33576e-09],
        [1.05732e-01, 5.90859e-02, 1.47589e-01,  ..., 4.18286e-08, 1.85750e-08, 1.86565e-08],
        [1.86123e-01, 5.25630e-02, 1.76359e-01,  ..., 6.23207e-09, 2.22485e-09, 6.25028e-10],
        ...,
        [7.92705e-01, 9.59808e-01, 6.74858e-01,  ..., 3.57216e-11, 2.15464e-11, 7.37964e-12],
        [8.66476e-01, 9.61286e-01, 6.76097e-01,  ..., 3.77220e-11, 9.59645e-12, 4.60555e-12],
        [9.64350e-01, 9.65778e-01, 7.16854e-01,  ..., 2.40635e-10, 5.00275e-11, 5.36069e-11]]) torch.Size([507, 85])

tensor([[2.04813e-02, 1.42834e-02, 4.53495e-02,  ..., 4.16441e-10, 1.38491e-10, 1.41077e-10],
        [5.35022e-02, 1.21228e-02, 6.75444e-02,  ..., 1.65168e-10, 8.11323e-11, 3.17788e-11],
        [9.13629e-02, 1.16425e-02, 6.57029e-02,  ..., 1.67914e-10, 3.70549e-11, 5.67155e-12],
        ...,
        [9.03201e-01, 9.79152e-01, 2.08603e-01,  ..., 2.85656e-09, 2.65763e-09, 7.81119e-10],
        [9.36598e-01, 9.80234e-01, 1.87748e-01,  ..., 1.70294e-09, 1.46888e-09, 4.49800e-10],
        [9.83200e-01, 9.80448e-01, 1.80813e-01,  ..., 4.50132e-10, 5.68730e-10, 1.26219e-10]]) torch.Size([2028, 85])

They are very different. Does the ONNX output need some conversion to get the predictions?
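
One pattern is visible in the printed values: the first three ONNX columns look like the PyTorch values divided by 416. The sketch below checks that on numbers copied from the tensors above. The divisor 416 is an inference (507 = 3 x 13 x 13, and 13 cells at stride 32 gives 416), not something confirmed at this point in the thread.

```python
import numpy as np

# First three columns of a few rows, copied from the printouts above.
pytorch_rows = np.array([
    [22.3289, 23.9876, 62.3362],   # 507-row tensor, row 0
    [43.9845, 24.5797, 61.3970],   # 507-row tensor, row 1
    [8.52021, 5.94188, 18.8654],   # 2028-row tensor, row 0
])
onnx_rows = np.array([
    [0.0536753, 0.0576624, 0.149847],
    [0.105732, 0.0590859, 0.147589],
    [0.0204813, 0.0142834, 0.0453495],
])

img_size = 416  # assumed input size (13 grid cells x stride 32)

# True if the ONNX values equal the PyTorch values scaled by 1 / img_size.
matches = np.allclose(pytorch_rows / img_size, onnx_rows, rtol=1e-4)
print(matches)
```

If this holds for the full tensors, the ONNX box values would be normalized to the 0..1 range and would need to be multiplied by the image size before drawing.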

Sephirot1st on 11 Sep 2019

Hello, I also want to build a TensorRT engine on my TX2, but I am facing a bug: when I try to build the engine, TensorRT reports "ERROR: Network must have at least one output". Have you ever faced such a problem? If possible, could you share the script that converts the ONNX file to a TRT file?
Thank you very much! @Sephirot1st

linsongxue on 16 Sep 2019

[Image: QQ screenshot 20190916191343]
Maybe this?
The author produces three kinds of output, one each for the training model, the inference model, and the ONNX model.
This code is from "models.py".

linsongxue on 16 Sep 2019

@Sephirot1st Hello, can you tell me the mAP you got when training on the VOC dataset? Thanks.

eleflea on 18 Sep 2019

I'm also interested in training this model on my own dataset of images/annotations and then exporting to ONNX for use as input to TensorRT. From what I can see here there is not (yet) a straight-forward process to do so. Once this issue has been resolved I will be very interested to find out what the recipe is to get from A to Z, please update here if this ever gets worked out so I can benefit (and thanks in advance for all of the effort that will go into ironing out this process, let me know if I can help).

monocongo on 20 Sep 2019

👍1

Same problem. I do not understand how to view the ONNX model's detections at all. Also, is it possible to inject NMS directly into the last layers of the model? Right now I am trying to export the model to tensorflow-serving via the ONNX model, and it would be really nice if the model produced clean, ready-to-use output.

AlexKaravaev on 27 Sep 2019

@AlexKaravaev NMS is not typically part of a model; it is applied after inference to reduce redundant detections. In this repo we have a function in utils/utils.py which applies it. In our iOS iDetection app, we pipeline an NMS operation after YOLOv3 to produce the results seen in the app.

glenn-jocher on 27 Sep 2019

@glenn-jocher Thank you for such a quick reply, Glenn!
Yes, I have seen that function. For me the main problem is that it uses pytorch, and I want to at least be able to do NMS outside a torch environment (something lightweight like numpy, for example), or to cut the post-processing out entirely (for example by encapsulating the model with tf-serving). Do you think that is possible? Or is the best option to rewrite this NMS function with numpy?

AlexKaravaev on 27 Sep 2019

@AlexKaravaev oh, yes of course, you can just rewrite the function with numpy then. It's almost the same as pytorch.
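
For illustration, a numpy-only greedy NMS could look roughly like the sketch below. This is not the function from utils/utils.py: it is class-agnostic, expects boxes already in x1, y1, x2, y2 form, and the name nms_numpy and the 0.5 IoU threshold are chosen here for the example.

```python
import numpy as np

def nms_numpy(boxes, scores, iou_thres=0.5):
    """Greedy NMS. boxes: (N, 4) as x1, y1, x2, y2; scores: (N,).
    Returns the indices of the boxes that are kept, best score first."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with every remaining box
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        # Keep only the boxes that do not overlap the best box too much
        order = rest[iou <= iou_thres]
    return keep

# Two heavily overlapping boxes and one separate box
boxes = np.array([[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 300, 300]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms_numpy(boxes, scores)
print(kept)  # [0, 2]
```

To make it per-class, one common approach is to call it once for each class id; the inner loop only runs once per kept box, so most of the work stays vectorized.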

glenn-jocher on 27 Sep 2019

@glenn-jocher Got it, thank you for getting back to me, Glenn!!

AlexKaravaev on 27 Sep 2019

@AlexKaravaev Oh, yes, of course, you can rewrite that function with numpy. It is almost the same as pytorch.

I use TensorRT with the function rewritten in numpy, but it is too slow. Can you help me?

17702513221 on 28 Sep 2019

@17702513221 Well, how can I help you without knowing anything from your code

AlexKaravaev on 30 Sep 2019

@priya-dwivedi I have not had good luck with transfer learning using this repository whereas it worked well when using darknet so I may go back to using that instead. My understanding, however, is that once training of this model on a custom dataset is complete the next step is to convert the *.pt weights from the training (transfer learning) into a corresponding ONNX model. Once you have an ONNX file it can be converted to a TensorRT engine and then used by an inference plugin within DeepStream. I apologize I can't offer too much more than that, as I've not managed to go through the process successfully yet myself.

BTW NVIDIA has recently released its Transfer Learning Toolkit, which will hopefully facilitate the process of transfer learning on a model and then using the result as a TensorRT engine. You may want to look into that. It would be interesting to see if this implementation of YOLOv3 can fit into that workflow.

monocongo on 1 Oct 2019

Thanks @monocongo for the detailed response.

priya-dwivedi on 1 Oct 2019

@monocongo @priya-dwivedi this shows the coco_64img.data tutorial starting from a few different options, including transfer learning. To me, transfer learning is an overhyped and not very useful term: if you are going to start from a fully trained network, just train normally, as the plots below show. You can replicate these results with this code and then look at the resulting results.png file.

python3 train.py --arc default --data data/coco_16img.data --batch-size 16 --accumulate 1 --nosave --weights weights/yolov3-spp.weights --transfer  --name yolov3-spp_transfer  # TRANSFER LEARNING COMPARISON
python3 train.py --arc default --data data/coco_16img.data --batch-size 16 --accumulate 1 --nosave --name from_scratch
python3 train.py --arc default --data data/coco_16img.data --batch-size 16 --accumulate 1 --nosave --weights weights/darknet53.conv.74 --name darknet53_backbone
python3 train.py --arc default --data data/coco_16img.data --batch-size 16 --accumulate 1 --nosave --weights weights/yolov3-spp.weights --name yolov3-spp_backbone

[Image: results]

glenn-jocher on 1 Oct 2019

Thanks for your help, @glenn-jocher. I don't fully understand your comment, it seems that what you're suggesting is to just train the model without the --transfer option. Is this correct? If this is the right way to go, and it appears to be based upon the above graphs, then what purpose does the transfer option serve? To be clear what I'm trying to accomplish is to take the weights from training on COCO as a starting point and then further refine/train the model for detection of only the object classes in my custom dataset (this allows me to skip the long/expensive training on COCO).

monocongo on 1 Oct 2019

@monocongo transfer learning freezes all layers except the output layers. This allows you to train a mediocre model quickly in a resource starved environment, such as an edge device.

glenn-jocher on 2 Oct 2019

@Sephirot1st
Did you solve the problem of the different outputs? I am getting the same error. The output shapes of the ONNX model and the torch model are the same, but some of the values differ.

The torch model output:

tensor([[[ 16.84095,  16.47220,  56.58881,  ..., -2.55992, -3.06906, -2.42858],
         [ 46.63834,  13.54175, 101.19768,  ..., -1.88477, -2.48154, -1.97134],
         [ 78.64623,  15.68744, 129.09305,  ..., -1.94273, -2.21519, -1.96026],
         ...,
         [587.65106, 603.65839,  36.97690,  ..., -2.95302, -2.95624, -2.13854],
         [596.13910, 603.38928,  28.42757,  ..., -2.67128, -2.73842, -1.90765],
         [602.78510, 603.08624,  21.24530,  ..., -3.23338, -3.31461, -2.42249]]], grad_fn=<CatBackward>)

The onnx model output:

tensor([[[ 0.10522,  0.05905, -0.71778,  ..., -2.55992, -3.06906, -2.42858],
         [-0.17062, -0.30973, -0.13651,  ..., -1.88476, -2.48154, -1.97134],
         [-0.16963, -0.03908,  0.10694,  ..., -1.94273, -2.21519, -1.96026],
         ...,
         [-0.17493, -0.17122,  0.11379,  ..., -2.95302, -2.95624, -2.13854],
         [ 0.06959, -0.30777, -0.14915,  ..., -2.67128, -2.73842, -1.90765],
         [-0.62723, -0.46508, -0.44037,  ..., -3.23338, -3.31461, -2.42249]]])

Testing the output tensors:
np.testing.assert_allclose(torch_out.detach().numpy(), onnx_out.detach().numpy(), rtol=1e-03, atol=1e-05)

result:

AssertionError:
Not equal to tolerance rtol=0.001, atol=1e-05
Mismatch: 2.42%
Max absolute difference: 655.71
Max relative difference: 1.0425e+10
x: array([[[ 16.841, 16.472, 56.589, ..., -2.5599, -3.0691, -2.4286],
[ 46.638, 13.542, 101.2, ..., -1.8848, -2.4815, -1.9713],
[ 78.646, 15.687, 129.09, ..., -1.9427, -2.2152, -1.9603],...
y: array([[[ 0.10522, 0.059047, -0.71778, ..., -2.5599, -3.0691, -2.4286],
[ -0.17062, -0.30973, -0.13651, ..., -1.8848, -2.4815, -1.9713],
[ -0.16963, -0.039078, 0.10694, ..., -1.9427, -2.2152, -1.9603],...
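
To narrow a mismatch like this down, it can help to check which columns disagree instead of comparing the whole tensor at once. The sketch below does that on the values printed above (only the first three and last three columns of the first two rows are visible, so only those are used). In this sample the class columns agree and the leading box columns do not, which would be consistent with the box decoding being applied in one path and not the other; that reading is an interpretation, not something confirmed here.

```python
import numpy as np

# First and last three values of the first two rows, copied from the outputs above.
# Columns 0-2 are the leading box values, columns 3-5 are the last three class values.
torch_out = np.array([
    [16.84095, 16.47220, 56.58881, -2.55992, -3.06906, -2.42858],
    [46.63834, 13.54175, 101.19768, -1.88477, -2.48154, -1.97134],
])
onnx_out = np.array([
    [0.10522, 0.05905, -0.71778, -2.55992, -3.06906, -2.42858],
    [-0.17062, -0.30973, -0.13651, -1.88476, -2.48154, -1.97134],
])

# Same tolerances as the assert_allclose call above, applied element by element.
close = np.isclose(torch_out, onnx_out, rtol=1e-3, atol=1e-5)

# A column is "bad" if any row in it fails the tolerance check.
bad_columns = np.flatnonzero(~close.all(axis=0)).tolist()
print(bad_columns)  # [0, 1, 2]
```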

Salehoof on 1 Jan 2020

@Salehoof yes this ONNX output is correct. These are the 4 box coordinates and the 80 class confidences, which have been multiplied by the objectness confidence.

See #653 for a stricter conversion which retains the 85 outputs.
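
Putting this together with the export code at the top of the thread (which returns scores and boxes), turning the two ONNX outputs into drawable boxes could look roughly like this. It is a sketch under assumptions: the boxes are taken to be center-x, center-y, width, height normalized to 0..1, and the function name onnx_to_pixel_boxes and the 0.3 threshold are made up for the example.

```python
import numpy as np

def onnx_to_pixel_boxes(scores, boxes, img_w, img_h, conf_thres=0.3):
    """scores: (N, nc) class confidences, already multiplied by objectness.
    boxes: (N, 4), ASSUMED to be center-x, center-y, width, height in 0..1.
    Returns (x1 y1 x2 y2 pixel boxes, class ids, confidences) above conf_thres."""
    class_ids = scores.argmax(axis=1)   # best class per candidate
    conf = scores.max(axis=1)           # its confidence
    mask = conf > conf_thres            # drop low-confidence candidates
    cx, cy, w, h = boxes[mask].T
    xyxy = np.stack([(cx - w / 2) * img_w, (cy - h / 2) * img_h,
                     (cx + w / 2) * img_w, (cy + h / 2) * img_h], axis=1)
    return xyxy, class_ids[mask], conf[mask]

# Toy input: 2 candidates, 3 classes
scores = np.array([[0.05, 0.90, 0.01], [0.02, 0.03, 0.01]])
boxes = np.array([[0.5, 0.5, 0.25, 0.5], [0.1, 0.1, 0.05, 0.05]])
xyxy, cls, conf = onnx_to_pixel_boxes(scores, boxes, img_w=416, img_h=416)
print(xyxy)  # one box: x1=156, y1=104, x2=260, y2=312
print(cls, conf)
```

The surviving boxes still overlap, so they would normally go through NMS before drawing. If the image was letterboxed or resized before inference, the coordinates also need to be mapped back to the original image.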

glenn-jocher on 1 Jan 2020

I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.

glenn-jocher on 16 Jan 2020
