The bug: I followed https://gist.github.com/jakepoz/eb36163814a8f1b6ceb31e8addbba270 to derive the torchscript model.
In both my C++ code and my Python code I tested the same picture, and I verified that the input tensors match after preprocessing, but the model outputs differ.
The picture shape is (channels = 3, height = 360, width = 640).
Python input:

```python
import cv2
import numpy as np
import torch

img_path = 'test.png'
img = cv2.imread(img_path)                     # BGR, HWC
img = letterbox(img, new_shape=(640, 640))[0]  # resize with padding
img = img[:, :, ::-1].transpose(2, 0, 1)       # BGR to RGB, HWC to CHW
img = np.ascontiguousarray(img)
img = torch.from_numpy(img).to(device).float()
img /= 255.0                                   # 0-255 to 0.0-1.0
if img.ndimension() == 3:
    img = img.unsqueeze(0)                     # shape (1, 3, 384, 640)
pred = model(img, augment=False)
print(pred[0].shape)
```

Python output:

```
torch.Size([1, 15120, 85])
```
C++ input:

```cpp
std::string img_path = "test.png";
cv::Mat img = cv::imread(img_path);
img = letterbox(img);                         // resize with padding
cv::cvtColor(img, img, cv::COLOR_BGR2RGB);    // BGR -> RGB
img.convertTo(img, CV_32FC3, 1.0f / 255.0f);  // scale to [0, 1]
// note: from_blob does not copy, so img must outlive tensor_img (or call .clone())
auto tensor_img = torch::from_blob(img.data, {img.rows, img.cols, img.channels()}, torch::kFloat32);
tensor_img = tensor_img.permute({2, 0, 1});   // HWC -> CHW
tensor_img = tensor_img.unsqueeze(0);
std::cout << "tensor size is " << tensor_img.sizes() << std::endl;  // (1, 3, 384, 640)

std::vector<torch::jit::IValue> inputs;
inputs.push_back(tensor_img);
torch::jit::IValue output = model.forward(inputs);
auto op = output.toList().get(0).toTensor();
std::cout << "op sizes: " << op.sizes() << std::endl;
```
C++ output:

```
[1, 3, 48, 80, 85]
```

and 3 * 48 * 80 = 11520 != 15120.
Expected behavior: I would expect the model output in C++ to be the same as it is in Python.
Hello @zherlock030, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@jakepoz
We have updated export.py to support TorchScript export now, among other formats. The tutorial is here: https://github.com/ultralytics/yolov5/issues/251
Note that these are simple examples to get you started. Actual export and deployment (to an edge device, for example) is a very complicated journey. We have not open-sourced the entire process, but we do offer paid support in this area. If you have a business need, let us know and we'd be happy to help you!
@zherlock030, this is because the final Detect layer in yolov5 undoes the action of yolo's "anchor" system during regular operation, but this step is not being exported by the export script:
https://github.com/pjreddie/darknet/issues/568
Unfortunately, I have not yet figured out the details here. It seems that some of the variables like self.anchors and self.anchor_grid are stored as registered parameters, but self.strides is not, and I have difficulty exporting the model with the anchor code turned on.
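For anyone reimplementing this step outside the model: the anchor/grid decode that the Detect layer applies can be reproduced independently. The sketch below is a plain-NumPy approximation of the decode math yolov5's Detect layer used around this time (sigmoid everything, then scale xy by the grid and stride, and wh by the anchors); the anchor values, stride, and shapes here are illustrative, so check them against the `Detect.forward` of the checkpoint you actually exported:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_detect(raw, anchors, stride):
    """Decode one raw YOLO head output of shape (bs, na, ny, nx, no)
    into pixel-space predictions of shape (bs, na*ny*nx, no).

    Approximates yolov5-style decoding:
      xy = (sigmoid(t_xy) * 2 - 0.5 + grid) * stride
      wh = (sigmoid(t_wh) * 2) ** 2 * anchor
    """
    bs, na, ny, nx, no = raw.shape
    y = sigmoid(raw)
    # grid of per-cell (x, y) offsets, broadcastable against y[..., 0:2]
    gy, gx = np.meshgrid(np.arange(ny), np.arange(nx), indexing='ij')
    grid = np.stack((gx, gy), axis=-1).reshape(1, 1, ny, nx, 2)
    y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride
    y[..., 2:4] = (y[..., 2:4] * 2.0) ** 2 * anchors.reshape(1, na, 1, 1, 2)
    return y.reshape(bs, -1, no)
```

Applying this per head and concatenating the three flattened outputs is what reproduces the single (1, N, 85) tensor the Python model returns.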
@zherlock030 @jakepoz did you solve the problem? I've hit the same issue. Looking forward to your reply, thank you.
@winself
I think I have made it work. I'm not sure if I should open-source it, since @glenn-jocher has his concerns.
I could share my code with you.
> @zherlock030, this is because the final Detect layer in yolov5 is undoing the action of yolo's "anchor" system when in regular operations, but this is not being exported in the export script.
> Unfortunately, I have not yet figured out the details here, it seems as if some of the variables like the self.anchors and self.anchor_grid are stored as registered parameters, but self.strides is not, and I have difficulty exporting the model with the anchor code turned on.
@jakepoz
Thanks for your reply. I just treat self.strides as constants, and for now this produces reasonable results, matching the Python output.
> We have updated export.py to support torchscript export now, among others. The tutorial is here: #251
> Note that these are simple examples to get you started. Actual export and deployment (to an edge device) for example is a very complicated journey. We have not open sourced the entire process, but we do offer paid support in this area.
Thanks for your reply. I think I've made it work; yolov5s is so fast.
@zherlock030 hi no worries about open sourcing your work! The only requirement is that you retain the current GPL3 license on modifications.
We eventually want to open source 100% of everything, including the export pipelines and the iDetection iOS app source code. We are trying to adjust our business model to make this happen either later this year or next year.
@zherlock030 Thanks for your reply! `self.training |= self.export` causes this result: when export is True, training becomes True, so the TorchScript model produces the training-mode output and we need to write some code to process the result ourselves. Is this right?
Yeah, we need to write code for image preprocessing, the detect layer, and NMS.
You can see my implementation at https://github.com/zherlock030/YOLOv5_Torchscript.
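To make the "write your own NMS" step concrete, here is a minimal greedy non-maximum suppression sketch in plain NumPy. The (x1, y1, x2, y2) box format and the 0.45 IoU threshold are assumptions for illustration, not taken from the linked repo:

```python
import numpy as np

def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS on (x1, y1, x2, y2) boxes.
    Returns indices of kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # candidates, best first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of box i with all remaining candidates
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop candidates overlapping box i too much
        order = order[1:][iou <= iou_thres]
    return keep
```

In a full pipeline this runs per class (or with class-offset boxes, as yolov5's Python NMS does) after filtering by confidence.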
Hello,
I'm also interested in running YOLOv5 in C++. @zherlock030, when you run yolov5, how many GB of GPU memory does it use? Is it lower than when running in Python?
Thanks
@zherlock030, hi! Similar to you, I also wrote the nms.cpp code. I used the official export.py to export the torchscript file; the output op is a tensor of [1, gridx, gridy, 9], but the 9-vector is totally wrong. Is the exported torchscript file incorrect? Do I need any modification? I ask because I see this warning during torch.jit.trace: `TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future.`
@easycome2009 What I did is set `model.model[-1].export = False` in export.py (line 28); with that I get similar results from Python and C++.
@phamdat09 Hi, actually I'm using a CPU.
@easycome2009 Yes, when you run export.py you need to modify the detect layer so that it just outputs the input list `x`, and then implement the detect layer in your C++ code.
@yasenh Yes, I tried that too, but that way we can't feed the network pictures of different shapes.
@zherlock030, here is my implementation, just FYI: https://github.com/yasenh/libtorch-yolov5
The image will be padded to a fixed size, e.g. (640, 640).
@yasenh Yeah, I know what you mean, but with the letterbox function an image of any shape can be fed to YOLO.
I think you can still do that, but the benefit of padding images to the same size is that we can process them as a batch; otherwise you'd have to process images of different sizes one by one.
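For reference, the letterbox geometry being discussed can be computed without touching pixels. A minimal sketch, mirroring the usual yolov5-style rule of scaling to fit and padding each side up to a multiple of the model stride (the exact split of padding between top/bottom and left/right is an implementation detail this skips):

```python
def letterbox_shape(h, w, new_shape=(640, 640), stride=32):
    """Return ((resized_h, resized_w), (total_pad_h, total_pad_w)) for a
    letterbox that scales (h, w) to fit new_shape, then pads each
    dimension up to the next multiple of `stride`."""
    r = min(new_shape[0] / h, new_shape[1] / w)  # scale ratio
    new_h, new_w = round(h * r), round(w * r)
    return (new_h, new_w), ((-new_h) % stride, (-new_w) % stride)
```

For the 360x640 picture from this thread this gives a 360x640 resize plus 24 rows of padding, i.e. the (1, 3, 384, 640) input tensor seen in both the Python and C++ snippets above.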
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
> C++ output
> output tensor shape [1, 3, 48, 80, 85], and 3*48*80 = 11520 != 15120
That's because the C++ output is a list [(1, 3, height/8, width/8, 6), (1, 3, height/16, width/16, 6), (1, 3, height/32, width/32, 6)], while the Python output is a tuple ((1, num_anchors, 6), [(1, 3, height/8, width/8, 6), (1, 3, height/16, width/16, 6), (1, 3, height/32, width/32, 6)]).
In your case: 3*(48*80+24*40+12*20) == 15120
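That count follows from the three head strides. A quick check, assuming the standard yolov5 head strides of 8, 16, and 32 and 3 anchors per grid cell:

```python
# number of predictions per detection head for a 384x640 letterboxed input
h, w = 384, 640
counts = [3 * (h // s) * (w // s) for s in (8, 16, 32)]
print(counts, sum(counts))  # [11520, 2880, 720] 15120
```

So the single [1, 3, 48, 80, 85] tensor the C++ code printed is only the stride-8 head; concatenating all three flattened heads gives the 15120 predictions seen in Python.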