Hi,
In your implementation of Darknet, the function _get_image_from_stream_resize_ does not keep the aspect ratio of the image when resizing: https://github.com/AlexeyAB/darknet/blob/5a2efd5e5327c56a362442dce70bb3e46201cb89/src/image.c#L627
It is used instead of the original resizing function _letterbox_image_into_ from the pjreddie version:
https://github.com/AlexeyAB/darknet/blob/5a2efd5e5327c56a362442dce70bb3e46201cb89/src/image.c#L892
https://github.com/pjreddie/darknet/blob/532c6e1481e78ba0e27c56f4492a7e8d3cc36597/src/image.c#L913
(in the original repo)
which resized the image while keeping the aspect ratio and placed it in a letterbox.
It is the same in the OpenCV implementation you pushed a few days ago (the aspect ratio is not kept):
https://github.com/opencv/opencv/blob/73af899b7c737677f008b831c8e61eaeb2984342/samples/dnn/yolo_object_detection.cpp#L60
Why this difference?
It seems to me this is the old behavior of YOLO (v1): https://github.com/pjreddie/darknet/blame/179ed8ec76f329eb22360440c3836fdcb2560330/src/demo.c#L44
Why didn't you update this behavior in the same way?
It seems to me that not keeping the aspect ratio means you HAVE TO use the same aspect ratio for the training images and the test/application images/videos (and I saw you write that in several places).
Does this mean that a network trained with one version of YOLO isn't fully usable with the other one (it won't give the same results)?
In my case, I have several training images, but some of them are in portrait orientation, while the application video will be in landscape orientation. Does that mean I can't use them? (Only with your implementation, which doesn't keep the aspect ratio, or in any case?)
Thank you
Hi,
In the OpenCV version of Yolo you can keep the aspect ratio right now; just replace this code:
https://github.com/opencv/opencv/blob/73af899b7c737677f008b831c8e61eaeb2984342/samples/dnn/yolo_object_detection.cpp#L58-L65
//! [Resizing without keeping aspect ratio]
cv::Mat resized;
cv::resize(frame, resized, cv::Size(network_width, network_height));
//! [Resizing without keeping aspect ratio]
//! [Prepare blob]
Mat inputBlob = blobFromImage(resized, 1 / 255.F);
//! [Prepare blob]
to this:
//! [Prepare blob]
Mat inputBlob = blobFromImage(frame, 1 / 255.F, cv::Size(network_width, network_height)); //Convert Mat to batch of images
//! [Prepare blob]
There are at least 3 versions of Yolo:
It seems to me it is the old behavior of YOLO (v1) : https://github.com/pjreddie/darknet/blame/179ed8ec76f329eb22360440c3836fdcb2560330/src/demo.c#L44
No. Yolo v1 used fully connected layers and the file yolo_demo.c instead of demo.c, and it had too low accuracy. You can find Yolo v1 here: https://github.com/AlexeyAB/yolo-windows
This fork of mine fully corresponds to Yolo v2, which uses yolo-voc.2.0.cfg or yolo.2.0.cfg, with accuracy 78.6 mAP (VOC 2007), 73.4 mAP (VOC 2012), 44.0 mAP (COCO, table 5): https://arxiv.org/abs/1612.08242
Yolo v2 was released on 17 Nov 2016 (1 year ago): https://github.com/pjreddie/darknet/commit/c6afc7ff1499fbbe64069e1843d7929bd7ae2eaa
resize_image() was replaced by letterbox_image(), which keeps the aspect ratio, on 10 Apr 2017 (6 months ago): https://github.com/pjreddie/darknet/commit/8d9ed0a1d680c8d31e453e2e1cebfda66b357c11#diff-4e71a2cf0098713e52e5dae1dfd56c06L44
So now, with the aspect ratio kept, we get about 48.1 mAP (COCO), i.e. roughly +4.1 mAP on COCO: https://pjreddie.com/darknet/yolo/
Why didn't you update this behavior in the same way?
Maybe I will update it later. Joseph may soon release a new version of Yolo with new improvements, and then I'll add it all together.
This version of Yolo v2 works a bit worse when the training and detection datasets have different aspect ratios, but it works. Aspect-ratio invariance is achieved by using a random crop that depends on the jitter parameter in the .cfg file: https://github.com/AlexeyAB/darknet/blob/5a2efd5e5327c56a362442dce70bb3e46201cb89/src/data.c#L697
Thank you once again for your complete answer.
If I'm modifying your code to include the keeping of the aspect ratio, would you be interested in a Pull Request?
For OpenCV, looking at the definition of blobFromImage, it appears to me that its behaviour differs from the kept-aspect-ratio resize in pjreddie's Darknet:
the input image is resized so that one side matches the corresponding dimension in size and the other is equal or larger; then a crop from the center is performed.
So it seems to resize and cut off the parts of the image that don't fit in the square, instead of adding black margins (a letterbox).
Yes, you are right, blobFromImage does it in a different way than Darknet.
But there are trade-offs in any case; below is an example of resizing an image in the different ways:
- blobFromImage(): object size 71 x 43, keeps the aspect ratio, but part of the image is lost (cropped)
- letterbox_image(): the smallest object size, 48 x 28; keeps the aspect ratio and sees the whole image
- resize_image(): object size 48 x 43; sees the whole image but doesn't keep the aspect ratio

If I'm modifying your code to include the keeping of the aspect ratio, would you be interested in a Pull Request?
A known Yolo problem is that small objects are difficult to detect, and letterbox_image() produces the smallest object size, 48 x 28.
So yes, I'll apply your pull request, but I think there should be an if-branch depending on a command-line flag, which would allow us to use either the current resize_image() version without keeping the aspect ratio, or the letterbox_image() version with keeping the aspect ratio.
For example (example images for blobFromImage(), letterbox_image(), and resize_image() were attached here).
Hi AlexeyAB,
I found the letterbox_image function in the image.c file in your repo, but it seems that none of the files use it. Also, get_image_from_stream_resize is in image.c, and it is used in demo.c. So if I replace get_image_from_stream_resize with letterbox_image in demo.c, will the aspect ratio be maintained by padding with black margins as described above?
@2598Nitz
Update your code from GitHub.
To use letterbox:
static int letter_box = 1;

Thanks for your reply.
My purpose is to implement the letterbox function at both train and test time. I still have a few questions regarding the resizing of images.
- The above changes that you mentioned: are they for training or for testing? If they are for testing, what changes should I make to implement the letterbox function for training on a custom dataset? I assume demo.c is for test time and detector.c is for training.
- There's one more line in the detector.c file (line no. 495) where letterbox is defined. Is that for training? Should I also set it to 1?
- Also, does the resizing implementation vary between different versions of yolo?
Sorry to bother you, but did you get any answers to your questions? I'd really like to know, thank you.
@2598Nitz
Update your code from GitHub.
To use letterbox:
for video - set this flag to 1 in darknet/src/demo.c, line 62 (commit 455b2fc):
https://github.com/AlexeyAB/darknet/blob/455b2fc06ff590914f3a1fb13261510d2b95de2e/src/demo.c#L62
change static int letter_box = 0; to static int letter_box = 1;

for image - comment out this line in darknet/src/detector.c, line 1121 (commit 455b2fc):
https://github.com/AlexeyAB/darknet/blob/455b2fc06ff590914f3a1fb13261510d2b95de2e/src/detector.c#L1121
image sized = resize_image(im, net.w, net.h);
and un-comment this line, line 1122:
https://github.com/AlexeyAB/darknet/blob/455b2fc06ff590914f3a1fb13261510d2b95de2e/src/detector.c#L1122
//image sized = letterbox_image(im, net.w, net.h); letterbox = 1;
This worked! Now Windows performs as well as Ubuntu!
Should I use a varied aspect ratio for training or not? In the end, I have to do inference at a constant aspect ratio. And in the .cfg file, what should the width and height be if my training images range in size from 150 (width) x 80 (height) to 600 (width) x 200 (height)? I am using YOLOv4.
thanks @AlexeyAB