Darknet: Resizing : keeping aspect ratio, or not

Created on 16 Oct 2017  ·  11 comments  ·  Source: AlexeyAB/darknet

Hi,

In your implementation of Darknet, the function _get_image_from_stream_resize_ does not keep the aspect ratio of the image when resizing: https://github.com/AlexeyAB/darknet/blob/5a2efd5e5327c56a362442dce70bb3e46201cb89/src/image.c#L627
It is used instead of the original resizing function _letterbox_image_into_ from the pjreddie version:
https://github.com/AlexeyAB/darknet/blob/5a2efd5e5327c56a362442dce70bb3e46201cb89/src/image.c#L892
https://github.com/pjreddie/darknet/blob/532c6e1481e78ba0e27c56f4492a7e8d3cc36597/src/image.c#L913
(in the original repo)
which resized the image while keeping the aspect ratio and placed it in a letterbox.

It is the same in the OpenCV implementation you pushed some days ago (aspect ratio not kept):
https://github.com/opencv/opencv/blob/73af899b7c737677f008b831c8e61eaeb2984342/samples/dnn/yolo_object_detection.cpp#L60

Why this difference?

It seems to me it is the old behavior of YOLO (v1) : https://github.com/pjreddie/darknet/blame/179ed8ec76f329eb22360440c3836fdcb2560330/src/demo.c#L44
Why didn't you update this behavior in the same way?

It seems to me that not keeping the aspect ratio means you HAVE TO use the same aspect ratio for the training images and the test/application images/videos (and I saw you write that in several places).

Does this mean that a network trained with one version of YOLO isn't fully usable with the other one (having the same results)?

In my case, I have several training images, but some of them are in portrait orientation, while the application video will be in landscape orientation. Does that mean I can't use them? (Only with your implementation, which doesn't keep the aspect ratio, or in any case?)

Thank you

Labels: Explanations, question


All 11 comments

Hi,

In the OpenCV version of Yolo you can keep the aspect ratio right now; just replace this code:
https://github.com/opencv/opencv/blob/73af899b7c737677f008b831c8e61eaeb2984342/samples/dnn/yolo_object_detection.cpp#L58-L65

    //! [Resizing without keeping aspect ratio]
    cv::Mat resized;
    cv::resize(frame, resized, cv::Size(network_width, network_height));
    //! [Resizing without keeping aspect ratio]

    //! [Prepare blob]
    Mat inputBlob = blobFromImage(resized, 1 / 255.F);
    //! [Prepare blob]

to this:

    //! [Prepare blob]
    Mat inputBlob = blobFromImage(frame, 1 / 255.F, cv::Size(network_width, network_height)); //Convert Mat to batch of images
    //! [Prepare blob]

There are at least 3 versions of Yolo:

  1. Yolo v1 with fully connected layers: https://pjreddie.com/darknet/yolov1/
  2. Yolo v2, a fully convolutional network (yolo.2.0.cfg and yolo-voc.2.0.cfg) - the one used in my fork: https://arxiv.org/pdf/1612.08242.pdf
  3. Yolo v2.x with keeping of the aspect ratio - current (since 10 Apr 2017): https://pjreddie.com/darknet/yolo/

It seems to me it is the old behavior of YOLO (v1) : https://github.com/pjreddie/darknet/blame/179ed8ec76f329eb22360440c3836fdcb2560330/src/demo.c#L44

No. Yolo v1 used fully connected layers and the file yolo_demo.c instead of demo.c, and its accuracy was too low. You can find Yolo v1 here: https://github.com/AlexeyAB/yolo-windows

This fork fully corresponds to the Yolo v2 that uses yolo-voc.2.0.cfg or yolo.2.0.cfg, with accuracy 78.6 mAP (VOC 2007), 73.4 mAP (VOC 2012), 44.0 mAP (COCO - table 5): https://arxiv.org/abs/1612.08242

So now, with keeping of the aspect ratio, we can get about 48.1 mAP (COCO), i.e. it adds about +4.1 mAP for COCO: https://pjreddie.com/darknet/yolo/


Why didn't you update this behavior in the same way?

Maybe I will update it later. Maybe Joseph will soon release a new version of Yolo with new improvements, and I'll add it all together.

This version of Yolo v2 works a bit worse when the training and detection datasets have different aspect ratios, but it works. Aspect ratio invariance is achieved by using a crop that depends on the jitter parameter in the .cfg-file: https://github.com/AlexeyAB/darknet/blob/5a2efd5e5327c56a362442dce70bb3e46201cb89/src/data.c#L697

Thank you once again for your complete answer.

If I'm modifying your code to include the keeping of the aspect ratio, would you be interested in a Pull Request?

For OpenCV, looking at the definition of blobFromImage, it appears to me that the behaviour differs from the aspect-ratio-preserving resize in pjreddie's Darknet:

input image is resized so one side after resize is equal to corresponding dimension in size and another one is equal or larger. Then, crop from the center is performed.

It seems to resize and then cut off the parts of the image that don't fit in a square, instead of adding black margins (letterbox).

Yes, you are right, blobFromImage does it differently than Darknet.

But there are trade-offs in any case - below is an example of resizing an image in different ways:

  1. blobFromImage(): object size 71 x 43, keeps aspect ratio, but part of the image is lost (cropped)
  2. letterbox_image(): the smallest object size, 48 x 28, keeps aspect ratio and sees the whole image
  3. resize_image(): object size 48 x 43, sees the whole image but doesn't keep aspect ratio

If I'm modifying your code to include the keeping of the aspect ratio, would you be interested in a Pull Request?

A known Yolo problem is that it is difficult to detect small objects. And letterbox_image() produces the smallest object size, 48 x 28.

So yes, I'll apply your pull request, but I think there should be an if-branch depending on a command-line flag, which allows us to use either the current resize_image() version without keeping the aspect ratio, or the letterbox_image() version that keeps the aspect ratio.


For example:

  1. Original image:
    air

  2. Resized (416x416) with keeping aspect ratio - OpenCV blobFromImage():
    air_416x416_cropped

  3. Resized (416x416) with keeping aspect ratio - Darknet letterbox_image():
    air_416x416_letterbox

  4. Resized (416x416) without keeping aspect ratio - this fork of Darknet resize_image():
    air_416x416

Hi AlexeyAB,
I found the letterbox_image function in the image.c file in your repo, but it seems that none of the files use this function. Also, get_image_from_stream_resize is in image.c and is used in demo.c. So if I replace get_image_from_stream_resize with letterbox_image in demo.c, will the aspect ratio be maintained by padding with black margins as described above?

Thanks for your reply.
My purpose is to implement the letterbox function at both train and test time. I still have a few questions regarding the resizing of images.

  1. The above changes that you mentioned, are they for training or for testing? If they are for testing, what changes should I make to implement the letterbox function when training on a custom dataset?
    I assume demo.c is for test time and detector.c is for training.

  2. There is one more line in the detector.c file (line no. 495) where letterbox is defined. Is that for training? Should I also set it to 1?

  3. Also, does the resizing implementation vary between different versions of yolo?


Sorry to bother you, but did you ever get answers to your questions above? I'd really like to know, thank you.

@2598Nitz

Update your code from GitHub.

To use letterbox:

  • for video - set this flag to 1 in [darknet/src/demo.c line 62](https://github.com/AlexeyAB/darknet/blob/455b2fc06ff590914f3a1fb13261510d2b95de2e/src/demo.c#L62):

        static int letter_box = 0;

    changing it to:

        static int letter_box = 1;

  • for image - comment out this line in [darknet/src/detector.c line 1121](https://github.com/AlexeyAB/darknet/blob/455b2fc06ff590914f3a1fb13261510d2b95de2e/src/detector.c#L1121):

        image sized = resize_image(im, net.w, net.h);

    and un-comment the next line, [darknet/src/detector.c line 1122](https://github.com/AlexeyAB/darknet/blob/455b2fc06ff590914f3a1fb13261510d2b95de2e/src/detector.c#L1122):

        //image sized = letterbox_image(im, net.w, net.h); letterbox = 1;
This worked! Now Windows performs as well as Ubuntu!

Shall I use a varied aspect ratio for training or not? In the end, I have to do inference at a constant aspect ratio. And in the .cfg file, what should the width and height be if my training images range from 150(width)x80(height) to 600(width)x200(height)? I am using YOLOv4.
thanks @AlexeyAB

