Images from VOC and some other datasets do not all share the same width-to-height ratio. For example, in VOC2012 some images are 334x500, some are 500x332, and some are 486x500. In the KITTI dataset, the width is always roughly 3 times the height (1200x300).
I don't see any fully connected layers in YOLOv3. Does that mean YOLOv3 can take images with different width-to-height ratios as training input?
Or do I need to crop the images to the same size, or apply the SPP-Net technique to YOLOv3 before training? If SPP-Net is needed, before which YOLOv3 layer should I apply it?
Yolo v1/v2/v3 can take images of different widths, heights, and ratios as training/validation/test input.
Fully connected layers don't make a network invariant to aspect ratio. Fully connected layers only increase the receptive field of each final activation to the full image size. But in Yolo v3 each final activation in the first [yolo] layer already has a large receptive field.
Thank you, @AlexeyAB .
I also notice that there is a method of configuring "dim" in darknet/src/detector.c#L87:

just change these 2 lines (https://github.com/AlexeyAB/darknet/blob/5bc62b14e06a3fcfda4e3a19fba77589920eddee/src/detector.c#L87):

```c
args.w = dim*2;
resize_network(nets + i, dim*2, dim);
```

(https://groups.google.com/forum/#!topic/darknet/HrkhOhxCgLk)
If Yolo v1/v2/v3 can take images of different widths/heights/ratios as training/validation/test input, then what is the point of configuring something like "dim*2"?
Does it mean that I should just keep the original code when combining images of different widths/heights/ratios as my training data?
```c
args.w = dim;
args.h = dim;
for(i = 0; i < ngpus; ++i){
    resize_network(nets[i], dim, dim);
}
```
I am also confused by another thing. The instructions from the Google group say that "detector.c" should be in the src folder (https://github.com/AlexeyAB/darknet/tree/5bc62b14e06a3fcfda4e3a19fba77589920eddee/src), but I can only find "detector.c" in the examples folder (https://github.com/pjreddie/darknet/tree/master/examples). Should I just leave detector.c in the examples folder if I am using pjreddie's YOLOv3 repo (https://github.com/pjreddie/darknet)?
jitter - randomly resizes the image
random - randomly resizes the network: resize_network(nets[i], dim, dim);
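These two options map to `.cfg` parameters. For reference, a fragment in the style of the stock yolov3.cfg (the exact values vary between configs; the ones below are illustrative):

```ini
[net]
# network input resolution; images of any aspect ratio are resized to this
width=416
height=416

[yolo]
# jitter: randomly crops/translates the training image by up to this fraction
jitter=.3
# random: 1 enables multi-scale training (resize_network every few batches)
random=1
```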
Joseph moved it from src to the examples folder, so just keep it there.
@AlexeyAB
Hi AlexeyAB, does it mean that I do not have to resize my images, since the network has:
jitter - randomly resizes the image
random - randomly resizes the network: resize_network(nets[i], dim, dim);
I am planning to train YOLOv3 on drone images and would like to ask whether resizing the images would help the detector become more accurate. If so, what would be the recommended size?
I appreciate your help, thank you!
@danieltwx Hi, you shouldn't resize images.
@AlexeyAB
I have trained the network with cfg [net]: width=416, height=416. The training images provided to the network are of different sizes; as suggested by everyone, I did not resize them before training, since the network does that itself (training finished with loss=0.5).
I appreciate your help, thank you!
@AlexeyAB
Hi Alexey,
I'm training YOLOv3 on a dataset with just 1 object to be detected and classified per image (classes=4). The object is a rectangle that takes 80-95% of the image space almost always (it is a business card). The ratio of the images is 1:1.5 approx.
Given that the borders of the object are very close to the limits of the image (sometimes even touching them), I've set width=640, height=416 in my .cfg file, for the moment.
Is it safe to set both width and height to 416 as recommended? Or am I risking losing valuable information due to the closeness of the object to the image limits?
Thanks for your great contribution and support to the community!
Hello,
I'm a beginner in AI.
Currently I am importing YOLOv3 to ONNX, so could anyone please share a sample .cpp file for importing YOLOv3?
Thanks
@MurreyCode you don't need to set different height and width in your config or resize your dataset images. The YOLO pipeline does it by itself, preserving the aspect ratio (no information is discarded), according to the resolution in the .cfg file. For example, if you have an image of size 1248x936, YOLO will resize it to 416x312 and then pad the extra space with black bars to fit a 416x416 network.
For which version?