Darknet: a few questions about getting better detection results while training

Created on 3 Apr 2018 · 6 comments · Source: AlexeyAB/darknet

All my images are 1920*1080, and I am labeling some small objects in them. I think that when training, all images will be resized to 320x320, which means YOLO is actually learning from just some little black dots.

If I crop the images so they are no larger than 960*540, will I get a better detection result?

Or should I just train them at a larger size, maybe 1056*1056, and use a higher subdivisions value (maybe 256?) to avoid running out of memory?

(I assume the image size used during training will not change the speed of detection?)

If I crop the images to different sizes, the scale will not be the same once they are all resized to 320x320. Will that be a problem because the images are transformed?

Since in my project all detection input will have the same width/height ratio (16:9), should I keep all the training images at the same width/height ratio when I crop them?

Another question is about negative samples. They are easy to make (I just need to take random pictures without the objects I want to detect), so I could easily make 100000 of them. But how many should I use per 1000 positive samples? I assume it is not simply "the more the better"?

An answer would be appreciated!!!


All 6 comments

I also wanted to know whether keeping negative samples (images with no objects) is helpful during training and, if so, what the ideal amount is.

Will be following this post.

All my images are 1920*1080, and I am labeling some small objects in them. I think that when training, all images will be resized to 320x320, which means YOLO is actually learning from just some little black dots.

If I crop the images so they are no larger than 960*540, will I get a better detection result?

  • Yes, it will give a better result. But in that case the detection images should also have a resolution of about ~960x540. Or you can increase the network resolution from 320x320 to 640x640 or more (see the cfg sketch below).

Or should I just train them at a larger size, maybe 1056*1056, and use a higher subdivisions value (maybe 256?) to avoid running out of memory?

  • Yes. You can use any subdivisions value, as long as it is not larger than batch (a rough cfg sketch follows below).
  • Correct, the image size used during training will not change the speed of detection.
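
To make the two points above concrete, this is roughly what the relevant `[net]` fields of a darknet .cfg could look like when raising the network resolution and compensating with a larger subdivisions value. The concrete numbers here are only an illustration, not values from this thread; pick whatever fits your GPU memory.

```
[net]
# network resolution; width and height must be multiples of 32
width=640
height=640
# keep batch as-is and raise subdivisions until the mini-batch fits in GPU memory;
# subdivisions must not exceed batch
batch=64
subdivisions=32
```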

If I crop the images to different sizes, the scale will not be the same once they are all resized to 320x320. Will that be a problem because the images are transformed?

  • It will be a problem, because the objects will have different scales (object size relative to image size).

Since in my project all detection input will have the same width/height ratio (16:9), should I keep all the training images at the same width/height ratio when I crop them?

  • It is very desirable to keep the aspect ratio (see the cropping sketch below).
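
One way to satisfy both answers above (consistent object scale and a constant 16:9 aspect ratio) is to cut every 1920x1080 frame into four fixed 960x540 tiles and recompute the YOLO labels per tile. The Python sketch below is only an illustration of that idea using OpenCV; the file layout and the rule of dropping boxes whose centre falls outside a tile are my own simplifications, not anything prescribed by darknet.

```python
import cv2

TILE_W, TILE_H = 960, 540  # fixed crop size: keeps object scale and the 16:9 ratio

def crop_to_tiles(img_path, label_path, out_prefix):
    img = cv2.imread(img_path)              # 1920x1080 source frame
    h, w = img.shape[:2]
    # YOLO labels: "class cx cy bw bh", all normalized to the full image
    boxes = [list(map(float, line.split()))
             for line in open(label_path) if line.strip()]

    for i, (x0, y0) in enumerate([(0, 0), (TILE_W, 0), (0, TILE_H), (TILE_W, TILE_H)]):
        tile = img[y0:y0 + TILE_H, x0:x0 + TILE_W]
        lines = []
        for cls, cx, cy, bw, bh in boxes:
            ax, ay = cx * w, cy * h          # absolute box centre in the full frame
            if not (x0 <= ax < x0 + TILE_W and y0 <= ay < y0 + TILE_H):
                continue                     # simplification: drop boxes centred outside this tile
            lines.append("%d %.6f %.6f %.6f %.6f" % (
                int(cls),
                (ax - x0) / TILE_W, (ay - y0) / TILE_H,   # re-normalize to the tile
                bw * w / TILE_W, bh * h / TILE_H))
        cv2.imwrite("%s_%d.jpg" % (out_prefix, i), tile)
        with open("%s_%d.txt" % (out_prefix, i), "w") as f:
            f.write("\n".join(lines))
```

A real script would also need to clip boxes that straddle tile borders; the point here is only that the re-normalized labels keep the same object scale the network will see at detection time.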

Another question is about negative samples. They are easy to make (I just need to take random pictures without the objects I want to detect), so I could easily make 100000 of them. But how many should I use per 1000 positive samples? I assume it is not simply "the more the better"?

I usually use an equal number of positive and negative samples. For example, I have 12000 labeled images for 6 classes and also 12000 non-labeled images as negative samples.
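
In darknet a negative sample is simply an image listed in train.txt whose ground-truth .txt file is empty. The Python sketch below shows one way to build a balanced training list in that spirit; the folder names `positives/` and `negatives/` are hypothetical, and the 1:1 cap just follows the ratio suggested above.

```python
import glob, os, random

pos = sorted(glob.glob("positives/*.jpg"))   # labeled images (each has a .txt file)
neg = sorted(glob.glob("negatives/*.jpg"))   # background-only images
random.shuffle(neg)
neg = neg[:len(pos)]                         # keep roughly a 1:1 positive/negative ratio

# a negative sample is just an image with an empty ground-truth .txt file
for img in neg:
    open(os.path.splitext(img)[0] + ".txt", "w").close()

with open("train.txt", "w") as f:
    f.write("\n".join(pos + neg))
```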

Thanks for the detailed reply! Now people will know how to train better :D

@AlexeyAB
For negative samples, we just keep the ground-truth txt file empty, is that right?

@goodtogood yes

@AlexeyAB
Got it!
Thanks!

