Darknet: a few questions about getting better detection results while training

Created on 3 Apr 2018 · 6 comments · Source: AlexeyAB/darknet

All my images are 1920*1080, and I am labeling some small objects in them. I think that when training, all images will be resized to 320x320, which means YOLO is actually learning from just some little black dots.

If I crop the images so they are no larger than 960*540, will I get a better detection result?

Or should I just train them at a larger size, maybe 1056*1056, and use a higher subdivisions value (maybe 256?) to avoid running out of memory?

(I assume the image size used during training will not change the speed of detection?)

If I crop the images to different sizes, the scale will not be the same once they are all resized to 320x320. Will that be a problem because the images are transformed?

Since in my project all detection input will have the same width/height ratio (16:9), should I keep all the training images at the same width/height ratio when I crop them?

Another question is about negative samples. They are easy to make (I just need to take random pictures without the objects I want to detect), so I could easily make 100000 of them. But how many should I use per 1000 positive samples? I assume it is not simply "the more the better"?

An answer would be appreciated!!!


All 6 comments

I also wanted to know whether keeping negative samples (images with no objects) is helpful during training and, if so, what the ideal amount is.

Will be following this post.

All my images are 1920*1080, and I am labeling some small objects in them. I think that when training, all images will be resized to 320x320, which means YOLO is actually learning from just some little black dots.

If I crop the images so they are no larger than 960*540, will I get a better detection result?

  • Yes, it will give a better result. But in that case the detection images should also have a resolution of about ~960x540. Or you can increase the network resolution from 320x320 to 640x640 or more (see the cfg sketch below).

Or should I just train them at a larger size, maybe 1056*1056, and use a higher subdivisions value (maybe 256?) to avoid running out of memory?

  • Yes. You can use any subdivisions value, as long as it is not larger than batch (a rough cfg sketch follows below).
  • Correct, the image size used during training will not change the speed of detection.
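
To make the two points above concrete, this is roughly what the relevant `[net]` fields of a darknet .cfg could look like when raising the network resolution and compensating with a larger subdivisions value. The concrete numbers here are only an illustration, not values from this thread; pick whatever fits your GPU memory.

```
[net]
# network resolution; width and height must be multiples of 32
width=640
height=640
# keep batch as-is and raise subdivisions until the mini-batch fits in GPU memory;
# subdivisions must not exceed batch
batch=64
subdivisions=32
```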

If I crop the images to different sizes, the scale will not be the same once they are all resized to 320x320. Will that be a problem because the images are transformed?

  • It will be a problem, because the objects will have different scales (object size relative to image size).

Since in my project all detection input will have the same width/height ratio (16:9), should I keep all the training images at the same width/height ratio when I crop them?

  • It is very desirable to keep the aspect ratio (see the cropping sketch below).
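
One way to satisfy both answers above (consistent object scale and a constant 16:9 aspect ratio) is to cut every 1920x1080 frame into four fixed 960x540 tiles and recompute the YOLO labels per tile. The Python sketch below is only an illustration of that idea using OpenCV; the file layout and the rule of dropping boxes whose centre falls outside a tile are my own simplifications, not anything prescribed by darknet.

```python
import cv2

TILE_W, TILE_H = 960, 540  # fixed crop size: keeps object scale and the 16:9 ratio

def crop_to_tiles(img_path, label_path, out_prefix):
    img = cv2.imread(img_path)              # 1920x1080 source frame
    h, w = img.shape[:2]
    # YOLO labels: "class cx cy bw bh", all normalized to the full image
    boxes = [list(map(float, line.split()))
             for line in open(label_path) if line.strip()]

    for i, (x0, y0) in enumerate([(0, 0), (TILE_W, 0), (0, TILE_H), (TILE_W, TILE_H)]):
        tile = img[y0:y0 + TILE_H, x0:x0 + TILE_W]
        lines = []
        for cls, cx, cy, bw, bh in boxes:
            ax, ay = cx * w, cy * h          # absolute box centre in the full frame
            if not (x0 <= ax < x0 + TILE_W and y0 <= ay < y0 + TILE_H):
                continue                     # simplification: drop boxes centred outside this tile
            lines.append("%d %.6f %.6f %.6f %.6f" % (
                int(cls),
                (ax - x0) / TILE_W, (ay - y0) / TILE_H,   # re-normalize to the tile
                bw * w / TILE_W, bh * h / TILE_H))
        cv2.imwrite("%s_%d.jpg" % (out_prefix, i), tile)
        with open("%s_%d.txt" % (out_prefix, i), "w") as f:
            f.write("\n".join(lines))
```

A real script would also need to clip boxes that straddle tile borders; the point here is only that the re-normalized labels keep the same object scale the network will see at detection time.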

Another question is about negative samples. They are easy to make (I just need to take random pictures without the objects I want to detect), so I could easily make 100000 of them. But how many should I use per 1000 positive samples? I assume it is not simply "the more the better"?

I usually use an equal number of positive and negative samples. For example, I have 12000 labeled images for 6 classes and also 12000 non-labeled images as negative samples.
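
In darknet a negative sample is simply an image listed in train.txt whose ground-truth .txt file is empty. The Python sketch below shows one way to build a balanced training list in that spirit; the folder names `positives/` and `negatives/` are hypothetical, and the 1:1 cap just follows the ratio suggested above.

```python
import glob, os, random

pos = sorted(glob.glob("positives/*.jpg"))   # labeled images (each has a .txt file)
neg = sorted(glob.glob("negatives/*.jpg"))   # background-only images
random.shuffle(neg)
neg = neg[:len(pos)]                         # keep roughly a 1:1 positive/negative ratio

# a negative sample is just an image with an empty ground-truth .txt file
for img in neg:
    open(os.path.splitext(img)[0] + ".txt", "w").close()

with open("train.txt", "w") as f:
    f.write("\n".join(pos + neg))
```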

Thanks for the detailed reply! Now people will know how to train better :D

@AlexeyAB
For negative samples, we just keep the ground-truth txt file empty, is that right?

@goodtogood yes

@AlexeyAB
Got it!
Thanks!

