Darknet: How to calculate the pixel size of the smallest target that yolov2 can detect?

Created on 4 Dec 2017 · 20 Comments · Source: AlexeyAB/darknet

If the size of input is 416x416, how to calculate the pixel size of the smallest target that yolov2 can detect?

All 20 comments

  • After the image is resized to 416x416, the object should still be larger than 1 pixel. So if you have an 832x832 image and network width=416 height=416 (in the cfg-file), the image is downscaled by a factor of 2, and the object on the original image should be larger than 2x2.

  • Detection will be good if your object is larger than 32x32 after the image is resized to 416x416.

  • You should also train your own model on a dataset with small objects, because the pre-trained weights yolo-voc.weights / yolo.weights aren't trained for small objects.

  • You can try to use ResNet-Yolo to detect small objects cut model: https://github.com/AlexeyAB/darknet/issues/179#issuecomment-330047738

Thank you very much! I will try it. In #179 (comment), You said that densenet201_yolo.cfg is ~2x slower than yolo-voc.cfg. How about ResNet-Yolo?

Say I have a 1920x1080 frame and the cfg is 416x416. What is the smallest object size in pixels?

@git-sohib you can just scale your image in inkscape, paint etc. to 416x416, and see if your object is still greater than one pixel. As Alexey mentioned if it is more than 32x32 (after resize) it will make good detections.

@git-sohib

  • (1920 / 416) x (1080 / 416) ≈ 4x2 - an object of this size can be detected as a 1x1 pixel object
  • (32 x 1920 / 416) x (32 x 1080 / 416) ≈ 147x83 - an object of this size can be detected as a 32x32 pixel object

Any image is automatically resized to the neural network size 416x416, so smaller objects can't be detected because they will be smaller than 1x1 pixel after resizing.
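The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the thread's rule of thumb, not Darknet code. The function name `source_size_for` is made up for this example. It assumes the image is stretched to the network size independently along each axis, and it truncates fractions the same way the numbers in this thread do.

```python
def source_size_for(net_px, img_w, img_h, net_w=416, net_h=416):
    """Return the (width, height) in the original image that shrinks to
    roughly net_px x net_px after resizing to net_w x net_h.

    Hypothetical helper: assumes a plain stretch to the network size
    and truncates fractional pixels.
    """
    scale_x = img_w / net_w  # horizontal downscale factor
    scale_y = img_h / net_h  # vertical downscale factor
    return int(net_px * scale_x), int(net_px * scale_y)


# 1920x1080 frame, 416x416 network
print(source_size_for(1, 1920, 1080))    # (4, 2)
print(source_size_for(32, 1920, 1080))   # (147, 83)

# 1920x1080 frame, 640x448 network
print(source_size_for(1, 1920, 1080, net_w=640, net_h=448))   # (3, 2)
print(source_size_for(32, 1920, 1080, net_w=640, net_h=448))  # (96, 77)
```

The first value of each pair is the size that collapses to about one pixel (the hard lower limit), the second is the size that still covers about 32x32 pixels after resizing, which the thread treats as the threshold for good detections.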

@AlexeyAB , @TheMikeyR Thank you.
Is it true for yolov2 as well?

Anyway, How can I learn about all these things, papers, posts, repos?

Is it true for yolov2 as well?

Yes.

Anyway, How can I learn about all these things, papers, posts, repos?

Maybe I will write something.

@AlexeyAB thank you, I am looking forward to it

Hi Mr. @AlexeyAB, any updates on topic (papers, posts, repos)?

If my cfg's size is
height=448 width=640
and image size is 1920x1080, then 1920/640 x 1080/448 = 3x2 can be detected as 1x1 pixel object, right?
Also, 32 x 1920 / 640 x 32 x 1080 / 448 = 96x77 can be detected as 32x32 pixel object, right?

@git-sohib

any updates on topic (papers, posts, repos)?

Not yet.

and image size is 1920x1080, then [1920/640] x [1080/448] = 3x2 can be detected as 1x1 pixel object, right?
Also, [32 x 1920 / 640] x [32 x 1080 / 448] = 96x77 can be detected as 32x32 pixel object, right?

Yes.

Thanks @AlexeyAB.
For example, my object size is 2-3 times smaller than 96x77. What is a promising way to detect those objects? Should I use tiny yolo, or should I increase the input size in the cfg from 640x448 to 1280x896?

@git-sohib

You can do 2 steps: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

  1. increase network resolution in your .cfg-file (height=832, width=832 or any value multiple of 32) - it will increase precision

  2. for training for small objects - set layers = -1, 11 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L720 and set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L717

And then train your model.
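Step 1 requires both network dimensions to be multiples of 32. A minimal sketch for checking a candidate resolution, or snapping an arbitrary value to the nearest valid one, could look like this. The helper names are hypothetical and are not part of Darknet.

```python
def is_valid_network_size(width, height):
    """True if both dimensions are multiples of 32, as the advice above requires."""
    return width % 32 == 0 and height % 32 == 0


def nearest_multiple_of_32(value):
    """Snap a desired dimension to the nearest multiple of 32 (at least 32)."""
    return max(32, round(value / 32) * 32)


print(is_valid_network_size(832, 832))    # True
print(is_valid_network_size(1280, 896))   # True
print(is_valid_network_size(1920, 1080))  # False, 1080 is not a multiple of 32
print(nearest_multiple_of_32(1080))       # 1088
```

Larger values cost more GPU memory and time per image, so the resolution is a trade-off rather than something to maximize.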

Thank you @AlexeyAB, I chose the 1st option and increased the input size from 640x448 to 1280x896.
It then showed a CUDA memory error, so I changed

batch=64 
subdivisions=8 

to

batch=64 
subdivisions=32 

Now it's training. Am I on the right path?
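Why raising subdivisions helps with the memory error: Darknet splits each batch into `subdivisions` chunks and sends one chunk through the GPU at a time, so fewer images are held in memory at once while the weights are still updated once per full batch. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def images_per_forward_pass(batch, subdivisions):
    """Number of images processed on the GPU at once: batch / subdivisions."""
    if batch % subdivisions != 0:
        raise ValueError("batch should be divisible by subdivisions")
    return batch // subdivisions


print(images_per_forward_pass(64, 8))   # 8 images at a time (ran out of memory at 1280x896)
print(images_per_forward_pass(64, 32))  # 2 images at a time
```

Going from 640x448 to 1280x896 quadruples the number of input pixels, and going from subdivisions=8 to subdivisions=32 cuts the images per pass by the same factor of four.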

@git-sohib

Yes.

Also you can try to recalculate anchors for new resolution 1280x896, and train with new anchors.

Mr. @AlexeyAB, it gave terrible results. After 1000 iterations, all the precision and recall values have become either 0 or nan. I validated the weights from those 1000 iterations, and the results decreased drastically.

@git-sohib

  • Did you train from the beginning using width=1280 height=896 and new anchors in the cfg-file?

  • Did you set layers = -1, 11 and stride=4?

  • If yes, then try to train about 10 000 iterations

@AlexeyAB, my cfg file is a bit different from yolov3. It doesn't have anchors or layers. I just doubled the network size and trained for 10000 iterations starting from the previously trained weights.

Mr. @AlexeyAB, I am sharing my cfg file here. Would you look at it and suggest some edits?

[net]     
batch=64 
subdivisions=8 
height=896  
width=1280 
channels=3
momentum=0.99
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0000000001
max_batches = 36000 
policy=steps
steps=18000,25000,33000
scales=10,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky


#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky


[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=128
activation=leaky


[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=3
activation=linear

[patch_region]
classes=3
softmax=1
rescore=1
class_scales= 33.78, 14.80, 1.00 

Maybe I should get rid of some conv layers?

@git-sohib
learning_rate=0.0000000001

It looks like the learning rate is too small. I don't know whether any model can be trained with this learning rate.
Set learning_rate=0.001
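To see how small that rate stays throughout training, here is a rough sketch of the `policy=steps` schedule using the values from the posted cfg. It assumes the policy multiplies the rate by each entry of `scales` once the matching entry of `steps` is reached, and it ignores any burn-in. The function name is made up for this example.

```python
def steps_policy_rate(iteration, base_rate, steps, scales):
    """Learning rate at a given iteration under a step schedule.

    Assumption: the rate is multiplied by scales[i] once iteration >= steps[i].
    """
    rate = base_rate
    for step, scale in zip(steps, scales):
        if step > iteration:
            break
        rate *= scale
    return rate


# Values from the posted cfg
base_rate = 0.0000000001
steps = [18000, 25000, 33000]
scales = [10, 0.1, 0.1]

for it in (0, 18000, 25000, 33000):
    print(it, steps_policy_rate(it, base_rate, steps, scales))
```

Under this assumption the rate peaks at about 1e-9 between iterations 18000 and 25000, still about a million times smaller than the suggested 0.001, so the weights barely move.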

@AlexeyAB If you look at my height and width sizes in cfg they are now

height=896  
width=1280

They were

height=448
width=640

When I test this in our VMS (video management system) with the increased size, the system throws "CUDA error: __global__ function call is not configured". However, when I run the demo with the same increased size, there is no error; it slows down a bit and detects a bit better. I know that a larger network size uses more memory, but enough memory is allocated. What do you think the reason is? Maybe I should change something in the cfg?
