Darknet: How to calculate the pixel size of the smallest target that yolov2 can detect?

Created on 4 Dec 2017 · 20 Comments · Source: AlexeyAB/darknet

If the size of input is 416x416, how to calculate the pixel size of the smallest target that yolov2 can detect?


CBIR-LL

Most helpful comment

@git-sohib you can just scale your image to 416x416 in Inkscape, Paint, etc., and check whether your object is still larger than one pixel. As Alexey mentioned, if it is larger than 32x32 after the resize, detection should work well.

TheMikeyR on 17 May 2018


All 20 comments

After the image is resized to 416x416, the size of the object should be more than 1 pixel. So if you have an 832x832 image and network width=416 height=416 (in the cfg-file), then the object size on the image should be more than 2x2 (832 / 416 = 2 per side).
Detection works well if the object is larger than 32x32 after the image is resized to 416x416.
Also, you should train your own model on a dataset with small objects, because the pre-trained weights (yolo-voc.weights / yolo.weights) were not trained for small objects.
You can also try ResNet-Yolo to detect small objects (cut model): https://github.com/AlexeyAB/darknet/issues/179#issuecomment-330047738
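
The resize rule above can be sketched as a small helper. This is only an illustration of the arithmetic, assuming the image is stretched independently along each axis to the network size; the function name `min_source_size` and the 1248x832 image are hypothetical, not part of darknet.

```python
def min_source_size(image_w, image_h, net_w, net_h, target_px=1):
    """Object size in source-image pixels that maps to
    target_px x target_px pixels after resizing to net_w x net_h."""
    scale_x = image_w / net_w  # source pixels per network pixel (horizontal)
    scale_y = image_h / net_h  # source pixels per network pixel (vertical)
    return (target_px * scale_x, target_px * scale_y)


# Hypothetical 1248x832 image fed to a 416x416 network.
print(min_source_size(1248, 832, 416, 416))      # (3.0, 2.0)
print(min_source_size(1248, 832, 416, 416, 32))  # (96.0, 64.0)
```

Objects should be larger than the first result to be seen at all, and larger than the second to reach the 32x32 guideline.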

AlexeyAB on 4 Dec 2017

Thank you very much! I will try it. In #179 (comment), you said that densenet201_yolo.cfg is ~2x slower than yolo-voc.cfg. How about ResNet-Yolo?

CBIR-LL on 4 Dec 2017

Say I have a 1920x1080 frame and the cfg is 416x416. What is the smallest object size in pixels?

bit-scientist on 17 May 2018


@git-sohib

[1920 / 416] x [1080 / 416] = 4x2 (rounded down) - an object of this size becomes a 1x1 pixel object after the resize
[32 x 1920 / 416] x [32 x 1080 / 416] = 147x83 (rounded down) - an object of this size becomes a 32x32 pixel object after the resize

Every image is automatically resized to the neural network size 416x416, so smaller objects can't be detected because they end up smaller than 1x1 pixel.
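
Going the other way, one can check how large a given object ends up after the resize. The sketch below assumes a plain per-axis stretch to the network size, as in the arithmetic above; the names `size_after_resize` and `classify` and the three labels are made up for illustration, and the 1-pixel and 32-pixel thresholds are the rules of thumb from this thread rather than hard limits.

```python
def size_after_resize(obj_w, obj_h, image_w, image_h, net_w=416, net_h=416):
    """Object size in network-input pixels after the whole image
    is stretched to net_w x net_h."""
    return (obj_w * net_w / image_w, obj_h * net_h / image_h)


def classify(obj_w, obj_h, image_w, image_h, net_w=416, net_h=416):
    """Apply the thread's rules of thumb to the resized object."""
    w, h = size_after_resize(obj_w, obj_h, image_w, image_h, net_w, net_h)
    smaller_side = min(w, h)
    if smaller_side <= 1:
        return "too small"
    if smaller_side <= 32:
        return "detectable but small"
    return "good"


print(classify(3, 2, 1920, 1080))     # too small
print(classify(10, 10, 1920, 1080))   # detectable but small
print(classify(160, 90, 1920, 1080))  # good
```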

AlexeyAB on 17 May 2018


@AlexeyAB , @TheMikeyR Thank you.
Is it true for yolov2 as well?

Anyway, how can I learn about all these things (papers, posts, repos)?

bit-scientist on 18 May 2018

Is it true for yolov2 as well?

Yes.

Anyway, how can I learn about all these things (papers, posts, repos)?

Maybe I will write something.

AlexeyAB on 18 May 2018

@AlexeyAB thank you, I am looking forward to it

bit-scientist on 19 May 2018

Hi Mr. @AlexeyAB, any updates on topic (papers, posts, repos)?

If my cfg's size is
height=448 width=640
and the image size is 1920x1080, then [1920 / 640] x [1080 / 448] = 3x2 can be detected as a 1x1 pixel object, right?
Also, [32 x 1920 / 640] x [32 x 1080 / 448] = 96x77 can be detected as a 32x32 pixel object, right?

bit-scientist on 6 Jul 2018

@git-sohib

any updates on topic (papers, posts, repos)?

Not yet.

and the image size is 1920x1080, then [1920 / 640] x [1080 / 448] = 3x2 can be detected as a 1x1 pixel object, right?
Also, [32 x 1920 / 640] x [32 x 1080 / 448] = 96x77 can be detected as a 32x32 pixel object, right?

Yes.

AlexeyAB on 6 Jul 2018

Thanks @AlexeyAB.
For example, my object size is 2-3 times smaller than 96x77. What is the most promising way to detect those objects? Should I use tiny yolo, or should I increase the input size in the cfg from 640x448 to 1280x896?

bit-scientist on 10 Jul 2018

@git-sohib

You can take 2 steps (see https://github.com/AlexeyAB/darknet#how-to-improve-object-detection):

increase the network resolution in your .cfg-file (height=832, width=832, or any value that is a multiple of 32) - this will increase precision
when training for small objects, set layers = -1, 11 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L720 and set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L717

And then train your model.
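
The "multiple of 32" requirement follows from the network downsampling its input by a factor of 32 overall. A minimal sketch for picking a valid resolution is shown below; `snap_to_multiple` and `grid_size` are hypothetical helper names, and the factor of 32 is assumed to apply to the cfg in use.

```python
def snap_to_multiple(value, base=32):
    """Round value to the nearest multiple of base (halves round up),
    never returning less than base."""
    return max(base, (value + base // 2) // base * base)


def grid_size(net_w, net_h, stride=32):
    """Number of cells in the final feature map for a given input size."""
    return (net_w // stride, net_h // stride)


print(snap_to_multiple(1000))  # 992
print(snap_to_multiple(900))   # 896
print(grid_size(416, 416))     # (13, 13)
print(grid_size(1280, 896))    # (40, 28)
```

A larger input gives a finer grid, which is why raising the resolution helps with small objects at the cost of memory and speed.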

AlexeyAB on 10 Jul 2018

Thank you @AlexeyAB, I chose the 1st option and increased the input size from 640x448 to 1280x896.
It then showed a CUDA memory error, so I changed

batch=64 
subdivisions=8

to

batch=64 
subdivisions=32

Now it's training. Am I on the right path?
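
Why raising subdivisions helps: darknet processes batch / subdivisions images per forward pass, so a higher subdivisions value means fewer images on the GPU at once. The sketch below illustrates this with a deliberately crude memory estimate, assuming activation memory scales with input pixels times images per pass (weights and workspace buffers are ignored); both function names are hypothetical.

```python
def images_per_pass(batch, subdivisions):
    """Images sent through the GPU at once (integer division)."""
    return batch // subdivisions


def relative_memory(width, height, batch, subdivisions,
                    ref=(640, 448, 64, 8)):
    """Rough activation memory relative to a reference setup,
    assuming it is proportional to pixels * images per pass."""
    ref_w, ref_h, ref_batch, ref_subdiv = ref
    current = width * height * images_per_pass(batch, subdivisions)
    reference = ref_w * ref_h * images_per_pass(ref_batch, ref_subdiv)
    return current / reference


print(images_per_pass(64, 8))              # 8
print(images_per_pass(64, 32))             # 2
print(relative_memory(1280, 896, 64, 8))   # 4.0
print(relative_memory(1280, 896, 64, 32))  # 1.0
```

Under this assumption, doubling both sides of the input quadruples the pixel count, and going from subdivisions=8 to subdivisions=32 offsets exactly that factor.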

bit-scientist on 10 Jul 2018

@git-sohib

Yes.

Also, you can try recalculating the anchors for the new resolution 1280x896 and training with the new anchors.
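
Anchor recalculation is commonly done by clustering the ground-truth box sizes with k-means, using 1 - IoU as the distance. The following is a minimal pure-Python sketch of that idea, not darknet's own anchor calculation; the function names, the deterministic initialisation, and the sample boxes are assumptions made for illustration. Box sizes are taken to be already scaled to the network resolution, and converting the result to the units a particular cfg expects is left out.

```python
def iou_wh(a, b):
    """IoU of two boxes given as (w, h), aligned at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (w, h) pairs into k anchors, assigning each box to the
    centre with which it has the highest IoU."""
    # Deterministic start: k boxes spread evenly over the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)]
               for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            clusters[best].append(box)
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(center)  # keep an empty cluster's centre
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Made-up box sizes in network pixels: three small and three large objects.
boxes = [(10, 12), (12, 10), (11, 11), (100, 80), (90, 85), (110, 90)]
print(kmeans_anchors(boxes, k=2))  # [(11.0, 11.0), (100.0, 85.0)]
```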

AlexeyAB on 10 Jul 2018

Mr. @AlexeyAB, it gave terrible results. After 1000 iterations, all the precision and recall values became either 0 or NaN. I validated after those 1000 iterations and the results had dropped drastically.

bit-scientist on 11 Jul 2018

@git-sohib

Did you train from the beginning using width=1280 height=896 and new anchors in the cfg-file?
Did you set layers = -1, 11 and stride=4?
If yes, then try to train for about 10 000 iterations.

AlexeyAB on 11 Jul 2018

@AlexeyAB, my cfg file is a bit different from yolov3. It doesn’t have anchors or layers. I just doubled the network size and trained 10000 iterations starting from previously trained weights.

bit-scientist on 11 Jul 2018

Mr. @AlexeyAB, here is my cfg file. Could you look at it and suggest some edits?

[net]     
batch=64 
subdivisions=8 
height=896  
width=1280 
channels=3
momentum=0.99
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0000000001
max_batches = 36000 
policy=steps
steps=18000,25000,33000
scales=10,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky


#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky


[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=128
activation=leaky


[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=3
activation=linear

[patch_region]
classes=3
softmax=1
rescore=1
class_scales= 33.78, 14.80, 1.00

Maybe I should get rid of some conv layers?

bit-scientist on 11 Jul 2018

@git-sohib
learning_rate=0.0000000001

It looks like the learning rate is too small. I don't know whether any model can be trained with this learning rate.
Set learning_rate=0.001

AlexeyAB on 11 Jul 2018

@AlexeyAB If you look at the height and width in my cfg, they are now

height=896  
width=1280

They were

height=448
width=640

When I test the increased sizes in a VMS (video management system), the system throws "CUDA error: __global__ function call is not configured". However, when I run the demo with the same increased size, there is no error; it slows down a bit and detects a bit better. I know that a larger network size uses more memory, but enough memory is allocated. What do you think the reason is? Maybe I should change something in the cfg?

bit-scientist on 13 Jul 2018
