I want to detect vehicle logos in road scenes, and a logo is small relative to the vehicle...
Is that possible?
How small an object can I detect with YOLO?
Hello @richardminh ,
I am interested in the same question. I hope someone can help us.
What is the minimum number of pixels an object must occupy for Yolo to be able to detect it?
On the other hand, I have been able to detect some logos on cars with Yolov2, even when training on a more generic dataset (not only logos on cars but also logos in advertising, etc.). But it's true that I couldn't detect the logos when the car was far from the camera. I don't know whether training with images more specific to this application would succeed... Hope it helps.
Ana
@richardminh @anavc94
Resize your images to the network size (width=416 height=416 in your cfg-file). If you can still recognize the objects by eye, then Yolo can detect them too.
@AlexeyAB I have been reading in another issue (https://github.com/AlexeyAB/darknet/issues/1475) that there is a criterion for recognition:
obj_size = 32*(img_size/416) (when using width=416 and height=416)
If I have a resolution of, for example, 3840x2160, my horizontal obj_size would be about (32*3840)/416 = 295. Doing the same for the vertical resolution gives (32*2160)/416 = 166, so obj_size = 295x166. Does it mean that to be recognized, the object must occupy a minimum area of 295x166 pixels in the image or frame? And if it does, how did you get that criterion?
Thanks for the reply,
Ana
@anavc94
And if it does, how did you get that criterion?
Yes, if you want to know the best minimal size for recognition, use the criterion obj_size = 32*(img_size/416), i.e. your object should be 32x32 pixels or larger after the image is resized to the network size 416x416.
Why 32x32? It is the subsampling multiplier of the first [yolo] layer, which has the most generalizing ability: 32 = pow(2,5), i.e. there are 5 subsampling layers with stride=2 between the input and the first [yolo] layer:
the first [yolo] layer: https://github.com/AlexeyAB/darknet/blob/6b4dca27d3dd3c4c7b41076596a32e7171638412/cfg/yolov3.cfg#L607
Does it mean that to be recognized, the object must occupy a minimum area of 295x166 pixels in the image or frame?
An object can be recognized if you can recognize it by eye after the image is resized to the network size 416x416.
An object can be recognized with ~77.2% accuracy over 1000 classes (the backbone network is Darknet53, which has 77.2% top-1 accuracy on 1000-class ImageNet, https://pjreddie.com/darknet/imagenet/ ) if the object is 32x32 or larger after the image is resized to the network size 416x416.
These are approximate theoretical criteria.
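For anyone who wants to plug in their own resolution, here is a minimal sketch of the rule of thumb above. The function names are my own and not part of darknet, and it assumes the image is stretched to the network size without letterboxing, so width and height scale independently.

```python
def min_object_size(img_w, img_h, net_w=416, net_h=416, stride=32):
    """Approximate minimum object size, in source-image pixels, that still
    covers stride x stride pixels after the image is resized to net_w x net_h.

    Assumption: plain resize (no letterbox padding)."""
    return stride * img_w / net_w, stride * img_h / net_h


def size_after_resize(obj_w, obj_h, img_w, img_h, net_w=416, net_h=416):
    """Size an object of obj_w x obj_h source pixels has after the resize."""
    return obj_w * net_w / img_w, obj_h * net_h / img_h


def meets_criterion(obj_w, obj_h, img_w, img_h, net_w=416, net_h=416, stride=32):
    """True if the resized object is at least stride x stride pixels."""
    w, h = size_after_resize(obj_w, obj_h, img_w, img_h, net_w, net_h)
    return w >= stride and h >= stride


if __name__ == "__main__":
    w, h = min_object_size(3840, 2160)
    print(round(w), round(h))  # 295 166, matching the numbers in the question
```

As said above, this is only an approximate theoretical threshold, not a hard limit.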
Interesting! Thanks for the reply @AlexeyAB , I really appreciate it.
Ana
@AlexeyAB If one splits a large image into 4, 9, or 16 equal pieces and uses a 416 network size (width=416 height=416 in the cfg-file), will darknet pick up more details?
@c2h2
Yes.
But it is better to split the image into overlapping pieces, so that if an object lies on the edge of one piece, it is fully visible in another piece and can be detected better.
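The overlapping split could look roughly like the sketch below. This is only an illustration: `tile_starts`, `make_tiles`, and the 64-pixel overlap are my own choices and are not taken from darknet. A reasonable overlap is at least the size of the largest object you expect, so that every object fits entirely inside some tile.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length) and
    share at least `overlap` pixels with their neighbour."""
    if tile >= length:
        return [0]
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile is aligned with the image border
    return starts


def make_tiles(img_w, img_h, tile=416, overlap=64):
    """Return tile rectangles as (x0, y0, x1, y1) in source-image pixels."""
    return [
        (x0, y0, min(x0 + tile, img_w), min(y0 + tile, img_h))
        for y0 in tile_starts(img_h, tile, overlap)
        for x0 in tile_starts(img_w, tile, overlap)
    ]


if __name__ == "__main__":
    tiles = make_tiles(3840, 2160)
    print(len(tiles), tiles[0], tiles[-1])
```

Each rectangle can then be cropped from the frame and passed to the detector on its own. Keep each tile's (x0, y0) offset so that the detected boxes can be shifted back into full-image coordinates.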
Thanks! I might need to use some algorithm to de-duplicate the overlapping boxes if needed.
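One common option for the de-duplication is greedy non-maximum suppression (NMS) on the boxes after shifting them into full-image coordinates. The sketch below assumes each detection is a hypothetical `(box, score, class_id)` tuple with `box = (x0, y0, x1, y1)`. That layout and the 0.5 IoU threshold are my assumptions, not darknet's output format.

```python
def to_global(box, tile_x0, tile_y0):
    """Shift a tile-local box (x0, y0, x1, y1) into full-image coordinates."""
    x0, y0, x1, y1 = box
    return (x0 + tile_x0, y0 + tile_y0, x1 + tile_x0, y1 + tile_y0)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thresh=0.5):
    """Greedy per-class NMS: keep the highest-scoring box and drop any
    same-class box that overlaps an already kept box by iou_thresh or more."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, cls = det
        if all(cls != k[2] or iou(box, k[0]) < iou_thresh for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    # The same logo seen in two overlapping tiles, plus one unrelated object.
    dets = [
        (to_global((360, 100, 400, 140), 0, 0), 0.90, 0),
        (to_global((8, 100, 48, 140), 352, 0), 0.80, 0),
        (to_global((10, 10, 60, 60), 704, 352), 0.70, 1),
    ]
    print(nms(dets))
```

One caveat: an object cut by a tile border gives a partial box that may overlap the full box from the neighbouring tile by less than the threshold, so it can survive NMS. Discarding boxes that touch an interior tile border is one possible workaround.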