Hello,
I have read a lot of comments about how YOLO detects objects, but I still have some questions about the detection process:
YOLOv3 divides the input image into a 13x13 grid, and in each cell YOLO checks the anchors, right?
So when you have 9 anchors in your .cfg file, you can detect up to 13x13x9 objects?
Thanks for your help!
Knust
How many objects are detectable in Yolo?
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
The global maximum number of objects that can be detected by YOLOv3 is
0.0615234375*(width*height)
where width and height are the network input size in pixels.
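One way to see where the constant comes from, as a sketch rather than the author's own derivation: assume the standard YOLOv3 layout of 3 detection scales at strides 8, 16 and 32, with 3 anchors per grid cell. These values are an assumption here, not something stated in the linked README section. The symbols W, H and N (input width, input height, number of predicted boxes) are introduced only for this illustration.

```latex
\begin{aligned}
N &= 3\left(\frac{W}{8}\cdot\frac{H}{8} + \frac{W}{16}\cdot\frac{H}{16} + \frac{W}{32}\cdot\frac{H}{32}\right) \\
  &= 3\,W H\left(\frac{1}{64} + \frac{1}{256} + \frac{1}{1024}\right) \\
  &= 3\,W H\cdot\frac{21}{1024} = \frac{63}{1024}\,W H = 0.0615234375\,W H
\end{aligned}
```

The result is exact when W and H are multiples of 32, so that every grid has whole cells.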
Ok, thank you very much!
But it seems I haven't really understood the whole process.
Can you give me a hint where to find a good explanation of the YOLOv3 network? I only find questions about specific topics.
I assume you're asking about YOLOv3. In that case, the diagram from the source below might be helpful.

source: https://www.cyberailab.com/home/a-closer-look-at-yolov3
In short, predictions are made at 3 scales: 1/8, 1/16 and 1/32 of the input dimensions. For an input size of 416x416, that gives grids of 52x52, 26x26 and 13x13. For every grid cell at each scale, 3 boxes are predicted, one per anchor assigned to that scale. That is why 9 anchors are given: 3 for each of the 3 scales.
So the total number of predictions is (52x52 + 26x26 + 13x13)*3 = 10647. AlexeyAB's formula gives the same number: 0.0615234375*416*416 = 10647.
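The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch, not code from darknet: the names `STRIDES`, `ANCHORS_PER_SCALE`, `grid_sizes` and `total_predictions` are made up for illustration, and it assumes the input width and height are multiples of 32.

```python
# Hypothetical helper names; the values follow the explanation above.
STRIDES = (8, 16, 32)      # downsampling factor of each detection scale
ANCHORS_PER_SCALE = 3      # 9 anchors in total, 3 used at each scale


def grid_sizes(width, height):
    """Return the (columns, rows) of the prediction grid at each scale."""
    if width % 32 or height % 32:
        raise ValueError("width and height are assumed to be multiples of 32")
    return [(width // s, height // s) for s in STRIDES]


def total_predictions(width, height):
    """Total number of boxes predicted before thresholding and NMS."""
    cells = sum(cols * rows for cols, rows in grid_sizes(width, height))
    return cells * ANCHORS_PER_SCALE


if __name__ == "__main__":
    print(grid_sizes(416, 416))           # grids for a 416x416 input
    print(total_predictions(416, 416))    # box count for a 416x416 input
    print(0.0615234375 * 416 * 416)       # same count via the formula
```

Note that this is the number of candidate boxes the network outputs. The number of objects actually reported is lower, because low-confidence boxes are discarded and overlapping ones are merged by non-maximum suppression.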
Thank you very much. That helped a lot.