Hi AlexeyAB,
Thank you very much for sharing this project, and especially for your patience in answering all the questions. 👍
Hope you can help me with these questions:
I understand that the number of grids will change along with the size of input image.
I just wonder if you can point out where to change it in the code, and provide any possible suggestion regarding how to decide the number of grids.
(I think it's related to the number of possible overlapping and complexity of regions to be recognized.)
Bounding boxes shake a lot when there's a recognition - it's probably because of the process of box-regression. Is there any way to stabilize them a little bit more? (say...use anchors at approximately the same size to targets?)
Thank you!
Hi @jackwei0117
You can try decreasing this param: nms=0.1. It means that if two bounding boxes overlap by more than 10%, only the one with the highest probability is kept: https://github.com/AlexeyAB/darknet/blob/e96a454ca11f140a7f7fb82daefe4cc9555a0f26/src/demo.c#L74
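To make the nms threshold concrete, here is a minimal sketch of the overlap test, assuming "overlap" is measured as intersection over union (IoU) of two boxes given by center and size. The names box_t, overlap_1d, iou and suppressed are made up for this illustration and are not the darknet functions.

```c
/* Box given by center (x, y) and size (w, h), all in the same units. */
typedef struct { float x, y, w, h; } box_t;

/* Length of the overlap of two 1-D segments given by center and length. */
float overlap_1d(float c1, float len1, float c2, float len2)
{
    float l1 = c1 - len1 / 2, l2 = c2 - len2 / 2;
    float r1 = c1 + len1 / 2, r2 = c2 + len2 / 2;
    float left  = (l1 > l2) ? l1 : l2;
    float right = (r1 < r2) ? r1 : r2;
    return right - left; /* negative or zero means no overlap */
}

/* Intersection over union of two boxes, in the range [0, 1]. */
float iou(box_t a, box_t b)
{
    float w = overlap_1d(a.x, a.w, b.x, b.w);
    float h = overlap_1d(a.y, a.h, b.y, b.h);
    if (w <= 0 || h <= 0) return 0.0f;
    float inter = w * h;
    return inter / (a.w * a.h + b.w * b.h - inter);
}

/* Returns 1 if the lower-scoring box would be dropped at this threshold. */
int suppressed(box_t keep, box_t other, float nms_thresh)
{
    return iou(keep, other) > nms_thresh;
}
```

With two 0.2 x 0.2 boxes whose centers are 0.1 apart, the IoU is about 0.33, so the weaker box is dropped at nms=0.1 but would survive at nms=0.4.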
Output grid size depends on the number of maxpool layers with stride=2:
Output grid size = (input network size) / pow(2, (number of maxpool layers with stride=2))
In both yolo-voc.2.0.cfg and tiny-yolo-voc.cfg the input size is 416x416 and there are 5 maxpool layers with stride=2. So output size = 416x416 / pow(2, 5) = 416x416 / 32 = 13x13
For each cell of the 13x13 grid there are 5 anchors.
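The grid-size arithmetic above can be checked with a few lines of C. This is only a sketch: output_grid_size is a hypothetical helper, and it assumes stride-2 maxpool layers are the only layers that shrink the feature map.

```c
/* Side length of the output grid: each stride-2 maxpool halves the size. */
int output_grid_size(int input_size, int num_stride2_maxpool)
{
    return input_size / (1 << num_stride2_maxpool);
}
```

For example, output_grid_size(416, 5) gives 13, and a 608x608 input with the same 5 layers would give 19.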
You can add code here that decreases the precision of width and height: https://github.com/AlexeyAB/darknet/blob/e96a454ca11f140a7f7fb82daefe4cc9555a0f26/src/demo.c#L97
int j;
// Snap width and height to steps of 0.05 so that small frame-to-frame changes are ignored
for (j = 0; j < l.w*l.h*l.n; ++j) {
    boxes[j].w = ((float)((int)(20 * boxes[j].w))+0.5) / 20.0F;
    boxes[j].h = ((float)((int)(20 * boxes[j].h))+0.5) / 20.0F;
}
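To see what this rounding does to a single value, here is the same expression pulled out into a standalone function. quantize_size and its steps parameter are names invented for this sketch; the snippet above hard-codes 20 steps.

```c
/* Snap a size to the center of its 1/steps-wide bin, e.g. steps = 20 -> bins of 0.05. */
float quantize_size(float value, int steps)
{
    return ((float)((int)(steps * value)) + 0.5f) / (float)steps;
}
```

With steps = 20, both 0.512 and 0.549 map to 0.525, so jitter inside one bin disappears. A size that sits close to a bin edge (say around 0.55) can still jump between 0.525 and 0.575.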
#define FRAMES 7 - it means that the final feature map (output grid) will be averaged over 7 frames, so it will change smoothly, but this will increase latency: https://github.com/AlexeyAB/darknet/blob/e96a454ca11f140a7f7fb82daefe4cc9555a0f26/src/demo.c#L18
C:\Python27\python.exe gen_anchors.py -filelist data/train.txt -output_dir data/anchors -num_clusters 10
Put the generated anchors here: https://github.com/AlexeyAB/darknet/blob/e96a454ca11f140a7f7fb82daefe4cc9555a0f26/cfg/yolo-voc.2.0.cfg#L228
and put the number of anchors here, num=10: https://github.com/AlexeyAB/darknet/blob/e96a454ca11f140a7f7fb82daefe4cc9555a0f26/cfg/yolo-voc.2.0.cfg#L232
"(say...use anchors at approximately the same size to targets?)"
What do you mean?
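The idea behind averaging over FRAMES can be sketched as a small ring buffer that keeps the last few network outputs and returns their mean. This is an illustration only, not the code from demo.c: smoother_t, smoother_init, smoother_push and the tiny OUTPUTS length are all assumptions made for the example.

```c
#include <string.h>

#define FRAMES 7   /* number of frames to average */
#define OUTPUTS 4  /* hypothetical output length, kept tiny for illustration */

typedef struct {
    float history[FRAMES][OUTPUTS]; /* last FRAMES raw outputs */
    int next;                       /* slot that will be overwritten next */
    int count;                      /* frames stored so far, at most FRAMES */
} smoother_t;

void smoother_init(smoother_t *s)
{
    memset(s, 0, sizeof(*s));
}

/* Store the newest output and write the mean of the stored frames to avg. */
void smoother_push(smoother_t *s, const float *output, float *avg)
{
    memcpy(s->history[s->next], output, sizeof(float) * OUTPUTS);
    s->next = (s->next + 1) % FRAMES;
    if (s->count < FRAMES) s->count++;

    for (int i = 0; i < OUTPUTS; ++i) {
        float sum = 0.0f;
        for (int f = 0; f < s->count; ++f) sum += s->history[f][i];
        avg[i] = sum / (float)s->count;
    }
}
```

A larger FRAMES gives smoother boxes, but a real change in the scene takes up to FRAMES frames to show fully in the average, which is the latency cost mentioned above.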
Thank you very much AlexeyAB, you really saved me !!!
As for Q.3, what I mean is: if my targets are all about [50,50] (i.e. 50 by 50 pixels),
will setting my anchors to [48,48], [49, 49], [50, 50], [51, 51], [52, 52] help reduce the shaking problem
(rather than [5, 5], [25, 25], [75, 75], [100,100], [200, 200])?
Probably yes, but you should re-train after the anchors are changed.
And do not forget that any input image is resized to 416x416 and anchors are given in units of the 13x13 output grid, so if your object size is [50,50] in a source image with resolution 640x480, then the anchor will be 13 x [50,50] / [640, 480] = 1.01, 1.35
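The conversion above can be written as a small helper. This is a sketch under the assumption that anchors are expressed in output-grid cells (13 for a 416x416 network); anchor_from_pixels is a hypothetical name, not a darknet function.

```c
/* Convert an object size in source-image pixels to output-grid units. */
void anchor_from_pixels(float obj_w, float obj_h,
                        float img_w, float img_h,
                        int grid, float *anchor_w, float *anchor_h)
{
    *anchor_w = grid * obj_w / img_w; /* fraction of image width times grid size */
    *anchor_h = grid * obj_h / img_h; /* fraction of image height times grid size */
}
```

Calling it with a 50x50 object, a 640x480 image and grid = 13 gives roughly 1.016 and 1.354, which matches the numbers above. Note that a square object in a non-square image does not give a square anchor.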
What anchors can you get using this command?
C:\Python27\python.exe gen_anchors.py -filelist data/train.txt -output_dir data/anchors -num_clusters 5
gen_anchors.py is here: https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/gen_anchors.py
Python 2.x: https://www.python.org/downloads/release/python-2714/
You will probably also need to run C:\Python27\Scripts\pip install numpy
AlexeyAB,
Where does this "13" come from?
"13 x [50,50] / [640, 480] = 1.01, 1.35"
Thank you for the information, I'll try to run the command later.
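According to the default YOLO v2 cfg, the 416x416 input image is downsampled by a factor of 32, which gives a 13x13 feature map.
416/32=13
416x416 -> Input image size
32 -> Total downsampling factor (5 maxpool layers with stride=2, 2^5 = 32)
13x13 -> Final conv layer feature size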
sivagnanamn,
Understood, thank you very much for your explanation.
- You can add here code, that decreases precision of width and height: https://github.com/AlexeyAB/darknet/blob/e96a454ca11f140a7f7fb82daefe4cc9555a0f26/src/demo.c#L97
int j;
for (j = 0; j < l.w*l.h*l.n; ++j) {
    boxes[j].w = ((float)((int)(20 * boxes[j].w))+0.5) / 20.0F;
    boxes[j].h = ((float)((int)(20 * boxes[j].h))+0.5) / 20.0F;
}
Hi @AlexeyAB
Does this code still work with the latest version of demo.c? I put the code at the location linked below, but I am not sure that is the right place. If it is not, where is the right place, or is there another way to stabilize the boxes?
I would be glad if you could help me.
https://github.com/AlexeyAB/darknet/blob/d51d89053afc4b7f50a30ace7b2fcf1b2ddd7598/src/demo.c#L247
int j;
for (j = 0; j < l.w*l.h*l.n; ++j) {
    local_nboxes[j].w = ((float)((int)(20 * local_nboxes[j].w))+0.5) / 20.0F;
    local_nboxes[j].h = ((float)((int)(20 * local_nboxes[j].h))+0.5) / 20.0F;
}