When changing the input size of the network, how should the anchors be updated?
In https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py#L52, calculated anchors are finally multiplied by the size of the network and divided by 32.
For example, following this logic, we should multiply each anchor value by 2 when using 832 instead of 416 as the input size. However, this doesn't give me good results.
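For reference, the scaling step from gen_anchors.py can be sketched like this (a minimal sketch; the function name and the example anchor values are illustrative, not taken from the script):

```python
def scale_anchors(anchors, old_size, new_size):
    """Rescale YOLOv2 anchors (expressed in grid-cell units, stride 32)
    when the network input resolution changes from old_size to new_size."""
    factor = new_size / old_size
    return [round(a * factor, 4) for a in anchors]

# Doubling the input from 416 to 832 doubles every anchor value:
anchors_416 = [1.3221, 1.73145, 3.19275, 4.00944]  # example values
anchors_832 = scale_anchors(anchors_416, 416, 832)
```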
I should clarify that I want to increase the network size to be able to detect smaller objects, while still being able to detect "normal-sized" objects.
Should I then add new anchor values in addition to the old ones?
In #199, you (@AlexeyAB) wrote
Try to use for detection 1088x1088 and multiply each anchor value by 1.6 (but if you trained with random=1, then multiple by 2.4)
I don't understand where the 1.6 factor comes from.
And how should we determine the factor when using _random=1_ (which is my case)?
Thank you for your help
The best way is to use 1088x1088 and 10 anchors (20 values) for training: 5 unchanged anchors for small objects copied from yolo-voc.cfg, and 5 anchors scaled by 2.4 times for large objects. And train on images (with size larger than 1000x1000) that contain both small and large objects.
For models already trained on 416x416, you can use some 3 scaled anchors and 2 unchanged anchors for detection on 1088x1088.
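Building the 10-anchor list described above could be sketched as follows (a sketch, assuming the base values are the standard anchors from yolo-voc.cfg and the 2.4 scale factor for the large-object copies):

```python
# 5 unchanged anchors for small objects (from yolo-voc.cfg), plus the
# same 5 anchors scaled by 2.4 for large objects -> 10 anchors, 20 values.
base = [(1.3221, 1.73145), (3.19275, 4.00944), (5.05587, 8.09892),
        (9.47112, 4.84053), (11.2364, 10.0071)]

scaled = [(round(w * 2.4, 4), round(h * 2.4, 4)) for (w, h) in base]
combined = base + scaled

# Flatten into the comma-separated form the cfg file expects:
anchors_line = ", ".join(str(v) for pair in combined for v in pair)
```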
Also you can try to train densenet201_yolo2.cfg with initial weights densenet201.300: https://github.com/AlexeyAB/darknet/issues/179#issuecomment-330047738
darknet.exe detector train data/obj.data densenet201_yolo.cfg densenet201.300
If I train on bigger networks, I imagine it will take a lot more time to train?
I still don't understand where the 2.4 factor comes from (I want to understand it so I can adapt it to other network sizes).
I will look at the use of DenseNet with YOLO!
Thank you for your answers
If you train using yolo-voc.cfg and width=416 height=416 random=0 and then detect on 1088x1088, then you should multiply the anchors by ~2.6 = 1088/416
If you train using yolo-voc.cfg and random=1 and then detect on 1088x1088, then you should multiply the anchors by ~2.4 = 1088/464 = 1088/((320+608)/2)
(when random=1, then during training for each 10 iterations network size will be resized randomly from 320x320 to 608x608)
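The two factors follow from this arithmetic:

```python
# random=0: trained at a fixed 416x416, detecting at 1088x1088
factor_fixed = 1088 / 416              # ≈ 2.615, rounded to "2.6"

# random=1: the input is resized every 10 iterations between 320x320
# and 608x608, so scale relative to the average training resolution
avg_resolution = (320 + 608) / 2       # 464
factor_random = 1088 / avg_resolution  # ≈ 2.345, rounded to "~2.4"
```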
> - If you train using yolo-voc.cfg and `width=416 height=416 random=0` and then detect on 1088x1088, then you should multiply the anchors by 2.6 = 1088/416
> - If you train using yolo-voc.cfg and `random=1` and then detect on 1088x1088, then you should multiply the anchors by ~2.4 = 1088/464 = 1088/((320+608)/2)
>
> (when random=1, then during training for each 10 iterations network size will be resized randomly from 320x320 to 608x608)
Maybe it is actually (288+576)/2, so the average is different: 288 = `int(416/1.4/32)*32`, 576 = `int(416*1.4/32)*32`.
Please let me know if I am wrong.
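This hypothesis can be checked directly (note that the 1.4 factor and the rounding to multiples of 32 are the commenter's assumption about how random=1 picks sizes, not confirmed in this thread):

```python
# If random=1 picks sizes between int(416/1.4/32)*32 and int(416*1.4/32)*32:
lo = int(416 / 1.4 / 32) * 32   # int(9.28...) * 32 = 288
hi = int(416 * 1.4 / 32) * 32   # int(18.2)   * 32 = 576
avg = (lo + hi) / 2             # 432, not the 464 used above
factor = 1088 / avg             # ≈ 2.52 instead of ~2.4
```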