When changing the input size of the network, how should the anchors be updated?
In https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py#L52, calculated anchors are finally multiplied by the size of the network and divided by 32.
For example, following this logic, we should multiply each anchor value by 2 when using 832 instead of 416 as the input size. However, this doesn't give me good results.
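For reference, the scaling step from gen_anchors.py can be sketched like this (a minimal sketch; the function name and the example anchor values are illustrative, not taken from the script):

```python
def scale_anchors(anchors, old_size, new_size):
    """Rescale YOLOv2 anchors (expressed in grid-cell units, stride 32)
    when the network input resolution changes from old_size to new_size."""
    factor = new_size / old_size
    return [round(a * factor, 4) for a in anchors]

# Doubling the input from 416 to 832 doubles every anchor value:
anchors_416 = [1.3221, 1.73145, 3.19275, 4.00944]  # example values
anchors_832 = scale_anchors(anchors_416, 416, 832)
```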
I should clarify that I want to increase the network size to be able to detect smaller objects, while still being able to detect "normal-sized" objects.
Should I then add new anchor values in addition to the old ones?
In #199, you (@AlexeyAB) wrote
Try to use for detection 1088x1088 and multiply each anchor value by 1.6 (but if you trained with random=1, then multiple by 2.4)
I don't understand where the 1.6 factor comes from.
And how should we determine the factor when using _random=1_ (which is my case)?
Thank you for your help
The best way is to use 1088x1088 and 10 anchors (20 values) for training: 5 unchanged anchors for small objects copied from yolo-voc.cfg, and 5 anchors scaled by 2.4 times for large objects. And train on images (with size larger than 1000x1000) that contain both small and large objects.
For models already trained on 416x416, you can use some 3 scaled anchors and 2 unchanged anchors for detection on 1088x1088.
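Building the 10-anchor list described above could be sketched as follows (a sketch, assuming the base values are the standard anchors from yolo-voc.cfg and the 2.4 scale factor for the large-object copies):

```python
# 5 unchanged anchors for small objects (from yolo-voc.cfg), plus the
# same 5 anchors scaled by 2.4 for large objects -> 10 anchors, 20 values.
base = [(1.3221, 1.73145), (3.19275, 4.00944), (5.05587, 8.09892),
        (9.47112, 4.84053), (11.2364, 10.0071)]

scaled = [(round(w * 2.4, 4), round(h * 2.4, 4)) for (w, h) in base]
combined = base + scaled

# Flatten into the comma-separated form the cfg file expects:
anchors_line = ", ".join(str(v) for pair in combined for v in pair)
```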
Also you can try to train densenet201_yolo2.cfg with initial weights densenet201.300: https://github.com/AlexeyAB/darknet/issues/179#issuecomment-330047738
darknet.exe detector train data/obj.data densenet201_yolo.cfg densenet201.300
If I train on bigger networks, I imagine it will take a lot more time to train?
I still don't understand where the 2.4 factor comes from (I want to understand it so I can adapt it to other network sizes).
I will look at the use of DenseNet with YOLO!
Thank you for your answers
If you train using yolo-voc.cfg and width=416 height=416 random=0 and then detect on 1088x1088, then you should multiply the anchors by ~2.6 = 1088/416
If you train using yolo-voc.cfg and random=1 and then detect on 1088x1088, then you should multiply the anchors by ~2.4 = 1088/464 = 1088/((320+608)/2)
(when random=1, then during training for each 10 iterations network size will be resized randomly from 320x320 to 608x608)
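The two factors follow from this arithmetic:

```python
# random=0: trained at a fixed 416x416, detecting at 1088x1088
factor_fixed = 1088 / 416              # ≈ 2.615, rounded to "2.6"

# random=1: the input is resized every 10 iterations between 320x320
# and 608x608, so scale relative to the average training resolution
avg_resolution = (320 + 608) / 2       # 464
factor_random = 1088 / avg_resolution  # ≈ 2.345, rounded to "~2.4"
```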
> - If you train using yolo-voc.cfg and `width=416 height=416 random=0` and then detect on 1088x1088, then you should multiply the anchors by 2.6 = 1088/416
> - If you train using yolo-voc.cfg and `random=1` and then detect on 1088x1088, then you should multiply the anchors by ~2.4 = 1088/464 = 1088/((320+608)/2)
>
> (when random=1, then during training for each 10 iterations network size will be resized randomly from 320x320 to 608x608)
Maybe it is actually (288+576)/2, so the average is different: 288 = `int(416/1.4/32)*32`, 576 = `int(416*1.4/32)*32`.
Please let me know if I am wrong.
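This hypothesis can be checked directly (note that the 1.4 factor and the rounding to multiples of 32 are the commenter's assumption about how random=1 picks sizes, not confirmed in this thread):

```python
# If random=1 picks sizes between int(416/1.4/32)*32 and int(416*1.4/32)*32:
lo = int(416 / 1.4 / 32) * 32   # int(9.28...) * 32 = 288
hi = int(416 * 1.4 / 32) * 32   # int(18.2)   * 32 = 576
avg = (lo + hi) / 2             # 432, not the 464 used above
factor = 1088 / avg             # ≈ 2.52 instead of ~2.4
```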