[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
Help, please.
anchors: what does it mean?
How do you determine the numbers?
Anchors are initial sizes (width, height) of objects.
You can obtain them as the average sizes of objects: https://arxiv.org/pdf/1804.02767v1.pdf
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
Recalculate anchors for your dataset for the width and height set in your cfg-file:
darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416
then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file.
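For reference, calc_anchors clusters the (width, height) of the boxes in your training labels. Below is a minimal Python sketch of the same idea, assuming YOLO-format .txt label files in a hypothetical labels/ directory and plain Euclidean k-means (the YOLOv2 paper uses a 1-IoU distance, so this is only an approximation, not the repo's exact implementation):

```python
# Minimal sketch of anchor clustering: k-means over box sizes.
# Assumes YOLO-format label files (class x_center y_center width height,
# all relative to the image) under a hypothetical labels/ directory.
import glob
import numpy as np

def load_box_sizes(label_dir, net_w=416, net_h=416):
    sizes = []
    for path in glob.glob(f"{label_dir}/*.txt"):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 5:
                    _, _, _, bw, bh = map(float, parts)
                    # scale relative sizes to the network resolution
                    sizes.append((bw * net_w, bh * net_h))
    return np.array(sizes)

def kmeans_anchors(sizes, k=9, iters=100):
    # plain Euclidean k-means on (w, h); the YOLOv2 paper uses a 1-IoU
    # distance instead, so treat this as an approximation of calc_anchors
    centers = sizes[np.random.choice(len(sizes), k, replace=False)]
    for _ in range(iters):
        dists = ((sizes[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for i in range(k):
            if (assign == i).any():
                centers[i] = sizes[assign == i].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # sort by area, small to large

if __name__ == "__main__":
    anchors = kmeans_anchors(load_box_sizes("labels", net_w=416, net_h=416))
    print(", ".join(f"{w:.0f},{h:.0f}" for w, h in anchors))
```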
@AlexeyAB Quick follow-up on this one: I'm playing with training at resolution 64x64 and detecting at 608x608. My training images are small, approx. 100x80, but the images for detection are larger, typically 1000x1000. However, the objects have the same sizes in pixels, e.g. 100x60, in both the training and detection images.
How should I calculate the anchors in this case? Intuition says to calculate them for the training set and use the same anchors (with a different network resolution) for detection. Does that make sense?
@BjarneHerland
Intuition says to calculate them for the training set and use the same anchors (with a different network resolution) for detection.
Yes, you should calculate the anchors for 64x64 and use the same anchors for both training and detection:
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 64 -height 64
Quick follow-up on this one: I'm playing with training at resolution 64x64 and detecting at 608x608. My training images are small, approx. 100x80, but the images for detection are larger, typically 1000x1000. However, the objects have the same sizes in pixels, e.g. 100x60, in both the training and detection images.
Yes, this is the right approach.
You use a 10x smaller resolution for training because your training images are 10x smaller than the detection images, while the object sizes are the same, in line with the general rule: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
Since train_obj_width == detection_obj_width (the objects have the same sizes in pixels):
64 * train_obj_width / 100 ~= 608 * detection_obj_width / 1000
64/100 = 0.64 ~= 0.608 = 608/1000
General rule: your training dataset should include the same range of relative object sizes that you want to detect:
train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
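To make the arithmetic concrete, here is a tiny check of the rule with the numbers from this thread (64x64 training network, ~100 px wide training images, 608x608 detection network, ~1000 px wide detection images, objects of the same pixel width in both):

```python
# Check the rule of thumb with the numbers from this thread.
# train_net_w * obj_w / train_img_w ~= det_net_w * obj_w / det_img_w
train_net_w, train_img_w = 64, 100
det_net_w, det_img_w = 608, 1000
obj_w = 60  # same object width in pixels in both image sets

train_side = train_net_w * obj_w / train_img_w  # 38.4 px at network scale
det_side = det_net_w * obj_w / det_img_w        # 36.48 px at network scale
print(train_side, det_side)  # close, so the same anchors fit both setups
```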
Thank you, got it!
@AlexeyAB Brilliant - thx for confirming. :)
@CHNWhite Sorry for intruding, but this might be relevant for you also...
@AlexeyAB Please advise if you can: Is it possible / advisable to train a single set of weights using multiple datasets? My objects vary in size by a factor of ~6-7 in each dimension. Can I split them into e.g. 3 different training sets, construct corresponding config files with the proper resolutions, and then use these to train a single set of weights? How would you approach this? (I'm trying to avoid resizing and padding.)
@BjarneHerland Do you want to train 1 weights-file on 3 different datasets sequentially, one-by-one?
It's a bad idea, because while you train the model on the 2nd dataset, it will forget the objects from the 1st dataset.
@AlexeyAB Well, the idea was to group objects of different sizes into different datasets, say small, medium and large. Small objects would be represented by small images (e.g. 30x30 pixels), medium objects by larger images (e.g. 100x100) and large objects by even larger images (e.g. 200x200).
The network resolution for each dataset would be set according to your rules of thumb and the comments above.
So, would it work to train e.g. the first 100 epochs with the small config + dataset, then switch config + dataset and train 100 epochs on the medium dataset starting from the previous weights, then move to the large dataset, and cycle back to the small dataset? Bad idea? :)
@BjarneHerland Yes, I understand.
Using the corresponding network resolution for each dataset is a good idea, but training the 1 weights-file on different datasets is a bad idea ) So maybe you should do padding.
But you can try your approach and compare the accuracy (mAP).
@AlexeyAB Thanks for the feedback. As you probably already guessed, I'm trying to auto-generate training images. The issue is that these images currently end up as described above. I'll follow your advice and try padding them... :)
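If you go the padding route, one possible sketch (not from this repo; it uses OpenCV, with hypothetical file names and canvas size) is to pad each training image onto a fixed canvas without resizing and remap the YOLO labels, so the objects keep their absolute pixel size:

```python
# Sketch: pad an image to a fixed canvas without resizing, and remap its
# YOLO labels (class x_center y_center width height, relative coordinates).
# Paths and canvas size are hypothetical; assumes the image fits the canvas.
import cv2

def pad_image_and_labels(img_path, label_path, canvas_w=200, canvas_h=200):
    img = cv2.imread(img_path)
    h, w = img.shape[:2]
    assert w <= canvas_w and h <= canvas_h, "image larger than canvas"
    # pad only on the right and bottom, so pixel coordinates don't shift
    canvas = cv2.copyMakeBorder(img, 0, canvas_h - h, 0, canvas_w - w,
                                cv2.BORDER_CONSTANT, value=(114, 114, 114))
    new_labels = []
    with open(label_path) as f:
        for line in f:
            cls, xc, yc, bw, bh = line.split()
            # relative coords in the old image -> pixels -> relative in canvas
            xc_px, yc_px = float(xc) * w, float(yc) * h
            bw_px, bh_px = float(bw) * w, float(bh) * h
            new_labels.append(f"{cls} {xc_px / canvas_w:.6f} {yc_px / canvas_h:.6f} "
                              f"{bw_px / canvas_w:.6f} {bh_px / canvas_h:.6f}")
    return canvas, new_labels
```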
@AlexeyAB Shouldn't train_network_width = detection_network_width, e.g. the 608 defined in the .cfg file?