[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
Help, please.
anchors: what does it mean?
How do you determine the numbers?
Anchors are initial sizes (width, height) of objects.
You can obtain them as the average sizes of objects: https://arxiv.org/pdf/1804.02767v1.pdf
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
Recalculate anchors for your dataset for the width and height set in your cfg-file:
darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416
then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file.
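For reference, calc_anchors clusters the (width, height) of the boxes in your training labels. Below is a minimal Python sketch of the same idea, assuming YOLO-format .txt label files in a hypothetical labels/ directory and plain Euclidean k-means (the YOLOv2 paper uses a 1-IoU distance, so this is only an approximation, not the repo's exact implementation):

```python
# Minimal sketch of anchor clustering: k-means over box sizes.
# Assumes YOLO-format label files (class x_center y_center width height,
# all relative to the image) under a hypothetical labels/ directory.
import glob
import numpy as np

def load_box_sizes(label_dir, net_w=416, net_h=416):
    sizes = []
    for path in glob.glob(f"{label_dir}/*.txt"):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 5:
                    _, _, _, bw, bh = map(float, parts)
                    # scale relative sizes to the network resolution
                    sizes.append((bw * net_w, bh * net_h))
    return np.array(sizes)

def kmeans_anchors(sizes, k=9, iters=100):
    # plain Euclidean k-means on (w, h); the YOLOv2 paper uses a 1-IoU
    # distance instead, so treat this as an approximation of calc_anchors
    centers = sizes[np.random.choice(len(sizes), k, replace=False)]
    for _ in range(iters):
        dists = ((sizes[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for i in range(k):
            if (assign == i).any():
                centers[i] = sizes[assign == i].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # sort by area, small to large

if __name__ == "__main__":
    anchors = kmeans_anchors(load_box_sizes("labels", net_w=416, net_h=416))
    print(", ".join(f"{w:.0f},{h:.0f}" for w, h in anchors))
```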
@AlexeyAB Quick follow-up on this one: I'm playing with training at resolution 64x64 and detecting at 608x608. My training images are small, approx. 100x80, but the images for detection are larger, typically 1000x1000. However, the objects have the same sizes in pixels, e.g. 100x60, in both the training and detection images.
How should I calculate the anchors in this case? Intuition says to calculate them for the training set and use the same anchors (with a different network resolution) for detection. Does that make sense?
@BjarneHerland
Intuition says to calculate them for the training set and use the same anchors (with a different network resolution) for detection.
Yes, you should calculate the anchors for 64x64 and use the same anchors for both training and detection:
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 64 -height 64
Quick follow-up on this one: I'm playing with training at resolution 64x64 and detecting at 608x608. My training images are small, approx. 100x80, but the images for detection are larger, typically 1000x1000. However, the objects have the same sizes in pixels, e.g. 100x60, in both the training and detection images.
Yes, this is the right approach.
You use a 10x smaller resolution for training because your training images are 10x smaller than the detection images, while the object sizes are the same, in line with the general rule: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
Since train_obj_width == detection_obj_width (the objects have the same sizes in pixels):
64 * train_obj_width / 100 ~= 608 * detection_obj_width / 1000
64/100 = 0.64 ~= 0.608 = 608/1000
General rule: your training dataset should include the same range of relative object sizes that you want to detect:
train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
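To make the arithmetic concrete, here is a tiny check of the rule with the numbers from this thread (64x64 training network, ~100 px wide training images, 608x608 detection network, ~1000 px wide detection images, objects of the same pixel width in both):

```python
# Check the rule of thumb with the numbers from this thread.
# train_net_w * obj_w / train_img_w ~= det_net_w * obj_w / det_img_w
train_net_w, train_img_w = 64, 100
det_net_w, det_img_w = 608, 1000
obj_w = 60  # same object width in pixels in both image sets

train_side = train_net_w * obj_w / train_img_w  # 38.4 px at network scale
det_side = det_net_w * obj_w / det_img_w        # 36.48 px at network scale
print(train_side, det_side)  # close, so the same anchors fit both setups
```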
Thank you, got it!
@AlexeyAB Brilliant - thx for confirming. :)
@CHNWhite Sorry for intruding, but this might be relevant for you also...
@AlexeyAB Please advise if you can: Is it possible / advisable to train a single set of weights using multiple datasets? My objects vary in size by a factor of ~6-7 in each dimension. Can I split them into e.g. 3 different training sets, construct corresponding config files with the proper resolutions, and then use these to train a single set of weights? How would you approach this? (I'm trying to avoid resizing and padding.)
@BjarneHerland Do you want to train 1 weights-file on 3 different datasets sequentially, one-by-one?
It's a bad idea, because while you train the model on the 2nd dataset, it will forget the objects from the 1st dataset.
@AlexeyAB Well, the idea was to group objects of different sizes into different datasets, say small, medium and large. Small objects would be represented by small images (e.g. 30x30 pixels), medium objects by larger images (e.g. 100x100) and large objects by even larger images (e.g. 200x200).
The network resolution for each dataset would be set according to your rules of thumb and the comments above.
So, would it work to train e.g. the first 100 epochs with the small config + dataset, then switch config + dataset and train 100 epochs on the medium dataset starting from the previous weights, then move to the large dataset, and cycle back to the small dataset? Bad idea? :)
@BjarneHerland Yes, I understand.
Using the corresponding network resolution for each dataset is a good idea, but training the 1 weights-file on different datasets is a bad idea ) So maybe you should do padding.
But you can try your approach and compare the accuracy (mAP).
@AlexeyAB Thanks for the feedback. As you probably already guessed, I'm trying to auto-generate training images. The issue is that these images currently end up as described above. I'll follow your advice and try padding them... :)
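If you go the padding route, one possible sketch (not from this repo; it uses OpenCV, with hypothetical file names and canvas size) is to pad each training image onto a fixed canvas without resizing and remap the YOLO labels, so the objects keep their absolute pixel size:

```python
# Sketch: pad an image to a fixed canvas without resizing, and remap its
# YOLO labels (class x_center y_center width height, relative coordinates).
# Paths and canvas size are hypothetical; assumes the image fits the canvas.
import cv2

def pad_image_and_labels(img_path, label_path, canvas_w=200, canvas_h=200):
    img = cv2.imread(img_path)
    h, w = img.shape[:2]
    assert w <= canvas_w and h <= canvas_h, "image larger than canvas"
    # pad only on the right and bottom, so pixel coordinates don't shift
    canvas = cv2.copyMakeBorder(img, 0, canvas_h - h, 0, canvas_w - w,
                                cv2.BORDER_CONSTANT, value=(114, 114, 114))
    new_labels = []
    with open(label_path) as f:
        for line in f:
            cls, xc, yc, bw, bh = line.split()
            # relative coords in the old image -> pixels -> relative in canvas
            xc_px, yc_px = float(xc) * w, float(yc) * h
            bw_px, bh_px = float(bw) * w, float(bh) * h
            new_labels.append(f"{cls} {xc_px / canvas_w:.6f} {yc_px / canvas_h:.6f} "
                              f"{bw_px / canvas_w:.6f} {bh_px / canvas_h:.6f}")
    return canvas, new_labels
```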
@AlexeyAB Shouldn't train_network_width = detection_network_width, e.g. the 608 defined in the .cfg file?