Darknet: View anchors and model cfg

Created on 1 Feb 2019 · 11 comments · Source: AlexeyAB/darknet

Hi @AlexeyAB. I am new to this forum. I have been learning YOLO on my own, reading and reviewing different tutorials and forums. I'm working on detection of two vehicle classes: small cars, and trucks both small and large (see the attached images: [images_Cars_Trucks.zip](https://github.com/AlexeyAB/darknet/files/2819995/images_Cars_Trucks.zip)).
You can see in the images that the model does not label large trucks well. I am using YOLOv3-tiny with this cfg file: cars_cfg.zip

I ran `darknet.exe detector calc_anchors EntrenaCars/code/SS/Train/cars.data -num_of_clusters 8 -width 480 -height 480 -show` and got the following anchors:

```
num_of_clusters = 8, width = 480, height = 480, read labels from 5966 images
loaded image: 5966 box: 6746 all loaded.
calculating k-means++ ... avg IoU = 86.84 %. Saving anchors to the file: anchors.txt
anchors = 30, 61, 39, 59, 36, 87, 41,109, 46,136, 52,171, 58,230, 65,367
```

The following image shows the cluster anchors: ![clusters anchors_8](https://user-images.githubusercontent.com/47233592/52101794-2f9d8b00-25ab-11e9-86a4-96f681864838.PNG)

How can I make my model handle the biggest trucks?
What should I modify in my cfg file?
What do you recommend from your experience?

Thanks @AlexeyAB

Most helpful comment

@GustavoAndresMoreno Hi,

The main thing is to check that there are enough images with large trucks, labeled the way you need, in the training dataset.

  1. Try to train from the beginning using this cfg-file - I added 1 anchor to the 1st yolo-layer (changed anchors, num, filters): cars12_1.cfg.txt
  2. If it doesn't help, then try to train from the beginning using this cfg-file - I also added 2 conv-layers before the 1st yolo-layer: cars12_2.cfg.txt

All 11 comments

@GustavoAndresMoreno
What's your model's mAP accuracy? How many samples do you have for each class?
The anchor box is not fitting the big truck properly. You can reduce the resolution to 320*320 (but make sure the car is still visible; see the sketch below). You probably have fewer training examples of big trucks, which can affect the anchor box coordinates; also try a different number of clusters. You can visualize the anchor coordinates to check whether they fit the objects in the training set.
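For reference, the resolution change suggested here is set in the [net] section of the cfg-file; a minimal sketch (both values must be multiples of 32, and the anchors would need to be recalculated at the new size):

```
[net]
# lower input resolution as suggested (this issue currently uses 480x480);
# width and height must both be multiples of 32
width=320
height=320
```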

@GustavoAndresMoreno Hi,

The main thing is to check that there are enough images with large trucks, labeled the way you need, in the training dataset.

  1. Try to train from the beginning using this cfg-file - I added 1 anchor to the 1st yolo-layer (changed anchors, num, filters): cars12_1.cfg.txt
  2. If it doesn't help, then try to train from the beginning using this cfg-file - I also added 2 conv-layers before the 1st yolo-layer: cars12_2.cfg.txt

> 1). What's your model's mAP accuracy?

This is fine:

```
calculation mAP (mean average precision)...
5968 detections_count = 6975, unique_truth_count = 6746
class_id = 0, name = car, ap = 99.88 %
class_id = 1, name = truck, ap = 99.99 %
for thresh = 0.70, precision = 1.00, recall = 0.99, F1-score = 1.00
for thresh = 0.70, TP = 6712, FP = 16, FN = 34, average IoU = 89.66 %
mean average precision (mAP) = 0.999314, or 99.93 %
```

The model works very well; the problem is the label it generates for large trucks. I need the bounding box to cover the whole truck so I can determine its size and differentiate it from the others.

> 2). The anchor box is not fitting the big truck properly. You can reduce the resolution to 320*320 (but make sure the car is still visible).

At 320*320 the model loses precision, so the 480*480 resolution is good.

> 3). You probably have fewer training examples of big trucks, which can affect the anchor box coordinates; also try a different number of clusters. You can visualize the anchor coordinates to check whether they fit the objects in the training set.

This can be one of my problems. The number of very large trucks is very low compared to the other vehicles. It is difficult to get more samples, but I will try to obtain them.

Thanks @Sudhakar17

> @GustavoAndresMoreno Hi,
>
> The main thing is to check that there are enough images with large trucks, labeled the way you need, in the training dataset.
>
> 1. Try to train from the beginning using this cfg-file - I added 1 anchor to the 1st yolo-layer (changed anchors, num, filters): cars12_1.cfg.txt
> 2. If it doesn't help, then try to train from the beginning using this cfg-file - I also added 2 conv-layers before the 1st yolo-layer: cars12_2.cfg.txt

Hi @AlexeyAB.
I will review the models that you recommend and give feedback on the results.
Thank you.

Hi, sorry, I closed the issue by mistake.


Hi @AlexeyAB. The accuracy of this model is very similar to the previous one; it still does not label the truck completely.
What else could I try?
If I use YOLOv3, could the result improve?

Thanks @AlexeyAB

@GustavoAndresMoreno You should add many more examples with the full truck visible to your training dataset.

> If I use YOLOv3, could the result improve?

Yes.


Ok @AlexeyAB. I will add more examples and try YOLOv3. Thank you.

Hi @AlexeyAB and @Sudhakar17,

In the model cfg, how do the YOLO layers connect to the convolutional layers above them? For example, in my cfg the last YOLO layer handles small objects with mask 0,1,2, and the first YOLO layer handles large objects with mask 3,4,5,6,7,8. Which convolutional layers should I modify so that the model can better recognize both small and large objects?

I hope you can understand my question.

Thank you.

@GustavoAndresMoreno
As described here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
`filters=(classes + coords + 1)*<number of mask>`

So you should set filters=(classes + coords + 1)*6 in the [convolutional] layer before the 1st yolo-layer (where mask is 3,4,5,6,7,8),
and filters=(classes + coords + 1)*3 in the [convolutional] layer before the last yolo-layer (where mask is 0,1,2). See the sketch below.
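For the two classes here (car and truck) with coords = 4, that works out to filters = (2 + 4 + 1)*6 = 42 before the 1st yolo-layer and (2 + 4 + 1)*3 = 21 before the last one (21 matches the conv 21 layers in the network printout below). A minimal cfg sketch showing only the relevant options; everything else stays as in the base cfg:

```
# before the 1st [yolo] layer (6 masks: 3,4,5,6,7,8)
[convolutional]
size=1
stride=1
pad=1
filters=42            # (2 classes + 4 coords + 1) * 6 masks
activation=linear

[yolo]
mask = 3,4,5,6,7,8
classes=2
num=9                 # total number of anchors

# before the last [yolo] layer (3 masks: 0,1,2)
[convolutional]
size=1
stride=1
pad=1
filters=21            # (2 classes + 4 coords + 1) * 3 masks
activation=linear

[yolo]
mask = 0,1,2
classes=2
num=9
```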


Also: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

Recalculate anchors for your dataset for the width and height from your cfg-file: `darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416`, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the anchor indexes in `mask=` for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones (see the sketch below). If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.
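To illustrate that mask partitioning, here is a sketch using the stock YOLOv3 anchors; a recalculated `anchors=` line for this dataset would replace them, with the masks re-split by the 60x60 / 30x30 rule above. The same anchors line is repeated in every [yolo] layer; only `mask=` changes:

```
# 1st [yolo] layer (13x13 grid): the largest anchors (116,90  156,198  373,326)
[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=2
num=9

# 2nd [yolo] layer (26x26 grid): the medium anchors, roughly larger than 30x30
[yolo]
mask = 3,4,5
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=2
num=9

# 3rd [yolo] layer: the remaining small anchors
[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=2
num=9
```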

```
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3 / 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
2 conv 32 1 x 1 / 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF
3 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
4 Shortcut Layer: 1
5 conv 128 3 x 3 / 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF
6 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
7 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
8 Shortcut Layer: 5
9 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
10 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
11 Shortcut Layer: 8
12 conv 256 3 x 3 / 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF
13 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
14 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
15 Shortcut Layer: 12
16 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
17 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
18 Shortcut Layer: 15
19 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
20 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
21 Shortcut Layer: 18
22 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
23 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
24 Shortcut Layer: 21
25 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
26 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
27 Shortcut Layer: 24
28 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
29 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
30 Shortcut Layer: 27
31 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
32 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
33 Shortcut Layer: 30
34 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
35 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
36 Shortcut Layer: 33
37 conv 512 3 x 3 / 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF
38 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
39 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
40 Shortcut Layer: 37
41 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
42 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
43 Shortcut Layer: 40
44 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
45 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
46 Shortcut Layer: 43
47 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
48 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
49 Shortcut Layer: 46
50 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
51 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
52 Shortcut Layer: 49
53 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
54 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
55 Shortcut Layer: 52
56 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
57 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
58 Shortcut Layer: 55
59 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
60 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
61 Shortcut Layer: 58
62 conv 1024 3 x 3 / 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF
63 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
64 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
65 Shortcut Layer: 62
66 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
67 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
68 Shortcut Layer: 65
69 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
70 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
71 Shortcut Layer: 68
72 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
73 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
74 Shortcut Layer: 71
75 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
76 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
77 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
78 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
79 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
80 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
81 conv 21 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 21 0.007 BF
82 yolo
83 route 79
84 conv 256 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF
85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF
88 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
89 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
90 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
91 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
92 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
93 conv 21 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 21 0.015 BF
94 yolo
95 route 91
96 conv 128 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
97 upsample 4x 26 x 26 x 128 -> 104 x 104 x 128
98 route 97 11
99 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
100 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256 6.380 BF
101 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
102 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256 6.380 BF
103 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
104 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256 6.380 BF
105 conv 21 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 21 0.116 BF
106 yolo
```

From this YOLOv3 output we can see that smaller objects can be detected on the larger feature map of size 104*104. So you can also upsample the intermediate conv-layer stages before a yolo layer, as sketched below. Even with this approach, an object still needs a minimum size in pixels to be detected.
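A cfg sketch of that third detection branch, reconstructed from layers 95-106 of the printout above (the exact options in the real cfg-file may differ; the layer references are what the printout implies):

```
# route back to layer 91 (26x26x256), reduce channels, then upsample 4x
[route]
layers = 91

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=4              # 26x26 -> 104x104

# concatenate with the early high-resolution feature map from layer 11
[route]
layers = -1, 11

# ... several 1x1/3x3 conv pairs (layers 99-104 in the printout), then:

[convolutional]
size=1
stride=1
pad=1
filters=21            # (2 classes + 4 coords + 1) * 3 masks
activation=linear

[yolo]
mask = 0,1,2
```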

