Darknet: View anchors and model cfg

Created on 1 Feb 2019 · 11 comments · Source: AlexeyAB/darknet

Hi @AlexeyAB. I am new to this forum. I have been learning YOLO on my own, reading and reviewing different tutorials and forums. I'm working on detection of two vehicle classes: small cars, and trucks both small and large (see the attached images: [images_Cars_Trucks.zip](https://github.com/AlexeyAB/darknet/files/2819995/images_Cars_Trucks.zip)).
You can see in the images that the model does not label large trucks well. I am using YOLOv3-tiny with this cfg file: cars_cfg.zip

I ran `darknet.exe detector calc_anchors EntrenaCars/code/SS/Train/cars.data -num_of_clusters 8 -width 480 -height 480 -show` and got the following anchors:

```
num_of_clusters = 8, width = 480, height = 480, read labels from 5966 images
loaded image: 5966 box: 6746 all loaded.
calculating k-means++ ... avg IoU = 86.84 %. Saving anchors to the file: anchors.txt
anchors = 30, 61, 39, 59, 36, 87, 41,109, 46,136, 52,171, 58,230, 65,367
```

The following image shows the cluster anchors: ![clusters anchors_8](https://user-images.githubusercontent.com/47233592/52101794-2f9d8b00-25ab-11e9-86a4-96f681864838.PNG)

How can I make my model handle the biggest trucks?
What should I modify in my cfg file?
What do you recommend from your experience?

Thanks @AlexeyAB

Most helpful comment

@GustavoAndresMoreno Hi,

The main thing is to check that there are enough images with large trucks, labeled the way you need, in the training dataset.

  1. Try to train from the beginning using this cfg-file - I added 1 anchor to the 1st yolo-layer (changed anchors, num, filters): cars12_1.cfg.txt
  2. If it doesn't help, then try to train from the beginning using this cfg-file - I also added 2 conv-layers before the 1st yolo-layer: cars12_2.cfg.txt

All 11 comments

@GustavoAndresMoreno
What's your model's mAP accuracy? How many samples do you have for each class?
The anchor box is not fitting the big truck properly. You can reduce the resolution to 320*320 (but make sure the car is still visible; see the sketch below). You probably have fewer training examples of big trucks, which can affect the anchor box coordinates; also try a different number of clusters. You can visualize the anchor coordinates to check whether they fit the objects in the training set.
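For reference, the resolution change suggested here is set in the [net] section of the cfg-file; a minimal sketch (both values must be multiples of 32, and the anchors would need to be recalculated at the new size):

```
[net]
# lower input resolution as suggested (this issue currently uses 480x480);
# width and height must both be multiples of 32
width=320
height=320
```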

@GustavoAndresMoreno Hi,

The main thing is to check that there are enough images with large trucks, labeled the way you need, in the training dataset.

  1. Try to train from the beginning using this cfg-file - I added 1 anchor to the 1st yolo-layer (changed anchors, num, filters): cars12_1.cfg.txt
  2. If it doesn't help, then try to train from the beginning using this cfg-file - I also added 2 conv-layers before the 1st yolo-layer: cars12_2.cfg.txt

> 1). What's your model's mAP accuracy?

This is fine:

```
calculation mAP (mean average precision)...
5968 detections_count = 6975, unique_truth_count = 6746
class_id = 0, name = car, ap = 99.88 %
class_id = 1, name = truck, ap = 99.99 %
for thresh = 0.70, precision = 1.00, recall = 0.99, F1-score = 1.00
for thresh = 0.70, TP = 6712, FP = 16, FN = 34, average IoU = 89.66 %
mean average precision (mAP) = 0.999314, or 99.93 %
```

The model works very well; the problem is the label it generates for large trucks. I need the bounding box to cover the whole truck so I can determine its size and differentiate it from the others.

> 2). The anchor box is not fitting the big truck properly. You can reduce the resolution to 320*320 (but make sure the car is still visible).

At 320*320 the model loses precision, so the 480*480 resolution is good.

> 3). You probably have fewer training examples of big trucks, which can affect the anchor box coordinates; also try a different number of clusters. You can visualize the anchor coordinates to check whether they fit the objects in the training set.

This can be one of my problems. The number of very large trucks is very low compared to the other vehicles. It is difficult to get more samples, but I will try to obtain them.

Thanks @Sudhakar17

> @GustavoAndresMoreno Hi,
>
> The main thing is to check that there are enough images with large trucks, labeled the way you need, in the training dataset.
>
> 1. Try to train from the beginning using this cfg-file - I added 1 anchor to the 1st yolo-layer (changed anchors, num, filters): cars12_1.cfg.txt
> 2. If it doesn't help, then try to train from the beginning using this cfg-file - I also added 2 conv-layers before the 1st yolo-layer: cars12_2.cfg.txt

Hi @AlexeyAB.
I will review the models that you recommend and give feedback on the results.
Thank you.

Hi, sorry, I closed the issue by mistake.


Hi @AlexeyAB. The accuracy of this model is very similar to the previous one; it still does not label the truck completely.
What else could I try?
If I use YOLOv3, could the result improve?

Thanks @AlexeyAB

@GustavoAndresMoreno You should add many more examples with the full truck visible to your training dataset.

> If I use YOLOv3, could the result improve?

Yes.


Ok @AlexeyAB. I will add more examples and try YOLOv3. Thank you.

Hi @AlexeyAB and @Sudhakar17,

In the model cfg, how do the YOLO layers connect to the convolutional layers above them? For example, in my cfg the last YOLO layer handles small objects with mask 0,1,2, and the first YOLO layer handles large objects with mask 3,4,5,6,7,8. Which convolutional layers should I modify so that the model can better recognize both small and large objects?

I hope you can understand my question.

Thank you.

@GustavoAndresMoreno
As described here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
`filters=(classes + coords + 1)*<number of mask>`

So you should set filters=(classes + coords + 1)*6 in the [convolutional] layer before the 1st yolo-layer (where mask is 3,4,5,6,7,8),
and filters=(classes + coords + 1)*3 in the [convolutional] layer before the last yolo-layer (where mask is 0,1,2). See the sketch below.
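For the two classes here (car and truck) with coords = 4, that works out to filters = (2 + 4 + 1)*6 = 42 before the 1st yolo-layer and (2 + 4 + 1)*3 = 21 before the last one (21 matches the conv 21 layers in the network printout below). A minimal cfg sketch showing only the relevant options; everything else stays as in the base cfg:

```
# before the 1st [yolo] layer (6 masks: 3,4,5,6,7,8)
[convolutional]
size=1
stride=1
pad=1
filters=42            # (2 classes + 4 coords + 1) * 6 masks
activation=linear

[yolo]
mask = 3,4,5,6,7,8
classes=2
num=9                 # total number of anchors

# before the last [yolo] layer (3 masks: 0,1,2)
[convolutional]
size=1
stride=1
pad=1
filters=21            # (2 classes + 4 coords + 1) * 3 masks
activation=linear

[yolo]
mask = 0,1,2
classes=2
num=9
```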


Also: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

Recalculate anchors for your dataset for the width and height from your cfg-file: `darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416`, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the anchor indexes in `mask=` for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones (see the sketch below). If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.
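To illustrate that mask partitioning, here is a sketch using the stock YOLOv3 anchors; a recalculated `anchors=` line for this dataset would replace them, with the masks re-split by the 60x60 / 30x30 rule above. The same anchors line is repeated in every [yolo] layer; only `mask=` changes:

```
# 1st [yolo] layer (13x13 grid): the largest anchors (116,90  156,198  373,326)
[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=2
num=9

# 2nd [yolo] layer (26x26 grid): the medium anchors, roughly larger than 30x30
[yolo]
mask = 3,4,5
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=2
num=9

# 3rd [yolo] layer: the remaining small anchors
[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=2
num=9
```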

```
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3 / 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
2 conv 32 1 x 1 / 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF
3 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
4 Shortcut Layer: 1
5 conv 128 3 x 3 / 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF
6 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
7 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
8 Shortcut Layer: 5
9 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
10 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
11 Shortcut Layer: 8
12 conv 256 3 x 3 / 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF
13 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
14 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
15 Shortcut Layer: 12
16 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
17 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
18 Shortcut Layer: 15
19 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
20 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
21 Shortcut Layer: 18
22 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
23 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
24 Shortcut Layer: 21
25 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
26 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
27 Shortcut Layer: 24
28 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
29 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
30 Shortcut Layer: 27
31 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
32 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
33 Shortcut Layer: 30
34 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
35 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
36 Shortcut Layer: 33
37 conv 512 3 x 3 / 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF
38 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
39 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
40 Shortcut Layer: 37
41 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
42 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
43 Shortcut Layer: 40
44 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
45 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
46 Shortcut Layer: 43
47 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
48 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
49 Shortcut Layer: 46
50 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
51 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
52 Shortcut Layer: 49
53 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
54 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
55 Shortcut Layer: 52
56 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
57 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
58 Shortcut Layer: 55
59 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
60 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
61 Shortcut Layer: 58
62 conv 1024 3 x 3 / 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF
63 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
64 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
65 Shortcut Layer: 62
66 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
67 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
68 Shortcut Layer: 65
69 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
70 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
71 Shortcut Layer: 68
72 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
73 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
74 Shortcut Layer: 71
75 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
76 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
77 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
78 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
79 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
80 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
81 conv 21 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 21 0.007 BF
82 yolo
83 route 79
84 conv 256 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF
85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF
88 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
89 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
90 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
91 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
92 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
93 conv 21 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 21 0.015 BF
94 yolo
95 route 91
96 conv 128 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
97 upsample 4x 26 x 26 x 128 -> 104 x 104 x 128
98 route 97 11
99 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
100 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256 6.380 BF
101 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
102 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256 6.380 BF
103 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
104 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256 6.380 BF
105 conv 21 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 21 0.116 BF
106 yolo
```

From this YOLOv3 output we can see that smaller objects can be detected on the larger feature map of size 104*104. So you can also upsample the intermediate conv-layer stages before a yolo layer, as sketched below. Even with this approach, an object still needs a minimum size in pixels to be detected.
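A cfg sketch of that third detection branch, reconstructed from layers 95-106 of the printout above (the exact options in the real cfg-file may differ; the layer references are what the printout implies):

```
# route back to layer 91 (26x26x256), reduce channels, then upsample 4x
[route]
layers = 91

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=4              # 26x26 -> 104x104

# concatenate with the early high-resolution feature map from layer 11
[route]
layers = -1, 11

# ... several 1x1/3x3 conv pairs (layers 99-104 in the printout), then:

[convolutional]
size=1
stride=1
pad=1
filters=21            # (2 classes + 4 coords + 1) * 3 masks
activation=linear

[yolo]
mask = 0,1,2
```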

