Darknet: explanation on cfg file parameters

Created on 26 Nov 2017 · 37 comments · Source: AlexeyAB/darknet

Hi @AlexeyAB
Could you please kindly document or explain the following parameters of the .cfg file:

  1. saturation, exposure and hue values
  2. steps and scales values
  3. anchors, bias_match
  4. jitter, rescore, thresh
  5. object_scale, noobject_scale, class_scale, coord_scale values
  6. absolute
Explanations

All 37 comments

Hi,

  1. saturation, exposure and hue values - ranges for random changes to the colours of images during training (data augmentation parameters), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV
    The larger the value, the more invariant the neural network becomes to changes in the lighting and colour of the objects.

  2. steps and scales values - steps is a list of checkpoints (iteration numbers) at which the scales are applied; scales is a list of coefficients by which the learning_rate is multiplied at those checkpoints.
    Together they determine how the learning_rate changes as the number of training iterations grows (see the example cfg fragment at the end of this list).

  3. anchors, bias_match
    anchors are the most frequent initial sizes (width, height) of objects, expressed in terms of the output network resolution.
    bias_match is used only for training: if bias_match=1 then a detected object will have the same size as one of the anchors, and if bias_match=0 then the anchor size will be refined by the neural network: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L275-L283
    If you train with height=416, width=416, random=0, then the max values of the anchors will be 13,13.
    But if you train with random=1, then the max input resolution can be 608x608, and the max values of the anchors can be 19,19.

  4. jitter, rescore, thresh
    jitter can be in [0-1] and is used to crop images during training for data augmentation. The larger the value of jitter, the more invariant the neural network becomes to changes in the size and aspect ratio of the objects: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/data.c#L513-L528

rescore determines which loss (delta, cost, ...) function will be used - more about this: https://github.com/AlexeyAB/darknet/issues/185#issuecomment-334504558
https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L302-L305

thresh is the minimum IoU above which delta_region_class() should be used during training: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L235


  5. object_scale, noobject_scale, class_scale, coord_scale values - all used for training:
     • object_scale - used in the loss (delta, cost, ...) function for objects: https://github.com/AlexeyAB/darknet/issues/185#issuecomment-334504558
     • noobject_scale - used in the loss (delta, cost, ...) function for objects and backgrounds: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L232-L233
     • class_scale - used as the scale in delta_region_class(): https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L108
     • coord_scale - used as the scale in delta_region_box(): https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L87

  6. absolute - isn't used
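
To make items 1-6 concrete, below is an illustrative [net]/[region] fragment in the spirit of the stock yolov2 VOC cfg. The numbers are example values rather than recommendations, the anchors are just one possible set, and the comments restate the explanations above:

[net]
# random saturation/exposure factor in [1/1.5, 1.5], random hue shift in [-0.1, +0.1]
saturation = 1.5
exposure = 1.5
hue = .1
# lr = 0.001 up to iteration 40000, then 0.001*0.1 = 0.0001, then 0.00001
learning_rate = 0.001
steps = 40000,45000
scales = .1,.1

[region]
# anchors: frequent initial (width, height) pairs on the 13x13 output grid
anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
bias_match = 1
# jitter: crop up to ~30% for augmentation; rescore: selects the objectness delta;
# thresh: minimum IoU for delta_region_class()
jitter = .3
rescore = 1
thresh = .6
# per-term loss weights; absolute isn't used
object_scale = 5
noobject_scale = 1
class_scale = 1
coord_scale = 1
absolute = 1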

Hi @AlexeyAB, I didn't get what object_scale does in the link you mentioned (#185) or how to set it. To be honest, I don't have much of a clue about the other ones either (noobject_scale, class_scale, coord_scale), but I have a feeling that this object_scale parameter is more important! Is it in any way related to other parameters such as the number of classes, etc.?

How do I change the number of iterations after which weights are saved?

If you mean the interval of 100 iterations at which a snapshot is written, it is hardcoded here: https://github.com/AlexeyAB/darknet/blob/master/src/detector.c#L202
If you mean the total number of iterations (when training stops and blabla_final.weights is created), that is the max_batches parameter.

@IlyaOvodov In the link mentioned above, which line should I change? Suppose I want to look at the weights after 10 iterations.

What does
[route]
layers=-9

[reorg]
stride=2 and

[route]
layers=-1,-4 mean?

Can anyone please help me out?

@dfsaw

  1. [route] layer - the same as a Concat layer in Caffe.
    layers=-1, -4 means that two layers, with relative indices -1 and -4, will be concatenated.

  2. [reorg] layer - just reshapes the feature map: it decreases the spatial size and increases the number of channels, without changing the elements.
    stride=2 means that the width and height will be decreased by 2 times, and the number of channels will be increased by 2x2 = 4 times, so the total number of elements will still be the same:
    width_old*height_old*channels_old = width_new*height_new*channels_new


For example:

  • If we use [route] layers=-1, we simply take as input the result of the preceding layer (current_layer_number-1), without any processing.
  • If we use [route] layers=-2, we take as input the result of the layer with index = (current_layer_number-2), without any processing.
  • If we use [route] layers=-1, -3, we take as input the results of the layers with indices (current_layer_number-1) and (current_layer_number-3), and merge them into one layer.

  • If at layer 27 we have [route] layers=-1, -3, then it will take the two layers 26=(27-1) and 24=(27-3) and merge them in depth: 13x13x1024 + 13x13x2048 = 13x13x3072 - this is the output of layer 27.
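
Reading the fragment from the original question in the same way (the feature-map sizes in the comments are illustrative and assume that a 26x26x512 map is being routed, as in the example above):

[route]
# Identity: take the output of the layer 9 positions back, e.g. 26 x 26 x 512
layers=-9

[reorg]
# reshape: 26 x 26 x 512 -> 13 x 13 x (512*4) = 13 x 13 x 2048, same number of elements
stride=2

[route]
# Concat in depth: previous layer (13 x 13 x 2048) + the layer 4 positions back
# (e.g. 13 x 13 x 1024) -> 13 x 13 x 3072
layers=-1,-4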


If I have 7 classes, should I change classes only in yolo-obj.cfg? Are there any other files I should be changing?

@dfsaw Change classes= in 3 [yolo]-layers and filters= in 3 [convolutional]-layers.
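
For example, for classes=7 in a standard 3-mask yolov3 cfg, each of the three [yolo] sections and the [convolutional] section directly above it would contain lines like these (filters = (classes + 5) * 3, as in the README):

[convolutional]
size=1
stride=1
pad=1
# (7 + 5) * 3 = 36
filters=36
activation=linear

[yolo]
classes=7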

Read carefully: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

  1. Create file obj.data in the directory build\darknet\x64\data\, containing (where classes = number of objects):
classes= 2
train  = data/train.txt
valid  = data/test.txt
names = data/obj.names
backup = backup/
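
For completeness, a minimal sketch of the two files that obj.data points to, assuming the 2-class example above; the class names and image paths are placeholders:

data/obj.names - one class name per line, in the same order as the class ids used in the label files:
person
car

data/train.txt - one image path per line (each image has a .txt label file with the same name next to it):
data/obj/img_0001.jpg
data/obj/img_0002.jpg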

@AlexeyAB I am able to predict on images using the weights created after the 2nd iteration, but after that none of the weights predict anything. Can someone please help me out?

@AlexeyAB In yolo_layer.c there is
if (best_iou > l.truth_thresh)
but in cfg the yolo layer has

ignore_thresh = .7
truth_thresh = 1
  1. so the if statement will never be reached?
  2. and could I say "if the best IoU of an object > ignore_thresh, then yolo takes it as detected and its loss will be ignored"?

@hemp110 Currently it will never be reached. It is just for experiments.

and could I say "if the best IoU of an object > ignore_thresh, then yolo takes it as detected and its loss will be ignored"?

Yes, then objectness will not be decreased.
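
A simplified paraphrase of that logic in yolo_layer.c (a sketch, not the verbatim source):

// best_iou is the best IoU of this predicted box over all ground-truth boxes
if (best_iou > l.ignore_thresh) {
    l.delta[obj_index] = 0;          // overlap is good enough: don't push objectness down
}
if (best_iou > l.truth_thresh) {     // with truth_thresh=1 in the cfg this branch never fires
    // would treat the box as a positive match: push objectness up and
    // compute the class and coordinate deltas for that ground truth
}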

@AlexeyAB Just for confirmation: in yolov3.cfg, are width and height the image width and height, or the bounding box width and height? To my understanding it must be the image width and height, since the bounding box dimensions change in every image we use for training.

Also, here https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects you mention batch=64, but shouldn't that depend on the number of training images we are using? I read somewhere that batches are actually the number of training images, so I'm just confirming.

@Eyshika

@AlexeyAB Just for confirmation: in yolov3.cfg, are width and height the image width and height, or the bounding box width and height? To my understanding it must be the image width and height, since the bounding box dimensions change in every image we use for training.

This is neither the image width nor the bounding box width.

width= and height= in yolov3.cfg are the size of the neural network input. Any image will be automatically resized to this size (width x height) during training or detection. Only after that is the resized image passed to the neural network.

@AlexeyAB So during testing, will the result be shown at the original image size, with the bounding boxes in the correct positions?

@Eyshika Yes. All these things are automatic and always correct.

Hey @AlexeyAB, does the route layer copy the output of some prior layer, or does it simply reference the output weights? If so, does the momentum-based gradient optimization update a copy of the weights or the original?

@MarquiseRosier

The original weights will be updated. The delta will be the sum of the route delta and the current layer delta.
The route layer updates the original weights: https://github.com/AlexeyAB/darknet/blob/6682f0b98984e2b92049e985b21ed81b76666565/src/route_layer.c#L123-L131
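
Roughly what the linked backward_route_layer() does (a simplified sketch with the batch loop omitted; not the exact source):

// the route layer has no weights of its own; during backprop it just adds
// its own delta back into the delta buffer of every layer it routed from
int offset = 0;
for (int i = 0; i < l.n; ++i) {                        // each routed input layer
    int index = l.input_layers[i];                     // absolute index of that layer
    float *src_delta = state.net.layers[index].delta;  // its gradient buffer
    int input_size = l.input_sizes[i];
    // axpy_cpu(n, a, x, incx, y, incy) computes y += a*x element-wise
    axpy_cpu(input_size, 1, l.delta + offset, 1, src_delta, 1);
    offset += input_size;
}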

@AlexeyAB You are amazing! Thank you :)

Hey @AlexeyAB, I have some YOLO problems to ask you about. I use yolov3-voc to train on car plate images. The training images are 4192x3264, and in the training cfg I set height and width to 416x416. After training, when I test on the training images, it can detect the label I trained. However, when I test on panorama images of size 8192x4096, I found that it can't detect any car plate labels. I want to ask what the problem could be. Sorry to bother you with this; thank you for your help!

@WEITINGLIN32

It seems this rule is broken: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

General rule - your training dataset should include such a set of relative sizes of objects that you want to detect:

train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width

In your case you should change the network resolution after training.

What is the average size of objects

  • in Training dataset?
  • in Detection dataset?

Then calculate new width= in cfg-file:
detection_network_width =

train_network_width * train_obj_width / train_image_width / (detection_obj_width / detection_image_width) =

416 * average_train_obj_width / 4192 / (average_detection_obj_width / 8192) = ???
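
As a purely hypothetical example (the object widths below are made up, just to show the arithmetic): if a plate were on average about 400 px wide in the 4192 px training images and about 200 px wide in the 8192 px panoramas, then

detection_network_width = 416 * (400 / 4192) / (200 / 8192)
                        ≈ 416 * 0.0954 / 0.0244
                        ≈ 1626  ->  round to a multiple of 32, e.g. width=1632

(network width and height must be multiples of 32).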

Hello @AlexeyAB, so now I have to edit the width and height in the cfg-file? And I want to ask how I can calculate average_train_obj_width and average_detection_obj_width. Also, do I have to retrain my model or not? If not, what should I do to detect panorama images? Thanks a lot!

@WEITINGLIN32

So now I have to edit the width and height in the cfg-file?

Yes.

And I want to ask how I can calculate average_train_obj_width and average_detection_obj_width.

The simplest way to get the average width of objects is to calculate 1 anchor that will not be used in the cfg-file:
./darknet detector calc_anchors data/obj.data -num_of_clusters 1 -width 416 -height 416

I.e. calculate 1 anchor

  • for the Training dataset (this gives average_train_obj_width and average_train_obj_height)
  • and for the Test dataset (this gives average_detection_obj_width and average_detection_obj_height)
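
One way to do that (a sketch: obj_train.data and obj_test.data are hypothetical copies of obj.data whose train= line points at train.txt and test.txt respectively):

./darknet detector calc_anchors data/obj_train.data -num_of_clusters 1 -width 416 -height 416
./darknet detector calc_anchors data/obj_test.data -num_of_clusters 1 -width 416 -height 416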

Please explain the parameters in the classifier cfg file:
@anandkoirala @AlexeyAB
[softmax]
groups=1 ---- ?
temperature=3 ------?

Hi @AlexeyAB
Could you please explain how the CNN detects the bounding box coordinates, objectness score and probability of the object in YOLO.

Hi @AlexeyAB,

I know you already explained how some of the YOLO layer parameters work, but there are some that I'm still missing.
Can you please explain the following?

  • mask
  • anchors
  • num
  • ignore_thresh
  • truth_thresh
  • random

I'll really appreciate the help! Cheers.

@AlexeyAB thanks a lot! Cheers!

Hey @AlexeyAB, does the route layer work for concatenating three layers, something like this:
[route]
layers=-1,-4,-3
Can this be done? If not, how do I concatenate three different layers?

Thanks,
Madan

@Madankumar90 [route] is a concatenation layer: Concat for several input layers, or Identity for a single input layer.

More: https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-different-layers

Yeah, I saw this late. Anyway, thank you for the quick reply.

I understand the range for hue is 0.0 to 1.0. But what about saturation and exposure? I assume that setting both of these to 0.0 turns it off, but what is the upper range?

@stephanecharette
Default values:

hue=0
exposure=1
saturation=1

hue=0.3 - means hue from -0.3 to +0.3
exposure=1.5 - means exposure from 1/1.5 to 1*1.5
saturation=1.5 - means saturation from 1/1.5 to 1*1.5

how it will be calculated: https://github.com/AlexeyAB/darknet/blob/2116cba1ed123b38b432d7d8b9a2c235372fd553/src/data.c#L1017-L1019

how it will be applied: https://github.com/AlexeyAB/darknet/blob/2116cba1ed123b38b432d7d8b9a2c235372fd553/src/image_opencv.cpp#L1183-L1198
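
The gist of the linked code (a simplified paraphrase, not the exact source): the hue shift is drawn uniformly in [-hue, +hue], while saturation and exposure are drawn as a random factor that is equally likely to be used as s or 1/s:

// per image, during data loading:
float dhue = rand_uniform(-hue, hue);   // hue=0.1 -> shift in [-0.1, +0.1]
float dsat = rand_scale(saturation);    // saturation=1.5 -> factor in [1/1.5, 1.5]
float dexp = rand_scale(exposure);      // exposure=1.5 -> factor in [1/1.5, 1.5]

// rand_scale(s): pick a factor in [1, s], then flip a coin to invert it
float rand_scale(float s)
{
    float scale = rand_uniform(1, s);
    if (rand() % 2) return scale;
    return 1.0f / scale;
}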

Thanks, @AlexeyAB. I don't think I explained my question very well. I'm trying to figure out what is the maximum range someone can set for these three values in the .cfg file.

  • Hue: valid range is 0.0 to 1.0 (I think, but I'm not 100% certain)
  • Saturation: valid range is 0.0 to ...?
  • Exposure: valid range is 0.0 to ...?

Hue from 0 to 1.0
Saturation from 0.003 to 256
Exposure from 0.003 to 256

I found 256 is extreme and probably unusable for most images. If anyone else reads this thread in the future, you can see the results of modifying hue, saturation, and exposure, with examples of what images look like at different values, here: https://www.ccoderun.ca/DarkMark/DataAugmentationColour.html

Hey @AlexeyAB, I'm sorry, but can you explain the output parameters? Namely iou_norm, cls_norm, scale_x_y, iou_loss and mse from

[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00

And BF from
96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
