Darknet: explanation on cfg file parameters

Created on 26 Nov 2017 · 37 comments · Source: AlexeyAB/darknet

Hi @AlexeyAB
Could you please kindly document or explain the following parameters of the .cfg file:

  1. saturation, exposure and hue values
  2. steps and scales values
  3. anchors, bias_match
  4. jitter, rescore, thresh
  5. object_scale, noobject_scale, class_scale, coord_scale values
  6. absolute
Explanations

All 37 comments

Hi,

  1. saturation, exposure and hue values - ranges for random changes to the colours of images during training (data augmentation parameters), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV
    The larger the value, the more invariant the neural network becomes to changes in the lighting and colour of the objects.

  2. steps and scales values - steps is a list of checkpoints (iteration numbers) at which the scales are applied; scales is a list of coefficients by which the learning_rate is multiplied at those checkpoints.
    Together they determine how the learning_rate changes as the number of training iterations grows (see the example cfg fragment at the end of this list).

  3. anchors, bias_match
    anchors are the most frequent initial sizes (width, height) of objects, expressed in terms of the output network resolution.
    bias_match is used only for training: if bias_match=1 then a detected object will have the same size as one of the anchors, and if bias_match=0 then the anchor size will be refined by the neural network: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L275-L283
    If you train with height=416, width=416, random=0, then the max values of the anchors will be 13,13.
    But if you train with random=1, then the max input resolution can be 608x608, and the max values of the anchors can be 19,19.

  4. jitter, rescore, thresh
    jitter can be in [0-1] and is used to crop images during training for data augmentation. The larger the value of jitter, the more invariant the neural network becomes to changes in the size and aspect ratio of the objects: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/data.c#L513-L528

rescore determines which loss (delta, cost, ...) function will be used - more about this: https://github.com/AlexeyAB/darknet/issues/185#issuecomment-334504558
https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L302-L305

thresh is the minimum IoU above which delta_region_class() should be used during training: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L235


  5. object_scale, noobject_scale, class_scale, coord_scale values - all used for training:
     • object_scale - used in the loss (delta, cost, ...) function for objects: https://github.com/AlexeyAB/darknet/issues/185#issuecomment-334504558
     • noobject_scale - used in the loss (delta, cost, ...) function for objects and backgrounds: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L232-L233
     • class_scale - used as the scale in delta_region_class(): https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L108
     • coord_scale - used as the scale in delta_region_box(): https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L87

  6. absolute - isn't used
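
To make items 1-6 concrete, below is an illustrative [net]/[region] fragment in the spirit of the stock yolov2 VOC cfg. The numbers are example values rather than recommendations, the anchors are just one possible set, and the comments restate the explanations above:

[net]
# random saturation/exposure factor in [1/1.5, 1.5], random hue shift in [-0.1, +0.1]
saturation = 1.5
exposure = 1.5
hue = .1
# lr = 0.001 up to iteration 40000, then 0.001*0.1 = 0.0001, then 0.00001
learning_rate = 0.001
steps = 40000,45000
scales = .1,.1

[region]
# anchors: frequent initial (width, height) pairs on the 13x13 output grid
anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
bias_match = 1
# jitter: crop up to ~30% for augmentation; rescore: selects the objectness delta;
# thresh: minimum IoU for delta_region_class()
jitter = .3
rescore = 1
thresh = .6
# per-term loss weights; absolute isn't used
object_scale = 5
noobject_scale = 1
class_scale = 1
coord_scale = 1
absolute = 1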

Hi @AlexeyAB, I didn't get what object_scale does in the link you mentioned (#185) or how to set it. To be honest, I don't have much of a clue about the other ones either (noobject_scale, class_scale, coord_scale), but I have a feeling that this object_scale parameter is more important! Is it in any way related to other parameters such as the number of classes, etc.?

How do I change the number of iterations after which weights are saved?

If you mean the interval of 100 iterations at which a snapshot is written, it is hardcoded here: https://github.com/AlexeyAB/darknet/blob/master/src/detector.c#L202
If you mean the total number of iterations (when training stops and blabla_final.weights is created), that is the max_batches parameter.

@IlyaOvodov In the link mentioned above, which line should I change? Suppose I want to look at the weights after 10 iterations.

What does
[route]
layers=-9

[reorg]
stride=2 and

[route]
layers=-1,-4 mean?

Can anyone please help me out?

@dfsaw

  1. [route] layer - the same as a Concat layer in Caffe.
    layers=-1, -4 means that two layers, with relative indices -1 and -4, will be concatenated.

  2. [reorg] layer - just reshapes the feature map: it decreases the spatial size and increases the number of channels, without changing the elements.
    stride=2 means that the width and height will be decreased by 2 times, and the number of channels will be increased by 2x2 = 4 times, so the total number of elements will still be the same:
    width_old*height_old*channels_old = width_new*height_new*channels_new


For example:

  • If we use [route] layers=-1, we simply take as input the result of the preceding layer (current_layer_number-1), without any processing.
  • If we use [route] layers=-2, we take as input the result of the layer with index = (current_layer_number-2), without any processing.
  • If we use [route] layers=-1, -3, we take as input the results of the layers with indices (current_layer_number-1) and (current_layer_number-3), and merge them into one layer.

  • If at layer 27 we have [route] layers=-1, -3, then it will take the two layers 26=(27-1) and 24=(27-3) and merge them in depth: 13x13x1024 + 13x13x2048 = 13x13x3072 - this is the output of layer 27.
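
Reading the fragment from the original question in the same way (the feature-map sizes in the comments are illustrative and assume that a 26x26x512 map is being routed, as in the example above):

[route]
# Identity: take the output of the layer 9 positions back, e.g. 26 x 26 x 512
layers=-9

[reorg]
# reshape: 26 x 26 x 512 -> 13 x 13 x (512*4) = 13 x 13 x 2048, same number of elements
stride=2

[route]
# Concat in depth: previous layer (13 x 13 x 2048) + the layer 4 positions back
# (e.g. 13 x 13 x 1024) -> 13 x 13 x 3072
layers=-1,-4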


If I have 7 classes, should I change classes only in yolo-obj.cfg? Are there any other files I should be changing?

@dfsaw Change classes= in 3 [yolo]-layers and filters= in 3 [convolutional]-layers.
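
For example, for classes=7 in a standard 3-mask yolov3 cfg, each of the three [yolo] sections and the [convolutional] section directly above it would contain lines like these (filters = (classes + 5) * 3, as in the README):

[convolutional]
size=1
stride=1
pad=1
# (7 + 5) * 3 = 36
filters=36
activation=linear

[yolo]
classes=7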

Read carefully: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

  1. Create file obj.data in the directory build\darknet\x64\data\, containing (where classes = number of objects):
classes= 2
train  = data/train.txt
valid  = data/test.txt
names = data/obj.names
backup = backup/
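
For completeness, a minimal sketch of the two files that obj.data points to, assuming the 2-class example above; the class names and image paths are placeholders:

data/obj.names - one class name per line, in the same order as the class ids used in the label files:
person
car

data/train.txt - one image path per line (each image has a .txt label file with the same name next to it):
data/obj/img_0001.jpg
data/obj/img_0002.jpg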

@AlexeyAB I am able to predict on images using the weights created after the 2nd iteration, but after that none of the weights predict anything. Can someone please help me out?

@AlexeyAB In yolo_layer.c there is
if (best_iou > l.truth_thresh)
but in cfg the yolo layer has

ignore_thresh = .7
truth_thresh = 1
  1. so the if statement will never be reached?
  2. and could I say "if the best IoU of an object > ignore_thresh, then yolo takes it as detected and its loss will be ignored"?

@hemp110 Currently it will never be reached. It is just for experiments.

and could I say "if the best IoU of an object > ignore_thresh, then yolo takes it as detected and its loss will be ignored"?

Yes, then objectness will not be decreased.
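
A simplified paraphrase of that logic in yolo_layer.c (a sketch, not the verbatim source):

// best_iou is the best IoU of this predicted box over all ground-truth boxes
if (best_iou > l.ignore_thresh) {
    l.delta[obj_index] = 0;          // overlap is good enough: don't push objectness down
}
if (best_iou > l.truth_thresh) {     // with truth_thresh=1 in the cfg this branch never fires
    // would treat the box as a positive match: push objectness up and
    // compute the class and coordinate deltas for that ground truth
}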

@AlexeyAB Just for confirmation: in yolov3.cfg, are width and height the image width and height, or the bounding box width and height? To my understanding it must be the image width and height, since the bounding box dimensions change in every image we use for training.

Also, here https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects you mention batch=64, but shouldn't that depend on the number of training images we are using? I read somewhere that batches are actually the number of training images, so I'm just confirming.

@Eyshika

@AlexeyAB Just for confirmation: in yolov3.cfg, are width and height the image width and height, or the bounding box width and height? To my understanding it must be the image width and height, since the bounding box dimensions change in every image we use for training.

This is neither the image width nor the bounding box width.

width= and height= in yolov3.cfg are the size of the neural network input. Any image will be automatically resized to this size (width x height) during training or detection. Only after that is the resized image passed to the neural network.

@AlexeyAB So during testing, will the result be shown at the original image size, with the bounding boxes in the correct positions?

@Eyshika Yes. All these things are automatic and always correct.

Hey @AlexeyAB, does the route layer copy the output of some prior layer, or does it simply reference the output weights? If so, does the momentum-based gradient optimization update a copy of the weights or the original?

@MarquiseRosier

The original weights will be updated. The delta will be the sum of the route delta and the current layer delta.
The route layer updates the original weights: https://github.com/AlexeyAB/darknet/blob/6682f0b98984e2b92049e985b21ed81b76666565/src/route_layer.c#L123-L131
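
Roughly what the linked backward_route_layer() does (a simplified sketch with the batch loop omitted; not the exact source):

// the route layer has no weights of its own; during backprop it just adds
// its own delta back into the delta buffer of every layer it routed from
int offset = 0;
for (int i = 0; i < l.n; ++i) {                        // each routed input layer
    int index = l.input_layers[i];                     // absolute index of that layer
    float *src_delta = state.net.layers[index].delta;  // its gradient buffer
    int input_size = l.input_sizes[i];
    // axpy_cpu(n, a, x, incx, y, incy) computes y += a*x element-wise
    axpy_cpu(input_size, 1, l.delta + offset, 1, src_delta, 1);
    offset += input_size;
}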

@AlexeyAB You are amazing! Thank you :)

Hey @AlexeyAB, I have some YOLO problems to ask you about. I use yolov3-voc to train on car plate images. The training images are 4192x3264, and in the training cfg I set height and width to 416x416. After training, when I test on the training images, it can detect the label I trained. However, when I test on panorama images of size 8192x4096, I found that it can't detect any car plate labels. I want to ask what the problem could be. Sorry to bother you with this; thank you for your help!

@WEITINGLIN32

It seems this rule is broken: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

General rule - your training dataset should include such a set of relative sizes of objects that you want to detect:

train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width

In your case you should change the network resolution after training.

What is the average size of objects

  • in Training dataset?
  • in Detection dataset?

Then calculate new width= in cfg-file:
detection_network_width =

train_network_width * train_obj_width / train_image_width / (detection_obj_width / detection_image_width) =

416 * average_train_obj_width / 4192 / (average_detection_obj_width / 8192) = ???
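
As a purely hypothetical example (the object widths below are made up, just to show the arithmetic): if a plate were on average about 400 px wide in the 4192 px training images and about 200 px wide in the 8192 px panoramas, then

detection_network_width = 416 * (400 / 4192) / (200 / 8192)
                        ≈ 416 * 0.0954 / 0.0244
                        ≈ 1626  ->  round to a multiple of 32, e.g. width=1632

(network width and height must be multiples of 32).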

Hello @AlexeyAB, so now I have to edit the width and height in the cfg-file? And I want to ask how I can calculate average_train_obj_width and average_detection_obj_width. Also, do I have to retrain my model or not? If not, what should I do to detect panorama images? Thanks a lot!

@WEITINGLIN32

So now I have to edit the width and height in the cfg-file?

Yes.

And I want to ask how I can calculate average_train_obj_width and average_detection_obj_width.

The simplest way to get the average width of objects is to calculate 1 anchor that will not be used in the cfg-file:
./darknet detector calc_anchors data/obj.data -num_of_clusters 1 -width 416 -height 416

I.e. calculate 1 anchor

  • for the Training dataset (this gives average_train_obj_width and average_train_obj_height)
  • and for the Test dataset (this gives average_detection_obj_width and average_detection_obj_height)
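
One way to do that (a sketch: obj_train.data and obj_test.data are hypothetical copies of obj.data whose train= line points at train.txt and test.txt respectively):

./darknet detector calc_anchors data/obj_train.data -num_of_clusters 1 -width 416 -height 416
./darknet detector calc_anchors data/obj_test.data -num_of_clusters 1 -width 416 -height 416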

Please explain the parameters in the classifier cfg file:
@anandkoirala @AlexeyAB
[softmax]
groups=1 ---- ?
temperature=3 ------?

Hi @AlexeyAB
Could you please explain how the CNN detects the bounding box coordinates, objectness score and probability of the object in YOLO.

Hi @AlexeyAB,

I know you already explained how some of the YOLO layer parameters work, but there are some that I'm still missing.
Can you please explain the following?

  • mask
  • anchors
  • num
  • ignore_thresh
  • truth_thresh
  • random

I'll really appreciate the help! Cheers.

@AlexeyAB thanks a lot! Cheers!

Hey @AlexeyAB, does the route layer work for concatenating three layers, something like this:
[route]
layers=-1,-4,-3
Can this be done? If not, how do I concatenate three different layers?

Thanks,
Madan

@Madankumar90 [route] is a concatenation layer: Concat for several input layers, or Identity for a single input layer.

More: https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-different-layers

Yeah, I saw this late. Anyway, thank you for the quick reply.

I understand the range for hue is 0.0 to 1.0. But what about saturation and exposure? I assume that setting both of these to 0.0 turns it off, but what is the upper range?

@stephanecharette
Default values:

hue=0
exposure=1
saturation=1

hue=0.3 - means hue from -0.3 to +0.3
exposure=1.5 - means exposure from 1/1.5 to 1*1.5
saturation=1.5 - means saturation from 1/1.5 to 1*1.5

how it will be calculated: https://github.com/AlexeyAB/darknet/blob/2116cba1ed123b38b432d7d8b9a2c235372fd553/src/data.c#L1017-L1019

how it will be applied: https://github.com/AlexeyAB/darknet/blob/2116cba1ed123b38b432d7d8b9a2c235372fd553/src/image_opencv.cpp#L1183-L1198
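
The gist of the linked code (a simplified paraphrase, not the exact source): the hue shift is drawn uniformly in [-hue, +hue], while saturation and exposure are drawn as a random factor that is equally likely to be used as s or 1/s:

// per image, during data loading:
float dhue = rand_uniform(-hue, hue);   // hue=0.1 -> shift in [-0.1, +0.1]
float dsat = rand_scale(saturation);    // saturation=1.5 -> factor in [1/1.5, 1.5]
float dexp = rand_scale(exposure);      // exposure=1.5 -> factor in [1/1.5, 1.5]

// rand_scale(s): pick a factor in [1, s], then flip a coin to invert it
float rand_scale(float s)
{
    float scale = rand_uniform(1, s);
    if (rand() % 2) return scale;
    return 1.0f / scale;
}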

Thanks, @AlexeyAB. I don't think I explained my question very well. I'm trying to figure out what is the maximum range someone can set for these three values in the .cfg file.

  • Hue: valid range is 0.0 to 1.0 (I think, but I'm not 100% certain)
  • Saturation: valid range is 0.0 to ...?
  • Exposure: valid range is 0.0 to ...?

Hue from 0 to 1.0
Saturation from 0.003 to 256
Exposure from 0.003 to 256

I found 256 is extreme and probably unusable for most images. If anyone else reads this thread in the future, you can see the results of modifying hue, saturation, and exposure, with examples of what images look like at different values, here: https://www.ccoderun.ca/DarkMark/DataAugmentationColour.html

Hey @AlexeyAB, I'm sorry, but can you explain the output parameters? Namely iou_norm, cls_norm, scale_x_y, iou_loss and mse from

[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00

And BF from
96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
