Hi
I'm training a tiny yolov2 to detect 4 classes.
Without setting random=1, I'm getting an IOU of around 57% and an mAP of around 58% on my validation set. The Region Avg IOU printed for the subdivisions during training is also around 58%, so I assume there is no overfitting, since validation accuracy increases along with training accuracy: https://bit.ly/2HLEWff (I'm getting the red and green lines). My input resolution is 960x256.
Now I wanted to improve my accuracy further, so I checked https://bit.ly/2FHZxez and set random=1. However, after 13000 iterations my validation accuracy (both IOU and mAP) is still 0. But I'm not getting NaN during training - training seems to be going well. What could cause this? Should I maybe not start training from the beginning, but rather take checkpoint weights from my past training where random was set to 0?
I also read about the parameters small_object=1 and max=200 - some of my images do have quite small objects, so maybe setting small_object=1 would help? However, I'm confused about where I should set it: in [net] at the beginning, or in [region] at the end? Also, what exactly does max=200 do? Unfortunately, it isn't explained in detail. Some of my images can contain up to around 30 objects.
Also regarding mAP: how can I calculate mAP on the training set? During training there is no output for mAP, for example:
Region Avg IOU: 0.660574, Class: 0.980628, Obj: 0.606347, No Obj: 0.007689, Avg Recall: 0.808511, count: 94
But can I calculate it from the values given?
Thanks in advance!
@getaleks Hi,
how can I calculate mAP on the training set?
Set valid=train.txt in your obj.data file and run darknet detector map ...
small_object=1 - this is just an experimental param. Don't use it.
max=200 - if your training images contain more than 30 objects, you can set max=100 or max=200 in the [region] layer (or in all of the [yolo] layers)
Should I maybe not start training from the beginning but rather take checkpoint weights from my past training where random was set to 0?
You can use both ways.
Now I wanted to improve my accuracy further, so I checked https://bit.ly/2FHZxez and set random=1. However, after 13000 iterations my validation accuracy (both IOU and mAP) is still 0. But I'm not getting NaN during training - training seems to be going well. What could cause this? Should I maybe not start training from the beginning, but rather take checkpoint weights from my past training where random was set to 0?
Can you show the output - what mAP and other metrics do you get on the validation dataset and on the training dataset with weights trained with random=1?
What params in the Makefile do you use?
Do you use 1 GPU or many GPUs for training?
Hi @AlexeyAB
thanks for the quick response!
max=200 - if your training images contain more than 30 objects, you can set max=100 or max=200 in the [region] layer (or in all of the [yolo] layers)
What exactly does the max variable do? How can it be beneficial?
Can you show the output - what mAP and other metrics do you get on the validation dataset and on the training dataset with weights trained with random=1?
Validation Set Output:
detections_count = 307, unique_truth_count = 5053
class_id = 0, name = class1, ap = 0.00 %
class_id = 1, name = class2, ap = 0.00 %
class_id = 2, name = class3, ap = 0.00 %
class_id = 3, name = class4, ap = 0.00 %
for thresh = 0.25, precision = -nan, recall = 0.00, F1-score = -nan
for thresh = 0.25, TP = 0, FP = 0, FN = 5053, average IoU = 0.00 %
mean average precision (mAP) = 0.000000, or 0.00 %
Total Detection Time: 7.000000 Seconds
Training Set output:
detections_count = 1513, unique_truth_count = 20682
class_id = 0, name = class1, ap = 0.00 %
class_id = 1, name = class2, ap = 0.00 %
class_id = 2, name = class3, ap = 0.00 %
class_id = 3, name = class4, ap = 0.00 %
for thresh = 0.25, precision = 0.00, recall = 0.00, F1-score = -nan
for thresh = 0.25, TP = 0, FP = 5, FN = 20682, average IoU = 0.00 %
mean average precision (mAP) = 0.000000, or 0.00 %
Total Detection Time: 24.000000 Seconds
In the terminal output I'm not getting any NANs, training seems to be going well, Loss function average is <0.5. Region Avg IOUs are between 0.5-0.7 for the subdivisions.
I'm training with Batch=64 and subdivisions=16 when I set random=1. When random=0 then Batch=64, subdivisions=8.
What params in the Makefile do you use?
GPU=1
CUDNN=1
CUDNN_HALF=0
OPENCV=0
AVX=0
OPENMP=0
LIBSO=0
DEBUG=0
ARCH= -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=[sm_50,compute_50] \
-gencode arch=compute_52,code=[sm_52,compute_52] \
-gencode arch=compute_61,code=[sm_61,compute_61]
Do you use 1 GPU or many GPUs for training?
I used 1 GPU; however, I have the option to use up to 4 if necessary. All GPUs are Titan Xs.
What exactly does the max variable do? How can it be beneficial?
By default max = 30, so if your training image has more than 30 labels, only the first 30 labels will be used and the rest will be ignored - thus the network will learn not to detect those objects, which is extremely bad for learning: https://github.com/AlexeyAB/darknet/blob/89354d0a0ce6fbb22ff262658045cdb8796ff6fd/src/region_layer.c#L225
Ok, I will definitely try setting max=100 then. Thank you for the explanation.
Show your training output for random=1
Here is the output for one iteration:
10944: 0.698326, 0.606800 avg, 0.000100 rate, 4.247402 seconds, 700416 images
Loaded: 0.000041 seconds
Region Avg IOU: 0.538765, Class: 0.949305, Obj: 0.381784, No Obj: 0.002311, Avg Recall: 0.622642, count: …
Region Avg IOU: 0.608241, Class: 0.997215, Obj: 0.517079, No Obj: 0.002347, Avg Recall: 0.727273, count: …
Region Avg IOU: 0.542258, Class: 0.925931, Obj: 0.412059, No Obj: 0.002720, Avg Recall: 0.627119, count: …
Region Avg IOU: 0.657510, Class: 0.994819, Obj: 0.516037, No Obj: 0.002527, Avg Recall: 0.823529, count: …
Region Avg IOU: 0.805646, Class: 0.999609, Obj: 0.753932, No Obj: 0.001953, Avg Recall: 1.000000, count: …
Region Avg IOU: 0.687529, Class: 0.978422, Obj: 0.585691, No Obj: 0.002423, Avg Recall: 0.852941, count: …
Region Avg IOU: 0.762361, Class: 0.999753, Obj: 0.532092, No Obj: 0.001657, Avg Recall: 0.894737, count: …
Region Avg IOU: 0.720794, Class: 0.957670, Obj: 0.709484, No Obj: 0.001788, Avg Recall: 0.950000, count: …
Region Avg IOU: 0.573824, Class: 0.919283, Obj: 0.439784, No Obj: 0.002074, Avg Recall: 0.687500, count: …
Region Avg IOU: 0.524779, Class: 0.936893, Obj: 0.372699, No Obj: 0.002111, Avg Recall: 0.658537, count: …
Region Avg IOU: 0.671374, Class: 0.945988, Obj: 0.511913, No Obj: 0.002412, Avg Recall: 0.783784, count: …
Region Avg IOU: 0.721791, Class: 0.964815, Obj: 0.608239, No Obj: 0.001687, Avg Recall: 0.866667, count: …
Region Avg IOU: 0.692922, Class: 0.970347, Obj: 0.471404, No Obj: 0.002033, Avg Recall: 0.863636, count: …
Region Avg IOU: 0.548665, Class: 0.979806, Obj: 0.343780, No Obj: 0.002244, Avg Recall: 0.651163, count: …
Region Avg IOU: 0.714786, Class: 0.994679, Obj: 0.518827, No Obj: 0.002070, Avg Recall: 0.812500, count: …
Region Avg IOU: 0.770224, Class: 0.999746, Obj: 0.680687, No Obj: 0.002476, Avg Recall: 0.941176, count: …
What software did you use for labeling? Check your dataset using Yolo_mark.
I used LabelMe and my own script to convert to Yolo format. I built in a feature to visualize the annotations that will be written for each image. Also, training goes well with random=0 - so I think we can assume the data is ok. But I will try it with Yolo_mark.
@getaleks
Also, what width=, height= and jitter= do you use in your cfg-file?
@AlexeyAB
Also, what width=, height= and jitter= do you use in your cfg-file?
Here is my whole cfg-file:
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=960
height=256
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
max_batches = 300000
policy=steps
steps=100,2000,5000
scales=.1,10,.1,.1
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=1
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
###########
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear
[region]
anchors = 0.26,0.35, 0.42,0.55, 0.66,0.84, 1.08,1.34, 1.83,2.23
bias_match=1
classes=4
coords=4
num=5
softmax=1
jitter=.2
rescore=1
max=100
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=1
@getaleks
If random=1 and width=960 height=256,
then every 10 iterations the network resolution will be randomly changed to something in [800x800] - [1152x1152], i.e. with random=1 it will be trained using a square network, as in the original repository.
So if the network is trained with a square resolution (random=1) but you try to detect with a non-square resolution (960x256), it can work badly.
Try to change this code: https://github.com/AlexeyAB/darknet/blob/85388d67082d85999d86c9a66cbe0b56eaa0aeb6/src/detector.c#L123-L139
to this, recompile and train again:
printf("Resizing\n");
int dim_w = (rand() % 12 + (init_w / 32 - 5)) * 32; // +-160
int dim_h = (rand() % 12 + (init_h / 32 - 5)) * 32; // +-160
//if (get_current_batch(net)+100 > net.max_batches) dim = 544;
//int dim = (rand() % 4 + 16) * 32;
printf("%d x %d \n", dim_w, dim_h);
args.w = dim_w;
args.h = dim_h;
pthread_join(load_thread, 0);
train = buffer;
free_data(train);
load_thread = load_data(args);
for(i = 0; i < ngpus; ++i){
resize_network(nets + i, dim_w, dim_h);
}
net = nets[0];
If it helps you, I will add this fix to the repository.
@AlexeyAB
This is probably the issue. I didn't know it resizes to a square - the input images are probably extremely distorted. Especially because inference is done at the original 960x256 resolution, the network doesn't see the objects as it did during training.
I will make the changes, retrain and get back to you on how it went.
Hi @AlexeyAB
I implemented the changes and retrained for ~41000 iterations.
These are the results:
mAP (on validation set): 68.16%
IOU (on validation set): 66.69%
mAP (on training set): 71.14%
IOU (on training set): 68.84%
So that's an mAP and IOU boost of about 10%! This is quite impressive. Is there an explanation for why resizing the images boosts accuracy?
I will keep training until the accuracy values on the validation and training set start to significantly drift apart.
In your experience, how big of a gap between training and validation mAP is an optimal stopping point? For example, at the moment the gap is 2.98% - how big should I let it get? Is there a rule of thumb?
So I think you can implement this fix to darknet, it really helped:
printf("Resizing\n");
int dim_w = (rand() % 12 + (init_w / 32 - 5)) * 32; // +-160
int dim_h = (rand() % 12 + (init_h / 32 - 5)) * 32; // +-160
//if (get_current_batch(net)+100 > net.max_batches) dim = 544;
//int dim = (rand() % 4 + 16) * 32;
printf("%d x %d \n", dim_w, dim_h);
args.w = dim_w;
args.h = dim_h;
pthread_join(load_thread, 0);
train = buffer;
free_data(train);
load_thread = load_data(args);
for(i = 0; i < ngpus; ++i){
resize_network(nets + i, dim_w, dim_h);
}
net = nets[0];
@getaleks
So that's an mAP and IOU boost of about 10%! This is quite impressive. Is there an explanation for why resizing the images boosts accuracy?
Modern convolutional neural networks are not scale-invariant, i.e. if a neural network is trained only on objects with a size of 50x50 pixels, it can't detect objects whose sizes differ by more than ~30%.
So we should train the network on objects with different sizes - we should set:
jitter=0.4
random=1
I will keep training until the accuracy values on the validation and training set start to significantly drift apart.
In your experience, how big of a gap between training and validation mAP is an optimal stopping point? For example, at the moment the gap is 2.98% - how big should I let it get? Is there a rule of thumb?
This gap can vary a lot, 1 - 10%. There is another rule: as long as mAP on the validation dataset keeps increasing, you should keep training.
@getaleks
Also, you can try to train with the following code and compare with the previous results.
It will roughly preserve the aspect ratio of the network during resizing:
printf("Resizing\n");
int random_val = rand() % 12;
int dim_w = (random_val + (init_w / 32 - 5)) * 32; // +-160
int dim_h = (random_val + (init_h / 32 - 5)) * 32; // +-160
//if (get_current_batch(net)+100 > net.max_batches) dim = 544;
//int dim = (rand() % 4 + 16) * 32;
printf("%d x %d \n", dim_w, dim_h);
args.w = dim_w;
args.h = dim_h;
pthread_join(load_thread, 0);
train = buffer;
free_data(train);
load_thread = load_data(args);
for(i = 0; i < ngpus; ++i){
resize_network(nets + i, dim_w, dim_h);
}
net = nets[0];
@getaleks Hi, did you try the last changes?
Hi @AlexeyAB
Sorry for the late response.
I will implement your last suggestion in the coming days and get back to you.
Hi @AlexeyAB,
I let my network keep training. After several iterations I logged the following output:
Resizing
928 x 224
try to allocate workspace = 16777216 * sizeof(float), CUDA allocate done!
Loaded: 0.085420 seconds
Region Avg IOU: 0.565531, Class: 0.918954, Obj: 0.507883, No Obj: 0.010119, Avg Recall: 0.647059, count: 34
Region Avg IOU: 0.761860, Class: 0.998151, Obj: 0.643455, No Obj: 0.006509, Avg Recall: 0.882353, count: 17
Region Avg IOU: 0.534672, Class: 0.952310, Obj: 0.445837, No Obj: 0.008414, Avg Recall: 0.625000, count: 24
Region Avg IOU: 0.670565, Class: 0.998984, Obj: 0.577750, No Obj: 0.007548, Avg Recall: 0.774194, count: 31
Region Avg IOU: 0.647614, Class: 0.999028, Obj: 0.536284, No Obj: 0.010018, Avg Recall: 0.785714, count: 42
Region Avg IOU: 0.637582, Class: 0.999055, Obj: 0.621059, No Obj: 0.011510, Avg Recall: 0.793103, count: 29
Region Avg IOU: 0.764560, Class: 0.999518, Obj: 0.607268, No Obj: 0.006886, Avg Recall: 0.823529, count: 17
Region Avg IOU: 0.764177, Class: 0.999543, Obj: 0.623120, No Obj: 0.006899, Avg Recall: 0.875000, count: 16
Region Avg IOU: 0.568964, Class: 0.956178, Obj: 0.532808, No Obj: 0.008105, Avg Recall: 0.636364, count: 33
Region Avg IOU: 0.709501, Class: 0.999994, Obj: 0.653718, No Obj: 0.006137, Avg Recall: 0.857143, count: 14
Region Avg IOU: 0.918820, Class: 0.999981, Obj: 0.867690, No Obj: 0.003889, Avg Recall: 1.000000, count: 2
Region Avg IOU: 0.469352, Class: 0.970771, Obj: 0.477122, No Obj: 0.009719, Avg Recall: 0.526316, count: 38
Region Avg IOU: 0.603007, Class: 0.955491, Obj: 0.583474, No Obj: 0.009656, Avg Recall: 0.711111, count: 45
Region Avg IOU: 0.785822, Class: 0.999451, Obj: 0.764054, No Obj: 0.009295, Avg Recall: 0.960000, count: 25
Region Avg IOU: 0.756087, Class: 0.999966, Obj: 0.628357, No Obj: 0.006094, Avg Recall: 0.869565, count: 23
Region Avg IOU: 0.758601, Class: 0.999325, Obj: 0.686974, No Obj: 0.013243, Avg Recall: 0.918919, count: 37
303661: 0.424482, 0.431736 avg, 0.000100 rate, 0.690727 seconds, 19434304 images
Loaded: 0.000044 seconds
Region Avg IOU: 0.700450, Class: 0.999518, Obj: 0.670815, No Obj: 0.009634, Avg Recall: 0.846154, count: 26
Region Avg IOU: 0.642220, Class: 0.937568, Obj: 0.565064, No Obj: 0.008583, Avg Recall: 0.757576, count: 33
Region Avg IOU: 0.554659, Class: 0.999987, Obj: 0.575553, No Obj: 0.006808, Avg Recall: 0.647059, count: 17
Region Avg IOU: 0.717102, Class: 0.989687, Obj: 0.638753, No Obj: 0.008326, Avg Recall: 0.869565, count: 23
Region Avg IOU: 0.557611, Class: 0.839688, Obj: 0.551909, No Obj: 0.006702, Avg Recall: 0.600000, count: 25
Region Avg IOU: 0.584550, Class: 0.962364, Obj: 0.576944, No Obj: 0.009351, Avg Recall: 0.593750, count: 32
Region Avg IOU: 0.733921, Class: 0.998907, Obj: 0.611047, No Obj: 0.009998, Avg Recall: 0.866667, count: 30
Region Avg IOU: 0.738171, Class: 0.987796, Obj: 0.654399, No Obj: 0.009131, Avg Recall: 0.866667, count: 15
Region Avg IOU: 0.797160, Class: 0.999939, Obj: 0.764540, No Obj: 0.009239, Avg Recall: 1.000000, count: 15
Region Avg IOU: 0.771978, Class: 0.999933, Obj: 0.758914, No Obj: 0.004670, Avg Recall: 0.923077, count: 13
Region Avg IOU: 0.665634, Class: 0.999910, Obj: 0.562245, No Obj: 0.007331, Avg Recall: 0.842105, count: 19
Region Avg IOU: 0.721120, Class: 0.999046, Obj: 0.680478, No Obj: 0.010761, Avg Recall: 0.857143, count: 35
Region Avg IOU: 0.681026, Class: 0.998170, Obj: 0.647905, No Obj: 0.011373, Avg Recall: 0.851064, count: 47
Region Avg IOU: 0.754325, Class: 0.999072, Obj: 0.652044, No Obj: 0.006992, Avg Recall: 0.900000, count: 20
Region Avg IOU: 0.687942, Class: 0.977743, Obj: 0.627960, No Obj: 0.009840, Avg Recall: 0.806452, count: 31
Region Avg IOU: 0.752960, Class: 0.999963, Obj: 0.722535, No Obj: 0.006150, Avg Recall: 1.000000, count: 11
303662: 0.425448, 0.431107 avg, 0.000100 rate, 0.712752 seconds, 19434368 images
Loaded: 0.000044 seconds
Region Avg IOU: 0.573326, Class: 0.904857, Obj: 0.543694, No Obj: 0.008172, Avg Recall: 0.677419, count: 31
Region Avg IOU: 0.627690, Class: 0.907638, Obj: 0.599120, No Obj: 0.009689, Avg Recall: 0.735294, count: 34
Region Avg IOU: 0.756265, Class: 0.999733, Obj: 0.616566, No Obj: 0.007993, Avg Recall: 0.933333, count: 15
Region Avg IOU: 0.531520, Class: 0.984669, Obj: 0.474825, No Obj: 0.006454, Avg Recall: 0.586207, count: 29
Region Avg IOU: 0.695591, Class: 0.999876, Obj: 0.660493, No Obj: 0.009583, Avg Recall: 0.837838, count: 37
Region Avg IOU: 0.753212, Class: 0.999941, Obj: 0.742148, No Obj: 0.008253, Avg Recall: 0.882353, count: 17
Region Avg IOU: 0.720549, Class: 0.938521, Obj: 0.709180, No Obj: 0.008353, Avg Recall: 0.850000, count: 20
Region Avg IOU: 0.522821, Class: 0.923765, Obj: 0.406803, No Obj: 0.005230, Avg Recall: ^@^@^@… (long run of NUL bytes)
Region Avg IOU: 0.7
After this, no more logging is done, even though training continues and weights are still being saved to backup/. All weights from this point on produce the following output on the validation set:
detections_count = 2625600, unique_truth_count = 5053
class_id = 0, name = class1, ap = 0.03 %
class_id = 1, name = class2, ap = 0.02 %
class_id = 2, name = class3, ap = 0.01 %
class_id = 3, name = class4, ap = 0.00 %
for thresh = 0.25, precision = -nan, recall = 0.00, F1-score = -nan
for thresh = 0.25, TP = 0, FP = 0, FN = 5053, average IoU = 0.00 %
mean average precision (mAP) = 0.000147, or 0.01 %
Total Detection Time: 18.000000 Seconds
mAP and IOU kept growing incrementally up until this point.
@getaleks Thanks!
I'll look into what could be breaking, if I can find it.
What is the maximum mAP you can achieve using the last approach?
@AlexeyAB Around 69%, close to what I had before. It only ran for about another 3662 iterations before crashing. The last saved weights were at 303600.
Did you start from 300 000 iterations? I see the last iteration is 303 662.
Did this error occur only once, or is it repeated?
I had pretrained weights from a previous training which were trained to 251 100 iterations.
I used your first suggestion and trained to 300 000 iterations -> 10% improvement.
I then tried to continue to 500 000 and it failed.
I retried with your second suggestion and it failed again at 303 662.
It is strange that training can't continue past 300 000 iterations.