Darknet: Average loss value not going below 1

Created on 26 May 2018 · 5 Comments · Source: AlexeyAB/darknet

I'm using a subset of the COCO dataset (around 35-40k images) to train 8 classes with YOLOv2. I've modified the cfg file as follows:

learning_rate=0.0001         # I've also tried with 0.001

[region]
classes=8

[convolutional]
filters=65                             # (8+5)*5

But even after 13000 iterations, the average loss value is not going below 1.

Region Avg IOU: 0.567968, Class: 0.892152, Obj: 0.360092, No Obj: 0.008364, Avg Recall: 0.750000,  count: 12

 13370: 0.955227, 1.153029 avg, 0.000100 rate, 3.976288 seconds, 855680 images
Resizing
384 x 384 
 try to allocate workspace = 10616832 * sizeof(float),  CUDA allocate done! 
Loaded: 0.000045 seconds
Region Avg IOU: 0.554906, Class: 0.751182, Obj: 0.234398, No Obj: 0.009464, Avg Recall: 0.656250,  count: 32
Region Avg IOU: 0.618055, Class: 0.982063, Obj: 0.314850, No Obj: 0.009789, Avg Recall: 0.736842,  count: 19
Region Avg IOU: 0.629569, Class: 0.921974, Obj: 0.383278, No Obj: 0.009722, Avg Recall: 0.769231,  count: 13
Region Avg IOU: 0.701359, Class: 0.939804, Obj: 0.623751, No Obj: 0.008059, Avg Recall: 0.900000,  count: 10
Region Avg IOU: 0.565361, Class: 0.939754, Obj: 0.294956, No Obj: 0.011120, Avg Recall: 0.677419,  count: 31
Region Avg IOU: 0.670586, Class: 0.980710, Obj: 0.372620, No Obj: 0.006666, Avg Recall: 0.900000,  count: 10
Region Avg IOU: 0.704796, Class: 0.967884, Obj: 0.418710, No Obj: 0.007288, Avg Recall: 0.857143,  count: 14
Region Avg IOU: 0.546709, Class: 0.989108, Obj: 0.359880, No Obj: 0.009863, Avg Recall: 0.631579,  count: 19

 13371: 0.613410, 1.099067 avg, 0.000100 rate, 4.825989 seconds, 855744 images
Loaded: 0.000050 seconds
Region Avg IOU: 0.408547, Class: 0.902766, Obj: 0.239194, No Obj: 0.008172, Avg Recall: 0.465116,  count: 43
Region Avg IOU: 0.681122, Class: 0.954415, Obj: 0.351681, No Obj: 0.011090, Avg Recall: 0.900000,  count: 30
Region Avg IOU: 0.664868, Class: 0.993532, Obj: 0.382928, No Obj: 0.009634, Avg Recall: 0.842105,  count: 19
Region Avg IOU: 0.482414, Class: 0.941180, Obj: 0.314420, No Obj: 0.007617, Avg Recall: 0.583333,  count: 24
Region Avg IOU: 0.563489, Class: 0.930170, Obj: 0.346102, No Obj: 0.009122, Avg Recall: 0.600000,  count: 20
Region Avg IOU: 0.446973, Class: 0.919570, Obj: 0.192612, No Obj: 0.010219, Avg Recall: 0.533333,  count: 45
Region Avg IOU: 0.693021, Class: 0.951645, Obj: 0.363812, No Obj: 0.006403, Avg Recall: 0.900000,  count: 10
Region Avg IOU: 0.456339, Class: 0.947699, Obj: 0.347462, No Obj: 0.008376, Avg Recall: 0.481481,  count: 27

 13372: 1.040705, 1.093231 avg, 0.000100 rate, 5.179664 seconds, 855808 images
Loaded: 0.000061 seconds

Is there something wrong here or is it the expected behaviour? Should I keep on training in hope that it will go down?
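To judge whether the average loss is still trending down, it can help to pull the per-iteration summary lines out of the training output and look at them over a longer window. The following is a minimal sketch, assuming the summary lines keep the format shown in the log above; `SUMMARY_RE` and `parse_summary` are hypothetical names, not part of Darknet, and lines with non-numeric losses (for example `nan`) are simply skipped.

```python
import re

# Matches Darknet iteration summary lines such as
# " 13370: 0.955227, 1.153029 avg, 0.000100 rate, 3.976288 seconds, 855680 images"
SUMMARY_RE = re.compile(
    r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+) avg,\s*([\d.]+) rate,"
    r"\s*([\d.]+) seconds,\s*(\d+) images"
)


def parse_summary(line):
    """Return a dict for a summary line, or None for any other line."""
    m = SUMMARY_RE.match(line)
    if m is None:
        return None
    return {
        "iteration": int(m.group(1)),
        "loss": float(m.group(2)),
        "avg_loss": float(m.group(3)),
        "rate": float(m.group(4)),
        "seconds": float(m.group(5)),
        "images": int(m.group(6)),
    }


# Sample lines copied from the log in the question.
log = """\
 13370: 0.955227, 1.153029 avg, 0.000100 rate, 3.976288 seconds, 855680 images
Loaded: 0.000045 seconds
 13371: 0.613410, 1.099067 avg, 0.000100 rate, 4.825989 seconds, 855744 images
 13372: 1.040705, 1.093231 avg, 0.000100 rate, 5.179664 seconds, 855808 images
"""

records = [r for r in map(parse_summary, log.splitlines()) if r is not None]
for r in records:
    # images // iteration gives the number of images consumed per iteration.
    print(r["iteration"], r["avg_loss"], r["images"] // r["iteration"])
```

For the lines above, the images-per-iteration figure works out to 64 each time, so about 855,000 images have been seen so far. With a training set of 35-40k images, that is only a couple of dozen passes over the data.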

All 5 comments

Your learning rate is too small.
Use this cfg-file without any changes: https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov2.cfg

It should be trained for about 500,000 iterations, so you should wait.
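For a sense of scale, here is a rough back-of-the-envelope estimate of how long 500,000 iterations would take at the speed seen in the question's log. It is only a sketch: it assumes the three iteration times in the log are representative, which may not hold because the log shows the network being resized during training.

```python
# Iteration times (seconds) copied from the three summary lines in the question's log.
seconds_per_iteration = [3.976288, 4.825989, 5.179664]

# Iteration count suggested in the reply above.
target_iterations = 500_000

mean_seconds = sum(seconds_per_iteration) / len(seconds_per_iteration)
estimated_days = target_iterations * mean_seconds / 86_400  # 86,400 seconds per day

print(f"{mean_seconds:.2f} s/iteration, about {estimated_days:.0f} days for a full run")
```

At that pace a full run is a matter of weeks, so 13,000 iterations is still very early in training.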

@AlexeyAB I should change the number of classes and the number of filters accordingly, right?
I have 8 classes, so filters = 65

Yes.
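The filter count confirmed here follows the (8+5)*5 calculation from the question: each anchor predicts 4 box coordinates, 1 objectness score and one score per class. A small sketch of that arithmetic, assuming 5 anchors as in the question's own calculation (`region_filters` is a hypothetical helper, not a Darknet function):

```python
def region_filters(classes, num_anchors=5):
    """Filters for the convolutional layer that feeds a YOLOv2 [region] layer.

    Per anchor: 4 box coordinates + 1 objectness score + one score per class.
    """
    if classes < 1 or num_anchors < 1:
        raise ValueError("classes and num_anchors must be positive")
    return (classes + 5) * num_anchors


print(region_filters(8))   # 65, the value used in this thread
print(region_filters(80))  # 425, for the 80 COCO classes
```

If the number of anchors in the [region] section is changed, the filter count has to be recomputed with the new value.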

@AlexeyAB One more thing: since I've already run around 15,000 iterations with a learning rate of 0.0001, can I use the weights from the last iteration and continue training with the cfg file you linked above?
Or do I need to run it from the beginning?
Thanks

Better to train from the beginning.
