Darknet: Save weights after every 100 iterations

Created on 14 Jan 2019 · 7 Comments · Source: AlexeyAB/darknet

Hi, AlexeyAB!
Is it possible to save the trained weights after every 100 iterations?

All 7 comments

+1 for this question. Is it possible to write a weights file every 100 iterations, like it was before?
yolov3-custom_200.weights
yolov3-custom_300.weights
yolov3-custom_400.weights
etc.
Thanks.

@dreambit Hi,

Currently it saves weights as follows:

  • every 1000 iterations it saves the weights to a separate file: yolov3-custom_2000.weights, yolov3-custom_3000.weights, ...

  • every 100 iterations it overwrites the yolov3-custom_last.weights file
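
The schedule described above can be illustrated with a small simulation. This is only a sketch of the behaviour as stated in this thread, not Darknet's actual code (which lives in src/detector.c). The function name `checkpoint_files` and its parameters are hypothetical, and whether a numbered file is also written at exactly iteration 1000 is an assumption.

```python
def checkpoint_files(max_iter, prefix="yolov3-custom", save_every=1000, last_every=100):
    """Return (numbered_files, last_overwrites) for a run of max_iter iterations.

    Hypothetical illustration of the schedule described in the thread,
    not Darknet's real implementation.
    """
    numbered = []          # separate files that are kept on disk
    last_overwrites = 0    # how many times <prefix>_last.weights is rewritten
    for i in range(1, max_iter + 1):
        if i % save_every == 0:
            numbered.append(f"{prefix}_{i}.weights")
        if i % last_every == 0:
            last_overwrites += 1
    return numbered, last_overwrites


files, overwrites = checkpoint_files(3000)
print(files)       # numbered checkpoint files kept after 3000 iterations
print(overwrites)  # number of times the _last.weights file was rewritten
```

Calling `checkpoint_files(400, save_every=100)` lists yolov3-custom_100.weights through yolov3-custom_400.weights, which is the naming asked for earlier in the thread.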


+1 for this question. Is it possible to write a weights file every 100 iterations?

Yes, change 1000 to 100 here: https://github.com/AlexeyAB/darknet/blob/d9e559a245829830dec03c6d3b909857c6d7937f/src/detector.c#L281

@AlexeyAB Thanks, that's great. I hope that having more weights files will help me find the one with the highest mAP.

@dreambit

Yes, maybe.

You can also train the model with the -map flag, so that the mAP accuracy is reported during training every 4 epochs (that is, every 4 * images_in_train_txt / 64 iterations): https://github.com/AlexeyAB/darknet#when-should-i-stop-training

./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74 -map
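
As a worked example of the interval formula quoted above, here is a minimal sketch. The function name `map_interval_iterations` is hypothetical, and reading the constant 64 as the batch size is an assumption, since the thread gives only the number.

```python
def map_interval_iterations(images_in_train_txt, batch=64, epochs=4):
    """Iterations between mAP calculations: epochs * images / batch.

    batch=64 is an assumption: the thread only gives the constant 64.
    """
    return epochs * images_in_train_txt / batch


# With 2000 training images: 4 * 2000 / 64
print(map_interval_iterations(2000))
```

So with 2000 images listed in train.txt, mAP would be calculated roughly every 125 iterations under these assumptions.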

@AlexeyAB, thanks. Does this option affect performance? I mean, is the mAP calculation CPU/GPU intensive? Could it increase the total training time? And if I run with -map and -dont_show, will the mAP value be printed to stdout?

@dreambit

Does this option affect performance? I mean, is the mAP calculation CPU/GPU intensive? Could it increase the total training time?

Yes, it increases training time by about 25%.
It is the same as running ./darknet detector map... once every 4 epochs, so the GPU is used with batch=1 for the mAP calculation. Training still uses the batch= value specified in the cfg file.


In the latest version of Darknet:

If I run with -map and -dont_show, will the mAP value be printed to stdout?

For each iteration you will see this line in the console: Last accuracy mAP@0.5 = 12,25%
This line appears after the first mAP calculation if you use the -map flag.
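
If the console output is captured to a file or pipe, the value can be pulled out of that line with a regular expression. This is a minimal sketch that assumes the line has the form Last accuracy mAP@0.5 = <value>%, as in the example quoted in this thread; the exact text printed by a given Darknet build may differ, and `parse_last_map` is a hypothetical helper name.

```python
import re

# Pattern assumed from the example line quoted in this thread; it accepts
# either a comma or a dot as the decimal separator.
MAP_LINE = re.compile(r"Last accuracy mAP@0\.5\s*=\s*(\d+(?:[.,]\d+)?)\s*%")


def parse_last_map(line):
    """Return the mAP percentage as a float, or None if the line does not match."""
    match = MAP_LINE.search(line)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


print(parse_last_map("Last accuracy mAP@0.5 = 12,25%"))
```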

Also, every 100 iterations a chart.png file is saved with the avg-Loss & mAP chart.


You can also try training with this command:
./darknet detector train cfg/coco.data yolov3.cfg darknet53.conv.74 -dont_show -mjpeg_port 8090 -map

You can then connect to Darknet from Chrome or Firefox at the URL http://server-ip-address:8090 to see the avg-Loss & mAP chart, provided your remote server allows external connections to port 8090.

Do not leave the browser tab connected for a long time, though: the chart is sent as a continuous JPEG image stream, so it can consume a lot of network traffic.
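
Before opening the browser, it can help to check whether the port is reachable at all. The following is a minimal sketch using only Python's standard library; `port_is_open` is a hypothetical helper, and server-ip-address is the same placeholder used above, not a real host.

```python
import socket


def port_is_open(host, port=8090, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example (replace the placeholder with your server's address):
# port_is_open("server-ip-address", 8090)
```

A False result only means the TCP connection failed from where the check was run. It does not say whether the cause is a firewall, a wrong address, or Darknet not running with -mjpeg_port.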

I have 4 GPUs. Why does it save weights like this: 1008, 2016, 3024, 4032, 5040...?
