Darknet: something about coco_voc_person training

Created on 4 May 2018 · 3 Comments · Source: AlexeyAB/darknet

Dear sir,
I extracted the person labels from COCO 2017 and VOC 2012 and converted them to YOLO format. I combined coco2017_train and voc2012_trainval for training and used coco2017_val for testing. Here are my questions:
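For reference, the conversion to YOLO format can be sketched as below: each label line is `class x_center y_center width height`, with all four box values normalized by the image size. This is a minimal Python sketch, assuming person is class id 0 in a single-class setup; the function names are hypothetical helpers, not part of darknet, and the sketch ignores VOC's 1-based pixel coordinates, which some conversion scripts correct for.

```python
# Minimal sketch: convert COCO / VOC person boxes to YOLO label lines.
# Assumption: person is class id 0 (single-class setup). Hypothetical helpers.

def _yolo_line(class_id, x_center, y_center, width, height):
    """Format one YOLO label line: class x_center y_center width height."""
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def coco_bbox_to_yolo(bbox, img_w, img_h, class_id=0):
    """COCO bbox is [x_min, y_min, width, height] in pixels."""
    x_min, y_min, w, h = bbox
    return _yolo_line(
        class_id,
        (x_min + w / 2.0) / img_w,  # box center x, normalized by image width
        (y_min + h / 2.0) / img_h,  # box center y, normalized by image height
        w / img_w,
        h / img_h,
    )


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id=0):
    """VOC box corners in pixels (the 1-based pixel offset is ignored here)."""
    return _yolo_line(
        class_id,
        (xmin + xmax) / 2.0 / img_w,
        (ymin + ymax) / 2.0 / img_h,
        (xmax - xmin) / img_w,
        (ymax - ymin) / img_h,
    )


# The same 200x100 box with top-left corner (100, 50) in a 640x480 image:
print(coco_bbox_to_yolo([100, 50, 200, 100], 640, 480))
print(voc_box_to_yolo(100, 50, 300, 150, 640, 480))
# both print: 0 0.312500 0.208333 0.312500 0.208333
```

Writing both converters through one formatting helper keeps COCO-derived and VOC-derived labels byte-for-byte consistent when the two datasets are merged.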

All tests were run on coco2017_val.

The official model, yolov3.weights, gets:
mAP = 64.96%, precision = 0.68, recall = 0.64, AP(person) = 73.53%
My training:
Method 1, trained from darknet53.conv.74 (iterations: 159824, lr = 0.0001): AP = 54.44%, precision = 0.8, recall = 0.44
Method 2, trained from yolov3.weights (iterations: 45000, lr = 0.001): AP = 56.48%, precision = 0.6, recall = 0.61
My models are far below the official one.
What is the difference between my two kinds of training, and is it OK to train starting from yolov3.weights? In my experiments, method 2 seems better than method 1, but I do not know how to explain this.
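Precision and recall are measured at a single confidence threshold, whereas AP summarizes the whole precision-recall curve. Method 1's higher precision and lower recall may therefore partly reflect how confident its outputs are at that threshold. One rough way to put the two person-only runs on a single threshold-specific scale is the F1 score, the harmonic mean of precision and recall. The small Python sketch below uses the values reported above; the helper name is illustrative. The official model's precision/recall are left out because they cover all 80 classes, not person only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs reported above for the two person-only runs.
runs = {
    "method 1 (darknet53.conv.74)": (0.8, 0.44),
    "method 2 (yolov3.weights)": (0.6, 0.61),
}
for name, (p, r) in runs.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
# method 1 (darknet53.conv.74): F1 = 0.568
# method 2 (yolov3.weights): F1 = 0.605
```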

Also, if I use lr = 0.001, NaN appears after some batches. I restarted the training again and again, and luckily it eventually went on training normally. What might be the reason?

Looking forward to your reply!
THANKS SO MUCH!

Jacky3213

All 3 comments

Hi,

Also, if I use lr = 0.001, NaN appears after some batches. I restarted the training again and again, and luckily it eventually went on training normally. What might be the reason?

Usually NaN can appear during iterations 0-1000 if burn_in=1000 is set, because of the very low learning rate during burn-in: https://github.com/AlexeyAB/darknet/blob/89354d0a0ce6fbb22ff262658045cdb8796ff6fd/src/network.c#L88
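To see how small the learning rate is during burn-in, here is a minimal Python sketch that mirrors, as I read it, the linked rate function in src/network.c: the burn-in ramp plus the `steps` policy only. The defaults are assumptions taken from the stock yolov3.cfg (learning_rate=0.001, burn_in=1000, steps=400000,450000, scales=.1,.1) and darknet's default power=4; check them against your own cfg.

```python
def current_rate(batch_num, learning_rate=0.001, burn_in=1000, power=4.0,
                 steps=(400000, 450000), scales=(0.1, 0.1)):
    """Learning rate at a given iteration: burn-in ramp, then 'steps' policy.

    Defaults are assumed values (stock yolov3.cfg, default power=4).
    """
    if batch_num < burn_in:
        # Polynomial ramp from 0 up to learning_rate over the first burn_in iterations.
        return learning_rate * (batch_num / burn_in) ** power
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if step > batch_num:
            return rate
        rate *= scale  # every step boundary already reached multiplies the rate
    return rate


for it in (1, 100, 500, 999, 1000, 400000, 450000):
    print(f"iteration {it:>6}: lr = {current_rate(it):.3e}")
```

With power=4 the ramp is steep: at iteration 100 the rate is only 0.001 × 0.1^4 = 1e-7, and it reaches the full 0.001 at iteration 1000.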

Or something may be wrong in the training dataset.
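Since a broken label can also produce NaN, a quick sanity check of the converted labels may help. The Python sketch below flags lines that do not follow the `class x_center y_center width height` layout with values normalized to the image size. The function names and the exact checks (for example, rejecting zero-size boxes) are my own assumptions about common conversion mistakes, not darknet's own validation.

```python
import math
from pathlib import Path


def check_yolo_label_line(line, num_classes=1):
    """Return a list of problems in one YOLO label line (empty list if none)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        x_c, y_c, w, h = (float(p) for p in parts[1:])
    except ValueError as exc:
        return [f"unparsable field: {exc}"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside [0, {num_classes})")
    if not all(math.isfinite(v) for v in (x_c, y_c, w, h)):
        problems.append("non-finite value (nan/inf)")
        return problems
    if not (0.0 <= x_c <= 1.0 and 0.0 <= y_c <= 1.0):
        problems.append("box center not normalized to [0, 1]")
    if not (0.0 < w <= 1.0 and 0.0 < h <= 1.0):
        problems.append("box width/height not in (0, 1]")
    return problems


def check_label_dir(label_dir, num_classes=1):
    """Scan every .txt label file in label_dir; yield (file, line_no, problems)."""
    for path in sorted(Path(label_dir).glob("*.txt")):
        with path.open() as f:
            for line_no, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # skip blank lines
                problems = check_yolo_label_line(line, num_classes)
                if problems:
                    yield path.name, line_no, problems
```

For example, iterating over `check_label_dir("path/to/labels")` (a hypothetical path) and printing each result lists every suspicious line with its file name and line number. Pixel coordinates that were never divided by the image size are a typical hit.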

I extracted the person labels from COCO 2017 and VOC 2012 and converted them to YOLO format. I combined coco2017_train and voc2012_trainval for training and used coco2017_val for testing. Here are my questions:

All tests were run on coco2017_val.

The official model, yolov3.weights, gets:
mAP = 64.96%, precision = 0.68, recall = 0.64, AP(person) = 73.53%

Did you get mAP = 64.96% with yolov3.weights for Person only, or for all 80 classes?

The official model, yolov3.weights, gets:
mAP = 64.96%, precision = 0.68, recall = 0.64, AP(person) = 73.53%
My training:

Method 1, trained from darknet53.conv.74 (iterations: 159824, lr = 0.0001): AP = 54.44%, precision = 0.8, recall = 0.44

Method 2, trained from yolov3.weights (iterations: 45000, lr = 0.001): AP = 56.48%, precision = 0.6, recall = 0.61

My models are far below the official one.

Did you use the same Test dataset (coco2017_val) for all 3 cases?

Did you use 1 GPU or many GPUs?

AlexeyAB on 5 May 2018

@AlexeyAB
(1) With yolov3.weights, I get mAP = 64.96% for all 80 classes, and AP = 73.53% for person only.
(2) Yes, all the tests were run under the same conditions (only coco2017_val used).
(3) 4 GPUs were used for training, and only 1 GPU for testing.

Recent result:
Training person detection starting from yolov3.weights, I now get AP = 66.3% (official person AP = 73.53%), and the average loss is still decreasing (currently about 1.25).

Jacky3213 on 7 May 2018

@Jacky3213

Also, if I use lr = 0.001, NaN appears after some batches. I restarted the training again and again, and luckily it eventually went on training normally. What might be the reason?

You should train first 1000 iterations using only 1 GPU: https://github.com/AlexeyAB/darknet#how-to-train-with-multi-gpu
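The two-stage procedure from the linked README section (about 1000 iterations on 1 GPU, then continue from the partial weights on all GPUs) can be sketched in Python as a helper that only builds the two command lines; it does not run darknet. The `./darknet` binary location, the example paths, and the backup file name `<backup_dir>/<cfg name>_1000.weights` are assumptions; check where your build actually writes weights (the `backup` entry in your .data file).

```python
import shlex
from pathlib import Path


def two_stage_train_commands(data_file, cfg_file, pretrained,
                             backup_dir="backup", gpus=(0, 1, 2, 3),
                             darknet="./darknet"):
    """Build the 1-GPU warm-up command and the multi-GPU continuation command."""
    # Stage 1: single GPU from the pretrained weights; stop it manually
    # once the 1000-iteration weights file has been written.
    stage1 = [darknet, "detector", "train", data_file, cfg_file, pretrained]
    # Assumed file name: <backup_dir>/<cfg name>_1000.weights.
    cfg_name = Path(cfg_file).stem
    partial = f"{backup_dir.rstrip('/')}/{cfg_name}_1000.weights"
    # Stage 2: continue from the partial weights on several GPUs.
    stage2 = [darknet, "detector", "train", data_file, cfg_file, partial,
              "-gpus", ",".join(str(g) for g in gpus)]
    return stage1, stage2


for cmd in two_stage_train_commands("cfg/coco.data", "cfg/yolov3.cfg",
                                    "darknet53.conv.74"):
    print(shlex.join(cmd))
```

The single-GPU warm-up covers the burn-in phase, and only the continuation uses `-gpus` with the 4 GPUs mentioned in this thread.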

You should train Yolo v3 for COCO about 500 000 iterations: https://github.com/AlexeyAB/darknet/blob/6b8fd6f33f6a61138136fd022c2b887ae39e2c42/cfg/yolov3.cfg#L20

AlexeyAB on 7 May 2018
