Dear sir,
I extracted the person labels from coco2017 and voc2012 and converted them to YOLO format. I combined coco2017_train and voc2012_trainval for training and used coco2017_val for testing. Here is my question:
All tests are on coco2017_val.
The official model yolov3.weights gets this result:
map=64.96% precision=0.68 recall=0.64 ap(person)=73.53%
My training runs:
#method 1, train on darknet53.conv.74 (iter_num: 159824, lr = 0.0001): ap = 54.44% precision = 0.8 recall = 0.44
#method 2, train on yolov3.weights (iter_num:45000, lr = 0.001): ap = 56.48% precision = 0.6 recall = 0.61
My model is far below the official one.
What is the difference between my two training methods? Is it OK to train starting from yolov3.weights? In my experiments, method 2 seems better than method 1, but I do not know how to explain this.
Also, if I use lr=0.001, NaN appears after some batches. I restarted the training again and again, and with luck it eventually trained normally. What might be the reason?
Looking forward to your reply!
THANKS SO MUCH!
Hi,
Also, if I use lr=0.001, NaN appears after some batches. I restarted the training again and again, and with luck it eventually trained normally. What might be the reason?
NaN usually appears during the first 1000 iterations (0-1000) when burn_in=1000 is set, because the learning rate is very low during burn-in: https://github.com/AlexeyAB/darknet/blob/89354d0a0ce6fbb22ff262658045cdb8796ff6fd/src/network.c#L88
Otherwise, something may be wrong in the training dataset.
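For reference, the burn-in ramp at the linked line can be sketched roughly as below. This is a minimal Python sketch, not darknet's own code: the function name is hypothetical, power=4 is assumed to be the cfg default, and the schedule applied after burn-in (steps, etc.) is ignored here.

```python
def burn_in_learning_rate(iteration, base_lr=0.001, burn_in=1000, power=4.0):
    """Approximate learning rate during darknet-style burn-in.

    Assumption: during the first `burn_in` iterations the rate is
    base_lr * (iteration / burn_in) ** power; afterwards this sketch
    simply returns base_lr and ignores any step/decay policy.
    """
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** power
    return base_lr


if __name__ == "__main__":
    # Print how small the rate is early on with lr=0.001, burn_in=1000.
    for it in (1, 100, 500, 900, 1000):
        print(it, burn_in_learning_rate(it))
```

Under these assumptions the rate at iteration 500 is only 1/16 of the configured learning rate, and it reaches the full value at iteration 1000.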
I extracted the person labels from coco2017 and voc2012 and converted them to YOLO format. I combined coco2017_train and voc2012_trainval for training and used coco2017_val for testing. Here is my question:
All tests are on coco2017_val.
The official model yolov3.weights gets this result:
map=64.96% precision=0.68 recall=0.64 ap(person)=73.53%
Did you get map=64.96% with yolov3.weights for Person only, or for all 80 classes?
The official model yolov3.weights gets this result:
map=64.96% precision=0.68 recall=0.64 ap(person)=73.53%
My training runs:
method 1, train on darknet53.conv.74 (iter_num: 159824, lr = 0.0001): ap = 54.44% precision = 0.8 recall = 0.44
method 2, train on yolov3.weights (iter_num:45000, lr = 0.001): ap = 56.48% precision = 0.6 recall = 0.61
My model is far below the official one.
Did you use the same Test dataset (coco2017_val) for all 3 cases?
Did you use 1 GPU or multiple GPUs?
@AlexeyAB
(1) With yolov3.weights, I get map=64.96% for all 80 classes, and ap=73.53% for person only.
(2) Yes, all the tests were run under the same conditions (only coco2017_val was used).
(3) For training, 4 GPUs were used; for testing, only 1 GPU was used.
Recent result:
Training person detection starting from yolov3.weights, I now get ap = 66.3% (official person ap = 73.53%), and the average loss is still decreasing (now about 1.25).
@Jacky3213
Also, if I use lr=0.001, NaN appears after some batches. I restarted the training again and again, and with luck it eventually trained normally. What might be the reason?
You should train the first 1000 iterations using only 1 GPU: https://github.com/AlexeyAB/darknet#how-to-train-with-multi-gpu
You should train Yolo v3 on COCO for about 500 000 iterations: https://github.com/AlexeyAB/darknet/blob/6b8fd6f33f6a61138136fd022c2b887ae39e2c42/cfg/yolov3.cfg#L20
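The two-stage procedure (1 GPU for the first 1000 iterations, then multi-GPU from the saved checkpoint) could be scripted roughly as follows. This is only a sketch: the helper name and the example file paths are hypothetical, and the checkpoint name `<cfg name>_1000.weights` in the backup directory is an assumption based on the naming pattern shown in the linked README.

```python
import os


def two_stage_commands(data_file, cfg_file, pretrained, backup_dir, gpus):
    """Build the darknet command lines for 1-GPU warm-up, then multi-GPU training.

    Assumption: after 1000 iterations darknet has written
    <backup_dir>/<cfg name>_1000.weights, which stage 2 resumes from.
    """
    base = ["./darknet", "detector", "train", data_file, cfg_file]
    # Stage 1: single GPU, starting from the pretrained weights.
    stage1 = base + [pretrained]
    # Stage 2: resume from the 1000-iteration checkpoint on all GPUs.
    cfg_name = os.path.splitext(os.path.basename(cfg_file))[0]
    checkpoint = f"{backup_dir}/{cfg_name}_1000.weights"
    stage2 = base + [checkpoint, "-gpus", ",".join(str(g) for g in gpus)]
    return stage1, stage2


if __name__ == "__main__":
    # Hypothetical paths for a person-only dataset.
    s1, s2 = two_stage_commands(
        "data/person.data", "cfg/yolov3.cfg", "darknet53.conv.74",
        "backup", [0, 1, 2, 3],
    )
    print(" ".join(s1))
    print(" ".join(s2))
```

Stage 1 has to be stopped manually once the 1000-iteration checkpoint exists; the sketch only builds the commands and does not run them.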