Darknet: Training rate is 0.000 during training, although 0.001 in the cfg file

Created on 15 May 2018 · 7 Comments · Source: pjreddie/darknet

Hi,

I have followed the official instructions to run training on the VOC datasets. In my yolov3-voc.cfg I see

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=64
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 50200
policy=steps
steps=40000,45000
scales=.1,.1

When running
./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

I first see

Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
352
Loaded: 0.249344 seconds

but then, after some computations, I am surprised to find that the learning rate is 0.000:

1: 703.991638, 703.991638 avg, 0.000000 rate, 7.736669 seconds, 64 images
Loaded: 0.000032 seconds

Am I misinterpreting the output? Or is there actually something wrong with the training process?

All 7 comments

Hello, I have the same problem. Have you solved it?

No, but maybe we can share our setup to see what we have in common that could give hints to others.

I run on Ubuntu 16.04, on darknet built from a commit that's around 1 month old, with CUDA 9.1 and a GTX 1060 GPU with 3GB of memory.

Do we have anything in common? Tonight I'll try to rebuild darknet from the latest release available and let you know if the problem was solved.

The issue lies with the burn_in parameter. It basically says "ramp up to learning_rate gradually over n iterations", so with burn_in=1000 the rate at iteration 1 is far too small to show in six decimal places and prints as 0.000000. If you remove that parameter, your learning rate will start out at the initial value set in the cfg file. A more detailed write up can be found here
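To make the ramp concrete, here is a minimal Python sketch of a burn-in warm-up. It assumes the rate is scaled by (batch / burn_in) raised to a power, with the power defaulting to 4, which is how I recall darknet computing it; the function name burn_in_rate and the default power are illustrative assumptions, not darknet's actual code, and the later steps/scales drops are ignored.

```python
def burn_in_rate(batch, learning_rate=0.001, burn_in=1000, power=4.0):
    """Hypothetical warm-up schedule: ramp toward learning_rate over burn_in batches.

    Assumption: rate = learning_rate * (batch / burn_in) ** power while
    batch < burn_in. The default power of 4 is recalled from darknet, not
    taken from this thread.
    """
    if batch < burn_in:
        return learning_rate * (batch / burn_in) ** power
    # After the warm-up, the configured learning_rate applies
    # (the steps/scales policy is deliberately left out of this sketch).
    return learning_rate


if __name__ == "__main__":
    # Print in the same six-decimal style as the training log.
    for batch in (1, 100, 500, 1000):
        print(f"{batch}: {burn_in_rate(batch):.6f} rate")
```

Under these assumptions the rate at batch 1 is about 1e-15, which is why the log shows 0.000000 even though training is proceeding normally.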

Thanks! I had looked for an existing issue but couldn't find it. I'll close this issue.

Thanks so much! But when I removed that parameter, I ran into a new problem.

Loaded: 0.000044 seconds
2, 0.003: inf, inf avg, 0.099920 rate, 1.575415 seconds, 64 images
Loaded: 0.000054 seconds
3, 0.004: inf, inf avg, 0.099880 rate, 1.549280 seconds, 96 images

Do you know what an "inf" loss is?

My noob guess is your training rate is too high, but that doesn't really answer your question.

It's an infinite loss. Did you try running the model again? Does it show up every time?
