Darknet: Training rate is 0.000 during training, although 0.001 in the cfg file

Created on 15 May 2018 · 7 Comments · Source: pjreddie/darknet

Hi,

I have followed the official instructions to run training on the VOC datasets. In my yolov3-voc.cfg I see

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=64
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 50200
policy=steps
steps=40000,45000
scales=.1,.1

When running
./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

I first see

Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
352
Loaded: 0.249344 seconds

but then, after some computations, I am surprised to find that the learning rate is 0.000:

1: 703.991638, 703.991638 avg, 0.000000 rate, 7.736669 seconds, 64 images
Loaded: 0.000032 seconds

Am I misinterpreting the output? Or is there actually something wrong with the training process?
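
For anyone who wants to inspect these log lines programmatically, here is a minimal sketch of a parser. The field layout (batch number, loss, average loss, rate, seconds, images) is inferred only from the sample line quoted above, and `parse_train_line` is a hypothetical helper, not part of darknet. Other darknet versions may print a different format.

```python
import re

# Assumed layout of a detector training log line, inferred only from the
# sample in this thread; other darknet versions may print it differently.
LINE_RE = re.compile(
    r"^(?P<batch>\d+): (?P<loss>\S+), (?P<avg_loss>\S+) avg, "
    r"(?P<rate>\S+) rate, (?P<seconds>\S+) seconds, (?P<images>\d+) images$"
)


def parse_train_line(line):
    """Return the fields of one training log line as a dict, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. "Loaded: ..." lines do not match
    return {
        "batch": int(match["batch"]),
        "loss": float(match["loss"]),
        "avg_loss": float(match["avg_loss"]),
        "rate": float(match["rate"]),
        "seconds": float(match["seconds"]),
        "images": int(match["images"]),
    }


sample = "1: 703.991638, 703.991638 avg, 0.000000 rate, 7.736669 seconds, 64 images"
fields = parse_train_line(sample)
print(fields["batch"], fields["loss"], fields["rate"])
```

Note that the rate is printed with only six decimal places, so a parsed value of 0.0 means "smaller than about 0.0000005", not necessarily exactly zero.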

raggot

👍4

Most helpful comment

The issue lies with the burn_in parameter. It basically says "ramp up to learning_rate gradually over n iterations". If you remove that parameter, the learning rate will start at the value set in the cfg file. A more detailed write-up can be found here

AustinDoolittle on 16 May 2018

👍7 🚀1
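
To make the burn-in explanation concrete, here is a rough sketch of how such a schedule could be computed from the cfg values in the question. It assumes the ramp has the form learning_rate * (batch / burn_in) ** power with power = 4, which is my recollection of darknet's default and should be checked against the source of your build. The function name `current_rate` is hypothetical, and the `steps` decay is a simplified reading of `policy=steps`.

```python
def current_rate(batch, learning_rate=0.001, burn_in=1000, power=4.0,
                 steps=(40000, 45000), scales=(0.1, 0.1)):
    """Sketch of a burn-in ramp followed by a 'steps' decay schedule."""
    if batch < burn_in:
        # Assumed ramp: the rate grows as (batch / burn_in) ** power.
        return learning_rate * (batch / burn_in) ** power
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if step > batch:
            return rate
        # Each step that has been reached multiplies the rate by its scale.
        rate *= scale
    return rate


for batch in (1, 100, 1000, 40000, 45000):
    rate = current_rate(batch)
    # Six decimals mimics the log output; scientific notation shows the real value.
    print(f"{batch:>6}: {rate:.6f} rate ({rate:.3e})")
```

Under these assumptions the rate at batch 1 is about 1e-15, which prints as 0.000000 with six decimals even though it is not zero. The first non-zero digit would only show up at around batch 150, and the full 0.001 is reached at batch 1000.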

All 7 comments

Hello, I have the same problem. Have you solved it?

csj007 on 16 May 2018

No, but maybe we can share our setup to see what we have in common that could give hints to others.

I run on Ubuntu 16.04, with darknet built from a commit that's around 1 month old, with CUDA 9.1 and a GTX 1060 with 3GB memory as GPU.

Do we have anything in common? Tonight I'll try to rebuild darknet from the latest release available and let you know whether that solves the problem.

raggot on 16 May 2018

AustinDoolittle on 16 May 2018

👍7 🚀1

Thanks! I had looked for an existing issue but couldn't find it. I'll close this issue.

raggot on 16 May 2018

Thanks so much! But when I removed that parameter, I ran into a new problem.

Loaded: 0.000044 seconds
2, 0.003: inf, inf avg, 0.099920 rate, 1.575415 seconds, 64 images
Loaded: 0.000054 seconds
3, 0.004: inf, inf avg, 0.099880 rate, 1.549280 seconds, 96 images

Do you know what an "inf" loss is?

csj007 on 16 May 2018

My noob guess is that your learning rate is too high, but that doesn't really answer your question.

raggot on 16 May 2018
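
As a toy illustration of that guess (not darknet code, and not a diagnosis of this particular run), the sketch below runs plain gradient descent on loss(x) = x * x with two step sizes. The function `train_quadratic` and both learning rates are made up for the example. For context, the log above reports a rate of about 0.0999, roughly 100 times the 0.001 from the cfg at the top of the thread.

```python
import math


def train_quadratic(learning_rate, steps=2000, x=1.0):
    """Run gradient descent on loss(x) = x * x and return the loss history."""
    losses = []
    for _ in range(steps):
        loss = x * x
        losses.append(loss)
        if not math.isfinite(loss):
            break  # stop at the first inf, as further updates are meaningless
        # The gradient of x * x is 2 * x.
        x -= learning_rate * 2.0 * x
    return losses


small = train_quadratic(0.1)   # each step shrinks x, so the loss decays
large = train_quadratic(1.5)   # each step doubles |x|, so the loss blows up
print(small[-1], large[-1])
```

With the oversized step the loss grows by a factor of 4 per iteration until it exceeds the floating-point range and is reported as inf. A real network can hit inf for other reasons too, so this only shows that a too-large rate is one plausible cause.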

It's an infinite loss. Did you try running the model again? Does it show up every time?