Darkflow: Loss and Moving Average Loss NaN

Created on 9 May 2017 · 12 Comments · Source: thtrieu/darkflow

I copied the tiny-yolo cfg into a new cfg file and changed the classes and filters in the new cfg to match the classes in my dataset. I also changed index.txt to match my classes. I then ran:

python flow --train --model cfg/new.cfg --load bin/tiny-yolo.weights --dataset --annotation

Training kicks off, but for every step I get:
step 1 - loss nan - moving ave loss nan
So I feel like I'm doing something wrong. Any help on this issue would be greatly appreciated.


dddevo26

All 12 comments

Dataset and annotation are blank in your command. You have to specify the path to the .xml annotations dir and dataset image dir.

solomondg on 9 May 2017

Oh yeah, my actual command is:
python flow --train --model cfg/adas.cfg --load bin/tiny-yolo.weights --dataset "signDatabasePublicFramesOnly/annotations/" --annotation "signDatabasePublicFramesOnly/bb_annotations/"
I only wrote the command that way because I figured no one cared what my paths were.

dddevo26 on 9 May 2017

Just making sure.

Can you post your cfg file? Are your annotations correct?

solomondg on 9 May 2017

So here's the new cfg file.

[net]
batch=64
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 120000
policy=steps
steps=-1,100,80000,100000
scales=.1,10,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=80
activation=linear

[region]
anchors = 0.738768,0.874946, 2.42204,2.65704, 4.30971,7.04493, 10.246,4.59428, 12.6868,11.8741
bias_match=1
classes=11
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1
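
As a quick sanity check on the cfg above, the filters value of the last convolutional layer can be compared against the region layer. This is only a sketch: it assumes the usual YOLOv2 relation filters = num * (classes + coords + 1), and the function name expected_filters is hypothetical, not part of darkflow.

```python
def expected_filters(num, classes, coords=4):
    """Filters needed in the last convolutional layer before [region].

    Assumes the usual YOLOv2 layout: each of the `num` anchors predicts
    `coords` box values, one objectness score, and `classes` class scores.
    """
    return num * (classes + coords + 1)


# Values taken from the [region] section of the cfg above.
print(expected_filters(num=5, classes=11))  # 80
```

Under that assumption, filters=80 is consistent with classes=11 and num=5, so the cfg itself does not look like the cause of the NaN loss.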

Also, I had to write a script to make the xml files from the dataset that I had. Here's an example of one of my annotation xmls:

<annotation>
    <source>
        <image>UCSD</image>
        <annotation>UCSD LISA</annotation>
        <flickrid>0000000</flickrid>
        <database>The VOC2007 Database</database>
        </source>
    <object>
        <bndbox>
            <xmin>474</xmin>
            <ymin>206</ymin>
            <ymax>144</ymax>
            <xmax>526</xmax></bndbox>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <name>speedLimit25</name>
        <difficult>0</difficult>
    </object>
    <filename>speedLimit25_1333392931.avi_image0.png</filename>
    <segmented>0</segmented>
    <owner>
        <name>LISA</name>
        <flickrid>UCSD</flickrid>
    </owner>
    <folder>annotations</folder>
    <size>
        <width>1024</width>
        <depth>3</depth>
        <height>522</height>
    </size>
</annotation>

Also thanks for all the help.

dddevo26 on 9 May 2017

I also just noticed a warning I was getting:
C:\Users\Jorda\Documents\darkflow\darkflow\net\yolov2\data.py:41: RuntimeWarning: invalid value encountered in sqrt
obj[4] = np.sqrt(obj[4])
Training statistics:
Learning rate : 1e-05
Batch size : 16
Epoch number : 1000
Backup every : 2000
I'm debugging it now, but any insight would be appreciated.

dddevo26 on 9 May 2017


Figured out it was a problem with the script that made the xml files.

dddevo26 on 9 May 2017

That's what I figured, hence the sqrt thing.
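
To spell out the sqrt connection with the numbers from the example annotation posted earlier: the sketch below assumes the preprocessing normalizes the box width and height by the image size before taking the square root, which is what the warning line suggests. The helper name normalized_box_size is hypothetical and is not darkflow code.

```python
def normalized_box_size(xmin, ymin, xmax, ymax, img_w, img_h):
    """Box width and height as fractions of the image size.

    Assumption: this mirrors what the training preprocessing computes
    right before the np.sqrt call named in the warning.
    """
    return (xmax - xmin) / img_w, (ymax - ymin) / img_h


# Values from the example XML: ymin (206) is larger than ymax (144).
w, h = normalized_box_size(474, 206, 526, 144, img_w=1024, img_h=522)
print(w > 0, h < 0)  # True True
```

The height is negative, np.sqrt of a negative number gives NaN with exactly this RuntimeWarning, and a NaN target then makes the loss NaN.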

solomondg on 9 May 2017

Hi @dddevo26,
Can you tell me what the problem with the script was, or which script it is? I also ran into this error.
At step 884375 it prints: loss nan - moving ave loss nan
Thank you.

jackweiwang on 4 Dec 2017

@dddevo26 yeah, what exactly was the problem with the xml writing script? I'm facing the same problem, so knowing that would help me check which part of my script could be going wrong. Thanks.

onurbarut on 26 Dec 2017

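C:\Users\Jorda\Documents\darkflow\darkflow\net\yolov2\data.py:41: RuntimeWarning: invalid value encountered in sqrt
obj[4] = np.sqrt(obj[4])
You are getting this warning because xmin > xmax or ymin > ymax in your annotations. In the example xml posted above, ymin is 206 but ymax is 144, so the box height comes out negative and its square root is NaN.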

hope this will help!!!
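
For anyone hitting the same thing, here is a minimal sketch of a checker for VOC-style annotation files. It assumes every object has a bndbox with xmin, ymin, xmax and ymax tags as in the example above; the function names find_bad_boxes and scan_dir are hypothetical, and it uses only the Python standard library.

```python
import glob
import os
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (name, xmin, ymin, xmax, ymax) for boxes with width or height <= 0."""
    root = ET.fromstring(xml_text)
    bad = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:
            continue
        # Assumes all four coordinate tags are present and numeric.
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        if xmax <= xmin or ymax <= ymin:
            bad.append((obj.findtext("name"), xmin, ymin, xmax, ymax))
    return bad


def scan_dir(ann_dir):
    """Print every suspicious box found in the .xml files of ann_dir."""
    for path in sorted(glob.glob(os.path.join(ann_dir, "*.xml"))):
        with open(path, encoding="utf-8") as fh:
            for entry in find_bad_boxes(fh.read()):
                print(path, entry)


# Reduced version of the annotation posted earlier in the thread.
sample = """<annotation>
  <object>
    <bndbox>
      <xmin>474</xmin><ymin>206</ymin><ymax>144</ymax><xmax>526</xmax>
    </bndbox>
    <name>speedLimit25</name>
  </object>
</annotation>"""
print(find_bad_boxes(sample))
```

Running scan_dir on the directory passed to --annotation before training should list any boxes whose min and max values are swapped.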