Darkflow: Minimum number of training steps

Created on 26 Jul 2017 · 12 Comments · Source: thtrieu/darkflow

Hi, everyone.

I trained on just 5 classes from the VOC2012 dataset.

The training result is:

step: more than 14000
loss: fluctuates continuously between 2 and 3

Then I created my-yolo.pb and my-yolo.meta, and ran prediction like this:
flow --pbLoad built_graph/my-yolo.pb --metaLoad built_graph/my-yolo.meta --imgdir sample_img/

But it can't detect anything.

What is the minimum number of steps to detect?

All 12 comments

Usually until your loss has converged. You can visualise the training from the summary folder using TensorBoard.

It might not be the number of steps that is the problem. It could be that your training was never set up correctly in the first place, your threshold is too high, the category names differ, etc. Have you managed to do any further troubleshooting?
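As a rough illustration of "until your loss has converged", the sketch below compares the mean loss over the most recent window of steps with the mean over the window before it. The function name `has_plateaued`, the window size, and the tolerance are all hypothetical choices for illustration, not anything darkflow provides, and a flat loss only says training has stalled, not that the model detects well.

```python
from statistics import mean


def has_plateaued(losses, window=500, rel_tol=0.01):
    """Return True when the mean of the last `window` losses improved by
    less than `rel_tol` (relative) compared with the preceding window.

    A loss that is flat or rising also counts as "plateaued" here.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = mean(losses[-2 * window:-window])
    recent = mean(losses[-window:])
    if previous == 0:
        return True  # nothing left to improve on
    return (previous - recent) / abs(previous) < rel_tol


# Hypothetical loss histories, not taken from a real run.
still_improving = [100 - i for i in range(20)]
flat = [2.5] * 20

print(has_plateaued(still_improving, window=5))  # False
print(has_plateaued(flat, window=5))             # True
```

The loss values could come from the training log or from the TensorBoard summaries; a larger window makes the check less sensitive to the step-to-step noise seen in the logs in this thread.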

@jubjamie

My classes:

bicycle
bird
boat
person

There are four, not five. However, the number of labels matches the number of classes in the cfg.

My cfg:
`[net]
batch=32
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 20100
policy=steps
steps=-1,100,20000,30000
scales=10,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear

[region]
anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
bias_match=1
classes=4
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh=.6
random=1`
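One thing that can be checked in a cfg like this is the `filters` value of the last convolutional layer before `[region]`. For a YOLOv2-style region layer it is normally `num * (coords + 1 + classes)`. A minimal sketch of that arithmetic follows; the helper name `expected_filters` is made up for illustration.

```python
def expected_filters(classes, num=5, coords=4):
    """Filters needed in the last conv layer before a [region] layer.

    Each of the `num` anchor boxes predicts `coords` box values,
    one objectness score, and one score per class.
    """
    return num * (coords + 1 + classes)


print(expected_filters(4))   # 45
print(expected_filters(20))  # 125
```

With `classes=4` and `num=5` this gives 45, which matches `filters=45` in the cfg above, so this part of the configuration looks consistent. For the stock 20-class VOC setup it gives 125.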

I chose four classes, but in fact, I just want to find people. Other classes are not important.
How do I change the initial value?

What initial value?

@jubjamie

I tried again with 20 classes.


~$ flow --train --model cfg/tiny-yolo-voc-new.cfg --dataset "~/VOCdevkit/VOC2007/JPEGImages" --annotation "~/VOCdevkit/VOC2007/Annotations" --gpu 1.0

Parsing cfg/tiny-yolo-voc-new.cfg
Loading None ...
Finished in 0.00010204315185546875s

Building net ...
Source | Train? | Layer description | Output size
-------+--------+----------------------------------+---------------
| | input | (?, 416, 416, 3)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 416, 416, 16)
Load | Yep! | maxp 2x2p0_2 | (?, 208, 208, 16)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 208, 208, 32)
Load | Yep! | maxp 2x2p0_2 | (?, 104, 104, 32)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 104, 104, 64)
Load | Yep! | maxp 2x2p0_2 | (?, 52, 52, 64)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 52, 52, 128)
Load | Yep! | maxp 2x2p0_2 | (?, 26, 26, 128)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 26, 26, 256)
Load | Yep! | maxp 2x2p0_2 | (?, 13, 13, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 512)
Load | Yep! | maxp 2x2p0_1 | (?, 13, 13, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 1x1p0_1 linear | (?, 13, 13, 125)
-------+--------+----------------------------------+---------------
GPU mode with 1.0 usage

cfg/tiny-yolo-voc-new.cfg loss hyper-parameters:
H = 13
W = 13
box = 5
classes = 20
scales = [1.0, 5.0, 1.0, 1.0]

Parsing for ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
[====================>]100% 007614.xml
Statistics:
bird: 599
train: 328
diningtable: 310
chair: 1432
bus: 272
dog: 538
cat: 389
bottle: 634
motorbike: 390
tvmonitor: 367
person: 5447
pottedplant: 625
sheep: 353
bicycle: 418
car: 1644
horse: 406
aeroplane: 331
sofa: 425
cow: 356
boat: 398
Dataset size: 5011
Dataset of 5011 instance(s)
Training statistics:
Learning rate : 1e-05
Batch size : 16
Epoch number : 1000
Backup every : 2000

step 1 - loss 109.52924346923828 - moving ave loss 109.52924346923828
step 2 - loss 109.86571502685547 - moving ave loss 109.56289062500001
step 3 - loss 110.08113098144531 - moving ave loss 109.61471466064454
.
.
.
step 12492 - loss 10.850701332092285 - moving ave loss 8.618697419224091
step 12493 - loss 6.189124584197998 - moving ave loss 8.375740135721482
step 12494 - loss 9.410148620605469 - moving ave loss 8.479180984209881
step 12495 - loss 7.525200843811035 - moving ave loss 8.383782970169996
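The "moving ave loss" column in the log above appears to be an exponential moving average with weights 0.9 and 0.1: the first three printed steps are consistent with that. The sketch below parses lines in this log format and recomputes the average. The regular expression and the function names are assumptions made for illustration, and the 0.9 decay is inferred from the printed numbers rather than taken from the darkflow source.

```python
import re

# Matches lines such as:
# "step 2 - loss 109.86571502685547 - moving ave loss 109.56289062500001"
LOG_LINE = re.compile(r"step (\d+) - loss ([\d.]+) - moving ave loss ([\d.]+)")


def parse_log(text):
    """Return a list of (step, loss, moving_average) tuples found in `text`."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in LOG_LINE.finditer(text)
    ]


def next_moving_average(previous_average, loss, decay=0.9):
    """Exponential moving average update (decay value inferred from the log)."""
    return decay * previous_average + (1 - decay) * loss


log_text = """
step 1 - loss 109.52924346923828 - moving ave loss 109.52924346923828
step 2 - loss 109.86571502685547 - moving ave loss 109.56289062500001
step 3 - loss 110.08113098144531 - moving ave loss 109.61471466064454
"""

rows = parse_log(log_text)
for (_, _, prev_avg), (step, loss, logged_avg) in zip(rows, rows[1:]):
    recomputed = next_moving_average(prev_avg, loss)
    print(step, logged_avg, abs(recomputed - logged_avg) < 1e-9)
```

Because each new loss only contributes a tenth of the average, the moving average is the better column to watch when deciding whether training is still improving.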

~$ flow --model cfg/tiny-yolo-voc-new.cfg --load -1 --savepb
~$ flow --pbLoad built_graph/tiny-yolo-voc-new.pb --metaLoad built_graph/tiny-yolo-voc-new.meta --imgdir sample_img/ --json

But this also does not detect anything.

Does it detect before you convert to pb files? Check to see if it's a conversion error. I also don't see where you are saving your weights. What if you use transfer learning?
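One way to make this comparison concrete is to run prediction with `--json` once from the checkpoint and once from the pb file, then count the detections in each output folder. The sketch below assumes each JSON file holds a list of objects with a `confidence` field (alongside `label`, `topleft` and `bottomright`), which is how the darkflow README describes the output. The function name and the folder names in the usage note are hypothetical.

```python
import json
from pathlib import Path


def count_detections(json_dir, min_confidence=0.0):
    """Map each JSON file name in `json_dir` to its number of detections
    whose confidence is at least `min_confidence`."""
    counts = {}
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open() as handle:
            detections = json.load(handle)
        kept = [
            d for d in detections
            if d.get("confidence", 0.0) >= min_confidence
        ]
        counts[path.name] = len(kept)
    return counts
```

For example, `count_detections("out_ckpt")` and `count_detections("out_pb")` could be compared, where both folder names stand for copies of the JSON output from each run. If the checkpoint run finds objects and the pb run finds none, the problem is more likely in the conversion than in the number of training steps.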

Hey Heidisnaps,

I think you have the same problem as I have.
The problem is that after training for more than 14000 steps, savepb doesn't work anymore!
If you build the pb file and want to detect something with it on Android
(or use:
flow --pbLoad built_graph/yolo-new.pb --metaLoad built_graph/yolo-new.meta --imgdir sample_img/)
it doesn't work :(

I think there is a bug, because the weights themselves work fine.
For example, you can try:
./flow --imgdir sample_img/ --model cfg/yolo-new.cfg --load 1500
and it works!

The problem appears after 14000 training steps!
I tried a lot of combinations and trained for more than 600000 steps, and it doesn't work!
But with fewer than 14000 steps you don't have any problems!

@jubjamie

Converting to pb files gives no error:
~$ flow --model cfg/tiny-yolo-voc-new.cfg --load 153 --savepb

Parsing cfg/tiny-yolo-voc-new.cfg
Loading None ...
Finished in 0.00011944770812988281s

Building net ...
Source | Train? | Layer description | Output size
-------+--------+----------------------------------+---------------
| | input | (?, 416, 416, 3)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 416, 416, 16)
Load | Yep! | maxp 2x2p0_2 | (?, 208, 208, 16)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 208, 208, 32)
Load | Yep! | maxp 2x2p0_2 | (?, 104, 104, 32)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 104, 104, 64)
Load | Yep! | maxp 2x2p0_2 | (?, 52, 52, 64)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 52, 52, 128)
Load | Yep! | maxp 2x2p0_2 | (?, 26, 26, 128)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 26, 26, 256)
Load | Yep! | maxp 2x2p0_2 | (?, 13, 13, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 512)
Load | Yep! | maxp 2x2p0_1 | (?, 13, 13, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 1x1p0_1 linear | (?, 13, 13, 125)
-------+--------+----------------------------------+---------------
Running entirely on CPU
Loading from ./ckpt/tiny-yolo-voc-new-153
Finished in 7.258143663406372s
Rebuild a constant version ...
Done

@Savash2016
I tried a checkpoint under 14000 steps, like this:
~$ flow --model cfg/tiny-yolo-voc-new.cfg --load 12000 --savepb

It works! Thanks!!
But the accuracy is too low. How did you improve the accuracy?

Hey Heidisnaps,

It's a bug, I think :(
The only way to get better accuracy is to train for more than 300000 steps (depending on your data).
But the problem is that if you do this, your pb file does not work anymore.

I hope the author will fix it soon. Maybe you can also try sending him an email and asking him to fix it :/
I think this is the only way.

@Heidisnaps
Hey Heidisnaps, can you send me the code for measuring the accuracy?
Thanks, my Gmail is [email protected]

@Savash2016
Hey Savash2016,

The cfg has max_batches: in yolo-voc max_batches is 45000, and in tiny-yolo-voc max_batches is 40100 (VOC 2007). I think max_batches is the maximum number of steps.
I want to get better accuracy. Does it need more than 300000 training steps?
Also, when training runs past 40000 steps, the loss no longer decreases (it stays between 4.0 and 5.0).
Thanks, my Gmail is [email protected]
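To put the step counts in this thread in perspective, they can be converted to epochs using the numbers from the 20-class training log earlier in the thread (dataset size 5011, batch size 16). This is only a sketch: the function names are hypothetical, and dropping the final partial batch (floor division) is an assumption about how batches per epoch are counted.

```python
def steps_per_epoch(dataset_size, batch_size):
    """Full batches in one pass over the data (assumes the remainder is dropped)."""
    return dataset_size // batch_size


def epochs_completed(step, dataset_size, batch_size):
    """Approximate number of passes over the data after `step` training steps."""
    return step / steps_per_epoch(dataset_size, batch_size)


print(steps_per_epoch(5011, 16))                     # 313
print(round(epochs_completed(12495, 5011, 16), 1))   # 39.9
print(round(epochs_completed(300000, 5011, 16), 1))
```

On those numbers, the 12495 steps shown in the log are roughly 40 passes over VOC2007, while 300000 steps would be close to the 1000 epochs printed under "Epoch number" in the same log.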

@junxuezheng
Sorry, but I do not have any code at this time.
