Darknet: YOLOv3 low accuracy

Created on 18 Apr 2020 · 22 Comments · Source: AlexeyAB/darknet

Hey @AlexeyAB, I've trained the full YOLOv3 on a pedestrian dataset from OpenImages (downloaded using OIDToolkitv4).

Train: 2400
Valid: 600
(split from the same dataset)
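The 2400/600 split above is an 80/20 split of a single image pool. A minimal sketch of producing such a split, assuming Darknet's usual train=/valid= list files with one image path per line; the function names and image paths are hypothetical:

```python
import random

def split_image_list(paths, valid_fraction=0.2, seed=42):
    """Shuffle image paths deterministically and split them into (train, valid)."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]

def write_list(path, items):
    """Write one image path per line (hypothetical helper for train.txt / valid.txt)."""
    with open(path, "w") as f:
        f.write("\n".join(items) + "\n")

if __name__ == "__main__":
    # Hypothetical image paths; replace with the real dataset listing.
    images = [f"data/obj/img_{i:05d}.jpg" for i in range(3000)]
    train, valid = split_image_list(images)
    print(len(train), len(valid))
```

A fixed seed keeps the split reproducible between training runs, so mAP on the validation list stays comparable.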

Trained for 6000 iterations using the darknet53.conv.74 pretrained weights
batch = 64
subdivisions = 32
width & height = 416
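With batch=64 and subdivisions=32, each forward/backward pass handles 64 / 32 = 2 images, and one iteration (one weight update) covers all 64. A quick sketch of that arithmetic, assuming exactly this meaning of batch and subdivisions, also gives a rough epoch count; the helper name is hypothetical:

```python
def training_budget(batch, subdivisions, iterations, num_train_images):
    """Return (images per forward pass, total images seen, approximate epochs)."""
    if batch % subdivisions != 0:
        # Sanity check of this sketch: keep the mini-batch size a whole number.
        raise ValueError("batch should be divisible by subdivisions")
    mini_batch = batch // subdivisions        # images per forward/backward pass
    images_seen = batch * iterations          # one iteration = one update over `batch` images
    epochs = images_seen / num_train_images   # full passes over the training list
    return mini_batch, images_seen, epochs

print(training_budget(64, 32, 6000, 2400))  # (2, 384000, 160.0)
```

With the settings of the second run later in the thread (about 6000 images, subdivisions=16), training_budget(64, 16, 6000, 6000) gives (4, 384000, 64.0).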

[Chart: chart_yolov3-custom-train]

  • What can I do to improve accuracy?
  • I ran detection on a 1-minute video clip and the average FPS was around 16-18. What can I do to improve FPS?

What can I do to improve accuracy?

  1. Use more training images
  2. Train this cfg https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1 with this pre-trained weights file https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV and add the -clear flag at the end of the training command

Tried to perform detection on a 1 min video clip, average FPS was around 16-18. What can I do to improve FPS?

  1. Download the latest Darknet version and recompile with GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1

  2. What GPU do you use?
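When comparing "average FPS" figures, it helps to be explicit about how they are averaged: total frames divided by total time is not the same as the mean of per-frame FPS values. A small sketch with hypothetical frame times (not measurements from this thread):

```python
def average_fps(frame_times):
    """Overall FPS for a clip: total frames divided by total elapsed seconds."""
    total = sum(frame_times)
    return len(frame_times) / total if total > 0 else 0.0

def mean_of_instant_fps(frame_times):
    """Mean of per-frame FPS values; never lower than average_fps, higher when frame times vary."""
    return sum(1.0 / t for t in frame_times) / len(frame_times)

times = [0.0625, 0.0625, 0.125, 0.0625]  # hypothetical per-frame latencies in seconds
print(average_fps(times))          # 12.8
print(mean_of_instant_fps(times))  # 14.0
```

One slow frame pulls the overall figure down more than the per-frame mean suggests, so the first definition is the safer basis for comparing models or GPUs.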

Thanks for replying. I'll train again with that config.

I'm using an NVIDIA Tesla T4 (1 GPU) on Google Cloud.

But I actually need it to be fast enough on an NVIDIA GTX 1050 Ti.

Also, is there any documentation for all the hyperparameters?

@Devin97
Read: https://github.com/AlexeyAB/darknet/wiki
Some will be added later.

[Screenshot]

Hey @AlexeyAB

This time I have around 6000 images and I'm training for 6000 iterations.

I recompiled Darknet with these enabled:
GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1
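If the Makefile defines each option on its own line as NAME=0 (an assumption about its layout; check your copy), the four flags can be switched on programmatically before rebuilding with make. The helper name is hypothetical:

```python
import re

def enable_make_options(makefile_text, options=("GPU", "CUDNN", "CUDNN_HALF", "OPENCV")):
    """Rewrite each 'NAME=<value>' line for the listed options to 'NAME=1'."""
    for name in options:
        # Match the whole line so that CUDNN does not also match CUDNN_HALF.
        pattern = re.compile(rf"^{re.escape(name)}=\S*$", re.MULTILINE)
        makefile_text = pattern.sub(f"{name}=1", makefile_text)
    return makefile_text

sample = "GPU=0\nCUDNN=0\nCUDNN_HALF=0\nOPENCV=0\nAVX=0\n"
print(enable_make_options(sample))
```

Lines written with spaces around "=" are left untouched by this sketch. With GNU make, passing the values on the command line (make GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1) also overrides the file's assignments without editing it.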

After 1000 iterations, mAP is 0% and avg loss is -nan. Is this common?
Command:
./darknet detector train data/obj.data cfg/cd53paspp-gamma-train.cfg cd53paspp-gamma_final.weights -map -dont_show -clear
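To see at which iteration the loss first becomes NaN, the training output can be scanned. A minimal sketch, assuming each iteration line looks like " 1000: 12.34, 13.45 avg loss, 0.001000 rate, ..." (check this against your own log); the regex and function name are hypothetical:

```python
import math
import re

# Assumed shape of an iteration line: "<iter>: <loss>, <avg_loss> avg loss, ..."
LINE_RE = re.compile(r"^\s*(\d+):\s*([^,]+),\s*(\S+)\s+avg loss")

def first_nan_iteration(log_lines):
    """Return the first iteration whose avg loss is NaN/inf (or unparseable), else None."""
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not an iteration line (e.g. mAP or loading messages)
        try:
            avg_loss = float(m.group(3))  # Python parses "nan" and "-nan"
        except ValueError:
            # Unparseable token (e.g. a platform-specific NaN spelling): treat as non-finite.
            return int(m.group(1))
        if not math.isfinite(avg_loss):
            return int(m.group(1))
    return None

log = [" 1: 10.5, 10.5 avg loss, 0.000000 rate", " 2: -nan, -nan avg loss, 0.000001 rate"]
print(first_nan_iteration(log))
```

If the first NaN shows up early, during burn-in, that points toward the learning-rate and burn_in remedies discussed below rather than toward a late-training problem.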

Something is going wrong.

  • Did you change burn_in= or learning_rate= in the cfg file? Don't do it.

  • Try adding these lines before the ########################## line in the cfg file:

stopbackward=2000
train_only_bn=1

and train again
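The edit above adds two key=value lines at the end of the section just before the ########################## comment line, so that they apply to the last backbone layer. A hypothetical helper that does this on the cfg text, assuming Darknet's cfg parser skips lines starting with # and attaches options to the most recent [section]:

```python
def insert_before_marker(cfg_text, marker="##########################",
                         lines_to_add=("stopbackward=2000", "train_only_bn=1")):
    """Insert option lines immediately before the first line starting with `marker`."""
    out = []
    inserted = False
    for line in cfg_text.splitlines():
        if not inserted and line.strip().startswith(marker):
            out.extend(lines_to_add)  # these become options of the preceding [section]
            inserted = True
        out.append(line)
    if not inserted:
        raise ValueError("marker line not found in cfg")
    return "\n".join(out) + "\n"

cfg = "[convolutional]\nfilters=1024\n\n##########################\n\n[convolutional]\nfilters=512\n"
print(insert_before_marker(cfg))
```

Only the first marker line is used; if a cfg has several such separators, check that the first one really follows the last backbone layer.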

Did you change burn_in= or learning_rate= in cfg-file? Don't do it.
No, I haven't changed these parameters. I only changed batch=64, subdivisions=16 and classes=1.

Also, is it OK to use the full pretrained weights directly? Or do I have to extract a portion of them using the "darknet partial" command?

It's better to use partial:
https://github.com/AlexeyAB/darknet/blob/4786d557f9ac7452ded74ce2f3df06fbe991924e/build/darknet/x64/partial.cmd#L12
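The partial command writes a .weights file that keeps only the first N layers (137 in the command below). To inspect such a file, its header can be read first. A sketch, assuming the layout is major, minor, revision as int32, followed by an images-seen counter stored as uint64 when major*10 + minor >= 2 (else int32), all little-endian as on x86. As far as we understand, this seen counter is what -clear resets when training starts from existing weights:

```python
import struct

def read_weights_header(data: bytes):
    """Parse the assumed header at the start of a Darknet .weights file."""
    major, minor, revision = struct.unpack_from("<iii", data, 0)
    if major * 10 + minor >= 2:
        (seen,) = struct.unpack_from("<Q", data, 12)  # 64-bit images-seen counter
        header_size = 20
    else:
        (seen,) = struct.unpack_from("<i", data, 12)  # older 32-bit counter
        header_size = 16
    return {"major": major, "minor": minor, "revision": revision,
            "seen": seen, "header_size": header_size}

# Synthetic header for demonstration (not taken from a real file).
demo = struct.pack("<iiiQ", 0, 2, 5, 384000)
print(read_weights_header(demo))
```

Usage on a real file would be read_weights_header(open("cd53paspp-omega.conv.137", "rb").read(20)); everything after header_size bytes is the float32 layer data.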

I'm getting a segmentation fault.
[Screenshot]

I tried training from the final weights again and added these parameters as you suggested:
stopbackward=2000
train_only_bn=1

I'm still facing the same issue.

[Screenshot]

[Chart: chart_cd53paspp-gamma-train]

Here's my cfg file:

[Attachment: cd53paspp-gamma-train.txt]

  1. Don't use the obj.data file with the partial command:
    ./darknet partial cfg/cd53paspp-omega.cfg cd53paspp-omega_final.weights cd53paspp-omega.conv.137 137

  2. Use

stopbackward=2000
train_only_bn=1
  3. Set learning_rate=0.001

I see, I'll train again with learning_rate=0.001
What is the reason for the avg loss to be nan? Does this mean that something's wrong with the dataset?

Exploding gradients (backward) / features (forward): https://machinelearningmastery.com/exploding-gradients-in-neural-networks/

To solve this:

  • reduce learning_rate=
  • reduce batch=
  • increase burn_in=
  • increase decay=
  • use max_delta=3 in [yolo] layers
  • use stopbackward=2000 and train_only_bn=1 at the last backbone layer
  • use less layers
  • fix your dataset
  • use another model
  • use gradient clipping
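Two of the remedies above bound the size of updates. The sketch below shows the generic techniques only: element-wise clamping, which is how we understand max_delta to act on a [yolo] layer's deltas, and global-norm gradient clipping. It is not Darknet's implementation:

```python
import math

def clip_by_value(deltas, max_delta):
    """Clamp each element to [-max_delta, max_delta] (element-wise clipping)."""
    return [max(-max_delta, min(max_delta, d)) for d in deltas]

def clip_by_global_norm(grads, max_norm):
    """Rescale the whole gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already small enough: leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]

print(clip_by_value([5.0, -0.5, -7.0], 3.0))  # [3.0, -0.5, -3.0]
print(clip_by_global_norm([3.0, 4.0], 1.0))   # rescaled to unit norm, about [0.6, 0.8]
```

Element-wise clamping can change the direction of the update, while norm clipping only shrinks its length; both stop a single bad batch from producing a huge step.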

Hey @AlexeyAB
It's been more than 2000 iterations and I haven't gotten a "nan" avg loss, so that's good news. However, there's another problem.

[Chart: chart_cd53paspp-gamma-train]

Avg loss doesn't seem to be decreasing much. Is this normal at the beginning of training with this cd53paspp config, or is something wrong?

[Screenshot from 2020-04-20 01-06-15]

The avg loss for cd53paspp is higher than for yolov3, but the mAP is also higher.

Also, the lower the learning_rate=, the more stable the training, but the slower it progresses. So you should find the optimal learning rate.

  • you can keep learning_rate=0.001 and increase max_batches= and steps= to train longer
  • or you can add max_delta=3 to each [yolo] layer and use a higher learning_rate=0.00261
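The learning-rate trade-off above interacts with burn_in=. A sketch of the schedule as commonly described for Darknet's steps policy: during the first burn_in iterations the rate ramps up as learning_rate * (iteration / burn_in)^power, then it is multiplied by each scales= entry once the matching steps= iteration is reached. The function name is hypothetical, and burn_in=1000, power=4, and the steps values are assumed for illustration, not taken from this thread's cfg:

```python
def darknet_style_lr(iteration, learning_rate, burn_in, power, steps, scales):
    """Polynomial burn-in followed by a step-wise decay (sketch, not Darknet's code)."""
    if iteration < burn_in:
        # Warm-up: small updates while the freshly initialised layers settle.
        return learning_rate * (iteration / burn_in) ** power
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if iteration < step:
            break  # this step (and all later ones) not reached yet
        rate *= scale
    return rate

# Hypothetical values for a 12000-iteration run.
schedule = dict(learning_rate=0.001, burn_in=1000, power=4,
                steps=[9600, 10800], scales=[0.1, 0.1])
for it in (500, 5000, 10000, 11000):
    print(it, darknet_style_lr(it, **schedule))
```

With these values the rate at iteration 500 is 0.001 * 0.5^4 = 6.25e-05, which is why a longer burn_in keeps the early, most unstable iterations gentle.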

My current configuration has
learning_rate = 0.001

and I've added max_delta = 3 in yolo layers.

you can keep learning_rate=0.001 and increase max_batches= and steps= to train longer

I'll also try this out on next training session.

I'll post my results when the training is finished. Thanks for the help @AlexeyAB !

These are my training results:
[Chart: chart_cd53paspp-gamma-train (1)]

Accuracy didn't improve; it's almost the same as with the yolov3 config.

I ran detection on a 1-minute clip. The FPS is around 9, but detection is somewhat more stable than with yolov3.

Video: https://drive.google.com/open?id=1mMe-S2XL2InaTjhzEQAc3O50fpyPRxlF

It might be overfitting.

Try to train again:
with stopbackward=2000
but without train_only_bn=1

use max_delta=3 in [yolo] layers.
and use learning_rate=0.00261

And set max_batches= and steps= 2x higher: max_batches=12000 and steps=...

If avg loss becomes NaN, then train with learning_rate=0.001.

So I'm training with the parameters above for 12000 iterations.
learning_rate = 0.00261 resulted in a "nan" avg loss, so I reduced it back to 0.001.

While it's training, I wanted to ask: what FPS should I expect with the cd53paspp-gamma cfg? In the previous training run I got around 9 FPS, which makes sense: since it has more layers, detection is expected to be slower. Is this correct?

0.9x speed, but 1.3x higher accuracy.

How did you get mAP to show on that chart?
For me, there's only loss.

@AlexeyAB What does -clear mean, and is this flag necessary?

@Jureong

Use the -map flag.
