Darknet: Map high but no detection

Created on 27 Nov 2019 · 14Comments · Source: AlexeyAB/darknet

Hello,

I need to debug and understand whats going wrong with my training.
I created a pretty small dataset (6 classes with 14000 images) and training on yolov3-tiny_3l.cfg - when i run map command without -iou_tresh option i get the the following stats:

14000
detections_count = 287726, unique_truth_count = 255096
class_id = 0, name = WORD, ap = 98.58% (TP = 14790, FP = 363)
class_id = 1, name = TEXTFIELD, ap = 94.62% (TP = 1, FP = 0)
class_id = 2, name = COMBOBOX, ap = 97.29% (TP = 0, FP = 0)
class_id = 3, name = CHECKBOX, ap = 99.97% (TP = 10063, FP = 3)
class_id = 4, name = RADIOBUTTON, ap = 99.94% (TP = 11206, FP = 13)
class_id = 5, name = BUTTON, ap = 96.57% (TP = 34, FP = 8)

The avg loss output seems also fine to me
3874: 0.963631, 1.092447 avg loss, 0.001000 rate, 3.230680 seconds, 232440 images

However - when i try to detect anything on the test dataset - its fails miserably.
So i read further what alex wrote and noticed there is a -iou_thresh parameter.
After running with this parameter i get the following output

14000
detections_count = 291979, unique_truth_count = 255096
class_id = 0, name = WORD, ap = 51.71% (TP = 11815, FP = 3369)
class_id = 1, name = TEXTFIELD, ap = 35.64% (TP = 4, FP = 1)
class_id = 2, name = COMBOBOX, ap = 38.97% (TP = 0, FP = 0)
class_id = 3, name = CHECKBOX, ap = 89.53% (TP = 9178, FP = 556)
class_id = 4, name = RADIOBUTTON, ap = 96.85% (TP = 11236, FP = 114)
class_id = 5, name = BUTTON, ap = 48.78% (TP = 10, FP = 12)

This does not look so good anymore - more close to the results i get on my test dataset.
Now here my questions:

Why do i have so big differences between using -iou_thresh or not and why is there such a parameter at all?
Is my training going well at all?
One stat(the first map, and also the avg loss) say yes the other no(the one with -iou_trash) and the overall result(end user testing) also says no.
Should i just train a bit further?
Alex suggest i should train classs * 2000 iterations (in my case 6* 2000 = 12000)
I am only at iteration 3874. But in contrast he also wrote "When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training". Thats the case for me.
Do i just have too many parameters (i have a lot of colour combinations)
"...model of object, side, illimination, scale, each 30 grad of the turn and inclination angles - these are different objects from an internal perspective of the neural network. So the more different objects you want to detect, the more complex network model should be used...."
So do i need to either reduce combinations or use a bigger network(not tiny but yolov3 model config)

Any insights / comments on this is highly welcome.
Thank you very much.

Greetings,
Holger

Source

holger-prause

Most helpful comment

I did the following:

updated my yolo repo from alex
restricted my generated images to be of size of my network
generated custom anchors
set max = 4000
disabled color augmentation(but keeping random=1)
generated a validation dataset and use it

Now my problems are somewhat gone. I am still wondering what the problem was :-)
But the steps from above make sense to me anyways.

holger-prause on 29 Nov 2019

😄1 👍1

All 14 comments

What -iou_thresh do you use?
What mAP do you get on Training dataset?
What mAP do you get on Validation dataset?

AlexeyAB on 27 Nov 2019

Hello Alex - thank you for responding - i included the stats in my first posting.
Let me make them bold for better viewing.
All the stats posted are done on the training set to get a trend and an overall idea.
Also included the -iou_thresh value (was using the ones suggested in your docs -iou_thresh 0.75)

holger-prause on 27 Nov 2019

What mAP do you get on Validation dataset?

AlexeyAB on 27 Nov 2019

Lets me check this. Will take me some time to upload it. I wanted to check on training dataset first.
But i even fail detections on the train dataset - where it should be best. But your are right lets compare low iou tresh on validation dataset too too see if theres also such a big difference.
With some time i mean like 30-50 minutes let me do this.

holger-prause on 27 Nov 2019

Ok i used 1000 images as validation dataset (distinct images from train dataset )

Without -iou_thresh
1000
detections_count = 19660, unique_truth_count = 18253
class_id = 0, name = WORD, ap = 99.11% (TP = 969, FP = 14)
class_id = 1, name = TEXTFIELD, ap = 94.97% (TP = 0, FP = 0)
class_id = 2, name = COMBOBOX, ap = 98.48% (TP = 0, FP = 0)
class_id = 3, name = CHECKBOX, ap = 99.89% (TP = 716, FP = 0)
class_id = 4, name = RADIOBUTTON, ap = 99.90% (TP = 776, FP = 2)
class_id = 5, name = BUTTON, ap = 97.07% (TP = 0, FP = 0)

With -iou_thresh 0.75
1000
detections_count = 19709, unique_truth_count = 18253
class_id = 0, name = WORD, ap = 54.64% (TP = 965, FP = 302)
class_id = 1, name = TEXTFIELD, ap = 69.13% (TP = 0, FP = 0)
class_id = 2, name = COMBOBOX, ap = 62.74% (TP = 0, FP = 0)
class_id = 3, name = CHECKBOX, ap = 95.54% (TP = 752, FP = 11)
class_id = 4, name = RADIOBUTTON, ap = 98.17% (TP = 835, FP = 5)
class_id = 5, name = BUTTON, ap = 67.10% (TP = 8, FP = 3)

So this is consistent with the trend on the train dataset. Which is a good thing :-)
I noticed a higher accurac on the val dataset which is strange. But the training process is still running and i am testing on latest_weights.

So i would conclude that i just need to wait and train a bit longer?
What do you think?

holger-prause on 27 Nov 2019

I check by train dataset map again - and it went up too.

With -iou_thresh 0.75
14000
detections_count = 276236, unique_truth_count = 255096
class_id = 0, name = WORD, ap = 55.78% (TP = 13435, FP = 4741)
class_id = 1, name = TEXTFIELD, ap = 71.17% (TP = 0, FP = 0)
class_id = 2, name = COMBOBOX, ap = 60.27% (TP = 0, FP = 0)
class_id = 3, name = CHECKBOX, ap = 94.79% (TP = 10524, FP = 223)
class_id = 4, name = RADIOBUTTON, ap = 97.87% (TP = 11646, FP = 127)
class_id = 5, name = BUTTON, ap = 66.93% (TP = 95, FP = 35)

So think i should just wait a bit longer.
@AlexeyAB Does this makes sense?

If yes just let me close this and say thank you and sorry for "my panic"
Its just that i had many(4!) failed attempts on my dataset (too big image resolution).

Thank you very much for making me think again.
Greetings,
Holger

holger-prause on 27 Nov 2019

Train 12 000 iterations.

Training goes well.
Your test images are very different from training.

Read: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

AlexeyAB on 27 Nov 2019

👍1

Got it :-)
I guess i tend to underestimate the computational power needed to train even a small dataset. I am on google cloud with a Tesla K80 for 12 hours now.

May i ask a last question:
You wrote:

"Your test images are very different from training."
Do you mean they should/must be very different from the train dataset?
In my case they are somewhat similar.

And is it possible to donate something to this repo - like paypal or something?
Any something personal on your account you have "hard time in my life"
Well i have been there - its getting better with time.
Wish you all the best + thank you again.
Closing ticket as solved :-)

Greetings,
Holger

holger-prause on 27 Nov 2019

To anyone facing a similar problem:

Dont only look at the avg loss values or the output from the map command without -iou_thresh
Check map with -iou_thresh 0.75
Be a bit patient and measure from time to time to see if theres an actual improvement

Dont be such a noob like me :-)

holger-prause on 27 Nov 2019

You can look at the mAP without -iou_thresh

AlexeyAB on 28 Nov 2019

And sponsoring can be done by clicking the sponsor button (see on the top of this page) :-)
Keep up your good work.

holger-prause on 28 Nov 2019

Oh man - so i just finished the 12000th iteration and the map looks pretty good.
Note that map was done on the training set
Note that i am using yolov3-tiny_3l.cfg and pretrained weight from yolo tiny yolov3-tiny.conv.15

I am not sure this is right anymore because your wrote
darknet.exe partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15

Here the stats:

With -iou_thresh 0.75
14000
detections_count = 276236, unique_truth_count = 255096
class_id = 0, name = WORD, ap = 81.34% (TP = 13435, FP = 4741)
class_id = 1, name = TEXTFIELD, ap = 82.28% (TP = 0, FP = 0)
class_id = 2, name = COMBOBOX, ap = 85,7% (TP = 0, FP = 0)
class_id = 3, name = CHECKBOX, ap = 97.89% (TP = 10524, FP = 223)
class_id = 4, name = RADIOBUTTON, ap = 99.12% (TP = 11646, FP = 127)
class_id = 5, name = BUTTON, ap = 98.34% (TP = 95, FP = 35)

I tried to detect a sample from the training set (to make it easy) - i don't care about overfitting right now - i just want to see it working on the training set at least one time.

It can detect class 3 and 4 nearly perfectly - even on a test set!
It cannot detect a single instance of class 1 or 2 even on the training set!

I am really really really confused. The stats say it should be fine but i am really not.

So what did i do - i retrained on 3 classes (RADIOBUTTON,CHECKBOX,TEXTFIELD)
This time it can detect on the training set pretty good but has no good performance on test set - well this is overfitting this time.
The dataset is generated with a lot of color / length / padding variations.

I also have the impression that training on a input field class is somehow fruitless - its always makes problem for me(a lot of false positives this time). Is this object class just too simple(not enough features) or my network config?

I am really doubting myself right now.
The numbers say i am fine but i am not - i am a bit desperate right now.
Should i just spent my time on collecting real live data?

holger-prause on 29 Nov 2019

I did the following: