Yolov3: The training result

Created on 29 May 2020 · 17 Comments · Source: ultralytics/yolov3

Hi,

I trained the model from scratch (not from a pre-trained model).
After 55 epochs, P, R, mAP and F1 are very close to the results of your pre-trained model, so I tested the model. The following figures are my test results. They do not seem good enough.

My current doubts are:
The mAP is around 0.473, but the plotted figures do not seem to match such a good mAP. I mean that for an mAP of 0.473, the plotted figures should look better. Is this normal? If I train for more epochs, the results should get better, right?

|epoch|mem|lossbox|lossobj|losscls|losstotal|targets|img_size|P|R|mAP|F1|GIoU|obj|cls|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|48/299|8.8G|3.18|1.96|1.76| 6.9| 5| 640|0.433|0.505|0.46|0.457|2.62|1.31|1.33|
|49/299|8.8G|3.17|1.99|1.75|6.91| 5| 320|0.431|0.507|0.462|0.457|2.61| 1.3|1.32|
|50/299|8.8G|3.17|1.95|1.74|6.86| 5| 512|0.43|0.51|0.464|0.458|2.61| 1.3|1.31|
|51/299|8.8G|3.17|1.97|1.75|6.89| 5| 576|0.43|0.513|0.466|0.459| 2.6| 1.3|1.31|
|52/299|8.8G|3.16|1.96|1.74|6.86| 5| 448|0.429|0.515|0.467|0.459| 2.6| 1.3| 1.3|
|53/299|8.8G|3.16|1.98|1.74|6.88| 5| 448|0.427|0.518|0.469|0.46|2.59| 1.3|1.29|
|54/299|8.8G|3.16|1.96|1.72|6.84| 5| 384|0.426|0.52|0.471|0.461|2.59| 1.3|1.29|
|55/299|8.8G|3.15|1.95|1.72|6.82| 5| 448|0.428|0.522|0.473|0.463|2.58|1.29|1.28|

[6 images: test results]

All 17 comments

These are the parameter curves. Everything seems to be going normally.
However, I cannot understand why the mAP is 0.473. The plotted figures are not as good as the mAP suggests.

image

@ardeal all reported metrics are correct. Full coco training is 300 epochs, not sure why you would train 50 epochs and then say your results are not good enough.

If the recipe tells you to bake a cake for 30 minutes, do you bake it for 5 and then try to eat it?

@glenn-jocher ,
Thanks for your answer!

I am aware that COCO training is 300 epochs.
The reason I tried the model from epoch 50 is that I saw the mAP was 0.473, which is rather high.

I only tried the model from epoch 50, and the training is still going on.

Thanks,

mAP is the overall measure of how the model performs. In fact, behind mAP are three losses: iou, cls, obj.

obj determines: is there a target?
cls classifies: what is the target?
iou: how far is the predicted bounding box from the ground truth?

The iou loss commonly drops more slowly than the others; cls and obj drop much faster.
So there is indeed a case like yours: a higher mAP but poor detection results. Just train for more epochs and the model will become more stable.
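To make the point above concrete, here is a minimal sketch of why a box can look loose and still count as a correct detection. It assumes the reported mAP uses an IoU threshold of 0.5 for matching (an assumption, the thread does not state the threshold), and the helper `box_iou` and the example boxes are made up for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (zero area if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# Hypothetical boxes: the prediction is shifted 20 px right and 10 px down.
ground_truth = (0, 0, 100, 100)
loose_box = (20, 10, 120, 110)

iou = box_iou(ground_truth, loose_box)
counts_as_hit = iou >= 0.5  # assumed matching threshold

print(f"IoU = {iou:.4f}, counted as correct: {counts_as_hit}")
```

Under that assumption, a visibly misaligned box like this one still contributes to mAP, which may be part of why early plots look worse than the number suggests.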

@www7890 ,

Many thanks for your comments!

@www7890 yes this is correct. obj and cls will typically overtrain faster than iou, it is one of the issues that we are looking at currently. This usually creates a scenario where iou has not converged yet, but obj and cls val losses begin increasing, forcing early stopping of the training.

Unfortunately increasing or lowering hyp['giou'] does not seem to help much in this case. Perhaps the best solution might be to reduce obj and cls hyps to near zero individually as they begin to overtrain, while leaving giou training going longer.
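As a rough sketch of the idea above (shrinking the obj and cls gains once they start to overtrain while leaving giou alone), something like the following could be tried. This is not code from the repo: the function name, the `patience` and `factor` parameters, the `'obj'` and `'cls'` keys and all the numbers are assumptions made up for illustration.

```python
def decay_overtrained_gains(hyp, val_history, keys=("obj", "cls"),
                            patience=3, factor=0.1):
    """Return a copy of hyp with gains shrunk for overtraining components.

    val_history maps a component name to its per-epoch validation losses.
    A component is treated as overtraining when each of its last `patience`
    epochs is above the best (lowest) value seen before them.
    """
    new_hyp = dict(hyp)
    for k in keys:
        losses = val_history[k]
        if len(losses) <= patience:
            continue  # not enough history to judge yet
        best_before = min(losses[:-patience])
        if all(v > best_before for v in losses[-patience:]):
            new_hyp[k] = hyp[k] * factor  # push this gain toward zero
    return new_hyp


# Illustrative values only, not taken from any real run.
hyp = {"giou": 1.0, "obj": 1.0, "cls": 1.0}
val_history = {
    "obj": [1.40, 1.31, 1.29, 1.30, 1.32, 1.35],  # rising for 3 epochs
    "cls": [1.40, 1.33, 1.30, 1.29, 1.28, 1.28],  # still improving
}
new_hyp = decay_overtrained_gains(hyp, val_history)
print(new_hyp)
```

Here only the obj gain is reduced, because its validation loss has been above its earlier best for three epochs; cls and giou are left untouched.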

@www7890 here is an example of this effect. This is a plot from epoch 30-250 of a recent training of yolov3-spp on coco2017 using some new training techniques. Unfortunately we can not train fully to 300 epochs because obj and cls begin to overtrain. giou is just fine though, and shows no signs of overtraining.

I'm not really sure how to handle this. Do you have any ideas?

results

@glenn-jocher I have tried your implementation of ciou, and it indeed helps the iou loss converge faster. It gives me about a 5% increase in mAP on my custom dataset.

Ah that's interesting! Does it slow down your training or use more RAM?

I'd looked at ciou a while back but didn't see any improvement on coco if I remember.

@glenn-jocher my device is a V100, and I don't notice a significant effect on either training speed or memory cost.
I am not sure whether ciou helps only case by case. However, yolov4 in darknet uses it as well, so I think it is worth a shot!
The first time I tried ciou was on your repo, as updated around early May. Maybe ciou is helpful because some bugs were fixed by your great work?

@www7890 actually we've also seen some improvement using it on smaller datasets, though not on coco. I have not tried it on coco in a while though, and many things have changed in the intervening time. I suppose I should give it another try!

@glenn-jocher Okay!

@glenn-jocher , and @www7890 ,
Hi,

I trained for around 140 epochs. The loss, accuracy, etc. curves are as follows.
All parameters seem good, but the detection results are not as good as those curves suggest. I will train for more epochs.

By the way,
The issue @glenn-jocher mentioned (obj and cls begin to overtrain, while giou is just fine) doesn't appear during my training.
I will try the ciou you both mentioned if my training encounters this issue. So what is ciou? Where is it implemented in the code?

I am thinking of the following experiment to solve the issue:
Is it possible to decrease the weights of the obj loss and cls loss but increase the weight of the giou loss?
For example, total_loss = a*obj_loss + b*cls_loss + c*giou_loss, where we decrease a and b but increase c.
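The weighting proposed here can be sketched in a few lines. The loss values below are the epoch 55 row from the table at the top of the thread (box 3.15, obj 1.95, cls 1.72); the weights 0.5, 0.5 and 2.0 are arbitrary choices for illustration, and the function is a hypothetical helper, not the repo's loss code.

```python
def total_loss(giou_loss, obj_loss, cls_loss, a=1.0, b=1.0, c=1.0):
    """Weighted sum: a*obj_loss + b*cls_loss + c*giou_loss."""
    return a * obj_loss + b * cls_loss + c * giou_loss


giou_loss, obj_loss, cls_loss = 3.15, 1.95, 1.72  # epoch 55 row of the table

# Equal weights reproduce the total reported in the table (6.82).
baseline = total_loss(giou_loss, obj_loss, cls_loss)

# Decrease a and b, increase c (illustrative values only).
reweighted = total_loss(giou_loss, obj_loss, cls_loss, a=0.5, b=0.5, c=2.0)

# Fraction of the total that comes from the giou term, before and after.
giou_share_before = giou_loss / baseline
giou_share_after = 2.0 * giou_loss / reweighted

print(f"baseline total   = {baseline:.3f}, giou share = {giou_share_before:.2f}")
print(f"reweighted total = {reweighted:.3f}, giou share = {giou_share_after:.2f}")
```

With these example weights the giou term goes from a bit under half of the total to roughly three quarters of it, so most of the gradient would come from box regression. Note that rescaling the weights also changes the overall size of the loss, which interacts with the learning rate.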

image

Thanks,
Ardeal

Hi @glenn-jocher,

I trained the network from epoch 0 to epoch 227.

The following is the training curve. It seems that the loss decreases very slowly.
In this case, should I alter the LR scheduler function?

image

This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.

@www7890 could you share how to implement ciou in this repo?

CIoU is implemented in the IoU function in utils/utils.py.
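For readers asking what ciou is: below is a standalone sketch of the published Complete IoU formula, which takes plain IoU and subtracts a center-distance penalty and an aspect-ratio penalty. It is written from the formula, not copied from utils/utils.py, so the function name, argument layout and `eps` handling are assumptions. Boxes are (x1, y1, x2, y2) with positive width and height.

```python
import math


def ciou(box1, box2, eps=1e-9):
    """Complete IoU of two boxes given as (x1, y1, x2, y2)."""
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]

    # Plain IoU
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = inter_w * inter_h
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # Squared diagonal of the smallest box enclosing both boxes
    cw = max(box1[2], box2[2]) - min(box1[0], box2[0])
    ch = max(box1[3], box2[3]) - min(box1[1], box2[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centers
    rho2 = (((box1[0] + box1[2]) - (box2[0] + box2[2])) ** 2
            + ((box1[1] + box1[3]) - (box2[1] + box2[3])) ** 2) / 4

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + eps)

    return iou - rho2 / c2 - alpha * v


gt = (0, 0, 100, 100)
pred = (20, 10, 120, 110)  # same size as gt, shifted center
print(f"CIoU = {ciou(gt, pred):.4f}")
```

As a training loss one would typically use 1 - CIoU. Compared with plain IoU, the extra terms still give a useful signal when boxes overlap poorly, which is consistent with the faster iou convergence reported earlier in the thread.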
