hi, I have trained a yolo-small model to step 4648, but most of the loss values are greater than 1.0, and the test results are not very good. I'd like to know how low the loss can get. Could you please share some key training parameters, e.g. learning rate, training time, final loss value, and so on?
I train the model on an iMac (4 GHz Intel Core i7, 16 GB memory), in CPU mode.
thank you!
What batch size are you using? Without the batch size, the step number says nothing about how far you've gone. According to the author of YOLO, he used a pretty powerful machine and the training had two stages, with the first stage (training the convolutional layers with average pooling) taking about a week. So you should be patient if you're not that far from the beginning.
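To make the point about batch size concrete, here is a minimal sketch that converts a step count into images seen and epochs. The step count 4648 comes from the question; the batch sizes compared and the dataset size are made-up values for illustration only.

```python
def images_seen(steps, batch_size):
    """Total number of training images processed after `steps` updates."""
    return steps * batch_size


def epochs_done(steps, batch_size, dataset_size):
    """Number of passes over a dataset of `dataset_size` images."""
    return images_seen(steps, batch_size) / dataset_size


STEPS = 4648                 # step count reported in the question
HYPOTHETICAL_DATASET = 10_000  # assumption: the real dataset size was not given

for batch_size in (8, 16, 64):
    seen = images_seen(STEPS, batch_size)
    epochs = epochs_done(STEPS, batch_size, HYPOTHETICAL_DATASET)
    print(f"batch={batch_size:>2}: {seen} images, {epochs:.2f} epochs")
```

The same step count covers eight times more data at batch size 64 than at batch size 8, which is why the step number alone is not informative.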
Training deep nets is more of an art than a science. So my suggestion is to first train your model on a small dataset to see whether it is able to overfit the training set; if not, there's a problem to solve before proceeding. Note that due to the data augmentation built into the code, you can't really reach 0.0 for the loss.
I've trained a few configs with my code, and the loss shrinks nicely from > 10.0 to around 0.5 or below (the parameters C, B, S are not relevant since the loss is averaged across the output tensor). I usually start with the default learning rate of 1e-5 and a batch size of 16 or even 8 to bring the loss down quickly, until it stops decreasing and starts to look unstable.
Then I decrease the learning rate to 1e-6 and increase the batch size to 32 and then 64 whenever I feel the loss is stuck (and testing still does not give good results). You can switch to another adaptive learning rate algorithm (e.g. Adadelta, Adam, etc.) if you are familiar with them, by editing ./yolo/train.py/yolo_loss()
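The manual schedule described above could be sketched roughly as follows. This is not code from the repo: the stage values (1e-5 with batch 16, then 1e-6 with batch 32 and 64) come from the description, while the "stuck" test, the window size, and the improvement threshold are assumptions standing in for "whenever I feel the loss is stuck".

```python
# Stages taken from the description above; the order is an interpretation.
STAGES = [
    {"lr": 1e-5, "batch_size": 16},
    {"lr": 1e-6, "batch_size": 32},
    {"lr": 1e-6, "batch_size": 64},
]


def is_stuck(losses, window=100, min_improvement=0.01):
    """Heuristic (assumption): the loss is stuck when the mean of the last
    `window` values is not at least `min_improvement` (relative) lower than
    the mean of the `window` values before them."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return recent > previous * (1 - min_improvement)


def next_stage(stage_index, losses, window=100, min_improvement=0.01):
    """Move to the next stage when the loss is stuck; stay on the last stage."""
    if is_stuck(losses, window, min_improvement) and stage_index < len(STAGES) - 1:
        return stage_index + 1
    return stage_index


# Usage: a flat loss history triggers the move from stage 0 to stage 1.
flat_history = [0.9] * 200
print(STAGES[next_stage(0, flat_history)])
```

In practice you would still look at test predictions before switching, as noted above; the heuristic only replaces the "loss looks stuck" judgement.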
You can also look at the learning rate policy the YOLO author used, inside .cfg files.
Best of luck
@thtrieu What a nice suggestion!
I also ran into similar issues and found that pre-trained weights can really help. Moreover, the quality and quantity of the data itself is really important, especially when training a YOLO-style network; it is just too hard to converge well otherwise ...
I am still struggling on this ~'~
@thtrieu thank you~
In my first round of training, the batch size was 12. I get your point about being patient.
My final goal is to find bounding boxes for objects that are not in ImageNet, so I am training without a pre-trained model.
Thanks again!
Just a friendly ping. I've finished training a YOLO model with 4 classes; if you are interested, I will write up some notes about the training process.
@thtrieu Yes, I am looking forward to it.
I have updated the code many times since then, which affects the scaling of the loss value, but the mechanism is the same. Here are my notes:
- You should really re-use trained weights; this is a supported feature in darkflow. Preferably take the first 2 or 3 layers from the original YOLO.
- Before training, run fine-tuning on some already-trained models to see their loss values. These are converged values, so your goal is to get down to around these numbers (approximately 1.5 ~ 1.7).
- Make sure you can overfit a very small training dataset before going further. This confirms the logic is working.
- When the loss gets stuck, try overfitting a very small set of training data again. If you are able to get the loss down there, your model is underfitting on the full set, so consider two options: 1. increase the size of the layers, 2. increase the depth. The latter is usually better in terms of generalization and speed.
- Occasionally visualise the predictions and see what kind of mistakes the model is making. In my case it was predicting almost all classes as person due to heavily skewed data. When I gradually raised the weight of the class term in the loss objective, this mistake became less severe. Note that replicating other classes' data to achieve balance results in an unnatural distribution of training data, so I would advise against it.
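As an illustration of raising the weight of the class term, here is a minimal sketch of a per-box squared-error loss with a separate class weight. It is a simplified stand-in, not the actual darkflow loss: the dictionary layout, the function names, and the `class_weight` parameter are all hypothetical.

```python
def squared_error(pred, target):
    """Sum of squared differences between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def box_loss(pred, target, class_weight=1.0):
    """Simplified per-box loss: coordinates + confidence + weighted class term.
    `class_weight` is a hypothetical knob scaling only the class term."""
    coord_term = squared_error(pred["coords"], target["coords"])
    conf_term = (pred["conf"] - target["conf"]) ** 2
    class_term = squared_error(pred["classes"], target["classes"])
    return coord_term + conf_term + class_weight * class_term


# A box that is well localised but labelled "person" instead of "dog".
# Class order (assumed for this example): [person, dog]
pred = {"coords": [0.5, 0.5, 0.2, 0.3], "conf": 0.8, "classes": [0.9, 0.1]}
target = {"coords": [0.5, 0.5, 0.2, 0.3], "conf": 1.0, "classes": [0.0, 1.0]}

for weight in (1.0, 5.0):
    print(f"class_weight={weight}: loss={box_loss(pred, target, weight):.2f}")
```

With a larger class weight, the same misclassification contributes more to the total loss, so the optimiser is pushed harder to fix class mistakes relative to localisation errors.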
Good luck, I'd love to hear update from your training.
@thtrieu I ran fine-tuning on the tiny-yolo-voc model, but the loss value is approximately 6, not 1.5 ~ 1.7.
I don't have much experience in YOLOv2, maybe @ryansun1900 does.
Here is why YOLOv2's loss is much higher than that of v1:
YOLOv2: 13 x 13 x 5 = 845 proposal bounding boxes, each with its own confidence (objectness) and conditional class probability terms.
YOLOv1: 7 x 7 x 2 = 98 proposal bounding boxes, where the 2 boxes in each grid cell share one set of conditional class probability terms.
So the output volume of v2 is much larger than that of v1 (21125 vs 1470), and so is the loss.
So far, I don't have much experience training on large datasets either.
But thtrieu's explanation is correct. The loss implementation differs between yolov1 and yolov2, so I think the loss difference is reasonable.
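The two output volumes can be reproduced with a few lines of arithmetic. This sketch assumes the 20 VOC classes and 5 values per box (4 coordinates plus 1 confidence), which is what the numbers 21125 and 1470 imply; the function names are made up for illustration.

```python
def yolo_v1_output_size(S=7, B=2, C=20):
    """v1: each grid cell holds B boxes (5 values each) plus C shared class scores."""
    return S * S * (B * 5 + C)


def yolo_v2_output_size(S=13, B=5, C=20):
    """v2: each of the B anchor boxes per cell carries its own 5 values and C class scores."""
    return S * S * B * (5 + C)


v1 = yolo_v1_output_size()
v2 = yolo_v2_output_size()
print("v1 boxes:", 7 * 7 * 2, "output values:", v1)
print("v2 boxes:", 13 * 13 * 5, "output values:", v2)
print(f"v2 output is about {v2 / v1:.1f}x larger than v1")
```

If the loss is summed rather than averaged over the output, a larger output tensor naturally gives a larger loss value, so v1 and v2 losses are not directly comparable.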
thanks for the good tips :)
Hi ,
- When the loss gets stuck, try overfitting a very small set of training data again. If you are able to get the loss down there, your model is underfitting on the full set, so consider two options: 1. increase the size of the layers, 2. increase the depth. The latter is usually better in terms of generalization and speed.
@thtrieu can you please explain what you mean by increasing the depth? How do we do it? By changing something in the cfg file? I am training for 9 classes with yolov2 and have created a cfg file called yolov2-tiny-9c.cfg. So do I make changes in this file or in the original yolov2-tiny.cfg file?
I'm training a model for 1 class with yolov3-tiny.cfg. The training set is 6800 JPEGs, with 1 to 24 objects in each image. Training images are normalized to 720 lines (height) with variable width. Batch size 24, subdivisions 2. Image size 512x512. Learning rate 0.0015. Max batches 450000. Although mAP is high (about 98%), the average loss is still above 0.5. I guess the model is fully trained at iteration 31500, because beyond this point mAP is stable at 0.98 (98%).
My question is: is the model overfit, given that it does not generalize well, or does it fail to generalize because the average loss is still high?

hey, can you tell me how to plot a chart like this while training your model?
I think he's using AlexeyAB's repo which has GUI support.