I have a dataset of around 7000 images with 12 classes. During training the loss decreased to around 3 by 1500 iterations, but from then until 3500 iterations there was no notable change. Can anyone help me with this problem?
@hashirali2604, you can try the following:
Remember to use the --summary parameter when starting the training so that logs are written for TensorBoard and you can monitor the loss curve.
In my case, this caused a really nice drop of the loss function.
@k-lyda can you tell me exactly where I can change the learning rate? I assumed it should be changed in the model's .cfg file, but after changing it there, training still shows 1e-05.
@hashirali2604 there is a parameter lr; you can pass it either via the CLI (e.g. --lr 0.0001) or in Python code. All available options are listed here:
https://github.com/thtrieu/darkflow/blob/master/darkflow/defaults.py
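For the "in Python code" route, here is a minimal sketch of building the options dictionary. The helper name build_train_options and the paths are illustrative only; the dictionary keys are taken from the names in defaults.py linked above, and the darkflow call itself is left as a comment because it needs darkflow installed.

```python
def build_train_options(model_cfg, weights, annotation_dir, image_dir,
                        lr=1e-4, batch=16, epoch=300, summary_dir="./summary/"):
    """Build an options dict for darkflow training (hypothetical helper)."""
    if lr <= 0:
        raise ValueError("lr must be positive")
    return {
        "model": model_cfg,            # network definition (.cfg)
        "load": weights,               # initial weights or checkpoint
        "train": True,                 # train instead of predict
        "annotation": annotation_dir,  # folder with VOC XML files
        "dataset": image_dir,          # folder with the images
        "lr": lr,                      # learning rate, overrides the default
        "batch": batch,
        "epoch": epoch,
        "summary": summary_dir,        # where TensorBoard logs are written
    }


options = build_train_options(
    "cfg/tiny-yolo-voc-1c.cfg", "bin/tiny-yolo-voc.weights",
    "annotations", "images", lr=1e-4,
)
print(options["lr"])

# With darkflow installed, training would then be started with:
#   from darkflow.net.build import TFNet
#   TFNet(options).train()
```

This would also explain the observation above: the lr value comes from these options, not from learning_rate in the .cfg file.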
@k-lyda thanks for the help... <3 I will update you soon
@k-lyda hey man, I have changed my dataset and annotations. The dataset now has more than 16,000 images in 11 classes, but at a learning rate of 1e-03 my loss and average loss drop as low as the order of 1e-8 within just 1 epoch. When I evaluate the model, it shows no detections even at a 0.01 threshold. What should I do?
Also, I have now started training at 1e-05, and after 1200 iterations the average loss is 0.0098.
@hashirali2604, one question: have you modified the cfg file so that the last convolutional layer has the proper number of filters? With 11 classes it should be num*(11+5), where num is set in the last section, [region].
@k-lyda, yes, I have modified that; it is 5*(11+5)=80 filters in my case, as num is 5 by default.
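The filter arithmetic above can be written as a tiny helper. This is only a sketch; the function name region_filters is made up for illustration, and the 5 in the formula is read as 4 box coordinates plus 1 objectness score.

```python
def region_filters(num_classes, num_anchors=5, coords=4):
    """Filters needed in the last convolutional layer before [region].

    Each anchor predicts `coords` box values, 1 objectness score,
    and one score per class.
    """
    return num_anchors * (num_classes + coords + 1)


print(region_filters(11))  # 11 classes, num=5
print(region_filters(1))   # single class, num=5
```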
hi,
I am using tiny-yolo-voc.cfg and tiny-yolo-voc.weights. I have 100 images, and each image has 39 small objects (4088 instances). I created XML annotations for these 100 images, changed the number of classes to 1 because there is only one object class in all of these images, and changed the number of filters to 30 in the last [convolutional] layer.
I tried changing the learning rate and batch size, but the loss and the moving average loss are not decreasing even after 1500 epochs.
Can you please suggest something to help me decrease the loss?
thanks in advance...
The config file:
[net]
batch=64
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
max_batches = 40100
policy=steps
steps=-1,100,20000,30000
scales=.1,10,.1,.1
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=1
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear
[region]
anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1
absolute=1
thresh = .2
random=1
Command:
python flow --model cfg/tiny-yolo-voc-1c.cfg --load bin/tiny-yolo-voc.weights --train --annotation annotations --dataset images --gpu 1 --epoch 300
I would really appreciate quick help, because I have tried everything over the past week and it is not working.
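A quick sanity check for a config like the one above is to confirm that the last [convolutional] layer before [region] has num*(classes+coords+1) filters. The sketch below uses a hand-rolled parser because darknet cfg files repeat section names; the function names and the shortened sample text are illustrative, not part of darkflow.

```python
def parse_darknet_cfg(text):
    """Parse darknet-style cfg text into an ordered list of (section, options)."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_region_filters(sections):
    """Return (expected, actual) filters for the conv layer feeding [region]."""
    names = [name for name, _ in sections]
    if "region" not in names:
        raise ValueError("no [region] section found")
    region_idx = len(names) - 1 - names[::-1].index("region")
    region = sections[region_idx][1]
    conv = next((opts for name, opts in reversed(sections[:region_idx])
                 if name == "convolutional"), None)
    if conv is None:
        raise ValueError("no [convolutional] section before [region]")
    expected = int(region["num"]) * (int(region["classes"]) + int(region["coords"]) + 1)
    return expected, int(conv["filters"])


# Tail of the config shown above, shortened for the example.
sample = """
[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear
[region]
classes=1
coords=4
num=5
"""
expected, actual = check_region_filters(parse_darknet_cfg(sample))
print(expected, actual)
```

For this config the two numbers agree, so the filter count is not the cause of the stalled loss here.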
@BlackCode101 ever get an answer?
@BlackCode101 can you share your solution if you solved it?
@DomagojJaksic increasing the learning rate and massively increasing the number of epochs got my loss to decrease. However, when I went to use my new weights, they weren't detecting anything, so I am going to attempt to re-run when I have the time. I thought I also saw somewhere that others were having similar issues with the tiny-yolo weights, but alas...
@mpky I ran into a similar issue but resolved it by first overfitting on a subset of images (3-5, or the minimum number to get an instance of all classes). After 1000 - 2000 epochs on five images, the net correctly detected the objects with ~0.9 confidence. After getting that confidence, I began training on the entire dataset and it is working like a charm. hope this helps!
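One way to pick "the minimum number to get an instance of all classes" is a greedy cover over the annotations. This is only a sketch: the function name and the sample data are hypothetical, it assumes you have already extracted the set of class names per image from your XML files, and greedy selection gives a small subset, not a guaranteed minimum.

```python
def pick_overfit_subset(image_classes):
    """Greedily choose images until every class appears at least once.

    image_classes maps an image name to the set of class names it contains.
    """
    remaining = set().union(*image_classes.values())
    chosen = []
    while remaining:
        # Take the image covering the most classes still missing
        # (ties are broken alphabetically by iterating in sorted order).
        best = max(sorted(image_classes),
                   key=lambda name: len(image_classes[name] & remaining))
        chosen.append(best)
        remaining -= image_classes[best]
    return chosen


annotations = {
    "img1.jpg": {"ball"},
    "img2.jpg": {"ball", "player"},
    "img3.jpg": {"referee"},
    "img4.jpg": {"player"},
}
subset = pick_overfit_subset(annotations)
print(subset)
```

The chosen images and their XML files can then be copied into separate folders and used for the overfitting run before switching back to the full dataset.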
@rockstardotb This looks like a very good idea
@rockstardotb I am training a ball detector on YOLOv2. I have done the first step of overtraining the model on a small subset, and I am now training on my full dataset of 700 pictures. The problem is that my loss is not going down significantly. It started out at 2.5 and is now fluctuating between 0.3 and 1.5 after about 65 epochs. How many epochs did it take for you to get a decent result?
What are your batch size and learning rate? They should be 16 and 1E-4, respectively. If the loss is now fluctuating between 0.3 and 1.5, you may need to stop and restart training from the last checkpoint with a lower learning rate (i.e., 1E-5). It's been over a year since I worked on this particular project. I trained on a single class and, if I remember correctly, I had a good model somewhere between 800 and 1200 epochs. Note that my dataset was very large, approximately 12000 images. One last question: did you change the .cfg file to make the last layer a single class?
My batch size is 8 and my learning rate is 1e-05. I'll change them and see if that makes any difference. And yes, I have changed my cfg to work for one class. I'll keep you updated, thanks for the advice!
I edited the learning rate and batch size. Now I run into the problem that my loss turns into NaN after a couple of steps. Any ideas?
Sounds like the learning rate is still too high. I'd try making it smaller. You may try restarting from the checkpoint where you overfitted and use a learning rate of 1E-5 instead of 1E-4. Keep batch size at 16.
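A restart along these lines could be assembled as below. This is a sketch under assumptions: the helper name resume_command and the paths are illustrative, and it relies on darkflow's convention that --load -1 picks up the most recent checkpoint (a specific checkpoint number can be passed instead).

```python
def resume_command(model_cfg, annotation_dir, image_dir,
                   lr=1e-5, batch=16, checkpoint=-1):
    """Build the argv list for resuming darkflow training (hypothetical helper)."""
    return [
        "python", "flow",
        "--model", model_cfg,
        "--load", str(checkpoint),   # -1 means the latest checkpoint
        "--train",
        "--annotation", annotation_dir,
        "--dataset", image_dir,
        "--lr", str(lr),             # lowered learning rate
        "--batch", str(batch),
    ]


cmd = resume_command("cfg/tiny-yolo-voc-1c.cfg", "annotations", "images")
print(" ".join(cmd))
```

The list form can be passed straight to subprocess.run from the darkflow checkout, or the printed line can be pasted into a shell.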