Caffe: weird behavior of training

Created on 14 Jul 2015 · 7 Comments · Source: BVLC/caffe

Hi,

I'm new to Caffe, so please bear with my naive questions. I am training the ImageNet network with 2 classes (1/0). The problem is that after training for over 20K iterations, the training loss does not decrease significantly and more or less gets stuck. Even weirder, the validation error is always zero. Could anyone please help me interpret what might be wrong here?

[training plot]

All 7 comments

Have you tried different settings of hyper-parameters? When training stagnates, you usually need to lower the learning rate. A learning rate that decreases over time is the best strategy.
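
As a concrete illustration of a decreasing schedule, the solver prototxt can use a "step" policy; the values below are placeholder assumptions, not a recommendation for this dataset:

# solver.prototxt (sketch)
base_lr: 0.001      # starting learning rate
lr_policy: "step"   # drop the learning rate in discrete steps
gamma: 0.1          # multiply the learning rate by 0.1 at each step
stepsize: 10000     # take a step every 10000 iterations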

Please ask questions like this on the caffe-users group (https://groups.google.com/forum/#!forum/caffe-users). Github issues are for code/development.

I've been trying different learning rate and gamma combinations. Any learning rate above 0.01 makes the optimization diverge within 3-4 iterations (i.e. the loss suddenly shoots up to a value around 87). I ran 2000 iterations with 0.01 and then tried many different learning rates (e.g. 0.001, 0.1, etc.), but I get consistent behavior: the error value oscillates in the 3.5-4.5 range no matter what I do with the data. Initially I was training on a large set of classes, but I later divided the data into only two classes; the behavior remains the same. I also changed the initial bias_filler to a lower value (i.e. 0.5), but without any luck.
Here is what I get for 6000 images:
[train/val loss vs. iterations plot]

Are you training from scratch (random initialization)? 2000-6000 images isn't enough to train a deep net from scratch. You can probably fine-tune a net with that many images (from ImageNet), if you freeze the initial layers, and only fine-tune the last few layers.
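
As an aside, "freezing" the initial layers in train_val.prototxt is usually done by setting their parameter learning-rate multipliers to zero; a minimal sketch, assuming the new-style layer format and AlexNet's conv1 shape:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 decay_mult: 0 }   # freeze the weights: no updates during fine-tuning
  param { lr_mult: 0 decay_mult: 0 }   # freeze the bias as well
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}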

In general, fine-tuning from a model trained on ImageNet almost always performs better than training from random initialization (I actually have yet to see anyone report random initialization being better than ImageNet initialization).

Also, assuming that your images are correctly labeled and the classes are valid, training with more classes will perform better and produce better feature representations than training with fewer classes.

Are you shuffling the dataset? If not, the oscillations in accuracy could be coming from periodicity in having the same image order each epoch.

Thanks for the suggestions. Yes, I am training from scratch. Is it possible to change the number of classes of a pre-trained model? Actually, I have a class (i.e. my own set of images) that I would like to learn. The accuracy of this class is most important, because I only want to be able to effectively recognize that class and retrieve its features for further processing. What would be the best direction to take in this situation?
Yes, I am shuffling the data when I generate the LMDB. I am also unsure whether I should even use the ImageNet model. What is your opinion in terms of comparison with other models (i.e. LeNet/AlexNet/GoogLeNet), especially keeping in mind that the application would be a real-time one?

Yes, you can change the number of classes when fine-tuning from a pre-trained model. You are going to be re-learning the last layer anyway, so there's no reason why the number of categories has to be the same. In train_val.prototxt, rename the last layer ("fc8", assuming you're using AlexNet) to something else (e.g. "fc8-ft"), and then change the number of outputs to the number of classes. You need to rename the layer so that Caffe doesn't try to copy it from the pre-trained model file. Here's an example of fine-tuning: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
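
A rough sketch of what the renamed layer might look like in train_val.prototxt; the name "fc8-ft", the lr_mult values, and num_output: 2 are illustrative assumptions rather than the exact settings from the flickr_style example:

layer {
  name: "fc8-ft"             # renamed so Caffe does not copy fc8's weights from the snapshot
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8-ft"
  param { lr_mult: 10 decay_mult: 1 }   # boosted learning rate for the freshly initialized weights
  param { lr_mult: 20 decay_mult: 1 }   # and for the bias
  inner_product_param {
    num_output: 2            # number of classes in the new task (2 here)
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}

Any loss or accuracy layers that previously took "fc8" as a bottom would need to point at "fc8-ft" as well.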

Even if you only care about one class, it's still much better to train on as many class labels as you have available. At test time, you can group together all of the predictions after the fact and convert them into a binary decision.

GoogLeNet is slower than AlexNet, but its accuracy is significantly better. AlexNet and GoogLeNet should both be able to classify images in real time, assuming your images are small and you have a good GPU.

Thank you for the great answer. I have one more question: the pre-training docs say that one should decrease the learning rate and boost (increase?) the blob_lr on the last layer. But I am confused, because deploy.prototxt for the pre-trained model contains these, while train_val.prototxt has lr_mult instead. Should I add this manually in the layer definition? It is also confusing why we need two different prototxt files at all; if I understood correctly, deploy should be used at test time. One more confusing difference in the deploy file is the definition of the data. The pre-trained model that I have uses the following data format:
input_dim: 10
input_dim: 3
input_dim: 227
input_dim: 227

which, according to my understanding, says that it requires 227x227x3 images with 10 ?? What is this 10? In convolution layers that number is the kernel count, but I don't understand it here. The data layer in train_val is different; it provides LMDB paths.
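
For reference, in a standard deploy.prototxt these four input_dim values specify the input blob shape in (batch, channels, height, width) order; annotated:

input_dim: 10    # batch size: number of images per forward pass
input_dim: 3     # channels
input_dim: 227   # height in pixels
input_dim: 227   # width in pixels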

Closing as this looks like a usage issue/request for help.

This tracker is reserved for specific Caffe development issues and bugs; please ask usage questions on the caffe-users list.

For more information, see our contributing guide.

Thanks!
