Keras: Accuracy decreases as epoch increases

Created on 15 Mar 2016 · 7 comments · Source: keras-team/keras

When running my neural network and fitting it like so:

model.fit(x, t, batch_size=256, nb_epoch=100, verbose=2, validation_split=0.1, show_accuracy=True)

I have found that as the number of epochs increases, there are times where the validation accuracy actually decreases.

For example, at epoch 12 I got:

Epoch 12/100
4s - loss: 0.1026 - acc: 0.9667 - val_loss: 0.1384 - val_acc: 0.9733

But by the end I got:

Epoch 95/100
3s - loss: 4.6988e-04 - acc: 1.0000 - val_loss: 0.1290 - val_acc: 0.9600
Epoch 96/100
2s - loss: 5.7437e-04 - acc: 1.0000 - val_loss: 0.1321 - val_acc: 0.9600
Epoch 97/100
1s - loss: 6.3242e-04 - acc: 1.0000 - val_loss: 0.1312 - val_acc: 0.9600
Epoch 98/100
1s - loss: 5.3643e-04 - acc: 1.0000 - val_loss: 0.1322 - val_acc: 0.9600
Epoch 99/100
2s - loss: 4.2413e-04 - acc: 1.0000 - val_loss: 0.1326 - val_acc: 0.9600
Epoch 100/100
1s - loss: 4.8201e-04 - acc: 1.0000 - val_loss: 0.1295 - val_acc: 0.9600

Is this supposed to happen? Why?
Also, will the final network have an accuracy of 0.96 or 0.9733?

All 7 comments

Is this supposed to happen? Why?

Yes, you are overfitting on your training data.

If you train long enough, you will reach a very high training accuracy (100% in your case) while the validation accuracy decreases, because your model no longer generalizes well.

You can add regularizers and/or dropout to reduce the learning capacity of your model, and/or stop the training early using an EarlyStopping callback, as mentioned in the FAQ.
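
A minimal sketch of both remedies, assuming the Keras API of this era (nb_epoch, show_accuracy) and a model already assembled as a Sequential; the dropout rate, patience, and placement are illustrative, not prescribed:

from keras.layers.core import Dropout
from keras.callbacks import EarlyStopping

# Randomly zero half of the activations during training to curb overfitting;
# where this layer goes in the stack depends on your architecture.
model.add(Dropout(0.5))

# Stop training once val_loss has not improved for 5 consecutive epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1)

model.fit(x, t, batch_size=256, nb_epoch=100, verbose=2,
          validation_split=0.1, show_accuracy=True,
          callbacks=[early_stop])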

Interesting. So you're saying increasing the number of epochs can potentially give a worse result.

Is there a way to run x number of epochs and retrieve the best result instead of just the last epoch (or does it do that already)?

This behavior is closely related to the bias-variance trade-off. You could take a look here, slides 17 to 25.
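
For reference, the standard decomposition behind that trade-off, for squared error with true function $f$, learned model $\hat{f}$, and noise variance $\sigma^2$ (a textbook identity, nothing Keras-specific):

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{variance}} + \sigma^2$$

Training longer drives the bias term down but the variance term up, which is exactly the divergence between training and validation accuracy seen above.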

You could also use a ModelCheckpoint callback to save your weights. You will then be able to pick the best epoch from the results and reload its weights. You can find an example here.
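
A minimal sketch of that approach, reusing the fit call from the question; the filepath is hypothetical, and monitoring val_acc assumes accuracy is being computed:

from keras.callbacks import ModelCheckpoint

# Overwrite the file only when val_acc improves, so it always holds
# the best epoch seen so far rather than the last one.
checkpoint = ModelCheckpoint('best_weights.hdf5', monitor='val_acc',
                             save_best_only=True, verbose=1)

model.fit(x, t, batch_size=256, nb_epoch=100, verbose=2,
          validation_split=0.1, show_accuracy=True,
          callbacks=[checkpoint])

# Reload the best weights before evaluating or predicting.
model.load_weights('best_weights.hdf5')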

@anik786 I am experiencing a similar issue. In my case, val_loss starts increasing at some point while val_acc fluctuates in a small interval. I post my log below.

Using Theano backend.
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 80)
X_test shape: (25000, 80)
Build model...
Train...
Train on 22500 samples, validate on 2500 samples
Epoch 1/100
22500/22500 [==============================] - 236s - loss: 0.5438 - acc: 0.7209 - val_loss: 0.4305 - val_acc: 0.8076
Epoch 2/100
22500/22500 [==============================] - 237s - loss: 0.3843 - acc: 0.8346 - val_loss: 0.3791 - val_acc: 0.8332
Epoch 3/100
22500/22500 [==============================] - 245s - loss: 0.3099 - acc: 0.8716 - val_loss: 0.3736 - val_acc: 0.8440
Epoch 4/100
22500/22500 [==============================] - 243s - loss: 0.2458 - acc: 0.9023 - val_loss: 0.4206 - val_acc: 0.8372
Epoch 5/100
22500/22500 [==============================] - 239s - loss: 0.2120 - acc: 0.9138 - val_loss: 0.3844 - val_acc: 0.8384
....
....
Epoch 75/100
22500/22500 [==============================] - 238s - loss: 0.0134 - acc: 0.9868 - val_loss: 0.9045 - val_acc: 0.8132
Epoch 76/100
22500/22500 [==============================] - 241s - loss: 0.0156 - acc: 0.9845 - val_loss: 0.9078 - val_acc: 0.8211
Epoch 77/100
22500/22500 [==============================] - 235s - loss: 0.0129 - acc: 0.9883 - val_loss: 0.9105 - val_acc: 0.8234

@tboquet At first I thought this was overfitting. But then why doesn't val_acc decrease steadily at some point? The increase in val_loss fits overfitting, but I cannot come up with an explanation for the fluctuations in val_acc. I also do not understand why val_acc started at 0.8076 rather than at a low value. Any help would be much appreciated.

I have a similar issue. My loss decreases over epochs, but so does accuracy. Is this normal behavior? Accuracy goes up to 58 percent by epoch 2 and keeps decreasing with further training, even though the loss is decreasing as well.

Does anyone know how to change the learning rate in Keras? (I am a newbie.)

@damiya14 It is an argument to the optimizer, like this:

from keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(lr=LEARNING_RATE), loss='mse', metrics=['accuracy'])
