Keras: Loss Increases after some epochs

Created on 11 Aug 2017 · 12 Comments · Source: keras-team/keras

I have tried different convolutional neural network codes and I keep running into the same issue. The network starts out training well and the loss decreases, but after some time the loss just starts to increase. An example is shown below:
Epoch 15/800
1562/1562 [==============================] - 49s - loss: 0.9050 - acc: 0.6827 - val_loss: 0.7667 - val_acc: 0.7323
Epoch 16/800
1562/1562 [==============================] - 49s - loss: 0.8906 - acc: 0.6864 - val_loss: 0.7404 - val_acc: 0.7434
Epoch 380/800
1562/1562 [==============================] - 49s - loss: 1.5519 - acc: 0.4880 - val_loss: 1.4250 - val_acc: 0.5233
Epoch 381/800
1562/1562 [==============================] - 48s - loss: 1.5416 - acc: 0.4897 - val_loss: 1.5032 - val_acc: 0.4868
Epoch 800/800
1562/1562 [==============================] - 49s - loss: 1.8483 - acc: 0.3402 - val_loss: 1.9454 - val_acc: 0.2398

I have tried this on different CIFAR-10 architectures I have found on GitHub. I am training on a Titan X Pascal GPU. This only happens when I train the network in batches with data augmentation. I have changed the optimizer, the initial learning rate, etc. I just want a CIFAR-10 model with good enough accuracy for my tests, so any help will be appreciated. The code is from this:
https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py

Most helpful comment

Look: when using raw SGD, you take the gradient of the loss function w.r.t. the parameters (the direction in which the function value increases) and move a little bit in the opposite direction (in order to minimize the loss function).
Different optimizers are built on top of SGD and use extra ideas (momentum, learning rate decay, etc.) to make convergence faster.
If you look at how momentum works, you'll see where the problem is. In the beginning, the optimizer may move in the same (correct) direction for a long time, which builds up a very large momentum term. Later, the negative gradient may no longer agree with that momentum, causing the optimizer to "climb hills" (reach higher loss values) for a while, but it may eventually fix itself.
(I encourage you to look at how momentum works.)
https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum

All 12 comments

I believe you have tried different optimizers, but please try raw SGD with a smaller initial learning rate.
Most likely the optimizer builds up a large momentum and, from some point on, keeps moving in the wrong direction.

So something like this?

from keras.optimizers import SGD

lrate = 0.001
decay = lrate / epochs  # epochs as defined for training, e.g. 800
sgd = SGD(lr=lrate, momentum=0.90, decay=decay, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

No, without any momentum or decay, just raw SGD.

model.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['accuracy'])
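For reference, the string 'SGD' just picks the optimizer with its default settings; a minimal sketch of the explicit equivalent, assuming Keras 2's defaults (learning rate 0.01, no momentum, no decay, no Nesterov):

from keras.optimizers import SGD

# Explicit equivalent of optimizer='SGD' (assuming Keras 2 defaults).
sgd = SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])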

Thanks, that works. I was wondering if you know why that is?

Look: when using raw SGD, you take the gradient of the loss function w.r.t. the parameters (the direction in which the function value increases) and move a little bit in the opposite direction (in order to minimize the loss function).
Different optimizers are built on top of SGD and use extra ideas (momentum, learning rate decay, etc.) to make convergence faster.
If you look at how momentum works, you'll see where the problem is. In the beginning, the optimizer may move in the same (correct) direction for a long time, which builds up a very large momentum term. Later, the negative gradient may no longer agree with that momentum, causing the optimizer to "climb hills" (reach higher loss values) for a while, but it may eventually fix itself.
(I encourage you to look at how momentum works.)
https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum
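To make the mechanism concrete, here is a minimal sketch of the SGD-with-momentum update rule (plain Python, illustrative only; the function name is mine, not a Keras API). The velocity term accumulates past gradients and can keep pushing the parameters in an old direction even after the gradient flips sign:

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    # The velocity is a decaying sum of past gradient steps. If many past
    # gradients pointed the same way, it grows large and can dominate the
    # current gradient, temporarily pushing the loss uphill.
    velocity = momentum * velocity - lr * grad
    w = w + velocity
    return w, velocity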

Ok, I will definitely keep this in mind in the future. Thanks for the help.

Hello,
I'm using a CNN for regression and the MAE metric to evaluate the model's performance. But I noticed that the loss, val_loss, mean_absolute_error and val_mean_absolute_error stop changing after some epochs.

My loss was at 0.05, but after some epochs it went up to 15, even with raw SGD. Training for many epochs did not have this effect with Adam, only with the SGD optimizer.
Please help.

@mahnerak
Hi, thank you for your explanation. I experienced a similar problem.

BTW, I have a question about _"but it may eventually fix itself"_.
Does it mean the loss can start going down again after many more epochs, even with momentum, at least in theory?

Thanks in advance.

Hi @kouohhashi,
I suggest reading this Distill publication: https://distill.pub/2017/momentum/

The authors mention: "It is possible, however, to construct very specific counterexamples where momentum does not converge, even on convex functions."
Please also take a look at https://arxiv.org/abs/1408.3595 for more details.
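As a toy illustration (my own numbers, not from the papers above): gradient descent with a large momentum coefficient on the convex quadratic f(w) = 0.5 * w**2 first shrinks the loss, then overshoots so the loss climbs for several steps, and then oscillates back down:

# Toy example: momentum overshoot on f(w) = 0.5 * w**2 (illustrative values).
w, v = 5.0, 0.0
lr, momentum = 0.1, 0.95
for step in range(30):
    grad = w                       # derivative of 0.5 * w**2
    v = momentum * v - lr * grad   # velocity keeps part of the old direction
    w = w + v
    print(step, round(0.5 * w ** 2, 4))  # loss dips, climbs back up, then oscillates down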

Are you suggesting that momentum be removed altogether, or only for troubleshooting? If you mean the latter, how should one use momentum after debugging?
Thanks.

Increase the batch size, and keep an eye on memory usage.
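For reference, a minimal sketch of where the batch size enters a data-augmentation training loop, assuming the datagen, x_train, y_train, x_test, y_test and epochs names from the linked cifar10_cnn.py example; the value below is illustrative, and larger batches need more GPU memory:

batch_size = 128  # illustrative; larger values consume more GPU memory

model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test))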
