I'm getting very strange results from building an autoencoder in Keras.
https://gist.github.com/reesepathak/118c5573e320c82cf99d1bec26dc6366
As you'll see, I'm passing in a data matrix of 0/1 values with 5008 examples and 607 components per example. I consistently get a lower validation loss than training loss. This is so weird! Any idea as to what might be going wrong?
Train on 2804 samples, validate on 1202 samples
Epoch 1/50
2804/2804 [==============================] - 3s - loss: 0.0679 - val_loss: 0.0422
Epoch 2/50
2804/2804 [==============================] - 3s - loss: 0.0361 - val_loss: 0.0318
Epoch 3/50
2804/2804 [==============================] - 3s - loss: 0.0291 - val_loss: 0.0274
Epoch 4/50
2804/2804 [==============================] - 3s - loss: 0.0256 - val_loss: 0.0249
Epoch 5/50
2804/2804 [==============================] - 3s - loss: 0.0235 - val_loss: 0.0232
Epoch 6/50
2804/2804 [==============================] - 3s - loss: 0.0221 - val_loss: 0.0219
Epoch 7/50
2804/2804 [==============================] - 3s - loss: 0.0210 - val_loss: 0.0211
Epoch 8/50
2804/2804 [==============================] - 3s - loss: 0.0202 - val_loss: 0.0205
Epoch 9/50
2804/2804 [==============================] - 3s - loss: 0.0195 - val_loss: 0.0199
Epoch 10/50
2804/2804 [==============================] - 3s - loss: 0.0189 - val_loss: 0.0192
Epoch 11/50
2804/2804 [==============================] - 3s - loss: 0.0183 - val_loss: 0.0186
Epoch 12/50
I've tried changing the loss, changing the activation, and adding extra layers to rule out under-fitting. The error happens consistently. The key observation is that if you actually compute np.linalg.norm(pred - true) divided by the number of examples, you get the expected result: the validation loss is greater than the training loss. So Keras is just printing the wrong thing. How do I fix that?
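For reference, a manual check like the one described above can be sketched with plain NumPy. This is my own minimal example (the `true`/`pred` arrays are made up, and I use the mean squared error per element, which is what Keras reports for `loss='mse'`; the gist may use a different loss):

```python
import numpy as np

def mean_squared_error(true, pred):
    # Average squared reconstruction error over all examples and components,
    # matching what Keras reports when compiled with loss='mse'.
    return np.mean((pred - true) ** 2)

# Made-up 0/1 targets and reconstructions, just to illustrate the check
true = np.array([[0., 1.], [1., 0.]])
pred = np.array([[0.1, 0.9], [0.8, 0.2]])
print(mean_squared_error(true, pred))  # -> 0.025
```

Running this same function on the training and validation sets after training (with the final weights) is the apples-to-apples comparison the printed epoch losses don't give you.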
The FAQ may have an answer for this question: "Why is the training loss much higher than the testing loss?"
In particular, the training loss that gets printed is the average of the losses over each batch of training data.
Alright, how do I have the model print out the training loss at the end of each epoch of training rather than taking the running average?
Do you mean at the end of each batch? If so, use a callback to save the batch losses.
No. What I want is to basically get the loss on the predictions on the training set after each epoch. Basically, I want to see that the validation loss is consistently greater than the training loss. I'd rather not get the running average of training loss per example/batch.
As an update, I tried setting it to train on only one batch (all training examples in one batch) -- and I'm still getting the validation loss lower than training loss! I don't know how this is possible.
I'm interested in this too, as it also happens to me when fitting an autoencoder on the MNIST dataset. Both the training loss and the validation loss are decreasing, but the validation loss stays lower than the training loss. Even in my case, using a single batch doesn't solve the "problem".
Even this code, copied from the Keras blog, produces the behaviour mentioned above (I've only changed the number of epochs).
import numpy as np
from keras.datasets import mnist
from keras.layers import Dense, Input
from keras.models import Model

# Single-hidden-layer autoencoder from the Keras blog
encoding_dim = 32
input_img = Input(shape=(784,))
encoded = Dense(encoding_dim, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# Load MNIST, scale pixels to [0, 1], flatten 28x28 images to 784-vectors
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

autoencoder.fit(x_train, x_train, epochs=5, batch_size=256,
                shuffle=True, validation_data=(x_test, x_test))
Here's the output:
Using TensorFlow backend.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 4s - loss: 0.3793 - val_loss: 0.2733
Epoch 2/5
60000/60000 [==============================] - 3s - loss: 0.2666 - val_loss: 0.2570
Epoch 3/5
60000/60000 [==============================] - 3s - loss: 0.2469 - val_loss: 0.2344
Epoch 4/5
60000/60000 [==============================] - 4s - loss: 0.2259 - val_loss: 0.2154
Epoch 5/5
60000/60000 [==============================] - 4s - loss: 0.2098 - val_loss: 0.2019
And if I change the batch_size in order to use a single batch:
autoencoder.fit(x_train, x_train, epochs=5, batch_size=len(x_train),
shuffle=True, validation_data=(x_test, x_test))
here's the output:
Using TensorFlow backend.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 9s - loss: 0.6947 - val_loss: 0.6940
Epoch 2/5
60000/60000 [==============================] - 9s - loss: 0.6940 - val_loss: 0.6933
Epoch 3/5
60000/60000 [==============================] - 6s - loss: 0.6933 - val_loss: 0.6927
Epoch 4/5
60000/60000 [==============================] - 6s - loss: 0.6926 - val_loss: 0.6920
Epoch 5/5
60000/60000 [==============================] - 6s - loss: 0.6920 - val_loss: 0.6913
As you can see, val_loss is always lower than loss.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Don't close it, I would like to see the answer !
+1. I'm seeing the same thing. Any update?
+1, i'd like to know how.
+1
same for an LSTM model I'm training
Have the same issue
I think the 'issue' is that Keras reports the training loss as the average over all the training batches in the epoch. The first batches will have a higher loss than the last ones (because the model has done a bunch of gradient updates by then). The validation loss, on the other hand, is computed once at the end of the epoch, and so ends up lower than the epoch's average training loss, which is dragged up by the high-loss early batches. You cannot really compare the two directly, as they are not computed with the same model weights. If you instead compute model.evaluate(x_train) and model.evaluate(x_val) at the end of each epoch, the training loss will be lower than the validation loss.
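That suggestion can be wired into training with a callback. This is a sketch under my own naming (`EpochEvalLoss` is not part of Keras), using a tiny synthetic model for the usage demo; for the MNIST autoencoder above you would pass `x_train`/`x_test` instead:

```python
# Sketch: re-evaluate train and validation loss with the *same*
# end-of-epoch weights, so the two numbers are directly comparable.
import numpy as np
from keras.callbacks import Callback
from keras.layers import Dense, Input
from keras.models import Model

class EpochEvalLoss(Callback):
    """Runs model.evaluate on train and validation data after each epoch."""
    def __init__(self, x_train, x_val):
        super(EpochEvalLoss, self).__init__()
        self.x_train, self.x_val = x_train, x_val
        self.train_losses, self.val_losses = [], []

    def on_epoch_end(self, epoch, logs=None):
        # For an autoencoder the targets are the inputs themselves
        self.train_losses.append(
            self.model.evaluate(self.x_train, self.x_train, verbose=0))
        self.val_losses.append(
            self.model.evaluate(self.x_val, self.x_val, verbose=0))

# Tiny made-up demo data and model
x_tr = np.random.rand(16, 4).astype('float32')
x_va = np.random.rand(8, 4).astype('float32')
inp = Input(shape=(4,))
model = Model(inp, Dense(4, activation='sigmoid')(inp))
model.compile(optimizer='sgd', loss='mse')

cb = EpochEvalLoss(x_tr, x_va)
model.fit(x_tr, x_tr, epochs=2, batch_size=8, verbose=0, callbacks=[cb])
# cb.train_losses and cb.val_losses each hold one value per epoch,
# both computed with that epoch's final weights
```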
Having the exact same issue with an autoencoder in pytorch as well