I'm getting very strange results from building an autoencoder in Keras.
https://gist.github.com/reesepathak/118c5573e320c82cf99d1bec26dc6366
As you'll see, I'm passing in a data matrix of 0/1 values with 5008 examples and 607 components per example. I consistently get a lower validation loss than training loss. This is so weird! Any idea as to what might be going wrong?
Train on 2804 samples, validate on 1202 samples
Epoch 1/50
2804/2804 [==============================] - 3s - loss: 0.0679 - val_loss: 0.0422
Epoch 2/50
2804/2804 [==============================] - 3s - loss: 0.0361 - val_loss: 0.0318
Epoch 3/50
2804/2804 [==============================] - 3s - loss: 0.0291 - val_loss: 0.0274
Epoch 4/50
2804/2804 [==============================] - 3s - loss: 0.0256 - val_loss: 0.0249
Epoch 5/50
2804/2804 [==============================] - 3s - loss: 0.0235 - val_loss: 0.0232
Epoch 6/50
2804/2804 [==============================] - 3s - loss: 0.0221 - val_loss: 0.0219
Epoch 7/50
2804/2804 [==============================] - 3s - loss: 0.0210 - val_loss: 0.0211
Epoch 8/50
2804/2804 [==============================] - 3s - loss: 0.0202 - val_loss: 0.0205
Epoch 9/50
2804/2804 [==============================] - 3s - loss: 0.0195 - val_loss: 0.0199
Epoch 10/50
2804/2804 [==============================] - 3s - loss: 0.0189 - val_loss: 0.0192
Epoch 11/50
2804/2804 [==============================] - 3s - loss: 0.0183 - val_loss: 0.0186
Epoch 12/50
I've tried changing the loss, changing the activation, and adding extra layers to rule out under-fitting. The error happens consistently. The key observation is that if you actually compute np.linalg.norm(pred - true) divided by the number of examples, you get the expected result: the validation loss is greater than the training loss. So Keras is just printing the wrong thing. How do I fix that?
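For reference, a manual check like the one described above can be sketched with plain NumPy. This is my own minimal example (the `true`/`pred` arrays are made up, and I use the mean squared error per element, which is what Keras reports for `loss='mse'`; the gist may use a different loss):

```python
import numpy as np

def mean_squared_error(true, pred):
    # Average squared reconstruction error over all examples and components,
    # matching what Keras reports when compiled with loss='mse'.
    return np.mean((pred - true) ** 2)

# Made-up 0/1 targets and reconstructions, just to illustrate the check
true = np.array([[0., 1.], [1., 0.]])
pred = np.array([[0.1, 0.9], [0.8, 0.2]])
print(mean_squared_error(true, pred))  # -> 0.025
```

Running this same function on the training and validation sets after training (with the final weights) is the apples-to-apples comparison the printed epoch losses don't give you.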
The FAQ may have an answer for this question: "Why is the training loss much higher than the testing loss?"
In particular, the training loss that gets printed is the average of the losses over each batch of training data.
Alright, how do I have the model print out the training loss at the end of each epoch of training rather than taking the running average?
Do you mean at the end of each batch? If so, use a callback to save the batch losses.
No. What I want is to basically get the loss on the predictions on the training set after each epoch. Basically, I want to see that the validation loss is consistently greater than the training loss. I'd rather not get the running average of training loss per example/batch.
As an update, I tried setting it to train on only one batch (all training examples in one batch) -- and I'm still getting the validation loss lower than training loss! I don't know how this is possible.
I'm interested in this too, as it also happens to me when fitting an autoencoder on the MNIST dataset. Both the training loss and the validation loss are decreasing, but the validation loss stays lower than the training loss. Even in my case, using a single batch doesn't solve the "problem".
Even this code, copied from the Keras blog, produces the behaviour mentioned above (I've only changed the number of epochs).
import numpy as np
from keras.datasets import mnist
from keras.layers import Dense, Input
from keras.models import Model

# Single-hidden-layer autoencoder from the Keras blog
encoding_dim = 32
input_img = Input(shape=(784,))
encoded = Dense(encoding_dim, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# Load MNIST, scale pixels to [0, 1], flatten 28x28 images to 784-vectors
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

autoencoder.fit(x_train, x_train, epochs=5, batch_size=256,
                shuffle=True, validation_data=(x_test, x_test))
Here's the output:
Using TensorFlow backend.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 4s - loss: 0.3793 - val_loss: 0.2733
Epoch 2/5
60000/60000 [==============================] - 3s - loss: 0.2666 - val_loss: 0.2570
Epoch 3/5
60000/60000 [==============================] - 3s - loss: 0.2469 - val_loss: 0.2344
Epoch 4/5
60000/60000 [==============================] - 4s - loss: 0.2259 - val_loss: 0.2154
Epoch 5/5
60000/60000 [==============================] - 4s - loss: 0.2098 - val_loss: 0.2019
And if I change the batch_size in order to use a single batch:
autoencoder.fit(x_train, x_train, epochs=5, batch_size=len(x_train),
shuffle=True, validation_data=(x_test, x_test))
here's the output:
Using TensorFlow backend.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 9s - loss: 0.6947 - val_loss: 0.6940
Epoch 2/5
60000/60000 [==============================] - 9s - loss: 0.6940 - val_loss: 0.6933
Epoch 3/5
60000/60000 [==============================] - 6s - loss: 0.6933 - val_loss: 0.6927
Epoch 4/5
60000/60000 [==============================] - 6s - loss: 0.6926 - val_loss: 0.6920
Epoch 5/5
60000/60000 [==============================] - 6s - loss: 0.6920 - val_loss: 0.6913
As you can see, val_loss is always lower than loss.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Don't close it, I would like to see the answer !
+1. I'm seeing the same thing. Any update?
+1, i'd like to know how.
+1
same for an LSTM model I'm training
Have the same issue
I think the 'issue' is that Keras reports the training loss as the average over all the training batches in the epoch. The first batches will have a higher loss than the last ones (because the model has done a bunch of gradient updates by then). The validation loss, on the other hand, is computed once at the end of the epoch, and so ends up lower than the epoch's average training loss, which is dragged up by the high-loss early batches. You cannot really compare the two directly, as they are not computed with the same model weights. If you instead compute model.evaluate(x_train) and model.evaluate(x_val) at the end of each epoch, the training loss will be lower than the validation loss.
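That suggestion can be wired into training with a callback. This is a sketch under my own naming (`EpochEvalLoss` is not part of Keras), using a tiny synthetic model for the usage demo; for the MNIST autoencoder above you would pass `x_train`/`x_test` instead:

```python
# Sketch: re-evaluate train and validation loss with the *same*
# end-of-epoch weights, so the two numbers are directly comparable.
import numpy as np
from keras.callbacks import Callback
from keras.layers import Dense, Input
from keras.models import Model

class EpochEvalLoss(Callback):
    """Runs model.evaluate on train and validation data after each epoch."""
    def __init__(self, x_train, x_val):
        super(EpochEvalLoss, self).__init__()
        self.x_train, self.x_val = x_train, x_val
        self.train_losses, self.val_losses = [], []

    def on_epoch_end(self, epoch, logs=None):
        # For an autoencoder the targets are the inputs themselves
        self.train_losses.append(
            self.model.evaluate(self.x_train, self.x_train, verbose=0))
        self.val_losses.append(
            self.model.evaluate(self.x_val, self.x_val, verbose=0))

# Tiny made-up demo data and model
x_tr = np.random.rand(16, 4).astype('float32')
x_va = np.random.rand(8, 4).astype('float32')
inp = Input(shape=(4,))
model = Model(inp, Dense(4, activation='sigmoid')(inp))
model.compile(optimizer='sgd', loss='mse')

cb = EpochEvalLoss(x_tr, x_va)
model.fit(x_tr, x_tr, epochs=2, batch_size=8, verbose=0, callbacks=[cb])
# cb.train_losses and cb.val_losses each hold one value per epoch,
# both computed with that epoch's final weights
```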
Having the exact same issue with an autoencoder in pytorch as well