Keras: Validation loss is different even when it should be equal to the training loss.

Created on 29 Nov 2016 · 8 comments · Source: keras-team/keras

Using TensorFlow backend.

  • [X] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • [X] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

When the training data is passed as the validation data, the reported validation loss differs from the training loss, sometimes by a lot, even though the two should be equal.
To reproduce: (1) copy the mnist_mlp.py example, (2) pass the training data as validation data, (3) run.

Code:

'''Trains a simple deep NN on the MNIST dataset.
Gets to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils


batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=1, validation_data=(X_train, Y_train))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Related to #605 (it's closed, but the issue still seems to persist?).
I noticed that the batch size affects how large the loss differences are.

Evaluating the model manually after each epoch (with a callback) also produces different loss/acc than what Keras logs, on both the validation and the training data (related to this, or a separate issue?).
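
For reference, a minimal sketch of what I mean by evaluating manually (the EvalOnTrain callback name is just for illustration; it assumes the model, X_train, Y_train and batch_size from the script above):

from keras.callbacks import Callback

class EvalOnTrain(Callback):
    # evaluate on the training data at the end of every epoch and compare
    # against the loss/acc that Keras logged for that same epoch
    def on_epoch_end(self, epoch, logs={}):
        loss, acc = self.model.evaluate(X_train, Y_train,
                                        batch_size=batch_size, verbose=0)
        print('epoch %d: logged loss %.4f / evaluated loss %.4f, '
              'logged acc %.4f / evaluated acc %.4f'
              % (epoch, logs['loss'], loss, logs['acc'], acc))

history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=1, validation_data=(X_train, Y_train),
                    callbacks=[EvalOnTrain()])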

All 8 comments

I have the same issue.

I run a training procedure and save the model with the best validation error.

But when I reload the model and run evaluate_generator on the same validation data, the validation error is much larger than the one in the training log.

I wonder whether there is some mistake in the model saving or logging procedure.
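
For reference, what I do is roughly the following (a minimal sketch using in-memory arrays instead of my generator; X_val, Y_val and the file name are just placeholders):

from keras.callbacks import ModelCheckpoint
from keras.models import load_model

# keep the weights that give the best validation loss during training
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss',
                             save_best_only=True, verbose=1)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          validation_data=(X_val, Y_val), callbacks=[checkpoint])

# reload and evaluate on the same validation data; I would expect this
# to match the best val_loss printed in the training log
best = load_model('best_model.h5')
score = best.evaluate(X_val, Y_val, verbose=0)
print('reloaded val_loss:', score[0])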

Remove the dropout. Dropout is on during training but off during validation.

See #5263

This is answered in the FAQ here: https://keras.io/getting-started/faq/#why-is-the-training-loss-much-higher-than-the-testing-loss

As @isaacgerg said, dropout is one reason for the discrepancy. The other reason is that fit() updates the weights after each mini-batch, and the reported training loss and acc are averages across the epoch. Hence they don't match the loss and acc computed at the end of the epoch on the validation set.
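
A quick way to check this with the script above (a sketch that assumes the Dropout layers have been removed, so only the averaging effect remains):

# after model.fit(...) with validation_data=(X_train, Y_train)
logged_train_loss = history.history['loss'][-1]    # running average over the mini-batches of the last epoch
logged_val_loss = history.history['val_loss'][-1]  # computed once, with the end-of-epoch weights
final_train_loss = model.evaluate(X_train, Y_train, verbose=0)[0]

# with Dropout removed, final_train_loss should be close to logged_val_loss,
# while logged_train_loss can still differ because the weights kept changing
# during the epoch
print(logged_train_loss, logged_val_loss, final_train_loss)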

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

@yhenon I still have large differences with the batch_size equal to the size of the train set and without dropout. Is there any other subdivision the acc is averaged over?

I encountered something really strange: my training loss seems fine, i.e. it keeps decreasing, but my validation loss stays almost the same. I read a lot of posts and tried many things, including:

  • removing dropout
  • removing batch normalization
  • reducing the learning rate
  • checking the training data

but none of them helped.

To debug what was going on, I replaced my training data with my validation data, but I observed the exact same phenomenon: the training loss keeps decreasing while the validation loss stays the same. To me, this does not make any sense!

By accident, after I switched the optimizer from Adam to SGD, everything looked normal! Later I tried the rest of the optimizers, but none of them worked. I don't know why, but my model can only be trained with the SGD optimizer.

I have this problem too. Validation is only correct with SGD.

I have the same problem, but SGD does not work either.
