Unless you are using the ModelCheckpoint callback with the save_best_only parameter, model.fit() returns not the best model encountered during training (i.e. the one with the lowest validation loss or the highest validation accuracy), but rather whatever model happens to exist at the last epoch (which cannot be counted on to have the lowest loss or the highest accuracy).
Since many users, especially beginners, are unaware of this callback, their results are by default not the best possible with the given parameters, and the accuracy of their network is unnecessarily degraded, even though a better model has likely already been encountered during the training pass that was just performed.
Proposed change: model.fit() should return the best model encountered during training by default, or, if that would negatively impact performance, at least provide a parameter to do so.
The current workaround is the following:
from keras.callbacks import ModelCheckpoint

model = Sequential()
...
model.compile(loss='mse', optimizer=opt)

# Save the best model (lowest validation loss) seen during training.
checkpointer = ModelCheckpoint(filepath='weights.hdf5', verbose=1, save_best_only=True)
hist = model.fit(..., callbacks=[checkpointer])

# Reload the best weights before predicting.
model.load_weights('weights.hdf5')
predicted = model.predict(X_test_mat)
I think changing the behavior of fit is not appropriate, and Callback is powerful enough to achieve this. Maybe we should add some tutorials to the FAQ to help users new to Keras.
What would be the downside of such a change? Why would one not want model.fit() to return the best you can get during the current run?
Why would one not want model.fit() to return the best you can get during the current run?

Because the word "best" is relative. Sometimes users want the lowest loss on the training data, but most of the time it is on the validation data.

And yes, +1 for an FAQ entry.
I see. But with the current behavior it's neither the lowest training loss nor the lowest validation loss; it is whatever happens to be there at the last epoch. I think selecting an intelligent default (e.g. lowest validation loss) would be the right thing to do, in particular to make it easier for beginners to start and get better results. There are no backwards-compatibility issues, and experts will still have full control.
Case in point: I started with Keras about three months ago, trying to use an LSTM for time-series forecasting. With the "default" behavior I was getting forecast accuracy around or barely above 50%. Only recently did I realize that the model returned after fit(), which I had been using to predict(), is not the best model; after adding the ModelCheckpoint callback to save the best model, my accuracy with otherwise the same parameters went up to 55-60%. I am just thinking of others trying machine learning who, for this reason, get the unnecessarily wrong impression that "it doesn't work yet" before they reach the depths of the documentation.
And of course I support covering this in FAQ, if the decision is not to make the change.
What would be the downside of such a change?

The behavior of fit in other ML frameworks is the same as in Keras. Modifying it might produce unexpected results for most users.
Could this maybe be an optional parameter to fit? Something along these lines:

fit(..., return_best_model=False)

Doing so, we would keep the current behaviour and also ease retrieving the best model (with return_best_model=True) without storing files to disk and with less code.
Sure, I think this would be a nice solution.
Is the only currently existing solution to save the weights to an HDF5 file using ModelCheckpoint, then load the file again and apply it to a model, which is then returned?

So, something similar to this:
from keras import callbacks as kcallbacks

def getBestModel(...):
    model.compile(...)
    best_weights_filepath = './best_weights.hdf5'
    earlyStopping = kcallbacks.EarlyStopping(monitor='val_loss', patience=10,
                                             verbose=1, mode='auto')
    saveBestModel = kcallbacks.ModelCheckpoint(best_weights_filepath, monitor='val_loss',
                                               verbose=1, save_best_only=True, mode='auto')

    # train model
    history = model.fit(x_tr, y_tr, batch_size=batch_size, nb_epoch=n_epochs,
                        verbose=1, validation_data=(x_va, y_va),
                        callbacks=[earlyStopping, saveBestModel])

    # reload best weights
    model.load_weights(best_weights_filepath)
    return model
Could we get an update on this? The current behavior limits the use of the scikit-learn wrappers in grid search.
EDIT: This is the workaround I use for now. The model name needs to be set in the model-building function that passes the model to GridSearchCV, to avoid duplicate model names and to allow multiple threads in CV.
import keras

class CustomCheckpoint(keras.callbacks.ModelCheckpoint):
    """ModelCheckpoint variant that derives its filepath from the model name."""
    def __init__(self, output_dir, monitor='val_loss', verbose=0,
                 save_best_only=False, save_weights_only=False, mode='auto'):
        self.output_dir = output_dir
        super(CustomCheckpoint, self).__init__('', monitor, verbose,
                                               save_best_only, save_weights_only, mode)

    def on_train_begin(self, logs=None):
        # Build a per-model filepath so parallel grid-search workers don't collide.
        self.filepath = self.output_dir + self.model.name + '_weights.hdf5'

class BestModel(keras.callbacks.Callback):
    """Reload the best checkpointed weights at the end of training."""
    def __init__(self, output_dir, verbose=0):
        self.output_dir = output_dir
        self.verbose = verbose

    def on_train_end(self, logs=None):
        weights_file = self.output_dir + self.model.name + '_weights.hdf5'
        self.model.load_weights(weights_file)
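For context, here is a hypothetical way to wire these callbacks into a grid search; build_model, param_grid, X, and y are placeholder names, not from this thread:

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

# Both callbacks share an output directory; the per-model filename
# is derived from model.name, which build_model must set uniquely.
cbs = [CustomCheckpoint('./checkpoints/', monitor='val_loss', save_best_only=True),
       BestModel('./checkpoints/')]

clf = KerasClassifier(build_fn=build_model, epochs=50, verbose=0)
grid = GridSearchCV(clf, param_grid=param_grid)
grid.fit(X, y, callbacks=cbs)  # fit kwargs are forwarded to model.fit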
Has this issue been resolved or addressed? What is the optimal way to return the best weights from any given epoch based on metrics obtained?
@jerpint Currently, fit does not have a return_best_model parameter. See the fit docs.
At the moment, the best way to save the best model is to use the ModelCheckpoint callback:
from keras.callbacks import ModelCheckpoint
...
mcp = ModelCheckpoint(model_chk_path, monitor="val_acc",
                      save_best_only=True, save_weights_only=False)
model.fit(X_train, Y_train,
          batch_size=batch_size,
          epochs=nb_epoch,
          validation_data=(X_test, Y_test),
          shuffle=True,
          callbacks=[mcp])
@MartinThoma I think you meant the best way to save the best model, am I correct? My understanding is that after fitting for n epochs, if I need to predict using the best model, I still need to explicitly load it before predicting.
@roebius Right, I wanted to write "save". Thank you, I've edited it. Yes, I also think that you would need to load it from the checkpoint.
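To make that explicit, a minimal sketch (reusing model_chk_path from the snippet above):

# Reload the checkpointed best weights before predicting;
# otherwise predict() uses the last-epoch weights.
model.load_weights(model_chk_path)
predictions = model.predict(X_test)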
I think changing the default to return the best model is a smart idea... it is definitely confusing. At the very least, the Keras examples should show it very prominently.
As a new user, you shouldn't have to go into a Github issue in order to return the best model.
@Hudler do you have a similar class for EarlyStopping?
I think you might be describing the solution to my problem, but I'm not sure.
When I try to put a KerasClassifier into GridSearch (on a subset of data), I get the following output:
````
(...)
Epoch 00021: val_acc did not improve
173/173 [==============================] - 0s - loss: 0.5392 - acc: 0.7225 - val_loss: 0.3749 - val_acc: 1.0000
109/109 [==============================] - 0s
192/217 [=========================>....] - ETA: 0s
217/217 [==============================] - 0s
Train on 173 samples, validate on 44 samples
Epoch 1/200
64/173 [==========>...................] - ETA: 0s - loss: 0.7012 - acc: 0.5156
96/173 [===============>..............] - ETA: 0s - loss: 0.7198 - acc: 0.4896
128/173 [=====================>........] - ETA: 0s - loss: 0.7217 - acc: 0.5156
160/173 [==========================>...] - ETA: 0s - loss: 0.7209 - acc: 0.5188Epoch 00000: early stopping
Epoch 00000: val_acc did not improve
173/173 [==============================] - 0s - loss: 0.7216 - acc: 0.5145 - val_loss: 0.6673 - val_acc: 0.7273
32/109 [=======>......................] - ETA: 0s
109/109 [==============================] - 0s
217/217 [==============================] - 0s
Train on 174 samples, validate on 44 samples
Epoch 1/200
32/174 [====>.........................] - ETA: 0s - loss: 0.7479 - acc: 0.5000
64/174 [==========>...................] - ETA: 0s - loss: 0.7409 - acc: 0.5156
96/174 [===============>..............] - ETA: 0s - loss: 0.7245 - acc: 0.5938
128/174 [=====================>........] - ETA: 0s - loss: 0.7170 - acc: 0.6172
160/174 [==========================>...] - ETA: 0s - loss: 0.7098 - acc: 0.6312Epoch 00000: early stopping
Epoch 00000: val_acc did not improve
174/174 [==============================] - 0s - loss: 0.7063 - acc: 0.6322 - val_loss: 0.7228 - val_acc: 0.1591
108/108 [==============================] - 0s
64/218 [=======>......................] - ETA: 0s
128/218 [================>.............] - ETA: 0s
192/218 [=========================>....] - ETA: 0s
218/218 [==============================] - 0s
Train on 173 samples, validate on 44 samples
Epoch 1/200
48/173 [=======>......................] - ETA: 0s - loss: 0.6862 - acc: 0.5417
96/173 [===============>..............] - ETA: 0s - loss: 0.6757 - acc: 0.6250
144/173 [=======================>......] - ETA: 0s - loss: 0.6713 - acc: 0.5972Epoch 00000: early stopping
Epoch 00000: val_acc did not improve
````
Apparently, since the same instance of EarlyStopping is reused from the previous iteration of the grid search, if the model does not immediately improve at the first epoch on what was previously saved as "best", training stops right away. This effectively renders GridSearch useless.
This post gives a very detailed description of how to use ModelCheckpoint in Keras. Hope it helps.
@FrugoFruit90 Actually, I ran multiple instances of my Python script to perform the grid search.
However, you can write your own custom EarlyStopping, for example by changing the on_train_end method of the default class (https://github.com/fchollet/keras/blob/master/keras/callbacks.py#L505). You need to set self.best = np.Inf (or -Inf) and self.wait = 0.
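A minimal sketch of that idea (the class name is mine; it relies on the EarlyStopping internals linked above, namely self.monitor_op, self.best, and self.wait):

import numpy as np
from keras.callbacks import EarlyStopping

class ResettingEarlyStopping(EarlyStopping):
    """EarlyStopping that clears its state when training ends, so the
    same instance can safely be reused across grid-search fits."""

    def on_train_end(self, logs=None):
        super(ResettingEarlyStopping, self).on_train_end(logs)
        # Reset the incumbent and the patience counter for the next fit().
        self.best = np.Inf if self.monitor_op == np.less else -np.Inf
        self.wait = 0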
I tried @jerpint's recommendation above (copy/paste below), yet Keras is still not giving the best model's results. I managed to get val_acc = 1.00, see the output below. However, when I ran predict_proba and evaluate, I got much worse results. Can someone please help me understand what is going on?
using Keras==2.1.2
from keras.callbacks import ModelCheckpoint
...
mcp = ModelCheckpoint(model_chk_path, monitor="val_acc",
                      save_best_only=True, save_weights_only=False)
model.fit(X_train, Y_train,
          batch_size=batch_size,
          epochs=nb_epoch,
          validation_data=(X_test, Y_test),
          shuffle=True,
          callbacks=[mcp])
28/28 [==============================] - 0s 393us/step - loss: 1.2081 - acc: 0.7500 - val_loss: 16.0314 - val_acc: 0.0000e+00
Epoch 24/30
Epoch 00024: val_acc did not improve
28/28 [==============================] - 0s 357us/step - loss: 1.1232 - acc: 0.8214 - val_loss: 16.0096 - val_acc: 0.0000e+00
Epoch 25/30
Epoch 00025: val_acc did not improve
28/28 [==============================] - 0s 507us/step - loss: 1.1784 - acc: 0.7500 - val_loss: 16.4731 - val_acc: 0.0000e+00
Epoch 26/30
Epoch 00026: val_acc did not improve
28/28 [==============================] - 0s 357us/step - loss: 1.0180 - acc: 0.8929 - val_loss: 14.6493 - val_acc: 0.0000e+00
Epoch 27/30
Epoch 00027: val_acc did not improve
28/28 [==============================] - 0s 357us/step - loss: 1.0103 - acc: 0.8571 - val_loss: 7.5434 - val_acc: 0.0000e+00
Epoch 28/30
Epoch 00028: val_acc did not improve
28/28 [==============================] - 0s 322us/step - loss: 0.8813 - acc: 0.8929 - val_loss: 0.6771 - val_acc: 1.0000
Epoch 29/30
Epoch 00029: val_acc did not improve
28/28 [==============================] - 0s 447us/step - loss: 0.9386 - acc: 0.8571 - val_loss: 0.6273 - val_acc: 1.0000
Epoch 30/30
Epoch 00030: val_acc did not improve
28/28 [==============================] - 0s 357us/step - loss: 0.8414 - acc: 0.9286 - val_loss: 0.6074 - val_acc: 1.0000
Is there any way to still get the best model without saving/loading the model to disk?
I modified the ModelCheckpoint callback so that it stores the best weights and restores them at the end of training. The weights are kept only in memory; there is no need to write them to disk.
import warnings

import numpy as np
from keras.callbacks import Callback

class GetBest(Callback):
    """Get the best model at the end of training.

    # Arguments
        monitor: quantity to monitor.
        verbose: verbosity mode, 0 or 1.
        mode: one of {auto, min, max}. The decision to overwrite the
            currently stored weights is made based on either the
            maximization or the minimization of the monitored quantity.
            For `val_acc` this should be `max`, for `val_loss` this
            should be `min`, etc. In `auto` mode, the direction is
            automatically inferred from the name of the monitored
            quantity.
        period: interval (number of epochs) between checks.

    # Example
        callbacks = [GetBest(monitor='val_acc', verbose=1, mode='max')]
        model.fit(X, y, validation_data=(X_eval, Y_eval),
                  callbacks=callbacks)
    """

    def __init__(self, monitor='val_loss', verbose=0,
                 mode='auto', period=1):
        super(GetBest, self).__init__()
        self.monitor = monitor
        self.verbose = verbose
        self.period = period
        self.best_epochs = 0
        self.epochs_since_last_save = 0
        if mode not in ['auto', 'min', 'max']:
            warnings.warn('GetBest mode %s is unknown, '
                          'fallback to auto mode.' % (mode),
                          RuntimeWarning)
            mode = 'auto'
        if mode == 'min':
            self.monitor_op = np.less
            self.best = np.Inf
        elif mode == 'max':
            self.monitor_op = np.greater
            self.best = -np.Inf
        else:
            if 'acc' in self.monitor or self.monitor.startswith('fmeasure'):
                self.monitor_op = np.greater
                self.best = -np.Inf
            else:
                self.monitor_op = np.less
                self.best = np.Inf

    def on_train_begin(self, logs=None):
        # Start from the initial weights so there is always something to restore.
        self.best_weights = self.model.get_weights()

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.epochs_since_last_save += 1
        if self.epochs_since_last_save >= self.period:
            self.epochs_since_last_save = 0
            current = logs.get(self.monitor)
            if current is None:
                warnings.warn('Can pick best model only with %s available, '
                              'skipping.' % (self.monitor), RuntimeWarning)
            else:
                if self.monitor_op(current, self.best):
                    if self.verbose > 0:
                        print('\nEpoch %05d: %s improved from %0.5f to %0.5f,'
                              ' storing weights.'
                              % (epoch + 1, self.monitor, self.best, current))
                    self.best = current
                    self.best_epochs = epoch + 1
                    self.best_weights = self.model.get_weights()
                else:
                    if self.verbose > 0:
                        print('\nEpoch %05d: %s did not improve' %
                              (epoch + 1, self.monitor))

    def on_train_end(self, logs=None):
        # Restore the best weights found during training.
        if self.verbose > 0:
            print('Using epoch %05d with %s: %0.5f' %
                  (self.best_epochs, self.monitor, self.best))
        self.model.set_weights(self.best_weights)
Then you can use it like this:

callbacks = [GetBest(monitor='val_acc', verbose=1, mode='max')]
model.fit(X, y, validation_data=(X_eval, Y_eval), callbacks=callbacks)

The model will now hold the best weights when fit() returns. Perhaps Keras could include something like this in the library.
Thanks @louis925 for this nice solution.
I agree with most users here: after working with Keras and early stopping for a couple of months, I only now realised, by accident, that EarlyStopping does NOT restore the best found weights. I expected it to do so, simply because that behaviour is what is desired in most cases.
So at the very least it would be great if Keras stated this prominently in its documentation and provided a handy solution; an additional parameter to EarlyStopping would be best. Such a parameter and its explanation in the docs would make the issue clear to everyone and provide an easy solution at the same time. I don't see any drawback to this.
@louis925 I had a situation where I initialised your GetBest() callback once, but called model.fit() multiple times (for cross validation). I wanted the model to re-initialise the incumbent (self.best) for each training run, i.e. find the best weights for each fit(), not across all folds. Therefore I added an additional "reset" parameter to your code:
def __init__(self, monitor='val_loss', verbose=0,
             mode='auto', period=1, reset=True):
    super(GetBest, self).__init__()
    self.monitor = monitor
    self.verbose = verbose
    self.period = period
    self.reset = reset
    self.best_epochs = 0
    self.epochs_since_last_save = 0
    .......

def on_train_begin(self, logs=None):
    if self.reset:  # useful if fit() is called multiple times, e.g. during cross validation
        self.best = np.Inf if self.monitor_op == np.less else -np.Inf
    self.best_weights = self.model.get_weights()
@js1285 I don't think any information is retained in the callback between successive calls to fit (i.e., a new instance is created), hence self.best will not retain its past value. You might have to create a class/static variable to do this.
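A tiny sketch of that suggestion (hypothetical names; a class-level attribute, unlike an instance attribute, survives re-instantiation):

class BestTracker:
    # Class-level ("static") variable: shared by all instances, so it
    # persists even when a new callback instance is created per fit().
    best_overall = float('inf')

    def update(self, current):
        if current < BestTracker.best_overall:
            BestTracker.best_overall = current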
Any updates regarding this issue?
I shortened the answer in https://github.com/keras-team/keras/issues/2768#issuecomment-361070688 and submitted a pull request, as shown above.
There's also the option of the EarlyStopping callback with the restore_best_weights parameter. This runs at each epoch end.
There's also the option of the EarlyStopping callback with the restore_best_weights parameter. This runs at each epoch end.

Specifically, it only runs when patience is exceeded, meaning that if the model achieves its best performance fewer than patience epochs before the epoch limit, patience is never exceeded and the best weights are not restored.
e.g.
epochs = 500
patience = 10
best performance at epoch = 495
patience not exceeded and best weights not restored
relevant code from tf 2.2.0 release:
https://github.com/tensorflow/tensorflow/blob/2b96f3662bd776e277f86997659e61046b56c315/tensorflow/python/keras/callbacks.py#L1479-L1485
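For reference, a minimal usage sketch (assumes an already compiled model and validation data; note the caveat above about patience never being exceeded):

from tensorflow.keras.callbacks import EarlyStopping

# restore_best_weights only takes effect if training actually stops
# early; if patience is never exceeded, the last-epoch weights remain.
early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True, verbose=1)
model.fit(x_train, y_train,
          epochs=500,
          validation_data=(x_val, y_val),
          callbacks=[early_stop])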