Keras: EarlyStopping callback won't restore best weights unless training stops early

Created on 18 Mar 2019 · 11 comments · Source: keras-team/keras

The EarlyStopping callback will, if the restore_best_weights option is True, restore the best weights if and only if it requests the stopping itself, not if stopping is requested by another callback or if the training loop has simply run for the given number of epochs. The call to Model.set_weights lives inside the on_epoch_end method, guarded by a comparison of wait and patience [1].

If I read the code correctly, then in a scenario where the best epoch occurred fewer than patience epochs before the last epoch, the model will keep the weights from the last epoch.

If this is intentional, I think it should be documented. If not, I think it would be an easy fix to move the weight restoring logic to the on_train_end method.

[1] https://github.com/keras-team/keras/blob/f0eb8d538c82798944346b4b2df917a06bf5e9d4/keras/callbacks.py#L823-L830
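
For reference, a simplified paraphrase (not a verbatim copy) of the guarded block at [1], as I read the 2.2.x source: the best weights are recorded on every improvement, but set_weights only runs on the branch where wait has reached patience.

```python
# Simplified paraphrase of EarlyStopping.on_epoch_end in Keras 2.2.x, not verbatim.
def on_epoch_end(self, epoch, logs=None):
    logs = logs or {}
    current = logs.get(self.monitor)
    if current is None:
        return
    if self.monitor_op(current - self.min_delta, self.best):
        self.best = current
        self.wait = 0
        if self.restore_best_weights:
            # The best weights are tracked here on every improvement...
            self.best_weights = self.model.get_weights()
    else:
        self.wait += 1
        if self.wait >= self.patience:
            self.stopped_epoch = epoch
            self.model.stop_training = True
            if self.restore_best_weights:
                # ...but only restored here, i.e. when this callback itself
                # is the one that stops training.
                self.model.set_weights(self.best_weights)
```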

All 11 comments

I can confirm this issue on keras 2.2.4.

Hope this can be resolved.

Stumbled upon this issue. I think ModelCheckpoint is the way to go if you wish to restore the best weights when EarlyStopping does not trigger. However, then there is no use case I can think of for the restore_best_weights option in EarlyStopping, since ModelCheckpoint will always save the best weights and one can restore them after model.fit. Please correct me if I'm wrong or missing anything.

ModelCheckpoint, strangely, does not have an option to automatically restore the best weights the way EarlyStopping does. It is also not always easy to pick out the best weights saved by ModelCheckpoint programmatically: for example, if I tell ModelCheckpoint to give unique names to all the weight files it saves, I can pick the best file by inspection, but it takes extra lines of code to parse the file names and select the best one automatically.

So right now we can automatically restore the best model weights with EarlyStopping, but only if training is actually stopped early. Or we can save the best weights in any case with ModelCheckpoint, but it has no option to automatically restore them. I would love a single standard callback that could do both: save the best weights in any training run and automatically restore them at the end. Presumably this would best be implemented in ModelCheckpoint, but adding the option to EarlyStopping as @Stigjb suggests would be fantastic as well.
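
For what it's worth, a sketch of the ModelCheckpoint workaround, under the assumption that writing the best weights to a single fixed file (here arbitrarily named best_weights.h5) is acceptable; a compiled model and the data arrays are assumed to exist already.

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Overwrite one fixed file with the best weights seen so far, so there is
# nothing to parse out of uniquely named checkpoint files afterwards.
checkpoint = ModelCheckpoint('best_weights.h5', monitor='val_loss',
                             save_best_only=True, save_weights_only=True)
early_stop = EarlyStopping(monitor='val_loss', patience=10)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50,
          callbacks=[checkpoint, early_stop])

# Restore the best weights regardless of why training ended.
model.load_weights('best_weights.h5')
```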

Guys, this is super important. It happened to us today for the first time, and we only found it because, while the model had been evaluated with a good loss (compared to other experiments), the final results were terrible. The model reached the maximum number of epochs, so the callback did not restore the best weights from an earlier epoch, as it was meant to do.

In my opinion this issue should be tagged as a bug, as it is still present in the latest release and on the master branch. It needs to be resolved as suggested by @Stigjb, by moving the restoration of the weights to on_train_end; otherwise many people may report wrong numbers or stop believing in their fresh ideas!

If EarlyStopping doesn't trigger, doesn't that mean that you don't have enough epochs to begin with?

Hi @meanmikeyk, it definitely means that the current set of hyper-parameters needed more epochs than you expected. But that does not mean the model should be left without the best parameters after training.

In my case, the total number of epochs was 50 and the patience was 10. The validation loss was improving until the 41st epoch and then the model overfitted very quickly over the next 9 epochs (most probably it would never have changed direction), but 9 non-improving epochs are fewer than the patience, so early stopping never triggered and the final model had terrible parameters. Should I set the maximum number of epochs to 200 or 1000, or maybe also set a lower patience?

There is no really great answer, but in any case I should be able to evaluate my model with the best parameters when I choose to use early stopping with restore_best_weights=True, which did not happen because the callback is not written to handle this unfortunate situation.
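
For illustration, a minimal sketch of the setup described above (a compiled model and the training/validation arrays are assumed to exist already):

```python
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)

# If val_loss last improves at epoch 41 of 50, only 9 non-improving epochs
# remain, so the patience of 10 is never exhausted, early stopping never
# triggers, and the final (overfitted) weights are kept.
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50,
          callbacks=[early_stop])
```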

@meanmikeyk If EarlyStopping doesn't trigger, it simply means that training didn't proceed for patience epochs after the best-scoring epoch.

There are many reasons why this could happen: Perhaps you are training the network for the first time and don't know how many epochs it really needs; perhaps some random aspect of the training causes the best epoch to arrive later than expected; or perhaps you simply have a finite amount of patience and/or computing resources and cannot train for more than some fixed time.

The fact that EarlyStopping does not restore the best weights in every case, or offer any option to do so, catches many people by surprise. This ruins long and potentially expensive training runs.

As I already said, @jjbuchanan, in the worst-case scenario people will not review the training process and will just evaluate and log results. There is a great chance that they will report wrong and misleading results, which may mislead a whole community; even worse, good ideas may get withdrawn because they showed terrible performance (based on the last, terrible parameters)...

I think the best solution is to also add a restore_best_weights option to Model.fit, so that you can have this behavior on both early termination and regular termination.

As pointed out in https://github.com/tensorflow/tensorflow/issues/35634#issuecomment-612644409, the user may want to continue training after normal termination, so having the EarlyStopping callback always override your weights would present a challenge. That's why I think Model.fit also needs this option.

I don't believe it belongs in Model.fit, for the same reasons the EarlyStopping class exists in the first place. The Keras team could have added a patience option to Model.fit but chose not to, in order to avoid further muddying that method. Let's avoid adding a `restore_best_weights` option there for the same reason.

I'm strongly of the opinion that EarlyStopping should restore the best weights if restore_best_weights is True, regardless of why training stopped. Failing to restore the weights is not the behavior of least surprise, which by some definitions makes it a bug. At best, it is ill-advised.

So what about supporting the ability to resume training? I say we add another flag, maybe weights_restored_without_early_stop, that defaults to True but, if explicitly set to False, results in the current behavior.
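
A minimal sketch of that idea as a user-side subclass, assuming the flag name proposed above and a hypothetical class name; it relies only on attributes the 2.2.x EarlyStopping already maintains (stopped_epoch, best_weights, restore_best_weights):

```python
from keras.callbacks import EarlyStopping


class EarlyStoppingAlwaysRestore(EarlyStopping):
    """Hypothetical subclass: also restores the best weights when training
    ends without early stopping, unless weights_restored_without_early_stop
    is explicitly set to False."""

    def __init__(self, weights_restored_without_early_stop=True, **kwargs):
        super(EarlyStoppingAlwaysRestore, self).__init__(**kwargs)
        self.weights_restored_without_early_stop = weights_restored_without_early_stop

    def on_train_end(self, logs=None):
        super(EarlyStoppingAlwaysRestore, self).on_train_end(logs)
        # stopped_epoch stays 0 when early stopping never triggered; in that
        # case restore the tracked best weights instead of keeping the last
        # epoch's weights.
        if (self.stopped_epoch == 0
                and self.weights_restored_without_early_stop
                and self.restore_best_weights
                and getattr(self, 'best_weights', None) is not None):
            self.model.set_weights(self.best_weights)
```

Usage would mirror the stock callback, e.g. EarlyStoppingAlwaysRestore(monitor='val_loss', patience=10, restore_best_weights=True); passing weights_restored_without_early_stop=False falls back to today's behavior for the resume-training use case mentioned above.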
