Hi there,
thanks for the great library (I am using 0.7.5). I am not following the bug report template because I'm not sure whether this is actually a bug or I simply don't understand how early stopping is implemented. My code looks as follows:
```python
early_stop_callback = EarlyStopping(
    monitor='val_acc',
    min_delta=0.0,
    patience=80,
    verbose=True,
    mode=self.mode
)
trainer = Trainer(
    early_stop_callback=early_stop_callback,
    auto_select_gpus=True,
    max_epochs=200,
    terminate_on_nan=True,
    show_progress_bar=True,
    fast_dev_run=False,
    gpus=1
)
```
As I understand it, the model should stop early only after AT LEAST 80 epochs have passed without improvement in validation accuracy. However, in my case, early stopping happened at epoch 75. Is this expected behavior?
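For reference, the conventional patience semantics I'd expect can be sketched with a minimal standalone counter (hypothetical illustration, not Lightning's actual implementation):

```python
def epochs_until_stop(val_accs, patience=80, min_delta=0.0):
    """Return the epoch at which early stopping (mode='max') would trigger,
    i.e. after `patience` consecutive epochs without improvement."""
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_accs):
        if acc > best + min_delta:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # never stopped

# Accuracy improves at epoch 0, then plateaus: with patience=80,
# stopping should occur at epoch 80, never earlier.
print(epochs_until_stop([0.9] + [0.5] * 200, patience=80))  # → 80
```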
As I said, I am not sure whether this is actually a bug or a design choice (perhaps early stopping is implemented at the batch level?). If it is indeed a bug, I will work up a reproducible example. Thank you!
Hi! Thanks for your contribution, great first issue!
I would expect that it should iterate for _at least 80 epochs_, too. So to me, it looks like a bug or some kind of unexpected behavior. Would be great to figure it out!
Ok then, I'll put together a notebook to see if I can reproduce it.
Thanks @mateuszpieniak
Here is a working example. As you can see, it stops at epoch 41 even though patience is set to 80.
https://github.com/marcopodda/pl-es-example/blob/master/ES%20example.ipynb
It is definitely a bug. I discovered that `EarlyStopping.on_epoch_end` is executed twice within one epoch, meaning that `patience=160` should work around your issue temporarily.
In the file `training_loop.py`:

First call:

```python
if self.fast_dev_run or should_check_val:
    self.run_evaluation(test_mode=self.testing)
    self.call_checkpoint_callback()
    self.call_early_stop_callback()
```

Second call:

```python
# TODO wrap this logic into the callback
if self.enable_early_stop:
    if (met_min_epochs and met_min_steps) or self.fast_dev_run:
        should_stop = self.early_stop_callback.on_epoch_end(self, self.get_model())
        # stop training
        stop = should_stop and met_min_epochs
        if stop:
            self.run_training_teardown()
            return
```
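The effect of the double invocation can be reproduced with a toy stand-in for the callback (hypothetical sketch, not Lightning's code): firing `on_epoch_end` twice per epoch spends the patience budget twice as fast, so training stops after roughly `patience / 2` epochs.

```python
class ToyEarlyStopping:
    """Toy stand-in for an EarlyStopping callback (mode='max')."""
    def __init__(self, patience):
        self.patience = patience
        self.best = float("-inf")
        self.wait = 0

    def on_epoch_end(self, metric):
        if metric > self.best:
            self.best = metric
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience  # True -> stop training

es = ToyEarlyStopping(patience=80)
for epoch in range(200):
    val_acc = 0.5  # plateaued metric: never improves after epoch 0
    # The bug: the callback fires twice per epoch, so `wait`
    # advances by 2 each epoch instead of 1.
    if es.on_epoch_end(val_acc) or es.on_epoch_end(val_acc):
        print(f"stopped at epoch {epoch}")  # → stopped at epoch 40
        break
```

This matches the observed behavior: stopping at around half the configured patience (epoch 41 in the notebook, epoch 75 with `patience=80` plus some initial improvement).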
I upgraded to the bleeding edge version yesterday and can confirm that this started happening to me too. I didn't have an issue before I upgraded (I think I was on 0.7.3 before?)
Yep, we ran into this as well. It is called once in the trainer and once in the `on_epoch_end` callback.
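One way to make the duplicate call harmless (a hypothetical workaround sketch, not the actual fix merged into Lightning) is to make the callback idempotent per epoch by remembering the last epoch it evaluated:

```python
class GuardedEarlyStopping:
    """Sketch: ignore duplicate on_epoch_end calls within the same epoch."""
    def __init__(self, patience):
        self.patience = patience
        self.best = float("-inf")
        self.wait = 0
        self._last_epoch = None

    def on_epoch_end(self, epoch, metric):
        if epoch == self._last_epoch:
            # Already evaluated this epoch: return cached decision.
            return self.wait >= self.patience
        self._last_epoch = epoch
        if metric > self.best:
            self.best = metric
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience

es = GuardedEarlyStopping(patience=80)
for epoch in range(200):
    # Even with two calls per epoch, `wait` advances once per epoch.
    if es.on_epoch_end(epoch, 0.5) or es.on_epoch_end(epoch, 0.5):
        print(f"stopped at epoch {epoch}")  # → stopped at epoch 80
        break
```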
@Anjum48 @ricpruss mind sending a fix PR?
@Borda Well, I would love to make my first PL PR, if that's okay? :wink:
@mateuszpieniak sure go ahead! :rocket: