Transformers: 🚀 Add early stopping to the trainer

Created on 10 Jun 2020 · 7 comments · Source: huggingface/transformers

🚀 Feature request

The trainer (pt, tf) is an easy access point for users who would rather not spend too much time building their own trainer class and prefer an out-of-the-box solution. Even though transformers was never meant to be a fully fledged training library, adding one extra feature might please users: early stopping.

Motivation

Early stopping ensures that the trainer does not needlessly keep training when the loss does not improve. This saves time, money, and let's not forget the trees. 😉 Performance-wise this should not lead to different results.

Your contribution

At the moment I cannot work on this, but here are my thoughts:

  • a training argument should be added (pt, tf). This would only work when evaluate_during_training is enabled.
  • for PyTorch: at every evaluation step, an early stopper (this can even be a separate class) checks whether the loss has improved in the last n evaluations, potentially with a minimal threshold by which the loss must have improved. If not, the trainer should stop (see the sketch after this list).
  • for TensorFlow: I don't have experience with TF myself, but I assume one could use tf.keras.callbacks.EarlyStopping.
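A minimal sketch of what such an early stopper could look like; the class name, constructor arguments, and the wiring into the evaluation step are hypothetical and not part of any existing transformers API:

```python
class EarlyStopper:
    """Hypothetical helper that tracks the best eval loss and signals when to stop.

    patience:  number of evaluations without improvement before stopping
    min_delta: minimum decrease in loss that counts as an improvement
    """

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_evals = 0

    def should_stop(self, eval_loss: float) -> bool:
        if eval_loss < self.best_loss - self.min_delta:
            # Loss improved enough: remember it and reset the counter.
            self.best_loss = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Hypothetical wiring inside the trainer's evaluation step:
# if early_stopper.should_stop(metrics["eval_loss"]):
#     break  # leave the training loop
```

On the TF side, tf.keras.callbacks.EarlyStopping implements the same patience/min_delta logic for model.fit; whether it can be reused inside trainer_tf.py's custom training loop is a separate question.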
Labels: Good First Issue, High-Level feature

All 7 comments

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Looking at the interest this topic has, I am bumping it to re-open it.

Hi,

So when #4186 is closed, this will close as well? Or are there any more changes expected on this issue, apart from what #4186 adds?

Thanks

If I've understood things correctly, I think #4186 only addresses the PyTorch implementation of the trainer. @BramVanroy if that's the case, I'm happy to work on implementing this feature in TensorFlow (trainer_tf.py).

@san7988 @KMFODA This issue should not directly be closed when that PR is merged because, as @KMFODA mentions, it only seems to address PyTorch. A PR for TensorFlow is also welcome!

Thanks for clarifying @BramVanroy. Apologies, I was out for the past month due to a personal issue. I'll submit a PR for TensorFlow early stopping now.

An early stopping callback has now been introduced in the PyTorch trainer by @cbrochtrup! 👏
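For anyone who lands here looking for how to use it, here is a minimal usage sketch with the PyTorch Trainer. The callback and argument names (EarlyStoppingCallback, early_stopping_patience) match the merged implementation as far as I can tell, but model, train_dataset, and eval_dataset are placeholders, and you should check the current docs for the exact TrainingArguments required:

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="steps",      # early stopping needs periodic evaluation
    eval_steps=500,
    load_best_model_at_end=True,      # restore the best checkpoint at the end
    metric_for_best_model="eval_loss",
    greater_is_better=False,          # lower eval loss is better
)

trainer = Trainer(
    model=model,                      # placeholder: your model
    args=training_args,
    train_dataset=train_dataset,      # placeholder: your datasets
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```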

AFAIK the implementation for the TF Trainer is still under way (https://github.com/huggingface/transformers/pull/7533), so I'll keep this topic open for now.
