I want to set the Adam optimizer's beta 2 to 0.98 because I want to train a new RoBERTa LM. The paper says that this improves stability. The default is 0.999, and it cannot be set in TrainingArguments.
Could you please add the option to specify beta 1 and beta 2 for AdamW in the TrainingArguments? adam_epsilon can already be specified. If you want, I can provide a PR.
What do you think?
I'm not sure we would like to add this to the TrainingArguments. If we add all possible params, this could quickly explode. Note that you can instantiate your own optimizer and pass it here: https://github.com/huggingface/transformers/blob/7096e47513127d4f072111a7f58f109842a2b6b0/src/transformers/trainer.py#L158
Also pinging @julien-c here.
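For reference, a minimal sketch of that approach (the output dir and step count below are placeholders, not taken from the linked code): build AdamW with the betas you need and hand it to the Trainer via its `optimizers` argument as an `(optimizer, scheduler)` tuple.

```python
import torch
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
    get_linear_schedule_with_warmup,
)

model = RobertaForMaskedLM(RobertaConfig())
training_args = TrainingArguments(output_dir="./roberta-out")  # placeholder dir

# betas=(0.9, 0.98) as suggested for RoBERTa pretraining
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=training_args.learning_rate,
    betas=(0.9, 0.98),
    eps=training_args.adam_epsilon,
    weight_decay=training_args.weight_decay,
)

num_training_steps = 100_000  # placeholder: compute from your dataset and epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=training_args.warmup_steps,
    num_training_steps=num_training_steps,
)

trainer = Trainer(
    model=model,
    args=training_args,
    optimizers=(optimizer, scheduler),  # overrides the Trainer's default AdamW
)
```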
Well, my argument for this change is that adam_epsilon can already be set, so beta 1 and beta 2 should also be settable, especially because the RoBERTa paper suggests a setting other than the default.
My second argument is that it is not that easy to instantiate your own optimizer, because there is a dependency on the model. See here: https://github.com/huggingface/transformers/blob/7096e47513127d4f072111a7f58f109842a2b6b0/src/transformers/trainer.py#L326-L335
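For context, the model dependency comes from the weight-decay parameter grouping the Trainer builds internally at that commit. Replicating it outside the Trainer looks roughly like this (a sketch; `build_adamw` is a hypothetical helper, not part of the library):

```python
import torch

def build_adamw(model, lr, eps, weight_decay, betas=(0.9, 0.98)):
    # Mirror the Trainer's grouping: no weight decay for biases and LayerNorm weights.
    no_decay = ["bias", "LayerNorm.weight"]
    grouped_parameters = [
        {
            "params": [p for n, p in model.named_parameters()
                       if not any(nd in n for nd in no_decay)],
            "weight_decay": weight_decay,
        },
        {
            "params": [p for n, p in model.named_parameters()
                       if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]
    return torch.optim.AdamW(grouped_parameters, lr=lr, betas=betas, eps=eps)
```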
Closing this in favor of #5592