Transformers: Add beta 1 and beta 2 option in `TrainingArguments` for `AdamW` optimizer.

Created on 12 Jul 2020 · 3 comments · Source: huggingface/transformers

I want to set the AdamW optimizer's beta 2 to 0.98 because I want to train a new RoBERTa LM. The paper says that this improves stability. The default is 0.999, and it cannot be set via TrainingArguments.

Could you please add the option to specify beta 1 and beta 2 for AdamW in the TrainingArguments? adam_epsilon can already be specified. If you want, I can provide a PR.
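
To make the request concrete, here is a rough sketch of how I imagine it could look; the `adam_beta1`/`adam_beta2` names are only a suggestion modeled on the existing `adam_epsilon` and do not exist in the library today:

```python
from transformers import TrainingArguments

# Hypothetical arguments (proposed, not yet in TrainingArguments), mirroring
# the existing adam_epsilon; beta2=0.98 follows the RoBERTa paper's setting.
training_args = TrainingArguments(
    output_dir="./roberta-lm",
    adam_beta1=0.9,    # the current implicit default
    adam_beta2=0.98,   # default is 0.999 and cannot be changed today
    adam_epsilon=1e-6,
)
```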

What do you think?

All 3 comments

I'm not sure we want to add this to TrainingArguments; if we added every possible parameter, the argument list would quickly explode. Note that you can instantiate your own optimizer and pass it in here: https://github.com/huggingface/transformers/blob/7096e47513127d4f072111a7f58f109842a2b6b0/src/transformers/trainer.py#L158
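
For reference, a minimal sketch of that approach, assuming the `optimizers=(optimizer, scheduler)` tuple that the linked line accepts; `model` and `train_dataset` are placeholders for your own objects:

```python
from transformers import (
    AdamW,
    Trainer,
    TrainingArguments,
    get_linear_schedule_with_warmup,
)

training_args = TrainingArguments(output_dir="./out")

# Build AdamW yourself with the betas you want and hand it to the Trainer,
# bypassing the default optimizer it would otherwise create.
optimizer = AdamW(
    model.parameters(),
    lr=training_args.learning_rate,
    betas=(0.9, 0.98),              # RoBERTa-style beta2
    eps=training_args.adam_epsilon,
)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=10_000
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    optimizers=(optimizer, scheduler),
)
```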

Also pinging @julien-c here.

Well, my argument for this change is that adam_epsilon can already be set, so beta 1 and beta 2 should be settable as well, especially because the RoBERTa paper suggests a different setting than the default.

A second argument is that it is not that easy to instantiate your own optimizer, because its construction depends on the model. See here: https://github.com/huggingface/transformers/blob/7096e47513127d4f072111a7f58f109842a2b6b0/src/transformers/trainer.py#L326-L335
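
For context, the linked snippet does roughly the following (a paraphrase, not the exact library code), which is why the optimizer can only be built once the model exists:

```python
# The optimizer splits the model's parameters into decay / no-decay groups,
# so its construction needs model.named_parameters().
no_decay = ["bias", "LayerNorm.weight"]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters()
                   if not any(nd in n for nd in no_decay)],
        "weight_decay": training_args.weight_decay,
    },
    {
        "params": [p for n, p in model.named_parameters()
                   if any(nd in n for nd in no_decay)],
        "weight_decay": 0.0,
    },
]
optimizer = AdamW(
    optimizer_grouped_parameters,
    lr=training_args.learning_rate,
    eps=training_args.adam_epsilon,
)
```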

Closing this in favor of #5592
