I want to set the Adam optimizer's beta 2 to 0.98 because I want to train a new RoBERTa LM. The paper says that this improves stability. The default is 0.999, and it cannot be set in TrainingArguments.
Could you please add the option to specify beta 1 and beta 2 for AdamW in the TrainingArguments? adam_epsilon can already be specified. If you want, I can provide a PR.
What do you think?
I'm not sure we would like to add this to the TrainingArguments. If we add all possible params, this could quickly explode. Note that you can instantiate your own optimizer and pass it here: https://github.com/huggingface/transformers/blob/7096e47513127d4f072111a7f58f109842a2b6b0/src/transformers/trainer.py#L158
Also pinging @julien-c here.
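For reference, a minimal sketch of that approach (the output dir and step count below are placeholders, not taken from the linked code): build AdamW with the betas you need and hand it to the Trainer via its `optimizers` argument as an `(optimizer, scheduler)` tuple.

```python
import torch
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
    get_linear_schedule_with_warmup,
)

model = RobertaForMaskedLM(RobertaConfig())
training_args = TrainingArguments(output_dir="./roberta-out")  # placeholder dir

# betas=(0.9, 0.98) as suggested for RoBERTa pretraining
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=training_args.learning_rate,
    betas=(0.9, 0.98),
    eps=training_args.adam_epsilon,
    weight_decay=training_args.weight_decay,
)

num_training_steps = 100_000  # placeholder: compute from your dataset and epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=training_args.warmup_steps,
    num_training_steps=num_training_steps,
)

trainer = Trainer(
    model=model,
    args=training_args,
    optimizers=(optimizer, scheduler),  # overrides the Trainer's default AdamW
)
```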
Well, my argument for this change is that adam_epsilon can already be set, so beta 1 and beta 2 should also be settable, especially because the RoBERTa paper suggests a setting other than the default.
My second argument is that it is not that easy to instantiate your own optimizer, because there is a dependency on the model. See here: https://github.com/huggingface/transformers/blob/7096e47513127d4f072111a7f58f109842a2b6b0/src/transformers/trainer.py#L326-L335
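For context, the model dependency comes from the weight-decay parameter grouping the Trainer builds internally at that commit. Replicating it outside the Trainer looks roughly like this (a sketch; `build_adamw` is a hypothetical helper, not part of the library):

```python
import torch

def build_adamw(model, lr, eps, weight_decay, betas=(0.9, 0.98)):
    # Mirror the Trainer's grouping: no weight decay for biases and LayerNorm weights.
    no_decay = ["bias", "LayerNorm.weight"]
    grouped_parameters = [
        {
            "params": [p for n, p in model.named_parameters()
                       if not any(nd in n for nd in no_decay)],
            "weight_decay": weight_decay,
        },
        {
            "params": [p for n, p in model.named_parameters()
                       if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]
    return torch.optim.AdamW(grouped_parameters, lr=lr, betas=betas, eps=eps)
```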
Closing this in favor of #5592