Hey all,
I think it would be useful if we had a similar framework for momentum schedulers as we do for learning rate schedulers. The implementation would probably be pretty easy, and almost identical to the learning rate schedulers since the momentum value is just stored alongside the learning rate in each param_group dict. This would allow someone to imitate Leslie Smith's 1 cycle policy, for example (see https://arxiv.org/abs/1803.09820 and https://sgugger.github.io/the-1cycle-policy.html#the-1cycle-policy). I'd be willing to submit a PR if this seems reasonable.
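For concreteness, here is a minimal sketch of the momentum half of a 1cycle schedule. It assumes a simple triangular shape: momentum falls linearly while the learning rate would be rising, then climbs back while the learning rate comes down. `one_cycle_momentum` is a hypothetical helper. The defaults (0.95 and 0.85) are assumptions, and the final annihilation phase from the paper is left out.

```python
def one_cycle_momentum(step: int, total_steps: int,
                       max_momentum: float = 0.95,
                       min_momentum: float = 0.85) -> float:
    """Hypothetical helper: momentum at `step` for a triangular 1cycle schedule.

    Assumes total_steps >= 1; the annihilation phase is not modeled.
    """
    # Clamp so steps past the end of the schedule keep the final value.
    step = min(max(step, 0), total_steps)
    half = total_steps / 2
    span = max_momentum - min_momentum
    if step <= half:
        # Phase 1: momentum goes down while the learning rate goes up.
        return max_momentum - span * (step / half)
    # Phase 2: momentum goes back up while the learning rate comes down.
    return min_momentum + span * ((step - half) / half)


if __name__ == "__main__":
    for s in (0, 25, 50, 75, 100):
        print(s, round(one_cycle_momentum(s, 100), 4))
```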
Is this as simple as implementing a new "learning rate scheduler" that changes the momentum instead of the learning rate? Or does this require introducing a new abstraction and API for momentum scheduling?
@matt-peters it could be. It might also make sense to create something like a "BaseScheduler" that both the learning rate and momentum schedulers could inherit from; then it's just a matter of which field in the param_group to update.
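A minimal sketch of the "BaseScheduler" idea follows. It assumes an optimizer that exposes `param_groups` as a list of dicts, the way `torch.optim.Optimizer` does. All class names here (`BaseScheduler`, `LearningRateScheduler`, `MomentumScheduler`, `LinearMomentumDecay`) are hypothetical, not an existing AllenNLP or PyTorch API. A subclass only decides which field it writes and how the value evolves per epoch.

```python
from types import SimpleNamespace
from typing import Any, List, Optional


class BaseScheduler:
    """Hypothetical base: writes a scheduled value into one field of every param_group."""

    def __init__(self, optimizer: Any, param_group_field: str) -> None:
        self.optimizer = optimizer
        self.param_group_field = param_group_field
        # Remember each group's starting value under "initial_<field>", so a
        # scheduler created later still starts from the original value.
        initial_key = f"initial_{param_group_field}"
        for group in optimizer.param_groups:
            group.setdefault(initial_key, group[param_group_field])
        self.base_values: List[float] = [group[initial_key] for group in optimizer.param_groups]
        self.last_epoch = -1

    def get_values(self) -> List[float]:
        """Return one value per param_group for the current epoch."""
        raise NotImplementedError

    def step(self, epoch: Optional[int] = None) -> None:
        self.last_epoch = self.last_epoch + 1 if epoch is None else epoch
        for group, value in zip(self.optimizer.param_groups, self.get_values()):
            group[self.param_group_field] = value


class LearningRateScheduler(BaseScheduler):
    def __init__(self, optimizer: Any) -> None:
        super().__init__(optimizer, "lr")


class MomentumScheduler(BaseScheduler):
    def __init__(self, optimizer: Any) -> None:
        super().__init__(optimizer, "momentum")


class LinearMomentumDecay(MomentumScheduler):
    """Hypothetical example: move momentum linearly from its initial value to `final_momentum`."""

    def __init__(self, optimizer: Any, total_epochs: int, final_momentum: float) -> None:
        self.total_epochs = total_epochs
        self.final_momentum = final_momentum
        super().__init__(optimizer)

    def get_values(self) -> List[float]:
        # Clamp progress to [0, 1] so extra steps hold the final value.
        frac = min(max(self.last_epoch, 0), self.total_epochs) / self.total_epochs
        return [base + frac * (self.final_momentum - base) for base in self.base_values]


if __name__ == "__main__":
    # Stand-in for a torch optimizer: only `param_groups` is needed here.
    optimizer = SimpleNamespace(param_groups=[{"lr": 0.1, "momentum": 0.9}])
    scheduler = LinearMomentumDecay(optimizer, total_epochs=4, final_momentum=0.5)
    for epoch in range(5):
        scheduler.step()
        print(epoch, optimizer.param_groups[0]["momentum"])
```

Keeping the field name as a constructor argument is what lets the learning rate and momentum schedulers share `step()`. Note that some optimizers, such as `torch.optim.Adam`, keep their momentum-like term in `betas` rather than `momentum`, so a momentum scheduler for them would need to write a different field.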
Makes sense. This would be a good addition, I'm on board 👍
Closing, since this was implemented in #2469.