I think we could remove the dampening parameter from SGD here https://github.com/pytorch/pytorch/blob/master/torch/optim/sgd.py#L10; it is confusing and changes the momentum update when used.
Maybe set the default value to 0 instead of momentum?
Made the default 0.
fixed via https://github.com/pytorch/pytorch/commit/4eb12a26bc5e3671c03f154f61076fd72fcfd233
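For context, dampening scales the incoming gradient inside the momentum buffer update. A rough sketch of that update (paraphrased for illustration, not a verbatim copy of torch/optim/sgd.py; the helper name and signature are made up here):

```python
import torch

def momentum_step(param, grad, buf, lr, momentum=0.9, dampening=0.0):
    # Momentum buffer: buf = momentum * buf + (1 - dampening) * grad
    buf.mul_(momentum).add_(grad, alpha=1 - dampening)
    # Parameter update: param = param - lr * buf
    param.add_(buf, alpha=-lr)
    return param, buf
```

With the old default (dampening = momentum) the current gradient entered the buffer scaled by 1 - momentum; with the new default (dampening = 0) it enters at full scale.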
This change of the default dampening has just hit me :) I spent a few days figuring out why a ported net stopped reaching the right accuracy levels. Thanks @szagoruyko for the tip.
Previously, with momentum = 0.9 (and dampening defaulting to momentum, so 1 - dampening = 0.1), the old gradient was 9x more important than the current one. Now the current gradient gets full weight (1 - dampening = 1), so the old gradient is only 10% less important.
Possibly this is worth mentioning in http://pytorch.org/docs/optim.html or in http://pytorch.org/tutorials/beginner/former_torchies_tutorial.html, since learning rates now need to be adjusted when porting Lua Torch code, despite naive expectations.
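To make the effect on the effective step size concrete, here is a back-of-the-envelope check, assuming the update buf = momentum * buf + (1 - dampening) * grad and a hypothetical constant gradient of 1.0:

```python
momentum = 0.9

def steady_buf(dampening, steps=200):
    # Iterate the momentum buffer update with a constant gradient of 1.0
    buf = 0.0
    for _ in range(steps):
        buf = momentum * buf + (1 - dampening) * 1.0
    return buf

print(steady_buf(dampening=0.9))  # ~1.0  (old default: dampening = momentum)
print(steady_buf(dampening=0.0))  # ~10.0 (new default: dampening = 0)
# In steady state the buffer, and hence the step, is roughly 10x larger under
# the new default, which is why learning rates may need rescaling (or dampening
# set back to momentum) when porting Lua Torch code.
```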