Adam was recently demonstrated to be implemented incorrectly in several packages, including Keras. I propose fixing the optimizer using the method described here:
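For context, here is a minimal NumPy sketch of the distinction the issue is about, assuming the proposed fix is the decoupled weight decay update from the referenced paper (what is now usually called AdamW). This is illustrative pseudocode, not the Keras implementation:

```python
# Sketch only: contrasts (a) L2 regularization folded into the gradient with
# (b) decoupled weight decay applied directly to the weights. For adaptive
# methods like Adam the two are NOT equivalent, because in (a) the penalty
# gradient gets rescaled by the adaptive denominator sqrt(v_hat).
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0, weight_decay=0.0):
    if l2:
        grad = grad + l2 * w          # (a) L2 regularization: penalty enters the gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if weight_decay:
        w = w - lr * weight_decay * w  # (b) decoupled weight decay: shrink weights outside
    return w, m, v                     #     the adaptive update

# toy usage with made-up values
w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, np.full(3, 0.1), m, v, t=1, weight_decay=0.01)
```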
> l2 regularization (see https://keras.io/regularizers/) which is indeed accurate.

As the paper notes, common deep learning implementations of these algorithms employ L2 regularization (often calling it "weight decay", which may be misleading due to the inequivalence the paper exposes).

Keras calls it l2 regularization. I don't think that's right.

https://keras.io/optimizers/#adam
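For reference, a short illustration of the two knobs being discussed, using the Keras 2.x API of that time (values are made up; note that Adam's `decay` argument is learning-rate decay, not weight decay):

```python
# An l2 regularizer on a layer adds lambda * sum(w^2) to the loss, so its
# gradient is folded into Adam's adaptive update; the optimizer's `decay`
# argument only shrinks the learning rate over iterations.
from keras import regularizers
from keras.layers import Dense
from keras.optimizers import Adam

layer = Dense(64, kernel_regularizer=regularizers.l2(0.01))  # L2 penalty on the loss
opt = Adam(lr=0.001, decay=1e-6)                             # learning-rate decay, not weight decay
```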
Regarding 2, I'm not sure what this has to do with the conversation. Did you mean to say that Adam doesn't generalize as well as SGD? That's beside the point.
On Thu, Jul 5, 2018 at 5:41 PM, brge17 notifications@github.com wrote:
> 1. Keras doesn't call it weight decay, it is explicitly called l2 regularization (see https://keras.io/regularizers/) which is indeed accurate.
> 2. https://arxiv.org/pdf/1705.08292.pdf
Learning rate decay != weight decay.
They just aren't the same thing.
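To make the distinction concrete, a toy loop with made-up numbers showing that the two decays act on different things (learning-rate decay shrinks the step size over time; weight decay shrinks the weights themselves at every step):

```python
# Illustrative only: the two different "decays" being conflated.
import numpy as np

base_lr, decay, weight_decay = 0.001, 1e-4, 0.01
w = np.ones(3)
grad = np.full(3, 0.5)

for t in range(1, 4):
    lr = base_lr / (1.0 + decay * t)   # learning-rate decay: smaller steps as t grows
    w = w - lr * grad                  # gradient step
    w = w - lr * weight_decay * w      # weight decay: pull the weights toward zero
    print(t, lr, w)
```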
Is the optimizer fixed now?