Adam was recently demonstrated to be implemented incorrectly in several packages, including Keras. I propose fixing the optimizer using the method described here:
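For context, here is a minimal NumPy sketch of the distinction the issue is about, assuming the proposed fix is the decoupled weight decay update from the referenced paper (what is now usually called AdamW). This is illustrative pseudocode, not the Keras implementation:

```python
# Sketch only: contrasts (a) L2 regularization folded into the gradient with
# (b) decoupled weight decay applied directly to the weights. For adaptive
# methods like Adam the two are NOT equivalent, because in (a) the penalty
# gradient gets rescaled by the adaptive denominator sqrt(v_hat).
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0, weight_decay=0.0):
    if l2:
        grad = grad + l2 * w          # (a) L2 regularization: penalty enters the gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if weight_decay:
        w = w - lr * weight_decay * w  # (b) decoupled weight decay: shrink weights outside
    return w, m, v                     #     the adaptive update

# toy usage with made-up values
w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, np.full(3, 0.1), m, v, t=1, weight_decay=0.01)
```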
> l2 regularization (see https://keras.io/regularizers/) which is indeed accurate.

As the paper notes, common deep learning implementations of these algorithms employ L2 regularization (often calling it "weight decay", which may be misleading due to the inequivalence the paper exposes).

Keras calls it l2 regularization. I don't think that's right.

https://keras.io/optimizers/#adam
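For reference, a short illustration of the two knobs being discussed, using the Keras 2.x API of that time (values are made up; note that Adam's `decay` argument is learning-rate decay, not weight decay):

```python
# An l2 regularizer on a layer adds lambda * sum(w^2) to the loss, so its
# gradient is folded into Adam's adaptive update; the optimizer's `decay`
# argument only shrinks the learning rate over iterations.
from keras import regularizers
from keras.layers import Dense
from keras.optimizers import Adam

layer = Dense(64, kernel_regularizer=regularizers.l2(0.01))  # L2 penalty on the loss
opt = Adam(lr=0.001, decay=1e-6)                             # learning-rate decay, not weight decay
```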
Regarding 2, I'm not sure what this has to do with the conversation. Did you mean to say that Adam doesn't generalize as well as SGD? That's beside the point.
On Thu, Jul 5, 2018 at 5:41 PM, brge17 notifications@github.com wrote:
> 1. Keras doesn't call it weight decay, it is explicitly called l2 regularization (see https://keras.io/regularizers/) which is indeed accurate.
> 2. https://arxiv.org/pdf/1705.08292.pdf
Learning rate decay != weight decay.
They just aren't the same thing.
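To make the distinction concrete, a toy loop with made-up numbers showing that the two decays act on different things (learning-rate decay shrinks the step size over time; weight decay shrinks the weights themselves at every step):

```python
# Illustrative only: the two different "decays" being conflated.
import numpy as np

base_lr, decay, weight_decay = 0.001, 1e-4, 0.01
w = np.ones(3)
grad = np.full(3, 0.5)

for t in range(1, 4):
    lr = base_lr / (1.0 + decay * t)   # learning-rate decay: smaller steps as t grows
    w = w - lr * grad                  # gradient step
    w = w - lr * weight_decay * w      # weight decay: pull the weights toward zero
    print(t, lr, w)
```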
Is the optimizer fixed now?