Keras: Strange behaviour for Adam optimizer when using different keras syntax

Created on 1 Mar 2017 · 7 comments · Source: keras-team/keras

Hi there!

I noticed some very strange behaviour in a simple classification task, depending on which Keras syntax I use.
In my classification task I have two groups, so I put a Dense(2) Keras layer plus a softmax activation at the end of my model. When compiling the model, there are two different syntaxes to choose from. Strangely, with one version the model does learn something (I don't care if the model is just fitting noise at this point) and with the other it doesn't. Below you can see the two versions. They are almost identical except for the way the optimizer is passed to the compile() method.
This only happens with my data, the Adam optimizer and the softmax layer at the end of the model. Very strange.

Does anyone have even the slightest idea what's going on there? I have no explanation for this behaviour, and it has made me distrust the analyses I did before.
Thanks a lot for your help!

You can download my data from: https://drive.google.com/open?id=0B_3vtX0VzrOMVEIwdFo2U2FsU0E

```python
import numpy as np
from keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.core import Activation
import matplotlib.pyplot as plt
%matplotlib inline


# Load data
data = np.load('/YOUR-PATH-TO-DATA/data_keras_bug.npy')
labels = np.load('/YOUR-PATH-TO-DATA/labels_keras_bug.npy')
input_dim = data.shape[1]


# Create first model
model1 = Sequential()
model1.add(Dense(1500, input_dim=input_dim))
model1.add(Activation('relu'))
model1.add(Dense(2))
model1.add(Activation('softmax'))
# declare the optimizer within the compile() method 
model1.compile(loss='categorical_crossentropy', optimizer='adam', lr=0.01, metrics=['accuracy'])


# Create second equivalent model
model2 = Sequential()
model2.add(Dense(1500, input_dim=input_dim))
model2.add(Activation('relu'))
model2.add(Dense(2))
model2.add(Activation('softmax'))
# declare the optimizer explicitly before passing it to the compile() method
optimizer = Adam(lr=0.01)
model2.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])


# fit both models the same way
results1 = model1.fit(data, labels, batch_size=32, nb_epoch=500, verbose=0)
results2 = model2.fit(data, labels, batch_size=32, nb_epoch=500, verbose=0)

# plot the training accuracies
plt.plot(results1.history['acc'])
plt.plot(results2.history['acc'],'k')
plt.show()
```

[Plot: training accuracies of model1 and model2 produced by the code above]


All 7 comments

Hello,

You are not using the correct syntax in model1 to pass `lr` to `model.compile`:

https://github.com/fchollet/keras/blob/master/keras/engine/training.py#L495

> kwargs: when using the Theano backend, these arguments are passed into K.function. Ignored for Tensorflow backend.
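The mechanism can be illustrated without Keras. The sketch below is not Keras's code; `fake_compile` and `FakeAdam` are hypothetical stand-ins that only model the relevant behaviour: a string optimizer name is resolved to an optimizer with default settings (0.001 for Adam), and any extra keyword arguments such as `lr` are collected by `**kwargs` without anything reading them. In Keras itself, the fix is to build the optimizer explicitly, as model2 does with `optimizer=Adam(lr=0.01)`.

```python
# Hypothetical stand-ins, NOT the real Keras API: they only mimic a compile()
# method that accepts arbitrary **kwargs, as the docstring quoted above describes.
DEFAULT_ADAM_LR = 0.001  # Adam's documented default learning rate in Keras


class FakeAdam:
    def __init__(self, lr=DEFAULT_ADAM_LR):
        self.lr = lr


def fake_compile(loss, optimizer, metrics=None, **kwargs):
    # A string identifier becomes an optimizer built with default settings.
    if isinstance(optimizer, str):
        if optimizer != 'adam':
            raise ValueError('only "adam" is handled in this sketch')
        optimizer = FakeAdam()
    # Extra keyword arguments are merely collected; nothing here reads 'lr'.
    return {'optimizer': optimizer, 'unused_kwargs': kwargs}


# model1-style call: lr is swallowed by **kwargs
compiled1 = fake_compile('categorical_crossentropy', 'adam', lr=0.01,
                         metrics=['accuracy'])
# model2-style call: lr is given to the optimizer itself
compiled2 = fake_compile('categorical_crossentropy', FakeAdam(lr=0.01),
                         metrics=['accuracy'])

print(compiled1['optimizer'].lr)   # 0.001
print(compiled1['unused_kwargs'])  # {'lr': 0.01}
print(compiled2['optimizer'].lr)   # 0.01
```

Because nothing complains about the unused `lr`, the model1 call looks correct but silently trains with the default learning rate.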

Without running the code, one big thing jumps out at me:
```python
model1.compile(loss='categorical_crossentropy', optimizer='adam', lr=0.01, metrics=['accuracy'])
```

Models don't take a learning rate (`lr`) as a meaningful parameter. Extra keyword arguments are accepted for Theano because they are forwarded to the Theano function, but neither backend uses them to set the optimizer's learning rate during model compilation.

In short, model1 has a learning rate of 0.001 (the default for Adam) and model2 has a learning rate ten times higher, 0.01; for that reason I'd expect the models to behave differently.

To answer the question about why the accuracy graph for model2 looks the way it does: I assume the large learning rate has pushed the model either a) into a very poor local minimum (returning 50% accuracy) or b) into an unstable state (NaN, inf, or similar) caused by exploding gradients.
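Why does a tenfold learning rate matter so much for Adam in particular? A property of the Adam update rule itself, rather than anything measured in this thread, is that the per-parameter step is typically on the order of `lr`, largely independent of the gradient's scale. The minimal sketch below implements the textbook Adam update for a single scalar parameter. Keras's own Adam folds the bias correction into the learning rate and places epsilon slightly differently, so its numbers may differ in the last digits. The sketch does not reproduce the reporter's data or decide between hypotheses a) and b); it only shows that model2's steps are about ten times larger than model1's.

```python
import math


def adam_updates(grads, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the Adam update applied at each step to one scalar parameter,
    given a sequence of gradients (textbook form of the Adam rule)."""
    m = v = 0.0
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g      # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        updates.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


# With a constant gradient, m_hat / sqrt(v_hat) is 1 (up to eps), so every
# step has magnitude ~lr, whether the gradient is 5.0 or 5000.0.
grads = [5.0] * 10
small = adam_updates(grads, lr=0.001)  # model1: Adam's default learning rate
large = adam_updates(grads, lr=0.01)   # model2: Adam(lr=0.01)
print(round(small[0], 6), round(large[0], 6))  # -0.001 -0.01
```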

Oh wow, that was super quick and super helpful! Thanks a lot to the both of you!!

I encountered a similar issue:
Using an adaptive learning-rate optimizer gave extremely different learning behaviour depending on the syntax. Specifically, one of the two syntaxes worked fine, while the other seemed to lead to 'no learning'. Please see below for details, in case you're running into the same problem.
Syntax 1: this one gave no learning. The training loss didn't decrease, even with a very small dataset and a very complex model. The training loss and accuracy just oscillated heavily.
```python
model.compile(metrics=['accuracy'], loss='binary_crossentropy', optimizer='adam')
```

Syntax 2: this one gave reasonable learning, in the sense that the training and validation losses decreased.
```python
from keras.optimizers import Adam
model.compile(metrics=['accuracy'], loss='binary_crossentropy', optimizer=Adam(lr=1e-3))
```

@ShunyuanZ I actually got the opposite result: when I use a string to specify the optimizer, I get reasonable learning, but if I pass an optimizer object, it learns nothing.
Has anyone else encountered this in the latest version?

In my case it's even worse than that: the only way it works is with a TF optimizer (from tf.train) wrapped in TFOptimizer. I have tried both the object and the string/dictionary with exactly the same parameters, with no success. I am using RL, though. I'll try to come up with a minimal example.
I updated to Keras 2.2.2 and TensorFlow 1.10 with no success. Before that I was on Keras 2.1.5 (I think) and TF 1.6.0.


I'm facing exactly the same issue as @flyfj. I get reasonable learning when I pass adam as a string to the model compile line, but no learning at all when I pass the Adam optimizer object.
```python
adam = optimizers.Adam(lr=0.01)
rms = optimizers.RMSprop(lr=0.001)

# 1) Works
model.compile(loss='binary_crossentropy',
              optimizer="adam",
              metrics=['accuracy'])

# 2) Does not work
adam = optimizers.Adam(lr=0.01)
model.compile(loss='binary_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])
```
I tried RMSprop, but the issue does not seem to reproduce with that optimizer. Does anybody know what's going on? Any help would be appreciated.
I'm using Keras 2.2.4 with TensorFlow 1.14.0.
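It can help to line up the learning rates actually in effect in the reports above. The sketch below is plain Python: `KERAS_DEFAULT_LR` is a hand-written table of the documented Keras 2.x defaults (0.001 for both Adam and RMSprop), not values read from Keras at runtime, and `effective_lr` is a hypothetical helper. It assumes the RMSprop comparison in the last report was the string `'rmsprop'` against the `rms` object defined there. Under those assumptions, the original issue and the last report both compare lr=0.001 against lr=0.01, while in @ShunyuanZ's report and in the RMSprop case both syntaxes request the same learning rate, so the learning rate alone would not explain a difference there.

```python
# Documented default learning rates for string identifiers in Keras 2.x
# (an assumption copied from the docs, not queried from Keras).
KERAS_DEFAULT_LR = {'adam': 0.001, 'rmsprop': 0.001}


def effective_lr(name, explicit_lr=None):
    """Hypothetical helper: learning rate in effect when the optimizer is
    given as a string (explicit_lr=None -> default) or as an object."""
    return KERAS_DEFAULT_LR[name] if explicit_lr is None else explicit_lr


# (report, lr when passed as a string, lr when passed as an object)
reports = [
    ("original issue, Adam", effective_lr('adam'), effective_lr('adam', 0.01)),
    ("@ShunyuanZ, Adam", effective_lr('adam'), effective_lr('adam', 1e-3)),
    ("Keras 2.2.4 report, Adam", effective_lr('adam'), effective_lr('adam', 0.01)),
    ("Keras 2.2.4 report, RMSprop", effective_lr('rmsprop'),
     effective_lr('rmsprop', 0.001)),
]

for label, as_string, as_object in reports:
    print(f"{label}: string lr={as_string}, object lr={as_object}, "
          f"same={as_string == as_object}")
```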
