Hi everyone,
I recently updated Keras to version 2.0.4 and I saw a big drop in performance compared to Keras 1.2.2.
I'm basically trying to find a mapping between movie titles and movie plot summaries (called synopses) using seq2seq LSTMs. While my model was pretty fast with Keras 1.2.2, it takes a lot of time to train with the latest version.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed
from keras.optimizers import Adam

model = Sequential()
model.add(Embedding(vocab_size, EMBEDDING_DIM, input_length=MAX_LENGTH, mask_zero=True))
model.add(LSTM(1024, return_sequences=True))
model.add(LSTM(1024, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
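For scale, the rough size of a model like this can be worked out by hand with the standard Keras parameter-count formulas. The sizes below (vocab_size, embedding dimension) are hypothetical stand-ins, not values from this thread:

```python
# Rough parameter count for a stacked-LSTM seq2seq model like the one above.
# All sizes here are hypothetical examples, not the poster's actual values.
vocab_size = 10000
embedding_dim = 300
units = 1024

embedding = vocab_size * embedding_dim              # lookup table
lstm1 = 4 * units * (embedding_dim + units + 1)     # 4 gates: W, U, bias
lstm2 = 4 * units * (units + units + 1)             # stacked on lstm1's output
dense = units * vocab_size + vocab_size             # TimeDistributed(Dense)

total = embedding + lstm1 + lstm2 + dense
print(total)  # ~27M parameters with these sizes
```

Even with a modest vocabulary this lands in the tens of millions of parameters, so any backend-level slowdown gets amplified over a full epoch.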
I did not change anything in this model or in the code in general between Keras 1.2.2 and Keras 2.0.4. In both cases I trained on an NVIDIA Quadro K6000 with an up-to-date Theano, and I simply ran pip uninstall keras followed by pip install keras to update.
However, the training is much slower.
Using Keras 1.2.2:
Epoch 1/10
100/100 [==============================] - 5s - loss: 2.2261 - acc: 0.8375
Using Keras 2.0.4:
Epoch 1/10
100/100 [==============================] - 18s - loss: 3.3189 - acc: 0.8277
In the example above the delta is only a few seconds, but when training on the whole dataset it is much, much slower.
Does anyone know what I'm doing wrong and could point me in the right direction?
Could you check whether your GPU is really being used?
watch nvidia-smi
Yes, I'm sure my GPU is used to perform the calculations:


I was more worried about my model, actually. Maybe there's an optimization I need to apply in Keras 2.0.4 to make it better, since I didn't change anything in the code (I simply updated Keras' version). Unfortunately, I received no warnings or errors from Keras' APIs.
But the performance drop tells me there's definitely something going on behind the scenes.
Well, 0% GPU is used.
Maybe starting the training at the same time would have helped... Sorry about that:

Did you switch backends? TensorFlow is the new default since version 2.
No, I didn't. My Keras config is still the same:
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
Should I switch to TensorFlow?
I don't see any reason to switch to TensorFlow.
What is your image_data_format? Channels first or last?
I don't know what this means, but here it is:

EDIT: I switched to channels_first but got the same results...
I'm using the latest Theano GPU backend, gpuarray; it might be important: Theano gpuarray
@Blockost how do you run the training? Note that if you are using fit_generator the parameters have changed in Keras 2. You might be running more steps than previously.
- model.fit_generator(train_generator, train_gen.n, 1) (old way)
+ model.fit_generator(train_generator, train_gen.n/train_gen.batch_size, 1) (new way)
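The parameter change above boils down to converting a sample count into a batch count. A minimal sketch of that conversion (n_samples and batch_size are hypothetical stand-ins for train_gen.n and train_gen.batch_size), rounding up so the final partial batch is not dropped:

```python
import math

# Hypothetical values standing in for train_gen.n and train_gen.batch_size.
n_samples = 1050
batch_size = 100

# Keras 1: samples_per_epoch counted samples.
samples_per_epoch = n_samples

# Keras 2: steps_per_epoch counts batches. Plain division would either
# produce a float (Python 3) or silently drop the last partial batch
# (Python 2 integer division), so round up explicitly.
steps_per_epoch = math.ceil(n_samples / batch_size)
print(steps_per_epoch)  # -> 11
```

Passing the raw sample count as steps_per_epoch makes Keras 2 run batch_size times more batches per epoch, which looks exactly like a massive slowdown.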
@holli I don't use fit_generator(). I simply use fit() and pass my dataset as a whole in parameters
Hi @holli, I'm using fit_generator and I couldn't figure out why Keras 2 is much slower than 1.2. A piece of my code is:
# fit the model on the batches generated by datagen.flow()
if int(keras.__version__.split('.')[0]) < 2:
    # Keras 1.2
    model.fit_generator(datagen.flow(img_train, gt_train,
                                     batch_size=batch_size, shuffle=True),
                        samples_per_epoch=data_augmentation,
                        nb_epoch=nb_epoch,
                        validation_data=(img_test, gt_test),
                        verbose=1, callbacks=callbacks)
else:
    # Keras 2.0.4
    # Total number of steps (batches of samples) to yield from the generator
    steps_per_epoch = data_augmentation // batch_size
    model.fit_generator(datagen.flow(img_train, gt_train,
                                     batch_size=batch_size, shuffle=True),
                        steps_per_epoch=steps_per_epoch,
                        epochs=nb_epoch,
                        validation_data=(img_test, gt_test),
                        verbose=1, callbacks=callbacks)
so I suppose I'm using fit_generator the right way. Has anyone experienced something like this?
_data_augmentation_ is really a weird name. Is this the training sample size?
Yes, it is the number of training samples per epoch.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I'm experiencing the same issue since updating to the Keras 2 API (even after correctly migrating from samples_per_epoch to steps_per_epoch). @Blockost, could you figure out why this is happening?
@sir-avinash Sorry, but I don't use Keras anymore. If I remember correctly, the quick fix was to downgrade back to 1.2.2 to save some training time... Sorry for not being able to help much with that.