Hi everyone,
I recently updated Keras to version 2.0.4 and I saw a big drop in performance compared to Keras 1.2.2.
I'm basically trying to find a mapping between movie titles and movie plot summaries (called synopses) using seq2seq LSTMs. While my model was pretty fast with Keras 1.2.2, it takes a lot of time to train with the latest version.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed
from keras.optimizers import Adam

model = Sequential()
model.add(Embedding(vocab_size, EMBEDDING_DIM, input_length=MAX_LENGTH, mask_zero=True))
model.add(LSTM(1024, return_sequences=True))
model.add(LSTM(1024, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
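For scale, the rough size of a model like this can be worked out by hand with the standard Keras parameter-count formulas. The sizes below (vocab_size, embedding dimension) are hypothetical stand-ins, not values from this thread:

```python
# Rough parameter count for a stacked-LSTM seq2seq model like the one above.
# All sizes here are hypothetical examples, not the poster's actual values.
vocab_size = 10000
embedding_dim = 300
units = 1024

embedding = vocab_size * embedding_dim              # lookup table
lstm1 = 4 * units * (embedding_dim + units + 1)     # 4 gates: W, U, bias
lstm2 = 4 * units * (units + units + 1)             # stacked on lstm1's output
dense = units * vocab_size + vocab_size             # TimeDistributed(Dense)

total = embedding + lstm1 + lstm2 + dense
print(total)  # ~27M parameters with these sizes
```

Even with a modest vocabulary this lands in the tens of millions of parameters, so any backend-level slowdown gets amplified over a full epoch.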
I did not change anything in this model or in the code in general between Keras 1.2.2 and Keras 2.0.4. In both cases I trained on an NVIDIA Quadro K6000 with an up-to-date Theano, and I simply ran pip uninstall keras followed by pip install keras to update.
However, the training is much slower.
Using Keras 1.2.2:
Epoch 1/10
100/100 [==============================] - 5s - loss: 2.2261 - acc: 0.8375
Using Keras 2.0.4:
Epoch 1/10
100/100 [==============================] - 18s - loss: 3.3189 - acc: 0.8277
In the example above the delta is only a few seconds, but when training on the whole dataset it is much, much slower.
Does anyone know what I'm doing wrong and could point me in the right direction?
Could you check whether your GPU is really being used?
watch nvidia-smi
Yes, I'm sure my GPU is used to perform the calculations:


I was more worried about my model, actually. Maybe there's an optimization I need to apply in Keras 2.0.4 to make it better, since I didn't change anything in the code (I simply updated Keras' version). Unfortunately, I received no warnings or errors from Keras' APIs.
But the performance drop tells me there's definitely something going on behind the scenes.
Well, 0% GPU is used.
Maybe starting the training at the same time would have helped... Sorry about that:

Did you switch backends? TensorFlow is the new default since version 2.
No, I didn't. My Keras config is still the same:
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
Should I switch to TensorFlow?
I don't see any reason to switch to TensorFlow.
What is your image_data_format? Channels first or last?
I don't know what this means, but here it is:

EDIT: I switched to channels_first but got the same results...
I'm using the latest Theano GPU backend, gpuarray; it might be important: Theano gpuarray
@Blockost how do you run the training? Note that if you are using fit_generator the parameters have changed in Keras 2. You might be running more steps than previously.
- model.fit_generator(train_generator, train_gen.n, 1) (old way)
+ model.fit_generator(train_generator, train_gen.n/train_gen.batch_size, 1) (new way)
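The parameter change above boils down to converting a sample count into a batch count. A minimal sketch of that conversion (n_samples and batch_size are hypothetical stand-ins for train_gen.n and train_gen.batch_size), rounding up so the final partial batch is not dropped:

```python
import math

# Hypothetical values standing in for train_gen.n and train_gen.batch_size.
n_samples = 1050
batch_size = 100

# Keras 1: samples_per_epoch counted samples.
samples_per_epoch = n_samples

# Keras 2: steps_per_epoch counts batches. Plain division would either
# produce a float (Python 3) or silently drop the last partial batch
# (Python 2 integer division), so round up explicitly.
steps_per_epoch = math.ceil(n_samples / batch_size)
print(steps_per_epoch)  # -> 11
```

Passing the raw sample count as steps_per_epoch makes Keras 2 run batch_size times more batches per epoch, which looks exactly like a massive slowdown.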
@holli I don't use fit_generator(). I simply use fit() and pass my dataset as a whole in parameters
Hi @holli, I'm using fit_generator and I couldn't figure out why Keras 2 is much slower than 1.2. A piece of my code is:
# fit the model on the batches generated by datagen.flow()
if int(keras.__version__.split('.')[0]) < 2:
    # Keras 1.2
    model.fit_generator(datagen.flow(img_train, gt_train,
                                     batch_size=batch_size, shuffle=True),
                        samples_per_epoch=data_augmentation,
                        nb_epoch=nb_epoch,
                        validation_data=(img_test, gt_test),
                        verbose=1, callbacks=callbacks)
else:
    # Keras 2.0.4
    # Total number of steps (batches of samples) to yield from the generator
    steps_per_epoch = data_augmentation // batch_size
    model.fit_generator(datagen.flow(img_train, gt_train,
                                     batch_size=batch_size, shuffle=True),
                        steps_per_epoch=steps_per_epoch,
                        epochs=nb_epoch,
                        validation_data=(img_test, gt_test),
                        verbose=1, callbacks=callbacks)
so I suppose I'm using fit_generator the right way. Has anyone experienced something like this?
_data_augmentation_ is really a weird name. Is this the training sample size?
Yes, it is the number of training samples per epoch.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I'm experiencing the same issue since updating to the Keras 2 API (even after correctly migrating from samples_per_epoch to steps_per_epoch). @Blockost, could you figure out why this is happening?
@sir-avinash Sorry, but I don't use Keras anymore. If I remember correctly, the quick fix was to downgrade back to 1.2.2 to save some training time... Sorry for not being able to help much with that.