Keras: Large Differences in Performance between Keras 2.0.3 and Keras 1.1.0 – Potential Bug?

Created on 21 Apr 2017 · 4 Comments · Source: keras-team/keras

I’ve upgraded to the new version of Keras and I’m noticing a huge drop in performance: from about 200 seconds per epoch on the old Keras to about 40,000 seconds per epoch on the new Keras.

Attached are two screenshots showing training on the old Keras and on the new. The models have the same architecture and are confirmed to be using the same GPU during training. Both are coded with the functional API. The problem persists across different TensorFlow backends (0.12 and 1.0).

After doing these side-by-side tests, I’m thinking this may be a bug with Keras.

Training Output of Keras 1.1.0 (270 sec/epoch):
kerasold

Training Output of Keras 2.0.3 (38000 sec/epoch):

kerasnew

mjs-wpi

Most helpful comment

Keras 1 fit_generator measures an epoch in total number of samples, so the value should be bs*1000. Keras 2 fit_generator measures an epoch in number of batches, so the value should be 1000. That would explain some of it: you're off by a factor of about 140, and a batch size of 32 would account for a factor of 32 of that.

I was thinking the stride changes might've been it but they all look correct. Need to read the code closely. The model.summary looks exactly the same for both? What is the output?

If there is any difference between keras1 and keras2, it could be from the BatchNorm changes. Do you see no difference if you just comment out all the batchnorm layers from both files?

Cheers

bstriner on 21 Apr 2017

👍3
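The unit change described in this comment can be sketched in plain Python. The helper name steps_from_samples is hypothetical. samples_per_epoch and steps_per_epoch are the Keras 1 and Keras 2 fit_generator argument names. The batch size of 32 and the 1000 batches per epoch are the comment's working assumptions, not values confirmed from the attached code.

```python
import math


def steps_from_samples(samples_per_epoch, batch_size):
    """Convert a Keras 1 style sample count into a Keras 2 style batch count.

    Hypothetical helper for illustration; it is not part of Keras.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Round up so a final partial batch still counts as one step.
    return math.ceil(samples_per_epoch / batch_size)


batch_size = 32                         # assumed in the comment
samples_per_epoch = batch_size * 1000   # Keras 1 units: samples
steps_per_epoch = steps_from_samples(samples_per_epoch, batch_size)  # Keras 2 units: batches

# If the Keras 1 sample count is passed unchanged as steps_per_epoch,
# each epoch runs this many times more batches than intended.
inflation = samples_per_epoch / steps_per_epoch

print(steps_per_epoch, inflation)
```

Under these assumptions the inflation factor equals the batch size, which is why the comment expects the batch size to account for part of the slowdown.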

All 4 comments

No one can help without seeing any code. Lots of possible explanations. For example, the code for calculating the output shape moved. If you're using some old output shape calculation then something could be off by an order of magnitude. There is nothing in Keras-2 that would slow things down like that. You have to be doing something in your code that is causing the issue.

Cheers

bstriner on 21 Apr 2017

Hi Ben,

Thanks for your response. Attached is a copy of my code, written in Keras 1.1.0 and Keras 2.0.3. From what I can tell, they are the same models. Keras 2.0.3 runs much slower.

keras1.1.0.txt

keras2.0.3.txt

mjs-wpi on 21 Apr 2017

Hi Ben, thanks for finding the error in my generator. My batch size is actually 128, not 32, so I'm getting comparable run times to the Keras 1.1.0 version now that it's fixed.

Model summaries are exactly the same for both, and below is a screen shot of the Keras 2.0.3 run with the corrections you suggested.

Thanks so much for your help - I'm happy that it was a stupid mistake on my part and not a bug.
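As a rough check on the numbers in this thread, the sketch below compares the reported slowdown with what a batch size of 128 would predict. It assumes the only difference between the two runs was that each Keras 2 epoch ran batch-size times too many batches, and that the per-batch cost was the same in both versions. The variable names are illustrative.

```python
old_seconds_per_epoch = 270      # reported for Keras 1.1.0
new_seconds_per_epoch = 38000    # reported for Keras 2.0.3 before the fix
batch_size = 128                 # actual batch size, per the final comment

# Slowdown actually observed between the two runs.
observed_slowdown = new_seconds_per_epoch / old_seconds_per_epoch

# Epoch time predicted if every epoch ran batch_size times too many batches.
predicted_seconds = old_seconds_per_epoch * batch_size

print(round(observed_slowdown, 1), predicted_seconds)
```

The predicted epoch time is within roughly 10% of the reported 38000 seconds, so the generator unit mismatch accounts for nearly all of the gap. This arithmetic does not explain the remainder.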