I’ve upgraded to the new version of Keras and I’m noticing a huge drop in performance: from roughly 200 seconds per epoch on the old Keras to roughly 40,000 seconds per epoch on the new Keras.
Attached are two screenshots showing training on the old Keras and on the new. The models have the same architecture and are confirmed to both be using the same GPU during training. They are both coded with the functional API. The problem does not change when using different TensorFlow backends (0.12 and 1.0).
After doing these side-by-side tests, I’m thinking this may be a bug in Keras.
Training Output of Keras 1.1.0 (270 sec/epoch):

Training Output of Keras 2.0.3 (38000 sec/epoch):

No one can help without seeing any code. There are lots of possible explanations. For example, the code for calculating the output shape moved. If you're using some old output shape calculation, then something could be off by an order of magnitude. There is nothing in Keras 2 that would slow things down like that. Something in your code must be causing the issue.
Cheers
Hi Ben,
Thanks for your response. Attached is a copy of my code, written in Keras 1.1.0 and Keras 2.0.3. From what I can tell, they are the same models. Keras 2.0.3 runs much slower.
Keras 1 fit_generator goes by total number of samples, which should be bs*1000. Keras 2 fit_generator goes by number of batches, which should be 1000. That would explain some of it. You're off by a factor of about 140, and with a batch size of 32 that would explain a factor of 32 of it.
I was thinking the stride changes might've been it, but they all look correct. I'd need to read the code closely. Does model.summary() look exactly the same for both? What is the output?
If there is any difference between Keras 1 and Keras 2, it could be from the BatchNorm changes. Does the difference go away if you comment out all the BatchNorm layers in both files?
Cheers
Hi Ben, thanks for finding the error in my generator. My batch size is actually 128, not 32, so I'm getting comparable run times to the Keras 1.1.0 version now that it's fixed.
Model summaries are exactly the same for both, and below is a screenshot of the Keras 2.0.3 run with the corrections you suggested.
Thanks so much for your help - I'm happy that it was a stupid mistake on my part and not a bug.
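
For anyone hitting the same slowdown after upgrading, the mix-up discussed above can be sketched with plain arithmetic. This is a minimal illustration, not the code from the attached files: the helper `steps_per_epoch` and the variable names are hypothetical, and the parameter names mentioned in the comments (`samples_per_epoch` for Keras 1, `steps_per_epoch` for Keras 2) are my understanding of the two `fit_generator` signatures rather than something quoted in this thread. The batch size of 128 and the 1000 batches per epoch are the values mentioned above.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of generator batches needed to cover num_samples.

    Hypothetical helper; the last batch may be partial, hence the ceil.
    """
    return math.ceil(num_samples / batch_size)


batch_size = 128          # batch size reported in the thread
intended_batches = 1000   # batches per epoch the generator was meant to yield

# Keras 1 style (assumed name: samples_per_epoch): the epoch length counts samples.
samples_per_epoch = batch_size * intended_batches

# Keras 2 style (assumed name: steps_per_epoch): the epoch length counts batches.
correct_steps = steps_per_epoch(samples_per_epoch, batch_size)

# The mistake: reusing the Keras 1 sample count as the Keras 2 step count.
mistaken_steps = samples_per_epoch
slowdown = mistaken_steps / correct_steps

print(samples_per_epoch, correct_steps, slowdown)  # 128000 1000 128.0
```

With the 270 sec/epoch baseline reported for Keras 1.1.0, a 128x longer epoch would take about 34,560 seconds, which is in the same range as the 38,000 sec/epoch seen on Keras 2.0.3.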
