Keras: `predict` memory leak?

Created on 30 Mar 2018 · 9 comments · Source: keras-team/keras

Specs:
Python 3.6.3
keras==2.1.5
tensorflow==1.7.0

It seems there is a memory leak in the `predict` method. If not, please explain what I'm doing wrong:
https://gist.github.com/ilivans/fb2d61d9b5bc3d82d3d0e6eb04cf4778
The script gives me the following output:

Using TensorFlow backend.
Generate data...
x_train shape: (512, 400)
x_test shape: (128, 400)
Build model...
Predict...
rss=142MB vms=569MB
2018-03-30 18:54:06.784917: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
rss=175MB vms=1187MB
rss=191MB vms=1191MB
rss=207MB vms=1195MB
rss=207MB vms=1195MB
rss=207MB vms=1195MB
rss=207MB vms=1195MB
rss=207MB vms=1195MB
rss=223MB vms=1195MB
rss=223MB vms=1195MB
rss=223MB vms=1195MB
rss=223MB vms=1195MB
rss=222MB vms=1195MB
rss=222MB vms=1195MB
rss=222MB vms=1195MB
rss=238MB vms=1200MB
rss=238MB vms=1200MB
rss=238MB vms=1200MB
rss=238MB vms=1200MB
rss=238MB vms=1200MB
rss=238MB vms=1200MB
rss=238MB vms=1200MB
rss=239MB vms=1200MB
rss=239MB vms=1200MB
rss=239MB vms=1200MB
rss=239MB vms=1200MB
rss=239MB vms=1200MB
rss=239MB vms=1200MB
rss=239MB vms=1200MB
rss=239MB vms=1200MB
rss=239MB vms=1200MB
rss=239MB vms=1200MB
...

The weird thing is that the memory usage converges to some value after a certain number of `predict` calls.

I also noticed that the larger the hidden size, the larger the memory leak, which is strange to me as well.
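
For reference, in case the gist link goes stale, the script is roughly the sketch below (not the exact gist; the 1024-unit hidden layer and the layer choices here are just illustrative):

```python
# Rough sketch of the reproduction: call predict() in a loop and watch process memory.
import os

import numpy as np
import psutil
from keras.models import Sequential
from keras.layers import Dense


def print_memory():
    """Print resident and virtual memory of the current process in MB."""
    mem = psutil.Process(os.getpid()).memory_info()
    print("rss={}MB vms={}MB".format(mem.rss // 1024 ** 2, mem.vms // 1024 ** 2))


print("Generate data...")
x_train = np.random.rand(512, 400)
x_test = np.random.rand(128, 400)
print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)

print("Build model...")
model = Sequential()
model.add(Dense(1024, activation="relu", input_shape=(400,)))  # illustrative hidden size; larger -> larger leak
model.add(Dense(1, activation="sigmoid"))
model.compile(optimizer="adam", loss="binary_crossentropy")

print("Predict...")
for _ in range(100):
    model.predict(x_test)
    print_memory()
```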

Thank you in advance.

All 9 comments

I've reproduced this behavior in pure TF, so the question probably needs to go to the TF gurus.
It is somehow connected to multi-threaded execution: with `inter_op_parallelism_threads=1` there are no excess allocations.
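
For anyone who wants to try that, this is roughly how I pinned the thread count through the Keras TF backend (TF 1.x session API); treat it as a sketch, not the exact code I ran:

```python
# Sketch: give Keras a TF session limited to a single inter-op thread (TF 1.x API).
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto(inter_op_parallelism_threads=1)
K.set_session(tf.Session(config=config))
# Build the model and call predict() after this point so it uses the configured session.
```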

@ilivans even with `inter_op_parallelism_threads=1`, the memory consumption still increases. The increase is smaller but still significant. I would vote to re-open the issue.

@abhiboost It's definitely not on the Keras side. I bet you can reproduce the same scenario in pure TF and observe the same behavior, so I don't think this is the right place for the question. There is probably an issue with TF's dynamic memory allocation.
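
Roughly, a pure-TF loop along these lines (hypothetical shapes, not my exact script) is the kind of reproduction I mean:

```python
# Hypothetical pure-TF reproduction: run the same graph repeatedly and watch RSS.
import os

import numpy as np
import psutil
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 400))
w = tf.Variable(tf.random_normal((400, 1024)))
y = tf.matmul(x, w)

data = np.random.rand(128, 400).astype(np.float32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(y, feed_dict={x: data})
        rss = psutil.Process(os.getpid()).memory_info().rss // 1024 ** 2
        print("rss={}MB".format(rss))
```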

@ilivans did anyone ever make a TF issue? I've noticed the same issue as outlined here, except with dramatic memory increases per predict call depending on what device I run the ops on in docker containers.

@btaba not me, I've successfully switched to PyTorch in the end =)

haha thanks @ilivans

> @ilivans did anyone ever make a TF issue? I've noticed the same issue as outlined here, except with dramatic memory increases per predict call depending on what device I run the ops on in docker containers.

Any clues?
I also get the memory leak in a Docker container but not when running on Windows (I haven't tried Linux without Docker yet).

I'm using TF 1.8 (CPU) with Keras 2.2.0 (also tried 2.1.6 with no luck).

I've filed a TensorFlow issue about this: https://github.com/tensorflow/tensorflow/issues/22098
