Keras: Repeatedly calling model.predict(...) results in memory leak

Created on 18 Jul 2019 · 59 comments · Source: keras-team/keras

System information

  • Have I written custom code (as opposed to using example directory): Yes, minimal example attached
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14.5
  • TensorFlow backend (yes / no): yes
  • TensorFlow version: 1.12.0
  • Keras version: 2.2.4
  • Python version: 3.6.8
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

Describe the current behavior
Loading a model once and then repeatedly calling model.predict(...) results in continually increasing memory usage

Describe the expected behavior
Calling model.predict(...) should not result in any permanent increase in memory usage

Code to reproduce the issue

import keras
import numpy as np

model = keras.applications.mobilenet_v2.MobileNetV2()
X = np.random.rand(1, 224, 224, 3)

while True:
    # Leaks:
    y = model.predict(X)[0]

    # Does not leak:
    # y = [0]

Other info / logs
I am running on a Mac Mini and using CPU only. When I run the code above, memory usage climbs steadily and will eventually consume 20+ GB of memory!

The issue does not appear to exist on my Ubuntu 18.04 laptop using a GTX 1050

I'm open to the idea that I may be doing something wrong, but with such a small example it seems hard to believe.

Let me know what other info I can give to help.


Most helpful comment

I have the same issue. model.predict() causes a big memory leak.
I am not sure how posting here helps - There is absolutely no response from the TF team, going by the numerous threads on this topic.

Isn't model.predict() the core of this whole model-building business? Not sure why this issue has persisted since TF 2.0 and still goes on today without any real response or post from the TF team.

Sucks that so many people are wasting their time on this

All 59 comments

Have you tried using the K.clear_session() method after prediction (from keras import backend as K)? @hotplot

@wangyexiang Yes, I have run the following:

import numpy as np

import keras
import keras.backend as K

model = keras.applications.mobilenet_v2.MobileNetV2()
X = np.random.rand(1, 224, 224, 3)

while True:
    y = model.predict(X)[0]
    K.clear_session()

And get the following output:

Using TensorFlow backend.
2019-07-24 15:55:58.727337: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-07-24 15:55:58.727567: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 6. Tune using inter_op_parallelism_threads for best performance.
Traceback (most recent call last):
  File "/Users/sam/Desktop/test.py", line 11, in <module>
    y = model.predict(X)[0]
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/keras/engine/training.py", line 1169, in predict
    steps=steps)
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 294, in predict_loop
    batch_outs = f(ins_batch)
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2671, in _call
    session)
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2623, in _make_callable
    callable_fn = session._make_callable_from_options(callable_opts)
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1471, in _make_callable_from_options
    return BaseSession._Callable(self, callable_options)
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1425, in __init__
    session._session, options_ptr, status)
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tensor input_1:0, specified in either feed_devices or fetch_devices was not found in the Graph
Exception ignored in: <bound method BaseSession._Callable.__del__ of <tensorflow.python.client.session.BaseSession._Callable object at 0xb36096160>>
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1455, in __del__
    self._session._session, self._handle, status)
  File "/usr/local/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No such callable handle: 140702002450384

The exception is unsurprising. I am making repeated predictions using the existing model instance, so it is an error to call clear_session.

@hotplot I mean:

import numpy as np
import keras
import keras.backend as K

X = np.random.rand(1, 224, 224, 3)

while True:
    model = keras.applications.mobilenet_v2.MobileNetV2()
    y = model.predict(X)[0]
    K.clear_session()

@wangyexiang It looks like that leaks too, but it's hard to tell since loading the model is so slow. It has climbed from ~300 MB to ~700 MB since I started running it a couple of hours ago.

That's strange. Try using another TensorFlow version, and use gc.collect() after clear_session(). @hotplot
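For example, a sketch of that combination (the same reload-per-iteration loop as above, with garbage collection forced on each pass):

import gc
import numpy as np
import keras
import keras.backend as K

X = np.random.rand(1, 224, 224, 3)

while True:
    model = keras.applications.mobilenet_v2.MobileNetV2()
    y = model.predict(X)[0]
    K.clear_session()   # discard the graph the model was built in
    gc.collect()        # ask Python to free anything no longer referenced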

What's the workaround if you do training before predicting? clear_session will remove the trained model.

This also seems to happen if you pass a huge dataset into .predict() without putting it in a loop.

Same issue. My memory usage balloons while calling model.predict() in a loop. Even with tf.keras.backend.clear_session() and gc.collect() at the end of the loop.

I really would like a fix to this.

I confirm the issue is present in:
OS: Fedora 30, kernel 5.2.15
Python: 3.7.4, tensorflow: 2.0.0

Same memory leakage issue when using models on a GPU system; libraries are tensorflow-gpu==1.14.0 and CUDA 10.
I fixed the leakage when loading the models (e.g. 'model.load_weights(weights_path, by_name=True)'): just putting this block of code at the beginning works for me:

from keras import backend as K
cfg = K.tf.ConfigProto()
cfg.gpu_options.allow_growth = True
K.set_session(K.tf.Session(config=cfg))

But I am not able to release memory after 'vgg_model.predict(image_in_list)[0, :]'.

I am using TF 2.0.0, and I also need to call clear_session and re-load the trained model periodically when in a predict loop in order to avoid running out of gpu memory. This behavior is undesirable.

@novog I think you don't need to call clear_session inside the predict loop. Once you are done with the model you can call clear_session, and if it still doesn't release GPU memory, you can use this:
from numba import cuda
cuda.select_device(0)
cuda.close()

You can install numba with pip install numba.
It works for tensorflow-gpu 1.14 with CUDA 10.

Thanks

@prabodh23 To clarify why I need to call clear_session, I am using a previously trained model to do predictions on a very large number of records. I call predict on one batch at a time in a loop. If I do not call clear_session at least every x number of iterations in that loop, tensorflow runs out of GPU memory. Using clear_session does avoid the issue, but doesn't seem like it should be necessary.
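The pattern being described looks roughly like this (illustrative sketch; MODEL_PATH, the placeholder records array and CLEAR_EVERY are assumptions, not the commenter's actual code):

import gc
import numpy as np
import tensorflow as tf

MODEL_PATH = "trained_model.h5"   # placeholder: your previously trained model
BATCH = 32
CLEAR_EVERY = 100                 # clear every N batches; tune for your GPU

model = tf.keras.models.load_model(MODEL_PATH)
records = np.random.rand(256, 224, 224, 3).astype(np.float32)  # placeholder for the real records

for i, start in enumerate(range(0, len(records), BATCH)):
    preds = model.predict(records[start:start + BATCH])
    # ... store preds ...
    if i > 0 and i % CLEAR_EVERY == 0:
        tf.keras.backend.clear_session()   # drop accumulated graph state
        gc.collect()
        model = tf.keras.models.load_model(MODEL_PATH)  # reload the same trained model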

I have been running into the same issue. I have a sliding-window prediction; for every predict call, memory consumption shoots up, and I have been googling to find a resolution.

In a related TensorFlow issue (https://github.com/tensorflow/tensorflow/issues/33009), someone noted that predict_on_batch doesn't display the same issue. That workaround was effective for me, but not for another person in that thread.

One way to avoid this. Try

model(feature_list)

instead of using model.predict(feature_list).
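
For instance, with the MobileNetV2 example from this issue, that would look roughly like the following (sketch, tf.keras in TF 2.x eager mode; the tensor conversion and training=False are additions suggested by later comments in this thread):

import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2()
X = np.random.rand(1, 224, 224, 3).astype(np.float32)

# Call the model object directly instead of model.predict();
# wrap the NumPy array in a tensor and disable training-only layers.
y = model(tf.convert_to_tensor(X), training=False).numpy()[0]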

And do not be happy too quickly: there is also a slow leak in the model.fit() function.
Geez, did the Keras team really do any testing before releasing it?

I have implemented a workaround based on predict_on_batch (thanks @novog for pointing this out).
Note that predict_on_batch expects equally sized batches as used for training the model. Here's an example of how to loop through your batches of test data to create predictions:

import numpy as np

# custom batched prediction loop to avoid memory leak issues for now in the model.predict call
y_pred_probs = np.empty([len(X_test), VOCAB_SIZE], dtype=np.float32)  # pre-allocate the result array for efficiency

BATCH_INDICES = np.arange(start=0, stop=len(X_test), step=BATCH_SIZE)  # row indices where each batch starts
BATCH_INDICES = np.append(BATCH_INDICES, len(X_test))  # add the final batch end

for index in np.arange(len(BATCH_INDICES) - 1):
    batch_start = BATCH_INDICES[index]    # first row of the batch
    batch_end = BATCH_INDICES[index + 1]  # first row after the batch (exclusive end)
    y_pred_probs[batch_start:batch_end] = model.predict_on_batch(X_test[batch_start:batch_end])

(Note that if merely pre-allocating the results array already raises a MemoryError, then the array simply does not fit in your available memory, regardless of the memory-leak issue.)

I'm facing the same issue on Windows 10, and even though K.clear_session() does solve, or at least alleviate, the issue, it would be very nice to have a real fix instead of having to rely on workarounds.

Apparently, a similar issue has been solved in TensorFlow 2.1.0 (dev) according to this issue: https://github.com/tensorflow/tensorflow/issues/34579

I will wait for the final version of TensorFlow 2.1.0 and see whether the bug still persists in Keras.

Just wanted to add my 2 cents in here to say that I am also seeing these issues with model.fit()
I have a hyperparameter tuning loop where I train multiple models with slightly different configurations, and every model seems to leave behind about 100-200 MB of memory after every iteration. I tried K.clear_session() and that definitely helped (it used to be significantly more memory left behind), but there's still something that I'm missing.

For posterity, I'm using tensorflow 2.0, specifically tf.keras, and numpy 1.17.4. This is on RHEL, using CPU only. My loop is basically:

for combo in combinations:
    # build network, which is several Bidirectional(GRU()) + Dense() layers
    model.fit(x_train, y_train, shuffle = True, epochs = 50, batch_size = 128, verbose = 1)
    # I commented out the predict/evaluation just to make sure
    K.clear_session()

And as a complete other side note, I've been using mprof to track memory usage. It's been a godsend in tracking these memory leaks.
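
mprof itself is driven from the command line (mprof run / mprof plot); as a quick in-process alternative, you can log resident memory around the leaking call with psutil (illustrative sketch using this issue's repro loop, not the commenter's actual setup):

import os
import psutil
import numpy as np
import keras

process = psutil.Process(os.getpid())
model = keras.applications.mobilenet_v2.MobileNetV2()
X = np.random.rand(1, 224, 224, 3)

for i in range(1000):
    model.predict(X)
    if i % 50 == 0:
        # RSS should stay roughly flat; a steady climb indicates the leak
        print("iteration %d: RSS = %.1f MB" % (i, process.memory_info().rss / 1e6))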

This was also happening to me using model.predict() in a loop. I'm on a mac running tf 2.0.0.
I was able to fix it by re-saving my model without the optimizer:
model.save("my_model.h5", include_optimizer=False)
Then restart and use that saved model for predictions. ( Assuming you are only doing predictions )
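
Spelled out, that workaround looks roughly like this (sketch; the file names, X, and the compile=False flag on load are illustrative additions rather than part of the original suggestion):

from tensorflow import keras

# One-off step: strip the optimizer state out of the existing model file.
model = keras.models.load_model("my_model.h5")
model.save("my_model_inference.h5", include_optimizer=False)

# In the (restarted) prediction process, load the slimmed-down copy.
# compile=False skips rebuilding the training-related objects.
model = keras.models.load_model("my_model_inference.h5", compile=False)
preds = model.predict(X)   # X: your input batch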

This was also happening to me using model.predict() in a loop. I'm on a mac running tf 2.0.0.
I was able to fix it by re-saving my model without the optimizer:
model.save("my_model.h5", include_optimizer=False)
Then restart and use that saved model for predictions. ( Assuming you are only doing predictions )

Thanks @cclaan , your solution works for me. Specifically, my environment ties to TF 2.0.0 temporarily so I can't just upgrade to 2.1.0 for the fix. I hope there's a patch for TF 2.0 to include this bugfix.

Apparently, a similar issue has been solved in TensorFlow 2.1.0 (dev) according to this issue: tensorflow/tensorflow#34579

I will wait for the final version of TensorFlow 2.1.0 and see whether the bug still persists in Keras.

Thanks for the tip. Looks like 2.1.0 is out, but not out on Anaconda yet. Hopefully it'll get put out in the next few weeks.

model.predict_on_batch fixed it for me.

In a related TensorFlow issue (tensorflow/tensorflow#33009), someone noted that predict_on_batch doesn't display the same issue. That workaround was effective for me, but not for another person in that thread.

I tested predict_on_batch and it made a big difference compared to the predict method, but it still has a memory leak.

I have the same issue in 2.0.0 but it seems that the problem was solved in v2.1.0.
Can anyone else confirm this?

I have the same issue in 2.0.0 but it seems that the problem was solved in v2.1.0.
Can anyone else confirm this?

I am still having the same issue after upgrading

I have the same issue in 2.0.0 but it seems that the problem was solved in v2.1.0.
Can anyone else confirm this?

Still having this problem in 2.1.0

For me on tf 2.0.0:

  • tf.keras.backend.clear_session() after predict helped the memory leak but didn't fix it completely.
  • using predict_on_batch instead of predict fixed the memory leak, but really slowed down my predictions; I can't use this.
  • upgrading to tf 2.1.0 was the best solution for me: it fixed the memory leak without clear_session or predict_on_batch.

I haven't tried:

  • @cclaan's solution of using include_optimizer=False in model.save. What does this accomplish?

Same with 2.1.0, 2.0.1, 2.0
[screenshot attached in original issue]

I'm also still facing the same issue with TensorFlow 2.1.0.

Interesting @AverinLV . I don't know why 2.1.0 helped my issue. I ran loops and concurrency of predict calls and my machine's memory stayed the same. Unless I'm fooling myself somehow

@Huii @Shane-Neeley @AverinLV

Hey all, so I had upgraded to 2.1.0, and also switched from the Anaconda version to the Pip version. This fixed my issue. My best guess then is that Anaconda's version is where the issue is. Now, I have no idea why these versions are different...or where we would go to report this.

@Shane-Neeley @felixvelariusbos
Have you guys tried to use memory-profiler? Can you show the output?
Still facing this problem in tf-nightly-gpu 2.2.0.dev20200307
[screenshot attached in original issue]

This still happens for me on TF 2.1, though it was much worse on 2.0; on 2.1 it just takes much longer to see that there really is a memory leak.

TF 2.0
Only change predict to predict_on_batch
Works!

So in general, most people can solve this problem by using predict_on_batch.

One way to avoid this. Try

model(feature_list)

instead of using model.predict(feature_list).

And do not be happy too quickly: there is also a slow leak in the model.fit() function.
Geez, did the Keras team really do any testing before releasing it?

Thanks! This works for me.

For me, changing to a GPU with a bigger memory size just solved the issue (GTX 1650 4 GB -> P100 16 GB).

One way to avoid this. Try

model(feature_list)

instead of using model.predict(feature_list).

And do not be happy too quickly: there is also a slow leak in the model.fit() function.
Geez, did the Keras team really do any testing before releasing it?

Yeah, I also found that fit shows a memory leak.

This was also happening to me using model.predict() in a loop. I'm on a mac running tf 2.0.0.
I was able to fix it by re-saving my model without the optimizer:
model.save("my_model.h5", include_optimizer=False)
Then restart and use that saved model for predictions. ( Assuming you are only doing predictions )

Leak is still here despite doing exactly that. And it's as massive as before.

@xiahualiu

One way to avoid this. Try

model(feature_list)

instead of using model.predict(feature_list).

What do you mean exactly? How are you supposed to do that?

I tried:

from keras.models import load_model
my_model = load_model('hello.h5')
my_data = something()

my_model.predict(my_data) # works
my_model(my_data) # doesn't work

Second option triggers an error

ValueError: Unexpectedly found an instance of type `<class 'numpy.ndarray'>`.
Expected a symbolic tensor instance.

In my case, data is a NumPy array resembling this (truncated):

array([[[[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],
        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]]])

Should the Numpy array be directly converted to a Tensor?

@JivanRoquet Hello!
You are right; the neural network in TensorFlow feeds on (batched) tensors rather than ndarrays.
I use PyTorch now; it worked perfectly for my reinforcement learning project, with no memory leakage at all. I suggest anyone who can choose their platform go for PyTorch. It is stable, fast and easy to use.

I can also confirm this issue on the latest version of Tensorflow 2 (both tensorflow and tensorflow-gpu) on OSX 10.14.6 and CentOS 7 only on the Anaconda versions.

Millions of repeated calls to model.predict() lead to racking up over a terabyte of RAM.

Uninstalling the Anaconda version of tensorflow/tensorflow-gpu and installing the pip version fixes this issue, however it would be nice to have the Anaconda version working.

@jghawaly interesting — my version is not Anaconda-based and the problem is definitely present, so it's probably related to some hidden element (or it's two distinct problems).

I'm still encountering the leak on predict() with Tensorflow 2.2.0rc3 (pip version).

predict_on_batch finally did the trick.
tf 2.1.0
Any improvement in recent versions?

In tf 2.2.0 (conda-based) the problem still persists. K.clear_session() and gc.collect() and even using model.predict_on_batch together did not solve the problem. However, I was able to use model(tf.convert_to_tensor(np_input)), which decreased memory leakage drastically. Maybe this will help.

@xiahualiu

One way to avoid this. Try

model(feature_list)

instead of using model.predict(feature_list).

What do you mean exactly? How are you supposed to do that?

I tried:

from keras.models import load_model
my_model = load_model('hello.h5')
my_data = something()

my_model.predict(my_data) # works
my_model(my_data) # doesn't work

Second option triggers an error

ValueError: Unexpectedly found an instance of type `<class 'numpy.ndarray'>`.
Expected a symbolic tensor instance.

In my case, data is a NumPy array resembling this (truncated):

array([[[[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],
        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]]])

Should the Numpy array be directly converted to a Tensor?

@JivanRoquet it should work; I just tested it. Each model itself is callable, so you can invoke it by just passing your input. But remember that if you want to do inference, use model(inputs, training=False) to disable all the dropout etc. I tried all the solutions above and this is the only one that doesn't leak memory. I don't know why, but just sharing my two cents of experience.

None of the workarounds here seem to work for my network except K.clear_session() (this is how I used it). While using model.predict caused a sharp jump in RAM usage within a couple of minutes, model(inputs, training=False) shows a much more gradual increase, but it increases nevertheless. tf-gpu 1.14. I think it could depend on the network architecture; I also sometimes get the topological sort error despite having no loops, which seems to happen with a larger number of filters or for some other unclear reason (issue #24816). So all these errors might be at play somehow.

In tf 2.2.0 (conda-based) the problem still persists. K.clear_session() and gc.collect() and even using model.predict_on_batch together did not solve the problem. However, I was able to use model(tf.convert_to_tensor(np_input)), which decreased memory leakage drastically. Maybe this will help.

@fazekaszs decreased but not completely solved, is it?

@moha23 exactly, the leak is still there, just smaller

I have the same issue. model.predict() causes a big memory leak.
I am not sure how posting here helps - There is absolutely no response from the TF team, going by the numerous threads on this topic.

Isn't model.predict() the core of this whole model-building business? Not sure why this issue has persisted since TF 2.0 and still goes on today without any real response or post from the TF team.

Sucks that so many people are wasting their time on this

Hi,

I am trying to call model.predict() on CPU multiple times and I observe a RAM memory leak. clear_session() with a model reload and gc.collect() doesn't solve the issue. I ran the code on TensorFlow 2.1 and 2.3 as well, but the issue still persists. Is there a workaround? I am using TF 1.14 and Python 3.6 and have been struggling with this problem for so long.

For people reporting that they are also experiencing a memory leak, could you specify if you experience it using the reproduction code in the original post of this issue? If you do not, then the fix to this issue will not necessarily help you; you may be better served by creating a new issue with minimal reproduction code for your own use case.

Using tensorflow 1.14 and keras 2.1.5, calling model.predict() in a loop or from multiple threads causes the memory leak on python 3.6.

This is still an issue in Tensorflow 2.3.0, with CUDA 10.2.

result = model.predict(data) causes a memory leak which runs the GPU out of memory after about 900-1000 batches.

result = model.predict_on_batch(data) does not leak, as far as I can tell.

Same issue here on macOS catalina 10.15.6, python 3.8.5 and tensorflow (cpu) 2.3.0, not only does it leak with model.predict(x) but also with model.predict_on_batch(x).

  • gc.collect() seemed to work for a while but could not contain the leakage in the long run.
  • tf.convert_to_tensor helped a bit too.

On windows with tensorflow gpu 2.3.0 I have no problems at all. Once ram usage got as high as 2gb but dropped immediately and GPU was not affected.

Both versions installed with pip, I am using multiple threads, could that be related?

Same issue here on macOS catalina 10.15.6, python 3.8.5 and tensorflow (cpu) 2.3.0, not only does it leak with model.predict(x) but also with model.predict_on_batch(x).

  • gc.collect() seemed to work for a while but could not contain the leakage in the long run.
  • tf.convert_to_tensor helped a bit too.

On windows with tensorflow gpu 2.3.0 I have no problems at all. Once ram usage got as high as 2gb but dropped immediately and GPU was not affected.

Both versions installed with pip, I am using multiple threads, could that be related?

Uninstalling TF 2.3, installing TF-nightly 2.4 and reinstalling TF-2.3 seems to have fixed the issue 🤔.

I changed model.fit to model.fit_generator and then the memory leak stopped. Hope it helps. I used MobileNetV2 and EfficientNetB3 with 5-fold stratified cross-validation, so I had to run each one 5 times on different images.
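
For reference, the generator-based call looks roughly like this (sketch; model, X_train, y_train and the batch size are placeholders, and fit_generator is deprecated in newer TF releases in favour of passing a generator to fit directly):

import numpy as np

BATCH_SIZE = 32

def batch_generator(X, y, batch_size=BATCH_SIZE):
    # Yield one batch at a time instead of handing the whole array to fit().
    while True:
        idx = np.random.randint(0, len(X), size=batch_size)
        yield X[idx], y[idx]

model.fit_generator(batch_generator(X_train, y_train),
                    steps_per_epoch=len(X_train) // BATCH_SIZE,
                    epochs=10)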

This seems to still be an issue at https://github.com/tensorflow/tensorflow/issues/33030# yet it has been closed.
