Hello.
When using TensorFlow, all ops are entered into the global TF graph. This results in memory leaks and long compilation times when building several models, one after the other, in the same Python process (think IPython, cross-validation, etc.).
For now, I solve this on my end by doing the following:
import keras.backend.tensorflow_backend

if keras.backend.tensorflow_backend._SESSION:
    import tensorflow as tf
    tf.reset_default_graph()
    keras.backend.tensorflow_backend._SESSION.close()
    keras.backend.tensorflow_backend._SESSION = None
Maybe we should incorporate this into a keras.reset() function?
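A minimal sketch of what such a helper could look like (hypothetical name; it relies on the private _SESSION attribute of the TensorFlow backend, so it may break between versions):

def reset_keras_tf_session():
    """Close the Keras-managed TF session and wipe the default graph (sketch)."""
    import keras.backend.tensorflow_backend as ktf
    import tensorflow as tf
    if ktf._SESSION is not None:
        tf.reset_default_graph()
        ktf._SESSION.close()
        ktf._SESSION = None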
Hi, can you post some tests and profiling of what you mean? Compilation-time numbers and memory usage, for example. Anything we can reproduce would work. We could use that information later to write a PR.
Here is sample code, and the results:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
import os
import psutil
import timeit
import gc

def get_mem_usage():
    process = psutil.Process(os.getpid())
    return process.memory_info()

def build():
    model = Sequential()
    model.add(Dense(output_dim=4096, input_dim=4096, init="glorot_uniform"))
    model.add(Activation("relu"))
    model.compile(loss='categorical_crossentropy', optimizer='sgd')
    return model

if __name__ == '__main__':
    for i in xrange(10):
        gc.collect()
        t = timeit.timeit('build()', number=1, setup="from __main__ import build")
        mem = get_mem_usage()
        print('build time: {}, mem: {}'.format(t, mem))
results:
Using TensorFlow backend.
build time: 1.02965593338, mem: pmem(rss=599789568, vms=1527300096)
build time: 1.0096321106, mem: pmem(rss=1141383168, vms=2068729856)
build time: 1.03104996681, mem: pmem(rss=1682370560, vms=2610061312)
build time: 1.0659198761, mem: pmem(rss=2223833088, vms=3151384576)
build time: 1.08011817932, mem: pmem(rss=2765127680, vms=3692707840)
build time: 1.10519003868, mem: pmem(rss=3306053632, vms=4233703424)
build time: 1.13465809822, mem: pmem(rss=3847581696, vms=4775194624)
build time: 1.14798998833, mem: pmem(rss=4387577856, vms=5314605056)
build time: 1.17501521111, mem: pmem(rss=4929052672, vms=5856210944)
build time: 1.25362706184, mem: pmem(rss=5469794304, vms=6396817408)
Notice that compilation time and memory usage keep going up. After clearing the default graph between iterations, these are the results:
Using TensorFlow backend.
build time: 0.988173961639, mem: pmem(rss=598212608, vms=1527754752)
build time: 0.976176023483, mem: pmem(rss=598134784, vms=1527767040)
build time: 0.973516941071, mem: pmem(rss=598507520, vms=1528115200)
build time: 0.975924968719, mem: pmem(rss=598638592, vms=1528377344)
build time: 0.975230932236, mem: pmem(rss=599068672, vms=1528639488)
build time: 0.976888895035, mem: pmem(rss=599187456, vms=1528623104)
build time: 0.978793144226, mem: pmem(rss=599056384, vms=1528639488)
build time: 0.975780010223, mem: pmem(rss=598925312, vms=1528647680)
build time: 0.977483987808, mem: pmem(rss=598794240, vms=1528639488)
build time: 0.974485874176, mem: pmem(rss=599236608, vms=1528623104)
We'll consider a clear_session backend method for TensorFlow.
A different solution is to wrap everything (from the user's point of view) inside a:
with tf.Graph().as_default():
However, this does not play nicely with the way Keras initializes a TF session, holding it as a global from process init. A clear_session() method is needed anyway.
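A rough sketch of that approach, assuming you also point the Keras backend at a session created for each graph (untested across versions):

import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

for _ in range(10):
    graph = tf.Graph()
    with graph.as_default():
        session = tf.Session(graph=graph)
        K.set_session(session)
        model = Sequential()
        model.add(Dense(8, input_dim=4, activation='relu'))
        model.compile(loss='mse', optimizer='sgd')
        # ... fit / predict here ...
        session.close()  # ops built in this graph are discarded along with it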
This might be relevant #2535.
We hit the same problem in a loop for an sklearn k-fold experiment. No problem after switching to Theano.
I run into OOM exceptions while using KerasClassifier to sweep large hyperparameter grids with TF backend. No problems with Theano.
I'm seeing this too. For me, it happens when I'm using kfolds.
You can now use K.clear_session() when using TensorFlow, which will clean up everything. This is recommended if you ever create models inside a loop.
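For example, a minimal loop could look like this (a sketch; exact placement of the call is up to you):

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

for i in range(10):
    K.clear_session()  # destroys the old graph and session
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.compile(loss='mse', optimizer='sgd')
    # ... fit / evaluate ...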
You should update Keras. clear_session was added a few months ago.
Hi,
Yes I realized that an hour later. I have updated Keras and it works now.
Thanks for the great software!
Jeroen Meijer
Hi guys, after googling the TensorFlow/Keras memory leak for quite a long time, most answers say to add K.clear_session() at the end. I therefore used that call at every iteration of a model-fitting loop and checked the number of graph operations (it stays fixed). However, memory still kept increasing and finally reached almost 100%. Any ideas on this issue?
My code is like this:
for date in date_list:
    #### data cleaning
    df = df_lstm.loc[df_lstm.index <= date]
    df_y = df['ret'] - df['ret'].mean()
    trainY = df_y[timesteps-1:-1]
    trainX = x_transformed[:-1]
    #trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
    testX = x_transformed[-timesteps:]
    testXX = np.reshape(testX, (1, testX.shape[0], testX.shape[1]))
    data_dim = trainX.shape[1]
    trainYY = np.array([[0, 1] if x <= 0 else [1, 0] for x in trainY])
    from numpy import array
    trainXX = array([trainX[i:i+timesteps, :] for i in range(trainX.shape[0]-timesteps+1)])

    #### start to build models
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.3
    config.gpu_options.allow_growth = True
    K.set_session(tf.Session(graph=tf.get_default_graph(), config=config))

    model = Sequential()
    model.add(LSTM(dimension_of_lstm, input_shape=(timesteps, data_dim), dropout_W=0.25, dropout_U=0.25))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(trainXX, trainYY, batch_size=batchsize, nb_epoch=epoch_num)

    y_pred_enet = model.predict(testXX)
    del model

    #g = tf.get_default_graph()
    #print(len(g.get_operations()))

    # tried all the answers I could find at the end
    K.clear_session()
    tf.reset_default_graph()
    tf.contrib.keras.backend.clear_session()
Hi,
Try
from keras import backend as be
(...)
be.clear_session()
Same here. I want to use keras.wrappers.scikit_learn.KerasClassifier and sklearn.model_selection.GridSearchCV for my thesis. I have to reduce the number (not the values) of the possible hyperparameter values.
With ~640 different combinations: 1 hour to OOM
With ~450 different combinations: 3 hours to OOM
With ~290 different combinations: 5 hours to OOM
The server is large enough (really!) and contains two Tesla K80 GPUs.
I reduced the dataset as well, but no luck. If I reduce the parameters any further, GridSearch makes no sense anymore. And I do not see how to run clear_session with GridSearchCV without rewriting it.
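The only direction I can think of (an untested sketch, assuming KerasClassifier rebuilds the model through its build_fn on every fit) is to call clear_session() at the top of that function, so each candidate starts from a fresh graph:

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_fn(units=32):
    K.clear_session()  # start every candidate model from a fresh graph (sketch)
    model = Sequential()
    model.add(Dense(units, input_dim=20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

clf = KerasClassifier(build_fn=build_fn, epochs=5, verbose=0)
grid = GridSearchCV(clf, param_grid={'units': [16, 32, 64]}, cv=3)
# grid.fit(X, y)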
Edit: if I run clear_session manually, the memory still remains like this:
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 14943 C /usr/bin/python3 324MiB |
| 0 53052 C /usr/bin/python3 10588MiB | <--- my process
| 1 14943 C /usr/bin/python3 368MiB |
| 1 53052 C /usr/bin/python3 10506MiB | <--- my process
We are now using Keras 2.1.5 and the problem still exists; it is not resolved by K.clear_session().
With TF 1.8 and Keras 2.2.0, K.clear_session() leads to Process finished with exit code 139 (interrupted by signal 11: SIGSEGV) when used in a context such as #4417.
@talpay
I have Keras 2.2 and TF 1.8 but I don't see that error. Try installing it with conda install -c hesi_m keras, which installs both Keras 2.2 and TF 1.8, and do not mix it with pip. It might solve the case.
@VanitarNordic
It's definitely not a package-management issue, and I've recreated it with some of the Keras example code. Have you tested it with a TensorBoard callback that has histogram_freq=1? Because it only happens when training multiple models in a loop, with the TensorBoard callback, and _then_ calling K.clear_session() (which is necessary, as pointed out in the issue above).
I can confirm that with TF 1.8 and Keras 2.2.0, K.clear_session() leads to a crash. The same code on TF 1.8 and Keras 2.1.6 works correctly.
@BluerBlack
We have to use it; it's the only method we have to get consistent results when the code is inside a loop. I faced the crash too, but I did not know it was because of that, since it was not generating any error.
@VanitarNordic
I know. I'm using it for the same reason (GridSearchCV). It's crashing for me without any message too (once I got a message that the program tried to do something with memory address 0). K.clear_session() consistently crashes on the 3rd call for me, and I'm also using the TensorBoard callback, but with histogram_freq=0.
@BluerBlack
Exactly, it happens on the third iteration! Funny. I had to downgrade to 2.1.6 as well.
We also have memory leaks when using Keras + TensorFlow. There are multiple places where it consumes RAM and doesn't free it afterwards. We create models in a loop; after some time it consumes all free memory, for example on a server it takes all 132 GB. clear_session() doesn't help.
ENVs:
Ubuntu 16.04.4, python 2.7.15 (Anaconda)
Linux Mint 18.2, python 2.7.9
tensorflow 1.8.0
Keras 2.2.0
Here is a demo script with one of the leak cases (requires objgraph and psutil):
from __future__ import print_function
import os, sys, gc
import objgraph, psutil
from keras.layers import Input, Dense
from keras.models import Model
from keras.regularizers import l2
from keras import backend as K

data = []
ps = psutil.Process(os.getpid())
getrss = lambda: ps.memory_info().rss / 1024 / 1024

def simple():
    data.append(['sdsds'] * 1000000)

def model():
    coef = l2(0.0005)
    input_data = Input(shape=(33,))
    enc_layer = Dense(40, activation='relu', kernel_regularizer=coef)
    dec_layer = Dense(33, activation='linear', kernel_regularizer=coef)
    enc = enc_layer(input_data)
    dec = dec_layer(enc)
    dae = Model(inputs=input_data, outputs=dec)
    # K.clear_session()

def print_obj(title, limit=None):
    print('\n' + title)
    objgraph.show_growth(limit=limit)
    print('')

def main(func, show_obj, iterations=10):
    print('ITERATIONS:', iterations)
    start = getrss()
    print('MEM BEFORE RUN:', start)
    if show_obj: print_obj('OBJECTS BEFORE RUN:', 3)
    # Do something ...
    for _ in range(iterations):
        func()
    print('MEM AFTER RUN:', getrss())
    global data
    del data[:]
    print('GC COUNT: ', gc.collect())
    end = getrss()
    if show_obj: print_obj('OBJECTS AFTER RUN:')
    delta = end - start
    print('MEM AFTER GC: {} (leak: {})'.format(end, delta))

# USAGE: KERAS_BACKEND=tensorflow python memtest.py [num_iterations] [simple] [showobj]
if __name__ == '__main__':
    func = simple if 'simple' in sys.argv else model
    show_obj = 'showobj' in sys.argv
    iterations = next((int(x) for x in sys.argv if x.isdigit()), 10)
    main(func, show_obj, iterations)
Output:
$ KERAS_BACKEND=tensorflow python memtest.py
Using TensorFlow backend.
ITERATIONS: 10
MEM BEFORE RUN: 158
MEM AFTER RUN: 166
GC COUNT: 49
MEM AFTER GC: 166 (leak: 8)
Similar issue: https://github.com/tensorflow/tensorflow/issues/10408
Is there a way to fix that?
First, I suggest you dial down your tone a bit. This is not the place to go trolling.
As for a fix, if the clear_session() approach does not work for you, I would suggest reusing the models. If you are generating a small number of different models, you can do something like this:
def generate_models():
    models = {
        'model1': gen_model_1(),
        'model2': gen_model_2(),
    }
    for k, model in models.items():
        model.save_weights(k)
    return models

def get_blank_model(k, models):
    model = models[k]
    model.load_weights(k)
    return model
As long as you do not need several models of the same type in parallel, you are all good. Otherwise, please be more specific about your use case.
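For instance, hypothetical usage inside a cross-validation loop (gen_model_1/gen_model_2 stand in for your own model builders):

models = generate_models()  # build each architecture once and snapshot its initial weights
for fold in range(5):
    model = get_blank_model('model1', models)  # reset weights to the saved snapshot
    # model.fit(train_X[fold], train_y[fold])
    # model.evaluate(val_X[fold], val_y[fold])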
@tzachar, thanks for the suggestion! Looks like using models in a loop requires a slightly different app design.
PS: Sorry if my tone offended you somehow; I don't blame anyone or demand a solution, I just wanted to report that the memory issue still exists. I did not mean anything wrong.
Hello. I have had similar exceptions with .clear_session() and Keras 2.2.0. Since I am using global optimization, I have to train many LSTM models in sequence.
After downgrading to Keras 2.1.6, I have no OOM issues.
Confirmed, thanks
I can confirm this problem with Keras 2.2.2 and Tensorflow 1.8.
I downgraded Keras to version 2.1.6, and the problem is gone.
Damn it :-(
I think Keras uses the default session (probably), so I set the session manually and then called K.clear_session(), which works fine, as shown below.
from keras import backend as K
cfg = K.tf.ConfigProto()
cfg.gpu_options.allow_growth = True
K.set_session(K.tf.Session(config=cfg))
# training / validation part ....
K.clear_session()
# loading another model ....
@arjun-kava What version of Keras & TF are you using?
I'm encountering the same problem and have implemented your and others' suggestions using the versions below, with no luck.
TensorFlow 1.9.0
Keras 2.2.1
Thanks!
I downgraded Keras to version 2.1.6, and the problem is gone.
Downgrading seems to work!
But the clear_session() function in the TensorFlow backend looks the same. T___T
No idea why it works...
Wow...
Never thought the way to solve the problem would be "downgrading"...
TensorFlow 1.9 + Keras 2.2.0. No way to unload a model.
Keras 2.2.2, TF 1.9.0
OOM during CV validation within an inner loop. Same result whether the model is reused or recreated, after 12 iterations.
By the way, I can confirm that downgrading to Keras 2.1.6 fixes the issue.
Just came across this issue. I'm using tf 1.9.0 and its keras version 2.1.6-tf.
Is it possible to reopen this issue?
Is it possible to reopen this issue?
downgrade tf to 1.8
@igorcadelima
Here is a pattern I adopted when fighting OOM that, in retrospect, may have caused OOM on its own:
model = load_model(...)
# predictions
del model
K.clear_session()
model = load_model(...)
# predictions
I suspect that is why I was hitting OOM after my first del/clear_session(): deleting the model may deprive TF of information it needs to clear the session properly.
Now I am not reloading the model anyway, and the original OOM seems to be gone, maybe due to newer versions of everything. I'm not testing whether 'del model' before clear_session() caused the latest memory leak, because it takes a while, but I recommend anyone using that sort of pattern try deleting things after the clear_session():
K.clear_session()
del model
model = load_model(...)
Beware of adoption becoming maladaptation. :-)
Is it possible to do this from C++?
I have the exact same problem but with C++ code and being unable to release memory without fully killing the program or using cudaDeviceReset() which works but does not allow further use of tensorflow within the calling process.
Worst case, maybe you could fork the calling process, and the child would be able to start TF. Though if you have a lot in memory, it could be an awkward copy.
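In Python, one way to sketch that isolation is to run each training job in a child process so the OS reclaims everything (including GPU memory) when the child exits; this is only an illustration with made-up model code:

import multiprocessing as mp

def train_once(result_queue):
    # Import TF/Keras inside the child so the parent process never touches the GPU.
    from keras.models import Sequential
    from keras.layers import Dense
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.compile(loss='mse', optimizer='sgd')
    # ... fit / predict here ...
    result_queue.put('done')  # send back only picklable results

if __name__ == '__main__':
    queue = mp.Queue()
    worker = mp.Process(target=train_once, args=(queue,))
    worker.start()
    print(queue.get())
    worker.join()  # all memory held by TF is released when the child exits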
I can also confirm that downgrading to Keras 2.1.6 fixes the issue
You can now use K.clear_session() when using TensorFlow, which will clean up everything. This is recommended if you ever create models inside a loop.

Will this K.clear_session() also reset the tf.set_random_seed()?
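My guess (untested): clear_session() resets the default graph, so a graph-level seed set earlier with tf.set_random_seed() no longer applies and would have to be set again afterwards, e.g.:

import tensorflow as tf
from keras import backend as K

K.clear_session()
tf.set_random_seed(1234)  # re-apply the seed on the freshly created graph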
Same problem.
Config:
Context: the model is overwritten and fitted several times in a for loop (I store a few key indicators at the end of each iteration; I'm not interested in the model per se).
==> Without K.clear_session() -> memory leaks
==> With K.clear_session(), from a Jupyter Notebook (I was told it's not the best option in conjunction with Keras / TF) -> kernel died
Updated both (TF 1.12.0 / Keras 2.2.4) -> Problem gone.
import ... as K
import gc
model = ....
del model
K.clear_session()
gc.collect()
It may work.
I'm still seeing this issue with:
TensorFlow Version: 1.13.1
TensorFlow.keras Version: 2.2.4-tf
OS: Windows 10
TensorFlow-GPU running on: NVIDIA GTX 1080 ti
I've tried tf.keras.backend.clear_session() with no luck; I'm still hitting RAM OOM errors eventually. I've also tried manually invoking garbage collection, with no luck.
I should note that tf.keras.backend.clear_session() does result in a visible drop in RAM, but the next call to Model.fit(...) during looping consumes more memory than was freed by the initial call to tf.keras.backend.clear_session(). I should also note that I am using TensorFlow datasets with one-shot iterators during training.
I haven't been able to pinpoint why this happens, but I know the problem occurs when I call Model.fit(...) on my Keras model with the two one-shot iterators in a repeated loop. If I just initialize the one-shot iterators and don't fit the Keras model (only compile it), then memory usage is uniform. As soon as Model.fit(...) is called with train_ds.make_one_shot_iterator() and val_ds.make_one_shot_iterator(), I slowly leak RAM despite calling tf.keras.backend.clear_session() at the beginning of the loop.
Has anyone encountered this issue while directly fitting the Keras model to TensorFlow data generators? I'm trying not to downgrade too far due to the TensorFlow generator support in the more recent releases.
I'm working on an [mcve], but my code is still a bit lengthy to post.
I solved this problem by switching to Theano:
import os
os.environ['KERAS_BACKEND'] = 'theano'  # must be set before importing keras
from keras.models import Sequential
....
I am having exactly the problems described above. As soon as model.fit is called, the memory used for tuples increases.
@tzachar I want to know how to add the following function you mentioned to my code:
import keras.backend.tensorflow_backend

if keras.backend.tensorflow_backend._SESSION:
    import tensorflow as tf
    tf.reset_default_graph()
    keras.backend.tensorflow_backend._SESSION.close()
    keras.backend.tensorflow_backend._SESSION = None
My code:
@app.before_first_request
# @app.route('/loading')
def load_resnet_model():
    print('begin to get model')
    global graph
    graph = tf.get_default_graph()
    global model_image
    img_dim = (299, 299, 3)
    num_label = 2
    input_tensor = Input(shape=img_dim)
    base_model = InceptionResNetV2(include_top=False, input_shape=img_dim, weights='imagenet')
    x = input_tensor
    x = Lambda(preprocess_input, name='preprocessing')(x)
    x = base_model(x)
    x = GlobalAveragePooling2D()(x)
    x = Dropout(0.5)(x)
    x = Dense(num_label, activation='softmax', name='softmax')(x)
    model_image = Model(input_tensor, x)
    print('finish loading model')

@app.route("/api/", methods=["POST"])
def predict_tag():
    print('beginning to prediction')
    data = request.get_json()
    len_test = validation_batch.shape[0]
    for t_image in lst_main_image:
        n_fold = 5
        preds_test = np.zeros((len_test, 2), dtype=np.float)
        print('t_image:', t_image)
        tag_i_time = time.time()
        for i in range(1, 6):
            model_image.load_weights('../model/{}/main_image/{}_aug_inception.fold_{}{}.hdf5'.format(industry, industry, i, t_image))
            model_image.compile(optimizer=Adam(lr=1e-4), loss='binary_crossentropy', metrics=['accuracy'])
            test_prob = model_image.predict(validation_batch)
            preds_test += test_prob
            tag_i_e = time.time()
            print('each tag the times:', t_image, tag_i_e - tag_i_time)
        preds_test /= n_fold
        y_pred = preds_test.argmax(axis=-1)
        lst_result_image.append(list(y_pred))
        print('finish predict the tag:', t_image)
    lst_all_result = {}
    return jsonify(lst_all_result)

if __name__ == '__main__':
    app.run(debug=True)
Not exactly sure why this issue has been closed.
What can be done to mitigate the growing loading time when calling load_model sequentially?
E.g. with ten different models that all need to be loaded in memory, using clear_session() is not an option here.
import keras
from keras.models import load_model

keras.backend.clear_session()
files = ['model1.h5', 'model2.h5', 'model3.h5', 'model4.h5', '...']
models = [load_model(f) for f in files]
# each model takes 30 seconds more than the previous one to load
# in particular, models 9 or 10 really take ages to load
do_something_with(models)
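One workaround might be to give every model its own graph and session so that the default graph never grows; a rough sketch reusing the files list above (you then have to activate each model's graph/session again when predicting):

import tensorflow as tf
from keras import backend as K
from keras.models import load_model

def load_isolated(path):
    graph = tf.Graph()
    session = tf.Session(graph=graph)
    with graph.as_default():
        K.set_session(session)
        model = load_model(path)
    return model, graph, session

loaded = [load_isolated(f) for f in files]

# prediction later:
# model, graph, session = loaded[0]
# with graph.as_default():
#     K.set_session(session)
#     model.predict(x)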