I am doing 10-fold cross validation and noticed that the GPU memory requirement rises with every iteration. A plot of the memory used across the 10 folds on a randomly generated toy data set (see the code below) shows the usage climbing steadily from fold to fold.
Is there a way to release the GPU memory after each fold? Once a fold has finished, I no longer need the model or any of its weights. The current behaviour prevents me from running 10-fold cross validation on my data sets because of an out-of-memory error. Note that I have enough memory to run the first iteration; the problem occurs in a later iteration because of the rising memory requirements. Is there a work-around?
I've tried deleting the model with `del model` and also running garbage collection, but neither seems to work.
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

DH = 200
N_WORDS = 100000
SEQ_LENGTH = 50
N_CLASSES = 10
N_EXAMPLES = 10000

x = np.random.randint(N_WORDS, size=(N_EXAMPLES, SEQ_LENGTH))
x_test = np.random.randint(N_WORDS, size=(N_EXAMPLES, SEQ_LENGTH))
y = np.random.randint(N_CLASSES, size=(N_EXAMPLES, 1))
y = MultiLabelBinarizer().fit_transform(y)  # encode in one-hot

print('x.shape:', x.shape)
print('y.shape:', y.shape)

for i in range(10):
    print('ITERATION {}'.format(i))
    model = Sequential()
    model.add(Embedding(input_dim=N_WORDS, output_dim=DH, input_length=SEQ_LENGTH))
    model.add(Dropout(.2))
    model.add(LSTM(DH))
    model.add(Dropout(.5))
    model.add(Dense(N_CLASSES))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='RMSprop')

    model.fit(x, y, nb_epoch=1, show_accuracy=True)
    predictions = model.predict(x_test)

    #
    # here I would like to delete the model and free its GPU memory
    #
I ran into the same problem when training a model on a large corpus. I sliced the data into many parts and called fit() once per part to feed it into the model. Every call to fit() increased the memory usage, and after enough calls an out-of-memory error occurred. I'm still looking for a solution.
Hi, nikicc.
I found a solution. Using the Theano flag "cnmem=0.7" fixes the maximum GPU memory usage at 70% of the total memory. If you set it to 0 or do not use this flag, Theano will use as much memory as it needs.
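For example, a minimal sketch of setting the flag from Python, assuming the Theano backend (the full flag name is lib.cnmem, and it has to be set before Theano is imported):

import os

# Set the flag before Theano (and therefore Keras) is imported; 0.7 asks
# CNMeM to reserve roughly 70% of the GPU memory for its initial pool.
os.environ['THEANO_FLAGS'] = 'device=gpu,floatX=float32,lib.cnmem=0.7'

import theano  # must come after THEANO_FLAGS is set
import keras   # assumes keras.json selects the theano backend

Equivalently, cnmem = 0.7 can go under the [lib] section of ~/.theanorc.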
@Shen256 I tried this some time ago and it did not work for me.
This doesn't set the maximum memory to use. It sets the size of the first allocation we do on the GPU, which we then reuse. If that is not enough, Theano will allocate more.
If you don't need the weights, you could launch every new model in a new process, fit it, gather the results (metrics) and terminate the process. When a process is terminated, its GPU memory is released. It should be possible using the multiprocessing module. For a small problem, and if you have enough space on your GPU, you could even try using more than one process. I'm not sure how Theano manages locks and allocates memory across multiple processes, but I tried it with celery and it seems to work correctly.
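With multiprocessing it could look roughly like this (just a sketch; train_one_fold and the data handling are placeholders):

from multiprocessing import Pool

def train_one_fold(fold_idx):
    # Import Keras inside the worker so the GPU is only initialised in the
    # child process, never in the parent.
    from keras.models import Sequential
    model = Sequential()
    # ... build, compile, fit and evaluate the model for this fold ...
    return {'fold': fold_idx}  # return plain Python objects, never the model

if __name__ == '__main__':
    # maxtasksperchild=1 retires the worker after every fold, so the GPU
    # memory used by that fold's model is released before the next one.
    pool = Pool(processes=1, maxtasksperchild=1)
    results = pool.map(train_one_fold, range(10))
    pool.close()
    pool.join()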
Theano supports that. There can be delays compiling the C code the first
time (and a warning about it), but once the cache contains what you need
there is no problem.
If you use lib.cnmem, just don't forget to make it low enough that
multiple processes can share the GPU memory.
@nouiz tx for the confirmation!
Thanks everyone! I think I will go with multiprocessing
for now 😄
I am running into the same problem as @nikicc here when trying to grid search for hyper-parameters. A set of hyper-parameters (depth/number of nodes in each layer) that works fine when it is the only model being run fails while grid-searching in a for loop (the model is initialized inside the loop).
Does anyone know whether Keras handles this issue, or whether it can be handled directly in Theano? I'd prefer not to go the multiprocessing route (also, did that work for you, @nikicc?). Thanks!
@Vatshank I am not sure whether Theano/Keras currently handles this issue. I am handling it myself with two tricks.
Firstly, when evaluating one parameter configuration in cross validation I reuse the model instead of recompiling it for every fold. I compile the model once and store its initial weights before any learning is done. I then train it on one fold and afterwards reset the weights to their initial state (using set_weights) so the model is ready for the next fold. This way I can perform, for example, 10-fold cross validation with only one model and a single compilation. Beware that the random initialisation of the weights will then be the same in each fold.
The second trick is the multiprocessing route, which is also not that complicated (a few extra lines of code). I use it when evaluating different parameter configurations. Each configuration is evaluated in a separate process, so all of its GPU memory is released once it is done.
Will try these out @nikicc. Thank you so much!
If you want to free the Theano shared variables that contain the weights,
you must make sure the old model is completely deleted (no references left
to it). Then Theano will release the GPU memory.
Otherwise, if you have a list of the shared variables (parameters), you can
just call var.set_value(numpy.zeros((0,) * var.ndim, dtype=var.dtype)). This
replaces the old parameter with an empty one, so it will free the
memory.
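For a Keras model on the Theano backend that might look roughly like this (a sketch; it assumes trainable_weights and non_trainable_weights expose the backend shared variables, as they do in Keras 1.x):

import numpy as np

def release_model_weights(model):
    # Swap every shared parameter for an empty array so Theano can release
    # the GPU memory that parameter was occupying.
    for var in model.trainable_weights + model.non_trainable_weights:
        var.set_value(np.zeros((0,) * var.ndim, dtype=var.dtype))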
Thanks @nouiz! Good to know. I'll see whether this can be used through Keras.
Has anyone found an elegant way to run cross validation with the multiprocessing module? I'm having some trouble getting started with it...
@JGuillaumin I wrote a simple wrapper that runs a method in a separate process. The idea is that all you have to do, to run some method in a separate process instead of the current one, is to call it through the run_in_separate_process method defined below.
Use case:
>>> import os
>>> print('Main process PID:', os.getpid())
Main process PID: 5305
>>> def foo(in_arg):
... print('foo\'s PID:', os.getpid())
... return in_arg + 1
>>> print('Return value:', foo(1)) # run foo in the same process
foo's PID: 5305 # the same PID as in main
Return value: 2
>>> print('Return value:', run_in_separate_process(foo, (1, ))) # run foo in a separate process
foo's PID: 5338 # different PID as in main
Return value: 2
Method for running in a separate process:
from multiprocessing import Process, Queue

def run_in_separate_process(method, args):
    def queue_wrapper(q, params):
        r = method(*params)
        q.put(r)

    q = Queue()
    p = Process(target=queue_wrapper, args=(q, args))
    p.start()
    return_val = q.get()
    p.join()
    return return_val
Hope this helps!
Thank you very much! I had problems with the order of q.get() and p.join().
I get an error from the multiprocessing module:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
send(obj)
PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed
This error appears at the end of the first process (after the first round of the k-fold loop).
The cell is still running (showing [*]), but without any 'print' output or logs from Keras (or from my own print calls).
The function train_model_kfold builds a model and trains it with the data and parameters passed as arguments. It returns history from model.fit and scores from model.evaluate.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
batch_size = 256
nb_epoch = 5

cvscores = []
cv_histories = []
i = 1
for train, test in kfold.split(np.zeros(y.shape[0]), y.argmax(axis=1)):
    print "start fold {}".format(i)
    history, scores = run_in_separate_process(train_model_kfold,
                                              (X[train], y[train],
                                               X[test], y[test],
                                               X_eval, y_eval,
                                               i, batch_size, nb_epoch))
    cv_histories.append(history)
    cvscores.append(scores[1] * 100)
    i += 1
UPDATE: it works when train_model_kfold returns nothing!
So I think the problem is the serialization that happens in run_in_separate_process when we try to get back the output of train_model_kfold (the history and the scores).
Seems like Keras's History object is not picklable. But do you really need the whole History? If you only want to see the scores, which are stored in history.history, returning that should suffice. Can you try returning just history.history (which is a dict and hence should not cause pickling problems) from train_model_kfold?
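A rough sketch of what a pickle-friendly train_model_kfold could look like (the architecture is just a placeholder; the point is that only plain Python objects cross the process boundary):

def train_model_kfold(x_train, y_train, x_val, y_val, x_eval, y_eval,
                      fold_idx, batch_size, nb_epoch):
    # Import and build the model inside the function so everything
    # Keras-related lives (and dies) in the child process.
    from keras.models import Sequential
    from keras.layers.core import Dense

    model = Sequential()
    model.add(Dense(y_train.shape[1], input_dim=x_train.shape[1],
                    activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])

    history = model.fit(x_train, y_train, batch_size=batch_size,
                        nb_epoch=nb_epoch, validation_data=(x_val, y_val))
    scores = model.evaluate(x_eval, y_eval, batch_size=batch_size)

    # history.history is a plain dict and scores is a list of floats,
    # so both pickle cleanly; the model and the History object stay here.
    return history.history, scores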
It works with history.history! Thank you very much.
from keras import backend as K
After every iteration:
K.clear_session()
This command destroys the current TF graph and creates a new one, so the GPU memory is cleared after every iteration.
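For example, in the cross-validation loop it could look like this (a sketch assuming the TensorFlow backend; build_model() stands in for whatever architecture is being evaluated):

from keras import backend as K

for i in range(10):
    model = build_model()        # build and compile a fresh model per fold
    model.fit(x, y, nb_epoch=1)
    predictions = model.predict(x_test)

    del model          # drop the Python reference to the model
    K.clear_session()  # destroy the TF graph so the GPU memory is released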
@mrzhu-cool will give this a try.
This only seems to apply to TensorFlow. Anyway currently I am unable to reproduce the bug on Theano.
@mrzhu-cool's solution works like a charm when using TensorFlow.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
K.clear_session() is not working with the TensorFlow backend.
There is an easy solution when using k-fold to validate your model:
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

DH = 200
N_WORDS = 100000
SEQ_LENGTH = 50
N_CLASSES = 10
N_EXAMPLES = 10000

x = np.random.randint(N_WORDS, size=(N_EXAMPLES, SEQ_LENGTH))
x_test = np.random.randint(N_WORDS, size=(N_EXAMPLES, SEQ_LENGTH))
y = np.random.randint(N_CLASSES, size=(N_EXAMPLES, 1))
y = MultiLabelBinarizer().fit_transform(y)  # encode in one-hot

print('x.shape:', x.shape)
print('y.shape:', y.shape)

# Build and compile the model only once
model = Sequential()
model.add(Embedding(input_dim=N_WORDS, output_dim=DH, input_length=SEQ_LENGTH))
model.add(Dropout(.2))
model.add(LSTM(DH))
model.add(Dropout(.5))
model.add(Dense(N_CLASSES))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='RMSprop')

# Store initial weights
init_weights = model.get_weights()

for i in range(10):
    print('ITERATION {}'.format(i))
    model.fit(x, y, nb_epoch=1, show_accuracy=True)
    predictions = model.predict(x_test)

    #
    # here I would like to delete the model and free its GPU memory
    #
    # With this line you reset the weights of your model
    model.set_weights(init_weights)
The key is: if the same model has to be trained and evaluated k times, create it once and reset its weights k times.
Regards.
PS: I know this is a stale issue but my solution is simple and works.
@mrzhu-cool this worked for me. Thanks!
I am on Keras 2.0.6 and TF 1.2.1.
@mrzhu-cool will this clear the weights as well?