I am doing 10-fold cross validation and noticed that the GPU memory requirement rises with every iteration. A plot of the memory used across the 10 folds on a randomly generated toy data set (see the code below) shows the usage climbing steadily from fold to fold.
Is there a way to release the GPU memory after each fold? Once a fold has finished, I no longer need the model or any of its weights. The current behaviour prevents me from running 10-fold cross validation on my data sets because of an out-of-memory error. Note that I have enough memory to run the first iteration; the problem occurs in a later iteration because of the rising memory requirements. Is there a work-around?
I've tried deleting the model with `del model` and also running garbage collection, but neither seems to work.
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

DH = 200
N_WORDS = 100000
SEQ_LENGTH = 50
N_CLASSES = 10
N_EXAMPLES = 10000

x = np.random.randint(N_WORDS, size=(N_EXAMPLES, SEQ_LENGTH))
x_test = np.random.randint(N_WORDS, size=(N_EXAMPLES, SEQ_LENGTH))
y = np.random.randint(N_CLASSES, size=(N_EXAMPLES, 1))
y = MultiLabelBinarizer().fit_transform(y)  # encode in one-hot

print('x.shape:', x.shape)
print('y.shape:', y.shape)

for i in range(10):
    print('ITERATION {}'.format(i))
    model = Sequential()
    model.add(Embedding(input_dim=N_WORDS, output_dim=DH, input_length=SEQ_LENGTH))
    model.add(Dropout(.2))
    model.add(LSTM(DH))
    model.add(Dropout(.5))
    model.add(Dense(N_CLASSES))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='RMSprop')

    model.fit(x, y, nb_epoch=1, show_accuracy=True)
    predictions = model.predict(x_test)

    #
    # here I would like to delete the model and free its GPU memory
    #
I ran into the same problem when training a model on a large corpus. I sliced the data into many parts and called fit() once per part to feed it into the model. Every call to fit() increased the memory usage, and after enough calls an out-of-memory error occurred. I'm still looking for a solution.
Hi, nikicc.
I found a solution. Using the Theano flag "cnmem=0.7" fixes the maximum GPU memory usage at 70% of the total memory. If you set it to 0 or do not use this flag, Theano will use as much memory as it needs.
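For example, a minimal sketch of setting the flag from Python, assuming the Theano backend (the full flag name is lib.cnmem, and it has to be set before Theano is imported):

import os

# Set the flag before Theano (and therefore Keras) is imported; 0.7 asks
# CNMeM to reserve roughly 70% of the GPU memory for its initial pool.
os.environ['THEANO_FLAGS'] = 'device=gpu,floatX=float32,lib.cnmem=0.7'

import theano  # must come after THEANO_FLAGS is set
import keras   # assumes keras.json selects the theano backend

Equivalently, cnmem = 0.7 can go under the [lib] section of ~/.theanorc.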
@Shen256 I tried this some time ago and it did not work for me.
This doesn't set the maximum memory to use. It sets the size of the first allocation we do on the GPU, which we then reuse. If that is not enough, Theano will allocate more.
If you don't need the weights, you could launch every new model in a new process, fit it, gather the results (metrics) and terminate the process. When a process is terminated, its GPU memory is released. It should be possible using the multiprocessing module. For a small problem, and if you have enough space on your GPU, you could even try using more than one process. I'm not sure how Theano manages locks and allocates memory across multiple processes, but I tried it with celery and it seems to work correctly.
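With multiprocessing it could look roughly like this (just a sketch; train_one_fold and the data handling are placeholders):

from multiprocessing import Pool

def train_one_fold(fold_idx):
    # Import Keras inside the worker so the GPU is only initialised in the
    # child process, never in the parent.
    from keras.models import Sequential
    model = Sequential()
    # ... build, compile, fit and evaluate the model for this fold ...
    return {'fold': fold_idx}  # return plain Python objects, never the model

if __name__ == '__main__':
    # maxtasksperchild=1 retires the worker after every fold, so the GPU
    # memory used by that fold's model is released before the next one.
    pool = Pool(processes=1, maxtasksperchild=1)
    results = pool.map(train_one_fold, range(10))
    pool.close()
    pool.join()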
Theano supports that. There can be delays compiling the C code the first
time (and a warning about it), but once the cache contains what you need
there is no problem.
If you use lib.cnmem, just don't forget to make it low enough that
multiple processes can share the GPU memory.
@nouiz tx for the confirmation!
Thanks everyone! I think I will go with multiprocessing
for now 😄
I am running into the same problem as @nikicc here when trying to grid search for hyper-parameters. A set of hyper-parameters (depth/number of nodes in each layer) that works fine when it is the only model being run fails while grid-searching in a for loop (the model is initialized inside the loop).
Does anyone know whether Keras handles this issue, or whether it can be handled directly in Theano? I'd prefer not to go the multiprocessing route (also, did that work for you, @nikicc?). Thanks!
@Vatshank I am not sure whether Theano/Keras currently handles this issue. I am handling it myself with two tricks.
Firstly, when evaluating one parameter configuration in cross validation I reuse the model instead of recompiling it for every fold. I compile the model once and store its initial weights before any learning is done. I then train it on one fold and afterwards reset the weights to their initial state (using set_weights) so the model is ready for the next fold. This way I can perform, for example, 10-fold cross validation with only one model and a single compilation. Beware that the random initialisation of the weights will then be the same in each fold.
The second trick is the multiprocessing route, which is also not that complicated (a few extra lines of code). I use it when evaluating different parameter configurations. Each configuration is evaluated in a separate process, so all of its GPU memory is released once it is done.
Will try these out @nikicc. Thank you so much!
If you want to free the Theano shared variables that contain the weights,
you must make sure the old model is completely deleted (no references left
to it). Then Theano will release the GPU memory.
Otherwise, if you have a list of the shared variables (parameters), you can
just call var.set_value(numpy.zeros((0,) * var.ndim, dtype=var.dtype)). This
replaces the old parameter with an empty one, so it will free the
memory.
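For a Keras model on the Theano backend that might look roughly like this (a sketch; it assumes trainable_weights and non_trainable_weights expose the backend shared variables, as they do in Keras 1.x):

import numpy as np

def release_model_weights(model):
    # Swap every shared parameter for an empty array so Theano can release
    # the GPU memory that parameter was occupying.
    for var in model.trainable_weights + model.non_trainable_weights:
        var.set_value(np.zeros((0,) * var.ndim, dtype=var.dtype))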
Thanks @nouiz! Good to know. I'll see whether this can be used through Keras.
Has anyone found an elegant way to run cross validation with the multiprocessing module? I'm having some trouble getting started with it...
@JGuillaumin I wrote a simple wrapper that runs a method in a separate process. The idea is that all you have to do, to run some method in a separate process instead of the current one, is to call it through the run_in_separate_process method defined below.
Use case:
>>> import os
>>> print('Main process PID:', os.getpid())
Main process PID: 5305
>>> def foo(in_arg):
... print('foo\'s PID:', os.getpid())
... return in_arg + 1
>>> print('Return value:', foo(1)) # run foo in the same process
foo's PID: 5305 # the same PID as in main
Return value: 2
>>> print('Return value:', run_in_separate_process(foo, (1, ))) # run foo in a separate process
foo's PID: 5338 # different PID as in main
Return value: 2
Method for running in a separate process:
from multiprocessing import Process, Queue

def run_in_separate_process(method, args):
    def queue_wrapper(q, params):
        r = method(*params)
        q.put(r)

    q = Queue()
    p = Process(target=queue_wrapper, args=(q, args))
    p.start()
    return_val = q.get()
    p.join()
    return return_val
Hope this helps!
Thank you very much! I had problems with the order of q.get() and p.join().
I get an error from the multiprocessing module:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
send(obj)
PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed
This error appears at the end of the first process (after the first round of the k-fold loop).
The cell is still running (showing [*]), but without any 'print' output or logs from Keras (or from my own print calls).
The function train_model_kfold builds a model and trains it with the data and parameters passed as arguments. It returns history from model.fit and scores from model.evaluate.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
batch_size = 256
nb_epoch = 5

cvscores = []
cv_histories = []
i = 1
for train, test in kfold.split(np.zeros(y.shape[0]), y.argmax(axis=1)):
    print "start fold {}".format(i)
    history, scores = run_in_separate_process(train_model_kfold,
                                              (X[train], y[train],
                                               X[test], y[test],
                                               X_eval, y_eval,
                                               i, batch_size, nb_epoch))
    cv_histories.append(history)
    cvscores.append(scores[1] * 100)
    i += 1
UPDATE: it works when train_model_kfold returns nothing!
So I think the problem is the serialization that happens in run_in_separate_process when we try to get back the output of train_model_kfold (the history and the scores).
Seems like Keras's History object is not picklable. But do you really need the whole History? If you only want to see the scores, which are stored in history.history, returning that should suffice. Can you try returning just history.history (which is a dict and hence should not cause pickling problems) from train_model_kfold?
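A rough sketch of what a pickle-friendly train_model_kfold could look like (the architecture is just a placeholder; the point is that only plain Python objects cross the process boundary):

def train_model_kfold(x_train, y_train, x_val, y_val, x_eval, y_eval,
                      fold_idx, batch_size, nb_epoch):
    # Import and build the model inside the function so everything
    # Keras-related lives (and dies) in the child process.
    from keras.models import Sequential
    from keras.layers.core import Dense

    model = Sequential()
    model.add(Dense(y_train.shape[1], input_dim=x_train.shape[1],
                    activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])

    history = model.fit(x_train, y_train, batch_size=batch_size,
                        nb_epoch=nb_epoch, validation_data=(x_val, y_val))
    scores = model.evaluate(x_eval, y_eval, batch_size=batch_size)

    # history.history is a plain dict and scores is a list of floats,
    # so both pickle cleanly; the model and the History object stay here.
    return history.history, scores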
It works with history.history! Thank you very much.
from keras import backend as K
After every iteration:
K.clear_session()
This command destroys the current TF graph and creates a new one, so the GPU memory is cleared after every iteration.
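For example, in the cross-validation loop it could look like this (a sketch assuming the TensorFlow backend; build_model() stands in for whatever architecture is being evaluated):

from keras import backend as K

for i in range(10):
    model = build_model()        # build and compile a fresh model per fold
    model.fit(x, y, nb_epoch=1)
    predictions = model.predict(x_test)

    del model          # drop the Python reference to the model
    K.clear_session()  # destroy the TF graph so the GPU memory is released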
@mrzhu-cool will give this a try.
This only seems to apply to TensorFlow. Anyway currently I am unable to reproduce the bug on Theano.
@mrzhu-cool's solution works like a charm when using TensorFlow.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
K.clear_session() is not working with the TensorFlow backend.
There is an easy solution when using k-fold to validate your model:
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

DH = 200
N_WORDS = 100000
SEQ_LENGTH = 50
N_CLASSES = 10
N_EXAMPLES = 10000

x = np.random.randint(N_WORDS, size=(N_EXAMPLES, SEQ_LENGTH))
x_test = np.random.randint(N_WORDS, size=(N_EXAMPLES, SEQ_LENGTH))
y = np.random.randint(N_CLASSES, size=(N_EXAMPLES, 1))
y = MultiLabelBinarizer().fit_transform(y)  # encode in one-hot

print('x.shape:', x.shape)
print('y.shape:', y.shape)

# Build and compile the model only once
model = Sequential()
model.add(Embedding(input_dim=N_WORDS, output_dim=DH, input_length=SEQ_LENGTH))
model.add(Dropout(.2))
model.add(LSTM(DH))
model.add(Dropout(.5))
model.add(Dense(N_CLASSES))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='RMSprop')

# Store initial weights
init_weights = model.get_weights()

for i in range(10):
    print('ITERATION {}'.format(i))
    model.fit(x, y, nb_epoch=1, show_accuracy=True)
    predictions = model.predict(x_test)

    #
    # here I would like to delete the model and free its GPU memory
    #
    # With this line you reset the weights of your model
    model.set_weights(init_weights)
The key is: if the same model has to be trained and evaluated k times, create it once and reset its weights k times.
Regards.
PS: I know this is a stale issue but my solution is simple and works.
@mrzhu-cool this worked for me. Thanks!
I am on Keras 2.0.6 and TF 1.2.1.
@mrzhu-cool will this clear the weights as well?