Keras: Out of memory issues (GPU)

Created on 29 Jan 2016 · 14 comments · Source: keras-team/keras

I'm training a model with Theano/CUDA, and if I attempt to specify a large batch_size (1024 in my case), it reports an out of memory error, which is understandable. However, if I change it back to a size that previously worked (I'm doing it in a notebook), it will still be out of memory, as if it didn't attempt to free whatever it allocated for the previous attempt, so I'm forced to restart the Python process (and reload all data/recompile models).
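
To make the workflow concrete, here is a rough sketch of what I'm doing in the notebook; the model and data below are placeholders rather than my actual code:

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense

# placeholder data and model, just to illustrate the notebook workflow
x = np.random.rand(100000, 512).astype("float32")
y = np.random.rand(100000, 10).astype("float32")

model = Sequential()
model.add(Dense(4096, input_dim=512, activation="relu"))
model.add(Dense(10))
model.compile(optimizer="sgd", loss="mse")

try:
    model.fit(x, y, batch_size=1024, nb_epoch=1)   # too large: raises a GPU out-of-memory error
except Exception as e:
    print(e)

# retrying with a batch size that previously worked now also fails with out of memory,
# until the Python process is restarted
model.fit(x, y, batch_size=128, nb_epoch=1)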

I can provide model code if needed; it's on a laptop that currently has no internet access.

stale


All 14 comments

It sounds like a problem with IPython Notebook; maybe it doesn't release allocated resources (memory). It has happened to me before.
I suggest you develop your model in a text editor or IDE and run it from the console.

It is possible that the CUDA driver doesn't like this and stays in a bad state. It isn't able to recover from every type of error; I'm not sure which ones it recovers from correctly and which it doesn't.

Calling a_theano_function.free() could help free the memory, but as was said, running it outside the IPython notebook would avoid those problems.
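
As a sketch of what that could look like for a Keras model (the train_function / test_function / predict_function attributes are Keras internals, assumed here, and may be None if they were never built):

def free_model_functions(model):
    # try to free the Theano functions the model has compiled
    for name in ("train_function", "test_function", "predict_function"):
        fn = getattr(model, name, None)
        if fn is not None:
            fn.free()   # Theano's Function.free() releases the function's internal storage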


If I run it as a complete Python program, it obviously wouldn't be a problem, because the exception would terminate the Python process; but that defeats the purpose of being able to keep training the model and checking the results without having to recompile it every time.

I'm pretty sure running it in a plain Python shell would behave the same as the notebook, though.

It seems that it starts allocating large amounts of memory, but when it runs out it throws an exception without freeing what it already allocated. I don't know if forcing garbage collection would help, but that Theano free function looks like it would help, thanks.

Hitting this as well. I've been playing with a trivial model in Jupyter, and I've observed that while iterating in the notebook my used GPU memory (as reported by nvidia-settings) steadily increases in ~200 MB increments until Theano finally generates an out of memory error.

Calling del model and gc.collect() seems to have no effect on used GPU memory. Restarting the Jupyter kernel resolves it, but is undesirable.
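
For reference, this is roughly how the growth can be watched between iterations; a sketch using nvidia-smi rather than nvidia-settings, assuming nvidia-smi is on the PATH:

import subprocess

def gpu_memory_used():
    # one line per GPU, e.g. "1523 MiB"
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"])
    return out.decode().strip()

print(gpu_memory_used())   # call after each training cell to see the ~200 MB steps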

Has anyone managed to find a way around this (i.e. without restarting the kernel)?

@MannyKayy deleting all IPython references to the model and calling the garbage collector works for me (using Theano):

%xdel model                        # IPython magic: delete `model` and purge cached references (Out[], _, etc.)
import gc
for i in range(3): gc.collect()    # run the collector a few times to break up reference cycles

Having this issue myself with TensorFlow; the method above doesn't solve the issue in the notebook.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Still no progress on this? I'm having the same issue, using TensorFlow and Keras.

I have the same issue. Can't find any solution except to restart the notebook.

+1. Struggling with this a lot.

Same for me. It would be great not to have to restart the notebook every time.

Same here. Any solution better than restarting the notebook kernel?

Why can no one solve this serious problem? My workaround is to shut down the Jupyter kernel every time, which wastes too much time.
