I'm trying to run the LSTM network from the IMDB example on my own data.
I have more training examples (100k), longer sentences (maxlen of 300 instead of 100), and a slightly larger vocabulary (23,000 instead of 20,000).
Now I get the out-of-memory error below (although only after finishing a whole epoch).
This is on a 4GB GTX 760 card.
My question is: what determines the memory usage of this network? Obviously a larger vocabulary means a larger embedding layer.
Do longer sentences mean a bigger network? That is, are LSTM nodes unrolled in time for ALL words? Is there such a thing as a "window size"? Or is the back-propagation through time "virtual", so that it does not increase memory used?
Do more training examples mean more memory usage? I thought only one mini-batch was loaded at a time?
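For what it's worth, a back-of-the-envelope estimate of the embedding layer's footprint (the embedding dimension of 256 is an assumption here; substitute whatever the model actually uses):

```python
# Rough size of the embedding layer alone, in float32:
vocab_size, embedding_dim = 23000, 256  # embedding_dim is assumed
embed_bytes = vocab_size * embedding_dim * 4  # 4 bytes per float32
print(embed_bytes / 1e6)  # roughly 23.6 MB
```

So even the enlarged vocabulary only adds a few megabytes of parameters; the embedding table itself is unlikely to be the culprit on a 4 GB card.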
Otherwise thanks for a great library!
Train on 94990 samples, validate on 10555 samples
Epoch 0
WARNING: unused streams above 2048 (Tune GPU_mrg get_n_streams)
94976/94990 [============================>.] - ETA: 0s - loss: 0.2739 - acc.: 0.9235Error allocating 3242496000 bytes of device memory (out of memory). Driver report 876146688 bytes free and 4294246400 bytes total
Traceback (most recent call last):
File "nntrain.py", line 45, in <module>
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=5, validation_split=0.1, show_accuracy=True)
File "build/bdist.linux-x86_64/egg/keras/models.py", line 206, in fit
File "build/bdist.linux-x86_64/egg/keras/models.py", line 132, in test
File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/compile/function_module.py", line 588, in __call__
self.fn.thunks[self.fn.position_of_error])
File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/compile/function_module.py", line 579, in __call__
outputs = self.fn()
File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/gof/op.py", line 644, in rval
r = p(n, [x[0] for x in i], o)
File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/sandbox/cuda/basic_ops.py", line 2223, in perform
out[0] = x.reshape(tuple(shp))
MemoryError: Error allocating 3242496000 bytes of device memory (out of memory).
Apply node that caused the error: GpuReshape{2}(GpuDimShuffle{1,0,2}.0, MakeVector.0)
Inputs types: [CudaNdarrayType(float32, 3D), TensorType(int64, vector)]
Inputs shapes: [(300, 10555, 256), (2,)]
Inputs strides: [(256, 76800, 1), (8,)]
Inputs scalar values: ['not scalar', 'not scalar']
Error allocating 3242496000 bytes of device memory (out of memory). Driver report 876146688 bytes free and 4294246400 bytes total
This tells you everything you need to know. Your GPU does not have enough memory for this task.
Things you can try:
Alternative solutions...
Obviously a larger vocab means a larger embed layer.
Yes.
Do longer sentences mean a bigger network?
No, the network size will be the same, but each sample will be larger, so you will be using more memory to load each batch.
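In fact, the failed allocation in the traceback can be reproduced arithmetically from the tensor shape it reports, assuming (300, 10555, 256) is (timesteps, validation samples, layer width) in float32:

```python
# Shape taken from the "Inputs shapes" line of the traceback:
seq_len, n_samples, width = 300, 10555, 256
bytes_needed = seq_len * n_samples * width * 4  # 4 bytes per float32
print(bytes_needed)  # 3242496000, exactly the failed allocation
```

Notably, 10555 is the size of the whole validation split, which suggests the validation pass here is being evaluated in one go rather than in mini-batches; that is why the error only appears at the end of the epoch.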
Thanks for the feedback. I just wonder if there may be a "memory leak" or something, since the network in theory fits on my GPU and it successfully trained one epoch before failing.
I reduced the sequence length to 100; then it trained two epochs before failing.
I am quite ignorant about GPU programming, but could there be memory that isn't "freed" after each epoch?
Would setting the truncate_gradient option for the LSTM layer reduce memory consumption?
100 seems large for an LSTM. Try 32...
Garbage collection is not instantaneous, so if you're working close to the memory limit you have a very high risk of running out of memory even though your work fits in memory "in theory".
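A rough sketch of why shorter sequences (and smaller batches) help; the shapes and the per-layer accounting here are illustrative assumptions, not the exact internals of Keras/Theano:

```python
def activation_bytes(batch_size, seq_len, width, dtype_bytes=4):
    """Approximate memory for one layer's activations, which must be
    kept for every timestep to back-propagate through time."""
    return batch_size * seq_len * width * dtype_bytes

# Activation memory scales linearly with sequence length:
print(activation_bytes(128, 100, 256) / 1e6)  # ~13.1 MB per layer
print(activation_bytes(128, 32, 256) / 1e6)   # ~4.2 MB per layer
```

Cutting the sequence length from 100 to 32 cuts this roughly threefold, which buys headroom against the garbage-collection timing issue described above.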
Would setting the truncate_gradient option for the LSTM layer reduce memory consumption?
No.
thank you