Hello,
I have a larger Graph model that runs fine with a sequence_length of 500. If I change the sequence_length to 5000, I get a CUDNN_STATUS_MAPPING_ERROR. I tried it twice, and the error happens at exactly the same iteration; the stack trace is below.
The GPU is a Titan X with 12 GB of memory.
What can I do to trace the error further?
Thanks, Ernst
Epoch 4096/10000
1/2 [==============>...............] - ETA: 2s - loss: 0.3742
Traceback (most recent call last):
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
RuntimeError: GpuDnnConvGradI: error doing operation: CUDNN_STATUS_MAPPING_ERROR
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "UFCNN1_5000.py", line 977, in
nb_epoch=epoch)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/models.py", line 1795, in fit_generator
accuracy=show_accuracy)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/models.py", line 1475, in train_on_batch
return self._train(ins)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/backend/theano_backend.py", line 450, in __call__
return self.function(*inputs)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 871, in call
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
RuntimeError: GpuDnnConvGradI: error doing operation: CUDNN_STATUS_MAPPING_ERROR
Apply node that caused the error: GpuDnnConvGradI{algo='time_once', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='full', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})
Toposort index: 283
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, (False, False, False, True)),
Inputs shapes: [(3, 150, 5000, 1), (1, 3, 58460, 1), (1, 150, 53461, 1), 'No shapes', (), ()]
Inputs strides: [(750000, 5000, 1, 0), (0, 58460, 1, 0), (0, 53461, 1, 0), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown',
Inputs name: ('kernel', 'grad', 'output', 'descriptor', 'alpha', 'beta')
Outputs clients: [[GpuDimShuffle{0,1,2,x}(GpuDnnConvGradI{algo='time_once', inplace=True}.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
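Following the hints above, both flags can be set in the environment before anything imports Theano; a minimal sketch (only the two flag names come from the hint text, the rest is an assumption about how the script is launched):

import os

# Must run before the first "import theano" anywhere in the process.
# optimizer=fast_compile keeps the back-trace of where each node was created,
# exception_verbosity=high prints the debugprint and storage-map footprint on error.
os.environ['THEANO_FLAGS'] = 'optimizer=fast_compile,exception_verbosity=high'

import theano
print(theano.config.optimizer, theano.config.exception_verbosity)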
An access to GPU memory space failed, which is usually caused by a failure to bind a texture. To correct: prior to the function call, unbind any previously bound textures. Otherwise, this may indicate an internal error/bug in the library. Report it to Nvidia.
Try lib.cnmem=0.9, or lower than the value you used.
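A minimal sketch of one way to pass that flag, assuming it is set through THEANO_FLAGS before Theano is imported (it can equally go into ~/.theanorc; the device and floatX values below are just placeholders):

import os

# lib.cnmem asks CNMeM to pre-allocate this fraction of GPU memory up front;
# lowering it leaves more headroom for cuDNN workspaces and other allocations.
os.environ['THEANO_FLAGS'] = 'device=gpu0,floatX=float32,lib.cnmem=0.8'

import theano  # the startup banner reports the CNMeM size actually in use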
Frederic,
thank you very much, I'll try that now!
Thanks
Ernst
Maybe you are running out of GPU memory. Set cnmem lower.
Chang,
thank you very much, I'll do that!
Thanks,
Ernst
Hi Guys,
thank you very much for your help.
In my current optimisation run, I hit this error at epoch 410 with a CNMeM limit of 90%.
I changed that to 80% and reran the same optimisation, and I got the same error at the same iteration, see below. So it looks like the CNMeM setting is picked up by the GPU, but changing it did not change the error.
Kind regards
Ernst
Epoch 410/500
11/20 [===============>..............] - ETA: 26s - loss: 0.5517 - acc: 0.7910
Using Theano backend.
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 80.0% of memory, CuDNN 4007)
Traceback (most recent call last):
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
For the record: switching to cuDNN 5.0, CUDA 7.5.18 and Theano 0.9-dev (on Ubuntu 14.04) seems to have resolved the problem.
Thanks,
Ernst
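For reference, a sketch of one way to check which cuDNN build Theano actually loads (this assumes the old theano.sandbox.cuda backend shown in the traceback above; the GPU startup banner prints the same information):

from theano.sandbox.cuda import dnn

# dnn_available() is False when cuDNN cannot be located or compiled against;
# version() reports the cuDNN version Theano is using.
if dnn.dnn_available():
    print('cuDNN version:', dnn.version())
else:
    print('cuDNN is not available in this install')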
I faced the same error and figured out that my PyTorch version did not match the CUDA version on my machine. I installed a newer version of PyTorch and it worked.
CUDA version: 9.2.88
PyTorch: 0.4.1
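A quick sketch of how to compare the build PyTorch ships with against what the machine reports (torch.version.cuda, torch.backends.cudnn.version() and torch.cuda.is_available() are standard PyTorch attributes):

import torch

# The CUDA/cuDNN build baked into the PyTorch wheel has to be compatible
# with the driver and toolkit installed on the machine.
print('PyTorch:', torch.__version__)
print('built for CUDA:', torch.version.cuda)
print('cuDNN:', torch.backends.cudnn.version())
print('GPU usable:', torch.cuda.is_available())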