Hello,
I have a larger Graph model that runs fine with a sequence_length of 500. If I change the sequence_length to 5000, I get a CUDNN_STATUS_MAPPING_ERROR. I tried it twice, and the error happens at exactly the same iteration; the stack trace is below.
The GPU is a Titan X with 12 GB of memory.
What can I do to trace the error further?
Thanks, Ernst
Epoch 4096/10000
1/2 [==============>...............] - ETA: 2s - loss: 0.3742
Traceback (most recent call last):
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
RuntimeError: GpuDnnConvGradI: error doing operation: CUDNN_STATUS_MAPPING_ERROR
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "UFCNN1_5000.py", line 977, in
nb_epoch=epoch)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/models.py", line 1795, in fit_generator
accuracy=show_accuracy)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/models.py", line 1475, in train_on_batch
return self._train(ins)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/backend/theano_backend.py", line 450, in __call__
return self.function(*inputs)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 871, in call
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
RuntimeError: GpuDnnConvGradI: error doing operation: CUDNN_STATUS_MAPPING_ERROR
Apply node that caused the error: GpuDnnConvGradI{algo='time_once', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='full', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})
Toposort index: 283
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, (False, False, False, True)),
Inputs shapes: [(3, 150, 5000, 1), (1, 3, 58460, 1), (1, 150, 53461, 1), 'No shapes', (), ()]
Inputs strides: [(750000, 5000, 1, 0), (0, 58460, 1, 0), (0, 53461, 1, 0), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown',
Inputs name: ('kernel', 'grad', 'output', 'descriptor', 'alpha', 'beta')
Outputs clients: [[GpuDimShuffle{0,1,2,x}(GpuDnnConvGradI{algo='time_once', inplace=True}.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
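Following the hints above, both flags can be set in the environment before anything imports Theano; a minimal sketch (only the two flag names come from the hint text, the rest is an assumption about how the script is launched):

import os

# Must run before the first "import theano" anywhere in the process.
# optimizer=fast_compile keeps the back-trace of where each node was created,
# exception_verbosity=high prints the debugprint and storage-map footprint on error.
os.environ['THEANO_FLAGS'] = 'optimizer=fast_compile,exception_verbosity=high'

import theano
print(theano.config.optimizer, theano.config.exception_verbosity)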
An access to GPU memory space failed, which is usually caused by a failure to bind a texture. To correct: prior to the function call, unbind any previously bound textures. Otherwise, this may indicate an internal error/bug in the library. Report it to Nvidia.
Try lib.cnmem=0.9, or lower than the value you used.
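A minimal sketch of one way to pass that flag, assuming it is set through THEANO_FLAGS before Theano is imported (it can equally go into ~/.theanorc; the device and floatX values below are just placeholders):

import os

# lib.cnmem asks CNMeM to pre-allocate this fraction of GPU memory up front;
# lowering it leaves more headroom for cuDNN workspaces and other allocations.
os.environ['THEANO_FLAGS'] = 'device=gpu0,floatX=float32,lib.cnmem=0.8'

import theano  # the startup banner reports the CNMeM size actually in use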
Frederic,
thank you very much, I'll try that now!
Thanks
Ernst
Maybe you are running out of GPU memory. Set cnmem lower.
Chang,
thank you very much, I'll do that!
Thanks,
Ernst
Hi Guys,
thank you very much for your help.
In my current optimisation run, I hit this error at epoch 410 with a CNMeM limit of 90%.
I changed that to 80% and reran the same optimisation, and I got the same error at the same iteration, see below. So it looks like the CNMeM setting is picked up by the GPU, but changing it did not change the error.
Kind regards
Ernst
Epoch 410/500
11/20 [===============>..............] - ETA: 26s - loss: 0.5517 - acc: 0.7910
Using Theano backend.
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 80.0% of memory, CuDNN 4007)
Traceback (most recent call last):
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
For the record: switching to cuDNN 5.0, CUDA 7.5.18 and Theano 0.9-dev (on Ubuntu 14.04) seems to have resolved the problem.
Thanks,
Ernst
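For reference, a sketch of one way to check which cuDNN build Theano actually loads (this assumes the old theano.sandbox.cuda backend shown in the traceback above; the GPU startup banner prints the same information):

from theano.sandbox.cuda import dnn

# dnn_available() is False when cuDNN cannot be located or compiled against;
# version() reports the cuDNN version Theano is using.
if dnn.dnn_available():
    print('cuDNN version:', dnn.version())
else:
    print('cuDNN is not available in this install')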
I faced the same error and figured out that my PyTorch version did not match the CUDA version on my machine. I installed a newer version of PyTorch and it worked.
CUDA version: 9.2.88
PyTorch: 0.4.1
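A quick sketch of how to compare the build PyTorch ships with against what the machine reports (torch.version.cuda, torch.backends.cudnn.version() and torch.cuda.is_available() are standard PyTorch attributes):

import torch

# The CUDA/cuDNN build baked into the PyTorch wheel has to be compatible
# with the driver and toolkit installed on the machine.
print('PyTorch:', torch.__version__)
print('built for CUDA:', torch.version.cuda)
print('cuDNN:', torch.backends.cudnn.version())
print('GPU usable:', torch.cuda.is_available())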