hi all,
I recently updated to Theano 0.9 and cuDNN v5, and things broke. I downgraded to 0.8.2 and things were moderately working again (I could import theano...). But then I tried to run some things and ran into the issue pasted below. I pulled the master branch of Keras and it still happens. I tried it with a Lasagne MNIST example, and it doesn't happen there. I haven't tried downloading Theano or cuDNN yet.
(DL)cogniton [examples] $ python imdb_cnn.py
Using Theano backend.
Using gpu device 0: GeForce GTX 980 (CNMeM is disabled, cuDNN 5004)
Loading data...
Downloading data from https://s3.amazonaws.com/text-datasets/imdb.pkl
33218560/33213513 [==============================] - 1s
20000 train sequences
5000 test sequences
Pad sequences (samples x time)
X_train shape: (20000, 400)
X_test shape: (5000, 400)
Build model...
Traceback (most recent call last):
File "imdb_cnn.py", line 84, in <module>
validation_data=(X_test, y_test))
File "/home/cogniton/research/code/keras/keras/models.py", line 405, in fit
sample_weight=sample_weight)
File "/home/cogniton/research/code/keras/keras/engine/training.py", line 996, in fit
self._make_test_function()
File "/home/cogniton/research/code/keras/keras/engine/training.py", line 676, in _make_test_function
**self._function_kwargs)
File "/home/cogniton/research/code/keras/keras/backend/theano_backend.py", line 517, in function
return Function(inputs, outputs, updates=updates, **kwargs)
File "/home/cogniton/research/code/keras/keras/backend/theano_backend.py", line 503, in __init__
**kwargs)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/compile/function.py", line 320, in function
output_keys=output_keys)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/compile/pfunc.py", line 479, in pfunc
output_keys=output_keys)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/compile/function_module.py", line 1777, in orig_function
defaults)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/compile/function_module.py", line 1641, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/link.py", line 690, in make_thunk
storage_map=storage_map)[:3]
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/vm.py", line 1003, in make_all
no_recycling))
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py", line 256, in make_thunk
compute_map, no_recycling)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/op.py", line 970, in make_thunk
no_recycling)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/op.py", line 879, in make_c_thunk
output_storage=node_output_storage)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cc.py", line 1200, in make_thunk
keep_lock=keep_lock)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cc.py", line 1143, in __compile__
keep_lock=keep_lock)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cc.py", line 1595, in cthunk_factory
key=key, lnk=self, keep_lock=keep_lock)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1142, in module_from_key
module = lnk.compile_cmodule(location)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cc.py", line 1506, in compile_cmodule
preargs=preargs)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/sandbox/cuda/nvcc_compiler.py", line 410, in compile_str
return dlimport(lib_filename)
File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cmodule.py", line 299, in dlimport
rval = __import__(module_name, {}, {}, [module_name])
RuntimeError: ('The following error happened while compiling the node', GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'could not create cuDNN handle: CUDNN_STATUS_NOT_INITIALIZED', "[GpuDnnConv{algo='small', inplace=True}(<CudaNdarrayType(float32, (False, False, False, True))>, <CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CDataType{cudnnConvolutionDescriptor_t}>, Constant{1.0}, Constant{0.0})]")
Fixed the issue. The bug was due to drivers.
B.
Can you tell us more? I have seen a few reports of this, so it could help others.
Can you tell us what your GPU was, which driver version gave you problems,
and which one worked well?
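For anyone else answering these questions, here is a minimal sketch for collecting the GPU name and driver version. It assumes `nvidia-smi` is installed and on the PATH; the helper names `parse_gpu_report` and `query_gpus` are hypothetical, made up for illustration.

```python
import subprocess

# Assumption: nvidia-smi is installed and on the PATH.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_report(text):
    """Turn 'name, driver_version' CSV lines into (name, driver) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a comma inside the GPU name survives.
        name, _, driver = line.rpartition(",")
        rows.append((name.strip(), driver.strip()))
    return rows


def query_gpus():
    """Run nvidia-smi and return one (name, driver) tuple per GPU."""
    out = subprocess.check_output(QUERY)
    return parse_gpu_report(out.decode("utf-8"))


if __name__ == "__main__":
    try:
        for name, driver in query_gpus():
            print("GPU: %s, driver: %s" % (name, driver))
    except (OSError, subprocess.CalledProcessError):
        print("nvidia-smi not found or failed; is the NVIDIA driver installed?")
```

Pasting the printed lines into a report like this one, together with the "Using gpu device" banner from Theano, should cover the GPU and driver questions.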
On Tue, May 10, 2016 at 5:54 AM, Brian McMahan [email protected]
wrote:
Fixed the issue. The bug was due to drivers.
B.
short version:
longer version:
The only reported case I could find for CUDNN_STATUS_NOT_INITIALIZED
was here: https://github.com/karpathy/neuraltalk2/issues/57
That led me to believe I had the wrong CUDA version. It turns out cuDNN v5 needs CUDA 7.5 or later; the release is listed as "cuDNN v5 Release Candidate (RC) (April, 2016), for CUDA 7.5 and later."
So I tried to upgrade. This led to some other issues because the CUDA 7.5 install script wasn't quite working.
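A quick way to check for this kind of mismatch before reinstalling anything is to compare the installed toolkit release against the CUDA 7.5 minimum. The following is only a sketch: it assumes `nvcc` is on the PATH and that `nvcc --version` prints a line containing "release X.Y", and the names `parse_nvcc_release` and `cuda_ok` are hypothetical helpers, not part of Theano or Keras.

```python
import re
import subprocess

# Minimum CUDA release for cuDNN v5, taken from the release note quoted in this thread.
MIN_CUDA = (7, 5)


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def cuda_ok(release, minimum=MIN_CUDA):
    """True if the parsed release is at least the required minimum."""
    return release is not None and release >= minimum


if __name__ == "__main__":
    try:
        # Assumption: nvcc is installed and on the PATH.
        out = subprocess.check_output(["nvcc", "--version"]).decode("utf-8")
    except (OSError, subprocess.CalledProcessError):
        print("nvcc not found or failed; is the CUDA toolkit on the PATH?")
    else:
        release = parse_nvcc_release(out)
        print("CUDA release: %s, new enough for cuDNN v5: %s" % (release, cuda_ok(release)))
```

Note that this only checks the toolkit version. It says nothing about leftover driver packages, which turned out to be the real problem in my case.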
For me (and I am guessing here), most of the driver upgrade difficulty stemmed from a crappy attempt to get ethereum working and, even before that, from having originally installed the drivers specific to my card (the display drivers) rather than the CUDA drivers. Both of those had caused hiccups in functioning research code.
The internet (and an accumulation of frustrating experiences) pointed to having to completely wipe the OS and reinstall because of residual traces of the drivers (which wasn't so bad, because I have a separate partition for the OS contents).
Hope that helps.