keras crashing when using convolutions

Created on 10 May 2016 · 3 Comments · Source: keras-team/keras

Hi all,

I recently updated to Theano 0.9 and cuDNN v5, and things broke. I downgraded Theano to 0.8.2 and things were moderately working again (I could import Theano...). But when I tried to run some code, I ran into the issue pasted below. I pulled the master branch of Keras and it still happens. I tried a Lasagne MNIST example, and it doesn't happen there. I haven't tried re-downloading Theano or cuDNN yet.

(DL)cogniton [examples] $ python imdb_cnn.py 
Using Theano backend.
Using gpu device 0: GeForce GTX 980 (CNMeM is disabled, cuDNN 5004)
Loading data...
Downloading data from https://s3.amazonaws.com/text-datasets/imdb.pkl
33218560/33213513 [==============================] - 1s      
20000 train sequences
5000 test sequences
Pad sequences (samples x time)
X_train shape: (20000, 400)
X_test shape: (5000, 400)
Build model...
Traceback (most recent call last):
  File "imdb_cnn.py", line 84, in <module>
    validation_data=(X_test, y_test))
  File "/home/cogniton/research/code/keras/keras/models.py", line 405, in fit
    sample_weight=sample_weight)
  File "/home/cogniton/research/code/keras/keras/engine/training.py", line 996, in fit
    self._make_test_function()
  File "/home/cogniton/research/code/keras/keras/engine/training.py", line 676, in _make_test_function
    **self._function_kwargs)
  File "/home/cogniton/research/code/keras/keras/backend/theano_backend.py", line 517, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/home/cogniton/research/code/keras/keras/backend/theano_backend.py", line 503, in __init__
    **kwargs)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/compile/function.py", line 320, in function
    output_keys=output_keys)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/compile/pfunc.py", line 479, in pfunc
    output_keys=output_keys)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/compile/function_module.py", line 1777, in orig_function
    defaults)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/compile/function_module.py", line 1641, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/link.py", line 690, in make_thunk
    storage_map=storage_map)[:3]
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/vm.py", line 1003, in make_all
    no_recycling))
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py", line 256, in make_thunk
    compute_map, no_recycling)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/op.py", line 970, in make_thunk
    no_recycling)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/op.py", line 879, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cc.py", line 1200, in make_thunk
    keep_lock=keep_lock)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cc.py", line 1143, in __compile__
    keep_lock=keep_lock)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cc.py", line 1595, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1142, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cc.py", line 1506, in compile_cmodule
    preargs=preargs)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/sandbox/cuda/nvcc_compiler.py", line 410, in compile_str
    return dlimport(lib_filename)
  File "/home/cogniton/anaconda/envs/DL/lib/python2.7/site-packages/theano/gof/cmodule.py", line 299, in dlimport
    rval = __import__(module_name, {}, {}, [module_name])
RuntimeError: ('The following error happened while compiling the node', GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'could not create cuDNN handle: CUDNN_STATUS_NOT_INITIALIZED', "[GpuDnnConv{algo='small', inplace=True}(<CudaNdarrayType(float32, (False, False, False, True))>, <CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CDataType{cudnnConvolutionDescriptor_t}>, Constant{1.0}, Constant{0.0})]")
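For anyone hitting the same error: the log above already shows which cuDNN build Theano loaded (cuDNN 5004 on the "Using gpu device" line) and the failing status (CUDNN_STATUS_NOT_INITIALIZED). Below is a minimal Python 3 sketch for pulling both out of a saved log. The helper names are hypothetical (not part of Theano or Keras), and it assumes cuDNN's integer version encoding of major * 1000 + minor * 100 + patch, so 5004 decodes to 5.0.4.

```python
import re
from typing import Optional, Tuple

# Hypothetical log-reading helpers; not a Theano or Keras API.
# Assumes cuDNN's integer encoding: major * 1000 + minor * 100 + patch.
_CUDNN_VERSION_RE = re.compile(r"\bcuDNN (\d+)")
_CUDNN_STATUS_RE = re.compile(r"CUDNN_STATUS_[A-Z_]+")


def decode_cudnn_version(code: int) -> Tuple[int, int, int]:
    """Split an integer such as 5004 into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def cudnn_version_from_log(log: str) -> Optional[Tuple[int, int, int]]:
    """Find the first 'cuDNN NNNN' banner and decode it, or return None."""
    match = _CUDNN_VERSION_RE.search(log)
    return decode_cudnn_version(int(match.group(1))) if match else None


def cudnn_status_from_log(log: str) -> Optional[str]:
    """Return the first CUDNN_STATUS_* code mentioned in the log, if any."""
    match = _CUDNN_STATUS_RE.search(log)
    return match.group(0) if match else None
```

Run on the output above, cudnn_version_from_log would return (5, 0, 4) and cudnn_status_from_log would return 'CUDNN_STATUS_NOT_INITIALIZED'.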

All 3 comments

Fixed the issue. The bug was due to drivers.

B.

Can you tell us more? I have seen a few reports of this; it could help others.
Can you tell us what your GPU was, which driver version you had problems with,
and which one worked well?

On Tue, May 10, 2016 at 5:54 AM, Brian McMahan [email protected] wrote:

Fixed the issue. The bug was due to drivers.

B.

https://github.com/fchollet/keras/issues/2678#issuecomment-218111540
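The question above asks for the GPU model and driver version. A minimal Python 3.7+ sketch for collecting both when filing a report (the thread itself ran Python 2.7, so this is not drop-in for that environment). It relies on nvidia-smi's --query-gpu and --format=csv,noheader options; the function names are hypothetical, and the cuDNN version can be read from the Theano startup line shown in the original post.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def parse_gpu_query(csv_text: str) -> List[Tuple[str, str]]:
    """Parse 'name, driver_version' lines from nvidia-smi CSV output."""
    gpus: List[Tuple[str, str]] = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, driver = line.split(",", 1)
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus() -> Optional[List[Tuple[str, str]]]:
    """Return (GPU name, driver version) pairs, or None if nvidia-smi is unusable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no driver tools on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None  # driver present but not responding
    return parse_gpu_query(result.stdout)
```

Returning None instead of raising keeps the report usable on machines where the driver install itself is the broken part.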

short version:

  • I have a GeForce GTX 980.
  • I originally (last year) started with the Linux display drivers, then upgraded to CUDA 7.0.
  • Yesterday I upgraded to cuDNN v5, which assumes CUDA 7.5.
  • I then tried to upgrade CUDA to 7.5.
  • If the CUDA 7.5 install gets messed up for any reason, it causes issues.
  • Assuming my CUDA 7.5 install was not exactly what cuDNN v5 expects, my only (desperate, last-attempt) solution was to completely wipe the OS and reinstall CUDA 7.5 from scratch.

    • Uninstalling the CUDA drivers to get a clean install seems to be problematic (according to CUDA forum posts).

longer version:

The only reported case I could find for CUDNN_STATUS_NOT_INITIALIZED was here: https://github.com/karpathy/neuraltalk2/issues/57

So that led me to believe I had the wrong CUDA version. It turns out cuDNN v5 requires CUDA 7.5 or later (the download page lists "cuDNN v5 Release Candidate (RC) (April, 2016), for CUDA 7.5 and later"). So I tried to upgrade. This led to some other issues because the CUDA 7.5 install script wasn't quite working.
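The version reasoning in this paragraph can be written down as a small lookup. This is a hypothetical helper, not an NVIDIA or Theano API; the only requirement it encodes is the one quoted here (cuDNN v5 needs CUDA 7.5 or later), and any other entries would have to come from NVIDIA's release notes.

```python
from typing import Dict, Optional, Tuple

# Minimum CUDA (major, minor) per cuDNN major version.
# Only the v5 entry comes from this thread; anything else is unknown here.
MIN_CUDA_FOR_CUDNN: Dict[int, Tuple[int, int]] = {5: (7, 5)}


def cuda_supports_cudnn(cudnn_major: int, cuda_version: Tuple[int, int]) -> Optional[bool]:
    """True/False if the requirement is in the table, None if it is not."""
    required = MIN_CUDA_FOR_CUDNN.get(cudnn_major)
    if required is None:
        return None  # no recorded requirement for this cuDNN major version
    # Tuples compare element-wise, so (7, 0) < (7, 5) < (8, 0).
    return cuda_version >= required
```

For the setup described in this thread, cuda_supports_cudnn(5, (7, 0)) returns False, which matches the mismatch between cuDNN v5 and the original CUDA 7.0 install.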

In my case, I am guessing most of the driver upgrade difficulty stemmed from a botched attempt to get Ethereum working and, even before that, from having originally installed the drivers specific to my card (the display drivers) rather than the CUDA drivers. Both of those had already caused hiccups in otherwise working research code.

Advice online (and my own accumulating frustrating experiences) pointed to completely wiping the OS and reinstalling, because of residual traces of the old drivers. That wasn't so bad, since I keep the OS on a separate partition.
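Wiping the OS was the fix that worked here. Before going that far, one cheap check for leftover installs is to look for more than one copy of the cuDNN library on the library search path. A minimal sketch, assuming a Linux system where LD_LIBRARY_PATH is how the CUDA libraries get found; the function name is hypothetical, and since the dynamic loader also consults ld.so.cache and the default system directories, an empty result does not rule out stray copies.

```python
import glob
import os
from typing import Dict, List, Optional


def find_shared_libs(prefix: str = "libcudnn", search_path: Optional[str] = None) -> Dict[str, List[str]]:
    """Map each directory on the search path to the prefix*.so* files it contains."""
    if search_path is None:
        # Assumption: LD_LIBRARY_PATH is the path worth inspecting.
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    found: Dict[str, List[str]] = {}
    for directory in search_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue  # skip empty entries and missing directories
        matches = sorted(glob.glob(os.path.join(directory, prefix + "*.so*")))
        if matches:
            found[directory] = matches
    return found
```

More than one directory in the result suggests that different cuDNN builds could be picked up depending on path order.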

Hope that helps.
