Spleeter: [Bug] cuDNN failed to initialize

Created on 5 Apr 2020 · 10Comments · Source: deezer/spleeter

Description

I know this has been discussed before, but maybe someone has a clear fix to this.

One of my machines doesn't want to process any audio files and I get the following error message:

(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_7/Conv2D}}]]

Step to reproduce

Installed using conda
Run as python -m
Got above error

Output

Traceback (most recent call last):
  File "c:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "c:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "c:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv2d_7/Conv2D}}]]
     [[strided_slice_21/_307]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv2d_7/Conv2D}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\user\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\Users\user\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\user\Anaconda3\lib\site-packages\spleeter\__main__.py", line 58, in <module>
    entrypoint()
  File "c:\Users\user\Anaconda3\lib\site-packages\spleeter\__main__.py", line 54, in entrypoint
    main(sys.argv)
  File "c:\Users\user\Anaconda3\lib\site-packages\spleeter\__main__.py", line 46, in main
    entrypoint(arguments, params)
  File "c:\Users\user\Anaconda3\lib\site-packages\spleeter\commands\separate.py", line 45, in entrypoint
    synchronous=False
  File "c:\Users\user\Anaconda3\lib\site-packages\spleeter\separator.py", line 191, in separate_to_file
    sources = self.separate(waveform, audio_descriptor)
  File "c:\Users\user\Anaconda3\lib\site-packages\spleeter\separator.py", line 155, in separate
    return self.separate_tensorflow(waveform, audio_descriptor)
  File "c:\Users\user\Anaconda3\lib\site-packages\spleeter\separator.py", line 106, in separate_tensorflow
    'audio_id': audio_descriptor})
  File "c:\Users\user\Anaconda3\lib\site-packages\tensorflow\contrib\predictor\predictor.py", line 77, in __call__
    return self._session.run(fetches=self.fetch_tensors, feed_dict=feed_dict)
  File "c:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)
  File "c:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "c:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
    run_metadata)
  File "c:\Users\user\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node conv2d_7/Conv2D (defined at \site-packages\spleeter\utils\estimator.py:80) ]]
     [[strided_slice_21/_307]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node conv2d_7/Conv2D (defined at \site-packages\spleeter\utils\estimator.py:80) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'conv2d_7/Conv2D':
  File "\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "\site-packages\spleeter\__main__.py", line 58, in <module>
    entrypoint()
  File "\site-packages\spleeter\__main__.py", line 54, in entrypoint
    main(sys.argv)
  File "\site-packages\spleeter\__main__.py", line 46, in main
    entrypoint(arguments, params)
  File "\site-packages\spleeter\commands\separate.py", line 45, in entrypoint
    synchronous=False
  File "\site-packages\spleeter\separator.py", line 191, in separate_to_file
    sources = self.separate(waveform, audio_descriptor)
  File "\site-packages\spleeter\separator.py", line 155, in separate
    return self.separate_tensorflow(waveform, audio_descriptor)
  File "\site-packages\spleeter\separator.py", line 103, in separate_tensorflow
    predictor = self._get_predictor()
  File "\site-packages\spleeter\separator.py", line 75, in _get_predictor
    self._predictor = to_predictor(estimator)
  File "\site-packages\spleeter\utils\estimator.py", line 80, in to_predictor
    return predictor.from_saved_model(latest)
  File "\site-packages\tensorflow\contrib\predictor\predictor_factories.py", line 153, in from_saved_model
    config=config)
  File "\site-packages\tensorflow\contrib\predictor\saved_model_predictor.py", line 153, in __init__
    loader.load(self._session, tags.split(','), export_dir)
  File "\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "\site-packages\tensorflow\python\saved_model\loader_impl.py", line 269, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "\site-packages\tensorflow\python\saved_model\loader_impl.py", line 422, in load
    **saver_kwargs)
  File "\site-packages\tensorflow\python\saved_model\loader_impl.py", line 352, in load_graph
    meta_graph_def, import_scope=import_scope, **saver_kwargs)
  File "\site-packages\tensorflow\python\training\saver.py", line 1473, in _import_meta_graph_with_return_elements
    **kwargs))
  File "\site-packages\tensorflow\python\framework\meta_graph.py", line 857, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "\site-packages\tensorflow\python\framework\importer.py", line 443, in import_graph_def
    _ProcessNewOps(graph)
  File "\site-packages\tensorflow\python\framework\importer.py", line 236, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "\site-packages\tensorflow\python\framework\ops.py", line 3751, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "\site-packages\tensorflow\python\framework\ops.py", line 3751, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "\site-packages\tensorflow\python\framework\ops.py", line 3641, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Environment

| | |
| ----------------- | ------------------------------- |
| OS | Windows 10 |
| Installation type | Conda |
| RAM available | 8 GB VRAM / 8 GB RAM |
| Hardware spec | RTX2080 / i7 960 |

GPU bug conda

Source

aidv

Most helpful comment

The older docker images do not work with tensorflow 1.15 due to cuda incompatibility. We will push newer images with updated dependencies.

alreadytaikeune on 12 Apr 2020

👍2

All 10 comments

Ok so I ran conda install -c conda-forge spleeter-gpu

And upon closer inspection I get the following

The following packages will be SUPERSEDED by a higher-priority channel:

certifi pkgs/main --> conda-forge cudatoolkit anaconda::cudatoolkit-10.1.243-h74a97~ --> pkgs/main::cudatoolkit-10.0.130-0 cudnn anaconda::cudnn-7.6.5-cuda10.1_0 --> pkgs/main::cudnn-7.6.5-cuda10.0_0 tensorboard anaconda/noarch::tensorboard-2.1.0-py~ --> conda-forge/win-64::tensorboard-1.14.0-py37_0 tensorflow anaconda::tensorflow-2.1.0-gpu_py37h7~ --> pkgs/main::tensorflow-1.14.0-gpu_py37h5512b17_0 tensorflow-base anaconda::tensorflow-base-2.1.0-gpu_p~ --> pkgs/main::tensorflow-base-1.14.0-gpu_py37h55fc52a_0 tensorflow-estima~ anaconda/noarch::tensorflow-estimator~ --> conda-forge/win-64::tensorflow-estimator-1.14.0-py37h5ca1d4c_0 tensorflow-gpu anaconda::tensorflow-gpu-2.1.0-h0d30e~ --> pkgs/main::tensorflow-gpu-1.14.0-h0d30ee6_0
Shouldn't it install TF 1.15? It seems to be installing 1.14.

aidv on 6 Apr 2020

The older docker images do not work with tensorflow 1.15 due to cuda incompatibility. We will push newer images with updated dependencies.

alreadytaikeune on 12 Apr 2020

👍2

@alreadytaikeune Would installing conda install -c conda-forge spleeter allow for GPU processing as well?

Remember having issues with that before.

aidv on 13 Apr 2020

The older docker images do not work with tensorflow 1.15 due to cuda incompatibility. We will push newer images with updated dependencies.

Confirmed the docker image tagged gpu fails with this exception but the image with the tag 3.7-gpu succeeds. Not a big deal but considering the size (> 6GB) of these images it's a bit of a wait to get new ones ;)

lazzarello on 21 May 2020

I had the same issue with a RTX 2070, resolved it with:

CUDA == 10.0.130 (/usr/local/cuda-10.0/lib64/)
libcudnn.so == 7.6.0.64 (/usr/local/cuda-10.0/lib64/)

Because I am using my GPU for display, I had some conflicts with tf that is trying to allocate the whole GPU memory.
Hence forcing GPU memory growth via an environment variable (run it before spleeter):
export TF_FORCE_GPU_ALLOW_GROWTH=true
You should add it system-wide :
echo "export TF_FORCE_GPU_ALLOW_GROWTH=true" | tee -a /etc/profile.d/myenvvar.sh

And also ensure that it is indeed this version of cudnn that is being loaded by tensorflow (if you have multiple versions of cuda on your PC)!

ltetrel on 24 Jun 2020

@ltetrel Thanks.

How do I ensure that the correct version of cudnn is loaded by tensorflow?

I've mad another issue #430 where I've explained pretty much everything I could try but still fail most of the cases.

aidv on 24 Jun 2020

@aidv
You can check the major version with:
tf.test.is_gpu_available()
It will show you also what version of cuda libraries are being loaded.

ltetrel on 26 Jun 2020

For anyone having issues on Windows10:

Seemed like cuDNN was failing to init due to not being able to allocate the necessary memory on my GPU (rtx 2070 Super) upfront. Fix was to set a flag to allow incremental memory growth. Following script worked for me:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth  = True
tf.Session(config=config) 

from spleeter.separator import Separator
from spleeter.audio.adapter import get_default_audio_adapter

separator = Separator('spleeter:2stems')
separator.separate_to_file('the-legacy.mp3', './')

feliperyan on 12 Jul 2020

👍1

Can confirm that adding this to separator.py fixed it for me -- @mmoussallam, would you accept this in a PR?

axfelix on 14 Aug 2020

Hi all,

Glad this fixes the issue, but it seems to be gpu and windows specific so I'd rather not set the parameter globally at risk of breaking the setup for others. Also we've moved on to TF2 and the memory management has probably changed.