Keras: NotFoundError and No algorithm worked

Created on 4 Jul 2017 · 8 Comments · Source: keras-team/keras

Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2017-07-04 09:30:28.678755: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-04 09:30:28.678790: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-04 09:30:28.678801: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-04 09:30:28.678809: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-04 09:30:28.678817: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-04 09:30:28.950767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:02:00.0
Total memory: 3.94GiB
Free memory: 3.79GiB
2017-07-04 09:30:29.160608: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x29a6ec0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-07-04 09:30:29.161536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties: 
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.87GiB
2017-07-04 09:30:29.162085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1 
2017-07-04 09:30:29.162108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y Y 
2017-07-04 09:30:29.162116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1:   Y Y 
2017-07-04 09:30:29.162130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:02:00.0)
2017-07-04 09:30:29.162143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 980, pci bus id: 0000:03:00.0)
2017-07-04 09:30:32.258800: E tensorflow/stream_executor/cuda/cuda_driver.cc:1073] failed to get elapsed time between events: CUDA_ERROR_NOT_READY
2017-07-04 09:30:33.259500: E tensorflow/stream_executor/cuda/cuda_driver.cc:1073] failed to get elapsed time between events: CUDA_ERROR_NOT_READY
2017-07-04 09:30:34.259632: E tensorflow/stream_executor/cuda/cuda_driver.cc:1073] failed to get elapsed time between events: CUDA_ERROR_NOT_READY
2017-07-04 09:30:35.259788: E tensorflow/stream_executor/cuda/cuda_driver.cc:1073] failed to get elapsed time between events: CUDA_ERROR_NOT_READY
2017-07-04 09:30:36.259949: E tensorflow/stream_executor/cuda/cuda_driver.cc:1073] failed to get elapsed time between events: CUDA_ERROR_NOT_READY
2017-07-04 09:30:37.260125: E tensorflow/stream_executor/cuda/cuda_driver.cc:1073] failed to get elapsed time between events: CUDA_ERROR_NOT_READY
2017-07-04 09:30:38.260269: E tensorflow/stream_executor/cuda/cuda_driver.cc:1073] failed to get elapsed time between events: CUDA_ERROR_NOT_READY
Traceback (most recent call last):
  File "mnist_cnn.py", line 67, in <module>
    validation_data=(x_test, y_test))
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 870, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1507, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1156, in _fit_loop
    outs = f(ins_batch)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 2269, in __call__
    **self.session_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: No algorithm worked!
     [[Node: conv2d_2/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](conv2d_1/Relu, conv2d_2/kernel/read)]]
     [[Node: mul_2/_39 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1029_mul_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'conv2d_2/convolution', defined at:
  File "mnist_cnn.py", line 51, in <module>
    model.add(Conv2D(64, (3, 3), activation='relu'))
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 476, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 596, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/convolutional.py", line 164, in call
    dilation_rate=self.dilation_rate)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 3138, in conv2d
    data_format='NHWC')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 670, in convolution
    op=op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 338, in with_space_to_batch
    return op(input, num_spatial_dims, padding)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 662, in op
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 131, in _non_atrous_convolution
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 399, in conv2d
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): No algorithm worked!
     [[Node: conv2d_2/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](conv2d_1/Relu, conv2d_2/kernel/read)]]
     [[Node: mul_2/_39 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1029_mul_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Most helpful comment

My problem was that I called the model with an input_shape of (?,28,28,1) and later called it with (?,28,28,3).
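The mismatch described above can be caught before the convolution runs. The following is a minimal sketch in plain Python, assuming shapes are given as tuples with `None` standing in for the `?` batch dimension; the helper name `shapes_compatible` is hypothetical and not part of Keras or TensorFlow.

```python
def shapes_compatible(expected, actual):
    """Return True if `actual` matches `expected`, treating None as a wildcard."""
    if len(expected) != len(actual):
        return False
    return all(e is None or a is None or e == a
               for e, a in zip(expected, actual))


built_for = (None, 28, 28, 1)    # shape the model was first called with
called_with = (None, 28, 28, 3)  # shape used in the later call

print(shapes_compatible(built_for, (None, 28, 28, 1)))  # True
print(shapes_compatible(built_for, called_with))        # False: 1 vs 3 channels
```

Running a check like this on the data before calling the model would point at the channel dimension directly, which the "No algorithm worked!" message does not mention.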

All 8 comments

TensorFlow seems to be outdated.

I have tried the latest version of TensorFlow, and it didn't solve the problem. @taehoonlee

It seems to be a CUDA problem.

2017-07-04 09:30:32.258800: E tensorflow/stream_executor/cuda/cuda_driver.cc:1073] failed to get elapsed time between events: CUDA_ERROR_NOT_READY

The problem occurs when I use a CNN, for example when I run mnist_cnn.py from the Keras examples; running mnist_mlp.py does not cause it. I have tried different versions of CUDA, cuDNN and TensorFlow, but that didn't solve it. @taehoonlee
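One way to test whether the failure is specific to the GPU path is to rerun the same script with the GPUs hidden. The sketch below assumes the standard `CUDA_VISIBLE_DEVICES` environment variable is honored by the installed CUDA stack; the helper name `cpu_only_env` is hypothetical, and the script name in the usage comment is taken from the comment above.

```python
import os


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # "-1" is not a valid device index, so no GPU is visible to the child process.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


# Usage (not executed here):
# import subprocess, sys
# subprocess.run([sys.executable, "mnist_cnn.py"], env=cpu_only_env())
```

If the script trains on the CPU but fails on the GPU, that would be consistent with a problem in the GPU convolution path rather than in the model code.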

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

I have the same error, but I don't have CUDA_ERROR_NOT_READY:

2019-09-23 13:02:03.455922: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-23 13:02:03.461452: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2019-09-23 13:02:03.461738: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x340b0d0 executing computations on platform Host. Devices:
2019-09-23 13:02:03.461770: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-09-23 13:02:03.462505: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-23 13:02:03.487318: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.488106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:01:00.0
2019-09-23 13:02:03.488267: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-23 13:02:03.489775: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-23 13:02:03.491062: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-23 13:02:03.491317: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-23 13:02:03.492745: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-23 13:02:03.493522: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-23 13:02:03.497020: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-23 13:02:03.497158: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.498436: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.499306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-23 13:02:03.499329: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-23 13:02:03.584851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-23 13:02:03.584876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-23 13:02:03.584882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-23 13:02:03.584979: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.585597: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.586202: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.586808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9941 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-09-23 13:02:03.587613: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.588264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:01:00.0
2019-09-23 13:02:03.588282: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-23 13:02:03.588289: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-23 13:02:03.588295: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-23 13:02:03.588303: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-23 13:02:03.588312: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-23 13:02:03.588334: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-23 13:02:03.588341: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-23 13:02:03.588405: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.589061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.589760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-23 13:02:03.589798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-23 13:02:03.589806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-23 13:02:03.589812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-23 13:02:03.589895: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.590629: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-23 13:02:03.591684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9941 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
generation == 0 ; objects count == 0 ; starting...
normalized: &{0x7f4190172ac0 [1 3 1024 1024]}
2019-09-23 13:02:06.093434: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-23 13:02:06.258504: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-23 13:02:07.104110: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:977 : Not found: No algorithm worked!
2019-09-23 13:02:07.104345: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:977 : Not found: No algorithm worked!
2019-09-23 13:02:07.104463: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:977 : Not found: No algorithm worked!
2019-09-23 13:02:07.104577: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:977 : Not found: No algorithm worked!
2019-09-23 13:02:07.104682: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:977 : Not found: No algorithm worked!
2019-09-23 13:02:07.104782: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:977 : Not found: No algorithm worked!
2019-09-23 13:02:07.104878: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:977 : Not found: No algorithm worked!
2019-09-23 13:02:07.104970: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:977 : Not found: No algorithm worked!
2019-09-23 13:02:07.105043: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:977 : Not found: No algorithm worked!
panic: (*tensorflow.statusError) "2 root error(s) found.
  (0) Not found: No algorithm worked!
         [[{{node D/FromRGB_lod0/Conv2D}}]]
         [[D/scores_out/_3]]
  (1) Not found: No algorithm worked!
         [[{{node D/FromRGB_lod0/Conv2D}}]]
0 successful operations.
0 derived errors ignored."

Hello everybody, this works well:

import numpy as np
import tensorflow as tf

# Input of shape (1, 5, 5, 1): batch, height, width, channels
x_in = np.array([[
    [[2], [1], [2], [0], [1]],
    [[1], [3], [2], [2], [3]],
    [[1], [1], [3], [3], [0]],
    [[2], [2], [0], [1], [1]],
    [[0], [0], [3], [1], [2]],
]])
# Kernel of shape (2, 2, 1, 2): height, width, in_channels, out_channels
kernel_in = np.array([
    [[[2, 0.1]], [[3, 0.2]]],
    [[[0, 0.3]], [[1, 0.4]]],
])
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='VALID')
x_in.shape, kernel_in.shape

However, this one fails:

# Input of shape (1, 3, 4, 4): the last dimension gives 4 channels
x_in = np.array([[
    [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]],
    [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]],
    [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]],
]])
x_in = tf.constant(x_in, dtype=tf.float32)
# Kernel of shape (2, 2, 1, 2): it expects 1 input channel
kernel_in = np.array([
    [[[2, 0.1]], [[3, 0.2]]],
    [[[0, 0.3]], [[1, 0.4]]],
])
kernel_in = tf.constant(kernel_in, dtype=tf.float32)
x_in, kernel_in
tf.nn.conv2d(input=x_in, filters=kernel_in, strides=[1, 1, 1, 1], padding='SAME')

`---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
in
19 kernel_in = tf.constant(kernel_in,dtype=tf.float32)
20 x_in,kernel_in
---> 21 tf.nn.conv2d(input=x_in,filters = kernel_in,strides=[1,1,1,1],padding='SAME')

~/anaconda3/envs/fish_detection_yoloV4/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py in conv2d_v2(input, filters, strides, padding, data_format, dilations, name)
1915 data_format=data_format,
1916 dilations=dilations,
-> 1917 name=name)
1918
1919

~/anaconda3/envs/fish_detection_yoloV4/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, data_format, dilations, name, filters)
2012 data_format=data_format,
2013 dilations=dilations,
-> 2014 name=name)
2015
2016

~/anaconda3/envs/fish_detection_yoloV4/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name)
935 pass # Add nodes to the TensorFlow graph.
936 except _core._NotOkStatusException as e:
--> 937 _ops.raise_from_not_ok_status(e, name)
938 # Add nodes to the TensorFlow graph.
939 if not isinstance(strides, (list, tuple)):

~/anaconda3/envs/fish_detection_yoloV4/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
6651 message = e.message + (" name: " + name if name is not None else "")
6652 # pylint: disable=protected-access
-> 6653 six.raise_from(core._status_to_exception(e.code, message), None)
6654 # pylint: enable=protected-access
6655

~/anaconda3/envs/fish_detection_yoloV4/lib/python3.6/site-packages/six.py in raise_from(value, from_value)

NotFoundError: No algorithm worked! [Op:Conv2D]`

We can find the difference in the code.
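To make that difference explicit, the shapes of the two inputs can be compared against the kernel without running TensorFlow at all. This is a rough sketch in plain Python; `nested_shape` and `channels_match` are hypothetical helper names, and it assumes the NHWC input layout and the `[height, width, in_channels, out_channels]` kernel layout that `tf.nn.conv2d` uses by default.

```python
def nested_shape(data):
    """Shape of a regular, non-empty nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0]
    return tuple(shape)


def channels_match(input_shape, kernel_shape):
    """Compare the input's last (channel) dim with the kernel's in_channels dim."""
    return input_shape[-1] == kernel_shape[2]


kernel = [[[[2, 0.1]], [[3, 0.2]]], [[[0, 0.3]], [[1, 0.4]]]]
working_input = [[
    [[2], [1], [2], [0], [1]],
    [[1], [3], [2], [2], [3]],
    [[1], [1], [3], [3], [0]],
    [[2], [2], [0], [1], [1]],
    [[0], [0], [3], [1], [2]],
]]
failing_input = [[[[0, 1, 0, 1] for _ in range(4)] for _ in range(3)]]

print(nested_shape(kernel))         # (2, 2, 1, 2)
print(nested_shape(working_input))  # (1, 5, 5, 1)
print(nested_shape(failing_input))  # (1, 3, 4, 4)
print(channels_match(nested_shape(working_input), nested_shape(kernel)))  # True
print(channels_match(nested_shape(failing_input), nested_shape(kernel)))  # False
```

The failing input carries 4 channels while the kernel expects 1, so the second call has no valid convolution to run.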

Yes, that's right, I made a mistake...

