Deeplabcut: cuDNN launch failure : input shape([1,3,395,536]) filter shape([7,7,3,64])

Created on 12 Apr 2018 · 15 Comments · Source: DeepLabCut/DeepLabCut

Hi there,
When trying to retrain the network using the example labels (just to test whether the installation is OK), I get a mismatch error like this:

(tensorflow) mic@mic-OptiPlex-9010:~/DeepLabCut/pose-tensorflow/models/reachingJan30-trainset95shuffle1/train$ TF_CUDNN_USE_AUTOTUNE=0 CUDA_VISIBLE_DEVICES=0 python3 ../../../train.py
/home/mic/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Config:
{'all_joints': [[0], [1], [2], [3]],
 'all_joints_names': ['hand', 'Finger1', 'Finger2', 'Joystick'],
 'batch_size': 1,
 'crop': False,
 'crop_pad': 0,
 'dataset': '../../UnaugmentedDataSet_reachingJan30/reaching_Mackenzie95shuffle1.mat',
 'dataset_type': 'default',
 'display_iters': 5000,
 'fg_fraction': 0.25,
 'global_scale': 0.8,
 'init_weights': '../../pretrained/resnet_v1_50.ckpt',
 'intermediate_supervision': False,
 'intermediate_supervision_layer': 12,
 'location_refinement': True,
 'locref_huber_loss': True,
 'locref_loss_weight': 0.05,
 'locref_stdev': 7.2801,
 'log_dir': 'log',
 'max_input_size': 1000,
 'mean_pixel': [123.68, 116.779, 103.939],
 'mirror': False,
 'multi_step': [[0.005, 10000],
                [0.02, 430000],
                [0.002, 730000],
                [0.001, 1030000]],
 'net_type': 'resnet_50',
 'num_joints': 4,
 'optimizer': 'sgd',
 'pos_dist_thresh': 17,
 'regularize': False,
 'save_iters': 50000,
 'scale_jitter_lo': 0.5,
 'scale_jitter_up': 1.5,
 'scoremap_dir': 'test',
 'shuffle': True,
 'snapshot_prefix': './snapshot',
 'stride': 8.0,
 'use_gt_segm': False,
 'video': False,
 'video_batch': False,
 'weigh_negatives': False,
 'weigh_only_present_joints': False,
 'weigh_part_predictions': False,
 'weight_decay': 0.0001}
2018-04-12 16:28:32.944642: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-12 16:28:32.944900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: Quadro K620 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.33GiB
2018-04-12 16:28:32.944919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-12 16:28:33.373499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-12 16:28:33.373536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-12 16:28:33.373543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-12 16:28:33.373694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1088 MB memory) -> physical GPU (device: 0, name: Quadro K620, pci bus id: 0000:01:00.0, compute capability: 5.0)
INFO:tensorflow:Restoring parameters from ../../pretrained/resnet_v1_50.ckpt
Restoring parameters from ../../pretrained/resnet_v1_50.ckpt
2018-04-12 16:28:38.363988: E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-04-12 16:28:38.364664: W ./tensorflow/stream_executor/stream.h:2018] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([1,3,395,536]) filter shape([7,7,3,64])
     [[Node: resnet_v1_50/conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_50/conv1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, resnet_v1_50/conv1/weights/read)]]
     [[Node: add/_763 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1602_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../../../train.py", line 140, in <module>
    train()
  File "../../../train.py", line 119, in train
    feed_dict={learning_rate: current_lr})
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([1,3,395,536]) filter shape([7,7,3,64])
     [[Node: resnet_v1_50/conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_50/conv1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, resnet_v1_50/conv1/weights/read)]]
     [[Node: add/_763 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1602_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'resnet_v1_50/conv1/Conv2D', defined at:
  File "../../../train.py", line 140, in <module>
    train()
  File "../../../train.py", line 85, in train
    losses = pose_net(cfg).train(batch)
  File "/home/mic/DeepLabCut/pose-tensorflow/nnet/pose_net.py", line 96, in train
    heads = self.get_net(batch[Batch.inputs])
  File "/home/mic/DeepLabCut/pose-tensorflow/nnet/pose_net.py", line 85, in get_net
    net, end_points = self.extract_features(inputs)
  File "/home/mic/DeepLabCut/pose-tensorflow/nnet/pose_net.py", line 58, in extract_features
    global_pool=False, output_stride=16,is_training=False)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_v1.py", line 274, in resnet_v1_50
    scope=scope)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_v1.py", line 205, in resnet_v1
    net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_utils.py", line 146, in conv2d_same
    scope=scope)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1049, in convolution
    outputs = layer.apply(inputs)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 825, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 714, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 168, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 870, in __call__
    return self.conv_op(inp, filter)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 522, in __call__
    return self.call(inp, filter)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 206, in __call__
    name=self.name)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 953, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): cuDNN launch failure : input shape([1,3,395,536]) filter shape([7,7,3,64])
     [[Node: resnet_v1_50/conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_50/conv1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, resnet_v1_50/conv1/weights/read)]]
     [[Node: add/_763 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1602_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Any hints greatly appreciated!

mschart

All 15 comments

Can you tell us about your operating system and your CUDA/TF installation?
---It seems your cuDNN library needs to be upgraded to match; what version are you using (and which TF)?
--Quadro K620 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0 totalMemory: 1.95GiB
(Also, we have never used the GPU you are using; you might check whether it has enough memory...)

(Quick hints, but we can look further into it.)

MMathisLab on 12 Apr 2018
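
For anyone reading the cuDNN line in the original log ("Loaded runtime CuDNN library: 7102 ... but source was compiled with 7005"), the integers can be decoded into ordinary version numbers. This is a minimal sketch, assuming the cuDNN 7 encoding `major*1000 + minor*100 + patch`; the function names are made up for illustration, and comparing only major and minor mirrors the "compatibility version" wording in the log rather than any confirmed TensorFlow rule.

```python
def decode_cudnn_version(code):
    """Split an integer such as 7102 into (major, minor, patch).

    Assumes the cuDNN 7 style encoding: major*1000 + minor*100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def same_major_minor(loaded, compiled):
    """True if both version codes share the same major and minor number."""
    return decode_cudnn_version(loaded)[:2] == decode_cudnn_version(compiled)[:2]


# Values taken from the log line in the question.
print(decode_cudnn_version(7102))      # runtime library that was loaded
print(decode_cudnn_version(7005))      # version TensorFlow was compiled against
print(same_major_minor(7102, 7005))
```

Read this way, the log reports cuDNN 7.1.2 at runtime against a TensorFlow binary built for 7.0.5, so installing a cuDNN 7.0.x build is one thing to try.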

Hi thanks, ok I guess the memory issue with that GPU will pop up later, if at all.

In [2]: tensorflow.__version__
Out[2]: '1.7.0'

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

It seems that my tensorflow with Cuda is ok, according to that test:

In [4]: # Creates a graph.
   ...: a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
   ...: b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
   ...: c = tf.matmul(a, b)
   ...: # Creates a session with log_device_placement set to True.
   ...: sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
   ...: # Runs the op.
   ...: print(sess.run(c))

/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Quadro K620, pci bus id: 0000:01:00.0, compute capability: 5.0

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-12 17:00:26.724641: I tensorflow/core/common_runtime/placer.cc:884] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-12 17:00:26.724667: I tensorflow/core/common_runtime/placer.cc:884] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-12 17:00:26.724687: I tensorflow/core/common_runtime/placer.cc:884] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]

mschart on 12 Apr 2018

Thanks for the quick response! Okay, let us check a bit more then. Just to be sure, you ran this first (not Step 1), correct? Step2_ConvertingLabels2DataFrame.py

(Also, I will double-check that it works with TF 1.7; I believe we tested 1.0-1.4.)

Updated:
We tested it up to TF 1.5; I changed the README to make this clearer at the top (it was only a note at the bottom). I'll leave the issue open and test 1.7, etc. Thanks!

MMathisLab on 12 Apr 2018

Wait, sorry. I ran Step 3 only:

In [7]: run Step3_CheckLabels.py
4
<map object at 0x7fc01ad0d470>
['hand', 'Finger1', 'Finger2', 'Joystick']
['reachingvideo1']
Creating images with labels by  Mackenzie

mschart on 12 Apr 2018

A few notes:
(1) We tested it up to TF 1.5; I changed the README to make this clearer at the top (it was only a note at the bottom).

(2) If a run properly starts, it will look like this:
mackenzie@c997c82acb00:~/DeepLabCut-master/pose-tensorflow/models/reachingJan30-trainset95shuffle1/train$ TF_CUDNN_USE_AUTOTUNE=0 CUDA_VISIBLE_DEVICES=0 python3 ../../../train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
.....

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:0c:00.0
Total memory: 7.92GiB
Free memory: 7.80GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0c:00.0)
iteration: 0 loss: 0.0002 lr: 0.005

(3) One of the authors (https://github.com/cellistigs) said he had an issue when his cuDNN version didn’t match his CUDA version.
There are some nice minimal examples on the tf tutorials page that can pick something like this up:

https://www.tensorflow.org/tutorials/deep_cnn

Specifically, running something like “cifar10_train.py” could expose a compatibility issue.

(4) TensorFlow 1.7 should use CUDA 9.1 --> http://www.python36.com/install-tensorflow141-gpu/

(today I confirmed that the code works with TensorFlow 1.5 and CUDA 9.0):

 $ cat /usr/local/cuda/version.txt

CUDA Version 9.0.176

hope that helps! I will close the issue now.

MMathisLab on 12 Apr 2018
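
The `version.txt` check above can also be scripted, for example to compare the installed CUDA release against the one a TensorFlow build expects. A minimal sketch; the function name `parse_cuda_version` is made up for illustration, and it assumes the file contains a line in the "CUDA Version X.Y.Z" format shown above.

```python
import re


def parse_cuda_version(text):
    """Extract (major, minor[, patch]) from text like 'CUDA Version 9.0.176'."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no CUDA version string found")
    # Drop the patch component if the file does not include one.
    return tuple(int(part) for part in match.groups() if part is not None)


# On a real machine the text would come from /usr/local/cuda/version.txt.
print(parse_cuda_version("CUDA Version 9.0.176"))
```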

Mhm, weird, with

Ubuntu 18.04
Python 3.6.4
tensorflow 1.5.0
CUDA Version 9.0.176

when typing

TF_CUDNN_USE_AUTOTUNE=0 CUDA_VISIBLE_DEVICES=0 python3 ../../../train.py

I still get

InternalError (see above for traceback): cuDNN launch failure : input shape([1,3,310,795]) filter shape([7,7,3,64])

Although GPU with tf seems fine:

Python 3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 18:10:19) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tensorflow as tf
/home/mic/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

In [2]: sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2018-05-16 11:36:30.499506: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-05-16 11:36:30.642311: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-16 11:36:30.642681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.20GiB
2018-05-16 11:36:30.642703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2018-05-16 11:36:30.911397: I tensorflow/core/common_runtime/direct_session.cc:297] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1

Downgrading tensorflow to 1.4 is easy, e.g. in anaconda:

pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.4.0-cp36-cp36m-linux_x86_64.whl

while downgrading to CUDA 8 on Ubuntu 18.04 is more involved. I first followed these instructions:

https://unix.stackexchange.com/questions/429549/cuda-on-debian-9-where-is-the-toolkit

Then update paths (from cuda 9 to 8):

$ export PATH="$PATH:/usr/local/cuda-8.0/bin"
$ export LD_LIBRARY_PATH="/usr/local/cuda-8.0/lib64"

Then downgrade cuDNN to version 6:

sudo dpkg -i libcudnn6_6.0.21-1+cuda8.0_amd64.deb

After that, training apparently starts:

~/DeepLabCut-master/pose-tensorflow/models/front15.05-trainset95shuffle1/train$ TF_CUDNN_USE_AUTOTUNE=0 CUDA_VISIBLE_DEVICES=0 python3 ../../../train.py
/home/mic/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
/home/mic/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Config:
{'all_joints': [[0], [1]],
 'all_joints_names': ['Finger1', 'Finger2'],
 'batch_size': 1,
 'crop': False,
 'crop_pad': 0,
 'dataset': '../../UnaugmentedDataSet_front15.05/front_Michael95shuffle1.mat',
 'dataset_type': 'default',
 'display_iters': 1000,
 'fg_fraction': 0.25,
 'global_scale': 0.8,
 'init_weights': '../../pretrained/resnet_v1_50.ckpt',
 'intermediate_supervision': False,
 'intermediate_supervision_layer': 12,
 'location_refinement': True,
 'locref_huber_loss': True,
 'locref_loss_weight': 0.05,
 'locref_stdev': 7.2801,
 'log_dir': 'log',
 'max_input_size': 1000,
 'mean_pixel': [123.68, 116.779, 103.939],
 'mirror': False,
 'multi_step': [[0.005, 10000],
                [0.02, 430000],
                [0.002, 730000],
                [0.001, 1030000]],
 'net_type': 'resnet_50',
 'num_joints': 2,
 'optimizer': 'sgd',
 'pos_dist_thresh': 17,
 'regularize': False,
 'save_iters': 50000,
 'scale_jitter_lo': 0.5,
 'scale_jitter_up': 1.5,
 'scoremap_dir': 'test',
 'shuffle': True,
 'snapshot_prefix': './snapshot',
 'stride': 8.0,
 'use_gt_segm': False,
 'video': False,
 'video_batch': False,
 'weigh_negatives': False,
 'weigh_only_present_joints': False,
 'weigh_part_predictions': False,
 'weight_decay': 0.0001}
2018-05-16 14:49:58.781010: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-05-16 14:49:58.920619: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-16 14:49:58.920996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.21GiB
2018-05-16 14:49:58.921019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/mic/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/training/python/training/training.py:412: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
From /home/mic/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/training/python/training/training.py:412: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Restoring parameters from ../../pretrained/resnet_v1_50.ckpt
Restoring parameters from ../../pretrained/resnet_v1_50.ckpt
iteration: 0 loss: 0.0008 lr: 0.005

mschart on 16 May 2018
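
Since the downgrade above depends on the two `export` lines taking effect in the shell that launches `train.py`, a quick check of the environment can save a failed run. This is only a sketch: `cuda_dirs_on_paths` is a hypothetical helper, and it assumes the usual `bin` and `lib64` subdirectories under the CUDA install directory.

```python
import os
import posixpath


def cuda_dirs_on_paths(cuda_home, environ=None):
    """Return (bin_on_PATH, lib64_on_LD_LIBRARY_PATH) for a CUDA install dir."""
    environ = os.environ if environ is None else environ
    bin_dir = posixpath.join(cuda_home, "bin")
    lib_dir = posixpath.join(cuda_home, "lib64")
    # Both variables are colon-separated lists on Linux.
    path_entries = environ.get("PATH", "").split(":")
    lib_entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    return bin_dir in path_entries, lib_dir in lib_entries


# Checks the current shell environment for the CUDA 8.0 directories.
print(cuda_dirs_on_paths("/usr/local/cuda-8.0"))
```

If either value is False, the exports were probably made in a different shell or overwritten later in `.bashrc`.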

Yep, looks promising! We plan to share a Docker image soon that reproduces our environment and can simply be run, which should make the whole installation process less painful.

AlexEMG on 16 May 2018

Great installation advice for Tensorflow with GPU on Ubuntu
https://medium.com/@ikekramer/installing-cuda-8-0-and-cudnn-5-1-on-ubuntu-16-04-6b9f284f6e77

mschart on 20 Sep 2018

👍1

I find myself coming back to these installation notes whenever I set DLC up on a fresh Ubuntu install (the Docker solution didn't work for various reasons on various machines, so I gave up on it). Here are the key steps to get TensorFlow working with DLC on Ubuntu 18:

After setting up the Anaconda environment (DLCdependencies):

Install tf 1.8:
pip3 install --upgrade tensorflow-gpu==1.8

Install cuda 9.0:
bash cuda_9.0.176_384.81_linux.run
from https://developer.nvidia.com/cuda-toolkit-archive
(with driver 390, i.e. don't install the newest GPU driver; if problems arise, see the full documentation here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)

Install cuDNN 7 (file at https://developer.nvidia.com/rdp/cudnn-download, instructions at https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html):
Navigate to your directory containing cuDNN.
Unzip the cuDNN package.
$ tar -xzvf cudnn-9.0-linux-x64-v7.tgz
Copy the following files into the CUDA Toolkit directory, and change the file permissions.

$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Add to .bashrc:
export PATH="$PATH:/usr/local/cuda/bin"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64"

mschart on 11 Oct 2018

👍4
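
After copying the files, it can be worth confirming which cuDNN version actually ended up in the CUDA directory. A minimal sketch, assuming the cuDNN 7 header layout in which `cudnn.h` defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`; the function name is made up for illustration, and the sample text stands in for the real header.

```python
import re


def cudnn_version_from_header(header_text):
    """Read (major, minor, patch) from the #define lines of a cuDNN header."""
    values = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % name, header_text)
        if match is None:
            raise ValueError("CUDNN_%s not found in header" % name)
        values.append(int(match.group(1)))
    return tuple(values)


# Stand-in for the contents of /usr/local/cuda/include/cudnn.h.
sample_header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
"""
print(cudnn_version_from_header(sample_header))
```

On a real install, pass `open("/usr/local/cuda/include/cudnn.h").read()` instead of the sample text.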

Thanks!! I’ll add it to the installation guide in the next update

MMathisLab on 12 Oct 2018

Just try decreasing the batch size; it will work if all the GPU memory is in use (check with nvidia-smi). Otherwise, set os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'. I had the same issue, and this solved it.

jaiprasadreddy on 27 Oct 2018

👍4
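
For context, a sketch of where that line would go. The environment variable has to be set before TensorFlow initializes the GPU, so it belongs at the very top of the script, ahead of the TensorFlow import. The `make_session_with_growth` helper is hypothetical and uses the TF 1.x session API as an alternative, in case the installed release does not honor the variable.

```python
import os

# Set this before TensorFlow is imported, so the GPU allocator sees it.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def make_session_with_growth():
    """TF 1.x alternative: request on-demand GPU memory via the session config."""
    import tensorflow as tf  # imported here, after the variable is set

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate GPU memory as needed
    return tf.Session(config=config)


print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

Either way, TensorFlow then grows its GPU allocation as needed instead of reserving most of the memory up front, which matters on small cards like the 2 GB Quadro K620 in the original report.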

I solved the issue by adding
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
in the script

AdriandLiu on 9 Sep 2019

👍12 ❤1

I solved the issue by adding
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
in the script

thank you for your answer

yang9599 on 28 Oct 2019

I ran into the same problem. I set the data format to _channels_last_, but the error reports an input shape of (256,1,120,120), like your ([1,3,395,536]), which is in _channels_first_ format. I don't know whether you noticed that, and I don't know why it happens.
Did you eventually fix this problem?
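
One possible explanation for the channels-first shapes: the failing node in the original traceback takes its input from `Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer`, which suggests TensorFlow transposed the tensor to NCHW on the GPU even though the model was defined as channels-last. The sketch below only shows how the reported shapes map back to channels-last; the function names are made up for illustration.

```python
def nchw_to_nhwc(shape):
    """Reorder [batch, channels, height, width] to [batch, height, width, channels]."""
    n, c, h, w = shape
    return [n, h, w, c]


def nhwc_to_nchw(shape):
    """Reorder [batch, height, width, channels] to [batch, channels, height, width]."""
    n, h, w, c = shape
    return [n, c, h, w]


print(nchw_to_nhwc([1, 3, 395, 536]))    # shape from the original report
print(nchw_to_nhwc([256, 1, 120, 120]))  # shape from the comment above
```

So the shape in the error message by itself does not mean the data format setting was ignored.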