Hi there,
When trying to retrain the network using the example labels, just to test whether the installation is OK, I get a mismatch error like this:
(tensorflow) mic@mic-OptiPlex-9010:~/DeepLabCut/pose-tensorflow/models/reachingJan30-trainset95shuffle1/train$ TF_CUDNN_USE_AUTOTUNE=0 CUDA_VISIBLE_DEVICES=0 python3 ../../../train.py
/home/mic/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Config:
{'all_joints': [[0], [1], [2], [3]],
'all_joints_names': ['hand', 'Finger1', 'Finger2', 'Joystick'],
'batch_size': 1,
'crop': False,
'crop_pad': 0,
'dataset': '../../UnaugmentedDataSet_reachingJan30/reaching_Mackenzie95shuffle1.mat',
'dataset_type': 'default',
'display_iters': 5000,
'fg_fraction': 0.25,
'global_scale': 0.8,
'init_weights': '../../pretrained/resnet_v1_50.ckpt',
'intermediate_supervision': False,
'intermediate_supervision_layer': 12,
'location_refinement': True,
'locref_huber_loss': True,
'locref_loss_weight': 0.05,
'locref_stdev': 7.2801,
'log_dir': 'log',
'max_input_size': 1000,
'mean_pixel': [123.68, 116.779, 103.939],
'mirror': False,
'multi_step': [[0.005, 10000],
[0.02, 430000],
[0.002, 730000],
[0.001, 1030000]],
'net_type': 'resnet_50',
'num_joints': 4,
'optimizer': 'sgd',
'pos_dist_thresh': 17,
'regularize': False,
'save_iters': 50000,
'scale_jitter_lo': 0.5,
'scale_jitter_up': 1.5,
'scoremap_dir': 'test',
'shuffle': True,
'snapshot_prefix': './snapshot',
'stride': 8.0,
'use_gt_segm': False,
'video': False,
'video_batch': False,
'weigh_negatives': False,
'weigh_only_present_joints': False,
'weigh_part_predictions': False,
'weight_decay': 0.0001}
2018-04-12 16:28:32.944642: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-12 16:28:32.944900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Quadro K620 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.33GiB
2018-04-12 16:28:32.944919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-12 16:28:33.373499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-12 16:28:33.373536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-04-12 16:28:33.373543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-04-12 16:28:33.373694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1088 MB memory) -> physical GPU (device: 0, name: Quadro K620, pci bus id: 0000:01:00.0, compute capability: 5.0)
INFO:tensorflow:Restoring parameters from ../../pretrained/resnet_v1_50.ckpt
Restoring parameters from ../../pretrained/resnet_v1_50.ckpt
2018-04-12 16:28:38.363988: E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-04-12 16:28:38.364664: W ./tensorflow/stream_executor/stream.h:2018] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
status, run_metadata)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([1,3,395,536]) filter shape([7,7,3,64])
[[Node: resnet_v1_50/conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_50/conv1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, resnet_v1_50/conv1/weights/read)]]
[[Node: add/_763 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1602_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "../../../train.py", line 140, in <module>
train()
File "../../../train.py", line 119, in train
feed_dict={learning_rate: current_lr})
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1140, in _run
feed_dict_tensor, options, run_metadata)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
run_metadata)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([1,3,395,536]) filter shape([7,7,3,64])
[[Node: resnet_v1_50/conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_50/conv1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, resnet_v1_50/conv1/weights/read)]]
[[Node: add/_763 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1602_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'resnet_v1_50/conv1/Conv2D', defined at:
File "../../../train.py", line 140, in <module>
train()
File "../../../train.py", line 85, in train
losses = pose_net(cfg).train(batch)
File "/home/mic/DeepLabCut/pose-tensorflow/nnet/pose_net.py", line 96, in train
heads = self.get_net(batch[Batch.inputs])
File "/home/mic/DeepLabCut/pose-tensorflow/nnet/pose_net.py", line 85, in get_net
net, end_points = self.extract_features(inputs)
File "/home/mic/DeepLabCut/pose-tensorflow/nnet/pose_net.py", line 58, in extract_features
global_pool=False, output_stride=16,is_training=False)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_v1.py", line 274, in resnet_v1_50
scope=scope)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_v1.py", line 205, in resnet_v1
net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_utils.py", line 146, in conv2d_same
scope=scope)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1049, in convolution
outputs = layer.apply(inputs)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 825, in apply
return self.__call__(inputs, *args, **kwargs)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 714, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 168, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 870, in __call__
return self.conv_op(inp, filter)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 522, in __call__
return self.call(inp, filter)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 206, in __call__
name=self.name)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 953, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
op_def=op_def)
File "/home/mic/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): cuDNN launch failure : input shape([1,3,395,536]) filter shape([7,7,3,64])
[[Node: resnet_v1_50/conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_50/conv1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, resnet_v1_50/conv1/weights/read)]]
[[Node: add/_763 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1602_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Any hints greatly appreciated!
Can you tell us about the operating system and the CUDA/TF installation you have?
It seems your cuDNN library needs to be upgraded to match; which version are you using (and which TF)?
Quadro K620 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0 totalMemory: 1.95GiB
(Also, we have never used the GPU you are using; you might check whether it has enough memory...)
(These are quick hints, but we can look further into it.)
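For reference, the cuDNN error line in the log above reports the runtime library as 7102 and the compiled-against version as 7005. A minimal sketch for reading those integers, assuming cuDNN's usual encoding of major * 1000 + minor * 100 + patch; the helper names are hypothetical, and the comparison only mirrors the "compatibility version" figures printed in the log (7100 and 7000), not necessarily TensorFlow's exact check:

```python
def decode_cudnn_version(code):
    """Split an integer such as 7102 into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def compatibility_version(code):
    """Zero the patch digits, as the 'compatibility version' in the log appears to do."""
    return code - code % 100


# Values copied from the error line in this thread.
loaded, compiled = 7102, 7005

print("runtime :", decode_cudnn_version(loaded), compatibility_version(loaded))
print("compiled:", decode_cudnn_version(compiled), compatibility_version(compiled))
print("match   :", compatibility_version(loaded) == compatibility_version(compiled))
```

Read this way, the machine has cuDNN 7.1.2 installed while the TensorFlow binary was built against 7.0.5, which is the mismatch the error message complains about.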
Hi thanks, ok I guess the memory issue with that GPU will pop up later, if at all.
In [2]: tensorflow.__version__
Out[2]: '1.7.0'
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
It seems that my tensorflow with Cuda is ok, according to that test:
In [4]: # Creates a graph.
...: a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
...: b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
...: c = tf.matmul(a, b)
...: # Creates a session with log_device_placement set to True.
...: sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
...: # Runs the op.
...: print(sess.run(c))
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Quadro K620, pci bus id: 0000:01:00.0, compute capability: 5.0
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-12 17:00:26.724641: I tensorflow/core/common_runtime/placer.cc:884] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-12 17:00:26.724667: I tensorflow/core/common_runtime/placer.cc:884] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-12 17:00:26.724687: I tensorflow/core/common_runtime/placer.cc:884] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
[49. 64.]]
Thanks for the quick response! Okay, let us check a bit more then. Just to be sure, you ran this first (not Step 1), correct? Step2_ConvertingLabels2DataFrame.py
(Also, I will double-check that it works with TF 1.7; I believe we tested 1.0-1.4)...
Update:
We tested it up to TF 1.5; I changed the README to make this clearer at the top (it was only a note at the bottom). I'll leave the issue open and test 1.7, etc. Thanks!
Wait, sorry. I ran Step 3 only:
In [7]: run Step3_CheckLabels.py
4
<map object at 0x7fc01ad0d470>
['hand', 'Finger1', 'Finger2', 'Joystick']
['reachingvideo1']
Creating images with labels by Mackenzie
A few notes:
(1) we tested it up to TF 1.5; I changed the README to be more clear at the top (was a note only in the bottom)
(2) If a run properly starts, it will look like this:
mackenzie@c997c82acb00:~/DeepLabCut-master/pose-tensorflow/models/reachingJan30-trainset95shuffle1/train$ TF_CUDNN_USE_AUTOTUNE=0 CUDA_VISIBLE_DEVICES=0 python3 ../../../train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
.....
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:0c:00.0
Total memory: 7.92GiB
Free memory: 7.80GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0c:00.0)
iteration: 0 loss: 0.0002 lr: 0.005
(3) One of the authors (https://github.com/cellistigs) said he had an issue when his cuDNN version didn’t match his CUDA version.
There are some nice minimal examples on the tf tutorials page that can pick something like this up:
https://www.tensorflow.org/tutorials/deep_cnn
Specifically, running something like “cifar10_train.py” could expose a compatibility issue.
(4) TensorFlow 1.7 should use CUDA 9.1 --> http://www.python36.com/install-tensorflow141-gpu/
(Today I confirmed that the code works with TensorFlow 1.5 and CUDA 9.0):
$ cat /usr/local/cuda/version.txt
CUDA Version 9.0.176
hope that helps! I will close the issue now.
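The two CUDA version reports quoted in this thread (the last line of `nvcc --version` and the contents of `/usr/local/cuda/version.txt`) can be reduced to a comparable (major, minor) pair when collecting details for a report like this one. A small sketch; `parse_cuda_version` is a hypothetical helper and the sample strings are copied from the posts above:

```python
import re


def parse_cuda_version(text):
    """Extract (major, minor) from `nvcc --version` output or from version.txt."""
    # nvcc prints e.g. "Cuda compilation tools, release 9.0, V9.0.176"
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        # version.txt holds e.g. "CUDA Version 9.0.176"
        match = re.search(r"CUDA Version\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


nvcc_output = "Cuda compilation tools, release 9.0, V9.0.176"
version_txt = "CUDA Version 9.0.176"

print(parse_cuda_version(nvcc_output))
print(parse_cuda_version(version_txt))
```

Both inputs describe the same toolkit release, so the two calls should agree; a disagreement on a real machine would suggest that `nvcc` on the PATH and `/usr/local/cuda` point at different installations.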
Mhm, weird, with
Ubuntu 18.04
Python 3.6.4
tensorflow 1.5.0
CUDA Version 9.0.176
when typing
TF_CUDNN_USE_AUTOTUNE=0 CUDA_VISIBLE_DEVICES=0 python3 ../../../train.py
I still get
InternalError (see above for traceback): cuDNN launch failure : input shape([1,3,310,795]) filter shape([7,7,3,64])
Although GPU with tf seems fine:
Python 3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 18:10:19)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import tensorflow as tf
/home/mic/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
In [2]: sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2018-05-16 11:36:30.499506: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-05-16 11:36:30.642311: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-16 11:36:30.642681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.20GiB
2018-05-16 11:36:30.642703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2018-05-16 11:36:30.911397: I tensorflow/core/common_runtime/direct_session.cc:297] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
Downgrading tensorflow to 1.4 is easy, e.g. in anaconda:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.4.0-cp36-cp36m-linux_x86_64.whl
while downgrading to CUDA 8 on Ubuntu 18.04 is more involved. I first followed these instructions:
https://unix.stackexchange.com/questions/429549/cuda-on-debian-9-where-is-the-toolkit
Then update paths (from cuda 9 to 8):
$ export PATH="$PATH:/usr/local/cuda-8.0/bin"
$ export LD_LIBRARY_PATH="/usr/local/cuda-8.0/lib64"
Then downgrade cuDNN to version 6:
sudo dpkg -i libcudnn6_6.0.21-1+cuda8.0_amd64.deb
After that, training apparently starts:
~/DeepLabCut-master/pose-tensorflow/models/front15.05-trainset95shuffle1/train$ TF_CUDNN_USE_AUTOTUNE=0 CUDA_VISIBLE_DEVICES=0 python3 ../../../train.py
/home/mic/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
/home/mic/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Config:
{'all_joints': [[0], [1]],
'all_joints_names': ['Finger1', 'Finger2'],
'batch_size': 1,
'crop': False,
'crop_pad': 0,
'dataset': '../../UnaugmentedDataSet_front15.05/front_Michael95shuffle1.mat',
'dataset_type': 'default',
'display_iters': 1000,
'fg_fraction': 0.25,
'global_scale': 0.8,
'init_weights': '../../pretrained/resnet_v1_50.ckpt',
'intermediate_supervision': False,
'intermediate_supervision_layer': 12,
'location_refinement': True,
'locref_huber_loss': True,
'locref_loss_weight': 0.05,
'locref_stdev': 7.2801,
'log_dir': 'log',
'max_input_size': 1000,
'mean_pixel': [123.68, 116.779, 103.939],
'mirror': False,
'multi_step': [[0.005, 10000],
[0.02, 430000],
[0.002, 730000],
[0.001, 1030000]],
'net_type': 'resnet_50',
'num_joints': 2,
'optimizer': 'sgd',
'pos_dist_thresh': 17,
'regularize': False,
'save_iters': 50000,
'scale_jitter_lo': 0.5,
'scale_jitter_up': 1.5,
'scoremap_dir': 'test',
'shuffle': True,
'snapshot_prefix': './snapshot',
'stride': 8.0,
'use_gt_segm': False,
'video': False,
'video_batch': False,
'weigh_negatives': False,
'weigh_only_present_joints': False,
'weigh_part_predictions': False,
'weight_decay': 0.0001}
2018-05-16 14:49:58.781010: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-05-16 14:49:58.920619: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-16 14:49:58.920996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.21GiB
2018-05-16 14:49:58.921019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/mic/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/training/python/training/training.py:412: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
From /home/mic/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/training/python/training/training.py:412: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Restoring parameters from ../../pretrained/resnet_v1_50.ckpt
Restoring parameters from ../../pretrained/resnet_v1_50.ckpt
iteration: 0 loss: 0.0008 lr: 0.005
Yep, looks promising! We plan to share a docker image soon that can just be run and reproduces our environment and should make the whole installation process less painful.
Great installation advice for Tensorflow with GPU on Ubuntu
https://medium.com/@ikekramer/installing-cuda-8-0-and-cudnn-5-1-on-ubuntu-16-04-6b9f284f6e77
I find myself coming back to these installation notes whenever I set DLC up on a fresh Ubuntu install (the docker solution didn't work for various reasons on various machines, so I gave up on it). Here are the key steps to get tensorflow to work with DLC on Ubuntu 18,
after setting up the anaconda environment (DLCdependencies):
Install tf 1.8:
pip3 install --upgrade tensorflow-gpu==1.8
Install cuda 9.0:
bash cuda_9.0.176_384.81_linux.run
from https://developer.nvidia.com/cuda-toolkit-archive
(with driver 390 - i.e. don't install newest GPU driver; if problems arise, see full documentation here: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html)
Install cuDNN 7 (file at https://developer.nvidia.com/rdp/cudnn-download, instructions at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html):
Navigate to your
Unzip the cuDNN package.
$ tar -xzvf cudnn-9.0-linux-x64-v7.tgz
Copy the following files into the CUDA Toolkit directory, and change the file permissions.
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Add to .bashrc:
export PATH="$PATH:/usr/local/cuda/bin"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64"
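After copying the files, it can be worth confirming which cuDNN version the installed header declares. The sketch below assumes a cuDNN 7 style `cudnn.h` that defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`; the function name is hypothetical and the sample text is an illustrative fragment, not a copy of a real header:

```python
import re


def cudnn_version_from_header(header_text):
    """Return (major, minor, patch) from the #define lines of a cuDNN 7 style cudnn.h."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + key + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Illustrative fragment; on a real machine, read /usr/local/cuda/include/cudnn.h instead.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""

print(cudnn_version_from_header(sample))
```

To check an actual installation, pass the contents of the copied header, for example `open("/usr/local/cuda/include/cudnn.h").read()`, in place of `sample`.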
Thanks!! I’ll add it to the installation guide in the next update
Just try decreasing the batch size; that will work if all the GPU memory is already in use (check with nvidia-smi). Otherwise, set os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'. I had the same issue, and this solved it.
I solved the issue by adding
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
in the script
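A minimal sketch of where that line would go, assuming a TensorFlow release that recognises this variable (older 1.x releases may not). The variable asks TensorFlow to grow its GPU memory allocation on demand instead of reserving most of the card at start-up, so it should be set before TensorFlow initialises the GPU; setting it before the import is the cautious choice:

```python
import os

# Set this before TensorFlow is imported so it is in place when the GPU is initialised.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf   # import TensorFlow only after the variable is set

print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

In TensorFlow 1.x the same behaviour can also be requested in code by setting `gpu_options.allow_growth = True` on the `tf.ConfigProto` passed to `tf.Session`.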
thank you for your answer
I ran into the same problem. The data format I set was _channels_last_, but the error reports an input shape of (256,1,120,120), like your ([1,3,395,536]), which is in _channels_first_ format. Did you notice that? I don't know why it happens.
Did you fix this problem eventually?
Reducing the Batch_size worked for me, thanks @jaiprasadreddy !