System Information
OS Platform and Distribution: Linux Ubuntu 14.04 LTS
TensorFlow installed from: pip (tensorflow-gpu)
TensorFlow version: ('v1.2.0-rc2-21-g12f033d', '1.2.0')
CUDA/cuDNN version: CUDA 8.0, cuDNN 5.1
I'm trying to run the TF-Slim image classifier from the following URL:
https://github.com/tensorflow/models/tree/master/slim
python train_image_classifier.py \
    --train_dir=/home/sk/workspace/slim/datasets/log \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --dataset_dir=/home/sk/workspace/slim/datasets/flowers \
    --model_name=inception_v3
Below is the error. I thought it was a GPU hardware problem, but the same setup works perfectly fine with TFLearn and with the TensorFlow example code; only TF-Slim fails. I reinstalled TensorFlow and cuDNN, but I can't fix it.
What is the cause of this problem?
Caused by op u'InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1', defined at:
File "/home/sk/workspace/slim/train_image_classifier.py", line 573, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sk/workspace/slim/train_image_classifier.py", line 539, in main
global_step=global_step)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
self._create_slots([_get_variable_for(v) for v in var_list])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/rmsprop.py", line 103, in _create_slots
self._zeros_slot(v, "momentum", self._name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 714, in _get_single_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 197, in __init__
expected_shape=expected_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 281, in _init_from_args
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 128, in variable_op_v2
shared_name=shared_name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 708, in _variable_v2
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
Colocation Debug Info:
Colocation group had the following types and devices:
ApplyRMSProp: CPU
Const: CPU
Assign: CPU
IsVariableInitialized: CPU
Identity: CPU
VariableV2: CPU
[[Node: InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1 = VariableV2[_class=["loc:@InceptionV3/Logits/Conv2d_1c_1x1/biases"], container="", dtype=DT_FLOAT, shape=[3], shared_name="", _device="/device:GPU:0"]()]]
nvidia-smi output:
+-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 0000:01:00.0 On | N/A |
| 23% 30C P8 10W / 250W | 313MiB / 12181MiB | 5% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1056 G /usr/lib/xorg/Xorg 212MiB |
| 0 1655 G compiz 98MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
Is there an earlier part of your log where you can see TF finding and mapping the GPU as device:GPU:0?
sudo python train_image_classifier.py --train_dir=/home/sk/flower_train --dataset_name=flowers --dataset_split_name=train --dataset_dir=/home/sk/flowers --model_name=inception_v3
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
2017-07-18 18:01:45.645834: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-18 18:01:45.645851: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-18 18:01:45.645869: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-18 18:01:45.645873: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-18 18:01:45.645877: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/device:GPU:0"](gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape, gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape_1)]]
Caused by op u'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs', defined at:
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 533, in main
var_list=variables_to_train)
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 297, in optimize_clones
optimizer, clone, num_clones, regularization_losses, **kwargs)
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 261, in _optimize_clone
clone_grad = optimizer.compute_gradients(sum_loss, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 346, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 650, in _SubGrad
rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 395, in _broadcast_gradient_args
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
...which was originally created as op u'InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub', defined at:
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
[elided 0 identical lines from previous traceback]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 472, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 455, in clone_fn
logits, end_points = network_fn(images)
File "/home/sk/workspace/models/slim/nets/nets_factory.py", line 108, in network_fn
return func(images, num_classes, is_training=is_training)
File "/home/sk/workspace/models/slim/nets/inception_v3.py", line 483, in inception_v3
depth_multiplier=depth_multiplier)
File "/home/sk/workspace/models/slim/nets/inception_v3.py", line 104, in inception_v3_base
net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 956, in convolution
outputs = normalizer_fn(outputs, **normalizer_params)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 554, in batch_norm
outputs = layer.apply(inputs, training=is_training)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 492, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 441, in __call__
outputs = self.call(inputs, *args, **kwargs)
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/device:GPU:0"](gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape, gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape_1)]]
Traceback (most recent call last):
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 569, in main
sync_optimizer=optimizer if FLAGS.sync_replicas else None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 732, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
start_standard_services=start_standard_services)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 279, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/device:GPU:0"](gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape, gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape_1)]]
Caused by op u'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs', defined at:
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 533, in main
var_list=variables_to_train)
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 297, in optimize_clones
optimizer, clone, num_clones, regularization_losses, **kwargs)
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 261, in _optimize_clone
clone_grad = optimizer.compute_gradients(sum_loss, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 346, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 650, in _SubGrad
rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 395, in _broadcast_gradient_args
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
...which was originally created as op u'InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub', defined at:
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
[elided 0 identical lines from previous traceback]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 472, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 455, in clone_fn
logits, end_points = network_fn(images)
File "/home/sk/workspace/models/slim/nets/nets_factory.py", line 108, in network_fn
return func(images, num_classes, is_training=is_training)
File "/home/sk/workspace/models/slim/nets/inception_v3.py", line 483, in inception_v3
depth_multiplier=depth_multiplier)
File "/home/sk/workspace/models/slim/nets/inception_v3.py", line 104, in inception_v3_base
net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 956, in convolution
outputs = normalizer_fn(outputs, **normalizer_params)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 554, in batch_norm
outputs = layer.apply(inputs, training=is_training)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 492, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 441, in __call__
outputs = self.call(inputs, *args, **kwargs)
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/device:GPU:0"](gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape, gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape_1)]]
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/Gather:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "train_image_classifier.py", line 573, in <module>\n tf.app.run()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthrough))', 'File "train_image_classifier.py", line 569, in main\n sync_optimizer=optimizer if FLAGS.sync_replicas else None)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 655, in train\n ready_op = tf_variables.report_uninitialized_variables()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped\n return _add_should_use_warning(fn(*args, **kwargs))', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 139, in _add_should_use_warning\n wrapped = TFShouldUseWarningWrapper(x)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 96, in __init__\n stack = [s.strip() for s in traceback.format_stack()]']
==================================
@cy89
I think you're using a CPU-only TensorFlow build. Usually it prints out device information at startup. Closing, since this is really a usage problem.
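If you want to confirm what TensorFlow actually registers, a quick sanity check is to list the local devices from the same Python environment you train in; a minimal sketch:

from tensorflow.python.client import device_lib

# Prints one entry per device TensorFlow has registered; a working GPU build
# should show a device whose name ends in "GPU:0" in addition to the CPU.
print(device_lib.list_local_devices())

If only the CPU shows up here, the error above is expected: the graph pins ops to /device:GPU:0, but no such device exists in the process.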
I'm facing the same bug on macOS.
Based on the disclaimer on the install page, it seems that GPU support is not possible:
Note: As of version 1.2, TensorFlow no longer provides GPU support on Mac OS X.
I suppose the same applies to current macOS versions.
TF-Slim, launched with the command documented in the README.md, fails with the same trace as @sukkyusun:
# in a TensorFlow virtualenv with Python 2.7
DATASET_DIR=/tmp/data/flowers
TRAIN_DIR=/tmp/flowers-models/inception_v3
CHECKPOINT_PATH=/tmp/my_checkpoints/inception_v3.ckpt
python train_image_classifier.py \
--train_dir=${TRAIN_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=flowers \
--dataset_split_name=train \
--model_name=inception_v3 \
--checkpoint_path=${CHECKPOINT_PATH} \
--checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
--trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
@drpngx, can you point out where I'm making a mistake in the way I use the provided scripts?
Thanks for your time.
Sorry, macOS is a bit tricky to support; you're basically on your own there. The stack trace just says you don't have a GPU available. You can build from source and look at the output; it should print messages about detected GPU devices.
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
Same error with TensorFlow 1.4 on Ubuntu.
Uninstalling tensorflow-gpu==1.3.0 and installing tensorflow-gpu==1.2.1 worked for me! Thanks.
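For anyone trying the same downgrade, the usual pip steps would look roughly like this (adjust the version to whatever wheel matches your CUDA/cuDNN setup):

pip uninstall tensorflow-gpu
pip install tensorflow-gpu==1.2.1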
Try:

import tensorflow as tf

# Soft placement lets ops that were pinned to a missing GPU fall back to the CPU.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    ...
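Note that in train_image_classifier.py the session is created inside slim.learning.train, so the config has to reach that call rather than a Session you open yourself; a sketch of the relevant fragment (train_op and FLAGS.train_dir refer to objects the script already builds, so this is not a standalone program):

import tensorflow as tf
import tensorflow.contrib.slim as slim

# Sketch: hand the soft-placement config to TF-Slim's training loop via the
# session_config argument; `train_op` and `FLAGS.train_dir` come from the script.
session_config = tf.ConfigProto(allow_soft_placement=True)
slim.learning.train(
    train_op,
    logdir=FLAGS.train_dir,
    session_config=session_config)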
I found another trick:
In the train_image_classifier.py script, at line 40, there is:
tf.app.flags.DEFINE_boolean('clone_on_cpu', False, 'Use CPUs to deploy clones.')
Set this flag to True if you want to use CPUs instead of GPUs (deploying on the GPU seems to be the default).
Hope it will help.
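Since clone_on_cpu is an ordinary tf.app flag, you should also be able to override it on the command line instead of editing the script; a sketch reusing the flags from the invocation above:

python train_image_classifier.py \
    --clone_on_cpu=True \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --model_name=inception_v3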
I found another trick:
In the train_image_classifier.py script, at line 40, there is:
tf.app.flags.DEFINE_boolean('clone_on_cpu', False, 'Use CPUs to deploy clones.')
Set this flag to True if you want to use CPUs instead of GPUs (deploying on the GPU seems to be the default). Hope it will help.
This fixed the error for me.
Thanks