System Information
OS Platform and Distribution: Linux Ubuntu 14.04 LTS
TensorFlow installed from: pip (tensorflow-gpu)
TensorFlow version: ('v1.2.0-rc2-21-g12f033d', '1.2.0')
CUDA/cuDNN version: CUDA 8.0, cuDNN 5.1
I'm trying to run the TF-Slim image classifier from the following URL:
https://github.com/tensorflow/models/tree/master/slim
python train_image_classifier.py \
    --train_dir=/home/sk/workspace/slim/datasets/log \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --dataset_dir=/home/sk/workspace/slim/datasets/flowers \
    --model_name=inception_v3
Below is the error. I thought it was a GPU hardware problem, but the same setup works perfectly fine with TFLearn and with the TensorFlow example code; only TF-Slim fails. I reinstalled TensorFlow and cuDNN, but I can't fix it.
What is the cause of this problem?
Caused by op u'InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1', defined at:
File "/home/sk/workspace/slim/train_image_classifier.py", line 573, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sk/workspace/slim/train_image_classifier.py", line 539, in main
global_step=global_step)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
self._create_slots([_get_variable_for(v) for v in var_list])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/rmsprop.py", line 103, in _create_slots
self._zeros_slot(v, "momentum", self._name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 714, in _get_single_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 197, in __init__
expected_shape=expected_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 281, in _init_from_args
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 128, in variable_op_v2
shared_name=shared_name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 708, in _variable_v2
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
Colocation Debug Info:
Colocation group had the following types and devices:
ApplyRMSProp: CPU
Const: CPU
Assign: CPU
IsVariableInitialized: CPU
Identity: CPU
VariableV2: CPU
[[Node: InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1 = VariableV2[_class=["loc:@InceptionV3/Logits/Conv2d_1c_1x1/biases"], container="", dtype=DT_FLOAT, shape=[3], shared_name="", _device="/device:GPU:0"]()]]
nvidia-smi output:
+-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 0000:01:00.0 On | N/A |
| 23% 30C P8 10W / 250W | 313MiB / 12181MiB | 5% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1056 G /usr/lib/xorg/Xorg 212MiB |
| 0 1655 G compiz 98MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
Is there an earlier part of your log where you can see TF finding and mapping the GPU as device:GPU:0?
sudo python train_image_classifier.py --train_dir=/home/sk/flower_train --dataset_name=flowers --dataset_split_name=train --dataset_dir=/home/sk/flowers --model_name=inception_v3
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
2017-07-18 18:01:45.645834: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-18 18:01:45.645851: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-18 18:01:45.645869: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-18 18:01:45.645873: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-18 18:01:45.645877: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/device:GPU:0"](gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape, gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape_1)]]
Caused by op u'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs', defined at:
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 533, in main
var_list=variables_to_train)
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 297, in optimize_clones
optimizer, clone, num_clones, regularization_losses, **kwargs)
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 261, in _optimize_clone
clone_grad = optimizer.compute_gradients(sum_loss, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 346, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 650, in _SubGrad
rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 395, in _broadcast_gradient_args
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
...which was originally created as op u'InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub', defined at:
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
[elided 0 identical lines from previous traceback]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 472, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 455, in clone_fn
logits, end_points = network_fn(images)
File "/home/sk/workspace/models/slim/nets/nets_factory.py", line 108, in network_fn
return func(images, num_classes, is_training=is_training)
File "/home/sk/workspace/models/slim/nets/inception_v3.py", line 483, in inception_v3
depth_multiplier=depth_multiplier)
File "/home/sk/workspace/models/slim/nets/inception_v3.py", line 104, in inception_v3_base
net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 956, in convolution
outputs = normalizer_fn(outputs, **normalizer_params)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 554, in batch_norm
outputs = layer.apply(inputs, training=is_training)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 492, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 441, in __call__
outputs = self.call(inputs, *args, **kwargs)
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/device:GPU:0"](gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape, gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape_1)]]
Traceback (most recent call last):
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 569, in main
sync_optimizer=optimizer if FLAGS.sync_replicas else None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 732, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
start_standard_services=start_standard_services)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 279, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/device:GPU:0"](gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape, gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape_1)]]
Caused by op u'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs', defined at:
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 533, in main
var_list=variables_to_train)
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 297, in optimize_clones
optimizer, clone, num_clones, regularization_losses, **kwargs)
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 261, in _optimize_clone
clone_grad = optimizer.compute_gradients(sum_loss, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 346, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 540, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 650, in _SubGrad
rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 395, in _broadcast_gradient_args
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
...which was originally created as op u'InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub', defined at:
File "train_image_classifier.py", line 573, in <module>
tf.app.run()
[elided 0 identical lines from previous traceback]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 472, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "/home/sk/workspace/models/slim/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 455, in clone_fn
logits, end_points = network_fn(images)
File "/home/sk/workspace/models/slim/nets/nets_factory.py", line 108, in network_fn
return func(images, num_classes, is_training=is_training)
File "/home/sk/workspace/models/slim/nets/inception_v3.py", line 483, in inception_v3
depth_multiplier=depth_multiplier)
File "/home/sk/workspace/models/slim/nets/inception_v3.py", line 104, in inception_v3_base
net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 956, in convolution
outputs = normalizer_fn(outputs, **normalizer_params)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 554, in batch_norm
outputs = layer.apply(inputs, training=is_training)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 492, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 441, in __call__
outputs = self.call(inputs, *args, **kwargs)
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/device:GPU:0"](gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape, gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/Shape_1)]]
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/Gather:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "train_image_classifier.py", line 573, in <module>\n tf.app.run()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthrough))', 'File "train_image_classifier.py", line 569, in main\n sync_optimizer=optimizer if FLAGS.sync_replicas else None)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 655, in train\n ready_op = tf_variables.report_uninitialized_variables()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped\n return _add_should_use_warning(fn(*args, **kwargs))', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 139, in _add_should_use_warning\n wrapped = TFShouldUseWarningWrapper(x)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 96, in __init__\n stack = [s.strip() for s in traceback.format_stack()]']
==================================
@cy89
I think you're using a CPU-only TensorFlow build. Usually it prints out device information at startup. Closing, since this is really a usage problem.
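If you want to confirm what TensorFlow actually registers, a quick sanity check is to list the local devices from the same Python environment you train in; a minimal sketch:

from tensorflow.python.client import device_lib

# Prints one entry per device TensorFlow has registered; a working GPU build
# should show a device whose name ends in "GPU:0" in addition to the CPU.
print(device_lib.list_local_devices())

If only the CPU shows up here, the error above is expected: the graph pins ops to /device:GPU:0, but no such device exists in the process.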
I'm facing the same bug on macOS.
Based on the disclaimer on the install page, it seems that GPU support is not possible:
Note: As of version 1.2, TensorFlow no longer provides GPU support on Mac OS X.
I suppose the same applies to current macOS versions.
TF-Slim, launched with the command documented in the README.md, fails with the same trace as @sukkyusun:
# in a TensorFlow virtualenv with Python 2.7
DATASET_DIR=/tmp/data/flowers
TRAIN_DIR=/tmp/flowers-models/inception_v3
CHECKPOINT_PATH=/tmp/my_checkpoints/inception_v3.ckpt
python train_image_classifier.py \
--train_dir=${TRAIN_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=flowers \
--dataset_split_name=train \
--model_name=inception_v3 \
--checkpoint_path=${CHECKPOINT_PATH} \
--checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
--trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
@drpngx, can you point out where I'm making a mistake in the way I use the provided scripts?
Thanks for your time.
Sorry, macOS is a bit tricky to support; you're basically on your own there. The stack trace just says you don't have a GPU available. You can build from source and look at the output; it should print messages about detected GPU devices.
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/moments/Sub_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
Same error with TensorFlow 1.4 on Ubuntu.
Uninstalling tensorflow-gpu==1.3.0 and installing tensorflow-gpu==1.2.1 worked for me! Thanks.
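For anyone trying the same downgrade, the usual pip steps would look roughly like this (adjust the version to whatever wheel matches your CUDA/cuDNN setup):

pip uninstall tensorflow-gpu
pip install tensorflow-gpu==1.2.1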
Try:

import tensorflow as tf

# Soft placement lets ops that were pinned to a missing GPU fall back to the CPU.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    ...
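Note that in train_image_classifier.py the session is created inside slim.learning.train, so the config has to reach that call rather than a Session you open yourself; a sketch of the relevant fragment (train_op and FLAGS.train_dir refer to objects the script already builds, so this is not a standalone program):

import tensorflow as tf
import tensorflow.contrib.slim as slim

# Sketch: hand the soft-placement config to TF-Slim's training loop via the
# session_config argument; `train_op` and `FLAGS.train_dir` come from the script.
session_config = tf.ConfigProto(allow_soft_placement=True)
slim.learning.train(
    train_op,
    logdir=FLAGS.train_dir,
    session_config=session_config)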
I found another trick:
In the train_image_classifier.py script, at line 40, there is:
tf.app.flags.DEFINE_boolean('clone_on_cpu', False, 'Use CPUs to deploy clones.')
Set this flag to True if you want to use CPUs instead of GPUs (deploying on the GPU seems to be the default).
Hope it will help.
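Since clone_on_cpu is an ordinary tf.app flag, you should also be able to override it on the command line instead of editing the script; a sketch reusing the flags from the invocation above:

python train_image_classifier.py \
    --clone_on_cpu=True \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --model_name=inception_v3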
I found another trick:
In the train_image_classifier.py script, at line 40, there is:
tf.app.flags.DEFINE_boolean('clone_on_cpu', False, 'Use CPUs to deploy clones.')
Set this flag to True if you want to use CPUs instead of GPUs (deploying on the GPU seems to be the default). Hope it will help.
This fixed the error for me.
Thanks