Models: 'InceptionV3/Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0'

Created on 7 Jan 2018 · 5 comments · Source: tensorflow/models

System information

What is the top-level directory of the model you are using: slim
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): windows 10
TensorFlow installed from (source or binary): binary using pip install
TensorFlow version (use command below): 1.4.0 / 1.5.0 dev GPU
Bazel version (if compiling from source):
CUDA/cuDNN version: 8.0 / 6.1
GPU model and memory: GTX960 4G
Exact command to reproduce:

When I run train_image_classifier.py, I get the following error:
Caused by op 'InceptionV3/Predictions/Softmax', defined at:
File "train_image_classifier.py", line 577, in
tf.app.run()
File "C:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 477, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "D:\python3.5.2\Model\tensorflow_models\models-master\research\slim\deployment\model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 460, in clone_fn
logits, end_points = network_fn(images)
File "D:\python3.5.2\Model\tensorflow_models\models-master\research\slim\nets\nets_factory.py", line 135, in network_fn
return func(images, num_classes, is_training=is_training, **kwargs)
File "D:\python3.5.2\Model\tensorflow_models\models-master\research\slim\nets\inception_v3.py", line 543, in inception_v3
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
File "C:\Anaconda3\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "C:\Anaconda3\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 2582, in softmax
predictions = nn.softmax(logits_2d)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1667, in softmax
return _softmax(logits, gen_nn_ops._softmax, dim, name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1610, in _softmax
return compute_op(logits, name=name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 4316, in _softmax
"Softmax", logits=logits, name=name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
op_def=op_def)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'InceptionV3/Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: InceptionV3/Predictions/Softmax = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](InceptionV3/Predictions/Reshape)]]
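For context, this class of failure can be reproduced outside of slim: pinning an op to a GPU device while forbidding soft placement makes TensorFlow raise the same InvalidArgumentError instead of silently falling back to the CPU. A minimal sketch of the mechanism (TF 1.x, my own illustration, not from the original report):

    import tensorflow as tf

    # Pin the op to the GPU explicitly, mirroring what model_deploy does.
    with tf.device('/device:GPU:0'):
        logits = tf.constant([[1.0, 2.0, 3.0]])
        predictions = tf.nn.softmax(logits)

    # With allow_soft_placement=False, TensorFlow fails hard at placement
    # time if no GPU kernel is available for an op on this build.
    config = tf.ConfigProto(allow_soft_placement=False)
    with tf.Session(config=config) as sess:
        print(sess.run(predictions))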

awaiting model gardener


All 5 comments

@sguada @nathansilberman

I got the same problem. It can be solved by changing the last few lines of code in train_image_classifier.py:

    ###########################
    # Kicks off the training. #
    ###########################

    # allow_soft_placement lets TensorFlow fall back to a CPU kernel when an
    # op (here the Softmax) has no kernel registered for the requested GPU.
    session_config = tf.ConfigProto(allow_soft_placement=True)

    slim.learning.train(
        train_tensor,
        logdir=FLAGS.train_dir,
        master=FLAGS.master,
        is_chief=(FLAGS.task == 0),
        init_fn=_get_init_fn(),
        summary_op=summary_op,
        number_of_steps=FLAGS.max_number_of_steps,
        log_every_n_steps=FLAGS.log_every_n_steps,
        save_summaries_secs=FLAGS.save_summaries_secs,
        save_interval_secs=FLAGS.save_interval_secs,
        sync_optimizer=optimizer if FLAGS.sync_replicas else None,
        session_config=session_config)
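A note on this fix: allow_soft_placement=True does not make a GPU kernel appear; it simply permits TensorFlow to move the offending Softmax op to the CPU. To verify where ops actually land, the config can (as an optional addition of mine, not part of the original fix) also enable placement logging:

    # Hypothetical variant: log each op's final device assignment so the
    # CPU fallback for the Softmax op is visible in the console output.
    session_config = tf.ConfigProto(allow_soft_placement=True,
                                    log_device_placement=True)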

@Ao-Lee Much appreciated! That problem is fixed, but when I run it again there is a new problem:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,192,17,17]
[[Node: InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=0.001, is_training=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/Conv2D, InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/Const, InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/beta/read/_317, InceptionV3/AuxLogits/Conv2d_1b_1x1/BatchNorm/Const_1, InceptionV3/AuxLogits/Conv2d_1b_1x1/BatchNorm/Const_1)]]
[[Node: zero_fraction_12/Mean/_631 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2709_zero_fraction_12/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I see no reason for this. Could it be an issue with the channels?

Well, I guess your problem is an out-of-memory error. Try lowering the batch size and running it again.
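To make that advice concrete (a sketch under my own assumptions, not from the thread): the slim training script accepts a --batch_size flag (default 32, which matches the shape[32,...] in the OOM message), so something like --batch_size=16 roughly halves the activation memory. In TF 1.x you can also ask the session to allocate GPU memory on demand:

    # Sketch: let TensorFlow grow its GPU memory allocation on demand
    # instead of reserving the whole 4 GB card up front.
    session_config = tf.ConfigProto(allow_soft_placement=True)
    session_config.gpu_options.allow_growth = True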

@Ao-Lee Wow, it really works. Thank you very much.
