Trying to run this example on Windows, I get the following error when training:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: Predictions/Softmax = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](Predictions/Reshape)]]
The TensorFlow and CUDA installations are fine, since I can still train object-detection flawlessly.
I also tried the 1.5 nightly dev version and got the same error.
Thanks in advance for any help.
WARNING:tensorflow:From ***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py:400: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From ***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py:468: softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.softmax_cross_entropy instead. Note that the order of the logits and labels arguments has been changed.
WARNING:tensorflow:From ***\Anaconda3-5.0.0\lib\site-packages\tensorflow\contrib\losses\python\losses\loss_ops.py:399: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From ***\Anaconda3-5.0.0\lib\site-packages\tensorflow\contrib\losses\python\losses\loss_ops.py:152: add_arg_scope.<locals>.func_with_args (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
2017-11-15 16:33:40.904052: I C:\tf_jenkins\home\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2017-11-15 16:33:41.038052: I C:\tf_jenkins\home\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1031] Found device 0 with properties:
name: Quadro K1100M major: 3 minor: 0 memoryClockRate(GHz): 0.7055
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.46GiB
2017-11-15 16:33:41.038052: I C:\tf_jenkins\home\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Quadro K1100M, pci bus id: 0000:01:00.0, compute capability: 3.0)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Cannot assign a device for operation 'Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: Predictions/Softmax = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](Predictions/Reshape)]]
Caused by op 'Predictions/Softmax', defined at:
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py", line 574, in <module>
tf.app.run()
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py", line 474, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\deployment\model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py", line 457, in clone_fn
logits, end_points = network_fn(images)
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\nets\nets_factory.py", line 135, in network_fn
return func(images, num_classes, is_training=is_training, **kwargs)
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\nets\lenet.py", line 77, in lenet
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 2598, in softmax
predictions = nn.softmax(logits_2d)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1667, in softmax
return _softmax(logits, gen_nn_ops._softmax, dim, name)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1610, in _softmax
return compute_op(logits, name=name)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 4367, in _softmax
"Softmax", logits=logits, name=name)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\framework\ops.py", line 3042, in create_op
op_def=op_def)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\framework\ops.py", line 1521, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: Predictions/Softmax = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](Predictions/Reshape)]]
Traceback (most recent call last):
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
return fn(*args)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\client\session.py", line 1293, in _run_fn
self._extend_graph()
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\client\session.py", line 1354, in _extend_graph
self._session, graph_def.SerializeToString(), status)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: Predictions/Softmax = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](Predictions/Reshape)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py", line 574, in <module>
tf.app.run()
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py", line 570, in main
sync_optimizer=optimizer if FLAGS.sync_replicas else None)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 746, in train
master, start_standard_services=False, config=session_config) as sess:
File "***\Anaconda3-5.0.0\lib\contextlib.py", line 81, in __enter__
return next(self.gen)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\training\supervisor.py", line 992, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\training\supervisor.py", line 820, in stop
ignore_live_threads=ignore_live_threads)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "***\Anaconda3-5.0.0\lib\site-packages\six.py", line 686, in reraise
raise value
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\training\supervisor.py", line 981, in managed_session
start_standard_services=start_standard_services)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\training\supervisor.py", line 718, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\training\session_manager.py", line 279, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
run_metadata_ptr)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
options, run_metadata)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: Predictions/Softmax = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](Predictions/Reshape)]]
Caused by op 'Predictions/Softmax', defined at:
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py", line 574, in <module>
tf.app.run()
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py", line 474, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\deployment\model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\train_image_classifier.py", line 457, in clone_fn
logits, end_points = network_fn(images)
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\nets\nets_factory.py", line 135, in network_fn
return func(images, num_classes, is_training=is_training, **kwargs)
File "***\Anaconda3-5.0.0\Lib\site-packages\tensorflow\models\research\slim\nets\lenet.py", line 77, in lenet
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 2598, in softmax
predictions = nn.softmax(logits_2d)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1667, in softmax
return _softmax(logits, gen_nn_ops._softmax, dim, name)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1610, in _softmax
return compute_op(logits, name=name)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 4367, in _softmax
"Softmax", logits=logits, name=name)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\framework\ops.py", line 3042, in create_op
op_def=op_def)
File "***\Anaconda3-5.0.0\lib\site-packages\tensorflow\python\framework\ops.py", line 1521, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: Predictions/Softmax = Softmax[T=DT_FLOAT, _device="/device:GPU:0"](Predictions/Reshape)]]
@thenbasilmanran, any advice to offer?
I ran into the same problem with another slim model. For me, a workaround was to add "with tf.device('/cpu:0'):" in front of the softmax definition. In your case, I think this is probably in front of line 77 in lenet.py.
@philippwerner Thank you for the help! I encountered the same problem for lenet and mobilenet as well.
Could you give some more details on where you added the line? Is it in gen_nn_ops.py?
@jingyibo123: I only fixed the problem for NASnet, but I think the fix should work for the other networks as well if used at the right line of code. If you want to fix it for all networks, the softmax() function in tensorflow/python/ops/nn_ops.py, line 1667 is probably a good place to add it. Try to replace
  return _softmax(logits, gen_nn_ops._softmax, dim, name)
by
  with tf.device('/cpu:0'):
    return _softmax(logits, gen_nn_ops._softmax, dim, name)
@all: I know this is only a quick and dirty fix. We are still looking for a clean solution that only uses CPU if no GPU kernel is available. Maybe there should be some exception handling + warning to stderr?
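The "exception handling + warning to stderr" idea could be sketched roughly as below. This is only an illustration in plain Python, not TensorFlow code: `run_with_cpu_fallback`, `build_and_run`, and `fake_model` are hypothetical names, and the check on the message text assumes the error wording quoted in this thread. With TensorFlow 1.x one would presumably catch `tf.errors.InvalidArgumentError` instead of a bare `Exception`, and `build_and_run` would have to rebuild the graph under the given device.

```python
import sys


def run_with_cpu_fallback(build_and_run, preferred_device="/gpu:0"):
    """Try the preferred device first; retry on CPU if a kernel is missing.

    build_and_run is a hypothetical callable supplied by the caller: it
    builds the model under the given device string and runs it.
    """
    try:
        return build_and_run(preferred_device)
    except Exception as err:  # with TensorFlow: tf.errors.InvalidArgumentError
        # Only handle the placement failure quoted in this thread;
        # any other error is re-raised unchanged.
        if "no supported kernel" not in str(err):
            raise
        print("warning: %s; retrying on /cpu:0" % err, file=sys.stderr)
        return build_and_run("/cpu:0")


# Stand-in for a real model, used only to exercise the fallback path.
def fake_model(device):
    if "gpu" in device.lower():
        raise ValueError("no supported kernel for GPU devices is available")
    return "ran on " + device


result = run_with_cpu_fallback(fake_model)
print(result)
```

Note that this retries the whole model on the CPU, which is coarser than moving only the one op that lacks a GPU kernel.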
@jingyibo123 I have the same error as you. I tried philippwerner's method, but it did not work for me. Have you solved it?
I'm having exactly the same issue running locally with a GeForce GTX 980 Ti. It is not a problem on an AWS EC2 K80, though. Any update by any chance? Thx
@mrry We're getting a bunch of reports that the softmax GPU kernel isn't available on Windows. Is this expected?
AFAICT we compile the op for GPU and test it as part of the Windows GPU build: search for "softmax_op_gpu.cu.cc" in this build log output.
@philippwerner Thank you, that worked for me.
@seppestaes the solution of philippwerner works well.
Since the GPU kernel of Softmax has been compiled, we shouldn't have this issue anymore in the future.
Add allow_soft_placement=True to the session config:
tf.ConfigProto(allow_soft_placement=True)
Make the edit in train_image_classifier.py, as follows. This worked for me:
###########################
# Kicks off the training. #
###########################
session_config = tf.ConfigProto(allow_soft_placement=True)
slim.learning.train(
    train_tensor,
    logdir=FLAGS.train_dir,
    master=FLAGS.master,
    is_chief=(FLAGS.task == 0),
    init_fn=_get_init_fn(),
    summary_op=summary_op,
    number_of_steps=FLAGS.max_number_of_steps,
    log_every_n_steps=FLAGS.log_every_n_steps,
    save_summaries_secs=FLAGS.save_summaries_secs,
    save_interval_secs=FLAGS.save_interval_secs,
    sync_optimizer=optimizer if FLAGS.sync_replicas else None,
    session_config=session_config,
)
Found the problem: TensorFlow version 1.2 doesn't know how to work with TPU and fails when it tries to get a free GPU.