I pulled the environment information from the tf_env_collect.sh script offered here: https://github.com/tensorflow/tensorflow/blob/master/tools/tf_env_collect.sh.
== cat /etc/issue ===============================================
Linux ip-172-31-64-152 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u3 (2017-08-15) x86_64 GNU/Linux
VERSION_ID="8"
VERSION="8 (jessie)"
== are we in docker =============================================
No
== compiler =====================================================
c++ (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== uname -a =====================================================
Linux ip-172-31-64-152 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u3 (2017-08-15) x86_64 GNU/Linux
== check pips ===================================================
numpy (1.13.1)
protobuf (3.4.0)
tensorflow (1.3.0)
tensorflow-tensorboard (0.1.5)
== check for virtualenv =========================================
False
== tensorflow import ============================================
tf.VERSION = 1.3.0
tf.GIT_VERSION = unknown
tf.COMPILER_VERSION = unknown
Sanity check: array([1], dtype=int32)
== env ==========================================================
LD_LIBRARY_PATH is unset
DYLD_LIBRARY_PATH is unset
== nvidia-smi ===================================================
Fri Oct 20 23:33:16 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:00:1E.0 Off | 0 |
| N/A 58C P0 57W / 149W | 10961MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 18939 C ...rflow-serving/bin/tensorflow_model_server 10957MiB |
+-----------------------------------------------------------------------------+
== cuda libs ===================================================
/usr/local/cuda-8.0/doc/man/man7/libcudart.so.7
/usr/local/cuda-8.0/doc/man/man7/libcudart.7
/usr/local/cuda-8.0/lib64/libcudart.so.8.0.61
/usr/local/cuda-8.0/lib64/libcudart_static.a
Additionally, I am using Bitnami to run tensorflow serving: https://docs.bitnami.com/general/infrastructure/tensorflowserving/
I used the following command to compile Tensorflow Serving with GPU support:
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k --jobs 6 --verbose_failures tensorflow_serving/model_servers:tensorflow_model_server
I have a model that does the following:
1. In a tf.map_fn loop, convert the encoded PNG images to NHWC images. This part is fine.
2. Run tf.image.non_max_suppression in a tf.while_loop. This is where the problems appear.

When I get to the tf.while_loop, I start to get strange cub::DeviceReduce::Sum errors. This seems to happen specifically when I run tf.where operations.
These errors do not appear when I try to run the graph in Python with tensorflow-gpu support.
This is the error that appears:
WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: face_detector/bounding_boxes/nms_bounding_boxes/bbox_masked/Where = Where[_output_shapes=[[?,1]], _device="/job:localhost/replica:0/task:0/device:GPU:0"](face_detector/bounding_boxes/nms_bounding_boxes/bbox_masked/Reshape_1)]]
[[Node: face_detector/bounding_boxes/nms_bounding_boxes/bbox_masked/Gather/_181 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_780_face_detector/bounding_boxes/nms_bounding_boxes/bbox_masked/Gather", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](^_cloopface_detector/bounding_boxes/nms_bounding_boxes/Gather_2/indices/_21)]]
Specifically, it says that there is an "invalid configuration argument". My question is: what is the invalid configuration argument? Could it be a mistake in the way I compiled TensorFlow Serving (i.e. in the options I specified)?
See the appendix below for a description of the TensorFlow code for this part of the graph.
For context, here is the tensorflow code corresponding to this part of the graph:
bboxes_tfarr = tf.TensorArray(tf.float32, size=batch_size, infer_shape=False)
partition_tfarr = tf.TensorArray(tf.int32, size=batch_size, infer_shape=False)

def cond(i, acc, pacc):
    return tf.less(i, batch_size)

def body(i, bbox_acc, partition_acc):
    bbox = tf.reshape(tf.gather(bboxes, [i]), [-1, 4], name='bboxes')
    scores = tf.reshape(tf.gather(confs, [i]), [-1], name='scores')
    cond = tf.greater(scores, threshold, name='cond')
    conf_mask = tf.where(
        cond,
        tf.ones_like(scores),
        tf.zeros_like(scores),
        name='conf_mask'
    )
    conf_mask = tf.cast(conf_mask, tf.bool, name='conf_mask_bool')
    scores = tf.boolean_mask(scores, conf_mask, name='scores_masked')
    bbox = tf.boolean_mask(bbox, conf_mask, name='bbox_masked')
    number_of_faces = tf.reshape(tf.gather(tf.shape(scores), 0), [], name='number_of_faces')
    max_output_size = tf.reduce_min(tf.stack([number_of_faces, max_detectable_faces], axis=0), name='max_output_size')
    bbox_inds = tf.image.non_max_suppression(bbox, scores, iou_threshold=0.5, max_output_size=max_output_size, name='nms')
    bbox = tf.clip_by_value(tf.gather(bbox, bbox_inds, axis=0), 0.0, 1.0, name='bboxes_clipped')
    partition = tf.multiply(tf.ones_like(bbox_inds), i)
    return (tf.add(i, 1), bbox_acc.write(i, bbox), partition_acc.write(i, partition))

_, bboxes, partitions = tf.while_loop(
    cond,
    body,
    (tf.constant(0), bboxes_tfarr, partition_tfarr),
    name='nms_bounding_boxes'
)
bboxes = tf.reshape(bboxes.concat(), [-1, 4], name='bboxes_reshaped')
partitions = tf.reshape(partitions.concat(), [-1], name='partitions_reshaped')
The code above is intended to do the following on each iteration of the tf.while_loop:
1. The bboxes tensor has the shape [None, 960, 4] (i.e. 960 max bboxes). The scores tensor has the shape [None, 960] (i.e. a score for each bbox).
2. I use tf.where on the scores tensor to produce a boolean mask with which I can filter down the bboxes and scores tensors. If, for example, only 5 elements in the scores tensor are above the threshold, then I would expect the bboxes and scores tensors to have the shapes [None, 5, 4] and [None, 5], respectively, after the tf.boolean_mask operation.
3. max_output_size chooses the maximum number of bounding boxes to extract using tf.image.non_max_suppression. The constant max_detectable_faces caps this.
4. I gather the bboxes based on the indices returned by tf.image.non_max_suppression.
5. The partitions tensor is a 1-D tensor where each element is the position in the original batch of each bounding box. This allows me to later use tf.image.crop_and_resize on the images.

I solved this. I had to use tf.device('cpu:0') for the while_loop. I noticed that some of the tensors in the loop body were mapped to the GPU while others were mapped to the CPU, in which case those tensors wouldn't have access to each other's memory.
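As a cross-check for the graph, the per-image steps described above (score threshold, capped non-max suppression, clipping, partition indices) can be sketched in plain Python. This is only an illustrative reference and not the TensorFlow implementation: the function names are hypothetical, boxes are assumed to be (y1, x1, y2, x2), and the greedy NMS below is a simplified stand-in for tf.image.non_max_suppression.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_image(boxes, scores, threshold, max_faces, iou_threshold=0.5):
    """Reference for one loop iteration: threshold, greedy NMS, clip."""
    # Keep only boxes whose score exceeds the threshold (the boolean mask step).
    kept = [(s, b) for s, b in zip(scores, boxes) if s > threshold]
    # Cap the number of outputs, like max_output_size in the graph.
    limit = min(len(kept), max_faces)
    # Greedy NMS: visit boxes by descending score and drop heavy overlaps.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    selected = []
    for _, box in kept:
        if len(selected) >= limit:
            break
        if all(iou(box, other) <= iou_threshold for other in selected):
            selected.append(box)
    # Clip coordinates to [0, 1], like tf.clip_by_value in the graph.
    return [[min(1.0, max(0.0, c)) for c in box] for box in selected]


def filter_batch(batch_boxes, batch_scores, threshold, max_faces):
    """Concatenate per-image results and record each box's batch index."""
    all_boxes, partitions = [], []
    for i, (boxes, scores) in enumerate(zip(batch_boxes, batch_scores)):
        result = filter_image(boxes, scores, threshold, max_faces)
        all_boxes.extend(result)
        partitions.extend([i] * len(result))
    return all_boxes, partitions
```

Running the same small inputs through this sketch and through the graph (once on CPU, once on GPU) is one way to see whether a device placement changes the results. Note that an image with no score above the threshold contributes nothing to either output list.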
Hi,
I got this error while running Faster R-CNN. Please help me resolve it. Thanks in advance.
2018-01-11 06:05:49.066479: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
2018-01-11 06:05:49.066923: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/NotEqual/_411)]]
2018-01-11 06:05:49.094692: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/NotEqual/_411)]]
2018-01-11 06:05:49.094692: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/NotEqual/_411)]]
2018-01-11 06:05:49.094700: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/NotEqual/_411)]]
2018-01-11 06:05:49.094705: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/NotEqual/_411)]]
2018-01-11 06:05:49.094712: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/NotEqual/_411)]]
Traceback (most recent call last):
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
status, run_metadata)
File "/home/anaconda3/lib/python3.6/contextlib.py", line 89, in __exit__
next(self.gen)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/NotEqual/_411)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./tools/trainval_net.py", line 140, in <module>
max_iters=args.max_iters)
File "/home/FasterRCNN/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 400, in train_net
sw.train_model(sess, max_iters)
File "/home/FasterRCNN/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 311, in train_model
self.net.train_step(sess, blobs, train_op)
File "/home/FasterRCNN/tf-faster-rcnn-master/tools/../lib/nets/network.py", line 465, in train_step
feed_dict=feed_dict)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/Not
@chiru83 Did you fix this error? I hit the same error when training Mask R-CNN.
2018-05-13 00:55:05.452042: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
2018-05-13 00:55:05.452913: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
2018-05-13 00:55:05.453128: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
2018-05-13 00:55:05.453190: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
2018-05-13 00:55:05.453238: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
Traceback (most recent call last):
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
return fn(*args)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
status, run_metadata)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where/_779 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2167_fpn_maskrcnn_head/PyramidROIAlign/Where", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/chaoli/PycharmProjects/SuperCode/tensorpack-master/Tensorpack_Examples/Humanpose/test.py", line 153, in <module>
print(sess.run(mrcnn_loss, feed_dict=feed_datas))
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where/_779 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2167_fpn_maskrcnn_head/PyramidROIAlign/Where", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'fpn_maskrcnn_head/PyramidROIAlign/Where', defined at:
File "/home/chaoli/PycharmProjects/SuperCode/tensorpack-master/Tensorpack_Examples/Humanpose/test.py", line 113, in <module>
config.MASK_POOL_SIZE, config.NUM_CLASS, config.ANCHOR_STRIDES)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorpack/tfutils/scope_utils.py", line 113, in wrapper
return func(*args, **kwargs)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorpack/tfutils/scope_utils.py", line 52, in wrapper
return func(*args, **kwargs)
File "/home/chaoli/PycharmProjects/SuperCode/tensorpack-master/Tensorpack_Examples/Humanpose/model.py", line 617, in fpn_maskrcnn_head
roi_features = PyramidROIAlign(rois, fpn_features, pool_size, features_strides)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorpack/tfutils/scope_utils.py", line 84, in wrapper
return func(*args, **kwargs)
File "/home/chaoli/PycharmProjects/SuperCode/tensorpack-master/Tensorpack_Examples/Humanpose/model.py", line 572, in PyramidROIAlign
index = tf.where(tf.equal(leves,level))[:,0]
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2439, in where
return gen_array_ops.where(input=condition, name=name)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5930, in where
"Where", input=input, name=name)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
[[Node: fpn_maskrcnn_head/PyramidROIAlign/Where/_779 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2167_fpn_maskrcnn_head/PyramidROIAlign/Where", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Process finished with exit code 1
Not yet. My guess is that the errors may be due to an older GPU; the drivers may need to be updated, though I am not sure.
I have solved the same problem! I could not find an answer anywhere. In my case, it may be that some GPU code was using a different GPU ID from the one used by TensorFlow.
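If a GPU ID mismatch is indeed the cause, one common way to rule it out is to restrict the whole process to a single device through the standard CUDA environment variables before any GPU library is imported. The sketch below assumes that is what the comment above refers to; the helper name pin_visible_gpu and the choice of GPU 0 are hypothetical.

```python
import os


def pin_visible_gpu(gpu_id, env=None):
    """Expose only one physical GPU to CUDA code started from this process.

    Must run before TensorFlow (or any other CUDA library) initialises,
    because the variables are read once at CUDA start-up.
    """
    env = os.environ if env is None else env
    # Number devices by PCI bus ID so the ID matches what nvidia-smi shows.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide every GPU except the chosen one; it then appears as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


pin_visible_gpu(0)  # assumption: GPU 0 is the device you want to use
# import tensorflow as tf  # import GPU libraries only after this point
```

With only one device visible, custom GPU kernels and TensorFlow can no longer pick different devices, which makes it easier to tell whether the mismatch was the actual problem.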