Serving: Could not launch cub::DeviceReduce::Sum to count number of true indices

Created on 21 Oct 2017  ·  5 Comments  ·  Source: tensorflow/serving

Environment

I pulled the environment information from the tf_env_collect.sh script offered here: https://github.com/tensorflow/tensorflow/blob/master/tools/tf_env_collect.sh.

== cat /etc/issue ===============================================
Linux ip-172-31-64-152 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u3 (2017-08-15) x86_64 GNU/Linux
VERSION_ID="8"
VERSION="8 (jessie)"

== are we in docker =============================================
No

== compiler =====================================================
c++ (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


== uname -a =====================================================
Linux ip-172-31-64-152 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u3 (2017-08-15) x86_64 GNU/Linux

== check pips ===================================================
numpy (1.13.1)
protobuf (3.4.0)
tensorflow (1.3.0)
tensorflow-tensorboard (0.1.5)

== check for virtualenv =========================================
False

== tensorflow import ============================================
tf.VERSION = 1.3.0
tf.GIT_VERSION = unknown
tf.COMPILER_VERSION = unknown
Sanity check: array([1], dtype=int32)

== env ==========================================================
LD_LIBRARY_PATH is unset
DYLD_LIBRARY_PATH is unset

== nvidia-smi ===================================================
Fri Oct 20 23:33:16 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   58C    P0    57W / 149W |  10961MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     18939    C   ...rflow-serving/bin/tensorflow_model_server 10957MiB |
+-----------------------------------------------------------------------------+

== cuda libs  ===================================================
/usr/local/cuda-8.0/doc/man/man7/libcudart.so.7
/usr/local/cuda-8.0/doc/man/man7/libcudart.7
/usr/local/cuda-8.0/lib64/libcudart.so.8.0.61
/usr/local/cuda-8.0/lib64/libcudart_static.a

Additionally, I am using Bitnami to run tensorflow serving: https://docs.bitnami.com/general/infrastructure/tensorflowserving/

I used the following command to compile Tensorflow Serving with GPU support:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k --jobs 6 --verbose_failures tensorflow_serving/model_servers:tensorflow_model_server

Problem

I have a model that does the following:

  • Takes a batch of PNG images.
  • In a tf.map_fn loop, converts the encoded PNG images to NHWC images. This part is fine.
  • Runs the images through a CNN face detector.
  • Parses out the bounding boxes and runs tf.image.non_max_suppression in a tf.while_loop. This is where the problems appear.

When I get to the tf.while_loop, I start to get strange cub::DeviceReduce::Sum errors. This seems to happen specifically when I run tf.where operations.

These errors do not appear when I try to run the graph in Python with tensorflow-gpu support.

This is the error that appears:

WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid configuration argument
     [[Node: face_detector/bounding_boxes/nms_bounding_boxes/bbox_masked/Where = Where[_output_shapes=[[?,1]], _device="/job:localhost/replica:0/task:0/device:GPU:0"](face_detector/bounding_boxes/nms_bounding_boxes/bbox_masked/Reshape_1)]]
     [[Node: face_detector/bounding_boxes/nms_bounding_boxes/bbox_masked/Gather/_181 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_780_face_detector/bounding_boxes/nms_bounding_boxes/bbox_masked/Gather", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](^_cloopface_detector/bounding_boxes/nms_bounding_boxes/Gather_2/indices/_21)]]

Specifically, it says that there is an "invalid configuration argument". What I am asking is: what is the invalid configuration argument? Could it be a mistake in the way I compiled TensorFlow Serving (i.e. in the options I specified)?
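The error text carries three fields that are useful when comparing reports like this one: the temp_storage_bytes value, the CUDA status string, and the device the Where node was placed on. A minimal sketch for pulling them out of a log follows. The helper name summarize_where_errors is hypothetical, and the patterns are written only against the log lines quoted in this thread, so other TensorFlow versions may format the message differently.

```python
import re

# Patterns written against the log lines quoted in this thread (an assumption:
# other TensorFlow versions may word or punctuate the message differently).
ERROR_RE = re.compile(
    r"Could not launch cub::DeviceReduce::Sum.*?"
    r"temp_storage_bytes:\s*(\d+),\s*status:\s*([a-z ]+)"
)
NODE_RE = re.compile(r'\[\[Node: (\S+) = Where\[.*?_device="([^"]+)"')


def summarize_where_errors(log_text):
    """Collect distinct (temp_storage_bytes, status) pairs and Where-node placements."""
    errors = {
        (int(size), status.strip())
        for size, status in ERROR_RE.findall(log_text)
    }
    # Map each failing Where node name to the device string it was assigned.
    nodes = {name: device for name, device in NODE_RE.findall(log_text)}
    return {"errors": sorted(errors), "where_nodes": nodes}
```

Running this over a full log shows at a glance whether every failure shares one status string and whether all failing Where nodes sit on the same device.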

See the appendix below for a description of the TensorFlow code for this part of the graph.

Appendix

For context, here is the tensorflow code corresponding to this part of the graph:

bboxes_tfarr = tf.TensorArray(tf.float32, size=batch_size, infer_shape=False)
partition_tfarr = tf.TensorArray(tf.int32, size=batch_size, infer_shape=False)

def cond(i, acc, pacc):
    return tf.less(i, batch_size)

def body(i, bbox_acc, partition_acc):
    bbox   = tf.reshape(tf.gather(bboxes, [i]), [-1, 4], name='bboxes')
    scores = tf.reshape(tf.gather(confs, [i]), [-1], name='scores')

    cond = tf.greater(scores, threshold, name='cond')

    conf_mask = tf.where(
        cond,
        tf.ones_like(scores),
        tf.zeros_like(scores),
        name='conf_mask'
    )

    conf_mask = tf.cast(conf_mask, tf.bool, name='conf_mask_bool')

    scores = tf.boolean_mask(scores, conf_mask, name='scores_masked')
    bbox = tf.boolean_mask(bbox, conf_mask, name='bbox_masked')

    number_of_faces = tf.reshape(tf.gather(tf.shape(scores), 0), [], name='number_of_faces')
    max_output_size = tf.reduce_min(tf.stack([number_of_faces, max_detectable_faces], axis=0), name='max_output_size')
    bbox_inds = tf.image.non_max_suppression(bbox, scores, iou_threshold=0.5, max_output_size=max_output_size, name='nms')
    bbox = tf.clip_by_value(tf.gather(bbox, bbox_inds, axis=0), 0.0, 1.0, name='bboxes_clipped')

    partition = tf.multiply(tf.ones_like(bbox_inds), i)

    return (tf.add(i, 1), bbox_acc.write(i, bbox), partition_acc.write(i, partition))

_, bboxes, partitions = tf.while_loop(
    cond,
    body,
    (tf.constant(0), bboxes_tfarr, partition_tfarr),
    name='nms_bounding_boxes'
)

bboxes     = tf.reshape(bboxes.concat(), [-1, 4], name='bboxes_reshaped')
partitions = tf.reshape(partitions.concat(), [-1], name='partitions_reshaped')

The code above is intended to do the following on each iteration of the tf.while_loop:

  • Take the bounding boxes for one of the images in the batch, as well as the confidence scores for each box. The bboxes tensor has the shape [None, 960, 4] (i.e. 960 max bboxes). The scores tensor has the shape [None, 960] (i.e. score for each bbox).
  • I have a threshold between 0 and 1. I first run tf.where on the scores tensor to produce a boolean mask with which I can filter down the bboxes and scores tensors. If, for example, only 5 elements of the scores tensor for that image are above the threshold, then after the tf.boolean_mask operation I would expect the per-image bbox and scores tensors to have the shapes [5, 4] and [5], respectively.
  • The max_output_size chooses the maximum number of bounding boxes to extract using tf.image.non_max_suppression. I have the constant max_detectable_faces that caps this.
  • Filter the bboxes based on the indices returned by tf.image.non_max_suppression.
  • The partitions tensor is a 1-D tensor where each element is the position in the original batch of each bounding box. This allows me to later use tf.image.crop_and_resize on the images.
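The per-image steps listed above can be sketched in plain Python, which makes the expected result easy to check independently of any device placement. This is an illustration only, not TensorFlow's implementation: the names iou and filter_image are hypothetical, boxes are assumed to be [y1, x1, y2, x2], and the greedy suppression is a simplified stand-in for tf.image.non_max_suppression.

```python
def iou(a, b):
    """Intersection over union of two [y1, x1, y2, x2] boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_image(boxes, scores, image_index, threshold, max_faces, iou_threshold=0.5):
    """Threshold, suppress, clip, and tag the boxes of one image."""
    # Step 1: keep only boxes whose score clears the threshold.
    kept = [(s, b) for s, b in zip(scores, boxes) if s > threshold]
    # Step 2: greedy non-max suppression, highest score first, capped at max_faces.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    selected = []
    for _, box in kept:
        if len(selected) == max_faces:
            break
        if all(iou(box, other) <= iou_threshold for other in selected):
            selected.append(box)
    # Step 3: clip coordinates to [0, 1].
    clipped = [[min(1.0, max(0.0, c)) for c in box] for box in selected]
    # Step 4: one partition entry per surviving box, recording the source image.
    partition = [image_index] * len(clipped)
    return clipped, partition
```

Calling filter_image once per image and concatenating the results mirrors what the TensorArray writes and the final concat() do in the graph version.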

All 5 comments

I solved this. I had to use tf.device('cpu:0') for the while_loop. I noticed that some of the tensors in the loop body were mapped to the GPU while others were mapped to the CPU, in which case those tensors wouldn't have access to each other's memory.
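A minimal sketch of that workaround, assuming a TensorFlow build that ships the tensorflow.compat.v1 module (the thread itself used TF 1.3, where the same calls live directly under tf). The function name count_above_threshold and the toy computation are hypothetical; the point is only that every op built inside the tf.device('/cpu:0') block, including the Where kernel behind tf.where, is placed on the CPU.

```python
import tensorflow.compat.v1 as tf


def count_above_threshold(confs_value, threshold=0.5):
    """Count scores above `threshold` per batch row, with the loop pinned to the CPU."""
    graph = tf.Graph()
    with graph.as_default():
        confs = tf.constant(confs_value, dtype=tf.float32)
        batch_size = tf.shape(confs)[0]

        # Everything created in this block, including the ops inside the
        # loop body, is assigned to the CPU.
        with tf.device('/cpu:0'):
            counts = tf.TensorArray(tf.int32, size=batch_size)

            def cond(i, acc):
                return tf.less(i, batch_size)

            def body(i, acc):
                scores = tf.gather(confs, i)
                keep = tf.where(tf.greater(scores, threshold))  # shape [k, 1]
                return tf.add(i, 1), acc.write(i, tf.shape(keep)[0])

            _, counts = tf.while_loop(cond, body, (tf.constant(0), counts))
            result = counts.stack()

        with tf.Session(graph=graph) as sess:
            return sess.run(result).tolist()
```

Pinning the whole loop to the CPU trades GPU throughput for consistent placement, which is usually acceptable for post-processing such as thresholding and non-max suppression.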

Hi,
I got this error while running Faster R-CNN. Please help me resolve it. Thanks in advance.

2018-01-11 06:05:49.066479: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
2018-01-11 06:05:49.066923: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-01-11 06:05:49.094692: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-01-11 06:05:49.094692: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-01-11 06:05:49.094700: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-01-11 06:05:49.094705: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-01-11 06:05:49.094712: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
Traceback (most recent call last):
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
status, run_metadata)
File "/home/anaconda3/lib/python3.6/contextlib.py", line 89, in __exit__
next(self.gen)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/NotEqual/_411)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./tools/trainval_net.py", line 140, in
max_iters=args.max_iters)
File "/home/FasterRCNN/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 400, in train_net
sw.train_model(sess, max_iters)
File "/home/FasterRCNN/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 311, in train_model
self.net.train_step(sess, blobs, train_op)
File "/home/FasterRCNN/tf-faster-rcnn-master/tools/../lib/nets/network.py", line 465, in train_step
feed_dict=feed_dict)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 2815, status: invalid device function
[[Node: LOSS_default/Where = Where[_device="/job:localhost/replica:0/task:0/gpu:0"](LOSS_default/Not

@chiru83 Did you fix this error? I also get this error when I train Mask R-CNN.

2018-05-13 00:55:05.452042: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid configuration argument
2018-05-13 00:55:05.452913: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid configuration argument
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
2018-05-13 00:55:05.453128: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid configuration argument
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
2018-05-13 00:55:05.453190: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid configuration argument
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
2018-05-13 00:55:05.453238: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid configuration argument
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
Traceback (most recent call last):
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid configuration argument
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where/_779 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2167_fpn_maskrcnn_head/PyramidROIAlign/Where", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/chaoli/PycharmProjects/SuperCode/tensorpack-master/Tensorpack_Examples/Humanpose/test.py", line 153, in <module>
    print(sess.run(mrcnn_loss, feed_dict=feed_datas))
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid configuration argument
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where/_779 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2167_fpn_maskrcnn_head/PyramidROIAlign/Where", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'fpn_maskrcnn_head/PyramidROIAlign/Where', defined at:
  File "/home/chaoli/PycharmProjects/SuperCode/tensorpack-master/Tensorpack_Examples/Humanpose/test.py", line 113, in <module>
    config.MASK_POOL_SIZE, config.NUM_CLASS, config.ANCHOR_STRIDES)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorpack/tfutils/scope_utils.py", line 113, in wrapper
    return func(*args, **kwargs)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorpack/tfutils/scope_utils.py", line 52, in wrapper
    return func(*args, **kwargs)
  File "/home/chaoli/PycharmProjects/SuperCode/tensorpack-master/Tensorpack_Examples/Humanpose/model.py", line 617, in fpn_maskrcnn_head
    roi_features = PyramidROIAlign(rois, fpn_features, pool_size, features_strides)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorpack/tfutils/scope_utils.py", line 84, in wrapper
    return func(*args, **kwargs)
  File "/home/chaoli/PycharmProjects/SuperCode/tensorpack-master/Tensorpack_Examples/Humanpose/model.py", line 572, in PyramidROIAlign
    index = tf.where(tf.equal(leves,level))[:,0]
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2439, in where
    return gen_array_ops.where(input=condition, name=name)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5930, in where
    "Where", input=input, name=name)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/chaoli/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices.  temp_storage_bytes: 1, status: invalid configuration argument
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_maskrcnn_head/PyramidROIAlign/Equal/_777)]]
     [[Node: fpn_maskrcnn_head/PyramidROIAlign/Where/_779 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2167_fpn_maskrcnn_head/PyramidROIAlign/Where", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]


Process finished with exit code 1

Not yet. My guess is that the errors are due to an older GPU. The drivers may also need to be updated, but I am not sure.

I have solved the same problem! There was no answer anywhere. The cause may be that some GPU code uses a different GPU ID from the one used by TensorFlow.
