Hi,
I would like to use SSDRandomCrop and BoxEncoder in my input data pipeline. There is an example of how to use them with COCOReader. However, my data is stored as TFRecords (and I would like to keep using TFRecords for efficiency reasons). So, I have configured my pipeline like this:
class DetectionPipeline(Pipeline):
    def __init__(self,
                 tfrecords,
                 tfrecords_idx,
                 batch_size,
                 num_workers,
                 device_id,
                 shard_id,
                 prefetch,
                 is_training=True):
        super(DetectionPipeline, self).__init__(batch_size, num_workers, device_id, prefetch)
        self.is_training = is_training
        features = {
            'image/encoded': tfrecord.FixedLenFeature((), tfrecord.string, ''),
            'image/format': tfrecord.FixedLenFeature((), tfrecord.string, 'jpeg'),
            # Object boxes and classes.
            'image/object/bbox/xmin': tfrecord.VarLenFeature(tfrecord.float32, 0.0),
            'image/object/bbox/ymin': tfrecord.VarLenFeature(tfrecord.float32, 0.0),
            'image/object/bbox/xmax': tfrecord.VarLenFeature(tfrecord.float32, 0.0),
            'image/object/bbox/ymax': tfrecord.VarLenFeature(tfrecord.float32, 0.0),
            'image/object/class/label': tfrecord.VarLenFeature(tfrecord.int64, -1),
        }
        self.input = ops.TFRecordReader(path=tfrecords,
                                        index_path=tfrecords_idx,
                                        features=features,
                                        random_shuffle=self.is_training,
                                        shard_id=shard_id,
                                        num_shards=num_workers)
        self.decode_image = ops.HostDecoder(device="cpu", output_type=types.RGB)

    def define_graph(self):
        # Read images and labels
        inputs = self.input(name="Reader")
        image = inputs["image/encoded"]
        xmin = inputs["image/object/bbox/xmin"]
        ymin = inputs["image/object/bbox/ymin"]
        xmax = inputs["image/object/bbox/xmax"]
        ymax = inputs["image/object/bbox/ymax"]
        bbox_label = inputs["image/object/class/label"]
        image = self.decode_image(image)
        return image, xmin, ymin, xmax, ymax, bbox_label
So, I have separate tensor lists for [xmin], [ymin], [xmax], [ymax]. However, the operations like SSDRandomCrop require boxes to be a single tensor as [xmin, ymin, xmax, ymax].
Is there a way I can stack/concatenate these values into a single TensorList? Maybe I can configure the TFRecordReader in some way so that it reads them together?
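To make the target layout concrete, here is a small pure-Python sketch (not DALI code) of what interleaving four per-coordinate lists into one [N, 4] list of boxes means for a single sample. The function name `interleave_boxes` and the sample values are made up for illustration only.

```python
def interleave_boxes(xmin, ymin, xmax, ymax):
    """Combine four per-coordinate lists into rows of [xmin, ymin, xmax, ymax]."""
    if not (len(xmin) == len(ymin) == len(xmax) == len(ymax)):
        raise ValueError("all coordinate lists must have the same length")
    # zip() walks the four lists in lockstep, yielding one box per step.
    return [list(box) for box in zip(xmin, ymin, xmax, ymax)]


# Hypothetical sample with two boxes (values chosen only for illustration).
xmin = [0.1, 0.5]
ymin = [0.2, 0.6]
xmax = [0.3, 0.7]
ymax = [0.4, 0.8]

boxes = interleave_boxes(xmin, ymin, xmax, ymax)
print(boxes)  # [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
```

Each row holds one box, which is the [xmin, ymin, xmax, ymax] layout described above.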
Hi,
For that, you need to create a custom op that merges those 4 tensors into one. I don't think we have any other way.
You can check https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/examples/extend/create_a_custom_operator.html - basically, the new op would have 4 inputs and 1 output, and would just copy the data from the inputs to the output.
Thank you for the answer. Is there any API reference for custom ops? I can see only this simple example, but where can I find proper documentation?
For instance, how do I change this code:
template<>
void ConcatOp<::dali::CPUBackend>::RunImpl(::dali::SampleWorkspace *ws, const int idx) {
  const auto &input = ws->Input<::dali::CPUBackend>(idx);
  auto &output = ws->Output<::dali::CPUBackend>(idx);
  output.set_type(input.type());
  output.ResizeLike(input);
  ::dali::TypeInfo type = input.type();
  type.Copy<::dali::CPUBackend, ::dali::CPUBackend>(
      output.raw_mutable_data(),
      input.raw_data(), input.size(), 0);
}
to operate on four inputs, rather than one? Which method should I call to stack TensorLists with adding a new dimension? DALI C++ API seems to be undocumented.
Hi @kometa-triatlon !
DALI C++ API is not stable, so currently the documentation might be non-exhaustive.
As for reading multiple inputs within RunImpl function, you can do it like:
const auto &input0 = ws->Input<::dali::GPUBackend>(0);
const auto &input1 = ws->Input<::dali::GPUBackend>(1);
In the case you've presented (about stacking TensorLists together), the layout of the data changes. Thus, the best way would be to do it "manually":
You can perform the last point by acquiring pointers to specific tensors inside input TensorLists (TensorList::tensor<float>(idx), TensorList::data<float>()) as well as output tensor (TensorList::mutable_data<float>())
For more TensorList utilities that can help you implement your operator, please refer to the code documentation:
https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/data/buffer.h
https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/data/tensor.h
https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/data/tensor_list.h
Also, please note that when implementing a CPUBackend operator, the return type of ws->Input<CPUBackend>() is a Tensor. On the other hand, for a GPUBackend operator, ws->Input<GPUBackend>() returns a TensorList.
You can also refer to discussion in https://github.com/NVIDIA/DALI/issues/410.
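As a language-neutral illustration of the "manual" copy described above, here is a pure-Python sketch that mimics the pointer arithmetic on flat buffers. It is not DALI code: `stack_sample`, `stack_batch`, and the sample values are hypothetical names and data, and a batch is modelled as a plain list of per-sample lists, since each sample may hold a different number of boxes.

```python
def stack_sample(inputs):
    """Interleave same-length 1-D inputs into a flat row-major buffer.

    Returns the buffer together with its logical shape (n, len(inputs)).
    """
    n = len(inputs[0])
    if any(len(inp) != n for inp in inputs):
        raise ValueError("all inputs must have the same number of elements")
    width = len(inputs)
    out = [0.0] * (n * width)  # allocate the output for the new shape
    for k, inp in enumerate(inputs):
        for i, value in enumerate(inp):
            # Element i of input k lands in row i, column k.
            out[i * width + k] = value
    return out, (n, width)


def stack_batch(batch_inputs):
    """Apply stack_sample to every sample of a batch of per-sample lists."""
    num_samples = len(batch_inputs[0])
    return [stack_sample([inp[s] for inp in batch_inputs])
            for s in range(num_samples)]


# Hypothetical batch of two samples: the first has two boxes, the second one.
xmin = [[0.1, 0.5], [0.2]]
ymin = [[0.2, 0.6], [0.3]]
xmax = [[0.3, 0.7], [0.4]]
ymax = [[0.4, 0.8], [0.5]]

batch = stack_batch([xmin, ymin, xmax, ymax])
for data, shape in batch:
    print(shape, data)
```

The index `i * width + k` plays the role of the destination pointer offset that a C++ operator would compute before copying each value.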
I have created a very simple custom operation that just copies the contents of several tensors one after another. Now I can have the bounding box coordinates as a single tensor:
class DetectionPipeline(Pipeline):
    def __init__(...):
        ...
        self.concat = ops.Concat(device="cpu")
        ...

    def define_graph(self):
        ....
        bbox = self.concat(ymin, xmin, ymax, xmax)
        return image.gpu(), bbox_label.gpu(), bbox.gpu()
The problem is that the data is concatenated along the outer dimension, whereas I need the xmin, ymin, xmax, ymax values to be interleaved.
The obvious solution is to transpose the resulting tensor after concat, which works just fine if I transpose it with a TensorFlow op:
with tf.device('/gpu:0'):
    image_t, classes_t, boxes_t = daliop(
        pipeline=pipe,
        shapes=[(args.batch_size, args.img_height, args.img_width, 3), (), ()],
        dtypes=[tf.uint8, tf.int64, tf.float32])
    boxes_t = tf.transpose(boxes_t, [0, 2, 1])
But transposing the tensor inside the DALI graph:
self.transpose = ops.Transpose(device='gpu', perm=[1, 0])
....
bbox = self.concat(ymin, xmin, ymax, xmax)
bbox = self.transpose(bbox.gpu())
Causes an error:
DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline: [/opt/dali/dali/pipeline/operators/transpose/transpose.cu:105] Error while transposing cuttPlan(plan, batched_perm.size(), c_dims.get(), c_permutation.get(), sizeof(T), stream)
Unfortunately, the error message is not very informative. Could you maybe guess what might be wrong?
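For readers following the layout question, here is a pure-Python sketch (neither DALI nor TensorFlow code) of the difference between concatenating along the outer dimension and interleaving. It assumes the concat output of one sample has shape [4, N]; the helper names `concat_outer` and `transpose_2d` and the values are hypothetical. It also shows why a per-sample permutation of [1, 0] corresponds to [0, 2, 1] once a batch dimension is added in front.

```python
def concat_outer(*coords):
    """Place the inputs one after another: the result has shape [len(coords), N]."""
    return [list(c) for c in coords]


def transpose_2d(rows):
    """Swap the two axes of a rectangular nested list (perm = [1, 0])."""
    return [list(col) for col in zip(*rows)]


# Hypothetical sample with two boxes, passed in the same order as in the post.
ymin = [0.2, 0.6]
xmin = [0.1, 0.5]
ymax = [0.4, 0.8]
xmax = [0.3, 0.7]

concatenated = concat_outer(ymin, xmin, ymax, xmax)  # shape [4, N]
interleaved = transpose_2d(concatenated)             # shape [N, 4]
print(concatenated)  # [[0.2, 0.6], [0.1, 0.5], [0.4, 0.8], [0.3, 0.7]]
print(interleaved)   # [[0.2, 0.1, 0.4, 0.3], [0.6, 0.5, 0.8, 0.7]]

# With a leading batch dimension, shape [B, 4, N], the batch axis stays in
# place and only the last two axes are swapped, i.e. perm = [0, 2, 1].
batch = [concatenated]
batch_interleaved = [transpose_2d(sample) for sample in batch]
print(batch_interleaved == [interleaved])  # True
```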
Hi,
Could you provide some self-contained code that reproduces your problem? (The concat operator source code would be helpful.)
I created a simple example that transposes bboxes from COCO, and I definitely see the data transposed:
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
from nvidia.dali.backend_impl import TensorListGPU
import os
import numpy as np

file_root = "/data/coco/coco-2017/coco2017/val2017"
annotations_file = "/data/coco/coco-2017/coco2017/annotations/instances_val2017.json"

def to_array(dali_out):
    if isinstance(dali_out, TensorListGPU):
        dali_out = dali_out.as_cpu()
    return np.squeeze(dali_out.as_array())

class COCOPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(COCOPipeline, self).__init__(batch_size, num_threads, device_id)
        self.input = ops.COCOReader(file_root=file_root, annotations_file=annotations_file, ratio=True, ltrb=True)
        self.transpose = ops.Transpose(device='gpu', perm=[0, 1])
        self.transpose2 = ops.Transpose(device='gpu', perm=[1, 0])

    def define_graph(self):
        inputs, bboxes, labels = self.input()
        bboxes = self.transpose(bboxes.gpu())
        bboxes2 = self.transpose2(bboxes.gpu())
        return bboxes, bboxes2

pipe = COCOPipeline(batch_size=1, num_threads=2, device_id=0)
pipe.build()
out = pipe.run()
out_0 = to_array(out[0])
out_1 = to_array(out[1])
print(out_0)
print(out_1)
If it still doesn't work please reopen.
I have implemented the concatenation op and it worked fine with DALI 0.8. The implementation is quick and dirty:
namespace custom_ns {

template <>
void ConcatOp<::dali::CPUBackend>::RunImpl(::dali::SampleWorkspace *ws,
                                           const int idx) {
  const auto &xmin = ws->Input<::dali::CPUBackend>(0);
  const auto &ymin = ws->Input<::dali::CPUBackend>(1);
  const auto &xmax = ws->Input<::dali::CPUBackend>(2);
  const auto &ymax = ws->Input<::dali::CPUBackend>(3);
  auto &output = ws->Output<::dali::CPUBackend>(idx);
  ::dali::TypeInfo type = xmin.type();
  auto n_bboxes = xmin.size();
  output.set_type(type);
  output.Resize({n_bboxes, 4});
  for (int i = 0; i < n_bboxes; i++) {
    float *dest = output.mutable_data<float>() + i * 4;
    std::memcpy(dest, xmin.data<float>() + i, sizeof(float));
    std::memcpy(dest + 1, ymin.data<float>() + i, sizeof(float));
    std::memcpy(dest + 2, xmax.data<float>() + i, sizeof(float));
    std::memcpy(dest + 3, ymax.data<float>() + i, sizeof(float));
  }
}

}  // namespace custom_ns

DALI_REGISTER_OPERATOR(Concat, ::custom_ns::ConcatOp<::dali::CPUBackend>,
                       ::dali::CPU);

DALI_SCHEMA(Concat)
    .DocStr("Concatenates the input tensors")
    .NumInput(4)
    .NumOutput(1);
Now I would like to use it with the weekly build of DALI 0.11 to cope with #855.
So, I have rebuilt the code in a new environment with DALI 0.11. However, when I simply try to load the plugin with plugin_manager.load_library('./ConcatOp/build/libconcat.so'), I get this error:
Traceback (most recent call last):
File "train_ssd.py", line 143, in <module>
main()
File "train_ssd.py", line 86, in main
is_training=True)
File "/home/work/ssd_DALI/pipeline.py", line 42, in __init__
device='cpu')
File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/ops.py", line 220, in __init__
converted_value = _type_convert_value(dtype, value)
File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/types.py", line 70, in _type_convert_value
return _known_types[dtype][1](val)
File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/types.py", line 32, in _not_implemented
raise NotImplementedError()
NotImplementedError
Has something changed in the custom operator API since 0.8?
It looks like the error does not come from load_library, but from the pipeline constructor. Can you share more of your Python code? What happens around pipeline.py:42 and train_ssd.py:86?
There is nothing special:
train_ssd.py:
pipe = DetectionPipeline(tfrecords,
                         tfrecords_idx,
                         batch_size=args.batch_size,
                         num_workers=1,
                         device_id=0,
                         shard_id=0,
                         is_training=True)
pipeline.py:
self.input = ops.TFRecordReader(path=tfrecords,
                                index_path=tfrecords_idx,
                                features=features,
                                shard_id=shard_id,
                                num_shards=num_workers,
                                random_shuffle=False,
                                device='cpu')
It seems like the origin is in the TFRecordReader constructor, but if I comment out plugin_manager.load_library('./ConcatOp/build/libconcat.so') (which is not used), the error is gone.
Could you provide a minimal example that reproduces that error?
Here is the minimal example. It works fine with docker image nvcr.io/nvidia/tensorflow:19.03-py3, i.e. outputs bounding box coordinates:
(1, 2, 4)
[[[0.13597734 0.48 0.5524079 0.742 ]
[0.02266289 0.024 0.9971671 0.996 ]]]
------
(1, 1, 4)
[[[0.41492537 0.4 0.61791044 0.602 ]]]
------
(1, 2, 4)
[[[0.246 0.41333333 0.43 0.52 ]
[0.478 0.416 0.614 0.5466667 ]]]
------
But it produces the error after updating nvidia-dali (and tf-plugin) to the weekly build.
Thanks. Tracked as DALI-839. I will get back to you when I learn more.
It was a regression introduced with new functionality: the Python-based custom operator.
It should be fixed in https://github.com/NVIDIA/DALI/pull/910, and the fix will be available in the first nightly build after it is merged. As this is a Python-only fix, you can also just pick those changes and apply them to your DALI installation.
Thank you for providing a complete repro - it helped a lot during debugging.