DALI: Object detection data from TFRecords to SSDRandomCrop: how to concatenate TensorLists?

Created on 15 Apr 2019 · 14 Comments · Source: NVIDIA/DALI

Hi,

I would like to use SSDRandomCrop and BoxEncoder in my input data pipeline. There is an example of how to use them with COCOReader. However, my data is stored as TFRecords (and I would like to keep using TFRecords for efficiency reasons). So I have configured my pipeline like this:

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import nvidia.dali.tfrecord as tfrecord

class DetectionPipeline(Pipeline):
    def __init__(self,
                 tfrecords,
                 tfrecords_idx,
                 batch_size,
                 num_workers,
                 device_id,
                 shard_id,
                 prefetch,
                 is_training=True):

        super(DetectionPipeline, self).__init__(batch_size, num_workers, device_id, prefetch)

        self.is_training = is_training

        features = {
            'image/encoded': tfrecord.FixedLenFeature((), tfrecord.string, ''),
            'image/format': tfrecord.FixedLenFeature((), tfrecord.string, 'jpeg'),

            # Object boxes and classes.
            'image/object/bbox/xmin': tfrecord.VarLenFeature(tfrecord.float32, 0.0),
            'image/object/bbox/ymin': tfrecord.VarLenFeature(tfrecord.float32, 0.0),
            'image/object/bbox/xmax': tfrecord.VarLenFeature(tfrecord.float32, 0.0),
            'image/object/bbox/ymax': tfrecord.VarLenFeature(tfrecord.float32, 0.0),
            'image/object/class/label': tfrecord.VarLenFeature(tfrecord.int64, -1),
        }

        self.input = ops.TFRecordReader(path=tfrecords,
                                        index_path=tfrecords_idx,
                                        features=features,
                                        random_shuffle=self.is_training,
                                        shard_id=shard_id,
                                        num_shards=num_workers)

        self.decode_image = ops.HostDecoder(device="cpu", output_type=types.RGB)

    def define_graph(self):
        # Read images and labels
        inputs = self.input(name="Reader")
        image = inputs["image/encoded"]
        xmin = inputs["image/object/bbox/xmin"]
        ymin = inputs["image/object/bbox/ymin"]
        xmax = inputs["image/object/bbox/xmax"]
        ymax = inputs["image/object/bbox/ymax"]
        bbox_label = inputs["image/object/class/label"]

        image = self.decode_image(image)

        return image, xmin, ymin, xmax, ymax, bbox_label

So, I have separate tensor lists for [xmin], [ymin], [xmax], [ymax]. However, the operations like SSDRandomCrop require boxes to be a single tensor as [xmin, ymin, xmax, ymax].

Is there a way I can stack/concatenate these values into a single TensorList? Or can I configure the TFRecordReader in some way so that it reads them together?
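To make the target layout concrete, here is a minimal NumPy sketch (outside DALI, with made-up coordinate values) of what stacking the four per-sample coordinate arrays into one [N, 4] box tensor would look like. It only illustrates the desired result and is not a DALI operator.

```python
import numpy as np

# Hypothetical coordinates for one sample with two boxes (made-up values).
xmin = np.array([0.1, 0.5], dtype=np.float32)
ymin = np.array([0.2, 0.6], dtype=np.float32)
xmax = np.array([0.3, 0.7], dtype=np.float32)
ymax = np.array([0.4, 0.8], dtype=np.float32)

# Stack along a new last axis: one row per box, [xmin, ymin, xmax, ymax].
boxes = np.stack([xmin, ymin, xmax, ymax], axis=1)

print(boxes.shape)
print(boxes)
```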

dprylipko

All 14 comments

Hi,
For that, you need to create a custom op that merges those four tensors into one. I don't think we have any other way.
You can check https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/examples/extend/create_a_custom_operator.html. Basically, the new op would have 4 inputs and 1 output, and would just copy the data from the inputs to the output.

JanuszL on 15 Apr 2019

Thank you for the answer. Is there any API reference for custom ops? I can see only this simple example, but where do I find proper documentation?

For instance, how do I change this code:

template<>
void ConcatOp<::dali::CPUBackend>::RunImpl(::dali::SampleWorkspace *ws, const int idx) {
  const auto &input = ws->Input<::dali::CPUBackend>(idx);
  auto &output = ws->Output<::dali::CPUBackend>(idx);
  output.set_type(input.type());
  output.ResizeLike(input);

  ::dali::TypeInfo type = input.type();
  type.Copy<::dali::CPUBackend, ::dali::CPUBackend>(
      output.raw_mutable_data(),
      input.raw_data(), input.size(), 0);
}

to operate on four inputs rather than one? Which method should I call to stack TensorLists while adding a new dimension? The DALI C++ API seems to be undocumented.

kometa-triatlon on 26 Apr 2019

Hi @kometa-triatlon !
The DALI C++ API is not stable, so the documentation is currently not exhaustive.

To read multiple inputs within the RunImpl function, you can do the following:

const auto &input0 = ws->Input<::dali::GPUBackend>(0);
const auto &input1 = ws->Input<::dali::GPUBackend>(1);

In the case you've presented (about stacking TensorLists together), the layout of the data changes. Thus, the best way would be to do it "manually":

1. Determine the shape of the output tensor (or tensor list).
2. Create a new tensor that will be the output tensor (or tensor list).
3. Reshape it properly.
4. Manually fill it with data.

You can perform the last step by acquiring pointers to specific tensors inside the input TensorLists (TensorList::tensor<float>(idx), TensorList::data<float>()) as well as to the output tensor (TensorList::mutable_data<float>()).
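The steps above can be sketched in NumPy as follows. This is only an illustration of the "determine shape, allocate, fill manually" logic under the assumption of four equally long float arrays per sample; the function name `interleave_boxes` is hypothetical, and a real DALI operator would do the equivalent in C++ on the workspace inputs and output.

```python
import numpy as np

def interleave_boxes(xmin, ymin, xmax, ymax):
    """Manually build an [N, 4] array from four length-N coordinate arrays."""
    # Step 1: determine the output shape from the number of boxes.
    n_boxes = xmin.shape[0]
    # Steps 2-3: create the output buffer with the right shape and type.
    out = np.empty((n_boxes, 4), dtype=xmin.dtype)
    # Step 4: fill it manually, one box at a time.
    for i in range(n_boxes):
        out[i, 0] = xmin[i]
        out[i, 1] = ymin[i]
        out[i, 2] = xmax[i]
        out[i, 3] = ymax[i]
    return out

# Made-up coordinates for a sample with two boxes.
example = interleave_boxes(
    np.array([0.1, 0.5], dtype=np.float32),
    np.array([0.2, 0.6], dtype=np.float32),
    np.array([0.3, 0.7], dtype=np.float32),
    np.array([0.4, 0.8], dtype=np.float32),
)
print(example)
```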

For more TensorList utilities that can help you implement your operator, please refer to the code documentation:
https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/data/buffer.h
https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/data/tensor.h
https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/data/tensor_list.h

Also, please note that when implementing a CPUBackend operator, the return type of ws->Input<CPUBackend>() is a Tensor. For a GPUBackend operator, on the other hand, ws->Input<GPUBackend>() returns a TensorList.

szalpal on 26 Apr 2019

You can also refer to the discussion in https://github.com/NVIDIA/DALI/issues/410.

JanuszL on 26 Apr 2019

I have created a very simple custom operation that just copies the contents of several tensors one after another. Now I can have the bounding box coordinates as a single tensor:

class DetectionPipeline(Pipeline):
    def __init__(...):
        ...
        self.concat = ops.Concat(device="cpu")
        ...

    def define_graph(self):
        ....
        bbox = self.concat(ymin, xmin, ymax, xmax)
        return image.gpu(), bbox_label.gpu(),  bbox.gpu()

The problem is that the data is concatenated along the outer dimension, whereas I need the xmin, ymin, xmax, ymax values to be interleaved.

The obvious solution is to transpose the resulting tensor after the concat, which works just fine if I transpose it with a TensorFlow op:

    with tf.device('/gpu:0'):
        image_t, classes_t, boxes_t = daliop(
            pipeline=pipe,
            shapes=[(args.batch_size, args.img_height, args.img_width, 3), (), ()],
            dtypes=[tf.uint8, tf.int64, tf.float32])

        boxes_t = tf.transpose(boxes_t, [0, 2, 1])
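For illustration, the difference between the concatenated and the interleaved layout can be reproduced in NumPy. This sketch assumes the concatenated output can be viewed as a [4, N] array per sample (one row per coordinate); the values are made up so that the layout is easy to read.

```python
import numpy as np

n_boxes = 3
# Made-up values: each coordinate kind gets its own tens digit.
xmin = np.array([0, 1, 2], dtype=np.float32)
ymin = np.array([10, 11, 12], dtype=np.float32)
xmax = np.array([20, 21, 22], dtype=np.float32)
ymax = np.array([30, 31, 32], dtype=np.float32)

# Copying the inputs one after another gives all xmin values first,
# then all ymin values, and so on.
concatenated = np.concatenate([xmin, ymin, xmax, ymax])
planar = concatenated.reshape(4, n_boxes)  # one row per coordinate kind

# Transposing yields the interleaved layout: one row per box.
interleaved = planar.T

# With a leading batch dimension, the same swap uses perm [0, 2, 1].
batch = planar[np.newaxis]
batch_interleaved = np.transpose(batch, [0, 2, 1])

print(interleaved)
print(batch_interleaved.shape)
```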

But transposing the tensor inside the DALI graph:

self.transpose = ops.Transpose(device='gpu', perm=[1, 0])

....

bbox = self.concat(ymin, xmin, ymax, xmax)
bbox = self.transpose(bbox.gpu())

causes an error:

DALI daliShareOutput(&pipe_handle_) failed: Critical error in pipeline: [/opt/dali/dali/pipeline/operators/transpose/transpose.cu:105] Error while transposing cuttPlan(plan, batched_perm.size(), c_dims.get(), c_permutation.get(), sizeof(T), stream)

Unfortunately, the error message is not very informative. Could you maybe guess what might be wrong?

kometa-triatlon on 3 May 2019

Hi,
Could you provide some self-contained code that reproduces your problem? The source code of the concat operator would be helpful.
I created a simple example that transposes bboxes from COCO, and I definitely see the data transposed:

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
from nvidia.dali.backend_impl import TensorListGPU
import os
import numpy as np

file_root = "/data/coco/coco-2017/coco2017/val2017"
annotations_file = "/data/coco/coco-2017/coco2017/annotations/instances_val2017.json"

def to_array(dali_out):
    if isinstance(dali_out, TensorListGPU):
        dali_out = dali_out.as_cpu()

    return np.squeeze(dali_out.as_array())

class COCOPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(COCOPipeline, self).__init__(batch_size, num_threads, device_id)
        self.input = ops.COCOReader(file_root=file_root, annotations_file=annotations_file, ratio=True, ltrb=True)
        self.transpose = ops.Transpose(device='gpu', perm=[0, 1])
        self.transpose2 = ops.Transpose(device='gpu', perm=[1, 0])

    def define_graph(self):
        inputs, bboxes, labels = self.input()
        bboxes = self.transpose(bboxes.gpu())
        bboxes2 = self.transpose2(bboxes.gpu())
        return bboxes, bboxes2

pipe = COCOPipeline(batch_size=1, num_threads=2, device_id=0)

pipe.build()
out = pipe.run()
out_0 = to_array(out[0])
out_1 = to_array(out[1])
print(out_0)
print(out_1)

JanuszL on 7 May 2019

If it still doesn't work, please reopen.

JanuszL on 14 May 2019

I have implemented the concatenation op and it worked fine with DALI 0.8. The implementation is quick and dirty:

namespace custom_ns {

template <>
void ConcatOp<::dali::CPUBackend>::RunImpl(::dali::SampleWorkspace *ws,
                                           const int idx) {

  const auto &xmin = ws->Input<::dali::CPUBackend>(0);
  const auto &ymin = ws->Input<::dali::CPUBackend>(1);
  const auto &xmax = ws->Input<::dali::CPUBackend>(2);
  const auto &ymax = ws->Input<::dali::CPUBackend>(3);

  auto &output = ws->Output<::dali::CPUBackend>(idx);

  ::dali::TypeInfo type = xmin.type();

  auto n_bboxes = xmin.size();
  output.set_type(type);
  output.Resize({n_bboxes, 4});
  for (int i = 0; i < n_bboxes; i++) {
    float *dest = output.mutable_data<float>() + i * 4;
    std::memcpy(dest, xmin.data<float>() + i, sizeof(float));
    std::memcpy(dest + 1, ymin.data<float>() + i, sizeof(float));
    std::memcpy(dest + 2, xmax.data<float>() + i, sizeof(float));
    std::memcpy(dest + 3, ymax.data<float>() + i, sizeof(float));
  }
}

} // namespace custom_ns

DALI_REGISTER_OPERATOR(Concat, ::custom_ns::ConcatOp<::dali::CPUBackend>,
                       ::dali::CPU);

DALI_SCHEMA(Concat)
    .DocStr("Concatenates the input tensors")
    .NumInput(4)
    .NumOutput(1);

Now I would like to use it with a weekly build of DALI 0.11 to cope with #855.

So I have rebuilt the code in a new environment with DALI 0.11. However, when I simply try to load the plugin with plugin_manager.load_library('./ConcatOp/build/libconcat.so'), I get this error:

Traceback (most recent call last):
  File "train_ssd.py", line 143, in <module>
    main()
  File "train_ssd.py", line 86, in main
    is_training=True)
  File "/home/work/ssd_DALI/pipeline.py", line 42, in __init__
    device='cpu')
  File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/ops.py", line 220, in __init__
    converted_value = _type_convert_value(dtype, value)
  File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/types.py", line 70, in _type_convert_value
    return _known_types[dtype][1](val)
  File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/types.py", line 32, in _not_implemented
    raise NotImplementedError()
NotImplementedError

Has something changed in the custom operator API since 0.8?

kometa-triatlon on 23 May 2019

It looks like the error does not come from load_library, but from the pipeline constructor. Can you share more of your Python code? What happens around pipeline.py:42 and train_ssd.py:86?

mzient on 23 May 2019

There is nothing special:

train_ssd.py:

    pipe = DetectionPipeline(tfrecords,
                             tfrecords_idx,
                             batch_size=args.batch_size,
                             num_workers=1,
                             device_id=0,
                             shard_id=0,
                             is_training=True)

pipeline.py:

        self.input = ops.TFRecordReader(path=tfrecords,
                                        index_path=tfrecords_idx,
                                        features=features,
                                        shard_id=shard_id,
                                        num_shards=num_workers,
                                        random_shuffle=False,
                                        device='cpu')

The error seems to originate in the TFRecordReader constructor, but if I comment out plugin_manager.load_library('./ConcatOp/build/libconcat.so') (the plugin is not used at that point), the error is gone.

kometa-triatlon on 23 May 2019

Could you provide a minimal example that reproduces the error?

JanuszL on 23 May 2019

Here is the minimal example. It works fine with the Docker image nvcr.io/nvidia/tensorflow:19.03-py3, i.e. it outputs the bounding box coordinates:

(1, 2, 4)
[[[0.13597734 0.48       0.5524079  0.742     ]
  [0.02266289 0.024      0.9971671  0.996     ]]]
------
(1, 1, 4)
[[[0.41492537 0.4        0.61791044 0.602     ]]]
------
(1, 2, 4)
[[[0.246      0.41333333 0.43       0.52      ]
  [0.478      0.416      0.614      0.5466667 ]]]
------

But it produces the error after updating nvidia-dali (and tf-plugin) to the weekly build.

ssd_DALI_debug.tar.gz

kometa-triatlon on 23 May 2019

Thanks. Tracked as DALI-839. I will get back to you when I learn more.

JanuszL on 23 May 2019

It was a regression introduced with a new feature, the Python-based custom operator.
It should be fixed in https://github.com/NVIDIA/DALI/pull/910, and the fix will be available in the next nightly build after it is merged. As this is a Python-only fix, you can also just pick those changes and edit your DALI installation.
Thank you for providing a complete repro - it helped a lot during debugging.
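For anyone who wants to apply such a Python-only change by hand, one way to find where a package is installed is sketched below. The helper `package_dir` is hypothetical and not part of DALI; it only uses the standard library and returns None when the package cannot be found.

```python
import importlib.util
import os

def package_dir(name):
    """Return the directory a package is imported from, or None if unavailable."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. "nvidia") is not installed.
        return None
    if spec is None or not spec.origin or not os.path.isfile(spec.origin):
        return None
    return os.path.dirname(spec.origin)

# Prints the install directory of the DALI Python package, or None if
# DALI is not installed in the current environment.
print(package_dir("nvidia.dali"))
```

The files touched by the fix (ops.py and types.py appear in the traceback above) would then be found under that directory.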

JanuszL on 23 May 2019
