Dali: Images of different shapes are not supported in `nvidia.dali.plugin.pytorch.DALIGenericIterator`?

Created on 4 May 2020 · 4 Comments · Source: NVIDIA/DALI

I am using DALI with PyTorch. Images of different shapes in the same batch should be a very common case.
Following this doc, I implemented my own ExternalInputIterator. Images in my dataset are of different sizes, and I find that my code only works when batch_size=1. Can I put images of different shapes inside one batch? My dataset iterator is like this:

class FaceDatasetIterator():
    def __init__(self, batch_size=1, shuffled=False, dataset=WiderFaceDataset, **kwargs):
        self._batch_size = batch_size
        self.dataset = dataset(**kwargs)
        self.anno_list = [i for i in range(len(self.dataset))]
        if (shuffled):
            shuffle(self.anno_list)

    @property
    def batch_size(self):
        return self._batch_size

    @batch_size.setter
    def batch_size(self, bsize):
        if bsize<=0 or not isinstance(bsize, int):
            raise ValueError("batch size should be a positive integer! ")
        self._batch_size = bsize

    def __iter__(self):
        self.i = 0
        self.n = len(self.dataset)
        return self

    def __next__(self):
        batch = []
        bboxs = []
        labels = []

        if self.i >= self.n:
            raise StopIteration

        for _ in range(self.batch_size):
            index = self.anno_list[self.i]
            img, bbox, label = self.dataset[index]
            batch.append(img)
            bboxs.append(bbox)
            labels.append(label)
            self.i = (self.i + 1) % self.n

        return (batch, bboxs, labels)

    next = __next__

and my pipeline implementation:

class FaceDatasetPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, dataset_iter):
        super(FaceDatasetPipeline, self).__init__(batch_size,
                                                  num_threads,
                                                  device_id,
                                                  seed=12,
                                                  exec_async=False,    # must set to False if using PythonFunction
                                                  exec_pipelined=False)# must set to False if using PythonFunction
        self.datasest = dataset_iter
        self.iterator = iter(dataset_iter)
        self.iterator.batch_size = batch_size

        self.input = ops.ExternalSource()
        self.input_bbox = ops.ExternalSource()
        self.input_label = ops.ExternalSource()
        self.decode = ops.ImageDecoder(device='mixed', output_type=types.RGB)
        self.cmnp = ops.CropMirrorNormalize(mean=[0.485, 0.456, 0.406],
                                            std=[0.229, 0.224, 0.225],
                                            device='gpu',
                                            output_layout='CHW',
                                            output_dtype=types.FLOAT,)
        self.res = ops.Resize(device='gpu',
                              max_size=1024,
                              resize_shorter=600)
        self.coin1 = ops.CoinFlip(probability=0)
        self.coin2 = ops.CoinFlip(probability=0)

        self.flip = ops.Flip(device="gpu", horizontal=0)
        self.bbflip = ops.BbFlip(device="cpu", ltrb=False)
        self.shape = ops.Shapes(device="gpu")


    def define_graph(self):
        self.jpegs = self.input()
        self.bboxes = self.input_bbox()
        self.labels = self.input_label()

        images = self.decode(self.jpegs)
        shape_raw = self.shape(images)   # [H, W, 3], a 1x3 tensor
        images = self.res(images)
        shape_resized = self.shape(images)
        scale = shape_resized / shape_raw

        rng1 = self.coin1()
        rng2 = self.coin2()
        images = self.cmnp(images, mirror=rng1)
        images = self.flip(images, vertical=rng2)
        bboxes = self.bbflip(self.bboxes, horizontal=rng1, vertical=rng2)
        shape_out = self.shape(images) # [3, H, W], shape after resizing and augmentation

        return (images, bboxes.gpu(), self.labels.gpu(), shape_out, scale)

    def iter_setup(self):
        try:
            (images, bboxes, labels) = self.iterator.next()
            self.feed_input(self.jpegs, images, layout='HWC')
            self.feed_input(self.bboxes, bboxes)
            self.feed_input(self.labels, labels)
        except StopIteration:
            self.iterator = iter(self.datasest)
            raise StopIteration

The code that runs the pipeline:

pipe = FaceDatasetPipeline(batch_size=1, num_threads=2, device_id=0, dataset_iter=fdst_iter)
pipe.build()

dali_iter = DALIGenericIterator(pipelines=pipe,
                                output_map=['imgs', 'bboxes', 'labels', 'shape', 'scale'],
                                size=1,
                                auto_reset=True,
                                dynamic_shape=True)

for i, data in enumerate(dali_iter):

Can I get batches of variously shaped images through nvidia.dali.plugin.pytorch.DALIGenericIterator? Or can I construct PyTorch tensors directly from DALI's TensorList, so that I can avoid DALIGenericIterator altogether?
Thanks in advance for any help and advice! :)

question

cai-linjin

Most helpful comment

Hi,
Input images can have any shape, but the outputs need to be uniform if you want to use DALIGenericIterator, which returns a single tensor per output (with the batch size as the outermost dimension).
If you want to return a batch with nonuniform data, you need to write your own iterator and return each element of the TensorList as a separate framework tensor. You can check how it is done in the GluonIterator, where each element is copied to a separate tensor.
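A minimal sketch of the custom-iterator approach described above. It assumes that the pipeline's run() returns one TensorList per output, that GPU TensorLists expose as_cpu(), and that CPU TensorLists support len() and at(i). PerSampleIterator, FakeTensorList, and FakePipeline are hypothetical names introduced for illustration; the two Fake classes only stand in for a built DALI pipeline so the example runs without DALI or a GPU. In a real PyTorch setup, to_tensor could be torch.from_numpy.

```python
class PerSampleIterator:
    """Return each batch as a list of per-sample dicts instead of one stacked tensor."""

    def __init__(self, pipeline, output_map, to_tensor=lambda array: array):
        self.pipeline = pipeline      # assumed: a built pipeline (or stand-in) with run()
        self.output_map = output_map  # one name per pipeline output, in order
        self.to_tensor = to_tensor    # hypothetical hook, e.g. torch.from_numpy

    def __iter__(self):
        return self

    def __next__(self):
        # Assumed: run() returns one TensorList per pipeline output.
        outputs = self.pipeline.run()
        # Copy GPU TensorLists to host; objects without as_cpu() are used as-is.
        outputs = [out.as_cpu() if hasattr(out, "as_cpu") else out for out in outputs]
        batch_size = len(outputs[0])
        # One dict per sample, so samples never have to share a shape.
        return [
            {name: self.to_tensor(out.at(i)) for name, out in zip(self.output_map, outputs)}
            for i in range(batch_size)
        ]


class FakeTensorList:
    """Stand-in (not a DALI class) exposing only len() and at(i)."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def at(self, i):
        return self.samples[i]


class FakePipeline:
    """Stand-in (not a DALI class) that replays pre-made batches from run()."""

    def __init__(self, batches):
        self._batches = iter(batches)

    def run(self):
        # Raises StopIteration once the pre-made batches are exhausted.
        return next(self._batches)


# Two "images" with different heights and widths, as nested lists for the demo.
images = FakeTensorList([[[0] * 4] * 3, [[0] * 2] * 5])
boxes = FakeTensorList([[[0, 0, 1, 1]], [[0, 0, 2, 2], [1, 1, 3, 3]]])

for batch in PerSampleIterator(FakePipeline([(images, boxes)]), ["image", "boxes"]):
    for sample in batch:
        print(len(sample["image"]), len(sample["boxes"]))
```

Each batch comes back as a list of dicts, one per sample. The host copy through as_cpu() keeps the sketch simple; copying each element straight into a device tensor would avoid that round trip. How a real pipeline signals the end of an epoch is not handled here.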

JanuszL on 4 May 2020

🚀1 🎉1 👍1

All 4 comments

@JanuszL Hello, JanuszL! Thank you for your reply! I've read the implementation of DALIGenericIterator, but I don't understand how the uniform tensors are produced. Could you please give a simple example of a custom iterator? For example, I would like the output to be a list of dictionaries, where each dict is {"image": Tensor, "boxes": Tensor, "labels": Tensor} and the number of dicts is batch_size.
It would be great if you added this feature in a future release! Meanwhile, I am trying hard to understand the GluonIterator you mentioned.
Thank you again.

cai-linjin on 4 May 2020

Hi,