Dali: Status of conditional operators

Created on 23 Apr 2019 · 10 Comments · Source: NVIDIA/DALI

I am curious if there has been any progress towards the "short circuit" flag feature mentioned in this issue thread? Without this, it is difficult to integrate with PyTorch's DataLoader and Dataset classes for my use case.

The closest alternative I have found so far is detailed below:

import random
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops

class ExamplePipe(Pipeline):
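    def __init__(self, batch_size, num_threads, device_id):
        super(ExamplePipe, self).__init__(batch_size, num_threads, device_id)
        self.decoder = ...
        self.input_jpegs = ops.ExternalSource()
        # Maps augmentation name -> operator, argument generators and probability
        self.augmentations = {}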
        self.augmentations['contrast'] = {
            'operation': ops.Contrast(),
            'generators': [ops.Uniform(range=(0.4, 1.6))],
            'noop': [ops.Uniform(range=(1, 1))],
            'args': ['contrast'],
            'prob': 0.5}
        self.augmentations['brightness'] = {
            'operation': ops.Brightness(),
            'generators': [ops.Uniform(range=(0.6, 1.4))],
            'noop': [ops.Uniform(range=(1, 1))],
            'args': ['brightness'],
            'prob': 0.35}

    def define_graph(self):
        """Modify images based on the pipeline's augmentations dictionary."""
        self.jpegs = self.input_jpegs()
        images = self.decoder(self.jpegs)
        for name, aug in self.augmentations.items():
            # random.random() runs only while the graph is being defined (i.e. during
            # build()), so each choice stays fixed for the lifetime of this pipeline.
            if random.random() < aug['prob']:
                kwargs = {a: g() for a, g in zip(aug['args'], aug['generators'])}
                print('Applied {} ({:.0f}% chance)'.format(name, aug['prob'] * 100))
            else:
                # "No-op": run the operator with a neutral value (a factor of 1)
                kwargs = {a: n() for a, n in zip(aug['args'], aug['noop'])}
            images = aug['operation'](images, **kwargs)
        return images

    def feed_input(self):
        ...

When the build() method of ExamplePipe is called, the resulting pipeline will use the Contrast and Brightness operators with probability 50% and 35%, respectively, and perform a "no-op" otherwise (i.e. pass the image through the operator but leave it unchanged).
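Since the two coin flips are independent, each built pipeline ends up in one of four configurations. The short sketch below (plain Python, no DALI calls; `configuration_probabilities` is a hypothetical helper) enumerates them for the 50% and 35% values used above:

```python
from itertools import product

# Per-augmentation probabilities taken from the ExamplePipe example above.
probs = {'contrast': 0.5, 'brightness': 0.35}

def configuration_probabilities(probs):
    """Return {frozenset of applied augmentations: probability}, assuming independent draws."""
    names = list(probs)
    result = {}
    for flags in product([True, False], repeat=len(names)):
        p = 1.0
        applied = set()
        for name, on in zip(names, flags):
            # Multiply by P(applied) or P(not applied) for this augmentation
            p *= probs[name] if on else 1.0 - probs[name]
            if on:
                applied.add(name)
        result[frozenset(applied)] = p
    return result

for combo, p in configuration_probabilities(probs).items():
    print(sorted(combo), round(p, 3))
```

With these numbers, "contrast only" and "neither" each occur 32.5% of the time, while "both" and "brightness only" each occur 17.5% of the time.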

This approach to conditional execution works great if you want each pipeline instance that you create to have different types of operators. However, if you use a self.pipe = ExamplePipe(...) attribute in a PyTorch Dataset class, the conditional behavior _only_ happens at initialization (since pipe.build() is not called each time the pipeline runs). That means iterating over the Dataset with a DataLoader will give you the same _types_ of augmentations every time even though the _values_ given as input to each augmentation can change (due to ops.Uniform). To get unique types of augmentations, you would have to re-instantiate and build the pipe attribute for each batch (which seems like it would defeat a lot of the performance gains).
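The difference between a decision made once at build time and one made on every run can be illustrated without DALI at all. The classes below are hypothetical toy stand-ins, not DALI's API: `FrozenChoicePipe` mimics the `random.random()` call in `define_graph()`, while `PerIterationChoicePipe` shows the behavior a conditional operator would enable.

```python
import random

class FrozenChoicePipe:
    """Toy stand-in for ExamplePipe: the coin flip happens once, at 'build' time."""
    def __init__(self, prob, seed):
        self.rng = random.Random(seed)
        self.prob = prob
        self.apply_contrast = None

    def build(self):
        # Mirrors define_graph(): the random draw runs only here.
        self.apply_contrast = self.rng.random() < self.prob

    def run(self):
        return self.apply_contrast  # same answer on every iteration


class PerIterationChoicePipe:
    """Hypothetical conditional behavior: a fresh decision on every iteration."""
    def __init__(self, prob, seed):
        self.rng = random.Random(seed)
        self.prob = prob

    def build(self):
        pass  # nothing is decided at build time

    def run(self):
        return self.rng.random() < self.prob


frozen = FrozenChoicePipe(prob=0.5, seed=0)
frozen.build()
per_iter = PerIterationChoicePipe(prob=0.5, seed=0)
per_iter.build()

print({frozen.run() for _ in range(100)})    # a single value
print({per_iter.run() for _ in range(100)})  # both True and False appear
```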

In my pipeline, I would like to convert 25% of images to black and white, crop 10% of images, etc. With conditional operators, I could use the built-in Saturation and Crop operators. However, without a conditional feature, I need to write custom operators that wrap each built-in operator so that I can randomly generate values on the C++ side.

Hi @addisonklinke

Indeed, your approach gives you a different pipeline at every instantiation, but to vary the augmentations from one iteration to the next you would still need some kind of short-circuit flag/mask system in DALI operators.

This feature is in our backlog, but unfortunately right now we have many features with higher priority.
Adding such a feature will require a lot of changes, especially in GPU operators, where processing is done per batch, often in the same CUDA kernel.

We are currently discussing changes to Operator, and we will address this topic as part of that.

We will keep you posted here.

We're reworking our data structures now and the changes we introduce are going to make such conditional operators easier to implement, but it's still a long way to get there.

@Kh4L @mzient Thank you for the update, I'll keep an eye on this thread for future progress. To clarify, I wouldn't need conditional augmentations _within_ a batch, just across batches. I'm not sure whether that makes the GPU operator changes easier to handle.

A conditional operator that works across batches would be simpler to implement, but it would still require reworking the operator and settling some design questions (e.g. where to store the random generator and how to get/pass its output).
We might make it available before the _intra-batch_ conditional, but we can't promise a date.

Some kind of conditional operator would be so useful! I'm currently evaluating DALI and I love how fast it is. But if you cannot easily randomize the augmentations, it's of limited use.
The best way of doing image augmentation currently is Google's AutoAugment, or the improved version from Berkeley. Basically, you have a couple of augmentation policies (like the ones defined here: https://github.com/arcelien/pba/blob/master/autoaugment/policies.py ) and then you randomly choose between them. It would be so awesome if this were possible!

We have some ideas on how we could make conditional operators available in DALI. However, we were considering the following primitives instead:

  • conditional application of an operator: if a mask flag is set for a sample, the operator is applied to it; otherwise the sample is passed through unchanged
  • conditional sample routing:

    • split: if a mask flag is set, the sample is passed to the operator; otherwise it is routed to a second output

    • merge: create an output from two inputs by selecting the input based on a mask (or index)

These should be more or less doable in DALI, but they haven't been very high on our priority list. However, if there's demand, we may adapt to it.
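The semantics of these primitives can be sketched on a batch represented as a plain Python list. This is an illustration under assumptions, not DALI code: the function names are hypothetical, and `split` is assumed to produce compacted sub-batches that `merge` reassembles using the same mask.

```python
def conditional_apply(op, batch, mask):
    """Apply op to samples whose mask flag is set; pass the rest through unchanged."""
    return [op(x) if m else x for x, m in zip(batch, mask)]

def split(batch, mask):
    """Split a batch into (flagged, unflagged) sub-batches, preserving sample order."""
    flagged = [x for x, m in zip(batch, mask) if m]
    unflagged = [x for x, m in zip(batch, mask) if not m]
    return flagged, unflagged

def merge(flagged, unflagged, mask):
    """Rebuild a full batch, taking the next sample from `flagged` where mask is set."""
    it_f, it_u = iter(flagged), iter(unflagged)
    return [next(it_f) if m else next(it_u) for m in mask]

batch = [1, 2, 3, 4]
mask = [True, False, True, False]
op = lambda x: x * 10

print(conditional_apply(op, batch, mask))  # [10, 2, 30, 4]
f, u = split(batch, mask)                  # only f is sent to the operator
print(merge([op(x) for x in f], u, mask))  # [10, 2, 30, 4]
```

Split followed by merge gives the same result as conditional application, but the operator only sees the flagged samples.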

Any updates on this? I have a somewhat hacky implementation that applies GPU operators probabilistically per batch: I added an RNG and a RunNoOp method to Operator<GPUBackend> that just copies the input to the output, and modified the Run method to call the no-op instead of RunImpl depending on the RNG output.

I can try to get this to a state where it can be upstreamed, but if you're about to release a bunch of changes I don't want to step on your toes / create something unmergeable.
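The per-batch hack described above can be modeled in a few lines of Python. This is a toy sketch, not DALI's C++ `Operator` API: `PerBatchRandomOp`, `run` and `run_noop` are hypothetical names standing in for `Run`, `RunImpl` and `RunNoOp`. It shows one RNG draw per batch deciding whether every sample in the batch is transformed or copied through.

```python
import copy
import random

class PerBatchRandomOp:
    """Toy model: one RNG draw per batch chooses between the real operator
    and a no-op that copies input to output. Hypothetical names, plain Python."""

    def __init__(self, op, prob, seed=None):
        self.op = op
        self.prob = prob
        self.rng = random.Random(seed)  # the RNG lives inside the operator

    def run_noop(self, batch):
        return copy.deepcopy(batch)  # output equals input, like RunNoOp

    def run(self, batch):
        # A single decision for the whole batch (not per sample)
        if self.rng.random() < self.prob:
            return [self.op(x) for x in batch]  # stands in for RunImpl
        return self.run_noop(batch)

aug = PerBatchRandomOp(lambda x: x * 2, prob=0.35, seed=1)
print(aug.run([1, 2, 3]))  # either [2, 4, 6] or [1, 2, 3], decided once per batch
```

Keeping the RNG inside the operator sidesteps the question of how to pass a random value in as an input, at the cost of tying the decision to the whole batch.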

Hi,
We haven't made any progress. @rbetz, if you want to do it on your own, I think something like the conditional sample routing mentioned earlier would be the best way. I would go with a mux operator that, depending on the value of one of its inputs, forwards one of the other inputs. That way you can have not only conditional application of one operator but also a selection between different, mutually exclusive augmentations.
@mzient any opinion?
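A routing-style mux along these lines might look like the sketch below. It is plain Python under stated assumptions, not a DALI operator: `mux_route` is a hypothetical name, each branch is assumed to take and return a list of samples, and the selector holds one branch index per sample. Each branch runs once on only the samples routed to it, and a branch that nobody selects still receives an empty sub-batch.

```python
def mux_route(selector, branches, batch):
    """Route sample i to branches[selector[i]], run each branch once on its
    sub-batch, and reassemble the outputs in the original sample order."""
    # Group sample indices by the branch chosen for them.
    groups = {k: [] for k in range(len(branches))}
    for i, k in enumerate(selector):
        groups[k].append(i)

    out = [None] * len(batch)
    for k, indices in groups.items():
        sub_batch = [batch[i] for i in indices]
        # A branch that no sample selected is called with an empty sub-batch;
        # a real operator would have to handle that (or be skipped).
        results = branches[k](sub_batch)
        for i, r in zip(indices, results):
            out[i] = r
    return out

branches = [
    lambda b: [x + 1 for x in b],
    lambda b: [x * 10 for x in b],
    lambda b: [-x for x in b],  # never selected below, so it receives []
]
print(mux_route([0, 1, 0, 1], branches, [1, 2, 3, 4]))  # [2, 20, 4, 40]
```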

I think that both masking and multiplexing are useful, but we haven't done anything towards supporting either. There are some problems there, e.g. some operators can't work with empty tensors, and a multiplexing operator will most likely produce them. I can't think of any quick fix we can provide there.
As for the RunImpl hack, it's going to get increasingly difficult as we transition towards batch execution and a separate shape inference step.

You could have a "select" operator that takes N inputs and an auxiliary input providing a per-sample input index. With this approach, you could apply all transforms but use the output of only one. This is the shortest route to conditional processing, but definitely not an efficient one, since (N-1)/N of the results are wasted (unless used for other purposes).
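The "select" approach can be sketched as follows, in plain Python rather than DALI (`select` and `wasted_fraction` are hypothetical names). Every transform runs on every sample, and the per-sample index then keeps one result out of N, which is where the (N-1)/N waste comes from.

```python
def select(inputs, index):
    """Per-sample select: output[i] = inputs[index[i]][i].

    `inputs` holds N full batches, each already produced by a different transform.
    """
    return [inputs[k][i] for i, k in enumerate(index)]

def wasted_fraction(n):
    """Fraction of computed samples that are discarded when only one of n is kept."""
    return (n - 1) / n

batch = [1, 2, 3, 4]
transforms = [lambda x: x, lambda x: x + 100, lambda x: -x]  # N = 3
inputs = [[t(x) for x in batch] for t in transforms]         # all N run on every sample
index = [2, 0, 1, 0]
print(select(inputs, index))  # [-1, 2, 103, 4]
print(wasted_fraction(len(transforms)))
```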
