Dali: Add 'Compose()' style interface for DALI IO and transforms

Created on 11 Feb 2020 · 26 Comments · Source: NVIDIA/DALI

So, we have some existing code over at the PyTorch Ignite project that is actually pretty general and might be really handy to have in DALI: https://github.com/pytorch/ignite/pull/766

In core PyTorch, you can chain transformations and FileIO together easily with the Compose() operation: https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Compose

Something like that would be really useful for simple DALI/PyTorch workflows, and makes them really concise - demonstrated using the code from the Ignite branch here: https://colab.research.google.com/drive/1F_7DihE8YUzirvWV8xn1aMe0EMAP9iB6#scrollTo=tUoTlcSmCJBO

Summarized here, basically with the code we have, constructing e.g. a DALI FileReader pipeline looks like:

oplist = []

oplist.append(ops.ImageDecoder(device = "mixed", output_type = types.RGB)) # "mixed" means 'use GPU and CPU simultaneously'
oplist.append(ops.Resize(device = "gpu", image_type = types.RGB,
                            interp_type = types.INTERP_LINEAR, resize_x=WIDTH, resize_y=HEIGHT))
oplist.append(ops.CropMirrorNormalize(device="gpu",output_dtype=types.FLOAT,
                                                    output_layout=types.NCHW,
                                                    image_type=types.RGB,
                                                    mean=[255//2, 255//2, 255//2],
                                                    std=[255//2, 255//2, 255//2]))

transforms_list = ComposeOps(oplist)

pipe = TransformPipeline(batch_size=BATCH_SIZE,
        num_threads=8,
        device_id=0,
        transform=transforms_list,
        reader=ops.FileReader(file_root = "%s/train"%DATASET_NAME, random_shuffle = True))

dali_iter = DALILoader([pipe]) # Our training data generator

Which is clean, flexible, but most importantly imitates exactly how PyTorch itself (and similar projects such as Albumentations) handles IO pipelines - effectively eliminating any learning curve for DALI use and allowing ready adoption.
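
For readers who have not opened the Ignite PR, the composition helper can be pictured roughly as follows. This is only a minimal sketch of the idea, not the code from the PR: the body of `ComposeOps` here is an assumption, and plain Python functions (`decode`, `resize`) stand in for DALI operators so the snippet runs without DALI installed.

```python
class ComposeOps:
    """Chain unary callables: the output of one becomes the input of the next."""

    def __init__(self, oplist):
        self.oplist = list(oplist)

    def __call__(self, data):
        for op in self.oplist:
            data = op(data)
        return data


# Hypothetical stand-ins for DALI operators; real operators act on data nodes.
def decode(x):
    return x + ["decoded"]


def resize(x):
    return x + ["resized"]


transforms = ComposeOps([decode, resize])
result = transforms(["jpeg"])
print(result)
```

Anything callable with one input and one output can go in the list, which is the property the rest of this thread discusses.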

Can you think of a place where this might fit in DALI?

enhancement

Most helpful comment

  1. My point with ImageDataGenerator being that it is a substantially simpler “first pass” interface. Compose is similarly a much simpler interface, good for a simple workflow. In this way they are absolutely similar!

  2. I really don’t think anyone will be assuming that a simpler interface will be more performant. This isn’t an issue in Keras with the Sequential vs Model APIs - people don’t just automatically assume Sequential is more performant, so I don’t see why that would be an issue here

  3. For simple workflows we just assume independence. As I mentioned earlier, these compound transformations from a single RV are not necessary for most simple augmentation workflows. This is what PyTorch, Albumentations, MxNet and ImageDataGenerator do - and again these are all extremely popular. Assuming independence is more than sufficient for a simplified interface. And again, non-unary ops are not a target for Compose, at least to the extent that we want to have parity with PyTorch. Albumentations parity (i.e. handling detection like that) would be cool, but not at all required for Compose to still be very useful to a lot of workflows.

Quite literally every other major framework handles its augmentation this way. NVIDIA stands alone in absolutely requiring this modular, graphical usage - which just does not make sense to me, as our goal should be adoption and integration. DALI will have a hard time seeing success if it remains so different from literally every other framework that nobody will adopt it.

I have provided several pieces of evidence supporting this being a real issue - notably the towardsdatascience blog post that required NVIDIA input to work even for a simple case, and the Albumentations issue clearly citing DALI usability being foreign, and a concern.

All 26 comments

I think that this API is not in line with how DALI operates.
DALI operators are rarely truly unary - randomness is usually supplied externally via keyword arguments to operator's call and the ComposeOps has no way to supply them.
Let's assume we have a simple pipeline which decodes some JPEGs, applies a random rotate (-90 to +90 degrees) and random hue transform (-10 to +10 degrees).
If we create the operators in define_graph, it will look like:

def define_graph(self):
    reader = ops.FileReader()
    decode = ops.ImageDecoder(device="mixed")
    rot = ops.Rotate(device="gpu")
    hsv = ops.Hsv(device="gpu")
    rng = ops.Uniform(range=(-90, 90))
    rng2 = ops.Uniform(range=(-10, 10))

    jpegs, labels = reader()
    images = decode(jpegs)
    images = rot(images, angle = rng())
    images = hsv(images, hue = rng2())
    return images, labels

It's noteworthy that it's impossible to supply the dynamic values of angle or hue at operator construction.
One can write ops.Rotate(device="gpu", angle=-10) to rotate all images clockwise by 10 degrees, but ops.Rotate(device="gpu", angle=rng) is an error.
With that in mind, let's try to combine the image augmentation pipeline consisting of decode, rotate and hsv.
It's clear that we can't just add them to a flat list because the operators need angle and hue arguments, respectively. We can work around it using lambdas:

transform_list = [
    decode,
    lambda img: rotate(img, angle = rng()),
    lambda img: hsv(img, hue = rng2())
]
transforms = ComposeOps(transform_list)

It's no longer clear nor intuitive.
The result is a callable, which one can obtain in at least two cleaner ways:

  1. When the pipeline is small, one can use a lambda:
transforms = lambda jpegs: hsv(rotate(decode(jpegs), angle = rng()), hue = rng2())
  2. For more elaborate pipelines, one can use a local function:
def transforms(jpegs):
    images = decode(jpegs)
    images = rotate(images, angle = rng())
    images = hsv(images, hue = rng2())
    return images

Additionally, we're in the process of designing a function-style API (#1598), for which it is even more natural to use lambdas and functions.
The same pipeline would look like this:

def define_graph(self):
    jpegs, labels = ops.file_reader()   # some args here
    images = ops.image_decoder(jpegs, device="mixed")
    images = ops.rotate(images, angle = ops.uniform(range = (-90, 90)))
    images = ops.hsv(images, hue = ops.uniform(range = (-10, 10)))
    return images, labels

The image transformation pipeline can be factored out to a callable:

def image_pipeline(self, jpegs):
    images = ops.image_decoder(jpegs, device="mixed")
    images = ops.rotate(images, angle = ops.uniform(range = (-90, 90)))
    images = ops.hsv(images, hue = ops.uniform(range = (-10, 10)))
    return images

def define_graph(self):
    jpegs, labels = ops.file_reader()   # some args here
    images = self.image_pipeline(jpegs)
    return images, labels

Of course, self.image_pipeline can be a callable member, supplied in any way the user sees fit - not necessarily a hard-coded function in the pipeline class.

It seems to me that the new API you mention is not incompatible with the compose style construction - I.e. if it allows you to specify your randomness lazily, it could be done in the Compose() style, right?

It is always lazy. There's no eager mode in DALI and even the graph creation is actually lazy - i.e. instantiating ops.Resize(device="cpu") doesn't really instantiate anything on the native side until the pipeline is built. And it doesn't even create an OpSpec until __call__.

I’m specifically referring to the distinction between the capital and lowercase interfaces you list -

With the lowercase interface, e.g. ops.rotate(), you pass an ops.uniform() distribution of degrees as a keyword argument, instead of feeding in a constant yielded by ops.Uniform() at pipeline creation time. As I understand it, uniform() only takes on a particular value as each image is loaded, which is why it can be passed in as a kwarg, while Uniform cannot.

Lazy is perhaps the wrong word, but I’m getting at allowing DALI ops to have keywords specified as distributions. If you can do this, can’t you do Compose()?

Or is my understanding of the differences between rotate and Rotate incorrect?

ops.Uniform is an operator - it does not yield a constant, it's an (input-less) operator that just spits out batches of random scalars from given range.
The snake_case identifiers are just wrappers that make writing easier. The two versions of define_graph (one using ops.CamelCase, the other using ops.snake_case) are doing exactly the same thing. There's some extra automation in the latter.
For example, in current API you must call

rot = ops.Rotate(angle=<scalar>)
...
rot(images)

but

rot = ops.Rotate()
rng = ops.Uniform(range=(-90,90))
...
rot(images, angle = rng())

The fact that the placement of the angle argument depends on whether it's a scalar or a data node is not intuitive at all; the snake_case variant (ops.rotate) will filter out keyword arguments being data nodes and pass them to __call__, but it's still something like:

def snake_case(*inputs, **args):
    init_args, call_args = split_args(args)
    op = ops.CamelCase(**init_args)
    return op(*inputs, **call_args)

It makes the code more concise and cleaner (at least I think so) and definitely more compatible with arithmetic expressions (which you can already enjoy in recent DALI releases), but ultimately, after the pipeline has been built, there's absolutely no difference between the two.
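
The `split_args` helper in the snippet above is left undefined. One way to picture it, assuming data nodes can be recognized by their type, is sketched below. `DataNode` here is a stand-in class for illustration only, not DALI's actual type, and the real wrapper may tell the two kinds of argument apart differently.

```python
class DataNode:
    """Stand-in for a DALI data node (the output of some operator)."""

    def __init__(self, name):
        self.name = name


def split_args(args):
    """Separate constant arguments (for __init__) from data nodes (for __call__)."""
    init_args = {k: v for k, v in args.items() if not isinstance(v, DataNode)}
    call_args = {k: v for k, v in args.items() if isinstance(v, DataNode)}
    return init_args, call_args


# A scalar goes to the operator's constructor, a data node goes to the call.
init_args, call_args = split_args(
    {"device": "gpu", "angle": DataNode("uniform_output")}
)
print(sorted(init_args), sorted(call_args))
```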

BTW - even if you do something like:

rot = ops.Rotate(angle = -45)

images1 = rot(images1)
images2 = rot(images2)

DALI will create _two separate_ rotate operators. The rot variable is a mere placeholder for common scalar arguments to the operators that will be instantiated upon call.

...back to question about Compose - no, you cannot supply a distribution to operator's __init__, it must be to the __call__ and the distribution is an operator in its own right.

Thank you for explaining!

One thing is still unclear though,

If I can do this:

def define_graph(self):
    jpegs, labels = ops.file_reader()   # some args here
    images = ops.image_decoder(jpegs, device="mixed")
    images = ops.rotate(images, angle = ops.uniform(range = (-90, 90)))
    images = ops.hsv(images, hue = ops.uniform(range = (-10, 10)))
    return images, labels

Why can't I write some wrapper using e.g. partial or something that could take a list of partially specified ops: (leaving the partialling part to some auxiliary function):

ops_list = [
ops.file_reader_partial(),
ops.image_decoder_partial(device="mixed"),
ops.rotate_partial(angle = ops.uniform(range = (-90, 90)))
]

Point being, that I am unsure whether we are debating about whether the usage pattern I am suggesting is possible, or whether we are debating whether the usage is preferable, because right now I still don't see why it isn't technically possible. It's just a matter of whether we place the burden on the user, or solve it ourselves as DALI.
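
To make the "partial" idea concrete, here is a rough sketch using functools.partial from the standard library. The functions `decode` and `rotate` are toy stand-ins operating on lists, and `compose` is a hypothetical helper; none of them are DALI functions, and the `*_partial` names above do not exist in DALI.

```python
from functools import partial, reduce


# Toy stand-ins: each takes the data first, then its configuration.
def decode(data, device="cpu"):
    return data + [f"decode[{device}]"]


def rotate(data, angle=0):
    return data + [f"rotate[{angle}]"]


# Partially specified ops: everything except the data is bound up front.
ops_list = [partial(decode, device="mixed"), partial(rotate, angle=45)]


def compose(fns):
    """Return a callable that feeds the data through each function in order."""
    return lambda data: reduce(lambda acc, fn: fn(acc), fns, data)


pipeline = compose(ops_list)
out = pipeline(["jpegs"])
print(out)
```

Whether the same binding trick carries over to operators whose arguments are data nodes rather than constants is exactly the open question here.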

Given time, for example, I could manually write a wrapper for every single DALI function using the code you gave by doing something like:

def custom_rotate(images, min_angle, max_angle):
    rotate = ops.Rotate()
    rng = ops.Uniform(range=(min_angle, max_angle))
    results = rotate(images, angle = rng())
    return results

and then Compose() custom_rotate() instead of Rotate itself, right?

If the issue is not possibility but preferability instead let me know as well so I can think about it some more!

If it is preferability however, I think usability is vastly higher if we follow and allow the design patterns used by the existing frameworks we are trying to integrate into.

As an additional data point, Albumentations (used by Lyft, etc, popular on Kaggle) offers an optimized data pipeline library that also conforms to the existing Compose() sequential archetype:

https://github.com/albumentations-team/albumentations

When asked about DALI integration, they say they would love to include it but cannot essentially because of the implementation difficulties we are discussing here:

https://github.com/albumentations-team/albumentations/issues/100

MxNet in addition to PyTorch also supports Compose: https://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.data.vision.transforms.Compose.html

There is clearly demand for a Compose style interface, and I have yet to see a reason why we can’t abstract things enough to make this possible from a DALI perspective, even if such support needs to be through something like dali.experimental, and we need to use some of the tricks you describe behind the scenes to make that work

The fastest data pipeline is the one that people use, we should avoid creating barriers to entry if there is anything possible we can do about it if we want to drive adoption

Adding more info if it is helpful, it looks like the ops you are saying would be problematic are the ones requiring callable arguments post construction right?

It looks like there are roughly ~20 of these problem ops that would be relevant to this Compose style implementation. For these could we not have e.g. instead of 'nvidia.dali.ops.Rotate' we could also have 'nvidia.dali.compose.ops.rotate' (or maybe nvidia.dali.experimental.sequential.ops.rotate, nvidia.dali.randomize.ops.rotate, etc), the definition of which would look something like:

def rotate(images, min_angle, max_angle, axis, size):
    rotate = ops.Rotate()
    rng = ops.Uniform(range=(min_angle, max_angle))
    results = rotate(images, angle = rng(), axis=axis, size=size)
    return results

Which should be itself Compose-able, right?

so your pipeline might be:

transform_list = [
ops.ImageDecoder(device="mixed"),
compose.ops.rotate(img, min_angle = -90, max_angle = 90, axis=0, size=(480,640))
]
transforms = ComposeOps(transform_list)

To be clear, I am not advocating all DALI workflows use Compose, just the ones that are simple - the workflows using the equivalent of torch vision transforms, Albumentation, etc. For Keras' slow, yet nonetheless _extremely_ popular ImageDataGenerator, DALI could be a great, low hanging fruit 'upgrade' if it were made more accessible to Keras users - who breathe concise code as if it is air. For PyTorch Ignite I already demonstrate in the Colab notebook there that even trivial changes to simple workflows can result in drastic speedups.

These are great targets for having a relatively 'Drop In' replacement for whatever they are using now - there would be 0 excuse not to give DALI a shot, whether hobbyist or professional - when so little code change and brain retraining would be required.

DALI isn't just useful for large-scale training, and data scientists almost always mess around with new tools at smaller scale before adding them to their 'mental toolbox'. That process of light experimentation and initial proof of concept generation should be easier, and doesn't have to be at the expense of more complex workflows if we follow what every major framework does by offering an alternative simpler 'drop in replacement' API/Compose pipeline for simpler workflows.

What do you think? Interesting?


EDIT: BrightnessContrast has dynamic args, thanks for checking my work @mzient!

Here is a transcription of the info I am referencing to scope this, in case it is useful:

(Note: I am ignoring deprecated ops),

Names in parentheses are the callable args for that op!

Based on: https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/supported_ops.html

Total ops #: 20+15+10+5+7+16 = 73

Total ops in scope: 20+15+10 = 45

Total ops likely requiring special behavior = 20

Dynamic Args: (20)

nvidia.dali.ops.BBoxPaste (ratio, pastex, pastey) (NOT GPU)
nvidia.dali.ops.BbFlip (horizontal, vertical)
nvidia.dali.ops.BrightnessContrast (brightness, brightness_shift, contrast)
nvidia.dali.ops.Crop (crop_d, crop_h, crop_pos_x, crop_pos_y, crop_pos_z, crop_w)
nvidia.dali.ops.CropMirrorNormalize (crop_d, crop_h, crop_pos_x, crop_pos_y, crop_pos_z, crop_w, mirror)
nvidia.dali.ops.FastResizeCropMirror (TODO: oh golly it's a lot - 11 of them) (NOT GPU)
nvidia.dali.ops.Flip (depthwise, horizontal, vertical)
nvidia.dali.ops.Hsv (hue, saturation, value)
nvidia.dali.ops.ImageDecoderCrop(crop_d, crop_h, crop_pos_x, crop_pos_y, crop_pos_z, crop_w)
nvidia.dali.ops.Jitter (mask)
nvidia.dali.ops.NormalDistribution (mean, stddev) (NOT GPU)
nvidia.dali.ops.Paste (ratio, min_canvas_size, paste_x, paste_y)
nvidia.dali.ops.Reshape (shape)
nvidia.dali.ops.Resize (resize_longer, resize_shorter, resize_x, resize_y)
nvidia.dali.ops.ResizeCropMirror (TODO: oh golly it's also a lot - 11 of them) (NOT GPU)
nvidia.dali.ops.Rotate (angle, axis, size)
nvidia.dali.ops.Slice (anchor, shape)
nvidia.dali.ops.Sphere (mask)
nvidia.dali.ops.WarpAffine (matrix, size)
nvidia.dali.ops.Water (mask)

No Dynamic Arguments: (15)

nvidia.dali.ops.Cast
nvidia.dali.ops.ColorSpaceConversion
nvidia.dali.ops.Copy
nvidia.dali.ops.DLTensorPythonFunction
nvidia.dali.ops.DumpImage
nvidia.dali.ops.ElementExtract
nvidia.dali.ops.ImageDecoder
nvidia.dali.ops.ImageDecoderRandomCrop
nvidia.dali.ops.LookupTable
nvidia.dali.ops.Pad
nvidia.dali.ops.PowerSpectrum (NOT GPU)
nvidia.dali.ops.PreemphasisFilter (NOT GPU)
nvidia.dali.ops.RandomResizedCrop
nvidia.dali.ops.Transpose
nvidia.dali.ops.Uniform

Inputs: (10)

nvidia.dali.ops.COCOReader
nvidia.dali.ops.Caffe2Reader
nvidia.dali.ops.CaffeReader
nvidia.dali.ops.ExternalSource
nvidia.dali.ops.FileReader
nvidia.dali.ops.MXNetReader
nvidia.dali.ops.SequenceReader
nvidia.dali.ops.Shapes
nvidia.dali.ops.TFRecordReader
nvidia.dali.ops.VideoReader

Out of scope: (5)

nvidia.dali.ops.AudioDecoder (sample rate) (NOT GPU)
nvidia.dali.ops.MFCC (N/A) (NOT GPU)
nvidia.dali.ops.MelFilterBank (N/A) (NOT GPU)
nvidia.dali.ops.Spectrogram (N/A) (NOT GPU)
nvidia.dali.ops.ToDecibels (N/A) (NOT GPU)

Unsure: (7)

nvidia.dali.ops.ImageDecoderSlice (?)
nvidia.dali.ops.BoxEncoder (?)
nvidia.dali.ops.PythonFunction
nvidia.dali.ops.PythonFunctionBase
nvidia.dali.ops.RandomBBoxCrop
nvidia.dali.ops.SSDRandomCrop
nvidia.dali.plugin.pytorch.TorchPythonFunction

Arithmetic (16 total)

unary: +, -
binary: +, -, *, /, //
comparison: ==, !=, <, <=, >, >=
bitwise: &, |, ^

I'd like to clarify some points - or at least maybe bring common wording to this discussion. First, I'll try to explain the design behind current API:

  1. There are no "callable" arguments anywhere. They are dynamic arguments (internally referred to as ArgumentInputs) and constant arguments.
    Constant arguments are part of the operator's configuration (and passed to __init__).
    Dynamic arguments are data nodes (results of some other DALI operator) and are passed to __call__.
  2. The examples with rotate/custom_rotate won't work as written, because the result is not callable. What would work is this:
def RandomRotate(device, min_angle, max_angle, axis = None):  # image is not an argument to this function!
    angle_generator = dali.ops.Uniform(range = (min_angle, max_angle))
    rotate = dali.ops.Rotate(device = device)
    return lambda img: rotate(img, angle = angle_generator())  # return a callable to enable composability

Gotchas

Operators can only be called when there's a current pipeline set. This is a non-issue if the graph is built in define_graph, because pipeline sets itself as a current one when it's built (and that's where define_graph is called), but you can't just create a part of DALI graph without a pipeline.
The RandomRotate shown above is legal to call at all times, but the one below is only legal inside define_graph

def RandomRotate(device, min_angle, max_angle, axis = None):  # image is not an argument to this function!
    angle_generator = dali.ops.Uniform(range = (min_angle, max_angle))
    rotate = dali.ops.Rotate(device = device)
    angle = angle_generator()  # possible error! - calling an operator requires a pipeline
    return lambda img: rotate(img, angle = angle_generator())

Minor points to set straight

  • In Rotate, axis is a 3D vector and is only valid (and required) for rotating volumetric data; it may also be a dynamic argument. It must have nonzero length.
  • In BrightnessContrast _all_ arguments can be dynamic.

Hello,

Maybe an example will make more clear what I am hoping to accomplish and what we probably need to do it. I wrote two example ops (the remaining 18 'problem ops' would similarly need a sequential wrapper) - the idea being that we can make versions of many of the common ops that do work in sequential workflows.

Definitions look like:

class RotateRandom(object):
    def __init__(self, angle=0., axis=0, size=None, **kwargs):
        self.rotate = ops.Rotate(**kwargs)

        self.angle = ops.Uniform(range=angle) if type(angle) is tuple else lambda: types.Constant(angle)

        # TODO: make ipywidgets style (min,max) tuple specifications for remainder of call-time arguments
        self.axis=lambda: types.Constant(axis)
        self.size=size

    def __call__(self, img):
        return self.rotate(img, angle=self.angle(), axis=self.axis(), size=self.size)

class CropMirrorNormalizeRandom(object):
    def __init__(self, crop_d=0., crop_h=0., crop_pos_x=.5,crop_pos_y=.5, crop_pos_z=.5, crop_w=0., mirror=0, **kwargs):
        self.cmn = ops.CropMirrorNormalize(**kwargs)

        self.crop_d = ops.Uniform(range=crop_d) if type(crop_d) is tuple else lambda: types.Constant(crop_d)
        self.crop_h = ops.Uniform(range=crop_h) if type(crop_h) is tuple else lambda: types.Constant(crop_h)
        self.crop_pos_x = ops.Uniform(range=crop_pos_x) if type(crop_pos_x) is tuple else lambda: types.Constant(crop_pos_x)
        self.crop_pos_y = ops.Uniform(range=crop_pos_y) if type(crop_pos_y) is tuple else lambda: types.Constant(crop_pos_y)
        self.crop_pos_z = ops.Uniform(range=crop_pos_z) if type(crop_pos_z) is tuple else lambda: types.Constant(crop_pos_z)
        self.crop_w = ops.Uniform(range=crop_w) if type(crop_w) is tuple else lambda: types.Constant(crop_w)

        # TODO: Figure out how to get random integers and how to handle mirror

        self.mirror = lambda: types.Constant(mirror)

    def __call__(self, img):
        return self.cmn(img, crop_d=self.crop_d(), crop_h=self.crop_h(), crop_pos_x=self.crop_pos_x(), \
                           crop_pos_y=self.crop_pos_y(), crop_pos_z=self.crop_pos_z(), crop_w=self.crop_w(), mirror=self.mirror())

You can find a Colab here: https://colab.research.google.com/drive/1vxaHeG319Zuqana3RA5al7J3QYrL4Wya

It demonstrates construction working as such using those first two sequentially wrapped ops (Rotation and CropMirrorNormalize):

oplist = []

oplist.append(ops.ImageDecoder(device = "mixed", output_type = types.RGB))
oplist.append(ops.Resize(device = "gpu", image_type = types.RGB, 
                            interp_type = types.INTERP_LINEAR, resize_x=WIDTH, resize_y=HEIGHT))

oplist.append(CropMirrorNormalizeRandom(device="gpu",output_dtype=types.FLOAT,
                                                      output_layout=types.NHWC,
                                                    image_type=types.RGB,
                                                    mean=[255//2, 255//2, 255//2],
                                                    std=[255//2, 255//2, 255//2],
                                        crop_h=(0,1), # for some reason this breaks if higher than 1?
                                        crop_w=(0,1),
                                       ))

oplist.append(RotateRandom(angle=(-90,90), device="gpu", keep_size=True))
oplist.append(ops.Transpose(perm=(2,0,1),device="gpu"))

transforms_list = ComposeOps(oplist)

pipe = TransformPipeline(batch_size=BATCH_SIZE,
        num_threads=8,
        device_id=0,
        transform=transforms_list,
        reader=ops.FileReader(file_root = "%s/train"%DATASET_NAME, random_shuffle = True))

dali_iter = DALILoader([pipe])

This gets us very close to ImageDataGenerator type behavior, only with a lot more control and in a short piece of code.

Provided we have wrappers for the 20 ops with dynamic arguments, would you agree that a Compose interface is possible? That it looks pretty clean from an end user perspective if we already provide the sequential modules?

Hi.
Do the Compose-style APIs deal with anything that has more than 1 input or output?

No, it is sort of like the Sequential vs Model APIs in Keras - Compose is a shortcut for quickly building transformation pipelines when all transformations are unary- similar to how Keras Sequential models are used when you don’t need branching deep learning models.

Note in my prior post however that a lot of the non-unary ops can be easily “made unary” by letting each module handle its own randomization. That’s what I did for Rotate and CropMirrorNormalize there

This in turn makes it so you can do the vast majority of common transformation pipelines using only the Compose interface

@dnola - I see a value in this effort. The only reservations I have are regarding the need to maintain these random wrappers. Maybe we can generate them automatically somehow (even extend scheme with some hints to tell the generator when a random generator could be used as an argument).
@mzient @klecki ?

Thank you for your feedback!

I think it would be really cool to handle it automatically!

Note that this RotateRandom etc. convention comes from PyTorch: https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomRotation

This is where the (min, max) argument specification is coming from

Last, just in my reading, here are the 9 ops with dynamic arguments I think would be the major important ones to have a wrapper (or some similar solution, kwarg filtering, decorator, type hinting, etc)- if you get these, you can recreate the behavior of the vast majority of TorchVision transforms as well as Keras ImageDataGenerator:

nvidia.dali.ops.BrightnessContrast (brightness, brightness_shift, contrast)
nvidia.dali.ops.CropMirrorNormalize (crop_d, crop_h, crop_pos_x, crop_pos_y, crop_pos_z, crop_w, mirror)
nvidia.dali.ops.Flip (depthwise, horizontal, vertical)
nvidia.dali.ops.Hsv (hue, saturation, value)
nvidia.dali.ops.Jitter (mask)
nvidia.dali.ops.Reshape (shape)
nvidia.dali.ops.Resize (resize_longer, resize_shorter, resize_x, resize_y)
nvidia.dali.ops.Rotate (angle, axis, size)
nvidia.dali.ops.WarpAffine (matrix, size)

I can see this happening as automatically building a Pipeline from a list of ops; I only worry that it could rule out a lot of operators and use cases (for example, anything we do with bounding boxes usually uses more than 1 input/output).
We also are heading in a direction of allowing some additional transformation on the per-sample arguments (for example you can generate random rotation angles for one op, scale the rotations even more with arithm op and use those bigger angles as argument to another rotate) - that won't happen with that API.
As for auto-generating those wrappers - maybe some meta-code would be enough, based on the distinction between per-sample and constant arguments.

The problem would be if we start constraining new functionality to fit everything into 1-in 1-out scheme or prohibit a big part of operators because they would need non-linear graphs.

I can maybe see multiple-input multiple output happening as tuples, but that is probably a rare case.
And probably nobody using python would like to write code like this: https://en.wikibooks.org/wiki/Haskell/Understanding_arrows :P

I totally agree that any work we do on Compose should not be at the cost of non-sequential workloads - keeping it separate like Keras does seems wise.

More complex workloads such as the per-sample compound argument transformations that necessarily require multiple input multiple output type behavior would not be a target of compose, and for sure I agree we don't want to think about it that way

In terms of operator prohibition being too limiting though, I am quite confident that the ops that we can get to work sequentially (like the 9 I list in addition to the 15+10 that do not require special behavior) are more than enough to cover the majority of data augmentation workloads, so I would highly doubt that Compose being too restrictive is an issue. This strictly sequential transform construction is a common design pattern. Put another way, TorchVision transforms are all unary - and that API is extremely popular.

See for example this post on DALI (which notably the author had to reach out to one of my fellow NVIDIA SAs to get it to work):

https://towardsdatascience.com/fast-data-augmentation-in-pytorch-using-nvidia-dali-68f5432e1f5f

The author goes through a lot of pain to do even the very first step of the pipeline I have in my second Colab. He even stops before actually reproducing the post-load augmentation steps in his original compose, likely due to the complexity of it. His workload would have been a perfect target for Compose, as all of the operations in his original Torch Compose would be easily implemented in a unary fashion.

Last, I do want to emphasize again that even for really simple workloads (like the first Colab I link), DALI can still result in drastic speedups - we don't need to target exclusively complicated workloads to demonstrate DALI's value

Edit: I tried to read your Haskell link but I don't know any of these words :(

Another point for Compose is that it can split better between flexible configuration and the running code.

Could you please explain what exactly the "1-in 1-out scheme" is? Is it about transforming a single data type: image only, bbox only, or mask only?
In albumentations, the solution is to provide an encompassing Transformation that internally generates the params needed to transform all supported kinds of data ("image", "bbox", "keypoints", "mask") and then dispatches to functional implementations for each type. Data and data types are defined as dicts:
{"image": imagendarray, "mask": maskndarray, "bbox": bboxarray, ...}
Sorry if this is off topic.
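
As a rough illustration of that dict-based dispatch pattern (a sketch of the general idea only, not albumentations' actual implementation), the transform below makes a single random decision and then applies a per-target function to each entry it knows about. All names here are hypothetical, and nested lists stand in for image arrays.

```python
import random


def hflip_image(image, width):
    # image: list of rows, each row a list of pixel values
    return [row[::-1] for row in image]


def hflip_bbox(bbox, width):
    # bbox: (x_min, y_min, x_max, y_max) in pixel coordinates
    x_min, y_min, x_max, y_max = bbox
    return (width - x_max, y_min, width - x_min, y_max)


class RandomHorizontalFlip:
    # One functional implementation per supported kind of data.
    targets = {"image": hflip_image, "bbox": hflip_bbox}

    def __init__(self, p=0.5, rng=None):
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, data):
        width = len(data["image"][0])
        # The random decision is made once and shared by every target.
        if self.rng.random() >= self.p:
            return dict(data)
        return {
            key: self.targets[key](value, width) if key in self.targets else value
            for key, value in data.items()
        }


sample = {"image": [[1, 2, 3], [4, 5, 6]], "bbox": (0, 0, 1, 2)}
flipped = RandomHorizontalFlip(p=1.0)(sample)
print(flipped)
```

The transform is still unary from the caller's point of view (one dict in, one dict out), which is how this style keeps the image and its annotations in sync inside a Compose-like chain.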

^ Certainly for detection it would be cool to figure something out - those are pretty common, IO heavy workloads! Though I am unsure of DALI internals to know what sort of effort it would take to write those Transformation/dispatchers

I did some keyword alchemy and figured out a lazy (in the effort sense) way of randomizing operators. It is very naive, doesn't bounds check, and does not handle integers. But it works surprisingly well for a lot of the major augmentation tasks. I basically did this:

dynamic_argnames = {'angle', 'axis', 'brightness', 'brightness_shift', 'contrast', 'crop_d', 'crop_h', 'crop_pos_x', 'crop_pos_y', 'crop_pos_z', 'crop_w', 'depthwise', 'horizontal', 'hue', 'mask', 'matrix', 'mirror', 'resize_longer', 'resize_shorter', 'resize_x', 'resize_y', 'saturation', 'shape', 'size', 'value', 'vertical'}

def randomize_argument(rand_range):
    return ops.Uniform(range=rand_range) if type(rand_range) is tuple else lambda: types.Constant(rand_range)

def randomize_op(op, **kwargs):
    for arg, value in kwargs.items():
        if arg in dynamic_argnames:
            kwargs[arg] = randomize_argument(value)
    static_args = {k:v for k,v in kwargs.items() if not k in dynamic_argnames}
    dynamic_args = {k:v for k,v in kwargs.items() if k in dynamic_argnames}
    op = op(**static_args)
    return lambda img: op(img, **{k:v() for k,v in dynamic_args.items()})

You can then do something like:
oplist.append(randomize_op(ops.Hsv, hue=(-255,255), saturation=(-5,5), value=(0,5), device = "gpu", dtype=types.UINT8))

and it actually works and does randomize those dynamic arguments, while still using the static ones for construction. I don't need the 'sequential.ops' I made earlier anymore as a result.

As a result I can do some fairly complete augmentation pipelines (Hsv, BrightnessContrast, etc) now, without having to do anything at all per op (adding new ops would just mean adding new keywords if dynamic - but I already added all the ones from the 20 I listed above) - check v2 of the Colab here:

https://colab.research.google.com/drive/1XDRzDeIeteTPyzYbZp2XWjKtUWMLXqcg

It is far from perfect, and keeping a list of dynamic arg names might not be the best way to approach this, but it could maybe work as a starting point. What do you guys think?

Another point in favor of Compose is that it separates flexible configuration from the running code more cleanly.

Could you please explain what exactly the "1-in 1-out scheme" is? Is it about transforming a single data type: image only, bbox only, or mask only?

I mean operators that take only one input and produce only one output. The transformed data can be anything, and some operators in DALI already can handle different kinds of data (like a 2D image or a 3D volume, but some parameters would usually need to change a bit).
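
To make the "1-in 1-out" idea concrete, here is a minimal pure-Python sketch of chaining unary callables. The `Compose` class and the `scale` and `shift` functions are hypothetical stand-ins for illustration, not DALI operators or the API that was eventually merged.

```python
from functools import reduce
from typing import Any, Callable, Sequence


class Compose:
    """Chain unary callables: the output of each step feeds the next."""

    def __init__(self, steps: Sequence[Callable[[Any], Any]]):
        self.steps = list(steps)

    def __call__(self, data: Any) -> Any:
        # Apply the steps left to right, threading a single value through.
        return reduce(lambda value, step: step(value), self.steps, data)


# Hypothetical stand-ins for operators: plain functions on a list of numbers.
def scale(values):
    return [2 * v for v in values]


def shift(values):
    return [v + 1 for v in values]


pipeline = Compose([scale, shift])
result = pipeline([1, 2, 3])  # scale first, then shift
```

Because every step takes one value and returns one value, any ordering of steps is a valid pipeline. That property is lost as soon as an operator needs a second input.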

In albumentations, their solution is to provide an encompassing Transformation that internally generates the params needed to transform all supported kinds of data ("image", "bbox", "keypoints", "mask") and then dispatches to a functional implementation for each type. Data and datatypes are defined as dicts:
{"image": imagendarray, "mask": maskndarray, "bbox": bboxarray, ...}
Sorry if this is off topic.
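
A rough sketch of that dispatch pattern, as I understand it from the description above: draw the parameters once, then route each dict entry to a per-type function. `RandomHorizontalFlip`, `hflip_image` and `hflip_bbox` are hypothetical names written for this example and are not the albumentations implementation.

```python
import random


def hflip_image(image, params):
    # Reverse each row of a 2D list; the same routine serves masks.
    return [row[::-1] for row in image] if params["flip"] else image


def hflip_bbox(bbox, params):
    # bbox is (x_min, y_min, x_max, y_max) in pixels.
    if not params["flip"]:
        return bbox
    x_min, y_min, x_max, y_max = bbox
    width = params["width"]
    return (width - x_max, y_min, width - x_min, y_max)


class RandomHorizontalFlip:
    # Hypothetical transform: one random draw, dispatched per target key.
    targets = {"image": hflip_image, "mask": hflip_image, "bbox": hflip_bbox}

    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.rng = random.Random(seed)

    def __call__(self, sample):
        # Parameters are generated once per sample (an "image" entry is assumed present).
        width = len(sample["image"][0])
        params = {"flip": self.rng.random() < self.p, "width": width}
        return {key: self.targets[key](value, params) for key, value in sample.items()}


sample = {
    "image": [[1, 2, 3], [4, 5, 6]],
    "mask": [[0, 0, 1], [0, 1, 1]],
    "bbox": (0, 0, 1, 2),
}
flipped = RandomHorizontalFlip(p=1.0)(sample)
```

The key design point is that the random decision lives in one place, so the image, mask and box cannot drift out of sync.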

This sounds interesting, but with DALI for different kinds of data we get different operators and we would probably write a different pipeline to process images, bounding boxes or audio samples (this one is a different use case so there would be a minimal overlap of operator names that can be shared between image and audio).

I don't really think this should be a part of upstream DALI just yet.

  1. ImageDataGenerator is not at all similar to the compose-style interface - it's a big, fused operator which does affine transform, normalization and color transform. It has an extra postprocessing step as a callable argument, but it's still not related to composition.
  2. The whole idea of Compose - at least the way I see it - is to separate the preparation of the processing pipeline from execution. It makes perfect sense for immediate (eager) mode processing, e.g. in albumentations - where the inputs are concrete (numpy arrays). The inputs to DALI operators are abstract data nodes; the composition will happen regardless. Users might be tempted to overuse Compose because of some misguided idea that it will somehow improve performance - but it will always be just syntactic sugar over existing solutions.
  3. When running a composed pipeline on multiple inputs, whether the random values are independent becomes guesswork:
    composed = Compose([RandomRotate(angle=(-30,30)), BrightnessContrast(brightness=(-10,10))])
    and later in the pipeline:
    images, masks = composed(images, masks)
    Are they rotated by the same angle? Is the brightness shift equal? There's no easy way to decouple these.
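
The ambiguity in point 3 can be shown with a small pure-Python sketch. `RandomShift`, `apply_independent` and `apply_shared` are hypothetical names for this illustration only; neither DALI nor the proposed Compose is assumed to behave in either of these ways.

```python
import random


class RandomShift:
    # Hypothetical stand-in for a randomized operator: adds one offset to every value.
    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng

    def sample(self):
        return self.rng.uniform(self.low, self.high)

    def apply(self, values, offset):
        return [v + offset for v in values]


def apply_independent(op, *inputs):
    # A fresh draw per input: images and masks get different offsets.
    return tuple(op.apply(x, op.sample()) for x in inputs)


def apply_shared(op, *inputs):
    # One draw reused for every input: images and masks stay aligned.
    offset = op.sample()
    return tuple(op.apply(x, offset) for x in inputs)


op = RandomShift(-30, 30, random.Random(0))
images, masks = [0.0, 1.0], [0.0, 1.0]
ind_images, ind_masks = apply_independent(op, images, masks)
shr_images, shr_masks = apply_shared(op, images, masks)
```

Both readings are reasonable for `composed(images, masks)`, and the call site alone does not say which one applies.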

I think we'll focus on simplifying the Python API first, but I'm not closing this issue just yet.

  1. My point with ImageDataGenerator is that it is a substantially simpler “first pass” interface. Compose is similarly a much simpler interface, good for a simple workflow. In this way they are absolutely similar!

  2. I really don’t think anyone will be assuming that a simpler interface will be more performant. This isn’t an issue in Keras with the Sequential vs Model APIs - people don’t just automatically assume Sequential is more performant, so I don’t see why that would be an issue here.

  3. For simple workflows we just assume independence. As I mentioned earlier, these compound transformations from a single RV are not necessary for most simple augmentation workflows. This is what PyTorch, Albumentations, MxNet and ImageDataGenerator do - and again these are all extremely popular. Assuming independence is more than sufficient for a simplified interface. And again, non-unary ops are not a target for Compose, at least to the extent that we want to have parity with PyTorch. Albumentations parity (i.e. handling detection like that) would be cool, but not at all required for Compose to still be very useful to a lot of workflows.

Quite literally every other major framework handles its augmentation this way. NVIDIA stands alone in absolutely requiring this modular, graphical usage - which just does not make sense to me, as our goal should be adoption and integration. DALI will have a hard time seeing success if it remains so different from literally every other framework that nobody will adopt it.

I have provided several pieces of evidence supporting this being a real issue - notably the towardsdatascience blog post that required NVIDIA input to work even for a simple case, and the Albumentations issue clearly citing DALI usability being foreign, and a concern.

We have synced offline with @dnola and we will implement such an idea soon. Before that, some work needs to be done to make this more versatile and feature complete, including:

  • making random generators more flexible
  • relaxing the constraint that limits construction of a DALI pipeline to the inside of the define_graph method

We will update this thread soon.

Basic functionality for compose is merged as #2393

DALI 0.28 has been released, it should address this issue.
