So, we have some existing code over at the PyTorch Ignite project that is actually pretty general and might be really handy to have in DALI: https://github.com/pytorch/ignite/pull/766
In PyTorch (via torchvision), you can chain transformations and file I/O together easily with the Compose() operation: https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Compose
Something like that would be really useful for simple DALI/PyTorch workflows and would make them really concise, as demonstrated using the code from the Ignite branch here: https://colab.research.google.com/drive/1F_7DihE8YUzirvWV8xn1aMe0EMAP9iB6#scrollTo=tUoTlcSmCJBO
In summary, with the code we have, constructing e.g. a DALI FileReader pipeline looks like this:
import nvidia.dali.ops as ops
import nvidia.dali.types as types
# ComposeOps, TransformPipeline and DALILoader come from the Ignite PR linked above;
# BATCH_SIZE, WIDTH, HEIGHT and DATASET_NAME are user-defined constants.
oplist = []
oplist.append(ops.ImageDecoder(device = "mixed", output_type = types.RGB)) # "mixed" means 'use GPU and CPU simultaneously'
oplist.append(ops.Resize(device = "gpu", image_type = types.RGB,
                         interp_type = types.INTERP_LINEAR, resize_x=WIDTH, resize_y=HEIGHT))
oplist.append(ops.CropMirrorNormalize(device="gpu", output_dtype=types.FLOAT,
                                      output_layout=types.NCHW,
                                      image_type=types.RGB,
                                      mean=[255//2, 255//2, 255//2],
                                      std=[255//2, 255//2, 255//2]))
transforms_list = ComposeOps(oplist)
pipe = TransformPipeline(batch_size=BATCH_SIZE,
                         num_threads=8,
                         device_id=0,
                         transform=transforms_list,
                         reader=ops.FileReader(file_root = "%s/train"%DATASET_NAME, random_shuffle = True))
dali_iter = DALILoader([pipe]) # Our training data generator
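For readers who have not looked at the Ignite PR, the core of a Compose-style helper is plain left-to-right function composition over unary callables. The sketch below is a minimal, framework-free illustration under that assumption; `compose_ops` is a hypothetical name, not the `ComposeOps` code from the PR, and ordinary Python functions stand in for DALI operators.

```python
from functools import reduce
from typing import Any, Callable, Sequence


def compose_ops(ops: Sequence[Callable[[Any], Any]]) -> Callable[[Any], Any]:
    """Chain unary callables so that compose_ops([f, g, h])(x) == h(g(f(x)))."""
    ops = list(ops)  # snapshot, so later edits to the caller's list don't leak in

    def composed(x: Any) -> Any:
        # Feed each transform the output of the previous one, first to last.
        return reduce(lambda acc, op: op(acc), ops, x)

    return composed


# Plain functions standing in for decode -> resize -> normalize.
pipeline = compose_ops([lambda x: x + 1, lambda x: x * 2, str])
print(pipeline(3))  # prints 8
```

Inside a DALI pipeline the composed callable would be applied to the reader output from define_graph, which is why every element has to take exactly one input and return exactly one output.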
This is clean and flexible, but most importantly it imitates exactly how PyTorch itself (and similar projects such as Albumentations) handles IO pipelines, effectively eliminating any learning curve for DALI and allowing ready adoption.
Can you think of a place where this might fit in DALI?
I think that this API is not in line with how DALI operates.
DALI operators are rarely truly unary - randomness is usually supplied externally via keyword arguments to the operator's call, and ComposeOps has no way to supply them.
Let's assume we have a simple pipeline which decodes some JPEGs, applies a random rotate (-90 to +90 degrees) and random hue transform (-10 to +10 degrees).
If we create the operators in define_graph, it will look like:
def define_graph(self):
    reader = ops.FileReader()
    decode = ops.ImageDecoder(device="mixed")
    rot = ops.Rotate(device="gpu")
    hsv = ops.Hsv(device="gpu")
    rng = ops.Uniform(range=(-90, 90))
    rng2 = ops.Uniform(range=(-10, 10))
    jpegs, labels = reader()
    images = decode(jpegs)
    images = rot(images, angle = rng())
    images = hsv(images, hue = rng2())
    return images, labels
It's noteworthy that it's impossible to supply the dynamic values of angle or hue at operator construction.
One can write ops.Rotate(device="gpu", angle=-10) to rotate all images clockwise by 10 degrees, but ops.Rotate(device="gpu", angle=rng) is an error.
With that in mind, let's try to combine the image augmentation pipeline consisting of decode, rotate and hsv.
It's clear that we can't just add them to a flat list because the operators need angle and hue arguments, respectively. We can work around it using lambdas:
transform_list = [
    decode,
    lambda img: rot(img, angle = rng()),
    lambda img: hsv(img, hue = rng2())
]
transforms = ComposeOps(transform_list)
This is neither clear nor intuitive anymore.
The result is a callable, which one can obtain in at least two cleaner ways:
transforms = lambda jpegs: hsv(rot(decode(jpegs), angle = rng()), hue = rng2())
def transforms(jpegs):
    images = decode(jpegs)
    images = rot(images, angle = rng())
    images = hsv(images, hue = rng2())
    return images
Additionally, we're in the process of designing a function-style API (#1598), for which it is even more natural to use lambdas and functions.
The same pipeline would look like this:
def define_graph(self):
    jpegs, labels = ops.file_reader() # some args here
    images = ops.image_decoder(jpegs, device="mixed")
    images = ops.rotate(images, angle = ops.uniform(range = (-90, 90)))
    images = ops.hsv(images, hue = ops.uniform(range = (-10, 10)))
    return images, labels
The image transformation pipeline can be factored out to a callable:
def image_pipeline(self, jpegs):
    images = ops.image_decoder(jpegs, device="mixed")
    images = ops.rotate(images, angle = ops.uniform(range = (-90, 90)))
    images = ops.hsv(images, hue = ops.uniform(range = (-10, 10)))
    return images
def define_graph(self):
    jpegs, labels = ops.file_reader() # some args here
    images = self.image_pipeline(jpegs)
    return images, labels
Of course, self.image_pipeline can be a callable member, supplied in any way the user sees fit - not necessarily a hard-coded function in the pipeline class.
It seems to me that the new API you mention is not incompatible with Compose-style construction, i.e. if it allows you to specify your randomness lazily, it could be done in the Compose() style, right?
It is always lazy. There's no eager mode in DALI and even the graph creation is actually lazy - i.e. instantiating ops.Resize(device="cpu") doesn't really instantiate anything on the native side until the pipeline is built. And it doesn't even create an OpSpec until __call__.
I’m specifically referring to the distinction between the capital and lowercase interfaces you list -
With the lowercase version, e.g. ops.rotate(), you give it an ops.uniform() distribution of degrees instead of feeding in a constant yielded by ops.Uniform() at pipeline creation time. As I understand it, uniform() only becomes a particular value as each image is loaded, and it can be passed in as a kwarg, while Uniform cannot.
Lazy is perhaps the wrong word, but I’m getting at allowing DALI ops to have keywords specified as distributions. If you can do this, can’t you do Compose()?
Or is my understanding of the differences between rotate and Rotate incorrect?
ops.Uniform is an operator - it does not yield a constant, it's an (input-less) operator that just spits out batches of random scalars from given range.
The snake_case identifiers are just wrappers that make writing easier. The two versions of define_graph (one using ops.CamelCase, the other using ops.snake_case) are doing exactly the same thing. There's some extra automation in the latter.
For example, in the current API you must call
rot = ops.Rotate(angle=<scalar>)
...
rot(images)
but
rot = ops.Rotate()
rng = ops.Uniform(range=(-90,90))
...
rot(images, angle = rng())
The fact that the placement of the angle argument depends on whether it's a scalar or a data node is not intuitive at all; the snake_case variant (ops.rotate) filters out keyword arguments that are data nodes and passes them to __call__, but under the hood it's still something like:
def snake_case(*inputs, **args):
    init_args, call_args = split_args(args)
    op = ops.CamelCase(**init_args)
    return op(*inputs, **call_args)
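The argument split can be modelled in a few lines of plain Python. In this sketch `DataNode` and `split_args` are hypothetical stand-ins (a real wrapper would test for DALI's own data-node type): keyword arguments holding a data node are routed to `__call__`, and everything else goes to the constructor.

```python
from typing import Any, Dict, Tuple


class DataNode:
    """Hypothetical stand-in for the graph node returned by calling an operator."""

    def __init__(self, name: str) -> None:
        self.name = name


def split_args(args: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (init_args, call_args): data nodes go to __call__, scalars to __init__."""
    init_args = {k: v for k, v in args.items() if not isinstance(v, DataNode)}
    call_args = {k: v for k, v in args.items() if isinstance(v, DataNode)}
    return init_args, call_args


init_args, call_args = split_args({"device": "gpu", "keep_size": True,
                                   "angle": DataNode("uniform_out")})
print(sorted(init_args))  # ['device', 'keep_size']
print(sorted(call_args))  # ['angle']
```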
It makes the code more concise and cleaner (at least I think so) and definitely more compatible with arithmetic expressions (which you can already enjoy in recent DALI releases), but ultimately, after the pipeline has been built, there's absolutely no difference between the two.
BTW - even if you do something like:
rot = ops.Rotate(angle = -45)
images1 = rot(images1)
images2 = rot(images2)
DALI will create _two separate_ rotate operators. The rot variable is a mere placeholder for common scalar arguments to the operators that will be instantiated upon call.
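A toy model of that behaviour, assuming nothing about DALI internals beyond what is described above: the Python object only stores the common scalar arguments, and each call records a separate operator instance in the graph. `Graph` and `OpPlaceholder` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, List


class Graph:
    """Collects one entry per operator instance, like a pipeline's list of ops."""

    def __init__(self) -> None:
        self.instances: List[Dict[str, Any]] = []


class OpPlaceholder:
    """Holds common scalar arguments; every call adds a new instance to the graph."""

    def __init__(self, graph: Graph, name: str, **scalar_args: Any) -> None:
        self.graph = graph
        self.name = name
        self.scalar_args = scalar_args

    def __call__(self, input_name: str) -> str:
        spec = {"op": self.name, "input": input_name, **self.scalar_args}
        self.graph.instances.append(spec)
        return f"{self.name}_{len(self.graph.instances)}"  # name of the output node


graph = Graph()
rot = OpPlaceholder(graph, "Rotate", angle=-45)
out1 = rot("images1")
out2 = rot("images2")
print(len(graph.instances))  # 2: one placeholder, two operator instances
```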
...back to the question about Compose: no, you cannot supply a distribution to an operator's __init__; it must go to __call__, and the distribution is an operator in its own right.
Thank you for explaining!
One thing is still unclear though,
If I can do this:
def define_graph(self):
    jpegs, labels = ops.file_reader() # some args here
    images = ops.image_decoder(jpegs, device="mixed")
    images = ops.rotate(images, angle = ops.uniform(range = (-90, 90)))
    images = ops.hsv(images, hue = ops.uniform(range = (-10, 10)))
    return images, labels
Why can't I write some wrapper, using e.g. partial or something similar, that takes a list of partially specified ops (leaving the partial application to some auxiliary function):
ops_list = [
    ops.file_reader_partial(),
    ops.image_decoder_partial(device="mixed"),
    ops.rotate_partial(angle = ops.uniform(range = (-90, 90)))
]
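One way to make a partially specified list work is to defer every random argument until the transform is actually applied. Note that a plain `ops.uniform(...)` written directly in the list would be evaluated when the list is built, not when each transform runs, so it has to be wrapped in something callable. In this sketch `Deferred` and `deferred_partial` are hypothetical helpers, `random.uniform` stands in for a per-sample random operator, and `rotate` is a toy function rather than DALI's operator.

```python
import random
from typing import Any, Callable


class Deferred:
    """Marks a keyword argument whose value must be produced at call time."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self.factory = factory


def deferred_partial(fn: Callable[..., Any], **kwargs: Any) -> Callable[[Any], Any]:
    """Like functools.partial, but resolves Deferred kwargs on every call."""
    def apply(x: Any) -> Any:
        resolved = {k: (v.factory() if isinstance(v, Deferred) else v)
                    for k, v in kwargs.items()}
        return fn(x, **resolved)
    return apply


def rotate(image: float, angle: float, keep_size: bool = False) -> float:
    # Toy "operator": just adds the angle so the effect is visible.
    return image + angle


ops_list = [deferred_partial(rotate,
                             angle=Deferred(lambda: random.uniform(-90, 90)),
                             keep_size=True)]
print([ops_list[0](0.0) for _ in range(3)])  # three (almost surely) different angles
```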
The point being: I am unsure whether we are debating whether the usage pattern I am suggesting is possible, or whether it is preferable, because right now I still don't see why it isn't technically possible. It's just a matter of whether we place the burden on the user or solve it ourselves in DALI.
Given time, for example, I could manually write a wrapper for every single DALI function using the code you gave by doing something like:
def custom_rotate(images, min_angle, max_angle):
    rotate = ops.Rotate()
    rng = ops.Uniform(range=(min_angle, max_angle))
    results = rotate(images, angle = rng())
    return results
and then Compose() custom_rotate() instead of Rotate itself, right?
If the issue is not possibility but preferability instead let me know as well so I can think about it some more!
If it is preferability, however, I think usability is vastly higher if we follow and allow the design patterns used by the existing frameworks we are trying to integrate into.
As an additional data point, Albumentations (used by Lyft, etc, popular on Kaggle) offers an optimized data pipeline library that also conforms to the existing Compose() sequential archetype:
https://github.com/albumentations-team/albumentations
When asked about DALI integration, they say they would love to include it but essentially cannot because of the implementation difficulties we are discussing here:
https://github.com/albumentations-team/albumentations/issues/100
MXNet, in addition to PyTorch, also supports Compose: https://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.data.vision.transforms.Compose.html
There is clearly demand for a Compose-style interface, and I have yet to see a reason why we can’t abstract things enough to make this possible from a DALI perspective, even if such support needs to go through something like dali.experimental and we need to use some of the tricks you describe behind the scenes to make it work.
The fastest data pipeline is the one that people use; if we want to drive adoption, we should avoid creating barriers to entry wherever we can.
Adding more info in case it is helpful: it looks like the ops you are saying would be problematic are the ones requiring call-time arguments after construction, right?
It looks like there are roughly 20 of these problem ops that would be relevant to a Compose-style implementation. For these, could we not have, in addition to 'nvidia.dali.ops.Rotate', e.g. 'nvidia.dali.compose.ops.rotate' (or maybe nvidia.dali.experimental.sequential.ops.rotate, nvidia.dali.randomize.ops.rotate, etc.), whose definition would look something like:
def rotate(images, min_angle, max_angle, axis, size):
    rotate_op = ops.Rotate()
    rng = ops.Uniform(range=(min_angle, max_angle))
    results = rotate_op(images, angle = rng(), axis=axis, size=size)
    return results
Which should itself be Compose-able, right?
So your pipeline might be:
transform_list = [
    ops.ImageDecoder(device="mixed"),
    lambda img: compose.ops.rotate(img, min_angle = -90, max_angle = 90, axis=0, size=(480,640))
]
transforms = ComposeOps(transform_list)
To be clear, I am not advocating that all DALI workflows use Compose, just the simple ones - the workflows using the equivalent of torchvision transforms, Albumentations, etc. For Keras' slow, yet nonetheless _extremely_ popular ImageDataGenerator, DALI could be a great, low-hanging-fruit 'upgrade' if it were made more accessible to Keras users - who breathe concise code as if it were air. For PyTorch Ignite, I already demonstrated in the Colab notebook linked above that even trivial changes to simple workflows can result in drastic speedups.
These are great targets for a relatively 'drop-in' replacement for whatever they are using now - there would be no excuse not to give DALI a shot, whether hobbyist or professional, when so little code change and brain retraining would be required.
DALI isn't just useful for large-scale training, and data scientists almost always mess around with new tools at smaller scale before adding them to their 'mental toolbox'. That process of light experimentation and initial proof of concept generation should be easier, and doesn't have to be at the expense of more complex workflows if we follow what every major framework does by offering an alternative simpler 'drop in replacement' API/Compose pipeline for simpler workflows.
What do you think? Interesting?
EDIT: BrightnessContrast has dynamic args, thanks for checking my work @mzient!
Here is a transcription of the info I am referencing to scope this, in case it is useful:
(Note: I am ignoring deprecated ops.)
Names in parentheses are the callable args for that op!
Based on: https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/supported_ops.html
nvidia.dali.ops.BBoxPaste (ratio, pastex, pastey) (NOT GPU)
nvidia.dali.ops.BbFlip (horizontal, vertical)
nvidia.dali.ops.BrightnessContrast (brightness, brightness_shift, contrast)
nvidia.dali.ops.Crop (crop_d, crop_h, crop_pos_x, crop_pos_y, crop_pos_z, crop_w)
nvidia.dali.ops.CropMirrorNormalize (crop_d, crop_h, crop_pos_x, crop_pos_y, crop_pos_z, crop_w, mirror)
nvidia.dali.ops.FastResizeCropMirror (TODO: oh golly it's a lot - 11 of them) (NOT GPU)
nvidia.dali.ops.Flip (depthwise, horizontal, vertical)
nvidia.dali.ops.Hsv (hue, saturation, value)
nvidia.dali.ops.ImageDecoderCrop(crop_d, crop_h, crop_pos_x, crop_pos_y, crop_pos_z, crop_w)
nvidia.dali.ops.Jitter (mask)
nvidia.dali.ops.NormalDistribution (mean, stddev) (NOT GPU)
nvidia.dali.ops.Paste (ratio, min_canvas_size, paste_x, paste_y)
nvidia.dali.ops.Reshape (shape)
nvidia.dali.ops.Resize (resize_longer, resize_shorter, resize_x, resize_y)
nvidia.dali.ops.ResizeCropMirror (TODO: oh golly it's also a lot - 11 of them) (NOT GPU)
nvidia.dali.ops.Rotate (angle, axis, size)
nvidia.dali.ops.Slice (anchor, shape)
nvidia.dali.ops.Sphere (mask)
nvidia.dali.ops.WarpAffine (matrix, size)
nvidia.dali.ops.Water (mask)
nvidia.dali.ops.Cast
nvidia.dali.ops.ColorSpaceConversion
nvidia.dali.ops.Copy
nvidia.dali.ops.DLTensorPythonFunction
nvidia.dali.ops.DumpImage
nvidia.dali.ops.ElementExtract
nvidia.dali.ops.ImageDecoder
nvidia.dali.ops.ImageDecoderRandomCrop
nvidia.dali.ops.LookupTable
nvidia.dali.ops.Pad
nvidia.dali.ops.PowerSpectrum (NOT GPU)
nvidia.dali.ops.PreemphasisFilter (NOT GPU)
nvidia.dali.ops.RandomResizedCrop
nvidia.dali.ops.Transpose
nvidia.dali.ops.Uniform
nvidia.dali.ops.COCOReader
nvidia.dali.ops.Caffe2Reader
nvidia.dali.ops.CaffeReader
nvidia.dali.ops.ExternalSource
nvidia.dali.ops.FileReader
nvidia.dali.ops.MXNetReader
nvidia.dali.ops.SequenceReader
nvidia.dali.ops.Shapes
nvidia.dali.ops.TFRecordReader
nvidia.dali.ops.VideoReader
nvidia.dali.ops.AudioDecoder (sample rate) (NOT GPU)
nvidia.dali.ops.MFCC (N/A) (NOT GPU)
nvidia.dali.ops.MelFilterBank (N/A) (NOT GPU)
nvidia.dali.ops.Spectrogram (N/A) (NOT GPU)
nvidia.dali.ops.ToDecibels (N/A) (NOT GPU)
nvidia.dali.ops.ImageDecoderSlice (?)
nvidia.dali.ops.BoxEncoder (?)
nvidia.dali.ops.PythonFunction
nvidia.dali.ops.PythonFunctionBase
nvidia.dali.ops.RandomBBoxCrop
nvidia.dali.ops.SSDRandomCrop
nvidia.dali.plugin.pytorch.TorchPythonFunction
unary: +, -
binary arithmetic: +, -, *, /, //
comparison: ==, !=, <, <=, >, >=
bitwise: &, |, ^
I'd like to clarify some points, or at least bring some common wording to this discussion. First, I'll try to explain the design behind the current API:
__init__).__call__.
def RandomRotate(device, min_angle, max_angle, axis = None): # image is not an argument to this function!
    angle_generator = dali.ops.Uniform(range = (min_angle, max_angle))
    rotate = dali.ops.Rotate(device = device)
    return lambda img: rotate(img, angle = angle_generator()) # return a callable to enable composability
Operators can only be called when there's a current pipeline set. This is a non-issue if the graph is built in define_graph, because pipeline sets itself as a current one when it's built (and that's where define_graph is called), but you can't just create a part of DALI graph without a pipeline.
The RandomRotate shown above is legal to call at all times, but the one below is only legal inside define_graph:
def RandomRotate(device, min_angle, max_angle, axis = None): # image is not an argument to this function!
    angle_generator = dali.ops.Uniform(range = (min_angle, max_angle))
    rotate = dali.ops.Rotate(device = device)
    angle = angle_generator() # possible error! - calling an operator requires a pipeline
    return lambda img: rotate(img, angle = angle_generator())
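The rule can be illustrated with a toy model in plain Python. `Pipeline` and `Op` below are hypothetical stand-ins, not DALI's classes: constructing an `Op` is always allowed, but calling one outside a `with Pipeline():` block raises, so a factory like RandomRotate must only capture operators and call them from inside the returned lambda.

```python
from typing import Any, Callable, List, Optional


class Pipeline:
    """Stand-in pipeline: records operator calls made while it is the current one."""
    current: Optional["Pipeline"] = None

    def __init__(self) -> None:
        self.nodes: List[str] = []
        self._previous: Optional["Pipeline"] = None

    def __enter__(self) -> "Pipeline":
        self._previous, Pipeline.current = Pipeline.current, self
        return self

    def __exit__(self, *exc: Any) -> bool:
        Pipeline.current = self._previous
        return False  # do not swallow exceptions


class Op:
    """Stand-in operator: construction is free, calling needs a current pipeline."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __call__(self, *inputs: Any, **kwargs: Any) -> str:
        if Pipeline.current is None:
            raise RuntimeError(f"{self.name}: calling an operator requires a pipeline")
        Pipeline.current.nodes.append(self.name)
        return f"{self.name}_out"


def random_rotate() -> Callable[[Any], str]:
    angle_generator = Op("Uniform")
    rotate = Op("Rotate")
    # Only capture the operators here; they are called when the lambda runs.
    return lambda img: rotate(img, angle=angle_generator())


transform = random_rotate()  # legal anywhere: nothing has been called yet
try:
    transform("images")      # no current pipeline, so this raises
except RuntimeError as err:
    print(err)               # Uniform: calling an operator requires a pipeline

with Pipeline() as pipe:
    transform("images")
print(pipe.nodes)            # ['Uniform', 'Rotate']
```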
Hello,
Maybe an example will make more clear what I am hoping to accomplish and what we probably need to do it. I wrote two example ops (the remaining 18 'problem ops' would similarly need a sequential wrapper) - the idea being that we can make versions of many of the common ops that do work in sequential workflows.
Definitions look like:
class RotateRandom(object):
    def __init__(self, angle=0., axis=0, size=None, **kwargs):
        self.rotate = ops.Rotate(**kwargs)
        self.angle = ops.Uniform(range=angle) if type(angle) is tuple else lambda: types.Constant(angle)
        # TODO: make ipywidgets style (min,max) tuple specifications for remainder of call-time arguments
        self.axis=lambda: types.Constant(axis)
        self.size=size
    def __call__(self, img):
        return self.rotate(img, angle=self.angle(), axis=self.axis(), size=self.size)

class CropMirrorNormalizeRandom(object):
    def __init__(self, crop_d=0., crop_h=0., crop_pos_x=.5,crop_pos_y=.5, crop_pos_z=.5, crop_w=0., mirror=0, **kwargs):
        self.cmn = ops.CropMirrorNormalize(**kwargs)
        self.crop_d = ops.Uniform(range=crop_d) if type(crop_d) is tuple else lambda: types.Constant(crop_d)
        self.crop_h = ops.Uniform(range=crop_h) if type(crop_h) is tuple else lambda: types.Constant(crop_h)
        self.crop_pos_x = ops.Uniform(range=crop_pos_x) if type(crop_pos_x) is tuple else lambda: types.Constant(crop_pos_x)
        self.crop_pos_y = ops.Uniform(range=crop_pos_y) if type(crop_pos_y) is tuple else lambda: types.Constant(crop_pos_y)
        self.crop_pos_z = ops.Uniform(range=crop_pos_z) if type(crop_pos_z) is tuple else lambda: types.Constant(crop_pos_z)
        self.crop_w = ops.Uniform(range=crop_w) if type(crop_w) is tuple else lambda: types.Constant(crop_w)
        # TODO: Figure out how to get random integers and how to handle mirror
        self.mirror = lambda: types.Constant(mirror)
    def __call__(self, img):
        return self.cmn(img, crop_d=self.crop_d(), crop_h=self.crop_h(), crop_pos_x=self.crop_pos_x(),
                        crop_pos_y=self.crop_pos_y(), crop_pos_z=self.crop_pos_z(), crop_w=self.crop_w(), mirror=self.mirror())
You can find a Colab here: https://colab.research.google.com/drive/1vxaHeG319Zuqana3RA5al7J3QYrL4Wya
It demonstrates construction working as such using those first two sequentially wrapped ops (Rotation and CropMirrorNormalize):
oplist = []
oplist.append(ops.ImageDecoder(device = "mixed", output_type = types.RGB))
oplist.append(ops.Resize(device = "gpu", image_type = types.RGB,
                         interp_type = types.INTERP_LINEAR, resize_x=WIDTH, resize_y=HEIGHT))
oplist.append(CropMirrorNormalizeRandom(device="gpu",output_dtype=types.FLOAT,
                                        output_layout=types.NHWC,
                                        image_type=types.RGB,
                                        mean=[255//2, 255//2, 255//2],
                                        std=[255//2, 255//2, 255//2],
                                        crop_h=(0,1), # for some reason this breaks if higher than 1?
                                        crop_w=(0,1),
                                        ))
oplist.append(RotateRandom(angle=(-90,90), device="gpu", keep_size=True))
oplist.append(ops.Transpose(perm=(2,0,1),device="gpu"))
transforms_list = ComposeOps(oplist)
pipe = TransformPipeline(batch_size=BATCH_SIZE,
                         num_threads=8,
                         device_id=0,
                         transform=transforms_list,
                         reader=ops.FileReader(file_root = "%s/train"%DATASET_NAME, random_shuffle = True))
dali_iter = DALILoader([pipe])
This gets us very close to ImageDataGenerator type behavior, only with a lot more control and in a short piece of code.
Provided we have wrappers for the 20 ops with dynamic arguments, would you agree that a Compose interface is possible? That it looks pretty clean from an end user perspective if we already provide the sequential modules?
Hi.
Do the Compose-style APIs deal with anything that has more than 1 input or output?
No, it is sort of like the Sequential vs Model APIs in Keras - Compose is a shortcut for quickly building transformation pipelines when all transformations are unary, similar to how Keras Sequential models are used when you don’t need branching deep learning models.
Note in my prior post, however, that a lot of the non-unary ops can easily be “made unary” by letting each module handle its own randomization. That’s what I did for Rotate and CropMirrorNormalize there.
This in turn means you can build the vast majority of common transformation pipelines using only the Compose interface.
@dnola - I see value in this effort. My only reservation is the need to maintain these random wrappers. Maybe we can generate them automatically somehow (or even extend the schema with hints that tell the generator when a random generator could be used as an argument).
@mzient @klecki ?
Thank you for your feedback!
I think it would be really cool to handle it automatically!
Note that this RotateRandom etc. convention comes from PyTorch: https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomRotation
This is where the (min, max) argument specification is coming from
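torchvision's RandomRotation accepts either a single number d, meaning the range (-d, +d), or an explicit (min, max) pair. A wrapper layer could normalize its arguments the same way before building a uniform generator. `to_range` below is a hypothetical helper written for this discussion, not part of DALI or torchvision.

```python
from numbers import Real
from typing import Sequence, Tuple, Union


def to_range(spec: Union[Real, Sequence[Real]]) -> Tuple[float, float]:
    """Normalize a torchvision-style range spec to a (min, max) tuple."""
    if isinstance(spec, Real):
        if spec < 0:
            raise ValueError("a single number must be non-negative")
        return (-float(spec), float(spec))
    if len(spec) != 2:
        raise ValueError("expected a number or a (min, max) pair")
    low, high = float(spec[0]), float(spec[1])
    if low > high:
        raise ValueError("min must not exceed max")
    return (low, high)


print(to_range(30))         # (-30.0, 30.0)
print(to_range((-90, 90)))  # (-90.0, 90.0)
```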
Lastly, from my reading, here are the 9 ops with dynamic arguments that I think are the most important to have a wrapper for (or some similar solution: kwarg filtering, a decorator, type hinting, etc.). If you get these, you can recreate the behavior of the vast majority of torchvision transforms as well as Keras ImageDataGenerator:
nvidia.dali.ops.BrightnessContrast (brightness, brightness_shift, contrast)
nvidia.dali.ops.CropMirrorNormalize (crop_d, crop_h, crop_pos_x, crop_pos_y, crop_pos_z, crop_w, mirror)
nvidia.dali.ops.Flip (depthwise, horizontal, vertical)
nvidia.dali.ops.Hsv (hue, saturation, value)
nvidia.dali.ops.Jitter (mask)
nvidia.dali.ops.Reshape (shape)
nvidia.dali.ops.Resize (resize_longer, resize_shorter, resize_x, resize_y)
nvidia.dali.ops.Rotate (angle, axis, size)
nvidia.dali.ops.WarpAffine (matrix, size)
I can see this happening as automatically building a Pipeline from a list of ops; I only worry that it would rule out a lot of operators and use cases (e.g. anything we do with bounding boxes usually has more than one input or output).
We are also heading in the direction of allowing additional transformations on per-sample arguments (for example, you can generate random rotation angles for one op, scale them further with an arithmetic op and use those bigger angles as the argument to another rotate) - that won't be possible with this API.
As for auto-generating those wrappers, maybe some meta-code based on the distinction between per-sample and constant arguments would be enough.
The problem would be if we start constraining new functionality to fit everything into a 1-in 1-out scheme, or rule out a big part of the operators because they need non-linear graphs.
I can maybe see multiple-input multiple output happening as tuples, but that is probably a rare case.
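One possible shape for the tuple idea, sketched in plain Python: each transform receives the previous outputs as positional arguments and returns a tuple, and a single-input transform is lifted so it only touches the first element. `compose_multi`, `lift_first` and the toy `flip_with_boxes` are hypothetical names, and this does nothing for genuinely branching graphs.

```python
from functools import reduce
from typing import Any, Callable, List, Sequence, Tuple

Transform = Callable[..., Tuple[Any, ...]]


def compose_multi(transforms: Sequence[Transform]) -> Transform:
    """Each transform takes the previous outputs as positional args, returns a tuple."""
    def composed(*inputs: Any) -> Tuple[Any, ...]:
        return reduce(lambda acc, t: t(*acc), transforms, tuple(inputs))
    return composed


def lift_first(fn: Callable[[Any], Any]) -> Transform:
    """Apply a unary transform to the first input and pass the rest through."""
    return lambda first, *rest: (fn(first), *rest)


def flip_with_boxes(image: List[int],
                    boxes: List[Tuple[float, float]]) -> Tuple[List[int], List[Tuple[float, float]]]:
    # Toy 1-D "image"; boxes are (start, end) fractions, flipped consistently.
    return image[::-1], [(1 - end, 1 - start) for start, end in boxes]


pipeline = compose_multi([lift_first(lambda img: [p * 2 for p in img]),
                          flip_with_boxes])
image, boxes = pipeline([1, 2, 3], [(0.0, 0.25)])
print(image, boxes)  # [6, 4, 2] [(0.75, 1.0)]
```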
And probably nobody using python would like to write code like this: https://en.wikibooks.org/wiki/Haskell/Understanding_arrows :P
I totally agree that any work we do on Compose should not be at the cost of non-sequential workloads - keeping it separate like Keras does seems wise.
More complex workloads, such as the per-sample compound argument transformations that necessarily require multiple-input, multiple-output behavior, would not be a target of Compose, and I definitely agree we don't want to think about it that way.
In terms of operator prohibition being too limiting, though, I am quite confident that the ops we can get to work sequentially (like the 9 I list, in addition to the 15+10 that do not require special behavior) are more than enough to cover the majority of data augmentation workloads, so I highly doubt that Compose being too restrictive is an issue. This strictly sequential transform construction is a common design pattern. Put another way, torchvision transforms are all unary - and that API is extremely popular.
See for example this post on DALI (which notably the author had to reach out to one of my fellow NVIDIA SAs to get it to work):
https://towardsdatascience.com/fast-data-augmentation-in-pytorch-using-nvidia-dali-68f5432e1f5f
The author goes through a lot of pain to do even the very first step of the pipeline I have in my second Colab. He even stops before actually reproducing the post-load augmentation steps in his original compose, likely due to the complexity of it. His workload would have been a perfect target for Compose, as all of the operations in his original Torch Compose would be easily implemented in a unary fashion.
Last, I do want to emphasize again that even for really simple workloads (like the first Colab I linked), DALI can still result in drastic speedups - we don't need to target exclusively complicated workloads to demonstrate DALI's value.
Edit: I tried to read your Haskell link but I don't know any of these words :(
Another point for Compose is that it can split better between flexible configuration and the running code.
Could you please explain what exactly the "1-in 1-out scheme" is? Is it about transforming a single data type: image only, bbox only or mask only?
In albumentations, their solution is to provide an encompassing Transformation that internally generates params to transform all supported kinds of data: "image", "bbox", "keypoints", "mask", and then dispatches functional implementations for all types. Data and datatypes are defined as dicts:
{"image": imagendarray, "mask": maskndarray, "bbox": bboxarray, ...}
Sorry, if this is out of subject.
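A rough sketch of that dict-based pattern in plain Python, using a horizontal flip as the example: the random parameter is sampled once per call and then dispatched to every target in the dict, so the image, mask and boxes stay aligned. The class, method and target names are illustrative only, not Albumentations' actual implementation.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

Grid = List[List[int]]
Box = Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)


class RandomHorizontalFlip:
    """Samples `flip` once per call and dispatches it to each supported target."""

    def __init__(self, p: float = 0.5) -> None:
        self.p = p
        self.targets: Dict[str, Callable[[Any, bool], Any]] = {
            "image": self.apply_to_grid,
            "mask": self.apply_to_grid,
            "bbox": self.apply_to_bboxes,
        }

    @staticmethod
    def apply_to_grid(grid: Grid, flip: bool) -> Grid:
        return [row[::-1] for row in grid] if flip else grid

    @staticmethod
    def apply_to_bboxes(bboxes: List[Box], flip: bool) -> List[Box]:
        if not flip:
            return bboxes
        return [(1 - x1, y0, 1 - x0, y1) for x0, y0, x1, y1 in bboxes]

    def __call__(self, **data: Any) -> Dict[str, Any]:
        flip = random.random() < self.p  # one parameter draw shared by all targets
        return {key: self.targets[key](value, flip) for key, value in data.items()}


out = RandomHorizontalFlip(p=1.0)(image=[[1, 2], [3, 4]], mask=[[0, 1], [1, 0]],
                                  bbox=[(0.0, 0.0, 0.25, 1.0)])
print(out["image"], out["mask"], out["bbox"])
# [[2, 1], [4, 3]] [[1, 0], [0, 1]] [(0.75, 0.0, 1.0, 1.0)]
```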
^ Certainly for detection it would be cool to figure something out - those are pretty common, IO-heavy workloads! Though I don't know DALI's internals well enough to say what sort of effort it would take to write those Transformations/dispatchers.
I did some keyword alchemy and figured out a lazy (in the effort sense) way of randomizing operators. It is very naive, doesn't bounds check, and does not handle integers. But it works surprisingly well for a lot of the major augmentation tasks. I basically did this:
dynamic_argnames = {'angle', 'axis', 'brightness', 'brightness_shift', 'contrast', 'crop_d', 'crop_h', 'crop_pos_x', 'crop_pos_y', 'crop_pos_z', 'crop_w', 'depthwise', 'horizontal', 'hue', 'mask', 'matrix', 'mirror', 'resize_longer', 'resize_shorter', 'resize_x', 'resize_y', 'saturation', 'shape', 'size', 'value', 'vertical'}
def randomize_argument(rand_range):
    return ops.Uniform(range=rand_range) if type(rand_range) is tuple else lambda: types.Constant(rand_range)
def randomize_op(op, **kwargs):
    for arg, value in kwargs.items():
        if arg in dynamic_argnames:
            kwargs[arg] = randomize_argument(value)
    static_args = {k:v for k,v in kwargs.items() if k not in dynamic_argnames}
    dynamic_args = {k:v for k,v in kwargs.items() if k in dynamic_argnames}
    op = op(**static_args)
    return lambda img: op(img, **{k:v() for k,v in dynamic_args.items()})
You can then do something like:
oplist.append(randomize_op(ops.Hsv, hue=(-255,255), saturation=(-5,5), value=(0,5), device = "gpu", dtype=types.UINT8))
and it actually works and does randomize those dynamic arguments, while still using the static ones for construction. I don't need the 'sequential.ops' I made earlier anymore as a result.
With this, I can build fairly complete augmentation pipelines (Hsv, BrightnessContrast, etc.) without having to do anything at all per op (adding new ops would just mean adding new keywords if they are dynamic, and I have already added all the ones from the 20 I listed above) - check v2 of the Colab here:
https://colab.research.google.com/drive/1XDRzDeIeteTPyzYbZp2XWjKtUWMLXqcg
It is far from perfect, and keeping a list of dynamic arg names might not be the best way to approach this, but it could maybe work as a starting point. What do you guys think?
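The same keyword-splitting idea can be exercised without DALI installed. In this sketch a hypothetical `FakeHsv` class replaces `ops.Hsv`, `random.uniform` replaces `ops.Uniform`, constants are passed through as plain values instead of `types.Constant`, and the dynamic-argument set is trimmed to three names. Like the original, it does no bounds checking and treats every tuple as a float range.

```python
import random
from typing import Any, Callable, Dict

DYNAMIC_ARGNAMES = {"hue", "saturation", "value"}  # trimmed for the example


def randomize_argument(value: Any) -> Callable[[], Any]:
    # A (min, max) tuple becomes a per-call uniform draw; anything else is a constant.
    if isinstance(value, tuple):
        low, high = value
        return lambda: random.uniform(low, high)
    return lambda: value


def randomize_op(op_cls: type, **kwargs: Any) -> Callable[[Any], Any]:
    static_args = {k: v for k, v in kwargs.items() if k not in DYNAMIC_ARGNAMES}
    dynamic_args = {k: randomize_argument(v) for k, v in kwargs.items()
                    if k in DYNAMIC_ARGNAMES}
    op = op_cls(**static_args)  # construction only sees the static arguments
    return lambda img: op(img, **{k: draw() for k, draw in dynamic_args.items()})


class FakeHsv:
    """Stand-in for ops.Hsv: echoes the call-time arguments it receives."""

    def __init__(self, device: str = "cpu") -> None:
        self.device = device

    def __call__(self, img: Any, **call_args: Any) -> Dict[str, Any]:
        return {"img": img, "device": self.device, **call_args}


transform = randomize_op(FakeHsv, hue=(-10, 10), saturation=(0.5, 1.5),
                         value=1.0, device="gpu")
print(transform("img0"))
```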
Another point for Compose is that it can split better between flexible configuration and the running code.
Could you please explain what exactly the "1-in 1-out scheme" is? Is it about transforming a single data type: image only, bbox only or mask only?
I mean operators that take only one input and produce only one output. The transformed data can be anything, and some operators in DALI already can handle different kinds of data (like a 2D image or a 3D volume, but some parameters would usually need to change a bit).
In albumentations, their solution is to provide an encompassing Transformation that internally generates params to transform all supported kinds of data: "image", "bbox", "keypoints", "mask", and then dispatches functional implementations for all types. Data and datatypes are defined as dicts:
{"image": imagendarray, "mask": maskndarray, "bbox": bboxarray, ...}
Sorry, if this is out of subject.
This sounds interesting, but with DALI for different kinds of data we get different operators and we would probably write a different pipeline to process images, bounding boxes or audio samples (this one is a different use case so there would be a minimal overlap of operator names that can be shared between image and audio).
I don't really think this should be a part of upstream DALI just yet.
composed = Compose([RandomRotate(angle=(-30,30)), BrightnessContrast(brightness=(-10,10))])
images, masks = composed(images, masks)
I think we'll focus on simplifying the Python API first, but I'm not closing this issue just yet.
My point with ImageDataGenerator being that it is a substantially simpler “first pass” interface. Compose is similarly a much simpler interface, good for a simple workflow. In this way they are absolutely similar!
I really don’t think anyone will be assuming that a simpler interface will be more performant. This isn’t an issue in Keras with the Sequential vs Model APIs - people don’t just automatically assume Sequential is more performant, so I don’t see why that would be an issue here.
For simple workflows we just assume independence. As I mentioned earlier, these compound transformations from a single RV are not necessary for most simple augmentation workflows. This is what PyTorch, Albumentations, MXNet and ImageDataGenerator do - and again these are all extremely popular. Assuming independence is more than sufficient for a simplified interface. And again, non-unary ops are not a target for Compose, at least to the extent that we want to have parity with PyTorch. Albumentations parity (i.e. handling detection like that) would be cool, but not at all required for Compose to still be very useful to a lot of workflows.
Quite literally every other major framework handles its augmentation this way. NVIDIA stands alone in absolutely requiring this modular, graphical usage - which just does not make sense to me, as our goal should be adoption and integration. DALI will have a hard time seeing success if it remains so different from literally every other framework that nobody will adopt it.
I have provided several pieces of evidence supporting this being a real issue - notably the towardsdatascience blog post that required NVIDIA input to work even for a simple case, and the Albumentations issue clearly citing DALI usability being foreign, and a concern.
We have synced offline with @dnola and we will implement this idea soon. Before that, we see that some work needs to be done to make it more versatile and feature-complete (this includes):
define_graph method
Basic functionality for compose is merged as #2393.
DALI 0.28 has been released, it should address this issue.