I propose that we have a separate set of functional transforms that take tensors as input, return tensors, and are torchscript-able.
TorchVision currently relies on PIL for most of its transforms.
While PIL is reasonably fast and widely adopted, relying on an external library makes our transforms impossible to trace / script.
One of the biggest drawbacks of that is that pre-processing is generally a crucial part of reproducing a model's results, and different preprocessing (due to, e.g., OpenCV / PIL differences) can have an impact on the final model result.
At the time torchvision was initially developed, there were far fewer operations implemented in PyTorch that could be used to perform image transformations, such as resizing, rotations and affine warps.
It also creates an awkward situation where certain operations expect PIL Images, while others expect torch Tensors (normalize is a notable case).
Since then, we have improved the support for image resizing in PyTorch (thanks to the upsample function), which covers a number of cases, and added grid_sample, which enables us to do rotations, affine warps and more in an efficient manner.
Pros of using PyTorch ops
Cons of using PyTorch ops
It should be noted that using PyTorch ops should not be a hard constraint. This lets users still implement their own functionality by leveraging PIL or OpenCV, but only the transforms based on PyTorch will be exportable to torchscript.
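To illustrate the scriptability point, here is a minimal sketch (not torchvision code) of a tensor-based transform passing through torch.jit.script; an equivalent PIL-based function could not be scripted:

```python
import torch

def hflip(img: torch.Tensor) -> torch.Tensor:
    # Flip a CHW or NCHW image tensor along its last (width) dimension.
    return img.flip(-1)

# Compiling to TorchScript works because only tensor ops are involved.
scripted_hflip = torch.jit.script(hflip)

img = torch.arange(6.0).reshape(1, 2, 3)
out = scripted_hflip(img)
```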
This means that the lingua-franca of passing objects around in torchvision transforms would be a torch.Tensor, and not a PIL Image anymore.
Most of the transforms in torchvision can already be expressed with PyTorch native operators, like torch.nn.functional.interpolate or torch.nn.functional.grid_sample, so we should not need to write specialized ops for them in torchvision.
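As a hedged sketch (not the eventual torchvision implementations), resizing and rotation can indeed be expressed with those two operators, assuming a float32 NCHW tensor:

```python
import math
import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 32, 48)  # NCHW, float32

# Resize via interpolate (the op behind nn.Upsample).
resized = F.interpolate(img, size=(64, 96), mode="bilinear", align_corners=False)

# Rotate by 30 degrees via an affine grid and grid_sample.
angle = math.radians(30.0)
theta = torch.tensor([[[math.cos(angle), -math.sin(angle), 0.0],
                       [math.sin(angle),  math.cos(angle), 0.0]]])
grid = F.affine_grid(theta, size=list(img.shape), align_corners=False)
rotated = F.grid_sample(img, grid, align_corners=False)
```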
An initial PR adding support for video has been sent in https://github.com/pytorch/vision/pull/1353 , and I think we should improve on top of it to make it cover more ops, and also support images.
Using torch operators has a drawback: they currently only support batched tensors in NCHW format and floating-point values, which is different from the format supported by our current set of transforms (HWC and uint8 in most cases).
For now let's assume that the tensors are float32 and in NCHW format. We might consider explicitly keeping a memory_format=torch.channels_last layout for compatibility (TBD).
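A sketch of the conversion this implies, going from today's HWC / uint8 convention to batched NCHW float32, with an optional channels_last layout:

```python
import torch

hwc_uint8 = torch.randint(0, 256, (32, 48, 3), dtype=torch.uint8)

# HWC -> CHW -> NCHW, and uint8 [0, 255] -> float32 [0, 1].
nchw = hwc_uint8.permute(2, 0, 1).unsqueeze(0).to(torch.float32) / 255.0

# Same logical NCHW shape, but stored channels-last in memory.
nchw_cl = nchw.contiguous(memory_format=torch.channels_last)
```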
Long-term, we should add support for uint8 (and other integer types) to interpolate and make it more generic over which dimensions to interpolate (https://github.com/pytorch/pytorch/issues/10482), but that's a larger task.
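Until then, a common workaround (a sketch, with some quality loss from the rounding) is to round-trip integer images through float:

```python
import torch
import torch.nn.functional as F

def resize_uint8(img: torch.Tensor, size) -> torch.Tensor:
    # interpolate only accepts floating-point inputs, so cast, resize,
    # then round and cast back to uint8.
    out = F.interpolate(img.to(torch.float32), size=size,
                        mode="bilinear", align_corners=False)
    return out.round().clamp(0, 255).to(torch.uint8)

img = torch.randint(0, 256, (1, 3, 8, 8), dtype=torch.uint8)
resized = resize_uint8(img, (16, 16))
```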
@fmassa for a project of mine, I've implemented some transformations. From your list this includes:
Take a look at it, if you have time. Maybe we can adopt my implementation here.
@pmeier that looks great, yes, we can use yours as a starting point!
cc @xiuyanni for awareness
@fmassa , I would like to pick a few of the mentioned transforms and see if I can work them out, since I haven't contributed to pytorch/vision before. Maybe this can be a good starting point.
Thank you
@PyExtreme sure, go for it!
Note that there are already two open PRs adding functionality for crop and adjust_hue, so I'd recommend checking them before sending your PR
@fmassa, what do you think: for now, should we add something like transforms.Tensor<Transform> alongside the existing PIL transforms, or entirely replace the previous ones?
I think that until we have all of them as PyTorch ops, we could do it this way. Let me know what you think.
@surgan12 I think it would be better to have a separate file, functional_tensor.py, with the implementations for the functional transforms using tensors, with the same name as in functional.py. Let's not worry about the class-based transforms for now.
And let's keep the PIL-based implementations as well for backwards-compatibility
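To make the proposed layout concrete, functional_tensor.py could mirror the names in functional.py while operating on tensors; the vflip / crop bodies below are illustrative sketches, not the final implementations:

```python
import torch

def vflip(img: torch.Tensor) -> torch.Tensor:
    # Vertically flip a CHW or NCHW image tensor.
    return img.flip(-2)

def crop(img: torch.Tensor, top: int, left: int, height: int, width: int) -> torch.Tensor:
    # Crop a CHW or NCHW image tensor at (top, left) with the given size.
    return img[..., top:top + height, left:left + width]

img = torch.rand(3, 10, 12)
patch = crop(vflip(img), 2, 3, 4, 5)
```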
@fmassa , I will start with center crop, resized crop and five crop
@PyExtreme those transforms depend on crop, which has been implemented by @ekagra-ranjan but it hasn't been merged yet.
@fmassa , could you please provide me a good starting point?
Would be of great help to me.
Thank you
@PyExtreme grayscale would be the easiest way to start. You can probably start from the implementation from @pmeier from https://github.com/pmeier/pystiche/blob/master/pystiche/image/transforms/functional/color.py
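For reference, a hedged sketch of such a tensor grayscale using ITU-R 601 luma weights (assuming a float RGB tensor in CHW or NCHW layout; not necessarily the exact weighting used in the linked code):

```python
import torch

def rgb_to_grayscale(img: torch.Tensor) -> torch.Tensor:
    # Split the channel dimension (dim -3 works for both CHW and NCHW)
    # and take a weighted sum of R, G, B.
    r, g, b = img.unbind(dim=-3)
    return (0.299 * r + 0.587 * g + 0.114 * b).unsqueeze(dim=-3)

img = torch.rand(1, 3, 4, 4)
gray = rgb_to_grayscale(img)
```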
BTW, we will need to decide what to do about colorspaces, @pmeier do you have an idea? One possibility would be to add a mode attribute to the specific tensors we are using, but I'm not sure if we should do it
@fmassa ,
Are there any guidelines for setting up the environment locally? E.g., in pytorch/pytorch, we have to install in develop mode.
Also, what are the commands for testing the changes?
Would be of great help to me.
Thank you
@fmassa
> BTW, we will need to decide what to do about colorspaces, @pmeier do you have an idea?
I thought about that while implementing it for my project. I came up with three options:
1. Subclass the tensor per color space, so that methods like torchvision.YUVImage.luminance() or torchvision.RGBAImage.opacity() could come in handy.
2. Add a colorspace attribute, as you suggested. With that we could also check if the transform is applicable (imagine passing an image with image.colorspace == "RGB" to transforms.YUVToRGB()).
3. Don't mark the tensors at all and leave the bookkeeping to the user.
I would prefer option 1, but I acknowledge that it would be the greatest effort of the three. I personally (mostly for time reasons) went with option 3. I haven't seen a scenario where users could lose their overview if they mark the tensors, for example by postfixing _{color_space} on the variable names. One usually does not have more than two or three color spaces involved.
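The colorspace-attribute idea (option 2) could look roughly like this minimal sketch; the Image class, the as_subclass tagging, and the yuv_to_rgb_check name are all hypothetical, not an agreed-on API:

```python
import torch

class Image(torch.Tensor):
    # Hypothetical tag carried alongside the tensor data.
    colorspace: str = "RGB"

def yuv_to_rgb_check(img):
    # A converter could guard on the tag before transforming.
    if getattr(img, "colorspace", None) != "YUV":
        raise ValueError("expected a YUV image")
    return img  # conversion math omitted in this sketch

img = torch.rand(3, 4, 4).as_subclass(Image)
img.colorspace = "YUV"
```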
@PyExtreme
> grayscale would be the easiest way to start. You can probably start from the implementation from @pmeier from https://github.com/pmeier/pystiche/blob/master/pystiche/image/transforms/functional/color.py
>
> BTW, we will need to decide what to do about colorspaces, @pmeier do you have an idea? One possibility would be to add a mode attribute to the specific tensors we are using, but I'm not sure if we should do it
Thanks @fmassa , I am starting to work on it.
@fmassa,
I'd be happy to work on adjust_brightness, adjust_contrast and adjust_saturation!
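For what it's worth, a hedged sketch of what a tensor adjust_brightness might look like for float images in [0, 1] (mirroring PIL's blend-with-black semantics; adjust_contrast / adjust_saturation would blend with the mean / grayscale image instead):

```python
import torch

def adjust_brightness(img: torch.Tensor, brightness_factor: float) -> torch.Tensor:
    # Blending with a black image reduces to a simple scale, clamped
    # back into the valid [0, 1] range.
    return (img * brightness_factor).clamp(0.0, 1.0)

img = torch.full((3, 2, 2), 0.4)
brighter = adjust_brightness(img, 1.5)
```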
@pedrofreire go for it!
Hi @fmassa , I would love to work on center_crop, five_crop and ten_crop.
Please feel free to let me know anything that might be important for it.
Thank you
@PyExtreme sure, go for it.
One thing to keep in mind is that those functions rely on the implementation of crop, so they will be very similar to the equivalent implementations for PIL Images.
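For example, a sketch of how center_crop could be layered on a tensor crop (illustrative, assuming CHW or NCHW tensors):

```python
import torch

def crop(img: torch.Tensor, top: int, left: int, height: int, width: int) -> torch.Tensor:
    return img[..., top:top + height, left:left + width]

def center_crop(img: torch.Tensor, output_size) -> torch.Tensor:
    # Compute the top-left corner of the centered window, then reuse crop.
    h, w = img.shape[-2], img.shape[-1]
    th, tw = output_size
    top = (h - th) // 2
    left = (w - tw) // 2
    return crop(img, top, left, th, tw)

img = torch.rand(3, 10, 10)
out = center_crop(img, (4, 6))
```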
hi! just to make a point here: all the listed operators (and more) are already supported by kornia.augmentation (implemented without knowing about this work in progress).
In kornia we have been working in the same direction to fully support transforms that assume torch tensors as inputs and use PyTorch operators. It would be great to find a balance so we don't duplicate efforts and can integrate things.
@vfdev-5 I believe we can close this?
OK for me to close it
Congrats on hitting the huge milestone! 🎉