I propose that we have a separate set of functional transforms that take tensors as input, return tensors, and are torchscript-able.
TorchVision currently relies on PIL for most of its transforms.
While PIL is reasonably fast and widely adopted, relying on an external library makes our transforms impossible to trace / script.
One of the biggest drawbacks of that is that pre-processing is generally a crucial part of reproducing a model's results, and different preprocessing (due to, e.g., OpenCV / PIL differences) can have an impact on the final model result.
At the time torchvision was initially developed, there were far fewer operations implemented in PyTorch that could be used to perform image transformations, such as resizing, rotations and affine warps.
It also creates an awkward situation where certain operations expect PIL Images, while others expect torch Tensors (normalize is a notable case).
Since then, we have improved the support for image resizing in PyTorch (thanks to the upsample function), which covers a number of cases, and added grid_sample, which enables us to do rotations, affine warps and more in an efficient manner.
Pros of using PyTorch ops
Cons of using PyTorch ops
It should be noted that using PyTorch ops should not be a hard constraint. This lets users still implement their own functionality by leveraging PIL or OpenCV, but only the transforms based on PyTorch will be exportable to torchscript.
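To illustrate the scriptability point, here is a minimal sketch (not torchvision code) of a tensor-based transform passing through torch.jit.script; an equivalent PIL-based function could not be scripted:

```python
import torch

def hflip(img: torch.Tensor) -> torch.Tensor:
    # Flip a CHW or NCHW image tensor along its last (width) dimension.
    return img.flip(-1)

# Compiling to TorchScript works because only tensor ops are involved.
scripted_hflip = torch.jit.script(hflip)

img = torch.arange(6.0).reshape(1, 2, 3)
out = scripted_hflip(img)
```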
This means that the lingua-franca of passing objects around in torchvision transforms would be a torch.Tensor, and not a PIL Image anymore.
Most of the transforms in torchvision can already be expressed with PyTorch native operators, like torch.nn.functional.interpolate or torch.nn.functional.grid_sample, so we should not need to write specialized ops for them in torchvision.
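As a hedged sketch (not the eventual torchvision implementations), resizing and rotation can indeed be expressed with those two operators, assuming a float32 NCHW tensor:

```python
import math
import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 32, 48)  # NCHW, float32

# Resize via interpolate (the op behind nn.Upsample).
resized = F.interpolate(img, size=(64, 96), mode="bilinear", align_corners=False)

# Rotate by 30 degrees via an affine grid and grid_sample.
angle = math.radians(30.0)
theta = torch.tensor([[[math.cos(angle), -math.sin(angle), 0.0],
                       [math.sin(angle),  math.cos(angle), 0.0]]])
grid = F.affine_grid(theta, size=list(img.shape), align_corners=False)
rotated = F.grid_sample(img, grid, align_corners=False)
```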
An initial PR adding support for video has been sent in https://github.com/pytorch/vision/pull/1353 , and I think we should improve on top of it to make it cover more ops, and also support images.
Using torch operators has a drawback: they currently only support batched tensors in NCHW format and floating-point values, which is different from the format supported by our current set of transforms (HWC and uint8 in most cases).
For now let's assume that the tensors are float32 and in NCHW format. We might consider explicitly keeping a memory_format=torch.channels_last layout for compatibility (TBD).
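A sketch of the conversion this implies, going from today's HWC / uint8 convention to batched NCHW float32, with an optional channels_last layout:

```python
import torch

hwc_uint8 = torch.randint(0, 256, (32, 48, 3), dtype=torch.uint8)

# HWC -> CHW -> NCHW, and uint8 [0, 255] -> float32 [0, 1].
nchw = hwc_uint8.permute(2, 0, 1).unsqueeze(0).to(torch.float32) / 255.0

# Same logical NCHW shape, but stored channels-last in memory.
nchw_cl = nchw.contiguous(memory_format=torch.channels_last)
```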
Long-term, we should add support for uint8 (and other integer types) to interpolate and make it more generic over which dimensions to interpolate (https://github.com/pytorch/pytorch/issues/10482), but that's a larger task.
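Until then, a common workaround (a sketch, with some quality loss from the rounding) is to round-trip integer images through float:

```python
import torch
import torch.nn.functional as F

def resize_uint8(img: torch.Tensor, size) -> torch.Tensor:
    # interpolate only accepts floating-point inputs, so cast, resize,
    # then round and cast back to uint8.
    out = F.interpolate(img.to(torch.float32), size=size,
                        mode="bilinear", align_corners=False)
    return out.round().clamp(0, 255).to(torch.uint8)

img = torch.randint(0, 256, (1, 3, 8, 8), dtype=torch.uint8)
resized = resize_uint8(img, (16, 16))
```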
@fmassa for a project of mine, I've implemented some transformations. From your list this includes:
Take a look at it, if you have time. Maybe we can adopt my implementation here.
@pmeier that looks great, yes, we can use yours as a starting point!
cc @xiuyanni for awareness
@fmassa , I would like to pick a few of the mentioned transforms and see if I can work them out, since I haven't contributed to pytorch/vision before. Maybe this can be a good starting point.
Thank you
@PyExtreme sure, go for it!
Note that there are already two open PRs adding functionality for crop and adjust_hue, so I'd recommend checking them before sending your PR
@fmassa, what do you think: for now, should we add something like transforms.Tensor<Transform> alongside the existing PIL transforms, or entirely replace the previous ones?
I think that until we have all of them as PyTorch ops, we could do it this way. Let me know what you think.
@surgan12 I think it would be better to have a separate file, functional_tensor.py, with the implementations for the functional transforms using tensors, with the same name as in functional.py. Let's not worry about the class-based transforms for now.
And let's keep the PIL-based implementations as well for backwards-compatibility
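To make the proposed layout concrete, functional_tensor.py could mirror the names in functional.py while operating on tensors; the vflip / crop bodies below are illustrative sketches, not the final implementations:

```python
import torch

def vflip(img: torch.Tensor) -> torch.Tensor:
    # Vertically flip a CHW or NCHW image tensor.
    return img.flip(-2)

def crop(img: torch.Tensor, top: int, left: int, height: int, width: int) -> torch.Tensor:
    # Crop a CHW or NCHW image tensor at (top, left) with the given size.
    return img[..., top:top + height, left:left + width]

img = torch.rand(3, 10, 12)
patch = crop(vflip(img), 2, 3, 4, 5)
```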
@fmassa , I will start with center crop, resized crop and five crop
@PyExtreme those transforms depend on crop, which has been implemented by @ekagra-ranjan but it hasn't been merged yet.
@fmassa , could you please provide me a good starting point?
Would be of great help to me.
Thank you
@PyExtreme grayscale would be the easiest way to start. You can probably start from the implementation from @pmeier from https://github.com/pmeier/pystiche/blob/master/pystiche/image/transforms/functional/color.py
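For reference, a hedged sketch of such a tensor grayscale using ITU-R 601 luma weights (assuming a float RGB tensor in CHW or NCHW layout; not necessarily the exact weighting used in the linked code):

```python
import torch

def rgb_to_grayscale(img: torch.Tensor) -> torch.Tensor:
    # Split the channel dimension (dim -3 works for both CHW and NCHW)
    # and take a weighted sum of R, G, B.
    r, g, b = img.unbind(dim=-3)
    return (0.299 * r + 0.587 * g + 0.114 * b).unsqueeze(dim=-3)

img = torch.rand(1, 3, 4, 4)
gray = rgb_to_grayscale(img)
```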
BTW, we will need to decide what to do about colorspaces, @pmeier do you have an idea? One possibility would be to add a mode attribute to the specific tensors we are using, but I'm not sure if we should do it
@fmassa ,
Are there any guidelines for setting up the environment locally? E.g., in pytorch/pytorch, we have to install in develop mode.
Also, what are the commands for testing the changes?
Would be of great help to me.
Thank you
@fmassa
> BTW, we will need to decide what to do about colorspaces, @pmeier do you have an idea?
I thought about that while implementing it for my project. I came up with three options:
1. Subclass the tensor per color space, so that methods like torchvision.YUVImage.luminance() or torchvision.RGBAImage.opacity() could come in handy.
2. Add a colorspace attribute, as you suggested. With that we could also check if the transform is applicable (imagine passing an image with image.colorspace == "RGB" to transforms.YUVToRGB()).
3. Don't mark the tensors at all and leave the bookkeeping to the user.
I would prefer option 1, but I acknowledge that it would be the greatest effort of the three. I personally (mostly for time reasons) went with option 3. I haven't seen a scenario where users could lose their overview if they mark the tensors, for example by postfixing _{color_space} on the variable names. One usually does not have more than two or three color spaces involved.
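The colorspace-attribute idea (option 2) could look roughly like this minimal sketch; the Image class, the as_subclass tagging, and the yuv_to_rgb_check name are all hypothetical, not an agreed-on API:

```python
import torch

class Image(torch.Tensor):
    # Hypothetical tag carried alongside the tensor data.
    colorspace: str = "RGB"

def yuv_to_rgb_check(img):
    # A converter could guard on the tag before transforming.
    if getattr(img, "colorspace", None) != "YUV":
        raise ValueError("expected a YUV image")
    return img  # conversion math omitted in this sketch

img = torch.rand(3, 4, 4).as_subclass(Image)
img.colorspace = "YUV"
```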
@PyExtreme
> grayscale would be the easiest way to start. You can probably start from the implementation from @pmeier from https://github.com/pmeier/pystiche/blob/master/pystiche/image/transforms/functional/color.py
>
> BTW, we will need to decide what to do about colorspaces, @pmeier do you have an idea? One possibility would be to add a mode attribute to the specific tensors we are using, but I'm not sure if we should do it
Thanks @fmassa , I am starting to work on it.
@fmassa,
I'd be happy to work on adjust_brightness, adjust_contrast and adjust_saturation!
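For what it's worth, a hedged sketch of what a tensor adjust_brightness might look like for float images in [0, 1] (mirroring PIL's blend-with-black semantics; adjust_contrast / adjust_saturation would blend with the mean / grayscale image instead):

```python
import torch

def adjust_brightness(img: torch.Tensor, brightness_factor: float) -> torch.Tensor:
    # Blending with a black image reduces to a simple scale, clamped
    # back into the valid [0, 1] range.
    return (img * brightness_factor).clamp(0.0, 1.0)

img = torch.full((3, 2, 2), 0.4)
brighter = adjust_brightness(img, 1.5)
```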
@pedrofreire go for it!
Hi @fmassa , I would love to work on center_crop, five_crop and ten_crop.
Please feel free to let me know anything that might be important for it.
Thank you
@PyExtreme sure, go for it.
One thing to keep in mind is that those functions rely on the implementation of crop, so they will be very similar to the equivalent implementations for PIL Images.
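For example, a sketch of how center_crop could be layered on a tensor crop (illustrative, assuming CHW or NCHW tensors):

```python
import torch

def crop(img: torch.Tensor, top: int, left: int, height: int, width: int) -> torch.Tensor:
    return img[..., top:top + height, left:left + width]

def center_crop(img: torch.Tensor, output_size) -> torch.Tensor:
    # Compute the top-left corner of the centered window, then reuse crop.
    h, w = img.shape[-2], img.shape[-1]
    th, tw = output_size
    top = (h - th) // 2
    left = (w - tw) // 2
    return crop(img, top, left, th, tw)

img = torch.rand(3, 10, 10)
out = center_crop(img, (4, 6))
```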
hi! just to make a point here: all the listed operators (and more) are already supported by kornia.augmentation (implemented without knowing about this work in progress).
In kornia we have been working in the same direction to fully support transforms that assume torch tensors as inputs and use PyTorch operators. It would be great to find a balance so we don't duplicate efforts and can integrate things.
@vfdev-5 I believe we can close this?
OK for me to close it
Congrats on hitting the huge milestone! 🎉