Vision: “transforms.functional” Broken When Using Images With Alpha

Created on 1 Jul 2020 · 7 Comments · Source: pytorch/vision

🐛 Bug

Some “torchvision.transforms.functional” transformations such as “TF.rotate” and “TF.resize” break the image when the image is in RGBA format, as you can see here:

The blue channel before the rotation is on the left, and after the rotation on the right. Everywhere the alpha channel is black, the other channels were also made black.

Steps to reproduce the behavior:

1- Get an image with an alpha channel; this is the one I used:

2- Apply some of these functions to it, like this:

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import torchvision.transforms.functional as TF

img = Image.open(r'C:\your\folder\test.png')

plt.imshow(np.asarray(img)[:,:,2:3].squeeze(), cmap='gray'); plt.show()

img = TF.rotate(img, 30)  # or: img = TF.resize(img, (512, 512))

plt.imshow(np.asarray(img)[:,:,2:3].squeeze(), cmap='gray'); plt.show()

Expected behavior:
It should act like "transforms.RandomRotation" (which works properly), where the data in the first three channels is not affected by the data in the fourth channel. Not every problem that uses "vision" involves images, and not every fourth channel is transparency. For example, in my case this is a vector bump map where the first three channels are the XYZ normal components and the fourth is a height component.

Note that this bug happens because PyTorch uses PIL's built-in functions, such as rotate, which are not meant for data science and simply throw away the data in the other channels wherever the corresponding fourth-channel value is zero.

Workaround:
Use the opencv_transforms library instead, which works properly. It is a rewrite of PyTorch's transforms based on OpenCV instead of PIL, so it's also faster:

https://github.com/jbohnslav/opencv_transforms
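If switching libraries is not an option, another possible workaround that stays within PIL is to rotate each channel as its own single-channel image and merge the results, so that no channel is treated as alpha. This is only a sketch: rotate_per_channel is a hypothetical helper name, not part of PIL or torchvision, and the 4x4 test data is synthetic.

```python
import numpy as np
from PIL import Image


def rotate_per_channel(img, angle, **kwargs):
    """Rotate every band as an independent single-channel ("L") image,
    then merge the bands back into the original mode.

    Hypothetical helper for illustration; extra keyword arguments are
    forwarded to PIL's Image.rotate.
    """
    bands = [band.rotate(angle, **kwargs) for band in img.split()]
    return Image.merge(img.mode, bands)


# Synthetic 4-channel data: non-zero payload in the first three
# channels, and a fourth channel that is zero everywhere.
rng = np.random.default_rng(0)
data = rng.integers(1, 256, size=(4, 4, 4), dtype=np.uint8)
data[..., 3] = 0
img = Image.fromarray(data)  # a (H, W, 4) uint8 array is read as RGBA

rotated = np.asarray(rotate_per_channel(img, 90))

# The payload in the first three channels is still there after the
# rotation, even though the fourth channel is zero.
print((rotated[..., :3] > 0).all())
```

Because each band is interpolated on its own, the fourth channel has no influence on the other three, which is the behavior wanted for non-image data such as the bump map described above.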

bug dependency issue transforms

Source

Huud

All 7 comments

Thanks for opening this issue!

As you noted, this is a problem with Pillow, which is a dependency for torchvision.

I would like to note that we are progressively adding support for the transforms to work directly on Tensors (see https://github.com/pytorch/vision/issues/1375 and https://github.com/pytorch/vision/issues/2292), so that in the near future you'll be able to call those transforms directly on tensors. That should support alpha channels as well, and this issue will be fixed.

Until then, I would recommend opening an issue with Pillow so that they can look into fixing it there as well.

It should act similar to the "transforms.RandomRotation" - which works properly

I'm not sure what you meant there -- transforms.RandomRotation calls into transforms.functional.rotate internally, so it should suffer from the same issue as well?

fmassa on 1 Jul 2020

Glad to see tensors will be supported directly. The PIL dependency brought issues, such as performance and limits on the number of channels, that I had to either live with or write my own workarounds for.

I would recommend opening an issue in Pillow with the issue so that they can look into fixing it there as well.

I don't think this is an issue for PIL since it is a library meant for images, and zeroing out pixels that won't be seen due to transparency makes sense for them.

I'm not sure what you meant there -- transforms.RandomRotation calls into transforms.functional.rotate internally.

Ah, it looks like I mixed in the "opencv_transforms" version when testing; disregard that then.

Huud on 1 Jul 2020

I don't think this is an issue for PIL since it is a library meant for images, and zeroing out pixels that won't be seen due to transparency makes sense for them.

Well, I'm not sure I agree. The alpha channel seems to be handled as a separate channel in their interpolation kernels, so this might be a bug?

fmassa on 1 Jul 2020

The underlying issue is in PIL's transform function, which is called by functions like rotate. It explicitly converts "RGBA" images to the premultiplied-alpha format "RGBa", which multiplies the RGB components by the alpha and permanently zeroes out the RGB data wherever alpha is zero. It then works on the premultiplied image and converts the result back to "RGBA".

So yeah, it's not a bug for them; it is pretty deliberate. You can replicate the issue minimally with:
img = img.convert('RGBa').convert('RGBA')
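To make the arithmetic concrete, here is a minimal NumPy sketch of a premultiply / un-premultiply round trip. The helper names premultiply and unpremultiply are made up for illustration, and this is not PIL's actual code (its rounding may differ). It only shows why a pixel with alpha = 0 cannot get its RGB values back.

```python
import numpy as np


def premultiply(rgba):
    """Straight alpha -> premultiplied alpha: scale RGB by alpha / 255.

    Illustrative helper, not PIL's implementation.
    """
    out = rgba.astype(np.float64)
    out[..., :3] *= out[..., 3:4] / 255.0
    return np.rint(out).astype(np.uint8)


def unpremultiply(rgba):
    """Premultiplied alpha -> straight alpha: divide RGB by alpha / 255.

    Pixels with alpha == 0 are left as they are, since there is nothing
    to divide by. Illustrative helper, not PIL's implementation.
    """
    out = rgba.astype(np.float64)
    alpha = out[..., 3:4]
    safe_alpha = np.where(alpha > 0, alpha, 255.0)  # avoid division by zero
    out[..., :3] = out[..., :3] * 255.0 / safe_alpha
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Two pixels with the same RGB payload: one opaque, one fully transparent.
pixels = np.array([[[10, 200, 30, 255],
                    [10, 200, 30, 0]]], dtype=np.uint8)

round_trip = unpremultiply(premultiply(pixels))
print(round_trip[0, 0])  # opaque pixel: RGB survives the round trip
print(round_trip[0, 1])  # transparent pixel: RGB was multiplied by 0
```

Once the RGB values have been multiplied by zero, no later conversion can recover them, which matches what the one-liner above does to the real image.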

Huud on 2 Jul 2020

Ok, makes sense.

@vfdev-5 is currently working on making the remaining transforms in torchvision that depend on PIL also work directly with torch Tensors by using torch operations, so this will be fixed (when using torch Tensors) in the next few weeks.

fmassa on 9 Jul 2020

👍1

All transforms now work on Tensors as well, so I believe this issue should be fixed if you convert the PIL image to a Tensor and apply the transforms directly on the Tensor. @vfdev-5 can you double-check and close the issue if that's the case?

fmassa on 21 Oct 2020

👍1

Currently, if we convert the input to a torch tensor and apply a rotation, it gives the following result:

import torch

...

img = Image.open('test.png')
np_img = np.asarray(img)
t_img = torch.from_numpy(np_img).permute(2, 0, 1)
out = TF.rotate(t_img, 30)
np_out = out.permute(1, 2, 0).numpy()
print(np_out.shape)

plt.figure(figsize=(20, 7))
for i in range(np_out.shape[-1]):
    plt.subplot(1, 4, i + 1)
    plt.imshow(np_out[:, :, i], cmap='gray')

vs PIL

So, the image data with alpha = 0 inside the image remains present in the rotated image, but the fill value should be adapted according to the application.

Let's close the issue as solved. @Huud, feel free to reopen if you need more support on that. Thanks

vfdev-5 on 21 Oct 2020

👍2
