Vision: transforms.ToTensor() for numpy float array in the range of [0.0, 255.0]

Created on 13 Jul 2018 · 6 comments · Source: pytorch/vision

I came across a debugging scenario where ToTensor() didn't convert a numpy float array in the range [0.0, 255.0] to the range [0.0, 1.0], due to the following lines:
https://github.com/pytorch/vision/blob/master/torchvision/transforms/functional.py#L50-L53

Basically, this API assumes that all float arrays are already in the range [0.0, 1.0].
Do you think we should change this behavior?
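For illustration, here is a pure-numpy sketch of the dtype-dependent behavior described above (the function name is hypothetical; the real logic lives in `torchvision.transforms.functional.to_tensor`):

```python
import numpy as np

def to_tensor_sketch(pic):
    """Sketch of ToTensor()'s dtype handling: HWC -> CHW, then
    uint8 arrays are scaled by 1/255 while float arrays pass through."""
    arr = pic.transpose((2, 0, 1))
    if arr.dtype == np.uint8:
        return arr.astype(np.float32) / 255.0
    # floats are assumed to already be in [0.0, 1.0] and are not rescaled
    return arr.astype(np.float32)

img_u8 = np.full((4, 4, 3), 255, dtype=np.uint8)
img_f = img_u8.astype(np.float32)      # same values, float dtype
print(to_tensor_sketch(img_u8).max())  # 1.0 -- scaled as expected
print(to_tensor_sketch(img_f).max())   # 255.0 -- NOT scaled
```

The same input values produce tensors on very different scales depending only on the input dtype, which is the surprise reported in this issue.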

Labels: enhancement, help wanted, transforms


All 6 comments

I think I'll create an Image class that can hold a few different types (PIL images, numpy arrays, tensors); its constructor will know what kind of data to expect, so that we can cover all those use cases.

Or create a universal API that reads an image from a filepath into a default format? I'm not sure this would help. Users read images with different packages (PIL, scipy, skimage, opencv, etc.), so there are even more cases to cover.

@fmassa This issue really confused me as a beginner in pytorch :) What if we replace lines 57-62 with:

        img = torch.from_numpy(pic.transpose((2, 0, 1)))
        # backward compatibility
        if isinstance(img, torch.ByteTensor):
            return img.float().div(255)
        else:
            img = img.float()
            if torch.max(img) <= 1.0:
                return img
            else:
                return img.div(255)

@ekagra-ranjan we could try doing something like that, but the cost of torch.max(img) is not negligible and would slow down many things that rely on ToTensor with floating-point data.

I think the underlying issue is that ToTensor tries to do way too many things internally, and it can't suit all needs.

I found the same bug while loading the Moving MNIST data. The input is not float, but the same problem exists: ToTensor() does not scale the values from [0, 255] to [0, 1], and it gives no warning, so I only found it while debugging my own neural net.
Is there a clean solution for this?

@Melika-Ayoughi I'm not sure there is a clean solution if we keep the current approach.
We can add workarounds like the one @ekagra-ranjan mentioned, but that would not be a complete fix.

This is something that, in my opinion, should be completely redesigned.
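Until a redesign lands, one user-side workaround (a sketch, not an official API; the helper name is hypothetical) is to rescale float data explicitly before it ever reaches ToTensor(), so the dtype heuristic no longer matters:

```python
import numpy as np

def prescale(pic):
    """Hypothetical helper: bring an array with values in [0, 255]
    into [0.0, 1.0] explicitly, so ToTensor() will leave it unchanged."""
    return np.asarray(pic, dtype=np.float32) / 255.0

# e.g. Moving-MNIST-like integer frames loaded outside PIL
frames = np.random.randint(0, 256, size=(64, 64, 1))
scaled = prescale(frames)
print(scaled.min() >= 0.0 and scaled.max() <= 1.0)  # True
```

Making the scaling an explicit, separate step in the pipeline sidesteps the ambiguity entirely, which is in the spirit of fmassa's point that ToTensor currently tries to do too many things at once.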
