Vision: transforms.ToTensor() for numpy float array in the range of [0.0, 255.0]

Created on 13 Jul 2018 · 6 comments · Source: pytorch/vision

I came across a debugging scenario where ToTensor() didn't convert a numpy float array in the range [0.0, 255.0] to the range [0.0, 1.0], due to the following lines:
https://github.com/pytorch/vision/blob/master/torchvision/transforms/functional.py#L50-L53

Basically, this API assumes that all float arrays are already in the range [0.0, 1.0].
Do you think we should change this behavior?
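For illustration, here is a pure-numpy sketch of the dtype-dependent behavior described above (the function name is hypothetical; the real logic lives in `torchvision.transforms.functional.to_tensor`):

```python
import numpy as np

def to_tensor_sketch(pic):
    """Sketch of ToTensor()'s dtype handling: HWC -> CHW, then
    uint8 arrays are scaled by 1/255 while float arrays pass through."""
    arr = pic.transpose((2, 0, 1))
    if arr.dtype == np.uint8:
        return arr.astype(np.float32) / 255.0
    # floats are assumed to already be in [0.0, 1.0] and are not rescaled
    return arr.astype(np.float32)

img_u8 = np.full((4, 4, 3), 255, dtype=np.uint8)
img_f = img_u8.astype(np.float32)      # same values, float dtype
print(to_tensor_sketch(img_u8).max())  # 1.0 -- scaled as expected
print(to_tensor_sketch(img_f).max())   # 255.0 -- NOT scaled
```

The same input values produce tensors on very different scales depending only on the input dtype, which is the surprise reported in this issue.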

Labels: enhancement, help wanted, transforms


All 6 comments

I think I'll create an Image class that can hold a few different types (PIL images, numpy arrays, tensors); its constructor will know what kind of data to expect, so that we can cover all those use cases.

Or create a universal API that reads an image from a filepath into a default format? I'm not sure this would help. Users read images with different packages (PIL, scipy, skimage, opencv, etc.), so there are even more cases to cover.

@fmassa This issue really confused me as a beginner in pytorch :) What if we replace lines 57-62 with:

        img = torch.from_numpy(pic.transpose((2, 0, 1)))
        # backward compatibility
        if isinstance(img, torch.ByteTensor):
            return img.float().div(255)
        else:
            img = img.float()
            if torch.max(img) <= 1.0:
                return img
            else:
                return img.div(255)

@ekagra-ranjan we could try doing something like that, but the cost of torch.max(img) is not negligible and would slow down many things that rely on ToTensor with floating-point data.

I think the underlying issue is that ToTensor tries to do way too many things internally, and it can't suit all needs.

I found the same bug while loading the Moving MNIST data. The input is not float, but the same problem exists: ToTensor() does not scale the values from [0, 255] to [0, 1], and it gives no warning, so I only found it while debugging my own neural net.
Is there a clean solution for this?

@Melika-Ayoughi I'm not sure there is a clean solution if we keep the current approach.
We can add workarounds like the one @ekagra-ranjan mentioned, but that would not be a complete fix.

This is something that, in my opinion, should be completely redesigned.
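Until a redesign lands, one user-side workaround (a sketch, not an official API; the helper name is hypothetical) is to rescale float data explicitly before it ever reaches ToTensor(), so the dtype heuristic no longer matters:

```python
import numpy as np

def prescale(pic):
    """Hypothetical helper: bring an array with values in [0, 255]
    into [0.0, 1.0] explicitly, so ToTensor() will leave it unchanged."""
    return np.asarray(pic, dtype=np.float32) / 255.0

# e.g. Moving-MNIST-like integer frames loaded outside PIL
frames = np.random.randint(0, 256, size=(64, 64, 1))
scaled = prescale(frames)
print(scaled.min() >= 0.0 and scaled.max() <= 1.0)  # True
```

Making the scaling an explicit, separate step in the pipeline sidesteps the ambiguity entirely, which is in the spirit of fmassa's point that ToTensor currently tries to do too many things at once.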
