The FasterRCNN model (and, more generally, the GeneralizedRCNN class) expects its input images as a list of float PyTorch tensors. However, if you pass it a list of tensors with dtype torch.uint8, the normalization step produces NaN values, which then propagate to the loss computation.
Steps to reproduce the behavior:
1. Create an image tensor with dtype torch.uint8, along with its corresponding target dictionary.
2. Instantiate FasterRCNN and pass that image to the model.
3. The returned losses contain NaN values.

I would have expected the model to throw an exception or at least a warning. In particular, since the GeneralizedRCNN class takes care of transformations such as normalization and resizing, in my opinion it should also check the dtype of the input images, in order to avoid such errors.
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 10.15.7 (x86_64)
GCC version: Could not collect
Clang version: 12.0.0 (clang-1200.0.32.28)
CMake version: version 3.18.4
Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] Could not collect
I realized that the error I was facing is caused by the normalize function of the GeneralizedRCNNTransform class, which casts the mean and standard deviation lists to the image's dtype when converting them to tensors. With a uint8 image, the default ImageNet mean/std values (all below 1) are truncated to zero, so the subsequent division produces NaN/inf values.
def normalize(self, image):
    dtype, device = image.dtype, image.device
    mean = torch.as_tensor(self.image_mean, dtype=dtype, device=device)
    std = torch.as_tensor(self.image_std, dtype=dtype, device=device)
    return (image - mean[:, None, None]) / std[:, None, None]
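The failure mode is easy to see in isolation: casting the default ImageNet statistics (all values below 1) to torch.uint8 truncates them to zero, and the division then produces inf/NaN. A minimal sketch:

```python
import torch

# Default ImageNet statistics used by GeneralizedRCNNTransform
image_mean = [0.485, 0.456, 0.406]
image_std = [0.229, 0.224, 0.225]

image = torch.full((3, 2, 2), 255, dtype=torch.uint8)  # uint8 input image

# Casting values < 1 to uint8 truncates them to 0
mean = torch.as_tensor(image_mean, dtype=image.dtype)
std = torch.as_tensor(image_std, dtype=image.dtype)
print(mean)  # tensor([0, 0, 0], dtype=torch.uint8)

# (image - 0) / 0 is promoted to float and yields inf everywhere
out = (image - mean[:, None, None]) / std[:, None, None]
print(out.isinf().all())  # tensor(True)
```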
To avoid this problem, a simple image.float() would suffice.
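As a sketch, that cast could look like the following (written as a standalone function with hardcoded ImageNet statistics for illustration; note that this alone does not address mean/std values given on the 0-255 scale):

```python
import torch

# ImageNet statistics, hardcoded here for illustration
IMAGE_MEAN = [0.485, 0.456, 0.406]
IMAGE_STD = [0.229, 0.224, 0.225]

def normalize(image):
    # Cast integer images to float first, so the mean/std values
    # are not truncated to zero by an integer-dtype conversion.
    if not image.is_floating_point():
        image = image.float()
    dtype, device = image.dtype, image.device
    mean = torch.as_tensor(IMAGE_MEAN, dtype=dtype, device=device)
    std = torch.as_tensor(IMAGE_STD, dtype=dtype, device=device)
    return (image - mean[:, None, None]) / std[:, None, None]

out = normalize(torch.full((3, 2, 2), 128, dtype=torch.uint8))
print(out.isfinite().all())  # tensor(True)
```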
@Wadaboa Thanks for reporting.
I think the current implementation expects that you have already converted the image to 0-1 scale, using one of the other transforms:
https://github.com/pytorch/vision/blob/3d60f498e71ba63b428edb184c9ac38fa3737fa6/references/detection/train.py#L51-L53
Concerning the code on normalize, I think the idea of casting the mean/std to the dtype of the image is not correct:
https://github.com/pytorch/vision/blob/3d60f498e71ba63b428edb184c9ac38fa3737fa6/torchvision/models/detection/transform.py#L120-L124
The above works only if the image is of floating type; if it's not, the whole thing fails, as you pointed out. I think this needs to be fixed one way or another:
Given that the GeneralizedRCNNTransform receives as arguments mean/std which can be in 0-255 scale, I think option 2 is best. Do you mind sending us a PR that fixes the issue? Please add a unit-test that shows that the problem is resolved.
cc @fmassa
Great, thank you for the clarification! Tomorrow I'll work on the PR and send it right away!
Solved in PR #3266.