Vision: FastRCNNPredictor doesn't return prediction in evaluation

Created on 9 Mar 2020 · 14 Comments · Source: pytorch/vision

🐛 Bug

Dear all,

I am doing object detection in an image with one class. After training, FastRCNNPredictor does not return anything in validation mode. I have followed this official tutorial https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html.

Thanks.

To Reproduce

Steps to reproduce the behavior:

I have created a custom dataset; this is one of its items:

tensor([[[0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1686, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1529],
          ...,
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490]],

         [[0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1294, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1137],
          ...,
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098]],

         [[0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1216, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1059],
          ...,
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059]]]),
 {'boxes': tensor([[315.0003, 213.5002, 626.0004, 329.5002]]),
  'labels': tensor([0]),
  'image_id': tensor([1]),
  'area': tensor([36503.9961]),
  'iscrowd': tensor([0])})

To prove its correctness I have also visualized the bbox on the image:

Then I create a Dataloader:


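import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# ds is the custom dataset shown above; the collate_fn keeps images and targets as tuples
dl = DataLoader(ds, batch_size=8, num_workers=4, collate_fn=lambda x: tuple(zip(*x)))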

model = fasterrcnn_resnet50_fpn(num_classes=1).to(device)

params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

Training works:

model.train()
for i in range(5):

    for images, targets in dl:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k,v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        print(losses)

Output:

tensor(0.6391, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6329, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6139, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5965, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5814, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5468, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5049, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.4502, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.3787, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.2502, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.1605, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0940, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0558, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0507, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0413, device='cuda:0', grad_fn=<AddBackward0>)

But, when I try to get a prediction I have no output:

model = model.eval()
with torch.no_grad():
    model = model.cuda()
    pred = model([ds[2][0].cuda()])

pred is

[{'boxes': tensor([], size=(0, 4)),
  'labels': tensor([], dtype=torch.int64),
  'scores': tensor([])}]

Thank you in advance

Expected behavior

The model should return a valid prediction.

Environment

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 430.50
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] efficientnet-pytorch==0.5.1
[pip] msgpack-numpy==0.4.3.2
[pip] numpy==1.17.4
[pip] PytorchStorage==0.0.0
[pip] torch==1.4.0
[pip] torchbearer==0.5.3
[pip] torchlego==0.0.0
[pip] torchsummary==1.5.1
[pip] torchvision==0.5.0
[conda] _pytorch_select           0.2                       gpu_0  
[conda] blas                      1.0                         mkl  
[conda] efficientnet-pytorch      0.5.1                    pypi_0    pypi
[conda] libblas                   3.8.0                    14_mkl    conda-forge
[conda] libcblas                  3.8.0                    14_mkl    conda-forge
[conda] liblapack                 3.8.0                    14_mkl    conda-forge
[conda] liblapacke                3.8.0                    14_mkl    conda-forge
[conda] mkl                       2019.4                      243  
[conda] mkl-service               2.3.0            py37he904b0f_0  
[conda] mkl_fft                   1.0.15           py37ha843d7b_0  
[conda] mkl_random                1.1.0            py37hd6b4f25_0  
[conda] pytorchstorage            0.0.0                    pypi_0    pypi
[conda] torch                     1.4.0                    pypi_0    pypi
[conda] torchbearer               0.5.3                    pypi_0    pypi
[conda] torchlego                 0.0.0                    pypi_0    pypi
[conda] torchsummary              1.5.1                    pypi_0    pypi
[conda] torchvision               0.5.0                    pypi_0    pypi

models question object detection

Source

FrancescoSaverioZuppichini

All 14 comments

Hi,

I believe you haven't trained for enough iterations to see the model converge, especially because you are not using a pre-trained model but are instead training it from scratch, which requires a lot of iterations.

I would recommend following the fine-tuning steps in the tutorial that you pointed out, as you'll probably see better and faster results on limited data.

I'm closing the issue, but let us know if you have further questions.

fmassa on 10 Mar 2020

Hi @fmassa,

Thanks :)

The tutorial was followed correctly. The loss is decreasing steadily during training and I see no problem at all. In this case, no output means no bbox. I will use the pre-trained weights and train the model for a longer time. In the meantime, could you be so kind as to have a look at the code I have attached? Maybe I missed something. One last question: should I resize the image to the usual ImageNet size (224)?

Thank you

FrancescoSaverioZuppichini on 10 Mar 2020

The code seems correct to me.

You don't need to resize the image to 224. Just make sure your images are in the 0-1 range in RGB, and the model will rescale them internally for you.
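
As an illustration of the 0-1 RGB requirement, here is a minimal sketch that converts an (H, W, 3) uint8 image into the (3, H, W) float tensor the detection models take. The helper name `to_model_input` and the random stand-in image are assumptions for this example, not part of the original post.

```python
import torch


def to_model_input(image_hwc_uint8: torch.Tensor) -> torch.Tensor:
    """Convert an (H, W, 3) uint8 RGB image to a (3, H, W) float tensor in [0, 1]."""
    if image_hwc_uint8.dtype != torch.uint8:
        raise TypeError("expected a uint8 image with values in 0..255")
    # Move channels first (HWC -> CHW), then scale 0..255 down to 0..1.
    return image_hwc_uint8.permute(2, 0, 1).float() / 255.0


# Hypothetical 480x640 RGB image standing in for a real photo.
fake_image = torch.randint(0, 256, (480, 640, 3), dtype=torch.uint8)
model_input = to_model_input(fake_image)
print(model_input.shape, model_input.dtype)
```

The original spatial size is left untouched here, since no manual resize to 224 is needed.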

fmassa on 10 Mar 2020

The model is internally resizing the image and the bboxes to (480, 640, C), (COCO format), isn't it?
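
For reference, torchvision's Faster R-CNN does not, to my knowledge, resize to a fixed (480, 640). Its internal transform rescales each image so that the shorter side reaches `min_size` (800 by default), unless that would push the longer side past `max_size` (1333 by default), and the boxes are scaled by the same factor. A minimal pure-Python sketch of that rule, with a hypothetical helper name and the defaults stated as assumptions:

```python
def rcnn_scale_factor(height: int, width: int,
                      min_size: int = 800, max_size: int = 1333) -> float:
    """Scale factor applied to an image of the given size (assumed defaults)."""
    short_side = float(min(height, width))
    long_side = float(max(height, width))
    # First try to bring the shorter side up (or down) to min_size.
    scale = min_size / short_side
    # If the longer side would then exceed max_size, cap the scale instead.
    if long_side * scale > max_size:
        scale = max_size / long_side
    return scale


# A 480x640 image is scaled so its shorter side becomes 800 pixels.
print(rcnn_scale_factor(480, 640))
# A very wide image is limited by max_size on its longer side.
print(rcnn_scale_factor(500, 2000))
```

On an actual model the configured values can be read from `model.transform.min_size` and `model.transform.max_size`.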

FrancescoSaverioZuppichini on 10 Mar 2020

Using a pretrained network as follows:

from torchvision.models.detection.faster_rcnn import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = fasterrcnn_resnet50_fpn(True).to(device)

num_classes = 1  
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes).to(device)

And training the network as shown in the first post, I get the following output:

tensor(0.0620, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.1114, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0959, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0404, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0653, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0422, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0317, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0355, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0278, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0377, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0372, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0334, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0235, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0251, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0247, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0220, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0195, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0216, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0260, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0247, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0163, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0161, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0149, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0171, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0158, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0155, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0122, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0179, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0129, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0119, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0133, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0140, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0145, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0131, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0117, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0094, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0123, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0126, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0086, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0106, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0117, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0069, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0099, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0119, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0069, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0109, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0124, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0075, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0088, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0132, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0069, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0101, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0099, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0097, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0087, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0101, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0054, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0092, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0095, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0055, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0078, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0098, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0041, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0080, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0118, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0048, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0089, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0085, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0043, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0074, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0105, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0036, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0075, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0080, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0049, device='cuda:0', grad_fn=<AddBackward0>)

but still,

model = model.eval()
with torch.no_grad():
    model = model.cuda()
    pred = model([ds[2][0].cuda()])

pred is still empty

[{'boxes': tensor([], size=(0, 4)),
  'labels': tensor([], dtype=torch.int64),
  'scores': tensor([])}]

Any idea?

On my side, I have rechecked the type of the inputs and they are correct. An example of one item in the dataset is:

(tensor([[[0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1686, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1529],
          ...,
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490]],

         [[0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1294, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1137],
          ...,
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098]],

         [[0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1216, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1059],
          ...,
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059]]]),
 {'boxes': tensor([[315.0003, 213.5002, 626.0004, 329.5002]]),
  'labels': tensor([0]),
  'image_id': tensor([1]),
  'area': tensor([36503.9961]),
  'iscrowd': tensor([0])})

I am not sure about iscrowd, but in the tutorial, it was set to zero.

Thanks.

FrancescoSaverioZuppichini on 10 Mar 2020

@FrancescoSaverioZuppichini I think I see the issue: the label for your object is 0, but Faster R-CNN considers value 0 as background. If you make the label be 1, it should work fine.

This is illustrated in the detection tutorial you mentioned, see the dataset line:

# there is only one class
labels = torch.ones((num_objs,), dtype=torch.int64)

But I agree it can be a bit tricky to spot this. I would happily accept a PR improving the documentation mentioning that the labels should start at 1 and that 0 is treated as background.

fmassa on 10 Mar 2020

👍2

@fmassa Thank you, it works! 🥳🥳

I will definitely create a PR and improve the doc over the weekend

FrancescoSaverioZuppichini on 11 Mar 2020

👍1

Cool, looking forward to the PR improving the documentation!

fmassa on 12 Mar 2020

Hi @fmassa, I hope you are healthy. Sorry for the late reply but I have been very busy these days. Is there a doc contribution guide that I can follow to be sure I am changing the right files?

FrancescoSaverioZuppichini on 25 Mar 2020

Hi @FrancescoSaverioZuppichini

All good here, hope everything is good for you as well.

You could maybe add some information in https://github.com/pytorch/vision/blob/master/docs/source/models.rst#object-detection-instance-segmentation-and-person-keypoint-detection or in the tutorials, which are hosted in https://github.com/pytorch/tutorials/blob/master/intermediate_source/torchvision_tutorial.rst

fmassa on 26 Mar 2020

Hi @fmassa, I hope you are doing well. I have added a couple of sentences and hopefully, it is more understandable now

You can find the PR here https://github.com/pytorch/tutorials/pull/914

FrancescoSaverioZuppichini on 30 Mar 2020

Thanks for the PR @FrancescoSaverioZuppichini !

fmassa on 30 Mar 2020

❤1

Hi @FrancescoSaverioZuppichini @fmassa. I am also getting no predictions from the Faster R-CNN model. How did you resolve the problem? Was it just a matter of starting the label index at 1 instead of 0?