Vision: FastRCNNPredictor doesn't return prediction in evaluation

Created on 9 Mar 2020  ·  14 Comments  ·  Source: pytorch/vision

🐛 Bug

Dear all,

I am doing object detection in an image with one class. After training, FastRCNNPredictor does not return anything in validation mode. I have followed this official tutorial https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html.

Thanks.

To Reproduce

Steps to reproduce the behavior:

I have created a custom dataset; this is one of its items:

tensor([[[0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1686, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1529],
          ...,
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490]],

         [[0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1294, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1137],
          ...,
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098]],

         [[0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1216, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1059],
          ...,
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059]]]),
 {'boxes': tensor([[315.0003, 213.5002, 626.0004, 329.5002]]),
  'labels': tensor([0]),
  'image_id': tensor([1]),
  'area': tensor([36503.9961]),
  'iscrowd': tensor([0])})

To prove its correctness I have also visualized the bbox on the image:

[image: bounding box drawn on the sample image]

Then I create a Dataloader:


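import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# assumed; the original post did not show how `device` was defined
device = torch.device("cuda")

dl = DataLoader(ds, batch_size=8, num_workers=4, collate_fn=lambda x: tuple(zip(*x)))

model = fasterrcnn_resnet50_fpn(num_classes=1).to(device)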

params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

Training works:

model.train()
for i in range(5):

    for images, targets in dl:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k,v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        print(losses)

Output:

tensor(0.6391, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6329, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6139, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5965, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5814, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5468, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5049, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.4502, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.3787, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.2502, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.1605, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0940, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0558, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0507, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0413, device='cuda:0', grad_fn=<AddBackward0>)

But when I try to get a prediction, I get no output:

model = model.eval()
with torch.no_grad():
    model = model.cuda()
    pred = model([ds[2][0].cuda()])

pred is

[{'boxes': tensor([], size=(0, 4)),
  'labels': tensor([], dtype=torch.int64),
  'scores': tensor([])}]

Thank you in advance

Expected behavior

The model should return a valid prediction.

Environment

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 430.50
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] efficientnet-pytorch==0.5.1
[pip] msgpack-numpy==0.4.3.2
[pip] numpy==1.17.4
[pip] PytorchStorage==0.0.0
[pip] torch==1.4.0
[pip] torchbearer==0.5.3
[pip] torchlego==0.0.0
[pip] torchsummary==1.5.1
[pip] torchvision==0.5.0
[conda] _pytorch_select           0.2                       gpu_0  
[conda] blas                      1.0                         mkl  
[conda] efficientnet-pytorch      0.5.1                    pypi_0    pypi
[conda] libblas                   3.8.0                    14_mkl    conda-forge
[conda] libcblas                  3.8.0                    14_mkl    conda-forge
[conda] liblapack                 3.8.0                    14_mkl    conda-forge
[conda] liblapacke                3.8.0                    14_mkl    conda-forge
[conda] mkl                       2019.4                      243  
[conda] mkl-service               2.3.0            py37he904b0f_0  
[conda] mkl_fft                   1.0.15           py37ha843d7b_0  
[conda] mkl_random                1.1.0            py37hd6b4f25_0  
[conda] pytorchstorage            0.0.0                    pypi_0    pypi
[conda] torch                     1.4.0                    pypi_0    pypi
[conda] torchbearer               0.5.3                    pypi_0    pypi
[conda] torchlego                 0.0.0                    pypi_0    pypi
[conda] torchsummary              1.5.1                    pypi_0    pypi
[conda] torchvision               0.5.0                    pypi_0    pypi

Labels: models, question, object detection

All 14 comments

Hi,

I believe you haven't trained for enough iterations to see the model converge, especially because you are not using a pre-trained model but are training it from scratch, which requires a lot of iterations.

I would recommend following the fine-tuning steps in the tutorial that you pointed out, as you'll probably see better and faster results on limited data.

I'm closing the issue, but let us know if you have further questions.

Hi @fmassa,

Thanks :)

I followed the tutorial correctly. The loss decreases during training as expected, and I see no problem there. In this case, "no output" means no bounding boxes. I will use the pre-trained weights and train the model for longer. In the meantime, could you be so kind as to have a look at the code I have attached? Maybe I missed something. One last question: should I resize the image to the usual ImageNet size (224)?

Thank you

The code seems correct to me.

You don't need to resize the image to 224. Just make sure your images are RGB with values in the 0-1 range, and the model will rescale them internally for you.

The model is internally resizing the image and the bboxes to (480, 640, C), (COCO format), isn't it?
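
A note on the resize question: as far as I know, torchvision's Faster R-CNN does not resize to a fixed (480, 640). Its internal transform scales the image so that the shorter side reaches `min_size` without the longer side exceeding `max_size`, keeping the aspect ratio. The sketch below assumes the defaults for `fasterrcnn_resnet50_fpn` are 800 and 1333; `resized_shape` is a hypothetical helper, and the library's exact rounding may differ by a pixel.

```python
from fractions import Fraction


def resized_shape(height, width, min_size=800, max_size=1333):
    """Approximate the (height, width) the detector works on internally.

    Assumes min_size=800 and max_size=1333; this is a sketch of the
    arithmetic, not torchvision's own code.
    """
    short_side, long_side = min(height, width), max(height, width)
    # Scale the short side up to min_size, unless that would push the
    # long side past max_size, in which case the long side sets the scale.
    scale = min(Fraction(min_size, short_side), Fraction(max_size, long_side))
    return int(height * scale), int(width * scale)


print(resized_shape(480, 640))
```

Under these assumptions a 480 x 640 image is processed at roughly 800 x 1066, and the predicted boxes are mapped back to the original image size before being returned.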

Using a pretrained network as follows:

from torchvision.models.detection.faster_rcnn import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = fasterrcnn_resnet50_fpn(True).to(device)

num_classes = 1  
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes).to(device)

Training the network as shown in the first post, I get the following output:

tensor(0.0620, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.1114, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0959, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0404, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0653, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0422, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0317, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0355, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0278, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0377, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0372, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0334, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0235, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0251, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0247, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0220, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0195, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0216, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0260, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0247, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0163, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0161, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0149, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0171, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0158, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0155, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0122, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0179, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0129, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0119, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0133, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0140, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0145, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0131, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0117, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0094, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0123, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0126, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0086, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0106, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0117, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0069, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0099, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0119, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0069, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0109, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0124, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0075, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0088, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0132, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0069, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0101, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0099, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0097, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0087, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0101, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0054, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0092, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0095, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0055, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0078, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0098, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0041, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0080, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0118, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0048, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0089, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0085, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0043, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0074, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0105, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0036, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0075, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0080, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0049, device='cuda:0', grad_fn=<AddBackward0>)

but still,

model = model.eval()
with torch.no_grad():
    model = model.cuda()
    pred = model([ds[2][0].cuda()])

pred is still empty

[{'boxes': tensor([], size=(0, 4)),
  'labels': tensor([], dtype=torch.int64),
  'scores': tensor([])}]

Any idea?

On my side, I have rechecked the type of the inputs and they are correct. An example of one item in the dataset is:

(tensor([[[0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1686, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1529],
          ...,
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490]],

         [[0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1294, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1137],
          ...,
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098]],

         [[0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1216, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1059],
          ...,
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059]]]),
 {'boxes': tensor([[315.0003, 213.5002, 626.0004, 329.5002]]),
  'labels': tensor([0]),
  'image_id': tensor([1]),
  'area': tensor([36503.9961]),
  'iscrowd': tensor([0])})

I am not sure about iscrowd, but in the tutorial, it was set to zero.

Thanks.

@FrancescoSaverioZuppichini I think I see the issue: the label for your object is 0, but Faster R-CNN considers value 0 as background. If you make the label be 1, it should work fine.

This is illustrated in the detection tutorial you mentioned, see the dataset line:

# there is only one class
labels = torch.ones((num_objs,), dtype=torch.int64)

But I agree it can be a bit tricky to spot this. I would happily accept a PR improving the documentation mentioning that the labels should start at 1 and that 0 is treated as background.
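
One way to catch this early is to build the label mapping so that it never hands out 0, and to check targets before training. The following is only a sketch: `build_label_map`, `check_labels`, and the class name `my_object` are hypothetical, and it assumes `num_classes` counts the background class, as in the tutorial.

```python
def build_label_map(class_names):
    """Assign integer labels starting at 1, leaving 0 for background."""
    return {name: index for index, name in enumerate(class_names, start=1)}


def check_labels(labels, num_classes):
    """Raise ValueError if any label is background (0) or out of range.

    `labels` is any iterable of ints, e.g. target["labels"].tolist().
    `num_classes` includes the background class.
    """
    for label in labels:
        if not 1 <= label < num_classes:
            raise ValueError(
                f"label {label} is outside 1..{num_classes - 1}; "
                "0 is reserved for background"
            )


label_map = build_label_map(["my_object"])  # hypothetical class name
num_classes = len(label_map) + 1            # one object class + background
check_labels([label_map["my_object"]], num_classes)  # passes silently
```

With a single object class this gives label 1 and `num_classes = 2`, which is the value to pass to the model or to `FastRCNNPredictor`.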

@fmassa Thank you, it works! 🥳🥳

I will definitely create a PR and improve the doc over the weekend

Cool, looking forward to the PR improving the documentation!

Hi @fmassa, I hope you are healthy. Sorry for the late reply but I have been very busy these days. Is there a doc contribution guide that I can follow to be sure I am changing the right files?

Hi @FrancescoSaverioZuppichini

All good here, hope everything is good for you as well.

You could maybe add some information in https://github.com/pytorch/vision/blob/master/docs/source/models.rst#object-detection-instance-segmentation-and-person-keypoint-detection or in the tutorials, which are hosted in https://github.com/pytorch/tutorials/blob/master/intermediate_source/torchvision_tutorial.rst

Hi @fmassa, I hope you are doing well. I have added a couple of sentences, and hopefully it is more understandable now.

You can find the PR here https://github.com/pytorch/tutorials/pull/914

Thanks for the PR @FrancescoSaverioZuppichini !

Hi @FrancescoSaverioZuppichini @fmassa. I am also getting no predictions from a Faster R-CNN model. How did you resolve the problem? From reading the messages above, was it just changing the label index to start from 1 instead of 0?
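
To the question above: judging from the thread, yes, the fix is to label the object 1 rather than 0. Following the tutorial, `num_classes` should then also count the background, so 2 for a single object class. Below is a simplified illustration of why a background-only setup returns nothing. It assumes that post-processing discards the class-0 column and keeps only scores above a threshold (taken here as 0.05); `postprocess` is a hypothetical stand-in, not torchvision's function.

```python
def postprocess(class_scores, score_thresh=0.05):
    """Return (box_index, label, score) for every non-background detection.

    `class_scores` holds one row per candidate box and one column per class,
    with column 0 being background.
    """
    detections = []
    for box_index, row in enumerate(class_scores):
        # Skip column 0: background is never reported as a detection.
        for label, score in enumerate(row[1:], start=1):
            if score > score_thresh:
                detections.append((box_index, label, score))
    return detections


# num_classes=1: every row has only the background column, so nothing survives.
print(postprocess([[1.0], [1.0]]))
# num_classes=2: column 1 is the object class and can be reported.
print(postprocess([[0.1, 0.9], [0.98, 0.02]]))
```

This is consistent with the empty `boxes`, `labels`, and `scores` tensors reported earlier in the thread, even though the training loss kept decreasing.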
