Detectron2: How does Detectron2 handle odd-sized images?

Created on 12 Feb 2020 · 2 Comments · Source: facebookresearch/detectron2

Hello guys,

Thanks for your amazing work.

I have a question about the FPN backbone. It requires strides to be log2-contiguous; as far as I understand, this means that each input feature map of the FPN must be downsampled by a factor of 2 with respect to the previous feature map.
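To make the "log2-contiguous" requirement concrete, here is a minimal sketch of such a check. The helper name strides_are_log2_contiguous is hypothetical and this is not Detectron2's own code; it only illustrates the condition described above, using the usual ResNet strides of 4, 8, 16 and 32 for res2 to res5.

```python
def strides_are_log2_contiguous(strides):
    """Return True if every stride is exactly twice the previous one."""
    return all(cur == 2 * prev for prev, cur in zip(strides, strides[1:]))


# Strides of res2..res5 in a standard ResNet.
print(strides_are_log2_contiguous([4, 8, 16, 32]))  # True
# Skipping a level (no stride-16 map) breaks the requirement.
print(strides_are_log2_contiguous([4, 8, 32]))      # False
```

Note that this condition concerns the nominal strides only; it says nothing about whether the actual feature map sizes of a given input halve exactly.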

As expected, running this bit of code:

import torch
from detectron2.modeling.backbone.fpn import build_resnet_fpn_backbone

inputs = torch.randn(10, 3, 500, 500)

# Minimal stand-in for the input shape object expected by the builder
class shape:
    def __init__(self, channels, height, width):
        self.channels = channels
        self.height = height
        self.width = width

# cfg_resnet is assumed to be loaded from the config file shown below
model_resnet_fpn = build_resnet_fpn_backbone(cfg_resnet, shape(3, 500, 500))
P = model_resnet_fpn(inputs)

throws the error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-19-bfac22f8e8d0> in <module>
      8 
      9 model_resnet_fpn = build_resnet_fpn_backbone(cfg_resnet, shape(3, 500, 500)).cuda()
---> 10 P = model_resnet_fpn(inputs)

~/venv/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

~/git_projects/detectron2/detectron2/modeling/backbone/fpn.py in forward(self, x)
    130             top_down_features = F.interpolate(prev_features, scale_factor=2, mode="nearest")
    131             lateral_features = lateral_conv(features)
--> 132             prev_features = lateral_features + top_down_features
    133             if self._fuse_type == "avg":
    134                 prev_features /= 2

RuntimeError: The size of tensor a (63) must match the size of tensor b (64) at non-singleton dimension 3

This is because the bottom-up feature maps have the following dimensions:

res2 torch.Size([3, 256, 125, 125])
res3 torch.Size([3, 512, 63, 63])
res4 torch.Size([3, 1024, 32, 32])
res5 torch.Size([3, 2048, 16, 16])

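Hence the FPN cannot add the two tensors: upsampling the 32-wide res4 map by a factor of 2 gives a width of 64, which does not match the 63-wide res3 map, so it throws the error.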
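The sizes above can be reproduced with a little arithmetic. This is only a sketch under the assumption that each stride-2 stage maps a side of length n to ceil(n / 2), which matches the shapes reported here; the function names are hypothetical.

```python
def halve(n):
    """Spatial size after one stride-2 layer, modeled as ceil(n / 2)."""
    return (n + 1) // 2


def bottom_up_sizes(size, num_levels=4):
    """Side lengths of res2..res5 for a square input of the given side."""
    size = halve(halve(size))  # stem conv + max pool: overall stride 4
    sizes = [size]
    for _ in range(num_levels - 1):
        size = halve(size)
        sizes.append(size)
    return sizes


def first_mismatch(sizes):
    """Walk top-down; return (upsampled, lateral) at the first level where
    doubling the coarser map does not match the finer map, else None."""
    for coarse, fine in zip(reversed(sizes), reversed(sizes[:-1])):
        if 2 * coarse != fine:
            return 2 * coarse, fine
    return None


print(bottom_up_sizes(500))                   # [125, 63, 32, 16]
print(first_mismatch(bottom_up_sizes(500)))   # (64, 63)
print(first_mismatch(bottom_up_sizes(512)))   # None
```

With a 512 x 512 input every level halves exactly, so the top-down additions line up.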

However, when using the Detectron2 pipeline to train with odd-sized images, it runs smoothly and does not throw this error.

My question is then: how does Detectron2 handle this case, and how can I make it work using build_resnet_fpn_backbone as above?

My config file is:

MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN" # others are not fully implemented yet
  BACKBONE:
    NAME: "build_resnet_fpn_backbone" # feature pyramid backbone
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  MASK_ON: False
  RESNETS:
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
    DEPTH: 50
  FPN:
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
  ANCHOR_GENERATOR:
    SIZES: [[32], [64], [128], [256], [512]] # One size for each in feature map
    ASPECT_RATIOS: [[0.5, 1.0, 2.0]] # Three aspect ratios (same for all in feature maps)
  RPN:
    IN_FEATURES: ["p2", "p3", "p4", "p5", "p6"]

  ROI_HEADS:
    NAME: "StandardROIHeads"
    IN_FEATURES: ["p2", "p3", "p4", "p5"]

Thank you

Cyril9227

All 2 comments

ppwwyyxx on 12 Feb 2020

👍4

Images are padded inside ImageList.from_tensors, used by every model, e.g. here

https://github.com/facebookresearch/detectron2/blob/4bd27960e1c56eb8950b04f24c60b845d5840d8a/detectron2/modeling/meta_arch/rcnn.py#L185
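For anyone calling the backbone directly, the same effect can be approximated by padding the batch by hand before the forward pass. The sketch below is not Detectron2's implementation: pad_to_divisible is a hypothetical helper, and the divisor of 32 is an assumption based on the coarsest stride (res5) in the config above. It zero-pads on the bottom and right only, so the original pixels keep their coordinates.

```python
import torch
import torch.nn.functional as F


def pad_to_divisible(images, divisor=32):
    """Zero-pad the bottom and right of an (N, C, H, W) batch so that
    H and W become multiples of `divisor`."""
    height, width = images.shape[-2:]
    pad_h = (-height) % divisor
    pad_w = (-width) % divisor
    # F.pad lists padding for the last dimension first:
    # (left, right, top, bottom).
    return F.pad(images, (0, pad_w, 0, pad_h), value=0.0)


inputs = torch.randn(2, 3, 500, 500)
padded = pad_to_divisible(inputs)
print(padded.shape)  # torch.Size([2, 3, 512, 512])
```

The padded tensor can then be passed to the backbone in place of the original inputs. Inside the full pipeline, ImageList.from_tensors is the supported way to do this, and it also keeps track of the original image sizes.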

This needs to be in the tutorial "Partially execute a model"!