Vision: assert error len(grid_sizes) == len(strides) == len(cell_anchors)

Created on 13 Jan 2021 · 3 Comments · Source: pytorch/vision

It looks like a bug. When I do not set the AnchorGenerator() in FasterRCNN, the default anchor_sizes in detection/faster_rcnn.py line 182 is anchor_sizes = ((32,), (64,), (128,), (256,), (512,)), which causes len(cell_anchors) == 5. And I found that in detection/faster_rcnn.py line 120 the anchor_sizes is set to ((32, 64, 128, 256, 512),), which gives len(cell_anchors) == 1.
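
For context, here is a minimal sketch (not from the issue) that reproduces the mismatch, assuming a torchvision version with this check; mobilenet_v2 stands in for any backbone that returns a single feature map:

import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# A backbone that returns a single feature map.
backbone = torchvision.models.mobilenet_v2(pretrained=False).features
backbone.out_channels = 1280

# Five anchor-size groups, but the backbone yields only one feature map:
# len(grid_sizes) == 1 while len(cell_anchors) == 5, so the check fails.
anchor_generator = AnchorGenerator(sizes=((32,), (64,), (128,), (256,), (512,)),
                                   aspect_ratios=((0.5, 1.0, 2.0),) * 5)

model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator)
model.eval()
model([torch.rand(3, 224, 224)])  # raises the assertion / ValueError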

question

All 3 comments

Hi!

I think this error is fixed on master; see #2971, #2960, #2983, #2947.

This would be the error message on master / in the next release:

if not (len(grid_sizes) == len(strides) == len(cell_anchors)):
    raise ValueError("Anchors should be Tuple[Tuple[int]] because each feature "
                     "map could potentially have different sizes and aspect ratios. "
                     "There needs to be a match between the number of "
                     "feature maps passed and the number of sizes / aspect ratios specified.")

In short, you need to pass a Tuple[Tuple[int]] instead of a Tuple[int] to AnchorGenerator.
This was done to avoid potentially bad results.
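
To illustrate the two valid shapes (a sketch built from the values quoted in this thread):

from torchvision.models.detection.rpn import AnchorGenerator

# One group of five sizes: for a backbone with a single feature map.
one_level = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                            aspect_ratios=((0.5, 1.0, 2.0),))

# Five groups of one size each: for an FPN backbone with five feature maps.
five_levels = AnchorGenerator(sizes=((32,), (64,), (128,), (256,), (512,)),
                              aspect_ratios=((0.5, 1.0, 2.0),) * 5)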

Also, I think we should change line 121 of faster_rcnn.py to Tuple[Tuple[int]]?

I think the above line is causing confusion.

@alpha-gradient As @oke-aditya mentioned, the error message has been updated to make the situation less confusing.

Here is a simplified version of the code that you are quoting:

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# A backbone that returns a single feature map with 1280 channels.
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280

# One group of five anchor sizes, matching the single backbone output.
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator)

The above snippet uses sizes=((32, 64, 128, 256, 512),), or in other words defines 1 level/group of 5 anchor sizes. Why 1 level/group? Because the backbone provides only 1 output.
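
You can verify this quickly (a sketch; the 7x7 spatial size assumes a 224x224 input):

import torch
import torchvision

backbone = torchvision.models.mobilenet_v2(pretrained=False).features
feat = backbone(torch.rand(1, 3, 224, 224))
print(feat.shape)  # torch.Size([1, 1280, 7, 7]) -- a single feature map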

On the other hand, the default anchor sizes used in Faster R-CNN are ((32,), (64,), (128,), (256,), (512,)), which means we have 5 levels/groups with 1 anchor size each:

https://github.com/pytorch/vision/blob/8ebfd2f5d5f1792ce2cf5a2329320f604530a68e/torchvision/models/detection/faster_rcnn.py#L186-L188

Why is that? This is because by default it uses a Feature Pyramid Network (FPN) backbone which returns 5 outputs (intermediate layers of the original backbone).
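
A quick way to see the 5 outputs (a sketch using torchvision's resnet_fpn_backbone helper):

import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# The default Faster R-CNN backbone: ResNet-50 with an FPN on top.
backbone = resnet_fpn_backbone('resnet50', pretrained=False)
features = backbone(torch.rand(1, 3, 224, 224))  # OrderedDict of feature maps
print(len(features))  # 5 -> needs 5 groups of anchor sizes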

The error message that you got basically indicates that the number of outputs of the backbone should match the number of levels/groups of anchor sizes.

Closing following @oke-aditya's and @datumbox's great answers.

