I'm working on the source code of Mask_RCNN and I found something interesting.
Code:
print(config.RPN_ANCHOR_SCALES)
print(config.RPN_ANCHOR_RATIOS)
print(config.BACKBONE_SHAPES)
print(config.BACKBONE_STRIDES)
print(config.RPN_ANCHOR_STRIDE)
output:
(8, 16, 32, 64, 128)
[0.5, 1, 2]
[[32 32]
[16 16]
[ 8 8]
[ 4 4]
[ 2 2]]
[4, 8, 16, 32, 64]
1
We have 5 feature maps of different sizes: 32*32, 16*16, 8*8, 4*4, 2*2.
At each pixel of each feature map, we generate 3 anchors with different ratios. In other words, the number of anchors per feature map should be [32*32, 16*16, 8*8, 4*4, 2*2] * 3, but I find that the number of anchors generated by the function generate_pyramid_anchors() is three times that.
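To make the discrepancy concrete, here is a quick count (the per-level feature-map sizes and the observed counts are taken from the outputs in this thread; the factor of 3 comes from the three entries in RPN_ANCHOR_RATIOS):

```python
# Expected anchor counts: one anchor per feature-map pixel per ratio.
feature_shapes = [(32, 32), (16, 16), (8, 8), (4, 4), (2, 2)]
num_ratios = 3  # [0.5, 1, 2]

expected = [h * w * num_ratios for h, w in feature_shapes]
print(expected)       # [3072, 768, 192, 48, 12]
print(sum(expected))  # 4092

# Counts reported by generate_pyramid_anchors() in the run below:
observed = [9216, 2304, 576, 144, 36]
print([o // e for o, e in zip(observed, expected)])  # [3, 3, 3, 3, 3]
```

So every level has exactly 3x the expected number of anchors, which matches the triplicated rows shown later in this thread.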
Code:
boxes = generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
                                 config.RPN_ANCHOR_RATIOS,
                                 config.BACKBONE_SHAPES,
                                 config.BACKBONE_STRIDES,
                                 config.RPN_ANCHOR_STRIDE)
output:
scales= 8 , shape= [32 32]
boxes.shape= (9216, 4)
scales= 16 , shape= [16 16]
boxes.shape= (2304, 4)
scales= 32 , shape= [8 8]
boxes.shape= (576, 4)
scales= 64 , shape= [4 4]
boxes.shape= (144, 4)
scales= 128 , shape= [2 2]
boxes.shape= (36, 4)
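For anyone following along, here is a condensed, self-contained version of the anchor-generation logic from the repo's utils.generate_anchors() (names match the original; slightly trimmed). With a single scale and 3 ratios it should yield height * width * 3 anchors per level, e.g. 32 * 32 * 3 = 3072 for the first level rather than the 9216 reported above:

```python
import numpy as np

def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    """Generate anchor boxes [y1, x1, y2, x2] for one pyramid level."""
    # All combinations of scales and ratios (here: 1 scale x 3 ratios = 3).
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales = scales.flatten()
    ratios = ratios.flatten()

    # Anchor heights and widths; area stays ~scale^2 for every ratio.
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)

    # Anchor center positions in image coordinates.
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)

    # Combine every center with every (height, width) pair.
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])

    # Convert (center, size) to corner coordinates [y1, x1, y2, x2].
    return np.concatenate([box_centers - 0.5 * box_sizes,
                           box_centers + 0.5 * box_sizes], axis=1)

# First pyramid level from the config above: scale 8, 32x32 map, stride 4.
boxes = generate_anchors(8, [0.5, 1, 2], [32, 32], 4, 1)
print(boxes.shape)  # (3072, 4)
```

The three anchors at each position are distinct (one per ratio), so consecutive duplicated rows like those in the output below are not produced by this function on its own.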
Code:
print(boxes[-36:])
output:
array([[ -90.50966799, -45.254834 , 90.50966799, 45.254834 ],
[ -90.50966799, -45.254834 , 90.50966799, 45.254834 ],
[ -90.50966799, -45.254834 , 90.50966799, 45.254834 ],
[ -64. , -64. , 64. , 64. ],
[ -64. , -64. , 64. , 64. ],
[ -64. , -64. , 64. , 64. ],
[ -45.254834 , -90.50966799, 45.254834 , 90.50966799],
[ -45.254834 , -90.50966799, 45.254834 , 90.50966799],
[ -45.254834 , -90.50966799, 45.254834 , 90.50966799],
[ -90.50966799, 18.745166 , 90.50966799, 109.254834 ],
[ -90.50966799, 18.745166 , 90.50966799, 109.254834 ],
[ -90.50966799, 18.745166 , 90.50966799, 109.254834 ],
[ -64. , 0. , 64. , 128. ],
[ -64. , 0. , 64. , 128. ],
[ -64. , 0. , 64. , 128. ],
[ -45.254834 , -26.50966799, 45.254834 , 154.50966799],
[ -45.254834 , -26.50966799, 45.254834 , 154.50966799],
......
Those anchors are repeated 3 times. I wonder, is this a bug or just for convenience?
@Mabinogiysk I find that in the original paper, they used k = 9 anchors (3 scales and 3 ratios). But in this version, they only generate 3 (1 scale, 3 ratios). I'm confused about this.
@Superlee506
This implementation of Mask R-CNN uses only 1 scale with 3 ratios per level for anchors because it incorporates FPN. As stated in the FPN paper, Section 4.1, Feature Pyramid Networks for RPN:
"Because the head slides densely over all locations at all pyramid levels, it is not necessary to have multi-scale anchors on a specific level. Instead, we assign anchors of a single scale to each level."
In other words, the FPN takes care of the scale issue by virtue of having different pyramid levels, each addressing a different scale. Thus, there is no need for multiple anchor scales at each FPN level; we simply need different anchor ratios at each level.
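As a sanity check on this, the repeated extents in the last 36 boxes printed above (+-90.50966799, +-45.254834, +-64) are exactly what the single scale assigned to the coarsest level (128) produces with the three ratios, since anchor height and width are scale/sqrt(ratio) and scale*sqrt(ratio):

```python
import numpy as np

scale = 128.0        # the one scale assigned to the coarsest level
ratios = [0.5, 1, 2]

rows = []
for r in ratios:
    h = scale / np.sqrt(r)  # anchor height
    w = scale * np.sqrt(r)  # anchor width
    # Box centered at (0, 0) in [y1, x1, y2, x2] order.
    rows.append([-0.5 * h, -0.5 * w, 0.5 * h, 0.5 * w])
    print(rows[-1])
# ratio 0.5 -> [-90.5096..., -45.2548..., 90.5096..., 45.2548...]
# ratio 1   -> [-64.0, -64.0, 64.0, 64.0]
# ratio 2   -> [-45.2548..., -90.5096..., 45.2548..., 90.5096...]
```

Each ratio preserves the anchor's area (scale^2 = 16384), which is why only the aspect changes across the three boxes at each location.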
@FruVirus Copy that, thanks