Mask_RCNN: ROI level calculation inside PyramidROIAlign

Created on 20 Apr 2018 · 5 comments · Source: matterport/Mask_RCNN

Inside PyramidROIAlign, we determine which level of the feature pyramid network (FPN) each ROI is assigned to.

The equation comes from Section 4.2, Equation (1), of the FPN paper.
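For reference, Equation (1) of the FPN paper assigns an ROI of width w and height h, measured in input-image pixels, to pyramid level P_k, with k0 = 4 so that a 224x224 ROI maps to P4. The Mask_RCNN snippet below rounds instead of flooring and clamps k to the range 2 to 5.

```latex
k = \left\lfloor k_0 + \log_2\!\left(\frac{\sqrt{wh}}{224}\right) \right\rfloor, \qquad k_0 = 4
```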

image_area = tf.cast(
    self.image_shape[0] * self.image_shape[1], tf.float32)
roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
roi_level = tf.minimum(5, tf.maximum(
    2, 4 + tf.cast(tf.round(roi_level), tf.int32)))
roi_level = tf.squeeze(roi_level, 2)
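To make the TensorFlow snippet easier to experiment with, here is a pure-Python sketch of the same level assignment. It assumes, as Mask_RCNN does, that h and w are box height and width in normalized (0 to 1) coordinates; the function name `roi_level` and its parameters are hypothetical. Python's built-in `round()` rounds half to even, which matches `tf.round`.

```python
import math


def roi_level(h, w, image_h, image_w, k0=4, min_level=2, max_level=5):
    """Hypothetical pure-Python version of the TF level assignment.

    h, w: ROI height and width in normalized (0..1) coordinates (assumed).
    image_h, image_w: input image size in pixels.
    """
    image_area = float(image_h * image_w)
    # sqrt(h * w) * sqrt(image_area) is the geometric-mean ROI side in
    # pixels, so this is log2(roi_side_px / 224).
    raw = math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_area)))
    # round() is round-half-to-even, like tf.round; then clamp to [P2, P5].
    return min(max_level, max(min_level, k0 + round(raw)))


# Square ROIs of various pixel sizes in a 1024x1024 image.
for side_px in (32, 64, 128, 224, 256, 512, 1024):
    level = roi_level(side_px / 1024, side_px / 1024, 1024, 1024)
    print(f"{side_px:>4} px ROI -> P{level}")
```

With normalized inputs, a 224-pixel ROI lands on P4, as the code comment states, and only ROIs larger than roughly 317 pixels per side reach P5.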

The code comments say that a 224x224 ROI maps to level P4. However, when I plug those values into the equation:

K = 4 + log2(sqrt(224 × 224) / (224 / sqrt(1024 × 1024))) = 4 + log2(1024) = 4 + 10 = 14
roi_level = minimum(5, 14)  # so we set it to P5
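A quick check of the arithmetic above, taking h and w in pixels the way the question does. The variable names are illustrative only.

```python
import math

# Reproduces the question's arithmetic, with h and w taken in pixels
# (224 x 224) rather than as normalized coordinates.
h_px = w_px = 224.0
image_h = image_w = 1024
raw = math.log2(math.sqrt(h_px * w_px) / (224.0 / math.sqrt(image_h * image_w)))
k = 4 + round(raw)          # 4 + 10 = 14
level = min(5, max(2, k))   # clamped to the top level, P5
print(raw, k, level)
```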

Then roi_level is set to P5, because K exceeds the maximum level of 5.
Therefore any ROI of 224x224 or larger is automatically assigned to P5. The issue is that P5 has a very small spatial resolution (1/32 of the input image side), yet we are giving it the bulk of the ROIs. Or so it seems; maybe I am wrong.
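To put the resolution concern in numbers, this sketch lists the feature-map size of each pyramid level for a 1024x1024 input. It assumes Mask_RCNN's default config value `BACKBONE_STRIDES = [4, 8, 16, 32, 64]` for P2 through P6; in that model P6 feeds only the RPN, so ROIAlign pools from P2 to P5.

```python
# Assumed default strides from Mask_RCNN's config, keyed by pyramid level.
BACKBONE_STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32, "P6": 64}
image_side = 1024

# Side length of each pyramid feature map for a square input image.
map_sizes = {name: image_side // stride for name, stride in BACKBONE_STRIDES.items()}
for name, size in map_sizes.items():
    print(f"{name}: stride {BACKBONE_STRIDES[name]}, feature map {size}x{size}")
```

Under this assumption P5 is a 32x32 map for a 1024x1024 image.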

Question (1): What are typical ROI sizes for a (1024, 1024, 3) image? Would these regions scale linearly if I reduce the input image dimension?

Question (2): If we train at a lower resolution (say (256, 256, 3)), then scaling by 256 won't really work, because it is wrapped in a log function. Wouldn't that make it a nonlinear scale?


JonathanCMitchell

All 5 comments

also interested in this +1

gustavz on 23 Apr 2018

derfe on 8 May 2018

Note that h and w are normalized coordinates, so your equation should be:
K = 4 + log2(sqrt((224/1024) × (224/1024)) / (224 / sqrt(1024 × 1024))) = 4 + log2(1) = 4 + 0 = 4
roi_level = minimum(5, 4)  # so we set it to P4

liuwenran on 9 May 2018

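The corrected calculation can be checked directly: with h and w divided by the 1024-pixel image side, the log term becomes log2(1) = 0 and the ROI stays on P4. Variable names are illustrative only.

```python
import math

# Same calculation with h and w normalized by the 1024-pixel image side.
image_side = 1024
h = w = 224 / image_side    # 0.21875
raw = math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_side * image_side)))
level = min(5, max(2, 4 + round(raw)))
print(raw, level)
```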

The issue below explains the reason for that:
https://github.com/matterport/Mask_RCNN/issues/217

aashokvardhan on 7 Dec 2019

If we assume h = w = ori_side / IMAGE_MAX_DIM and image_shape[0] = image_shape[1] = IMAGE_MAX_DIM, then
4 + log2(sqrt(h * w) / (224.0 / sqrt(image_area)))
= 4 + log2((ori_side / IMAGE_MAX_DIM) / (224 / IMAGE_MAX_DIM))
= 4 + log2(ori_side / 224),
which is the same as the equation in the FPN paper.
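A small numerical check of this derivation, under the same square-ROI, square-image assumptions: the normalized Mask_RCNN form and the FPN form agree for every IMAGE_MAX_DIM tried, because IMAGE_MAX_DIM cancels. The helper names `level_from_normalized` and `level_fpn` are hypothetical, and both return the level before rounding and clamping.

```python
import math


def level_from_normalized(ori_side, image_max_dim):
    # Square ROI: h = w = ori_side / IMAGE_MAX_DIM (normalized);
    # square image: image_shape[0] = image_shape[1] = IMAGE_MAX_DIM.
    h = w = ori_side / image_max_dim
    image_area = float(image_max_dim * image_max_dim)
    return 4 + math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_area)))


def level_fpn(ori_side):
    # FPN Eq. (1) with k0 = 4, before flooring/rounding and clamping.
    return 4 + math.log2(ori_side / 224.0)


# The two forms agree for any image size.
for dim in (256, 512, 1024):
    for side in (32.0, 100.0, 224.0, 256.0):
        assert math.isclose(level_from_normalized(side, dim), level_fpn(side))

# Shrinking the input 4x shrinks an object's pixel side 4x,
# which lowers its unrounded level by log2(4) = 2.
print(level_fpn(256.0) - level_fpn(64.0))
```

Because the level depends only on the ROI's side length in pixels, an object spanning a quarter of a 1024-pixel image (256 px) sits two levels higher, before rounding and clamping, than the same object in a 256-pixel image (64 px).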