Mask_RCNN: ROI level calculation inside PyramidROIAlign

Created on 20 Apr 2018 · 5 Comments · Source: matterport/Mask_RCNN

Inside PyramidROIAlign, we determine the levels of the feature pyramid network to assign to the ROI in question.

The equation is from section 4.2 equation (1) of the FPN paper.

# Area of the input image in pixels
image_area = tf.cast(
    self.image_shape[0] * self.image_shape[1], tf.float32)
roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
# Offset by 4, then clamp to pyramid levels P2..P5
roi_level = tf.minimum(5, tf.maximum(
    2, 4 + tf.cast(tf.round(roi_level), tf.int32)))
roi_level = tf.squeeze(roi_level, 2)
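
A plain-Python mirror of the snippet above may make the rounding and clamping easier to follow. It is only a sketch: `roi_level` is a hypothetical helper for a single box, `log2_graph` is assumed to compute a base-2 logarithm, and the example call assumes the box size is given as a fraction of the image side, which is what a reply further down in this thread says the code expects.

```python
import math

def roi_level(h, w, image_shape):
    """Hypothetical single-box version of the level assignment.

    h, w: box height and width (assumed here to be fractions of the image side).
    image_shape: (height, width) of the input image in pixels.
    """
    image_area = float(image_shape[0] * image_shape[1])
    raw = math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_area)))
    # round() rounds halves to even, like tf.round; then clamp to P2..P5
    return min(5, max(2, 4 + round(raw)))

# A box covering 224/1024 of each side of a 1024x1024 image
print(roi_level(224 / 1024, 224 / 1024, (1024, 1024)))
```

The clamp means that any value below 2 is assigned to P2 and any value above 5 is assigned to P5.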

In the code comments it says that a 224x224 ROI will map to level P4. However, when we feed those params into this equation:

K = 4 + log(2, sqrt(224×224)/(224/sqrt(1024×1024))) = 4 + 10 = 14.
roi_level = minimum(5, 14) # so we set it to P5

The ROI is then assigned to P5 because the computed value exceeds the maximum of 5.
Therefore, any ROI larger than 224 pixels is automatically assigned to P5. The issue is that P5 has a very small spatial resolution (1/32 of the original image size along each side), and we are giving it the bulk of the ROIs. Or so it seems; maybe I am wrong.

Question (1): What are typical ROI sizes for a (1024, 1024, 3) image? Would these regions scale linearly if I reduce the input image dimension?

Question (2): If we train at a lower resolution, say (256, 256, 3), then scaling by 256 won't really work because the value is wrapped in a log function, so wouldn't that be a nonlinear scale?

All 5 comments

also interested in this +1

+1

Note that h and w are normalized coordinates, so your equation should be
K = 4 + log(2, sqrt(224/1024×224/1024)/(224/sqrt(1024×1024))) = 4 + 0 = 4.
roi_level = minimum(5, 4) # so we set it to P4
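
To make the difference concrete, here is a small check of both readings of h and w for a 224x224 ROI on a 1024x1024 image. `raw_level` is a hypothetical helper that evaluates the expression before rounding and clamping; it is not a function from the repository.

```python
import math

IMAGE_SIDE = 1024   # square input image, in pixels
ROI_SIDE_PX = 224   # square ROI, in pixels

def raw_level(h, w, image_side):
    """Evaluate 4 + log2(sqrt(h * w) / (224 / sqrt(image_area))) without clamping."""
    image_area = float(image_side * image_side)
    return 4 + math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_area)))

# Reading 1: h and w in pixels (the calculation from the question)
pixel_k = raw_level(ROI_SIDE_PX, ROI_SIDE_PX, IMAGE_SIDE)

# Reading 2: h and w normalized by the image side (the calculation from this reply)
normalized_k = raw_level(ROI_SIDE_PX / IMAGE_SIDE, ROI_SIDE_PX / IMAGE_SIDE, IMAGE_SIDE)

print(pixel_k, normalized_k)
```

With pixel inputs the value is 14 and gets clamped to P5; with normalized inputs it is 4, matching the code comment.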

The issue below explains the reason for that:
https://github.com/matterport/Mask_RCNN/issues/217

If we assume h = w = ori_side / IMAGE_MAX_DIM (where ori_side is the ROI side length in pixels) and image_shape[0] = image_shape[1] = IMAGE_MAX_DIM, then
4 + log2(sqrt(h * w) / (224.0 / sqrt(image_area))) = 4 + log2((ori_side / IMAGE_MAX_DIM) / (224 / IMAGE_MAX_DIM))
= 4 + log2(ori_side / 224),
which is the same as the equation in the FPN paper.
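
The cancellation can also be checked numerically. The sketch below assumes square images and square ROIs, as in the derivation above; `level_from_normalized` and `level_from_paper` are hypothetical names, and both return the value before rounding and clamping.

```python
import math

def level_from_normalized(side_px, image_max_dim):
    """Expression used in the code, with h = w = side_px / image_max_dim."""
    h = w = side_px / image_max_dim
    image_area = float(image_max_dim * image_max_dim)
    return 4 + math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_area)))

def level_from_paper(side_px):
    """Simplified form: 4 + log2(side / 224), with the side in pixels."""
    return 4 + math.log2(side_px / 224.0)

for image_max_dim in (256, 512, 1024):
    for side_px in (56, 112, 224):
        a = level_from_normalized(side_px, image_max_dim)
        b = level_from_paper(side_px)
        # The image size cancels out, so both forms should agree
        assert math.isclose(a, b)
        print(image_max_dim, side_px, round(a))
```

Under these assumptions the level depends only on the ROI size in pixels, so lowering the input resolution does not change the mapping for a box of a given pixel size; it only limits how large a box can be. For example, a box covering a whole 256x256 image evaluates to 4 + log2(256 / 224), which rounds to P4.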
