I see that for each layer you build in fpn_classifier_graph() in model.py, you add a TimeDistributed wrapper, and I don't understand its purpose. The Keras documentation says the TimeDistributed wrapper has to do with temporal data, but I don't think the feature maps are temporal data. I also read the original paper and found no clues.
Could you please explain a little bit more about that?
Thanks very much!
@apptech-evan-huang Take the example from https://keras.io/layers/wrappers/:
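from keras.models import Sequential
from keras.layers import Conv2D, TimeDistributed
# Apply the same Conv2D to each of the 10 slices along axis 1.
model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3)),
                          input_shape=(10, 299, 299, 3)))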
Here you can see that Conv2D is applied to each of the 10 timesteps (or, say, samples) of the input.
The documentation says "This wrapper applies a layer to every temporal slice of an input," but the input doesn't have to be a time series. Instead, you can think of the input as 10 sample images, each of shape (299, 299, 3), because the operation is the same.
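To make the "every slice" idea concrete, here is a minimal conceptual sketch in plain Python. It is not how Keras implements the wrapper: nested lists stand in for tensors, and time_distributed and toy_layer are hypothetical names made up for this illustration.

```python
# Conceptual sketch only: plain Python lists stand in for tensors.
def time_distributed(layer_fn, inputs):
    """Apply layer_fn independently to every slice along axis 1.

    inputs is a nested list shaped [batch][steps]...; the same layer_fn
    (think: the same weights) is reused for every slice.
    """
    return [[layer_fn(step) for step in sample] for sample in inputs]


def toy_layer(image):
    # Hypothetical stand-in for Conv2D: sums all pixel values of one "image".
    return sum(sum(row) for row in image)


# 2 samples, each holding 3 "timesteps" of 2x2 images.
batch = [
    [[[1, 1], [1, 1]], [[2, 2], [2, 2]], [[3, 3], [3, 3]]],
    [[[0, 0], [0, 0]], [[1, 0], [0, 1]], [[5, 5], [5, 5]]],
]
outputs = time_distributed(toy_layer, batch)
print(outputs)  # [[4, 8, 12], [0, 2, 20]]
```

Nothing in the sketch depends on the slices being ordered in time; axis 1 is just "a set of items that each get the same layer".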
In this line of code: https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/model.py#L928,
x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                       name="mrcnn_class_conv1")(x)
the input has the shape [batch, num_rois, POOL_SIZE, POOL_SIZE, channels], and the TimeDistributed layer here just applies the same Conv2D to each of the num_rois ROIs independently, which has nothing to do with time series.
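Another way to picture this, again as a plain-Python sketch rather than the actual Keras or Mask_RCNN code: applying a layer per ROI gives the same result as folding the num_rois axis into the batch axis, applying the layer once, and unfolding. The per_roi_head function and the tiny feature lists below are hypothetical stand-ins for the Conv2D head and the pooled ROI features.

```python
# Conceptual sketch: nested lists stand in for a [batch, num_rois, ...] tensor.
def per_roi_head(roi_features):
    # Hypothetical stand-in for the Conv2D head: one score per ROI.
    return sum(roi_features)


batch, num_rois = 2, 3
x = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]

# View 1: loop over the num_rois axis of each sample.
looped = [[per_roi_head(roi) for roi in sample] for sample in x]

# View 2: fold num_rois into the batch axis, apply once, then unfold.
flat = [roi for sample in x for roi in sample]  # batch * num_rois items
flat_out = [per_roi_head(roi) for roi in flat]
folded = [flat_out[i * num_rois:(i + 1) * num_rois] for i in range(batch)]

assert looped == folded
print(folded)  # [[3, 7, 11], [15, 19, 23]]
```

Seen this way, the wrapper only lets a layer that expects [batch, height, width, channels] accept one extra leading axis of ROIs.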
It's starting to make sense to me!
Thanks very much!
@apptech-evan-huang Haha, the Keras layer name TimeDistributed can be misleading here.