Mask_rcnn: Images with more than 3 channels

Created on 20 Jul 2018 · 11 comments · Source: matterport/Mask_RCNN

Hi, I am still trying to read and understand the papers and the structure of the trained network. Mask RCNN trained on COCO is already showing promising results on my data but I was wondering how complicated it would be to present images as input augmented with a fourth channel (depth image). Would that require retraining the whole network or even changing the network's structure?

Most helpful comment

Check this out, the wiki has some guidelines for including more channels at the bottom: https://github.com/matterport/Mask_RCNN/wiki

One step it neglects to mention is that you will need to change part of the build function in mrcnn/model.py. I changed the input_image variable to accept an arbitrary number of channels that is passed in the config class:

    # Inputs, changed from original mrcnn to allow variable channels besides 3
    input_image = KL.Input(
        shape=[None, None, config.CHANNELS_NUM], name="input_image")
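For context, `CHANNELS_NUM` is not a stock `Config` attribute; it is a custom attribute you add to your own config subclass so the modified `build` can read it. A minimal sketch (a plain class standing in for mrcnn's `Config`; the attribute name and the depth-channel mean value are assumptions for illustration):

```python
# Sketch of a custom config for 4-channel (RGB + depth) input.
# A real config would subclass mrcnn.config.Config; CHANNELS_NUM is a
# user-added attribute read by the modified build(), not a stock option.
class RGBDConfig:
    NAME = "rgbd"
    CHANNELS_NUM = 4  # RGB + depth

    # Per-channel means subtracted during image molding; must have one
    # entry per channel. The 4th (depth) value is a placeholder.
    MEAN_PIXEL = [123.7, 116.8, 103.9, 1000.0]
```

Note that mrcnn subtracts `MEAN_PIXEL` from the input when molding images, so it has to grow to match the new channel count as well.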

All 11 comments

Thanks for the hint,
I tried following the wiki and what you mentioned, but I still have trouble loading my pretrained model.
After excluding the layer 'input_image', it complains about the second layer:

ValueError: Layer #2 (named "conv1"), weight <tf.Variable 'conv1_3/kernel:0' shape=(7, 7, 4, 64) dtype=float32_ref> has shape (7, 7, 4, 64), but the saved weight has shape (64, 3, 7, 7).

It looks like you are trying to use pretrained weights from ImageNet or COCO, which are 3-channel datasets. I should have specified this in my earlier comment, but you can't use pretrained weights if your input data has a different number of channels than the pretraining dataset. At least, not without changing more of the underlying mrcnn code. These code changes will only allow you to train from scratch on your data.

Since most large image datasets are RGB and most satellite remote sensing datasets have RGB plus near-infrared, it would be fantastic to be able to start from 3-channel pretrained weights with inputs of more than 3 channels. If anybody has suggestions on how to do this, please comment! I'm pretty new to CNNs, so advice is much appreciated.
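One common heuristic for this (not from this thread, so treat it as an assumption): expand the pretrained `conv1` kernel to the new channel count by initializing the extra channel, e.g. with the mean of the three RGB filter slices, and then load the expanded array into the 4-channel model. A NumPy sketch of the kernel surgery, using a random array in place of the real pretrained kernel:

```python
import numpy as np

# Stand-in for a pretrained conv1 kernel in TF layout:
# (height, width, in_channels, filters)
w_rgb = np.random.randn(7, 7, 3, 64).astype(np.float32)

# Initialize the 4th (depth) channel with the mean of the RGB filters,
# a common heuristic for reusing 3-channel pretrained weights.
w_extra = w_rgb.mean(axis=2, keepdims=True)        # (7, 7, 1, 64)
w_rgbd = np.concatenate([w_rgb, w_extra], axis=2)  # (7, 7, 4, 64)

assert w_rgbd.shape == (7, 7, 4, 64)
```

After building the 4-channel model, something like `model.keras_model.get_layer("conv1").set_weights([w_rgbd, bias])` could then install the expanded kernel; I haven't verified that end-to-end, so take it as a direction rather than a recipe.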

I excluded the second layer, conv1, instead of the first one, and it seems to be training fine. It was just a quick try, though; I'm not sure whether only the second layer was excluded or all layers named conv1.

Hi @mehditlili
According to the network architecture, 'conv1' refers specifically to convolution layer 1: 'input_image' is just the input tensor, while 'conv1' is the first Conv layer.
If you load the pre-trained weights as in the following code, it works.

    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["conv1", "mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])

I second doing both:

In model.py, in the build function:

        input_image = KL.Input(
            shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")

Then before training from a coco dataset:

model.load_weights(weights_path, by_name=True, exclude=[
                "mrcnn_class_logits", "mrcnn_bbox_fc",
                "mrcnn_bbox", "mrcnn_mask", "conv1"])

If you train only a subset of layers, remember to include conv1, since it's initialized with random weights. This is relevant if you pass layers="heads", layers="4+", etc. when you call train().
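To make the "include conv1" step concrete: train() maps names like "heads" to layer-name regexes in mrcnn/model.py, and layers also accepts a raw regex directly. A sketch (assuming the "heads" regex as it appears in model.py) that checks which layer names a conv1-augmented regex would select:

```python
import re

# "heads" maps to r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)" in mrcnn/model.py;
# prepending (conv1) also selects the randomly initialized first conv layer.
layer_regex = r"(conv1)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)"

# set_trainable() matches each layer's name with re.fullmatch.
candidates = ["conv1", "mrcnn_bbox_fc", "rpn_model", "res2a_branch2a"]
selected = [name for name in candidates if re.fullmatch(layer_regex, name)]
# selects conv1 and the head layers, but not the ResNet body (res2a_...)
```

Passing this as `layers=layer_regex` to `model.train(...)` should then fine-tune the heads plus the fresh conv1.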

@moorage

May I ask how to include conv1, please? I am trying to use grayscale images of 2448x2048.

@moorage Thanks. I based my code on sample/train_shapes.py, which trains 'heads' first and then 'all'.
But I am not sure what the difference is between training layers separately and training them all at once?

I also have these two questions. Can anyone help?
