Hi, I am still trying to read and understand the papers and the structure of the trained network. Mask R-CNN trained on COCO is already showing promising results on my data, but I was wondering how complicated it would be to present images as input augmented with a fourth channel (a depth image). Would that require retraining the whole network, or even changing the network's structure?
Check this out, the wiki has some guidelines for including more channels at the bottom: https://github.com/matterport/Mask_RCNN/wiki
One step it neglects to mention is that you will need to change part of the build function in mrcnn/model.py. I changed the input_image variable to accept an arbitrary number of channels, passed in through the config class:
# Inputs, changed from the original mrcnn to allow channel counts other than 3
input_image = KL.Input(
    shape=[None, None, config.CHANNELS_NUM], name="input_image")
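Note that CHANNELS_NUM isn't part of the stock Config class; it's an attribute I added in my config subclass. A minimal sketch of what that looks like (the class name and the fourth mean value are just my own choices, not anything the repo defines):

import numpy as np
from mrcnn.config import Config

class FourChannelConfig(Config):
    """Sketch of a config for 4-channel (e.g. RGB + depth) input."""
    NAME = "rgbd"
    CHANNELS_NUM = 4  # read by the modified KL.Input(...) above
    # MEAN_PIXEL is subtracted per channel when images are molded,
    # so it needs one entry per channel; the 4th value is a placeholder.
    MEAN_PIXEL = np.array([123.7, 116.8, 103.9, 100.0])
    # Depending on the repo version, the channel count baked into
    # IMAGE_SHAPE may also need to be updated to match.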
Thanks for the hint.
I tried following the wiki and what you mentioned, but I still have some trouble loading my pretrained model.
After excluding the layer 'input_image', it complains about the second layer:
ValueError: Layer #2 (named "conv1"), weight <tf.Variable 'conv1_3/kernel:0' shape=(7, 7, 4, 64) dtype=float32_ref> has shape (7, 7, 4, 64), but the saved weight has shape (64, 3, 7, 7).
It looks like you are trying to use pretrained weights from ImageNet or COCO, which are 3-channel datasets. I should have specified this in my earlier comment, but you can't use pretrained weights if your input data has a different number of channels than the pretraining dataset, at least not without changing more of the underlying mrcnn code. These code changes only allow you to train from scratch on your data.
Since most large image datasets are RGB, and most satellite remote sensing datasets have RGB plus near-infrared, it would be fantastic to be able to start from 3-channel pretrained weights with inputs of more than 3 channels. If anybody has suggestions on how to do this, please comment! I'm pretty new to CNNs, so advice is much appreciated.
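One approach I've seen mentioned for other networks, which I haven't tried here, is to keep the pretrained RGB kernels of conv1 and initialize only the extra channel yourself (e.g. with the mean of the RGB kernels, or zeros), then load everything else by name. A rough numpy sketch, where the random array is just a stand-in for the real kernel you'd read out of the COCO weights file:

import numpy as np

# Stand-in for the pretrained conv1 kernel, shape (7, 7, 3, 64);
# in practice you'd read it from the COCO .h5 file (e.g. with h5py).
kernel_rgb = np.random.randn(7, 7, 3, 64).astype(np.float32)

# Initialize the new depth channel from the mean of the RGB kernels so the
# layer's initial activation scale stays comparable (zeros would also work).
kernel_depth = kernel_rgb.mean(axis=2, keepdims=True)             # (7, 7, 1, 64)
kernel_rgbd = np.concatenate([kernel_rgb, kernel_depth], axis=2)  # (7, 7, 4, 64)

# Then, after building the 4-channel model and loading the other weights
# with by_name=True and exclude=["conv1", ...], set conv1 manually:
#   layer = model.keras_model.get_layer("conv1")
#   kernel, bias = layer.get_weights()
#   layer.set_weights([kernel_rgbd, bias])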
I excluded the second layer, 'conv1', instead of the first ('input_image'), and it seems to be training fine. It was just a quick try; I'm not sure whether only the second layer was excluded or all layers named conv1.
Hi mehditlili,
According to the network architecture, 'conv1' refers specifically to convolution layer 1: 'input_image' is just the input tensor, while 'conv1' is the actual first conv layer.
If you load the pre-trained weights with the following code, it works:
model.load_weights(COCO_MODEL_PATH, by_name=True,
                   exclude=["conv1", "mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])
I second doing both:
In model.py, in the build() function:
input_image = KL.Input(
    shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
Then, before training from the COCO weights:
model.load_weights(weights_path, by_name=True, exclude=[
    "mrcnn_class_logits", "mrcnn_bbox_fc",
    "mrcnn_bbox", "mrcnn_mask", "conv1"])
I also opened a related PR: https://github.com/matterport/Mask_RCNN/pull/940
If you train only a subset of layers, remember to include conv1, since it's initialized with random weights. This is relevant if you pass layers="heads", layers="4+", etc. when you call train(); see the sketch below.
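For example, train() also accepts a raw regex for layers, so one way to do this (an untested sketch, assuming model, the datasets, and config are set up as in the samples, and using the repo's "heads" pattern with conv1 added) is:

# Train the heads plus the randomly initialized conv1.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30,
            layers=r"(conv1)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)")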
@moorage
May I ask how to include conv1, please? I am trying to use grayscale images of size 2448x2048.
@lunasdejavu I just used layers='all'; see https://github.com/moorage/Mask_RCNN/blob/master/samples/greppy/greppy.py#L365-L368
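Paraphrasing what those linked lines do (the epoch count here is arbitrary):

# layers='all' matches every layer, so the freshly initialized conv1
# gets trained along with everything else.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40,
            layers='all')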
@moorage thanks; I wrote my code based on sample/train_shapes.py, which trains 'heads' first and then 'all'.
But I am not sure what the difference is between training layers separately and training them all at once?
I also have these two questions (how to include conv1, and the difference between training layers separately and training them all at once). Can anyone help?