Hi there! I've finally managed to get a Mask R-CNN to train, but unfortunately the results are not great. This is probably because I'm using medical imaging data (ultrasound images), and medical imaging datasets (especially annotated ones) are by nature small.
I was wondering if you have any advice on how to improve the results.
For context, I have 5635 images in total (train and val combined; I use a 90/10 split), each of size 420 x 580. I've been using pretrained ResNet-50 weights and am currently trying ResNet-101. There is typically only one mask per image.
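For reference, a 90/10 split over 5635 images works out to roughly 5072 train / 563 val. A minimal, reproducible way to do the split (this is just a stdlib sketch, not necessarily how the poster did it):

```python
import random

def train_val_split(items, val_frac=0.10, seed=0):
    """Shuffle with a fixed seed and split off the last val_frac as validation."""
    items = list(items)
    rng = random.Random(seed)      # fixed seed -> the split is reproducible
    rng.shuffle(items)
    n_val = int(len(items) * val_frac)
    return items[n_val:], items[:n_val]

# Hypothetical filenames standing in for the real dataset.
images = [f"img_{i:04d}.png" for i in range(5635)]
train, val = train_val_split(images)
print(len(train), len(val))  # 5072 563
```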
Here's a sample output from the model:

Here's what the mask should be (the blacked out area in the image):

Here's my config file for resnet-50 (note that I'm training on one GPU):
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
    OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 2
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "MaskRCNNC4Predictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
  MASK_ON: True
DATASETS:
  TRAIN: ("nerve_train",)
  TEST: ("nerve_val",)
DATALOADER:
  NUM_WORKERS: 0
  SIZE_DIVISIBILITY: 32
INPUT:
  MIN_SIZE_TRAIN: 420
  MAX_SIZE_TRAIN: 580
  MIN_SIZE_TEST: 420
  MAX_SIZE_TEST: 580
SOLVER:
  BASE_LR: 0.0025
  WEIGHT_DECAY: 0.0001
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  IMS_PER_BATCH: 2
TEST:
  IMS_PER_BATCH: 2
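One sanity check on the solver settings: the maskrcnn-benchmark README recommends scaling the learning rate linearly with batch size from its reference setting (0.02 at 16 images per batch), which is where the single-GPU BASE_LR of 0.0025 comes from. A quick check of the arithmetic:

```python
# Linear LR scaling rule: lr is proportional to images per batch.
# Reference setting in maskrcnn-benchmark: lr 0.02 at 16 images/batch.
ref_lr, ref_batch = 0.02, 16

def scaled_lr(ims_per_batch):
    """Scale the reference learning rate to a different batch size."""
    return ref_lr * ims_per_batch / ref_batch

print(scaled_lr(2))  # 0.0025, matching BASE_LR above for IMS_PER_BATCH: 2
```

So the LR itself looks consistent with the batch size; note that STEPS and MAX_ITER should in principle be stretched by the same factor when the batch shrinks.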
Thank you so much in advance.
Hi,
There are a number of things that could be done or checked.
Is the sample output you showed from a training image or a test image?
Looking at the sample, I think the RPN is not working, since no predicted bounding box overlaps the ground-truth box. In that case I would first train the RPN head alone and see how the detection works.
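If I'm reading the config right and this is maskrcnn-benchmark, one way to test the RPN in isolation is its built-in proposal-only mode (a hedged sketch; check the key names against your version's `defaults.py`):

```yaml
MODEL:
  RPN_ONLY: True   # train/evaluate only the region proposal network
  MASK_ON: False   # disable the mask head while debugging proposals
```

If the RPN proposals already miss the nerve region, no amount of tuning in the box/mask heads will recover it, so this narrows down where the failure is.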
Have you figured it out? I'm facing the same problem. I thought it was an NMS problem, but I'm still testing.
I think the parameter that matters is ROI_HEADS.SCORE_THRESH: when I changed it from 0.05 to 0.3, most of the boxes disappeared.
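To illustrate what raising the score threshold does (a standalone toy sketch, not maskrcnn-benchmark's actual post-processing code): detections below the threshold are simply dropped before NMS and visualization, so a higher threshold hides the low-confidence clutter without changing the model itself.

```python
def filter_by_score(boxes, scores, score_thresh):
    """Keep only detections whose confidence meets the threshold."""
    return [(b, s) for b, s in zip(boxes, scores) if s >= score_thresh]

# Toy detections: two low-confidence background hits and one confident box.
boxes = [(10, 10, 50, 50), (12, 11, 52, 49), (200, 80, 260, 140)]
scores = [0.07, 0.12, 0.85]

print(len(filter_by_score(boxes, scores, 0.05)))  # 3: everything survives
print(len(filter_by_score(boxes, scores, 0.3)))   # 1: only the confident box
```

Note this only cleans up the output; if the remaining high-score boxes still miss the true region, the underlying detection problem is elsewhere (e.g. the RPN, as suggested above).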