Hi,
First, I want to thank you for providing this great implementation of Mask_RCNN. I have been using this repo for a while but can't figure out a proper training strategy, so I hope someone can give me some suggestions.
Images: CBIS-DDSM (X-ray images), training: 981, validation: 250.
Roughly one instance per image (highly class-imbalanced).
A sample image and corresponding mask:

My Configurations:
BACKBONE resnet101
BATCH_SIZE 2
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
IMAGE_MAX_DIM 512
IMAGE_MIN_DIM 512
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [512 512 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
MEAN_PIXEL [53.129 53.129 53.129]
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (16, 32, 64, 128, 256)
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 512
TRAIN_ROIS_PER_IMAGE 320
WEIGHT_DECAY 0.0001
Augmentation:
augmentation = iaa.SomeOf((0, 2), [
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
])
I read many issues in this repo, and I summarized several strategies that could possibly help me.
Based on these suggestions, I did some incomplete research. But I haven't got nice predictions.
I tested four scenarios (learning rate divided by 10 after 20 epochs):
orange: init with random weights, no augmentation, 20 epochs for heads and 40 for all layers.
red: init with coco, with augmentation, 20 epochs for heads and 40 for all layers.
dark blue: init with coco, no augmentation, 20 epochs for heads and 40 for all layers.
light blue: init with coco, with augmentation, training only the heads layers (should be only training the classifier) for 60 epochs.


No scenario reaches a val_loss lower than 1. Any suggestions?
@shikunyu8 Can you try to reduce the anchor scale to (4, 8, 16, 32, 64) and train again?
@StanlyHardy That's worth trying, I will do this. Should I init with coco weights? It seems that if I init with coco, the validation loss increases.
Another thing I would suggest trying is to tune the relative weights among the 5 losses. I would also try doing more augmentation than just flipping images.
@patrick-12sigma I tried anchor scales (4, 8, 16, 32, 64) and (8, 16, 32, 64, 128), and both performed worse than (16, 32, 64, 128, 256).
I tried
```
augmentation = iaa.SomeOf((0, 3), [
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
    iaa.OneOf([iaa.Affine(rotate=90),
               iaa.Affine(rotate=180),
               iaa.Affine(rotate=270)]),
    iaa.Multiply((0.8, 1.5)),
    iaa.GaussianBlur(sigma=(0.0, 5.0))
])
```
but I didn't see much difference in terms of AP. I think the author of this implementation uses the same loss as the original Mask R-CNN paper, so it probably works fine as is. At least I don't know how to tune it.
The image size is around 5000*3000, and I don't know whether image resizing and mini-masks cause a severe loss of accuracy.
@shikunyu8 There is a configuration setting that controls the relative weights of the different losses here.
You made a good point about the impact of resizing the original image. It really depends on the statistical distribution of the sizes of the objects you would like to detect. I'd do a quick stats plot to determine whether the objects to be detected are overwhelmingly small. Or plot the sizes of the false negatives (missed GT) and see if they are all small objects.
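A quick version of that check can be sketched in plain Python. It assumes the large images are shrunk so that their longer side equals IMAGE_MAX_DIM (which is what the square resize amounts to for a 5000*3000 image), and the boxes below are made-up illustrative values, not measurements from CBIS-DDSM. The helper names `resize_scale`, `nearest_anchor_scale` and `scale_histogram` are hypothetical.

```python
import math
from collections import Counter

ANCHOR_SCALES = (16, 32, 64, 128, 256)  # RPN_ANCHOR_SCALES from the config above
IMAGE_MAX_DIM = 512


def resize_scale(height, width, max_dim=IMAGE_MAX_DIM):
    """Scale factor applied when a large image is shrunk to fit max_dim.

    Assumption: the image is larger than max_dim, so the longer side
    ends up exactly max_dim pixels long.
    """
    return max_dim / max(height, width)


def nearest_anchor_scale(box_side, scales=ANCHOR_SCALES):
    """Return the anchor scale closest to box_side on a log scale."""
    return min(scales, key=lambda s: abs(math.log(box_side / s)))


def scale_histogram(boxes, image_shape, scales=ANCHOR_SCALES):
    """Count how many ground-truth boxes fall nearest to each anchor scale.

    boxes: list of (y1, x1, y2, x2) in original-image pixels.
    image_shape: (height, width) of the original image.
    """
    factor = resize_scale(*image_shape)
    counts = Counter()
    for y1, x1, y2, x2 in boxes:
        # Side of a square with the same area as the resized box.
        side = math.sqrt((y2 - y1) * (x2 - x1)) * factor
        counts[nearest_anchor_scale(side, scales)] += 1
    return counts


# Made-up boxes in a 5000x3000 image, for illustration only.
example_boxes = [(100, 100, 400, 400), (0, 0, 1000, 1200), (50, 50, 150, 130)]
histogram = scale_histogram(example_boxes, (5000, 3000))
for scale in ANCHOR_SCALES:
    print(scale, histogram.get(scale, 0))
```

If most boxes end up at or below the smallest anchor scale after resizing, that would suggest the downscaling is costing detail and that cropping or a larger IMAGE_MAX_DIM may be worth trying.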
@patrick-12sigma I did that statistical analysis and also tried all possible anchor scales, but the AP is not as good as expected (best is about 0.75).
I think maybe it is because of overfitting (I got 0.95 on test set). The Mask R-CNN paper says: "To reduce overfitting, as this training set is smaller, we train using image scales randomly sampled from [640, 800] pixels; inference is on a single scale of 800 pixels."
In this implementation, all images are the same size. I will try to add data augmentation to vary the scale,
like this:
augmentation = iaa.SomeOf((0, 3), [
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
    iaa.OneOf([iaa.Affine(rotate=90),
               iaa.Affine(rotate=180),
               iaa.Affine(rotate=270)]),
    iaa.Affine(scale={"x": (0.8, 1.2), "y": (0.8, 1.2)}),
    iaa.Multiply((0.8, 1.5)),
    iaa.GaussianBlur(sigma=(0.0, 5.0))
])
Thank you.
@shikunyu8 How can I reduce the number of images trained / (batch_size) per epoch?
Hi @shikunyu8 I'm curious if you found any strategy to be particularly effective now that you've had some time to experiment?
@patrickcgray Hi, I did a lot of experiments and got roughly 85% recall. It's not great, but I guess that is close to the optimal parameter settings for my dataset. Several strategies worth noting:
1. Init with coco and cut off early. I observed a huge increase in recall when using the coco pre-trained weights, so absolutely init with them. Since my dataset is small, training for too many iterations overfits the training set, which makes the model less generalizable, and the validation loss increases after a certain epoch. In my experiment, I picked the weights with the best performance.
2. Select proper anchor scales. This affects model performance dramatically. You can use inspect_data.ipynb to check your data or just try several sets of scales.
3. Tune WEIGHT_DECAY. This affects the L2 regularization strength and is a good way of preventing overfitting. Please try 0.01, 0.005 and 0.001 first, then search more finely.
4. Resnet 50 is worse. Some would argue that Resnet 50 is less complex than Resnet 101 and therefore worth trying when faced with overfitting. But in my experiment, its performance was much worse than Resnet 101's. (I init Resnet 50 with ImageNet weights, not coco, which may be the true reason for the worse performance. But I can't verify it, since I don't have Resnet 50 weights trained on coco.)
5. Training only the classifiers can be a good idea. If your dataset size is limited, you can train just the classifiers instead of changing the features in the pre-trained weights. Or you can freeze some layers and train part of the layers in Resnet, like resnet4+. One example: heads (20 epochs), then resnet4+ (40 epochs, /10), then all layers (60 epochs, /10). /10 means the learning rate is divided by 10, which helps the algorithm converge more easily.
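The staged example in the last strategy can be sketched as a small schedule builder. One thing to keep in mind is that the `epochs` argument of `model.train` in this implementation is the total epoch count to reach, not the length of the stage, so the per-stage durations have to be accumulated. The sketch assumes both "/10" stages use the base learning rate divided by 10; `STAGES` and `build_schedule` are hypothetical names.

```python
BASE_LR = 0.001  # LEARNING_RATE from the config earlier in the thread

# (layers, epochs spent in this stage, learning-rate divisor)
# Assumption: both "/10" stages divide the base learning rate by 10.
STAGES = [
    ("heads", 20, 1),
    ("4+", 40, 10),
    ("all", 60, 10),
]


def build_schedule(stages, base_lr):
    """Turn per-stage durations into cumulative epoch targets."""
    schedule = []
    total_epochs = 0
    for layers, stage_epochs, lr_divisor in stages:
        total_epochs += stage_epochs
        schedule.append({
            "layers": layers,
            "epochs": total_epochs,  # cumulative target, not a duration
            "learning_rate": base_lr / lr_divisor,
        })
    return schedule


schedule = build_schedule(STAGES, BASE_LR)
for stage in schedule:
    print(stage)

# Each entry could then be passed on, for example:
# model.train(dataset_train, dataset_val, augmentation=augmentation, **stage)
```

For early cutoff, the checkpoint from the epoch with the lowest validation loss can be reloaded afterwards instead of the final one.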
Hope this helps.
Kunyu Shi
Hi @shikunyu8 thanks so much for all the info! I've altered the anchor scales and you're right, inspect_data.ipynb helps a lot. I'm now training with different weight_decay values and will report back how they help! I've also changed the loss weights to 1,2,2,2,5 to prioritize the MRCNN mask loss, and lowered train_rois_per_image to 32 because I don't have many objects per image. Very curious how this will impact the loss. It is so difficult to test only one variable at a time because it takes me ~36 hours to do a full training run. Your step 5 has also been very helpful.
Again, thanks for the insight!
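For reference, a weighting like 1,2,2,2,5 could be written as below. The keys follow the LOSS_WEIGHTS dictionary in the repo's config.py; the mapping of the five numbers to keys is an assumption (only the mask loss getting the 5 is clear from the comment), and the loss values are invented purely to show the effect. `weighted_total` and `share_of_total` are hypothetical helpers.

```python
# Keys follow the LOSS_WEIGHTS dict in the repo's config.py.
# Assumption: the values 1,2,2,2,5 are listed in this key order.
LOSS_WEIGHTS = {
    "rpn_class_loss": 1.0,
    "rpn_bbox_loss": 2.0,
    "mrcnn_class_loss": 2.0,
    "mrcnn_bbox_loss": 2.0,
    "mrcnn_mask_loss": 5.0,
}


def weighted_total(losses, weights=LOSS_WEIGHTS):
    """Sum the individual losses, each scaled by its weight."""
    return sum(weights[name] * value for name, value in losses.items())


def share_of_total(losses, weights=LOSS_WEIGHTS):
    """Fraction of the weighted total contributed by each loss."""
    total = weighted_total(losses, weights)
    return {name: weights[name] * value / total
            for name, value in losses.items()}


# Made-up loss values, only to show how the weights shift the balance.
example_losses = {
    "rpn_class_loss": 0.02,
    "rpn_bbox_loss": 0.30,
    "mrcnn_class_loss": 0.10,
    "mrcnn_bbox_loss": 0.25,
    "mrcnn_mask_loss": 0.35,
}
print(round(weighted_total(example_losses), 2))
print({k: round(v, 2) for k, v in share_of_total(example_losses).items()})
```

Because the weights rescale the total, loss curves from runs with different weights are probably not directly comparable; AP or recall on the validation set is a safer yardstick.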
How many times do you augment the data? I mean, I do not get how many images are created by using this code.
Hi @javierfs, I did some more comprehensive augmentation, and I'm only getting slightly worse results on my training set than on my validation set, with a total of 265 training images. My code was:
augmentation = iaa.Sometimes(.667, iaa.Sequential([
    iaa.Fliplr(0.5),  # horizontal flips
    iaa.Crop(percent=(0, 0.1)),  # random crops
    # Small gaussian blur with random sigma between 0 and 0.25.
    # But we only blur about 50% of all images.
    iaa.Sometimes(0.5,
        iaa.GaussianBlur(sigma=(0, 0.25))
    ),
    # Strengthen or weaken the contrast in each image.
    iaa.ContrastNormalization((0.75, 1.5)),
    # Add gaussian noise, sampled once per pixel
    # (per_channel is not set, so colors are not shifted).
    iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255)),
    # Make some images brighter and some darker.
    iaa.Multiply((0.8, 1.2)),
    # Apply affine transformations to each image:
    # scale/zoom and rotate (translate and shear are disabled).
    iaa.Affine(
        scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},
        # translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},
        rotate=(-180, 180),
        # shear=(-8, 8)
    )
], random_order=True))  # apply augmenters in random order
Hello, I want to know how to train only the classifiers. What's the command line? @shikunyu8 @patrickcgray
@190665688 Look at the color splash example. You can just train the top level of the network and use the coco pre-trained weights.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=20,
            augmentation=augmentation,
            layers='heads')
Note the layers='heads' argument.
Hello, I am very interested in what you said about inspect_data.ipynb. How do you set anchor scales with this file? Can you describe in detail how to use it? Thank you!
@ChauncyFr did you find the solution?
Dear @shikunyu8
Thanks for your comments on how to improve the network. I don't understand how to do your step 5. Would you please explain how I should change the code to train for a certain number of epochs on certain layers with a certain learning rate? Or maybe @patrickcgray can also help, as I see he used your 5th step and said it was helpful for him.
With that file you can inspect the sizes of your objects' bounding boxes, and then choose your scales based on that. You must change the scales in the main code that you run for training (look at the config file for the default scales).
I found the answer in issue #168. I'm putting it here in case it can help others. You can train in different stages using this example:
```
# Note: epochs is the total epoch count to reach, not the length of each stage.
print("Train network heads")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40,
            augmentation=augmentation,
            layers='heads')

# Finetune layers from ResNet stage 4 and up
print("Fine tune Resnet stage 4 and up")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=120,
            layers='4+')

# Finetune all layers at a lower learning rate
print("Train all layers")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE/10,
            epochs=300,
            augmentation=augmentation,
            layers='all')
```
@banafsh89 what is your batch size? and other configuration?
I don't have the desired result yet! My accuracy is 83% but I am aiming for 95%. My batch size is 15, the backbone is resnet101, and I have only 900 images for training+val.
The rest of my configuration is the same as the nucleus.py configuration in the samples.
For augmentation, is the augmentation applied just to the images, or to both the images and the mask annotations?
For example, with iaa.Fliplr the images get flipped, but what about the mask annotations, which are saved in the json file?
It is done automatically on the masks too. No worries.
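A minimal sketch of what "done automatically" means here, assuming the training code decodes the annotations into mask arrays and applies the same sampled geometric transform to the image and its masks, so the json file itself is never edited. The example uses plain nested lists rather than imgaug; `flip_lr` and `augment_pair` are hypothetical names, not functions from the repo.

```python
import random


def flip_lr(grid):
    """Mirror a 2-D grid (list of rows) left to right."""
    return [list(reversed(row)) for row in grid]


def augment_pair(image, mask, rng, p=0.5):
    """Flip image and mask together with probability p.

    The random decision is drawn once and reused for both arrays,
    so the mask stays aligned with the image.
    """
    if rng.random() < p:
        return flip_lr(image), flip_lr(mask)
    return image, mask


image = [[10, 20, 90],
         [10, 20, 95]]
mask = [[0, 0, 1],
        [0, 0, 1]]  # the object is the bright right-hand column

rng = random.Random(0)
# p=1.0 forces the flip so the effect is visible.
aug_image, aug_mask = augment_pair(image, mask, rng, p=1.0)
print(aug_image)
print(aug_mask)
```

Pixel-value augmenters such as blur or brightness changes should only touch the image, since applying them to a binary mask would corrupt it.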
Hi @banafsh89, thanks for your reply. Another question: do I need to change my batch size (STEPS_PER_EPOCH) after adding the augmentation? For example, my current augmentation code is:
augmentation = iaa.OneOf([
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
    iaa.Affine(rotate=90),
    iaa.GaussianBlur(sigma=(0.0, 5.0))
])
As I understand it, the augmentation generates one augmented image (randomly choosing fliplr, flipud, affine or GaussianBlur) for each training image, so if there are 200 original training images then the total number of training images is 200 original + 200 augmented = 400.
So do I need to double my STEPS_PER_EPOCH?
No, you don't need to change it. Set it based on your training dataset.
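The arithmetic can be sketched as follows, assuming the augmenter is applied on the fly each time an image is loaded, so no extra images are stored and the dataset size stays at 200. The batch size of 2 is an assumed value for illustration, and `steps_per_epoch` and `images_drawn` are hypothetical helpers.

```python
def steps_per_epoch(num_train_images, batch_size):
    """Steps needed to draw roughly every training image once per epoch."""
    return max(1, num_train_images // batch_size)


def images_drawn(steps, batch_size, epochs):
    """Total image draws over a run; each draw may be augmented differently."""
    return steps * batch_size * epochs


num_train_images = 200  # the example from the question above
batch_size = 2          # assumed; in the repo config BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU
steps = steps_per_epoch(num_train_images, batch_size)
print(steps)
print(images_drawn(steps, batch_size, epochs=20))
```

Under that assumption the variety comes from each image being transformed differently on every draw across epochs, not from a larger dataset.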
Apologies, very new to all this stuff. Just experimenting through, where should I put in the Augmentation code?
I mean in model.py or config.py
You need to set it in your training script, like this:
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=int(n_epochs),
            layers=layers,
            augmentation=imgaug.augmenters.Sequential([
                imgaug.augmenters.Affine(rotate=(-45, 45))]),
            class_weight=class_weights
            )
Hey, thank you so much for generously sharing this training strategy. I'm definitely going to use it, but I need some scientific explanation. Here are some questions:
1- Why do we need 3 different training stages: the first on the heads, the second on the resnet 4+ layers, and the last on all layers?
2- Are you saying that I need to save the weights from the first training stage and use them for the second, and so on?
3- How can we choose the right data augmentation method for our data?
4- When you said that training for too many iterations will overfit the training data, do you mean steps_per_epoch by "iterations"? If so, can't we use steps_per_epoch = training set size // batch_size as a general definition?
I'd really appreciate answers to these questions. Thank you in advance.
Could anyone please tell me what TRAIN_ROIS_PER_IMAGE does?
@kazzastic I believe it is the number of regions of interest that the RPN proposes for every image. If you have a lot of objects in your images, you should keep it high. You can check out the config.py in the model folder.
But I thought it was the job of RPN_NMS_THRESHOLD to increase or decrease the number of proposals generated during training.
Or is it that RPN_NMS_THRESHOLD controls which proposals are generated, and then TRAIN_ROIS_PER_IMAGE decides how many are fed to the mask head?
Yes, you are right. TRAIN_ROIS_PER_IMAGE is how many ROI proposals you feed to the mask head. RPN_NMS_THRESHOLD determines which proposals you keep during RPN training based on non-max suppression, so you can increase RPN_NMS_THRESHOLD to increase the number of proposals.
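A rough sketch of how the per-image ROI budget might split into positives and negatives, using the values from the config at the top of the thread. It assumes positives are capped at TRAIN_ROIS_PER_IMAGE * ROI_POSITIVE_RATIO and that negatives are added to keep roughly that ratio; the available-proposal counts are invented, and `roi_sample_counts` is a hypothetical helper, not a function from the repo.

```python
TRAIN_ROIS_PER_IMAGE = 320  # from the config at the top of the thread
ROI_POSITIVE_RATIO = 0.33


def roi_sample_counts(available_positive, available_negative,
                      rois_per_image=TRAIN_ROIS_PER_IMAGE,
                      positive_ratio=ROI_POSITIVE_RATIO):
    """Sketch of how many positive/negative ROIs get sampled per image.

    Assumption: positives are capped at rois_per_image * positive_ratio,
    and negatives are added to keep roughly that positive:negative ratio.
    """
    positive_cap = int(rois_per_image * positive_ratio)
    positives = min(available_positive, positive_cap)
    negatives_wanted = int(positives / positive_ratio) - positives
    negatives = min(available_negative, negatives_wanted)
    return positives, negatives


# With about one object per image, few proposals overlap the ground truth.
print(roi_sample_counts(available_positive=12, available_negative=1500))
# With many overlapping proposals, the positive cap is what limits sampling.
print(roi_sample_counts(available_positive=400, available_negative=1500))
```

Under these assumptions, an image with a single object may fill only a small part of a large ROI budget, which would be consistent with lowering TRAIN_ROIS_PER_IMAGE when there are few objects per image.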
So what do you think, is it always good to have a large value of TRAIN_ROIS_PER_IMAGE?
I tried reducing it to 100 before, but it worsened the loss. Training takes a bit longer, but I would rather keep the value high.