Mask_RCNN: suffering from overfitting

Created on 26 Feb 2018 · 98 comments · Source: matterport/Mask_RCNN

Hello,

I only have a small training set of about 670 labelled images and would like to further improve accuracy by training the entire backbone network instead of only the heads. However, after about 30-40 epochs, the network already suffers from overfitting. ResNet already uses batch norm, so I wonder if there is something else I can do to improve the situation. How about dropout? If I apply dropout, can I still load the pre-trained ResNet weights from COCO or ImageNet? Or is there some other technique? Thank you!

All 98 comments

@keven4ever

With such a small dataset, it is unlikely that BN or dropout will help. Also, BN with dropout is probably not a good idea (see paper on BN) and I don't think you can apply dropout with the pre-trained ResNet weights since that model didn't train using dropout in the first place.

The model capacity of ResNet-101 might be too large for your dataset. While it's true that ResNet enables deeper networks to converge compared to their plain counterparts, there is still a limit on the number of layers that can be incorporated in a ResNet before convergence suffers. For example, Table 6 in the ResNet paper shows that the classification error on CIFAR-10 decreases with increasing depth up to ResNet-110, but ResNet-1202 actually performs worse than ResNet-32.

To prevent overfitting, you can try:

1) Getting a larger dataset (but this is probably not feasible, otherwise you would've done this already)
2) Stronger weight decay (i.e., L2 regularization)
3) Lower model capacity (e.g., ResNet-50 or even ResNet-32; see the config sketch below)
4) k-fold cross-validation
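
For concreteness, a minimal sketch of how points 2 and 3 could look in a Config subclass for this repo (note: the BACKBONE option only exists in newer versions of the repo; at the time of this thread you changed the architecture string passed to resnet_graph() in model.py, as discussed below):

from config import Config  # mrcnn.config in newer versions of the repo

class SmallDatasetConfig(Config):
    NAME = "small_dataset"
    NUM_CLASSES = 1 + 1  # background + 1 class

    # Point 2: stronger L2 regularization (repo default is 0.0001)
    WEIGHT_DECAY = 0.001

    # Point 3: lower model capacity (ResNet-32 is not provided by the
    # repo and would need a custom backbone graph)
    BACKBONE = "resnet50"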

@keven4ever do you use augmentations? In DS Bowl 2018 they are critical. I had the same problem, and augmentations helped me a lot.

@maksimovkonstantin : Thanks. What kind of augmentation techniques do you use? How much gain did you achieve? I see that your score is 0.437. What is the score without augmentation?

@maksimovkonstantin very good question! Actually I tried augmentation (without training the full backbone), which only helps improve loss but not val_loss. I also tried to train the full backbone with the default augmentation (flip l/r), which suffers from overfitting. As a next step I will try to combine both. Btw, what kind of augmentation did you apply? Flip l/r, flip u/d, rotate 90?

@FruVirus thx for the tips, I also intend to try a shallower model like ResNet-50. I saw that model.py's resnet_graph method supports both resnet50 and resnet101; just changing the architecture argument to resnet50 should be sufficient, right?

@keven4ever , yes I believe so. I'd be interested to hear if this helps with your dataset.

@FruVirus sure, will keep you updated! Btw, is there an easy way to load COCO pre-trained weights for a ResNet-50 FPN?
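
For reference, the released mask_rcnn_coco.h5 was trained with a ResNet-101 backbone, so there is no official COCO checkpoint for ResNet-50. One hedge is to load the COCO weights by name, so that only layers with matching names are initialized: the class-specific heads are excluded as usual, and backbone blocks that exist only in ResNet-101 are simply skipped:

model.load_weights(COCO_MODEL_PATH, by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])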

@John1231983 score without aug is 0.413

[screenshot: training and validation loss curves]
@maksimovkonstantin i tried doing some augmentation before image resizing, including flip l/r, flip u/d and rotating 90 degrees, with ResNet-101; as you can see, it again starts to overfit. What kind of aug did you apply? Are you using ResNet-101 or ResNet-50?

@keven4ever I use the default ResNet101, and I also rotate by a custom angle. Here is my aug function:

import random

import cv2
import numpy as np


def data_augmentation(input_images,
                      h_flip=True,
                      v_flip=True,
                      rotation=360,
                      zoom=1.5,
                      brightness=0.5,
                      crop=False):
    # input_images is a list: the image first, then its masks
    output_images = input_images.copy()
    if crop and random.randint(0, 1):
        # random crop (locs_for_random_crop is a helper defined elsewhere)
        h, w, c = output_images[0].shape
        upper_h, new_h, upper_w, new_w = locs_for_random_crop(h, w)
        output_images = [input_image[upper_h:upper_h + new_h, upper_w:upper_w + new_w, :]
                         for input_image in output_images]

    # random flips, applied to the image and all masks
    if h_flip and random.randint(0, 1):
        output_images = [cv2.flip(input_image, 1) for input_image in output_images]
    if v_flip and random.randint(0, 1):
        output_images = [cv2.flip(input_image, 0) for input_image in output_images]

    # random brightness via a gamma lookup table, applied to the image only
    factor = 1.0 + abs(random.gauss(mu=0.0, sigma=brightness))
    if random.randint(0, 1):
        factor = 1.0 / factor
    table = np.array([((i / 255.0) ** factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8)
    output_images[0] = cv2.LUT(output_images[0], table)

    # random rotation angle and zoom scale, applied to the image and all masks
    angle = random.randint(0, rotation) if rotation else 0.0
    scale = random.randint(50, int(zoom * 100)) / 100 if zoom else 1.0  # int() so randint gets an integer bound
    if rotation or zoom:
        for i, input_image in enumerate(output_images):
            M = cv2.getRotationMatrix2D((input_image.shape[1] // 2, input_image.shape[0] // 2),
                                        angle, scale)
            output_images[i] = cv2.warpAffine(input_image, M,
                                              (input_image.shape[1], input_image.shape[0]))
    return [input_image.astype(np.uint8) for input_image in output_images]
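
For clarity, a usage sketch (assuming image is HxWx3 and masks is HxWxN, both uint8): the function expects a flat list with the image first and each mask as a separate 3-channel array:

mask_list = [np.repeat(masks[:, :, i:i + 1], 3, axis=2)
             for i in range(masks.shape[2])]
augmented = data_augmentation([image] + mask_list)
aug_image, aug_masks = augmented[0], augmented[1:]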

@maksimovkonstantin looks great! Thx so much!

@maksimovkonstantin: Me too, I also got 0.41 LB with left-right and up-down flips and Adam optimization. One more thing: do you use the fixed dataset (made by Konstantin Lopuhin) to obtain 0.413 LB?
@keven4ever: What optimizer are you using? I am using Adam, 80 epochs on all layers:

model.train(dataset_train, dataset_val,
            learning_rate=1e-4,
            epochs=80,
            layers='all')

@John1231983 i still use SGD as it is the one used in the paper. @John1231983 are you able to avoid overfitting when training all layers with only flipping augmentation? That is exactly what I did; the only difference is the optimizer.

I don't think I had it. See my log:
[screenshot: training log]

This is my training schedule with the Adam method:

LEARNING_RATE=1e-4
model.train(dataset_train, dataset_val,
            learning_rate=LEARNING_RATE,
            epochs=40,
            layers='all')
model.train(dataset_train, dataset_val, 
            learning_rate=LEARNING_RATE/10,
            epochs=80, 
            layers="all")

model.train(dataset_train, dataset_val,
            learning_rate=LEARNING_RATE/100,
            epochs=120,
            layers='all')

For the above, I got 0.41 LB with the fixed dataset using ResNet-50. Could you tell me what base score you achieved? Base score means using the original Mask R-CNN implementation.

@John1231983 my base score is 0.448, but as I mentioned, it is hard to reproduce. However, I also managed to achieve 0.44+ several times without training the whole network; of course I tuned several parameters as mentioned in the other thread.

Great. I guess I'm missing some parameters. So you just changed hyper-parameters and achieved 0.44+, am I right? Do you train the network with different training inputs, such as gray input for one network and color input for another? This is my hyper-parameter setting. How about you?

    USE_MINI_MASK = True
    MINI_MASK_SHAPE = (56, 56)  
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    bs = GPU_COUNT * IMAGES_PER_GPU
    STEPS_PER_EPOCH = 600  // bs
    VALIDATION_STEPS = 70 // bs
    NUM_CLASSES = 1 + 1 
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512
    IMAGE_PADDING = True 
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  
    BACKBONE_STRIDES = [4, 8, 16, 32, 64]
    RPN_TRAIN_ANCHORS_PER_IMAGE = 320 #300
    POST_NMS_ROIS_TRAINING = 2000
    POST_NMS_ROIS_INFERENCE = 2000
    POOL_SIZE = 7
    MASK_POOL_SIZE = 14
    MASK_SHAPE = [28, 28]
    TRAIN_ROIS_PER_IMAGE = 512
    RPN_NMS_THRESHOLD = 0.7
    MAX_GT_INSTANCES = 256
    DETECTION_MAX_INSTANCES = 400 
    DETECTION_MIN_CONFIDENCE = 0.7
    DETECTION_NMS_THRESHOLD = 0.3    
    MEAN_PIXEL = np.array([42.17746161,38.21568456,46.82167803])
    WEIGHT_DECAY = 0.0001

@John1231983 correct! I think increasing TRAIN_ROIS_PER_IMAGE to 512 helped me boost the performance a lot; before that I got around 0.414. Also, I use the original images instead of gray input.

I think you can boost it more using this scheme: cluster the training set into 3 sets and train each set with Mask R-CNN, so you obtain 3 checkpoints. Then apply each checkpoint to the corresponding cluster in the test set.

@John1231983 do you use augmentation, or did you get 0.44 with your above config on clean images?

@maksimovkonstantin : I just use simple augmentation: left-right and up-down flips. I will try your augmentation. Thanks again. For the above setting, I got 0.41. Only @keven4ever achieved 0.44, not me :(

@John1231983 i tried the three-class approach (white, black and purple), but only in a single model; it did not get as high as 0.448, maybe 0.43 or 0.44+, so no gain. I will try your approach after I manage to get the whole network trained.

@maksimovkonstantin actually i got 0.448 with only flip l/r augmentation.

@maksimovkonstantin : I think your code is somehow wrong, because you have to rotate/flip both the image and its masks/boxes. Your code only augments the image.

@maksimovkonstantin @John1231983 i am still not fully convinced by zoom- and crop-based augmentation. For example, if we always crop a 128x128 patch from the original image, then to use Mask R-CNN we still need to scale it up to something like 512x512, which always increases the size of the cells during training. Will the model then fail to predict small cells?

@keven4ever : Cropping is only for making the dataset larger. Actually, for semantic segmentation we do not need to resize to a fixed size like 512x512, so it may improve performance. For Mask R-CNN we have to use a fixed input like 512x512 or 1024x1024, so I guess it will not improve performance because we add a lot of zero padding to the image.

@FruVirus I tried ResNet-50 and trained everything from scratch. With data augmentation there is no overfitting problem any more; however, the mAP is still much worse than training only the heads with ResNet-101 (pre-loaded COCO weights). I think the pre-loaded weights make quite a difference (I only have a single GTX 1080, and it took two days to train ResNet-50).

@keven4ever: if I understand correctly, you only trained the 'heads' from COCO weights, and did not train 'all' to achieve the 0.43+ score. Am I right? If so, I guess you may need to train all layers once you see overfitting, e.g. train all after 20 epochs.

@John1231983 that's correct!

Thanks. Could you share your LB using ResNet-50 trained from scratch? I achieved 0.41 with ResNet-50 and an ImageNet pretrain, training all layers and skipping the heads-only stage.

Hey guys, anyone knows how to add focal loss?

@John1231983 I only got 0.376. Btw, where did you download the pre-trained ImageNet ResNet-50 weights?

@keven4ever : Too low. I got 0.41 with it. Now I am using the COCO pretrain and hope it does better.

FYI, this is the link to download pre-trained models (ResNet, Inception...), but the ones I tried gave worse results than resnet50: https://github.com/fchollet/deep-learning-models/releases

This is my learning schedule. Do you use the same as mine?

model.train(dataset_train, dataset_val,
            learning_rate=bowl_config.LEARNING_RATE/10,
            epochs=10,
            layers="heads")

model.train(dataset_train, dataset_val,
            learning_rate=bowl_config.LEARNING_RATE / 10,
            epochs=40,
            layers="all")
model.train(dataset_train, dataset_val,
            learning_rate=bowl_config.LEARNING_RATE / 100,
            epochs=80,
            layers="all")

@John1231983 @keven4ever I trained with SGD 0.001, 100 epochs heads and 60 epochs 4+, using pretrained COCO weights and the ResNet-101 backbone; it gives around a 0.435 score. I think the key to success is to train all layers only in the very last epochs.

@maksimovkonstantin : Very funny. I have changed many settings looking for ways to do better, but it seems the default strategy gives better performance. To summarize, can you confirm your strategy is like this?

model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=100,
            layers='heads')

# Training - Stage 2
# Finetune layers from ResNet stage 4 and up
# Note: model.train() counts epochs cumulatively (it trains until the given
# total), so for stages 2 and 3 to actually run after 100 head epochs, the
# epochs values would need to be e.g. 160 and 170.
print("Fine tune Resnet stage 4 and up")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=60,
            layers='4+')

# Training - Stage 3
# Fine tune all layers
print("Fine tune all layers")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=10,
            layers='all')

@John1231983 exactly!) with the config below:

class BowlConfig(Config):
    NAME = "nucleos"
    GPU_COUNT = 2
    IMAGES_PER_GPU = 1

    NUM_CLASSES = 1 + 1  # background + 1 area

    IMAGE_MIN_DIM = 256
    IMAGE_MAX_DIM = 512
    IMAGE_PADDING = True
    RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)  # anchor side in pixels

    TRAIN_ROIS_PER_IMAGE = 1024

    ROI_POSITIVE_RATIO = 0.33

    STEPS_PER_EPOCH = 550 // (IMAGES_PER_GPU * GPU_COUNT)

    VALIDATION_STEPS = 50 // (IMAGES_PER_GPU * GPU_COUNT)

    MEAN_PIXEL = [43.53, 39.56, 48.22]

    LEARNING_RATE = 1e-3

    USE_MINI_MASK = True
    MAX_GT_INSTANCES = 500

@maksimovkonstantin first of all, thank you for sharing this interesting training scheme. The purpose of this competition is to get our hands dirty and gain some experience, and I have to say what you shared really served that purpose for me. Thank you again!

@maksimovkonstantin @John1231983 you have shared different training schemes and parameters; I wonder if your configurations/schemes are reproducible? The reason I am asking is that after I got my best LB score, I tried to train again, either by continuing from the last epoch or starting from epoch 0, and was never able to get similar performance. This also happened with some of my other configurations. I also tried different things that in theory should improve performance, but in fact they just gave worse scores. However, I only tried each thing once, so I wonder if training multiple times would eventually give a better score. This makes me think that with such a complicated network and so many hyper-parameters, the results may not be very reproducible. If that is the case, instead of trying each parameter set and training scheme just once, we should stick to the configuration we believe in and try it several times. What do you guys think?

@keven4ever I also have the same issue with reproducibility, but I hope that my last scheme will be more stable.

One more thing I want to share: convert the images (color and gray) to the same space, e.g. grayscale. Then, after obtaining the results at inference, you can consider post-processing, which gave me some gain. I think this challenge has many problems that deep learning may not handle, i.e. different image spaces...

@maksimovkonstantin : In your data_augmentation function, you augment the image data with random rotation, zoom... What about the masks? The same random values (scale, angle) must be applied to the masks for consistency.

@John1231983 it augments both masks and image; the function takes a list of images as input, where the first is the image and the others are masks.

@maksimovkonstantin : Great to hear that. However, I used this function and it produces an error. This is my script:

image=dataset_train.load_image(0)
masks, class_ids = dataset_train.load_mask(0)
#Image shape of (256, 320, 3) and masks shape of (256, 320, 73)
input_aug=data_augmentation([image, masks])

This is the error:

    input_aug=data_augmentation([image, masks])
  File "augmentation_data.py", line 46, in data_augmentation
    output_images[i] = cv2.warpAffine(input_image, M, (input_image.shape[1], input_image.shape[0]))
cv2.error: /io/opencv/modules/imgproc/src/imgwarp.cpp:1825: error: (-215) ifunc != 0 in function remap

This is my opencv-python version

Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'3.3.1'

@John1231983 the list should contain images with shape 256x320x3; you should unpack the stacked masks into 73 mask images with 3 equal channels each.

@maksimovkonstantin : I have tried but it still errors. This is the shape of the masks after I converted them:

masks_rgb_all = []
for i in range(masks.shape[2]):
    mask = masks[:, :, i]
    masks_rgb = []
    for c in range(3):  # replicate the single-channel mask into 3 channels
        masks_rgb.append(mask)
    masks_rgb = np.stack(masks_rgb, axis=-1)
    masks_rgb_all.append(masks_rgb)
masks_rgb_all = np.stack(masks_rgb_all, axis=-1)
print(masks_rgb.shape, masks_rgb_all.shape)

input_aug = data_augmentation([image, masks_rgb_all])

(256, 320, 3) (256, 320, 3, 73)

The error persists:

output_images[i] = cv2.warpAffine(input_image, M, (input_image.shape[1], input_image.shape[0]))
cv2.error: /io/opencv/modules/imgproc/src/imgwarp.cpp:1825: error: (-215) ifunc != 0 in function remap

@John1231983 @maksimovkonstantin i can confirm that the training scheme of starting with the heads and then training all layers does improve performance. You can see the performance figure below: the lowest red line is the run where I got my highest LB score (0.448), the upper red line is training only the heads, and the green line shows when I trained all layers after epoch 84.

The difference from my best record is that this time I used more augmentation (flipping l/r, flipping u/d, rotating 360 degrees, brightness; I still have not applied zooming and cropping), but it seems the augmentation makes the model underfit a little.

Also, I only used SGD with lr 0.001. According to this post: https://shaoanlu.wordpress.com/2017/05/29/sgd-all-which-one-is-the-best-optimizer-dogs-vs-cats-toy-experiment/, SGD can usually find a better local optimum than adaptive optimizers like Adam.

[screenshot: loss curves for the three runs]

@keven4ever I have very similar loss charts, but I can't get the mask loss as close to 0.1 as you; I think the config is the key.

@maksimovkonstantin i am not sure the config is the key, since the only difference here is data augmentation; the best-performing run used only flip l/r augmentation and trained only the heads, with the same config. So I am totally confused: in theory both augmentation and training the entire network should improve performance, not reduce it.

@keven4ever : As far as I know, we are working at the pixel level, so scaling masks must be done carefully. In my experiments (I did not try augmentation), post-processing is the most important thing in this challenge.

@keven4ever and @maksimovkonstantin : After training on the dataset many times, I found the best ways to achieve 0.44+ are:

  1. Use the COCO pretrain
  2. Train the heads, then train all layers. The number of head-training epochs should be bigger than the number of all-layer epochs
  3. Do not apply complex data augmentation; just flipud and fliplr are enough
  4. Use SGD with clipnorm (see the sketch below). Adam is faster, but as @keven4ever mentioned, it has difficulty reaching a good local optimum
  5. Splitting the dataset into clusters like gray, color, HSV... does not help improve performance. Just train the network on all types together
  6. Post-processing like dilation, CRF... is important

Do you agree with these points? What is your performance now, @keven4ever? Hope you can reproduce the LB with the above tips.
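
On point 4, the optimizer is hard-coded in MaskRCNN.compile() in model.py; roughly the relevant line, with an Adam swap shown as a comment (the exact clipnorm value may differ between versions of the repo):

# model.py, inside MaskRCNN.compile():
optimizer = keras.optimizers.SGD(lr=learning_rate, momentum=momentum,
                                 clipnorm=5.0)
# Adam variant (momentum is unused in this case):
# optimizer = keras.optimizers.Adam(lr=learning_rate, clipnorm=5.0)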

@John1231983 based on my experiments this looks correct, however some of these points don't really make sense; I suspect there is something special either in the Mask R-CNN implementation or in the dataset, for example:

  1. why other data augmentations like rotation and brightness don't help performance
  2. why adding more mask classes doesn't help improve performance

I am not sure about the last bullet; I have not tried post-processing like dilation or CRF. The only post-processing I did was cleaning up overlapping masks, otherwise there is a submission error.

@keven4ever : I think the baseline Mask R-CNN from this repo achieves around 0.4+ LB. Performance also depends on the learning strategy. What is your LB using COCO with heads-then-all training?

For me, dilation (post-processing) improved my score from 0.4 to 0.43 LB. It is still lower than the baseline of the PyTorch Mask R-CNN implementation (0.5+ LB).

I'm thinking one of the major differences is the weighted focal loss used by the PyTorch version of Mask R-CNN.

@paulcx : Thanks for the information. Could you tell me which losses are replaced by the weighted focal loss? I want to modify this repo to check how effective it is.

# Losses
rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
    [input_rpn_match, rpn_class_logits])
rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
    [input_rpn_bbox, input_rpn_match, rpn_bbox])
class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
    [target_class_ids, mrcnn_class_logits, active_class_ids])
bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
    [target_bbox, target_class_ids, mrcnn_bbox])
mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
    [target_mask, target_class_ids, mrcnn_mask])

It's the RPN class loss, and the PyTorch version also replaces smooth_l1 with weighted_smooth_l1.
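
For anyone who wants to experiment with this, here is one possible sketch of a focal-loss replacement for rpn_class_loss_graph. This is an illustration based on the focal loss paper (alpha=0.25, gamma=2 are the paper's defaults), not the weighted variant from the PyTorch repo mentioned above:

import tensorflow as tf
import keras.backend as K

def rpn_focal_loss_graph(rpn_match, rpn_class_logits, gamma=2.0, alpha=0.25):
    """Focal-loss variant of the RPN anchor classifier loss.

    rpn_match: [batch, anchors, 1] with 1 = positive, -1 = negative, 0 = neutral.
    rpn_class_logits: [batch, anchors, 2] BG/FG logits.
    """
    rpn_match = tf.squeeze(rpn_match, -1)
    # Convert the +1/-1 match values to 0/1 class ids and drop neutral anchors.
    anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
    indices = tf.where(K.not_equal(rpn_match, 0))
    rpn_class_logits = tf.gather_nd(rpn_class_logits, indices)
    anchor_class = tf.gather_nd(anchor_class, indices)
    # Focal loss: cross-entropy scaled by (1 - p_t)^gamma so easy
    # (well-classified) anchors contribute less to the total loss.
    probs = tf.nn.softmax(rpn_class_logits)
    ce = K.sparse_categorical_crossentropy(target=anchor_class, output=probs)
    p_t = tf.reduce_sum(probs * tf.one_hot(anchor_class, 2), axis=-1)
    loss = alpha * K.pow(1.0 - p_t, gamma) * ce
    return K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))

It would then be wired into the "# Losses" section above in place of rpn_class_loss_graph.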

@John1231983 very interesting, could you please elaborate a bit more on how you did the dilation? Do you only apply binary dilation on each predicted mask instance? What kind of kernel did you use? As I understand it, binary dilation just enlarges the foreground area (in this case, the mask), right? Why does this improve performance? Also, how do you handle the case where two masks overlap: which one should be dilated? Thx

This is my code. You can try it and let me know how much it improves your LB.

from skimage.morphology import binary_dilation, disk

def refineMasks(mask):
    return binary_dilation(mask, disk(1))

# Run the refinement over the predicted masks
for i in range(predicts.shape[2] - 1):
    predicts[:, :, i] = refineMasks(predicts[:, :, i])

@John1231983 thx for sharing the code! Btw, I can confirm there is something we missed with data augmentation. In my config, the result using only flip l/r is consistently better than using other augmentations. I checked the code again, including things like the image shape in image_meta, and still could not find out why.

@John1231983 hmm, I applied the dilation on top of my best model (LB score 0.448), and the result is 0.412. So it seems the same optimization doesn't work for everyone, at least in this specific case. Thank you anyway!

I think so. It is very difficult to reproduce the results. For now, I think it is better to use focal loss as in the PyTorch version. The author of the PyTorch version shows a Mask R-CNN baseline of 0.5, which is far above ours.

Sure, keep us updated in case you get a boost. I will continue to figure out why the other data augmentations don't help.

One more thing: have you tried other augmentations like flipud and rot90? You said that only fliplr gave the best performance.

@John1231983 yep, I followed @maksimovkonstantin's code and also introduced brightness augmentation:

    factor = 1.0 + abs(random.gauss(mu=0.0, sigma=brightness))
    if random.randint(0, 1):
        factor = 1.0 / factor
    table = np.array([((i / 255.0) ** factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8)
    output_images[0] = cv2.LUT(output_images[0], table)

How do you use the function? I have tried, but it cannot be called with my masks input. My masks input is WxHxnum_masks.

ok, here is my code:

def data_augmentation(input_image, masks,
                      h_flip=True,
                      v_flip=True,
                      rotation=360,
                      zoom=1.5,
                      brightness=0.5,
                      crop=False):
    # input_image: HxWx3, masks: HxWxnum_masks
    output_image = input_image.copy()
    output_masks = masks.copy()

    # random flips, applied to the image and masks together
    if h_flip and random.randint(0, 1):
        output_image = np.fliplr(output_image)
        output_masks = np.fliplr(output_masks)

    if v_flip and random.randint(0, 1):
        output_image = np.flipud(output_image)
        output_masks = np.flipud(output_masks)

    # random brightness via a gamma lookup table, image only
    factor = 1.0 + abs(random.gauss(mu=0.0, sigma=brightness))
    if random.randint(0, 1):
        factor = 1.0 / factor
    table = np.array([((i / 255.0) ** factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8)
    output_image = cv2.LUT(output_image, table)

    # random rotation in 90-degree steps (np.rot90 keeps image and masks
    # aligned; note // so randint gets an integer bound)
    rotate_times = random.randint(0, rotation // 90) if rotation else 0
    for r in range(rotate_times):
        output_image = np.rot90(output_image)
        output_masks = np.rot90(output_masks)

    # crop and zoom/affine warping are disabled here: cv2.warpAffine fails on
    # mask stacks with more than 4 channels (the error shown earlier)
    return output_image, output_masks

you just call it as data_augmentation(original_image, original_masks)

Thanks. I will try it and let you know how it goes in my case. I see why your code works: you commented out the scaling case. I had no success with the scaling case either.

@keven4ever, @John1231983 , @maksimovkonstantin

Just wanted to say that this thread has been very useful in terms of my own training. Lots of good things learned from reading what you guys have tried/done!

@FruVirus you are welcome. In case you are also in the DSB 2018 competition, could you please share what score you got?

@keven4ever

I am not in the DSB competition and, unfortunately, I can't share many details about my current work =/

@FruVirus no problem! I've also learnt a lot from your tips; this is a great community. Good luck with your work!

@keven4ever: do you have any improvement on your LB? I have moved to PyTorch, which trains much faster and has more pretrained models. I will let you know if it helps improve the score. I currently get 0.42 using Heng's PyTorch version.

Hello @John1231983, you can also try using Keras for data augmentation. Here are the docs: https://keras.io/preprocessing/image/

x2 with @FruVirus. This thread has been very useful and worth reading. I just want to add a few things:

Did you check this repo: https://github.com/aleju/imgaug ? Maybe you can try more complex augmentations. However, remember to check that you can still see the target after processing. For example, smoothing with too high a value will make your target disappear; then the augmentation will not help, it will work against you by contaminating the data.

I haven't seen much talk about image processing (not sure if I skipped those comments). You can see in https://www.kaggle.com/c/data-science-bowl-2018/discussion/48130#282959 that image processing also helps a lot to get better results. Just as a suggestion, maybe you can try some image processing methods (some pre-processing before feeding the network and some image quality enhancement techniques before inference). Searching for the best training parameters to create a more robust model is very important; however, I believe an enhanced input image will lead to better inference results.

I am not in the DSB competition, but I thought I could share a few of my thoughts with you. Perhaps one of them could be useful for your work and lead to a further conversation about how to improve results using Mask R-CNN in any kind of instance segmentation task.

@Hatuw I tried Keras's image generator; the challenge is that for the masks I can't use a vectorized approach and instead have to loop over each mask one by one to do the augmentation, which makes training quite slow. Have you found a better way?

Hi @keven4ever, you can use a vectorized implementation as shown in this kernel: https://www.kaggle.com/hexietufts/easy-to-use-keras-imagedatagenerator

@John1231983 thank you for asking! I actually made some progress, now 0.46+. Some findings:

  • Since the data set is small (only 670 images), I use only 67 images as the validation set, which is probably too small and biased by some bad-quality images like 7b38c9173ebe69b4c6ba7e703c0c27f39305d9b2910f46405993d2ea7a963b80. So even when val_loss starts to increase, it doesn't necessarily mean the model is actually starting to overfit. Instead, I kept training the whole network and used both random crop and flipping, and in the end the result was better. So I think I should increase the size of the validation set and also pick the validation images manually.

@keven4ever : Good job. It is close to my LB. I suggest you can increase your LB by using an external dataset; some datasets provide a task similar to this challenge. I am using the dataset https://www.kaggle.com/voglinio/external-h-e-data-with-mask-annotations and it adds 0.03 LB. Combined with mosaic images, I hope it can reach a 0.48 LB baseline. Hope the tips help you. My score is now 0.473, using the PyTorch code because of training speed.

@keven4ever
Sorry, I have been very busy these days.
I tried to use the image generator in the load_image_gt function, but it slows training down. I think generating some images before training is better.
I haven't paid attention to this challenge for some days. If you have a proposal, you are welcome to contact me and discuss it together.
Thanks!

Good discussion about image augmentation here. I just pushed an update to support imgaug augmentations out of the box, by passing an augmentation object to the train() function.

http://imgaug.readthedocs.io/en/latest/source/augmenters.html

Thanks @waleedka for this PR. I think we have to add one more condition in load_image_gt to handle cropped images that have zero masks when the number of images per GPU is 1; otherwise zero masks are fed to the network and the loss becomes NaN. To do this, I think we can add a while loop with the condition that the number of masks is bigger than 0; if not, we try to crop another position. What do you think?
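
For what it's worth, a sketch of that guard as it could look inside the data_generator loop in model.py (newer versions of the repo include a similar skip for images with no instances; resampling a different crop position, as suggested above, would instead need a small retry loop around load_image_gt):

image, image_meta, gt_class_ids, gt_boxes, gt_masks = \
    load_image_gt(dataset, config, image_id, augment=augment,
                  use_mini_mask=config.USE_MINI_MASK)
# Skip samples whose random crop removed every instance; feeding
# all-zero masks to the network can produce a NaN loss.
if not np.any(gt_class_ids > 0):
    continue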

@waleedka : In your train() example, only fliplr is shown. How about adding more options like scale and rotation?

augmentation = imgaug.augmenters.Fliplr(0.5)

Would it be something like this?

import imgaug as ia
import imgaug.augmenters as iaa

# apply the wrapped augmenter to ~50% of the images
sometimes = lambda aug: iaa.Sometimes(0.5, aug)

augmentation = iaa.Sequential([
    iaa.Fliplr(0.5),  # horizontally flip 50% of the images
    iaa.Flipud(0.5),  # vertically flip 50% of the images
    sometimes(iaa.CropAndPad(
        percent=(-0.05, 0.1),
        pad_mode=ia.ALL,
        pad_cval=(0, 255)
    )),
    sometimes(iaa.Affine(
        scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},  # scale to 80-120% of size, individually per axis
        translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},  # translate by -20 to +20 percent (per axis)
        rotate=(-45, 45),  # rotate by -45 to +45 degrees
        shear=(-16, 16),  # shear by -16 to +16 degrees
        order=[0, 1],  # use nearest neighbour or bilinear interpolation (fast)
        cval=(0, 255),  # if mode is constant, use a cval between 0 and 255
        mode=ia.ALL  # use any of scikit-image's warping modes
    )),
])

@John1231983 The train() function supports all the augmentations that imgaug offers, so yes, just pass that big augmentation sequence to train() and it should work.

The code applies the same augmentations to both images and masks, and it already knows that some augmentations apply to images only and not to masks (like changing color channels or adding Gaussian noise). That said, even augmentations that are safe for masks sometimes have options that make them unsafe, so always test your augmentations on both images and masks before training (see the sketch below).

And, thanks for the tip about images with no masks. I'll look into it.
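
One quick way to do that check with imgaug (a sketch; image is assumed to be an HxWx3 uint8 array and mask a single HxW instance mask, both already loaded): freeze the augmenter's random state with to_deterministic() so the image and the mask receive identical transforms, and use order=0 on geometric augmenters so masks stay binary:

import imgaug.augmenters as iaa

seq = iaa.Sequential([iaa.Fliplr(0.5),
                      iaa.Affine(rotate=(-45, 45), order=0)])
det = seq.to_deterministic()          # freeze the sampled parameters
image_aug = det.augment_image(image)  # identical transform for both
mask_aug = det.augment_image(mask.astype(np.uint8))
# Overlay image_aug and mask_aug to verify they are still aligned.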

@John1231983 Hi, John. I have tested random_crop; my score drops from 0.440 to 0.424. Here is my code. Is there something wrong?

height = 512
width = 512
# note: use `and`, not `&`; `&` binds tighter than the comparisons
if image.shape[0] >= height and image.shape[1] >= width:
    if random.randint(0, 1):
        image, mask = randomCrop(image, mask, width, height)

My learning schedule is 50 epochs all (1e-4), then 25 epochs all (1e-5). Can you help me?

@waleedka : Thanks for your reply. I used the new PR and got this error:

Epoch 1/60
 28/435 [>.............................] - ETA: 5:35 - loss: 4.6554 - rpn_class_loss: 0.2579 - rpn_bbox_loss: 1.9744 - mrcnn_class_loss: 0.0893 - mrcnn_bbox_loss: 1.7575 - mrcnn_mask_loss: 0.5764Traceback (most recent call last):
  File "train.py", line 72, in <module>
    augmentation=augmentation)
  File "/home/john/mask_rcnn/model.py", line 2300, in train
    use_multiprocessing=True,
  File "/home/john/anaconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/john/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 2192, in fit_generator
    generator_output = next(output_generator)
  File "/home/john/anaconda3/lib/python3.6/site-packages/keras/utils/data_utils.py", line 785, in get
    raise StopIteration()
StopIteration

I had no error before using the new PR. How can I fix it? I suspect the bug is somewhere in the utils.py file that you updated.
@cccmdls : Your code is correct. But I would crop if the image is bigger than 512, and otherwise use the resize function. Let me know your LB with this one. I am using mosaics.

@John1231983 there is a problem: if the image is bigger than 512 and random.randint(0,1) returns 0, the image will not be cropped. What do you do in that case, resize or crop again? Currently I am training a model without the random.randint(0,1) check; I want to see what happens in this situation.

@John1231983 I couldn't reproduce the error you mentioned. I tested the train_shapes notebook with the big augmentation you listed above and it worked. You might want to track down that issue in your code. If you confirm that it's indeed a bug, please provide more details.

@John1231983 so sad, I only got 0.410. LR=1e-4, 50 epochs all (LR) + 25 epochs all (LR/10), using mosaics and the COCO pretrained model, tested on stage1_test. I don't know how to split the mosaic test results back into the stage_test csv file. Can you help me?

@waleedka: I think someone with the same error as mine reported it to you in another thread; it may be similar.
@cccmdls: I did not test on the mosaic test set. I only train on the mosaic training set and test on the original images. First, I random-crop 512x512 if the image is bigger than 512 (without the crop probability); otherwise I resize the image to 512x512. I trained 60 epochs on the heads and 40 epochs on all layers with learning rate 0.0001 using Adam. I don't know why some people succeed with SGD (I used SGD but got 0.44). With the above suggestions, you may get 0.47 (no post-processing) to 0.49 LB (with post-processing).

@John1231983 Hi, John. Thanks for your advice, but I only got 0.380, 0.370, 0.383, 0.377 without any post-processing. I can't reproduce your result, sorry. Here is my config file. Can you give me some advice?
LEARNING_RATE = 1e-4
USE_MINI_MASK = True
MINI_MASK_SHAPE = (56, 56)
STEPS_PER_EPOCH = 392
VALIDATION_STEPS = 44
IMAGE_MIN_DIM = 512
IMAGE_MAX_DIM = 512
RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128) # anchor side in pixels, maybe add a 256?
BACKBONE_STRIDES = [4, 8, 16, 32, 64]
RPN_TRAIN_ANCHORS_PER_IMAGE = 320 #320
POST_NMS_ROIS_TRAINING = 2000
POST_NMS_ROIS_INFERENCE = 2000
# Pooled ROIs
POOL_SIZE = 7
MASK_POOL_SIZE = 14
MASK_SHAPE = [28, 28]
TRAIN_ROIS_PER_IMAGE = 512
RPN_NMS_THRESHOLD = 0.7
MAX_GT_INSTANCES = 600 #512
DETECTION_MAX_INSTANCES = 600 #400
DETECTION_MIN_CONFIDENCE = 0.7 # may be smaller?
DETECTION_NMS_THRESHOLD = 0.3 # 0.3
MEAN_PIXEL = np.array([31.92144429,29.84380259,34.66032842]) #mosaics
WEIGHT_DECAY = 0.0001

and here is my code for the random crop:

height = 512
width = 512
# note: use `and`, not `&`; `&` binds tighter than the comparisons
if image.shape[0] > height and image.shape[1] > width:
    image, mask = randomCrop(image, mask, width, height)
else:
    image, window, scale, padding = utils.resize_image(
        image,
        min_dim=config.IMAGE_MIN_DIM,
        max_dim=config.IMAGE_MAX_DIM,
        padding=config.IMAGE_PADDING)
    mask = utils.resize_mask(mask, scale, padding)

def randomCrop(img, mask, width, height):
    x = random.randint(0, img.shape[1] - width)
    y = random.randint(0, img.shape[0] - height)
    img = img[y:y + height, x:x + width]
    mask = mask[y:y + height, x:x + width]
    return img, mask

Now that the competition is over, does anyone know of a public GitHub repo with the best score using Mask R-CNN?

The best result reported on Kaggle (https://www.kaggle.com/c/data-science-bowl-2018/discussion/54089) for @waleedka was 0.476 (I know this was just a baseline). I wonder if someone got a higher score using matterport's Mask RCNN.

I am reading the top solution review (https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741) and they used U-Net. However, I am interested to know who got the best result using matterport's script. According to the write-up, the preprocessing of the masks, the correct augmentations and the 2nd-level model played a crucial part in the good accuracy of their solution.

It would be interesting to reproduce their key steps, swap the U-Net for Mask R-CNN, and compare the results under similar pre/post-processing, since they mentioned that they didn't try Mask R-CNN for the competition.

@keven4ever @John1231983 Do you think that, using the same mask processing as the winning solution, Mask R-CNN could do as well as the winner?

Update: In this link, ZhengLi was reported as having the highest score using Mask RCNN, and his solution is here

Why does nobody tune WEIGHT_DECAY? I think it's a hyper-parameter that affects the L2 regularization strength.

I think so. It is very difficult to reproduce the results. For now, I think it is better to use focal loss as in the PyTorch version. The author of the PyTorch version shows a Mask R-CNN baseline of 0.5, which is far above ours.

@paulcx
Hi, can you point me to the PyTorch version? Thanks!

Can anyone tell me how to add code that prints the training accuracy and validation accuracy every epoch? We want to check whether the model is overfitting or not.

@lunasdejavu, you can check the val_loss to see whether your model is overfitting or not.

@keven4ever @John1231983 @maksimovkonstantin Hello guys. Thank you so much for this discussion; it's one of the best discussions I've ever read on GitHub. I'm new to this field and I'm working on a project using the Matterport implementation of Mask RCNN. I understood almost every technique mentioned here, but I'm confused about the training technique you used for this competition. For example, does training 20 epochs on heads and 80 epochs on all layers mean that we take the weights (model artifacts) produced by the first training stage (on the heads) and continue training them on all layers? Thank you in advance.

@keven4ever

With such a small dataset, it is unlikely that BN or dropout will help. Also, BN with dropout is probably not a good idea (see paper on BN) and I don't think you can apply dropout with the pre-trained ResNet weights since that model didn't train using dropout in the first place.

The model capacity of ResNet-101 might be too large for your dataset. While it's true that ResNet enables deeper networks to converge compared to their plain counterparts, there is still a limit on the number of layers that can be incorporated in a ResNet before convergence suffers. For example, Table 6 in the ResNet paper shows that the classification error on CIFAR-10 decreases with increasing depth up to ResNet-110, but ResNet-1202 actually performs worse than ResNet-32.

To prevent overfitting, you can try:

  1. Getting a larger dataset (but this is probably not feasible, otherwise you would've done this already)
  2. Stronger weight decay (i.e., L2 regularization)
  3. Lower model capacity (e.g., ResNet-50 or even ResNet-32)
  4. k-fold cross-validation

Hello @FruVirus, can you please explain how to apply k-fold cross-validation with the Mask R-CNN matterport repo?
Thanks in advance.

Hello everyone, can someone provide information on how to check whether the model is overfitting or not?

By looking at your val_loss: if it stops decreasing, the model may be starting to overfit. You can apply early stopping to end training when that happens (depending on how many epochs the val_loss has stopped decreasing).
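
As a sketch of the early-stopping idea (note: the custom_callbacks argument only exists in newer versions of matterport's train(); older versions would need the callback added inside model.py):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10)
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=100,
            layers='heads',
            custom_callbacks=[early_stop])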

Hi @Altimis, but what if the val_loss fluctuates? Why does the val_loss fluctuate?

@rupa1118 this repo has an example of how to use k-fold cross-validation; you can use the built-in sklearn KFold methods.
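
A minimal sketch of that idea, assuming NucleusDataset and load_subset as hypothetical stand-ins for your own Dataset subclass and loading method:

import numpy as np
from sklearn.model_selection import KFold

image_ids = np.array(all_image_ids)  # full list of training image ids
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(image_ids)):
    dataset_train = NucleusDataset()                 # hypothetical Dataset subclass
    dataset_train.load_subset(image_ids[train_idx])  # hypothetical loader
    dataset_train.prepare()
    dataset_val = NucleusDataset()
    dataset_val.load_subset(image_ids[val_idx])
    dataset_val.prepare()
    # Train one model per fold; average the fold scores (or ensemble
    # the fold models) to get a less noisy estimate of generalization.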
