Mask_rcnn: Random cropping in data augmentation

Created on 4 Feb 2018 · 22 Comments · Source: matterport/Mask_RCNN

Hello all, I want to ask someone who has succeeded in applying random cropping in data augmentation. Does it help to increase accuracy? Could you give me some tips on implementing it? I know that random cropping is useful in segmentation tasks.

Most helpful comment

This is what I am using for random cropping images with masks:

import random

def randomCrop(img, mask, width, height):
    # the requested crop must fit inside the image
    assert img.shape[0] >= height
    assert img.shape[1] >= width
    # image and mask must have the same spatial dimensions
    assert img.shape[0] == mask.shape[0]
    assert img.shape[1] == mask.shape[1]
    # pick a random top-left corner and take the same window from image and mask
    x = random.randint(0, img.shape[1] - width)
    y = random.randint(0, img.shape[0] - height)
    img = img[y:y+height, x:x+width]
    mask = mask[y:y+height, x:x+width]
    return img, mask

All 22 comments

This is what I am using for random cropping images with masks:

import random

def randomCrop(img, mask, width, height):
    # the requested crop must fit inside the image
    assert img.shape[0] >= height
    assert img.shape[1] >= width
    # image and mask must have the same spatial dimensions
    assert img.shape[0] == mask.shape[0]
    assert img.shape[1] == mask.shape[1]
    # pick a random top-left corner and take the same window from image and mask
    x = random.randint(0, img.shape[1] - width)
    y = random.randint(0, img.shape[0] - height)
    img = img[y:y+height, x:x+width]
    mask = mask[y:y+height, x:x+width]
    return img, mask
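
For example, assuming image and mask are the NumPy arrays of one training sample (with the mask stacked as [height, width, num_instances]), a 256x256 crop could be taken like this:

    crop_img, crop_mask = randomCrop(image, mask, width=256, height=256)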

@wouterdewinter this function changes the size of the image and mask that go into the model. How do you deal with that?

@paulcx I would also like to do cropping. One idea is to do the cropping before resizing in load_image_gt; the resizing will then make sure all images end up the same size. What do you think?

@keven4ever Sounds reasonable.

Watch for this edge case, though: cropping might cause small masks that are close to the edges to be completely removed. In that case, the class IDs associated with them should be removed as well. Do the cropping as early in the code as possible to take advantage of all the checking and clean up that the code does before passing the data to the model.
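
A minimal sketch of that advice, assuming the crop is applied inside load_image_gt right after the image, mask and class_ids are loaded and before resizing/padding (illustrative placement, not the repo's exact code):

    # inside load_image_gt, before resizing and padding (assumes numpy is imported as np)
    image, mask = randomCrop(image, mask, width=256, height=256)

    # drop instances whose mask became empty after the crop,
    # and keep class_ids aligned with the surviving masks
    keep = ~np.all(mask == 0, axis=(0, 1))
    mask = mask[:, :, keep]
    class_ids = class_ids[keep]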

Hi all, I have a problem that I could not find an answer to.

I have an image of size 1024x1024. I randomly crop it to 256x256 for training because of limited GPU resources. At inference I have to make the final prediction at 1024x1024, so I crop the image into 256x256 tiles and slide over it. How can I assemble the final prediction from the results of the cropped tiles?

@paulcx yes, but I crop all images (and masks) to the same size, so only the crop offset changes

@keven4ever this is a problem, but it doesn't matter if some masks are removed: just discard them (and the corresponding class_id). If all masks are cropped out, discard the complete sample.

@John1231983 for the final prediction you can increase the min/max image size to 1024 in the config; this works, and you don't have to slide over the image.
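
A hedged sketch of that change, following the pattern in the repo's sample notebooks (the attribute names are the repo's Config fields; TrainConfig, modellib and MODEL_DIR stand for whatever names were used during training and setup):

class InferenceConfig(TrainConfig):
    # run detection on the full-resolution images instead of the training crop size
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    IMAGE_MIN_DIM = 1024
    IMAGE_MAX_DIM = 1024

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir=MODEL_DIR)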

@wouterdewinter @waleedka Thx for the tips! Very helpful!

I implemented a customized random crop function based on this work.

import numpy as np

def random_crop(x, y, crop_size=(256, 256)):
    # randomly crop image x and its stacked masks y to crop_size (height, width)
    assert x.shape[0] == y.shape[0]
    assert x.shape[1] == y.shape[1]
    h, w, _ = x.shape
    # valid offset ranges; 0 if the image is not larger than the crop window
    rangeh = h - crop_size[0] if h > crop_size[0] else 0
    rangew = w - crop_size[1] if w > crop_size[1] else 0
    offseth = 0 if rangeh == 0 else np.random.randint(rangeh)
    offsetw = 0 if rangew == 0 else np.random.randint(rangew)
    cropped_x = x[offseth:offseth+crop_size[0], offsetw:offsetw+crop_size[1], :]
    cropped_y = y[offseth:offseth+crop_size[0], offsetw:offsetw+crop_size[1], :]
    # drop mask channels that became completely empty after cropping
    cropped_y = cropped_y[:, :, ~np.all(cropped_y == 0, axis=(0, 1))]
    if cropped_y.shape[-1] == 0:
        # no mask survived the crop: fall back to the original image and masks
        return x, y
    else:
        return cropped_x, cropped_y

As you can see, I crop the images and masks and then pad them to the same size. All-zero masks are removed, but the original data is returned if no mask is left after cropping, to save time.

@waleedka I put this operation before resizing and padding, but sometimes I get an error about an empty sequence from build_rpn_targets. Where do you propose the implementation should go?
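
Side note on the class IDs: because random_crop drops the empty mask channels internally, the caller cannot tell which class_ids to remove afterwards. A hedged variant that filters class_ids in the same step (names are illustrative; assumes numpy imported as np, as above) could look like this:

def random_crop_with_ids(x, y, class_ids, crop_size=(256, 256)):
    # same random window as above, but class_ids are filtered in step with the masks
    h, w, _ = x.shape
    offseth = 0 if h <= crop_size[0] else np.random.randint(h - crop_size[0])
    offsetw = 0 if w <= crop_size[1] else np.random.randint(w - crop_size[1])
    cropped_x = x[offseth:offseth + crop_size[0], offsetw:offsetw + crop_size[1], :]
    cropped_y = y[offseth:offseth + crop_size[0], offsetw:offsetw + crop_size[1], :]
    keep = ~np.all(cropped_y == 0, axis=(0, 1))
    if not keep.any():
        # nothing survived the crop: keep the original sample unchanged
        return x, y, class_ids
    return cropped_x, cropped_y[:, :, keep], class_ids[keep]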

@wouterdewinter: After training with a size of 128x128 and testing at the original size of 1024x1024, I got the error

ValueError: Error when checking : expected input_image to have shape (128, 128, 3) but got array with shape (1024, 1024, 3)

I think the model uses a fully connected layer, so we cannot use a different size at inference than the training size. Have you solved this? Note that I have changed

    IMAGE_MIN_DIM = 1024
    IMAGE_MAX_DIM = 1024

@John1231983 did you make any progress with random cropping? I played with it a little and did not see any gain so far.

In the DSB 2018 train set, the smallest image is 256x256, so I chose a 224x224 crop window and set IMAGE_MIN_DIM to 224 and IMAGE_MAX_DIM to 512 (I did not use 1024 due to a GPU RAM problem). The cropping is done before resizing and padding, so in the end the masks' aspect ratio and scale are kept the same in the training phase. The downside is that in the prediction phase large images are downsampled first (for example 1024x1024 -> 512x512), predicted, and then the mask instances are upsampled, but I can't do anything about that.

Another scheme I applied is to set the cropping window to 500x500; small images are then scaled x2 first, for example 256x256 -> 512x512 or 256x320 -> 512x640, and then cropped. Since upsampling usually does not lose any information from the original image, this method should also keep the training data as close to the original as possible.
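
A rough sketch of that second scheme, assuming skimage is available and reusing the random_crop above (the 500x500 window and 2x factor are the values mentioned here; this is illustrative, not the code actually used):

from skimage.transform import resize

def upscale_then_crop(image, mask, crop_size=(500, 500), factor=2):
    h, w = image.shape[:2]
    # upscale small images first so the crop window fits
    if h < crop_size[0] or w < crop_size[1]:
        image = resize(image, (h * factor, w * factor) + image.shape[2:],
                       order=1, preserve_range=True).astype(image.dtype)
        # nearest-neighbour for the masks so they stay binary
        mask = resize(mask, (h * factor, w * factor) + mask.shape[2:],
                      order=0, preserve_range=True).astype(mask.dtype)
    return random_crop(image, mask, crop_size=crop_size)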

But neither of these two schemes helps the final performance: training performance looks quite good, but val_loss stays quite high.

Any tips?

I just crop to 512x512, and I can use that since I'm using a Titan X. Cropping to 256x256 may not be good because some cropped images have no masks or only a few.

@John1231983 any progress on focal loss or OHEM?

No. It is difficult to integrate into this repo. I have tried the Jaccard loss but it gives an error. Have you tried it? This loss is in keras-contrib.

Same here. I got a NaN error with the loss and I'm trying to rewrite some of the functions.

@John1231983 hmm, so you first upsample 256x256 images to 512x512 and then do the cropping? I guess this is fine since upsampling doesn't lose information. Did you see any gain with random cropping? I saw you recently moved up a lot on the LB, congratulations; may I ask what the key contributor was?

@keven4ever: actually, I have tried setting min to 256 and max to 512 with ImageNet pretraining, but it cannot run. The error shows that the expected size is 512x512 while the input size is 256x256. I am not sure what is wrong in my code. My main improvement comes from mosaic images (combining images together); it can provide 0.04%. You can try it.

@John1231983 ok, weird, I can use min 256 and max 512, and the padding then fills in the gaps. Actually this is one improvement I want to try later: instead of padding all images with black, I could detect the background colour in the image and pad with that instead. In my opinion the min 256 configuration should ideally give better accuracy, since some pictures are 256x320; if everything is upsampled to 512, some pixels get upsampled 1.6x (512/320), which might lose some information. However, this is all theory, and I still fail to achieve a good result.
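
A minimal sketch of that background-colour padding idea (purely illustrative, not the repo's padding code; the background colour is just estimated from the image border here):

import numpy as np

def pad_with_background(image, target_h, target_w):
    # estimate the background colour as the per-channel median of the border pixels
    border = np.concatenate([image[0, :], image[-1, :], image[:, 0], image[:, -1]])
    bg = np.median(border, axis=0).astype(image.dtype)
    # paste the image into a canvas filled with the background colour
    canvas = np.ones((target_h, target_w, image.shape[2]), dtype=image.dtype) * bg
    top = (target_h - image.shape[0]) // 2
    left = (target_w - image.shape[1]) // 2
    canvas[top:top + image.shape[0], left:left + image.shape[1]] = image
    return canvas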

For the mosaics, do you mean image mosaicing or something else?

@keven4ever: Try this one. First, convert the data into combined (mosaic) images:
https://github.com/killthekitten/kaggle-ds-bowl-2018-baseline/blob/master/rebuild_mosaics.py

Then modify the dataset loading as described in
https://github.com/killthekitten/kaggle-ds-bowl-2018-baseline/issues/3

For the min and max image sizes, did you change anything in the code to make min/max work? My current code only runs when min=max=512 :(. If I change to min=256, max=512, it shows this error:

 File "train.py", line 78, in <module>
    layers='heads')
  File "/home/john/DSB/model.py", line 2000, in train
    use_multiprocessing=True,
  File "/home/john/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/john/.local/lib/python3.5/site-packages/keras/engine/training.py", line 2116, in fit_generator
    val_x, val_y, val_sample_weight)
  File "/home/john/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1426, in _standardize_user_data
    exception_prefix='input')
  File "/home/john/.local/lib/python3.5/site-packages/keras/engine/training.py", line 120, in _standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected input_image to have shape (256, 512, 3) but got array with shape (512, 512, 3)
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x7f6ace931c50>>

@John1231983 thx! Regarding image size, originally I had a problem with zero-size masks when generating bounding boxes in the load_image_gt method, so I added the lines below to go through each mask instance after resizing and augmentation; if all pixels of an instance are 0 (black), it is removed and class_ids are updated correspondingly. This makes sure any invalid mask instances (due to resizing, cropping or any other augmentation) get removed.

    # remove any mask instance that doesn't contain any pixels after augmentation,
    # and keep class_ids aligned with the remaining instances
    keep = ~np.all(mask == 0, axis=(0, 1))
    mask = mask[:, :, keep]
    class_ids = class_ids[keep]

    # Bounding boxes. Note that some boxes might be all zeros
    # if the corresponding mask got cropped out.
    # bbox: [num_instances, (y1, x1, y2, x2)]
    bbox = utils.extract_bboxes(mask)

Sorry, I have fixed the above error. It occurred because I had modified the line

self.IMAGE_SHAPE = np.array(
            [self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM, 3])

to

self.IMAGE_SHAPE = np.array(
            [self.IMAGE_MIN_DIM, self.IMAGE_MAX_DIM, 3])

I changed it back and it now works. I will check whether this is better than using min=512, max=512. As for your code above, I had actually already fixed that.

@paulcx I am trying to implement Focal loss and OHEM. Were you able to do it?

Hi all, I have a problem that I could not find an answer to.

I have an image of size 1024x1024. I randomly crop it to 256x256 for training because of limited GPU resources. At inference I have to make the final prediction at 1024x1024, so I crop the image into 256x256 tiles and slide over it. How can I assemble the final prediction from the results of the cropped tiles?

Hi. I have the same problem in 2020. I use transfer learning from COCO and train on 256x256 images that I cropped from the original 1024x1024 images. The model works well on 256x256 images, but its performance drops significantly when I run it on 1024x1024, even though I change the inference config to IMAGE_MIN_DIM = 1024 and IMAGE_MAX_DIM = 1024. Instances that are not detected in a 1024x1024 test image can be detected if that region is cropped out. I have read the comments in this thread but don't see a simple solution. Do you have one?
