I have trained a Faster R-CNN object detection model on a set of images where the height, the width, or both dimensions of the objects can be small relative to the original image, and the results are not good.
My hypothesis is that after all the convolutions, the remaining feature map may not retain any information about the objects, because their dimensions are too small compared to the dimensions of the image.
So, I am curious to know whether Mask R-CNN will give me good detections, or whether this hypothesis holds for Mask R-CNN as well.
Thanks,
Sanyam Sharma
I'm also trying to figure out this problem.
@Sanyam2095 What is the size of your raw images? What is the size of the mask rcnn input image? How big or small is your dataset?
@fastlater The height and width of the images are 1080 and 960 pixels, and the objects occupy about 10% or less of the image. As for the dataset, there are about 40k images with annotations.
I haven't used Mask R-CNN yet. I want to use it, but I am not sure whether it will give me good results.
Based on my own tests, Mask RCNN should give you good results if the target is clear in the input images. Remember that the script will ask you for the input image size. If you want to reduce the size of your input image, try resizing the image yourself and check whether you can still clearly see the object of interest. I guess you could also use the original size of your image.
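To predict how large your object will be after resizing, here is a minimal sketch of the aspect-preserving resize (scale so the longest side equals the target, e.g. 1024 px). This is a simplification of what the Mask R-CNN scripts do (they also pad to a square); with Pillow you would then call `Image.resize` with these dimensions and inspect the result by eye:

```python
# Compute the dimensions of an image after scaling its longest side to
# target_max while preserving aspect ratio. An object covering ~10% of the
# image shrinks by the same factor, so check it stays recognizable.
def resized_dims(width, height, target_max=1024):
    scale = target_max / max(width, height)
    return round(width * scale), round(height * scale)

# The 1080x960 images from this thread, resized for a 1024-px input:
print(resized_dims(1080, 960))  # (1024, 910)
```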
The problem is not the input image size but the proportion of the image my object of interest covers (less than 10%; imagine something like a keyboard in an image of a room). So changing the input size won't help.
@Sanyam2095 Did you read the first 12 words of my last comment?
@fastlater I read them, but I was just describing the issue I was facing, and input size was not part of it.
Also, can you tell me which tool you used for creating the image masks? I was thinking about using 'Labelme' for annotating images, but it is an image-level annotation tool, and I wanted one that can create masks at the video level.
I will share a progress report after completing the training and checking the results.
I think an object that's 10% of the image is not small at all. With the default configuration, the network should easily handle objects as small as 32x32 pixels. You can also change the anchor sizes and detect objects that are 8x8 pixels or a little smaller. The same applies to Faster RCNN, so if you're not getting good results it might be something in the training data or the hyperparameters you've chosen.
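The anchor-size change mentioned above can be sketched as a config override. In the Matterport repo you would subclass `mrcnn.config.Config`; here a plain class stands in so the example is self-contained, and the class name is hypothetical:

```python
# Minimal sketch of overriding anchor sizes for small objects.
# In the actual repo this class would inherit from mrcnn.config.Config.
class SmallObjectConfig:
    # The repo's default RPN_ANCHOR_SCALES are (32, 64, 128, 256, 512).
    # Shifting the pyramid down lets the RPN propose regions for objects
    # as small as ~8x8 pixels.
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]  # width/height ratios per anchor scale

print(SmallObjectConfig.RPN_ANCHOR_SCALES[0])  # smallest anchor side in px: 8
```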
@fastlater: I guess my above comments were a bit strongly worded. I sincerely apologize for that, it was not my intention but it happened. I really appreciate you helping me out and hope that it won't change in the future :)
Just to clarify my problem more: My dataset is for object detection (PASCAL VOC XMLs) and making one for Mask RCNN will require a lot of manual effort at my side, hence I asked here to gauge whether this effort is even worth it. Now, as you and Waleed have suggested, I will get the data made and train Mask-RCNN on it. I am closing this issue for now. Will ask again in case there is any other concern or update here if I make significant progress with my training. Thanks again for all your help.
What is the relationship between anchor size and small object size?
@zhilaAI, anchor size is the scale at which the network searches for regions of interest. The relationship is directly proportional: the bigger the anchor size, the bigger the objects it can match in an image. You can also look at the anchor ratios, as they determine the aspect ratios the anchors can take.
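To make the scale/ratio relationship concrete, here is a small sketch following the convention used in the Matterport repo's anchor generation (height = scale / sqrt(ratio), width = scale * sqrt(ratio), so the anchor's area stays roughly constant across ratios):

```python
import math

# Width and height (in pixels) of the anchor produced by one scale and one
# width/height ratio, rounded for readability.
def anchor_box(scale, ratio):
    w = scale * math.sqrt(ratio)
    h = scale / math.sqrt(ratio)
    return round(w), round(h)

print(anchor_box(32, 1))    # (32, 32): square anchor
print(anchor_box(32, 2))    # (45, 23): wide anchor, same approximate area
print(anchor_box(32, 0.5))  # (23, 45): tall anchor
```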
@waleedka
When we specify IMAGE_MAX_DIM = 1024, does it mean that all the images, masks, and bounding boxes will be resized according to this size?
I'm also trying to figure out this problem.
For me, the case is extreme.
My images are 640x640 px, and the objects I am trying to detect have a 4-10 px diameter and cover 15-20 px in total.
I wonder if there is a loss of information for such small objects through the FPN pyramid.
There are some hyperparameters that I think can be changed to improve detection of such objects:
BACKBONE_STRIDES
RPN_ANCHOR_SCALES
RPN_ANCHOR_RATIOS
RPN_ANCHOR_STRIDE
but I'm not aware of the exact interpretation of the above parameters. I've read the original Mask R-CNN paper and the comments in the config.py file, but I didn't find what exactly those parameters mean.
Could someone point me to some references so that I could better visualize what those parameters mean, or the effect of changing them?
but I didn't find what exactly those parameters mean.
BACKBONE_STRIDES - I think these are the strides of the backbone feature maps relative to the input image, i.e. the downscaling factors of the FPN levels.
RPN_ANCHOR_SCALES - the side lengths, in pixels, of the square anchors used to generate ROIs. If you have small objects, tune this parameter.
RPN_ANCHOR_RATIOS - the width/height ratios applied to the square anchors when generating ROIs.
RPN_ANCHOR_STRIDE - I don't know what this one means.
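A quick way to see why BACKBONE_STRIDES matters for the 4-10 px objects discussed above: each stride divides the input resolution, so a tiny object only survives on the finest pyramid levels. A minimal sketch for a 640x640 input, assuming the repo's default strides:

```python
# Each backbone stride corresponds to one FPN level; the feature map at
# that level is the input resolution divided by the stride.
IMAGE_DIM = 640
BACKBONE_STRIDES = [4, 8, 16, 32, 64]  # repo defaults for ResNet + FPN

for stride in BACKBONE_STRIDES:
    side = IMAGE_DIM // stride
    print(f"stride {stride}: {side}x{side} feature map")

# A 4 px object is ~1 px wide at stride 4 and smaller than a single
# feature-map cell at every coarser level.
```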