I am training a model on a custom dataset, and I am having difficulty getting the model to identify large features. It is comically bad at it. The bounding box sides of these features range from ~15% to 100% of the image's side length. Using an altered inspect_model notebook, I have found that the RPN fails to predict large candidate regions. In response, I've tried changing the RPN anchor sizes, anchor stride, anchor ratios, etc. Nothing seems to yield much of an improvement. What can I do to make my Mask R-CNN model better identify large features? Thanks.
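As a quick sanity check on the anchor settings, the sketch below enumerates anchor shapes using a common Faster R-CNN-style convention (height = scale / sqrt(ratio), width = scale * sqrt(ratio), where ratio = width / height) and compares the largest anchor side with the 15% to 100% target range. The scales, ratios, and 1024 px image size are hypothetical values chosen for illustration, not this project's actual configuration.

```python
import math

def anchor_shapes(scales, ratios):
    """Return (height, width) for every scale/ratio pair.

    ratio is width / height, so each anchor's area stays at scale ** 2.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            shapes.append((s / math.sqrt(r), s * math.sqrt(r)))
    return shapes

def coverage(scales, ratios, image_size, min_frac, max_frac):
    """Return the largest anchor side and the target side range, in pixels."""
    largest = max(max(h, w) for h, w in anchor_shapes(scales, ratios))
    return largest, (min_frac * image_size, max_frac * image_size)

# Hypothetical settings: five scales, three ratios, 1024 px images.
largest, (lo, hi) = coverage((32, 64, 128, 256, 512), (0.5, 1, 2), 1024, 0.15, 1.0)
print(f"largest anchor side: {largest:.0f}px, target sides: {lo:.0f}-{hi:.0f}px")
```

With these assumed values the largest anchor side is about 724 px, well short of boxes that span the whole image. This only checks geometry; it says nothing about how well the RPN scores those anchors.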
While reading the Faster R-CNN and Mask R-CNN papers, I noticed a point that the authors don't explain clearly.
You can see that the receptive field of the ZF network is only about 171 pixels, and VGG-19's is about 268 pixels (please correct me if I'm wrong). Therefore, no matter how big the anchor box is, it seems unreasonable to expect the RPN to predict a bounding box bigger than the receptive field of the network used to extract the features.
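These receptive-field numbers can be reproduced with the standard recurrence r_out = r_in + (k - 1) * j_in, j_out = j_in * s, where k is the kernel size, s the stride, and j the cumulative stride. The layer lists below are my own transcription of ZF and VGG (padding does not change the result), and the values are theoretical receptive fields, not effective ones. With the RPN's 3x3 window appended, ZF gives 171 and VGG-16 gives 228, the figures the Faster R-CNN paper reports for ZF and VGG-16; counting VGG-19 through pool5 gives 268.

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of (kernel_size, stride) layers."""
    r, j = 1, 1  # receptive field and cumulative stride at the input pixel
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

conv3 = (3, 1)  # 3x3 convolution, stride 1
pool2 = (2, 2)  # 2x2 max pooling, stride 2

# ZF conv1..conv5, followed by the RPN's 3x3 sliding window.
zf_rpn = [(7, 2), (3, 2), (5, 2), (3, 2), conv3, conv3, conv3, conv3]

# VGG-16 up to conv5_3, followed by the RPN's 3x3 sliding window.
vgg16_rpn = ([conv3] * 2 + [pool2] + [conv3] * 2 + [pool2] + [conv3] * 3 + [pool2]
             + [conv3] * 3 + [pool2] + [conv3] * 3 + [conv3])

# VGG-19 up to and including pool5 (no RPN window).
vgg19_pool5 = ([conv3] * 2 + [pool2] + [conv3] * 2 + [pool2] + [conv3] * 4 + [pool2]
               + [conv3] * 4 + [pool2] + [conv3] * 4 + [pool2])

print(receptive_field(zf_rpn), receptive_field(vgg16_rpn), receptive_field(vgg19_pool5))
# 171 228 268
```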
So, to resolve the problem, I think we need to increase the receptive field of the shared network by adding more convolution layers or by increasing their kernel size.
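To put the two options in numbers: each extra k x k, stride-1 convolution placed where the cumulative stride is j adds (k - 1) * j pixels to the receptive field. This sketch assumes a starting point of 228 px at cumulative stride 16 (VGG-16 plus the RPN window) and a hypothetical 1024 px target; `extra_layers_needed` is an illustrative helper, not part of any library.

```python
import math

def extra_layers_needed(current_rf, target_rf, kernel, cumulative_stride):
    """Number of extra k x k, stride-1 conv layers needed to reach target_rf."""
    if current_rf >= target_rf:
        return 0
    growth = (kernel - 1) * cumulative_stride  # pixels added per extra layer
    return math.ceil((target_rf - current_rf) / growth)

# Hypothetical goal: cover a 1024 px image starting from 228 px at stride 16.
print(extra_layers_needed(228, 1024, 3, 16))  # 3x3 kernels
print(extra_layers_needed(228, 1024, 5, 16))  # 5x5 kernels
```

That works out to 25 extra 3x3 layers or 13 extra 5x5 layers. Because the growth per layer scales with j, layers added after further downsampling enlarge the receptive field faster than layers added at stride 16.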
That might explain why the Mask R-CNN implementation by FAIR uses other backbones such as ResNet (which certainly has a much larger receptive field).
There are also some points that I can't understand yet.
If you look at Table 1 in the Faster R-CNN paper, you may notice that the average proposal size learned for each anchor is very large, in many cases larger than the ZF receptive field. This really confuses me.
If you have time, we can discuss this problem further.
What is interesting is that it isn't failing to predict large objects; it's just really bad at it. Its confidence and accuracy are extremely low, so the larger anchors never make it past the threshold cutoffs when proposals are pruned. The solution may be to create a weakly labeled dataset that focuses on the large features, train on it, and then continue training the same weights on the strongly labeled images.
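A toy illustration of the pruning effect described here: if the RPN gives large proposals low scores, a top-k cut followed by a score threshold removes them before they ever reach the heads. `prune_proposals`, `size_bucket`, the thresholds, and the boxes are all hypothetical; real implementations also apply NMS, which this sketch omits.

```python
from collections import Counter

def prune_proposals(boxes, scores, top_k, min_score):
    """Keep the top_k highest-scoring boxes, then drop those below min_score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    return [boxes[i] for i in order[:top_k] if scores[i] >= min_score]

def size_bucket(box, image_size, large_frac=0.15):
    """Label a (y1, x1, y2, x2) box 'large' if its longer side is >= large_frac of the image."""
    y1, x1, y2, x2 = box
    longest = max(y2 - y1, x2 - x1)
    return "large" if longest >= large_frac * image_size else "small"

# Hypothetical proposals on a 1024 px image: small boxes score high, large boxes low.
boxes = [(0, 0, 60, 60), (100, 100, 180, 160), (0, 0, 900, 700), (50, 50, 600, 900)]
scores = [0.92, 0.85, 0.12, 0.08]

kept = prune_proposals(boxes, scores, top_k=3, min_score=0.5)
print(Counter(size_bucket(b, 1024) for b in boxes))  # before pruning
print(Counter(size_bucket(b, 1024) for b in kept))   # after pruning
```

Running the same size-bucket count on the model's real proposals, for example in the altered inspect_model notebook from the question, is one way to see at which stage the large boxes are lost.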
Here is a blog post that discusses choosing proper anchor box sizes: link
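One common way to judge anchor sizes is to compare ground-truth box shapes with anchor shapes while both are centred on the same point (shape-only IoU, ignoring position) and see which boxes reach the 0.7 IoU that Faster R-CNN uses for positive anchors. The square-only anchor list and the ground-truth shapes below are hypothetical, and boxes below 0.7 can still get a positive anchor through the best-match rule.

```python
def shape_iou(a, b):
    """IoU of two (height, width) boxes that share the same centre."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def best_anchor_iou(box, anchors):
    """Highest shape-only IoU between one box and any anchor."""
    return max(shape_iou(box, a) for a in anchors)

# Hypothetical square anchors and ground-truth shapes on a 1024 px image.
anchors = [(s, s) for s in (32, 64, 128, 256, 512)]
gt = [(160, 160), (500, 520), (1000, 1000), (1024, 600)]

for box in gt:
    iou = best_anchor_iou(box, anchors)
    print(box, round(iou, 2), "ok" if iou >= 0.7 else "below 0.7")
```

In this toy setup only the roughly 500 px box clears 0.7; both the 160 px box, which falls between two anchor scales, and the image-sized boxes do not.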
By the way, if you replace your feature extractor with a bigger network such as ResNet, could you post the results here?