Vision: Pretrained MobileNet-V2 Backbones for Segmentation Tasks

Created on 23 Sep 2020 · 8 Comments · Source: pytorch/vision

🚀 Feature

Provide a pretrained MobileNet-V2 backbone for

  • Instance Segmentation
  • Semantic Segmentation

Motivation

As ecosystems mature and more object detection and segmentation libraries are released, they typically leverage Torchvision pretrained models, or use Torchvision as a template for which architectures to support (at least when they start out).
MMDetection and MMSegmentation are two great examples of this.

Heavy ResNet-like backbones aren't feasible to run on mobile devices, and training a model on COCO from scratch is hard because of the sheer resources it demands. Having a pretrained model would enable quicker experimentation and broaden PyTorch's impact overall.

By providing a MobileNet backbone, I think Torchvision would have a significant cascading impact on the extended PyTorch ecosystem.

Pitch

Train Faster R-CNN, Mask R-CNN and DeepLabV3 models with a MobileNet-V2 backbone using the canonical PyTorch scripts. Given the existing pretrained models, it may also make sense to add R-CNN, Keypoint R-CNN and FCN to that list.

Alternatives

  1. Wait for Ross Wightman to add segmentation models to https://github.com/rwightman/efficientdet-pytorch
  2. TensorFlow and its extended ecosystem

cc @vfdev-5

Labels: help wanted, models, semantic segmentation

All 8 comments

Seconded - MobileNet Backbones would be very helpful

Torchvision provides MobileNetV2, and you can use it as a backbone for Faster R-CNN and Mask R-CNN.
I have used them. You need to do the following:

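import torchvision
from torchvision.models.detection import FasterRCNN, MaskRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

num_classes = 91  # placeholder: your dataset's class count incl. background (91 matches torchvision's COCO models)
mobile_net = torchvision.models.mobilenet_v2(pretrained=True)
# print(mobile_net.features) # From that I got the output channels for mobilenet
ft_backbone = mobile_net.features
ft_backbone.out_channels = 1280  # FasterRCNN/MaskRCNN read this attribute
ft_mean = [0.485, 0.456, 0.406]  # ImageNet normalization statistics
ft_std = [0.229, 0.224, 0.225]
# The backbone returns a single feature map, so use one anchor level and pool only map "0"
anchor_gen = AnchorGenerator(sizes=((32, 64, 128, 256, 512),), aspect_ratios=((0.5, 1.0, 2.0),))
box_pool = MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2)
ft_model = FasterRCNN(backbone=ft_backbone, num_classes=num_classes, image_mean=ft_mean, image_std=ft_std, rpn_anchor_generator=anchor_gen, box_roi_pool=box_pool)
# The same backbone should work for MaskRCNN, which additionally needs a mask RoI pooler
mask_pool = MultiScaleRoIAlign(featmap_names=["0"], output_size=14, sampling_ratio=2)
ft_model = MaskRCNN(backbone=ft_backbone, num_classes=num_classes, image_mean=ft_mean, image_std=ft_std, rpn_anchor_generator=anchor_gen, box_roi_pool=box_pool, mask_roi_pool=mask_pool)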

This should give you Faster R-CNN with a MobileNetV2 backbone, though without an FPN.
I have tried all the torchvision classification models as backbones for Faster R-CNN here.

I'm unsure whether this extends to Keypoint R-CNN and DeepLabV3.

@oke-aditya I was under the impression that for maximum performance, the backbone should be fine-tuned on the object detection/segmentation dataset as well. Torchvision provides such ResNet backbones.

Besides, the code you provided above uses a backbone pretrained on ImageNet, while the rest of the network isn't pretrained.

Am I reading this correctly?
Thanks

Yes, the backbone is pretrained on ImageNet.
For the ResNet FPN models, torchvision provides weights pretrained on the COCO dataset.

COCO weights can be obtained by fine-tuning these ImageNet weights on the COCO dataset.

You can either fine-tune both the backbone and the Faster R-CNN layers, or freeze the backbone and fine-tune only the Faster R-CNN layers. Both are possible.

I'm not sure which of the two gives the best performance. Even the ResNet FPN weights were obtained by fine-tuning ImageNet weights on COCO.

My feature request is precisely that torchvision provide this :)

Quoting: "COCO weights can be obtained by fine-tuning these ImageNet weights on the COCO dataset."

Just a minor clarification: the torchvision segmentation models, FCN and DeepLabV3, can be initialized with weights pretrained on the COCO train2017 dataset.

The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset.

https://pytorch.org/docs/stable/torchvision/models.html#semantic-segmentation

Let us see if we can retrain segmentation and detection models with MobileNetV2 backbone.

Thanks @vfdev-5, that would be amazing.
