Vision: Pretrained MobileNet-V2 Backbones for Segmentation Tasks

Created on 23 Sep 2020 · 8 Comments · Source: pytorch/vision

🚀 Feature

Provide a pretrained MobileNet-V2 backbone for:

Instance Segmentation
Semantic Segmentation

Motivation

As ecosystems mature and more object detection / segmentation libraries are released, they always leverage Torchvision pretrained models, or use Torchvision as a template for which architectures are supported (at least when they start out).
MMDetection and MMSegmentation are two great examples of this.

Heavy ResNet-like backbones aren't feasible to run on mobile devices, and it's rough to train a model on COCO from scratch because of the sheer resources it demands. Having a pretrained model would facilitate quicker experimentation and broader PyTorch impact overall.

By providing a MobileNet backbone, I think Torchvision would have a significant cascading impact on the extended PyTorch ecosystem.

Pitch

Train Mask R-CNN and DeepLabV3 models with a MobileNet-V2 backbone using canonical PyTorch scripts. Perhaps it also makes sense to add Faster R-CNN, Keypoint R-CNN and FCN to that list, given the existing pretrained models.

Alternatives

Wait for Ross Wightman to add segmentation models to https://github.com/rwightman/efficientdet-pytorch
TensorFlow and its extended ecosystem

cc @vfdev-5

help wanted models semantic segmentation

Source

rsomani95

❤3

All 8 comments

Seconded - MobileNet Backbones would be very helpful

vade on 23 Sep 2020

Torchvision supports MobileNetV2, and you can use it as a backbone for Faster R-CNN and Mask R-CNN.
I have used it this way. You need to do the following:

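import torchvision
from torchvision.models.detection import FasterRCNN, MaskRCNN
from torchvision.models.detection.rpn import AnchorGenerator

num_classes = 2  # placeholder: set to your number of classes, background included

mobile_net = torchvision.models.mobilenet_v2(pretrained=True)
# print(mobile_net.features)  # From that I got the output channels for MobileNet
ft_backbone = mobile_net.features
ft_backbone.out_channels = 1280  # FasterRCNN reads this attribute from the backbone
ft_mean = [0.485, 0.456, 0.406]
ft_std = [0.229, 0.224, 0.225]
# The backbone returns a single feature map, so pass one tuple of anchor sizes;
# newer torchvision releases check that anchor levels match the number of feature maps.
anchor_gen = AnchorGenerator(sizes=((32, 64, 128, 256, 512),), aspect_ratios=((0.5, 1.0, 2.0),))
ft_model = FasterRCNN(backbone=ft_backbone, num_classes=num_classes, rpn_anchor_generator=anchor_gen, image_mean=ft_mean, image_std=ft_std)
# I guess this should work for MaskRCNN as well (kept in a separate variable)
mask_model = MaskRCNN(backbone=ft_backbone, num_classes=num_classes, rpn_anchor_generator=anchor_gen, image_mean=ft_mean, image_std=ft_std)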

This should give you Faster R-CNN with a MobileNetV2 backbone. You won't get FPN with this, though.
I have tried all the vision models as backbones for FRCNN here.

Unsure if this extends to Keypoint RCNN and DeepLab v3.

oke-aditya on 23 Sep 2020

👍1

@oke-aditya I was under the impression that for max performance, the backbone should be finetuned on the object detection/segmentation dataset as well. Torchvision provides such ResNet backbones.

Besides, the code you provided above would be using a backbone pretrained on ImageNet, but the rest of the network isn't pretrained.

Am I reading this correctly?
Thanks

rsomani95 on 23 Sep 2020

Yes. The backbone is pretrained on ImageNet.
For ResNet FPN models, torchvision provides weights pretrained on the COCO dataset.

COCO weights can be obtained by fine tuning these ImageNet weights on COCO Dataset.

You can fine-tune both the backbone and the FRCNN layers, or freeze the backbone and fine-tune only the FRCNN layers. Both are possible.

oke-aditya on 23 Sep 2020

👍1

I'm not sure which is better for max performance. Even the ResNet FPN weights were obtained by fine-tuning ImageNet weights on COCO.

oke-aditya on 23 Sep 2020

"COCO weights can be obtained by fine tuning these ImageNet weights on COCO Dataset."

My feature request is precisely that torchvision provide this :)

rsomani95 on 23 Sep 2020

👍1

Just a minor clarification: the segmentation models in torchvision, FCN and DeepLabV3, can be initialized with weights pretrained on the COCO train2017 dataset.