Vision: [RFC] Model for segmentation/detection problem

Created on 16 Feb 2018 · 8 Comments · Source: pytorch/vision

Currently, torchvision offers state-of-the-art architectures and pretrained networks, mainly for classification.

I was wondering if there is any interest in adding segmentation and detection models such as:

Mask R-CNN
RetinaNet
Fast R-CNN
SegNet
etc ...

which could reuse the architectures that are already available (ResNet, VGG, etc.).

Labels: enhancement, models, object detection, semantic segmentation

glemaitre

👍7

All 8 comments

Maybe there is something interesting to share from this repo

vfdev-5 on 17 Feb 2018

@vfdev-5 Yep, there are quite a lot of repos around:

https://github.com/mrgloom/awesome-semantic-segmentation

So basically, I am wondering whether it would be worth upstreaming the state-of-the-art networks to torchvision. Since the encoders are already there, we could take model.features and plug it into the decoder of each segmentation net.
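
A minimal sketch of that idea, under stated assumptions: `TinyEncoder` is a hypothetical stand-in for a classification network that exposes a `features` attribute (as some torchvision models such as VGG do), and `SimpleSegmenter` is an illustrative name, not an existing torchvision class. The "decoder" here is only a 1x1 convolution followed by bilinear upsampling, which is far simpler than the decoders of the networks listed above.

```python
import torch
from torch import nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a classification net exposing `.features`."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 1/2 of the input resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 1/4 of the input resolution
        )


class SimpleSegmenter(nn.Module):
    """Reuses an encoder's feature extractor and adds a per-pixel head."""

    def __init__(self, encoder, encoder_channels, num_classes):
        super().__init__()
        self.encoder = encoder.features
        self.head = nn.Conv2d(encoder_channels, num_classes, kernel_size=1)

    def forward(self, x):
        size = x.shape[-2:]
        logits = self.head(self.encoder(x))
        # Upsample the coarse logits back to the input resolution.
        return F.interpolate(logits, size=size, mode="bilinear", align_corners=False)


model = SimpleSegmenter(TinyEncoder(), encoder_channels=32, num_classes=5)
out = model(torch.randn(2, 3, 64, 64))
print(out.shape)  # torch.Size([2, 5, 64, 64])
```

With a real pretrained encoder, `encoder_channels` would have to match the channel count of the last feature map of that network.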

glemaitre on 17 Feb 2018

👍1

Yes, that's a good idea!

Some U-Net-like networks use bridges between the encoder and the decoder; any idea how to get these from model.features?

Edit: actually this can be done with register_forward_hook. ref
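
A small sketch of the hook approach, assuming a hypothetical `features` module built by hand as a stand-in for `model.features`; the layer indices and the names `stage1` and `stage2` are illustrative only. `register_forward_hook` is a standard `torch.nn.Module` method, and the hook stores each tapped layer's output so a decoder could consume it as a bridge.

```python
import torch
from torch import nn

# Hypothetical stand-in for `model.features` of a classification network.
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),   # index 0
    nn.ReLU(),                                   # index 1
    nn.MaxPool2d(2),                             # index 2
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # index 3
    nn.ReLU(),                                   # index 4
    nn.MaxPool2d(2),                             # index 5
)

skips = {}


def save_output(name):
    """Build a forward hook that stores the module's output under `name`."""
    def hook(module, inputs, output):
        skips[name] = output
    return hook


# Tap the activations right before each pooling step.
handles = [
    features[1].register_forward_hook(save_output("stage1")),
    features[4].register_forward_hook(save_output("stage2")),
]

bottleneck = features(torch.randn(1, 3, 32, 32))

for name, tensor in skips.items():
    print(name, tuple(tensor.shape))
print("bottleneck", tuple(bottleneck.shape))

# Remove the hooks once they are no longer needed.
for handle in handles:
    handle.remove()
```

The tapped layers should not be followed by in-place operations, otherwise the stored tensors may be modified after the hook runs.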

vfdev-5 on 17 Feb 2018

Yes, I agree that it would be good to extend torchvision to cover other domains (not only classification on ImageNet). But I'm unsure whether we should host the models here, because that would probably require training code for the models as well.
So here are my thoughts:

we have all the reusable/generic components in torchvision
training code with pre-trained models lives in a separate repo

What are your thoughts on that?

fmassa on 5 Mar 2018

There are indeed many parts in detection models. I'm working with the Single Shot MultiBox Detector (SSD), and planning to add it to torchvision. (see #440)

I'm thinking about adding this folder to torchvision/models:

ssd/
    ssd.py              model, layers
    box_coder.py        bounding box encoding/decoding using prior boxes
    multibox_loss.py    criterion
    utils.py            (some parts may later be integrated in the library)
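
To illustrate what a box coder of this kind typically does, here is a plain-Python sketch of the offset parameterization used in common SSD implementations. It is not the content of the proposed `box_coder.py`: the function names `encode` and `decode`, the variance values (0.1, 0.2), and the box layouts are assumptions made for the example.

```python
import math

# Assumed scaling factors, borrowed from common SSD implementations.
VARIANCES = (0.1, 0.2)


def encode(box, prior, variances=VARIANCES):
    """Encode a corner-form box (xmin, ymin, xmax, ymax) as offsets
    relative to a center-form prior (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    pcx, pcy, pw, ph = prior
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    w, h = xmax - xmin, ymax - ymin
    return (
        (cx - pcx) / (variances[0] * pw),
        (cy - pcy) / (variances[0] * ph),
        math.log(w / pw) / variances[1],
        math.log(h / ph) / variances[1],
    )


def decode(offsets, prior, variances=VARIANCES):
    """Invert `encode`: turn offsets back into a corner-form box."""
    dx, dy, dw, dh = offsets
    pcx, pcy, pw, ph = prior
    cx = pcx + dx * variances[0] * pw
    cy = pcy + dy * variances[0] * ph
    w = pw * math.exp(dw * variances[1])
    h = ph * math.exp(dh * variances[1])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


prior = (0.5, 0.5, 0.2, 0.4)      # center-form prior box
box = (0.35, 0.30, 0.60, 0.75)    # corner-form ground-truth box
offsets = encode(box, prior)
restored = decode(offsets, prior)
print(offsets)
print(restored)
```

Decoding the encoded offsets recovers the original box up to floating-point error; a real implementation would operate on batched tensors rather than tuples.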

Then, we could add the data augmentation code in torchvision/transforms and the dataset management in torchvision/datasets. In my opinion, the training/evaluation code would become compact enough to have it in a single script, in pytorch/examples.

lemairecarl on 6 Mar 2018

👍1

@lemairecarl I think this looks reasonable.
We might want to reorganize the models by task though, because we would otherwise be mixing classification models with detection / segmentation models, so a hierarchy might be clearer.

Also, will the standard be to have the models trained on COCO instead of Pascal? I think this is reasonable, and should be enforced.

fmassa on 7 Mar 2018

👍1

I agree.

I've discussed this with Max deGroot (https://github.com/amdegroot), maintainer of the ssd.pytorch repo. They already had plans to integrate their implementation into torchvision, so I will work with them to accelerate and facilitate the process.

I'll keep you posted.

lemairecarl on 7 Mar 2018

👍1

We have added detection models (Faster R-CNN, Mask R-CNN, Keypoint R-CNN), segmentation models (FCN and DeepLabV3) and reference training / evaluation scripts in TorchVision 0.3, see https://github.com/pytorch/vision/pull/898 and https://github.com/pytorch/vision/pull/820

fmassa on 18 Jul 2019
