Vision: RegNet in torchvision?

Created on 8 Sep 2020 · 18 comments · Source: pytorch/vision

🚀 Feature

Add RegNet trunks in torchvision

Motivation

RegNets were proposed in this paper: https://arxiv.org/pdf/2003.13678.pdf. They show very interesting performance and speed. They have been open sourced already, but are not usable in a straightforward way for people used to having reference models in torchvision. Another implementation is available in ClassyVision (I'm a co-author of this one), but it does not cover all use cases.

Pitch

Start from the ClassyVision RegNet support and implement RegNets in torchvision.

Alternatives

Let users use RegNets from external implementations

Additional context

This has been discussed with @pdollar, one of the RegNet authors. CC @fmassa

cc @vfdev-5

Labels: feature request, models, classification


All 18 comments

@blefaudeux thanks for the suggestion! Which tasks do you have in mind for this model: at least classification, right?

> Another implementation is available in ClassyVision (I'm a co-author of this one), but it does not cover all use cases.

Could you detail which use cases the ClassyVision implementation does not cover?

Would you like to draft a PR for that? Otherwise, I or someone else can do it.

> @blefaudeux thanks for the suggestion! Which tasks do you have in mind for this model: at least classification, right?

> Another implementation is available in ClassyVision (I'm a co-author of this one), but it does not cover all use cases.

> Could you detail which use cases the ClassyVision implementation does not cover?

Oh, I just meant that not everyone is using ClassyVision, obviously; for instance, I came across users telling me that they were sticking to EfficientNets or ResNets because they were only willing to consider torchvision.

> Would you like to draft a PR for that? Otherwise, I or someone else can do it.

I'm not sure what there is to know for a model to be supported by torchvision, apart from the raw code (which I can handle indeed, or anybody else; no preference). Are there license prerequisites, pre-trained models, authorship constraints (validation from the original authors?), things like that? I don't have that much time right now, so if the requirements are clear (or minimal :)) I can handle it starting from the implementation in ClassyVision; if some know-how is required, I would gladly stay around to assist but not do it myself.

> Are there license prerequisites, pre-trained models, authorship constraints (validation from the original authors?), things like that?

Excellent question. I'd say we have to provide the model's implementation and ImageNet pretrained weights. In the docstring we provide some information about the model, a link to the paper, etc. For example, MNasNet:
https://github.com/pytorch/vision/blob/190a5f8a32f9ae775d4379c55db7e2deb6eada6b/torchvision/models/mnasnet.py#L204-L208
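
Following that convention, a RegNet entry point in torchvision might look roughly like the sketch below. The function name, the docstring wording, and the `_regnet` builder are hypothetical placeholders for illustration, not the eventual torchvision API:

```python
def _regnet(arch, pretrained, progress, **kwargs):
    # Placeholder builder: the real version would construct the model and,
    # if pretrained is True, fetch weights (e.g. via load_state_dict_from_url).
    return {"arch": arch, "pretrained": pretrained, **kwargs}


def regnet_y_400mf(pretrained=False, progress=True, **kwargs):
    """RegNetY-400MF from
    `"Designing Network Design Spaces" <https://arxiv.org/abs/2003.13678>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _regnet("regnet_y_400mf", pretrained, progress, **kwargs)
```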

ImageNet pretrained weights often come from retraining, but there are cases where they were converted from, say, TF weights. If torchvision's implementation is a copy of the ClassyVision one, maybe we can reuse their weights if there are any...

@fmassa can you also comment on this question, please?

> I don't have that much time right now, so if the requirements are clear (or minimal :)) I can handle it starting from the implementation in ClassyVision; if some know-how is required, I would gladly stay around to assist but not do it myself.

No worries. I'll send a PR where I copy and adapt the implementation from ClassyVision, and you can check whether it is correct.

> ImageNet pretrained weights often come from retraining, but there are cases where they were converted from, say, TF weights. If torchvision's implementation is a copy of the ClassyVision one, maybe we can reuse their weights if there are any...

I could provide reference weights from a ClassyVision ImageNet training; that's fairly easy to reproduce (if we're OK with limiting this to some members of the RegNet family, probably not all of them :)). Another option is to translate the weights in the pycls repo/model zoo, but that's some work because the model definition is not exactly the same (even if the actual underlying architecture is, of course). CC @mannatsingh from Classy

Yes, all model weights are available in pycls; not sure how easy it is to convert them to the Classy Vision format. Link: https://github.com/facebookresearch/pycls/blob/master/MODEL_ZOO.md

@blefaudeux @pdollar thanks for the details! So we would prefer to have certain RegNet families implemented here as in Classy rather than in the pycls format, right? I'll let you define the families you think are most interesting for users.
If there are families we would like to include but you do not have the weights for, we can retrain them too.

It would probably be helpful for the future to add some info here in torchvision on why we preferred one implementation (Classy) over the other (pycls), if they are a bit different...

> @blefaudeux @pdollar thanks for the details! So we would prefer to have certain RegNet families implemented here as in Classy rather than in the pycls format, right? I'll let you define the families you think are most interesting for users.
> If there are families we would like to include but you do not have the weights for, we can retrain them too.

> It would probably be helpful for the future to add some info here in torchvision on why we preferred one implementation (Classy) over the other (pycls), if they are a bit different...

Sorry for the imprecision; I forgot that the context is clearly not trivial, so let me try to address that:

  • the pycls implementation is the original one, straight from FAIR (research); it exposes a lot of features that were mostly there for experimentation (architecture search) but are not useful once the best models in the family have been found
  • the ClassyVision implementation aimed to be more production-ready and a little easier to read, with all the code needed to reproduce the paper but no more
  • what I meant by "not all the RegNets" above is that RegNets define a family of models, each fully defined by a couple of coefficients, loosely comparable to ResNeXt for instance (change the width, depth, etc., even if the scaling is a little more complex here). One just has to decide on a couple of models whose weights would be provided, for instance models comparable to RN50 or RN101; covering each and every data point may not be needed. Additionally, RegNets can scale to very big vision models, 128GF or more, and training these would be very costly and probably not needed by torchvision users, or at least that was my assumption.
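
To make the "couple of coefficients" concrete: in the paper, every member of the family is generated from just four numbers (initial width w_0, slope w_a, width multiplier w_m, and depth). The sketch below shows that parameterization, using the values usually quoted for RegNetX-200MF purely as an example:

```python
import math

def generate_regnet_widths(w_0, w_a, w_m, depth, q=8):
    """Per-block widths from the RegNet parameterization
    (Radosavovic et al., 2020), quantized to multiples of q."""
    widths = []
    for j in range(depth):
        u = w_0 + w_a * j                            # continuous width for block j
        k = round(math.log(u / w_0) / math.log(w_m)) # quantization exponent
        w = w_0 * w_m ** k                           # quantized width
        widths.append(int(round(w / q) * q))         # round to a multiple of q
    return widths

# Example: parameters commonly quoted for RegNetX-200MF
widths = generate_regnet_widths(w_0=24, w_a=36.44, w_m=2.49, depth=13)
stage_widths = sorted(set(widths))  # [24, 56, 152, 368]
```

Consecutive blocks sharing a width form a stage, so the per-stage depths fall out of the same four numbers.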

@blefaudeux thanks for the explanation! I see the context better now. Yes, this sounds good!

> One just has to decide on a couple of models whose weights would be provided, for instance models comparable to RN50 or RN101

If you have an idea of which RegNetX and RegNetY variants it would be good to provide pretrained weights for. Maybe those from the paper's Tables 5 and 6:

  • REGNETX-3.2GF
  • REGNETX-6.4GF
  • REGNETX-12GF
  • REGNETX-4.0GF
  • REGNETX-8.0GF
  • REGNETY-400MF
  • REGNETY-600MF
  • REGNETY-800MF
  • REGNETY-1.6GF
  • REGNETY-4.0GF
  • REGNETY-8.0GF

What do you think?

Looks good to me! Probably worth writing a small tool to transcribe the weights between pycls and this more streamlined implementation; it's something I could do. Not sure whether @mannatsingh has something handy around to help with that?

@blefaudeux I don't have anything available to convert pycls weights to classy, unfortunately.
While you should be able to get similar results by training the models from scratch in Classy (we verified this for a few models), it might be easier to just convert the weights available in pycls.

What is the motivation for only including a subset of the models? Is it because they have to be retrained?

The reason I ask is that the benefit of RegNets is that they give good-accuracy models across a wide range of flop regimes, as opposed to, say, ResNet, which is typically only optimized for a narrow range of 4GF-12GF (ResNet50-ResNet152). On the other hand, RegNets can be good at very small sizes (200MF) and very large sizes (32GF). The very small and very large models are potentially the most interesting (for, say, mobile and state-of-the-art research). So I would advocate including the full range of models if possible.

It may be better to figure out how to convert pycls weights to Classy Vision weights. I don't know Classy Vision well, but I can't imagine it being that hard?!? (famous last words :P)

@pdollar it's not so much an issue with Classy, actually; it's just that some names changed between the pycls and Classy implementations of the RegNets. I should be able to fix that by loading the weights, mapping the names, and saving again. I'm just a bit wary of something really subtle there, but from a distance it should not be too hard indeed, and it's probably the best thing to do.
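
That load/rename/save round trip is mechanical once the name differences are listed. A minimal sketch of the key-remapping step; the rename rules below are made-up placeholders, since the actual pycls-to-Classy differences would have to be read off the two module definitions:

```python
import re

def rename_state_dict_keys(state_dict, rules):
    """Return a new state dict with each key rewritten by the first
    matching (pattern, replacement) rule; unmatched keys pass through."""
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, repl in rules:
            if re.search(pattern, key):
                new_key = re.sub(pattern, repl, key)
                break
        out[new_key] = value
    return out

# Hypothetical rules, for illustration only.
RULES = [
    (r"^stem\.conv\.", "stem.stem.conv."),
    (r"^s(\d+)\.", r"trunk.block\1."),
]

# With PyTorch, the input dict would come from torch.load(checkpoint) and the
# output would be fed to model.load_state_dict for verification.
converted = rename_state_dict_keys(
    {"stem.conv.weight": 0, "s1.b1.proj.weight": 1}, RULES
)
```

Running the same input through both models afterwards and comparing outputs is the cheap way to catch the "really subtle" mismatches mentioned above.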

Hi,

I'm not sure we would be ready to add RegNet to torchvision yet.

We generally have a requirement on the number of citations of the paper introducing a model before we include it in torchvision, similar to what we do for PyTorch. We can reconsider this decision in 6 months.

Users can obtain RegNet variants by using PyTorch Image Models: https://github.com/rwightman/pytorch-image-models

Thanks for the context @fmassa .

@blefaudeux one more thing to clarify; in your original note, you mention:

> Another implementation is available in ClassyVision (I'm a co-author of this one), but it does not cover all use cases.

From what I understand, torchvision's implementations are even more strict (even fewer configuration parameters allowed, if any); @fmassa can correct me if I'm wrong. Also, you should be able to generate any RegNet using the Classy implementation, so I wouldn't want anyone reading this issue to get the wrong impression :)

@mannatsingh ah ok, that's not what I meant by "does not cover all use cases". I just meant that not every user of torchvision is using Classy Vision, obviously (MoCo randomly comes to mind; they support torchvision out of the box but that's all), so _having an implementation in Classy was not enough to make RegNets truly accessible to the PyTorch ecosystem_.

I did not know about PyTorch Image Models, linked above by @fmassa; it seems I'm not the only one, but if it does the trick then why not. I personally think that the citation metric can easily be gamed, but that's probably not a good enough incentive.

We try to keep the model implementations fairly simple -- the space of configurations for a model is potentially infinite, and trying to expose too many options can make things very hard to understand for users.

> I personally think that the citation metric can easily be gamed, but that's probably not a good enough incentive.

I agree that citations per se are not a perfect metric. But given the amount of research and activity around computer vision nowadays, with hundreds of papers every year claiming SOTA, we need some metric to define what should be in torchvision or not; otherwise we will end up with hundreds of models, each of which was SOTA at its respective submission time. And being SOTA involves not only architectural changes to the model but also the training recipe.

We will be adding more information about the criteria for a model / op to be included in torchvision to the CONTRIBUTING.md file, and we have an issue tracking it: https://github.com/pytorch/vision/issues/2651. Thanks for the discussion; let us know if there is anything you disagree with or would like to add to the discussion.

Let's keep this issue open for now to track RegNets.

