Vision: [Feature request] RetinaNet with TorchVision 0.3.0

Created on 22 Jul 2019 · 8 Comments · Source: pytorch/vision

I have seen that Faster-RCNN and Mask-RCNN have recently been integrated into TorchVision, including a lot of code to support other networks like RetinaNet as well. Are there any plans to integrate RetinaNet into TorchVision?

Although some other repositories implement RetinaNet, they contain algorithms that are now available in TorchVision. In my opinion, it would be preferable to have a RetinaNet implementation based on the well-documented and long-term-maintained TorchVision instead of a custom repository. The maskrcnn-benchmark repository also features a different API compared to TorchVision (BoxList classes instead of normal tensors for bounding boxes).
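For illustration, here is a minimal sketch of the tensor-based style referred to above, assuming the TorchVision detection convention of boxes stored as a plain float tensor of shape [N, 4] in (x1, y1, x2, y2) format, bundled with labels in one dict per image. The sample values and the `box_area` helper are made up for this example and are not taken from TorchVision.

```python
import torch

# Two boxes for one image, as a plain float tensor in (x1, y1, x2, y2) format.
boxes = torch.tensor([[10.0, 20.0, 50.0, 80.0],
                      [30.0, 30.0, 60.0, 90.0]])
labels = torch.tensor([1, 2], dtype=torch.int64)

# Per-image target dict in the style the TorchVision detection models accept.
target = {"boxes": boxes, "labels": labels}


def box_area(b: torch.Tensor) -> torch.Tensor:
    """Area of each box (hypothetical helper, ordinary tensor arithmetic only)."""
    return (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])


areas = box_area(target["boxes"])
```

Because the boxes are ordinary tensors, slicing, arithmetic, and device transfers work on them directly, with no wrapper class in between.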

enhancement help wanted models object detection

JohannesBrx

Most helpful comment

@JohannesBrx we might add support for RetinaNet in future releases. As you have mentioned, most of the necessary building blocks are already in torchvision, so it wouldn't be that much more work.

For the next release of torchvision, at the beginning of August, we are focusing on adding support for video models (including video readers). We might look into adding RetinaNet support in later versions (but contributions are also welcome!)

fmassa on 23 Jul 2019

👍4

All 8 comments

Apologies if this is hijacking the issue, but I would like to work on implementing retinanet in torchvision. I have been the main contributor on keras-retinanet, and for a new project I am looking into using pytorch. I would like to spend time implementing retinanet using torchvision. Is this still a requested feature? Has there been any progress, or are there constraints I should be aware of?

hgaiser on 6 Dec 2019

@fmassa any objection if I start working on this?

hgaiser on 11 Dec 2019

@hgaiser sorry for the delay in replying.

The only constraint for now is that torchvision models are torchscript-compatible, and we would love to keep it that way for future models. This might add some extra complexity while training the models.

Apart from that, adding a RetinaNet version in torchvision is definitely a welcome feature. You might want to have a look at the implementation in maskrcnn-benchmark to get some initial inspiration.

Let me know if you need further pointers.

fmassa on 19 Dec 2019

> The only constraint for now is that torchvision models are torchscript-compatible, and we would love to keep it that way for future models. This might add some extra complexity while training the models.

I will look into this, but I aim to first have a running version. As a rough outline, what would it mean to be torchscript-compatible?

> Apart from that, adding a RetinaNet version in torchvision is definitely a welcome feature. You might want to have a look at the implementation in maskrcnn-benchmark to get some initial inspiration.

I see. This is an outdated implementation of retinanet I take it?

> Let me know if you need further pointers.

For now I think I've got my hands full. I have the advantage of knowing the retinanet network well, but I'm still getting accustomed to pytorch and torchvision :). When I have something to share I will open a pull request so that we can continue the discussion there. Thanks for the offer!

hgaiser on 19 Dec 2019

@hgaiser

> As a rough outline, what would it mean to be torchscript-compatible?

Being torchscript-compatible means that the model can be converted to run in C++ (and on mobile devices) without having to modify the code. But this also means that not all Python constructs are supported.
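As a rough illustration of what that looks like in practice, here is a minimal sketch using `torch.jit.script`. The `TinyHead` module is a hypothetical toy, not code from torchvision; it only shows that the forward pass needs type annotations and a restricted subset of Python so that it can be compiled.

```python
from typing import List

import torch
from torch import nn, Tensor


class TinyHead(nn.Module):
    """Hypothetical toy head, only here to demonstrate scripting."""

    def __init__(self, in_channels: int, num_outputs: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_outputs, kernel_size=3, padding=1)

    def forward(self, features: List[Tensor]) -> List[Tensor]:
        # Explicit annotations tell TorchScript the input and output types.
        outputs: List[Tensor] = []
        for f in features:
            outputs.append(self.conv(f))
        return outputs


head = TinyHead(in_channels=8, num_outputs=4)
scripted = torch.jit.script(head)  # compile the module to TorchScript

# One feature map per pyramid level (random data, two levels).
features = [torch.rand(1, 8, 16, 16), torch.rand(1, 8, 8, 8)]
eager_out = head(features)
scripted_out = scripted(features)
```

The scripted module should produce the same outputs as the eager one, and it can be saved with `scripted.save(...)` and loaded from C++ via `torch::jit::load`. Constructs that TorchScript cannot type statically (for example, returning different types from different branches) would cause `torch.jit.script` to raise an error.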

> I see. This is an outdated implementation of retinanet I take it?

The implementation in maskrcnn-benchmark can be used as a reference, but it is not maintained anymore. Parts of the RPN implementation in torchvision are similar to RetinaNet, and I think that most of the code should be shared. Basically, I think that we should only slightly modify the implementation in RPN to support RetinaNet, see https://github.com/facebookresearch/maskrcnn-benchmark/pull/102

> When I have something to share I will open a pull request so that we can continue the discussion there. Thanks for the offer!

Sure! And let us know if you have other questions, regarding design for example, so that we can iterate quicker. One thing to keep in mind is that the code should work for training as well.

fmassa on 20 Dec 2019

> Being torchscript-compatible means that the model can be converted to run in C++ (and on mobile devices) without having to modify the code. But this also means that not all Python constructs are supported.

I feel I will cross this bridge once the python implementation is done ;D

> The implementation in maskrcnn-benchmark can be used as a reference, but it is not maintained anymore. Parts of the RPN implementation in torchvision are similar to RetinaNet, and I think that most of the code should be shared. Basically, I think that we should only slightly modify the implementation in RPN to support RetinaNet, see facebookresearch/maskrcnn-benchmark#102

I'm currently trying to re-use as much of the existing code as I can (extending GeneralizedRCNN, for example), but I have some doubts about whether that makes sense in some parts. I am planning to write a proposal for the design I feel makes sense and then open a PR as a platform for discussion. I hope to have that done today. My goal is to have the general idea implemented; testing the model will probably happen at a later time. Would this work for you, or would you like to be involved in the design at an earlier stage?