I have seen that Faster-RCNN and Mask-RCNN have recently been integrated into TorchVision, including a lot of code to support other networks like RetinaNet as well. Are there any plans to integrate RetinaNet into TorchVision?
Although some other repositories implement RetinaNet, they contain algorithms which are now available in TorchVision. In my opinion, it would be preferable to have a RetinaNet implementation based on the well-documented and long-term-maintained TorchVision instead of a custom repository. The maskrcnn-benchmark repository, for example, has a different API from TorchVision (BoxList classes instead of plain tensors for bounding boxes).
@JohannesBrx we might add support for RetinaNet in future releases. As you have mentioned, most of the necessary building blocks are already in torchvision, so it wouldn't be that much more work.
For the next release of torchvision at the beginning of August, we are focusing on adding support for video models (including video readers). We might look into adding RetinaNet support in later versions (but contributions are also welcome!)
Apologies if this is hijacking the issue, but I would like to work on implementing retinanet in torchvision. I have been the main contributor to keras-retinanet, and for a new project I am looking into using pytorch, so I would like to spend time implementing retinanet using torchvision. Is this still a requested feature? Has there been any progress, or are there any constraints I should be aware of?
@fmassa any objection if I start working on this?
@hgaiser sorry for the delay in replying.
The only constraint for now is that torchvision models are torchscript-compatible, and we would love to keep that the case for future models. This might add some extra complexity while training the models.
Apart from that, adding a RetinaNet version in torchvision is definitely a welcome feature. You might want to have a look at the implementation in maskrcnn-benchmark to get some initial inspiration.
Let me know if you need further pointers.
The only constraint for now is that torchvision models are torchscript-compatible, and we would love to keep that the case for future models. This might add some extra complexity while training the models.
I will look into this, but I aim to first have a running version. As a rough outline, what would it mean to be torchscript-compatible?
Apart from that, adding a RetinaNet version in torchvision is definitely a welcome feature. You might want to have a look at the implementation in maskrcnn-benchmark to get some initial inspiration.
I see. This is an outdated implementation of retinanet I take it?
Let me know if you need further pointers.
For now I think I have my hands full. I have the advantage of knowing the retinanet network well, but I'm still getting accustomed to pytorch and torchvision :). When I have something to share I will open a pull request so that we can continue the discussion there. Thanks for the offer!
@hgaiser
As a rough outline, what would it mean to be torchscript-compatible?
Being torchscript-compatible means that the model can be converted to run in C++ (and on mobile devices) without having to modify the code. But this also means that not all Python constructs are supported.
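As a minimal sketch of what torchscript compatibility looks like in practice: the `TinyHead` module below is hypothetical and is not torchvision code. It only illustrates that a module written with explicit type annotations can be compiled with `torch.jit.script` and then behaves like the original Python module.

```python
from typing import List

import torch
from torch import Tensor, nn


class TinyHead(nn.Module):
    """Hypothetical module, used only to illustrate scripting."""

    def __init__(self, in_channels: int, num_outputs: int) -> None:
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_outputs, kernel_size=3, padding=1)

    def forward(self, features: List[Tensor]) -> List[Tensor]:
        # TorchScript is statically typed: non-Tensor arguments such as this
        # list need annotations, and the empty list needs an explicit type.
        outputs: List[Tensor] = []
        for feature in features:
            outputs.append(self.conv(feature))
        return outputs


head = TinyHead(in_channels=8, num_outputs=4)

# Compile the module. This raises an error if forward() uses Python
# constructs that TorchScript does not support.
scripted = torch.jit.script(head)

# Two feature maps of different sizes, as a feature pyramid would produce.
features = [torch.rand(1, 8, 16, 16), torch.rand(1, 8, 8, 8)]
eager_outputs = head(features)
scripted_outputs = scripted(features)
```

The scripted module can then be written to disk with `scripted.save(...)` and loaded from C++ without the Python source, which is the use case described above.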
I see. This is an outdated implementation of retinanet I take it?
The implementation in maskrcnn-benchmark can be used as a reference, although it is not maintained anymore. Parts of the RPN implementation in torchvision are similar to RetinaNet, and I think that most of the code should be shared. Basically, I think that we should only slightly modify the RPN implementation to support RetinaNet, see https://github.com/facebookresearch/maskrcnn-benchmark/pull/102
When I have something to share I will open a pull request so that we can continue the discussion there. Thanks for the offer!
Sure! And let us know if you have other questions, regarding design for example, so that we can iterate quicker. One thing to keep in mind is that the code should work for training as well.
Being torchscript-compatible means that the model can be converted to run in C++ (and on mobile devices) without having to modify the code. But this also means that not all Python constructs are supported.
I feel I will cross this bridge once the python implementation is done ;D
The implementation in maskrcnn-benchmark can be used as a reference, although it is not maintained anymore. Parts of the RPN implementation in torchvision are similar to RetinaNet, and I think that most of the code should be shared. Basically, I think that we should only slightly modify the RPN implementation to support RetinaNet, see facebookresearch/maskrcnn-benchmark#102
I'm currently trying to re-use as much of the existing code as I can (extending GeneralizedRCNN, for example), but I have some doubts about whether that makes sense in some parts. I am planning to write a proposal for the design I feel makes sense and then open a PR as a platform for discussion. I hope to have that done today. My goal is to have the general idea implemented; testing the model will probably happen at a later time. Would this work for you, or would you like to be involved in the design at an earlier stage?
One thing to keep in mind is that the code should work for training as well.
Of course! Wouldn't be a useful implementation if it doesn't train :)
FYI, there is no need to change GeneralizedRCNN nor RPN in most places, and we will just need to change (hopefully) very few lines of code.