Vision: [feature request] ROI Pooling layers

Created on 25 Apr 2018 · 45 Comments · Source: pytorch/vision

It would be great to have support for various ROI Pooling operations as easy-to-add layers, to facilitate research in object detection and semantic/instance segmentation.

Here is a live checklist:

[x] ROI Pooling #592 #632
[ ] Position Specific ROI Pooling
[x] ROI Align #630

General PRs: #626

varunagrawal

👍2

All 45 comments

I agree. I've started sketching the structure of it in https://github.com/pytorch/vision/tree/layers?files=1 .
I'll look into opening a PR tomorrow with a few layers

fmassa on 25 Apr 2018

👍7 🎉2

Any movement on this?

wadimkehl on 15 May 2018

Hey Wadim,
ROIPool and ROIAlign are implemented in the layers branch. I'm holding off on merging them as is because I might want to change a few things, but feel free to use them as they are (they are working).

fmassa on 15 May 2018

🎉3

Great, will have a look! Thanks :)

wadimkehl on 15 May 2018

@fmassa any updates on this? I'm sure a lot of people would benefit from having a master branch version of this available soon.

varunagrawal on 18 Jun 2018

👍2

It would be super convenient to have this installed automatically with torch/torchvision

botcs on 19 Jun 2018

Having the master branch have cpu/cuda layers officially requires a few additional changes, like providing wheels with the compiled binaries for each supported architecture, and I'm not looking at this at the moment.

fmassa on 19 Jun 2018

Just wondering if ROI pooling/align could theoretically be done in pure PyTorch (even if it would be slow)?

kevinlu1211 on 19 Jun 2018

Was thinking about the same...
ROI pooling: adaptive max-pooling layers already exist. If you can efficiently crop all the regions you need from each image in a batch and concatenate them along the batch dimension, it might work. However, I have a bad feeling about the efficiency of this naive approach.
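A minimal sketch of the naive approach described above, not the torchvision implementation. It assumes the boxes are rows of (batch_index, x1, y1, x2, y2) with integer, inclusive corners already in feature-map coordinates; the function name `naive_roi_pool` is hypothetical.

```python
import torch
import torch.nn.functional as F


def naive_roi_pool(features, rois, output_size):
    """Crop each ROI from its image's feature map and max-pool it to a fixed size.

    features: Tensor[N, C, H, W]
    rois: Tensor[K, 5] of (batch_index, x1, y1, x2, y2); integer-valued,
          inclusive corners in feature-map coordinates (an assumption).
    Returns a Tensor[K, C, out_h, out_w].
    """
    pooled = []
    for b, x1, y1, x2, y2 in rois.long().tolist():
        # Keep the batch dimension (size 1) so the crop stays 4-D.
        crop = features[b:b + 1, :, y1:y2 + 1, x1:x2 + 1]
        pooled.append(F.adaptive_max_pool2d(crop, output_size))
    # Concatenate all pooled crops along the batch dimension.
    return torch.cat(pooled, dim=0)


# Two 1-channel 4x4 feature maps holding the values 0..31.
features = torch.arange(2 * 1 * 4 * 4, dtype=torch.float32).reshape(2, 1, 4, 4)
rois = torch.tensor([[0, 0, 0, 3, 3],   # whole first image
                     [1, 1, 1, 2, 2]])  # central 2x2 patch of the second image
out = naive_roi_pool(features, rois, output_size=(2, 2))
print(out.shape)
```

The Python loop over boxes is the part likely to be slow for many ROIs, which is the efficiency concern raised here.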

botcs on 19 Jun 2018

@kevinlu1211 it is possible to implement it using pure PyTorch, and performance is OK.
An (old, badly tested) implementation can be found in https://github.com/pytorch/examples/pull/21/files#diff-7573d025c4128229f8efa3ff042e09d1R38

fmassa on 19 Jun 2018

You are a life saver! I’m just writing a tutorial to explain Mask R-CNN.
Thanks a lot!

kevinlu1211 on 19 Jun 2018

@fmassa I find it surprising that this is not a higher priority. Every other major deep learning framework supports ROI Pooling, and it is bewildering that there is no easy way to write a PyTorch version of Detectron for research purposes despite the deep integration between PyTorch and Caffe2.

Is there some other way we can push this forward if you're too busy? I'm sure we can find volunteers to push this out the door as soon as possible.

varunagrawal on 19 Jun 2018

👍9

Come on @fmassa, make us all happy. If you don't have time, I'd gladly help!

wadimkehl on 13 Jul 2018

👍1

Yeah, captain @fmassa, you have almost an army of volunteers waiting for your orders :)

vfdev-5 on 13 Jul 2018

😄2

@fmassa I guess I've figured out how to get the CppExtension module to work for me and I should be able to finish this feature.

I see you have TODOs to pull some common CUDA utilities out into a common file. Any other things you'd like to do before I make a PR?

varunagrawal on 13 Jul 2018

👍1

Hey guys, sorry for the delay here.

So, there are a number of things that should be done in order to be able to put this in torchvision:

- package wheels with CPU / CUDA compiled code
- add proper unit tests
- documentation
- code clean-up
- CppExtensions and ATen rapidly changing and breaking the code :-)

I've been making great progress on Detectron, and I've moved all those layers to the Detectron repo for the moment. I'm hesitating about whether to put those layers in torchvision because of the aforementioned difficulties.

What do you guys think?

fmassa on 13 Jul 2018

👍1

Is the last issue a constantly persisting one? All the others I do not perceive to be big problems for a WIP branch, really. But it would enable everyone to have a working, if temporary, in-house pytorch solution.

wadimkehl on 13 Jul 2018

@fmassa I believe I can take care of everything else except Wheel generation since I'm not familiar with the python packaging pipeline at FB.

As @wadimkehl mentioned, is there a checkpoint we can use for CppExtension and ATen? I used the latest master of PyTorch as of yesterday and your branch compiles fine as is.

varunagrawal on 13 Jul 2018

@varunagrawal concerning wheels and packaging you can take a look at pytorch/builder

@fmassa I think torchvision can have a scope to provide models/datasets/transforms for tasks like

classification (as it is today)
segmentation (needs new models and transforms that take care of masks)
detection (at least transformations and maybe some utilities to encode/decode ground truth)
- bboxes
- keypoints

IMHO, ROI Pooling is very specific to one architecture, and if torchvision is not intended to absorb the research on Faster R-CNN-like nets, it can be left out.

Any thoughts?

vfdev-5 on 13 Jul 2018

@vfdev-5 your suggestion would turn this into a Chicken & Egg problem since we need ROI Pooling to implement even a basic RCNN model.

Given that Detectron supports Faster RCNN, Caffe2 is now intrinsically linked to Pytorch, and 2 stage detectors are still highly looked into in research (e.g. Light Head RCNN) and industry, having a ROI pooling/align layer would be beneficial for torchvision overall.

While I agree with your categorization for different tasks such as classification, segmentation and detection, doing so would require significant effort which the Pytorch team isn't able to provide given the priority of v1. Indeed, I have already spoken to @soumith about a separate repo for detection and segmentation related tasks and he's shown considerable interest. Until then, and looking at the large amount of interest on this issue, having the layers here for now would be sufficient.

varunagrawal on 23 Jul 2018

Sorry for the delay in replying.

@wadimkehl @varunagrawal as of today, my branch doesn't compile anymore on latest PyTorch because of https://github.com/pytorch/pytorch/pull/9435, and patches such as https://github.com/ngimel/pytorch/commit/ae176af8feb546b70eabed68e260d897bc4f7627 should be applied to ROIAlign (and maybe other functions as well)

This has happened to me at least 3-4 times already, which means that officially supporting those extension layers in torchvision would be hard to maintain at the moment: if the user updates PyTorch, torchvision breaks; if the user updates torchvision but not PyTorch, it also breaks; both need to be updated at the same time. This was a recurring issue with Lua-Torch, and I'd rather avoid it at the moment.

As for where to put the aforementioned layers, I'm not yet convinced what the right solution is.
On the one hand, those backward-compatibility issues make me hesitant to put them in torchvision (as it's widely used and up to now has been a Python-only lib), so putting them in Detectron would make sense.
On the other hand, having a unified place where the basic building blocks can be found is a nice thing to have.
I think once we release Detectron, we might converge into gradually migrating a few layers and abstractions to torchvision, as the BC breakages on the C++ level would be less recurrent I'd hope.

fmassa on 30 Jul 2018

@fmassa the good news is I have forked your branch and already made all the fixes. As of 07/27/2018, the ROI Pooling layer compiles successfully on my branch and I have also added a whole bunch of tests to check for correctness.

I can submit the PR and continue to maintain ROI Pooling (and ROI Align hopefully soon) until we get more stability from ATen and checkpoint at either PyTorch v0.5 or v1.

varunagrawal on 30 Jul 2018

Sure, if you send a PR to the layers branch, I will merge it, thanks!
But I'm not going to be merging the layers branch into master before things stabilize, which might be before v1.0

fmassa on 30 Jul 2018

👍1

That works! For now let's point people towards the layers branch until we get the desired stability.

varunagrawal on 30 Jul 2018

Added support for ROIAlign with #630.

varunagrawal on 17 Oct 2018

👍3

FYI, we have released our implementation of {Faster, Mask} R-CNN in https://github.com/facebookresearch/maskrcnn-benchmark , which contains the implementations for ROI Pooling and ROI Align. It currently doesn't have all the nice improvements that @varunagrawal has pushed to the layers branch here (like backwards for a few layers).

I suggest we move this discussion there for now.

fmassa on 25 Oct 2018

It would be wonderful if the (working) ROI Pooling code in the layers branch could be updated and merged into torchvision. I think I speak for myself and many other vision researchers in that this is an essential functionality, and having it supported in the current torchvision is far less of a hassle than continuing to build this repo from source using an outdated branch.

seanremy on 18 Jan 2019

Check out #708
@fmassa plans to merge layers for v3. Let's hope the next release is pushed out soon.

varunagrawal on 18 Jan 2019

👍7 🎉3

@fmassa do you want to reopen this issue until we can get all the related PRs merged? I'll update the original Issue comments with the PR numbers to help keep track.

varunagrawal on 11 Mar 2019

@varunagrawal I'm going to be merging the layers branch this weekend. Thanks a lot for the awesome help improving it!

fmassa on 31 Mar 2019

❤4

@fmassa When will the RoI pooling module be available on the master branch?

dungmn on 29 May 2019

It already is with 0.3

wadimkehl on 29 May 2019

👍3

Is anyone working on position sensitive ROI pooling similar to this one: https://github.com/tensorflow/models/blob/f9fe0fe97aee7964ac344ce38bafb20e977586dc/research/object_detection/utils/ops.py#L652?

LukasBommes on 30 Aug 2019

@LukasBommes there is an open PR adding it to torchvision, see https://github.com/pytorch/vision/pull/1259

fmassa on 30 Aug 2019

👍1

Hi all,
I need to extract ROIs at different scales, with output sizes 7x7, 5x6 and 1x1.
Any help, please?

MitraTj on 17 Sep 2019

@MitraTj just add different RoIPool layers with different output sizes.

fmassa on 17 Sep 2019

Hi all, I would like to ask whether there is any implementation of an average version of ROI Pooling.
As far as I can see, the existing ROI Pool is only implemented with max pooling.

XuYunqiu on 3 Oct 2019

@XuYunqiu there is RoIAlign, which performs bilinear interpolation (instead of max).
Would that be ok for your use-case?

fmassa on 3 Oct 2019

@fmassa Thanks for your quick reply. Actually, I just want to get the mean value of each ROI.
So can I use ROI Align like this: roi_align(conv_feat, rois, 1, spatial_scale=1.0/stride, sampling_ratio=1)?

XuYunqiu on 4 Oct 2019

I've actually been considering adding average pooling as an option to the ROI operations. It's not hard and allows for some nice generalization.

varunagrawal on 4 Oct 2019

👍1

Exactly, it will be helpful.

XuYunqiu on 4 Oct 2019

@XuYunqiu yes, that is going to be doing roughly what you are looking for

fmassa on 4 Oct 2019

But it might not work well for ROIs with a large area. I think the output of bilinear interpolation only depends on a quite local context around the sample location (i.e., the center of the ROI in my case).

XuYunqiu on 4 Oct 2019

@fmassa Hi, sorry to bother you again. Would you mind telling me which mode (average or max pooling) RoIAlign uses when computing the output from the values of the sampled points?
I see there were RoIAlignAvg and RoIAlignMax in former implementations, but I can't find any information about this in the documentation of the torchvision RoIAlign.

XuYunqiu on 6 Oct 2019

I've gotten my answer from the source code. It seems that only average mode is implemented for RoIAlign.
https://github.com/pytorch/vision/blob/76702a03d6cc2e4f431bfd1914d5e301c07bd489/torchvision/csrc/cuda/ROIAlign_cuda.cu#L108

I really hope RoIPool and RoIAlign in torchvision could support both average and max modes for more convenient usage.

XuYunqiu on 6 Oct 2019
