It would be great to have support for various ROI Pooling operations as easy to add layers to facilitate research in object detection and semantic/instance segmentation.
Here is a live checklist:
General PRs: #626
I agree. I've started sketching the structure of it in https://github.com/pytorch/vision/tree/layers?files=1 .
I'll look into opening a PR tomorrow with a few layers
Any movement on this?
Hey Wadim,
ROIPool and ROIAlign are implemented in the layers branch. I'm holding on merging them as is because I might want to change a few things, but feel free to use them as is (they are working)
Great, will have a look! Thanks :)
@fmassa any updates on this? I'm sure a lot of people would benefit from having a master branch version of this available soon.
It would be super convenient to have this installed automatically with torch/torchvision
Having the master branch have cpu/cuda layers officially requires a few additional changes, like providing wheels with the compiled binaries for each supported architecture, and I'm not looking at this at the moment.
Just wondering if the ROI pooling/align could theoretically be done in pure Pytorch (even if it will be slow?)
Was thinking about the same...
ROI pooling: Adaptive MaxPools exist, if you can efficiently crop out all the tensors you need from each image in a batch and concat them in the batch dimension, maybe it could work, however I have a bad feeling about the efficiency of this naive approach
@kevinlu1211 it is possible to implement it using pure PyTorch, and performance is OK.
An (old, badly tested) implementation can be found in https://github.com/pytorch/examples/pull/21/files#diff-7573d025c4128229f8efa3ff042e09d1R38
You are a life saver! I’m just writing a tutorial to explain mask rcnn
thanks a lot!
On Tue, 19 Jun 2018 at 11:44 pm, Francisco Massa notifications@github.com
wrote:
@kevinlu1211 https://github.com/kevinlu1211 it is possible to implement
it using pure PyTorch, and performance is OK.
An (old, badly tested) implementation can be found in
https://github.com/pytorch/examples/pull/21/files#diff-7573d025c4128229f8efa3ff042e09d1R38—
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/pytorch/vision/issues/477#issuecomment-398403617, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AME8D-m7bWhfCgW9yzr5VXr8fwUFcI4Sks5t-QAkgaJpZM4Tifuk
.
@fmassa I find it surprising this is not higher priority. The fact that every other major deep learning framework supports ROI Pooling and there is no easy way to write a Pytorch version of Detectron for research purposes despite the deep integration between Pytorch and Caffe2 is bewildering.
Is there some other way we can push this forward if you're too busy? I'm sure we can find volunteers to push this out the door as soon as possible.
Come on @fmassa, make us all happy. If you don't have time, I'd gladly help!
Yeah, captain @fmassa you have almost an army of volunteers that wait for your orders :)
@fmassa I guess I've figured out how to get the CppExtension module to work for me and I should be able to finish this feature.
I see you have TODOs to pull some common CUDA utilities out into a common file. Any other things you'd like to do before I make a PR?
Hey guys, sorry for the delay here.
So, there are a number of things that should be done in order to be able to put this in torchvision:
I've been doing some great progress on Detectron, and I've currently moved all those layers to the detectron repo for the moment. I'm currently hesitating if I should put those layers in torchvision because of the aforementioned difficulties.
What do you guys think?
Is the last issue a constantly persisting one? All the others I do not perceive to be big problems for a WIP branch, really. But it would enable everyone to have a working, if temporary, in-house pytorch solution.
@fmassa I believe I can take care of everything else except Wheel generation since I'm not familiar with the python packaging pipeline at FB.
As @wadimkehl mentioned, is there a checkpoint we can use for CppExtension and ATen? I used the latest master of PyTorch as of yesterday and your branch compiles fine as is.
@varunagrawal concerning wheels and packaging you can take a look at pytorch/builder
@fmassa I think torchvision can have a scope to provide models/datasets/transforms for tasks like
IMHO, ROI Pooling is very specific to an architecture and if torchvision is not intended to merge inside itself the research on faster-rcnn-like nets, this can be avoided.
Any thoughts?
@vfdev-5 your suggestion would turn this into a Chicken & Egg problem since we need ROI Pooling to implement even a basic RCNN model.
Given that Detectron supports Faster RCNN, Caffe2 is now intrinsically linked to Pytorch, and 2 stage detectors are still highly looked into in research (e.g. Light Head RCNN) and industry, having a ROI pooling/align layer would be beneficial for torchvision overall.
While I agree with your categorization for different tasks such as classification, segmentation and detection, doing so would require significant effort which the Pytorch team isn't able to provide given the priority of v1. Indeed, I have already spoken to @soumith about a separate repo for detection and segmentation related tasks and he's shown considerable interest. Until then, and looking at the large amount of interest on this issue, having the layers here for now would be sufficient.
Sorry for the delay in replying.
@wadimkehl @varunagrawal as of today, my branch doesn't compile anymore on latest PyTorch because of https://github.com/pytorch/pytorch/pull/9435, and patches such as https://github.com/ngimel/pytorch/commit/ae176af8feb546b70eabed68e260d897bc4f7627 should be applied to ROIAlign (and maybe other functions as well)
This has been the case at least 3-4 times for me already, which means that supporting those extension layers officially in torchvision at the moment would be hard to maintain -- if the user updates PyTorch, torchvision breaks, if the user update torchvision but not pytorch, it also breaks, he needs to update both at the same time. This was a recurring issue with Lua-Torch, and I'd rather avoid it at the moment.
About where to put the aforementioned layers, I'm not yet convinced on what is the right solution.
In the one hand, those backward-compatibility issues makes me hesitant to put them in torchvision (as it's widely used and up to now has been a python-only lib), so putting them in Detectron would make sense.
On the other hand, having a unified place where the basic building blocks can be found is a nice thing to have.
I think once we release Detectron, we might converge into gradually migrating a few layers and abstractions to torchvision, as the BC breakages on the C++ level would be less recurrent I'd hope.
@fmassa the good news is I have forked your branch and already made all the fixes. As of 07/27/2018, the ROI Pooling layer compiles successfully on my branch and I have also added a whole bunch of tests to check for correctness.
I can submit the PR and continue to maintain ROI Pooling (and ROI Align hopefully soon) until we get more stability from ATen and checkpoint at either PyTorch v0.5 or v1.
Sure, if you send a PR to the layers branch, I will merge it ,thanks!
But I'm not going to be merging the layers branch into master before things stabilize, which might be before v1.0
That works! For now let's point people towards the layers branch until we get the desired stability.
Added support for ROIAlign with #630.
FYI, we have released our implementation of {Faster, Mask} R-CNN in https://github.com/facebookresearch/maskrcnn-benchmark , which contains the implementations for ROI Pooling and ROI Align. It currently doesn't have all the nice improvements that @varunagrawal has pushed to the layers branch here (like backwards for a few layers).
I suggest we move this discussion there for now.
It would be wonderful if the (working) ROI Pooling code in the layers branch could be updated and merged into torchvision. I think I speak for myself and many other vision researchers in that this is an essential functionality, and having it supported in the current torchvision is far less of a hassle than continuing to build this repo from source using an outdated branch.
Check out #708
@fmassa plans to merge layers for v3. Let's hope the next release is pushed out soon.
@fmassa do you want to reopen this issue until we can get all the related PRs merged? I'll update the original Issue comments with the PR numbers to help keep track.
@varunagrawal I'm going to be merging the layers branch this weekend. Thanks a lot for the awesome help improving it!
@fmassa When is the model Roi pooling available on the master branch?
It already is with 0.3
Is anyone working on position sensitive ROI pooling similar to this one: https://github.com/tensorflow/models/blob/f9fe0fe97aee7964ac344ce38bafb20e977586dc/research/object_detection/utils/ops.py#L652?
@LukasBommes there is an open PR adding it to torchvision, see https://github.com/pytorch/vision/pull/1259
Hi all,
I need to extract different scales of ROIs to have 7x7, 5x6, 1x1
Any help please?
@MitraTj just add different RoIPool layers with different output sizes.
Hi all, I would ask is there any implementation of an average version of ROI Pooling?
I find existing ROI Pool is only implemented with max pool.
@XuYunqiu there is RoIAlign, which performs bilinear interpolation (instead of max).
Would that be ok for your use-case?
@XuYunqiu there is
RoIAlign, which performs bilinear interpolation (instead of max).
Would that be ok for your use-case?
@fmassa Thanks for your quick reply. Actually, I just want to get the mean values of each ROIs.
So can I use ROI Align like this roi_align(conv_feat, rois, 1, spatial_scale=1.0/stride, sampling_ratio=1) ?
I've actually been considering adding average pooling as an option to the ROI operations. It's not hard and allows for some nice generalization.
I've actually been considering adding average pooling as an option to the ROI operations. It's not hard and allows for some nice generalization.
Exactly, it will be helpful.
@XuYunqiu yes, that is going to be doing roughly what you are looking for
@XuYunqiu yes, that is going to be doing roughly what you are looking for
But it might not work well with ROIs with a large area. I think the output using bilinear interpolation only relevant to a quite local context of the sample location (i.e., the center of ROIs in my case).
@fmassa Hi, sorry to bother you again. Would you mind to tell me which mode (average or max pool) is selected in RoIAlign calculating the output based on the value of several sampled pointes?
I find there are RoIAlignAvg and RoIAlignMax in former implementation. But I don't find any information about this in the documents of torchvision version RoIAlign.
@fmassa Hi, sorry to bother you again. Would you mind to tell me which mode (average or max pool) is selected in
RoIAligncalculating the output based on the value of several sampled pointes?
I find there areRoIAlignAvgandRoIAlignMaxin former implementation. But I don't find any information about this in the documents of torchvision versionRoIAlign.
I‘ve gotten my answer from the source code. It seems that only average mode is set for RoIAlign.
https://github.com/pytorch/vision/blob/76702a03d6cc2e4f431bfd1914d5e301c07bd489/torchvision/csrc/cuda/ROIAlign_cuda.cu#L108
I really hope RoIPool and RoIAlign in torchvision could keep both average and max mode for more convenient usage.
Most helpful comment
Check out #708
@fmassa plans to merge
layersfor v3. Let's hope the next release is pushed out soon.