Maskrcnn-benchmark: Will maskrcnn-benchmark support torch.jit.trace or torch.jit.script mode in the near future?

Created on 26 Oct 2018  ·  19 Comments  ·  Source: facebookresearch/maskrcnn-benchmark

❓ Questions and Help

As we know, in PyTorch 1.0 Torch Script is a way to create serializable and optimizable models from PyTorch code. Any code written in Torch Script can be saved from a Python process and loaded in a process that has no Python dependency.
So: will maskrcnn-benchmark support torch.jit.trace or torch.jit.script mode in the near future?

enhancement


All 19 comments

There is currently some Python functionality in this codebase that is not supported by torch.jit.script, but it will be supported in the future.

Currently, you can trace almost all of the model, except the custom C++ layers. Once we add support for those missing C++ layers by registering them as torch ops, I believe tracing should work without issues for same-sized images.

I'll look into registering the C++ layers into the torch ops

@fmassa the C++ layers can be registered with the JIT, see @goldsborough's slides from DevCon

Yes, but I believe that it currently requires some extra code that follows a different codepath.
I'll check with Peter about that.

@fmassa Thanks for looking into this. If there is a way to trace Mask R-CNN, or any useful information, please let us know.

I'll look into adding tracing support for the custom ops early this week, it should not be hard. I'll update on the issue once it's done

Awesome!

I am also interested by this feature!

@fmassa Have you added tracing support for the custom ops yet? Thanks.

So I did look at this in some depth and continue to do so, here is a bit of a progress report for discussion. I'm also happy to share a branch with my code, but the code is even more "stream of consciousness" than this write-up.

Goal and plan

My goal is to be able to run detection on single images of a fixed size (known at tracing time) in C++, staying as close as possible to the "load traced model in C++" example.

My first step is to get something that

  • has scripted/traced paths for every "processing step" bit, and
  • manages to reproduce the output on the image it was traced on (so I postpone variations that occur during detection for different scores, but I try not to break things too much).

I took the "do whatever works, clean up later" approach - it's really, really messy right now. I also expect there will be bits to think about on the PyTorch/JIT side.

My findings so far

C++ bits / Custom Ops

  • Adding custom-op support (for inference) for the C++ ops seems easy:

    • Change int -> int64_t and float -> double in the signatures (I'm not 100% certain this is needed),

    • link against libtorch.so and libcaffe2.so (it's probably silly to use the extension mechanism, but that falls under "clean up later"),

    • add the registry in vision.cpp:

#include <torch/script.h>
...
static auto registry =
  torch::jit::RegisterOperators()
    .op("maskrcnn_benchmark::nms", &nms)
    .op("maskrcnn_benchmark::roi_align_forward(Tensor input, Tensor rois, float spatial_scale, int pooled_height, int pooled_width, int sampling_ratio) -> Tensor", &ROIAlign_forward);
  • Using them: In layers/nms.py:
import torch

nms = torch.ops.maskrcnn_benchmark.nms

Easy. However, I could not trace the resulting nms:
torch.jit.trace(lambda x, y: maskrcnn_benchmark.layers.nms(x, y, 2), (torch.randn(5, 5), torch.randn(5, 5))) raises an error, though it should not.
This can be worked around (for a fixed threshold, but that's OK, I think) with a double Torch Script wrapper:

    @torch.jit.script
    def nms_fixed_thresh1(dets, scores, th: float=coco_demo.model.rpn.box_selector_test.nms_thresh):
        return maskrcnn_benchmark.layers.nms(dets, scores, th)

    @torch.jit.script
    def nms_fixed_thresh(dets, scores):
        return nms_fixed_thresh1(dets, scores)

Now we can trace nms_fixed_thresh in place of the lambda above. @goldsborough will want to know. :)

A similar wrapping trick was needed for roi align forward, I put that in the layer (where all the constants are parameters, so it's natural).

I did change some lists to tuples to make the jit happier.

Custom bookkeeping types (boxlist oh oh)

The jit isn't very fond of the boxlist things. Where it works, a minimal fix is to "unpack" the parameters of functions, assuming that all Tensors are arguments and everything else is a constant. That works reasonably well when operating on the same input again in traced mode; it remains to be seen whether we run into generalization problems. To facilitate this, I added two methods to bounding_box:

    # note: _get_tensors/_set_tensors only work if the keys don't change in between!
    def _get_tensors(self):
        return (self.bbox,)+tuple(f for f in (self.get_field(field) for field in sorted(self.fields())) if isinstance(f, torch.Tensor))

    def _set_tensors(self, ts):
        self.bbox = ts[0]
        i = 1  # walk the packed tuple in step with the tensor-valued fields
        for f in sorted(self.fields()):
            if isinstance(self.extra_fields[f], torch.Tensor):
                self.extra_fields[f] = ts[i]
                i += 1

and there is some wrapper code.

Some things that don't work well with tracing/scripting

The box_coder uses

            pred_boxes = torch.zeros_like(rel_codes)
            # x1
            pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
            # y1
            pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
            # x2 (note: "- 1" is correct; don't be fooled by the asymmetry)
            pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
            # y2 (note: "- 1" is correct; don't be fooled by the asymmetry)
            pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1

The jit doesn't love that, so I used:

            pred_boxes = torch.stack([pred_ctr_x - 0.5 * pred_w,
                                      pred_ctr_y - 0.5 * pred_h,
                                      pred_ctr_x + 0.5 * pred_w - 1,
                                      pred_ctr_y + 0.5 * pred_h - 1], 2).view(*rel_codes.shape)
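
A quick check that the stack/view rewrite matches the strided-assignment original; the shapes here are made up (N boxes, C classes, so rel_codes is (N, 4*C)):

```python
import torch

# made-up shapes: N boxes, C classes, rel_codes is (N, 4*C)
N, C = 3, 2
rel_codes = torch.randn(N, 4 * C)
pred_ctr_x, pred_ctr_y = torch.randn(N, C), torch.randn(N, C)
pred_w, pred_h = torch.rand(N, C), torch.rand(N, C)

# original, trace-unfriendly strided assignment
expected = torch.zeros_like(rel_codes)
expected[:, 0::4] = pred_ctr_x - 0.5 * pred_w
expected[:, 1::4] = pred_ctr_y - 0.5 * pred_h
expected[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
expected[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1

# jit-friendly rewrite: stack along a new last dim, then flatten back
pred_boxes = torch.stack([pred_ctr_x - 0.5 * pred_w,
                          pred_ctr_y - 0.5 * pred_h,
                          pred_ctr_x + 0.5 * pred_w - 1,
                          pred_ctr_y + 0.5 * pred_h - 1], 2).view(rel_codes.shape)
```

Stacking on dim 2 interleaves the four coordinates per class, so the flattened layout matches the 0::4, 1::4, 2::4, 3::4 columns.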

Similarly, the pooling over several levels in the roi_heads.box.feature_extractor.pooler forward seems problematic for tracing. As indexed assignment isn't supported in Torch Script, I wrote a new custom op to replace

    for level, (per_level_feature, pooler) in enumerate(zip(x, self.poolers)):
        idx_in_level = torch.nonzero(levels == level).squeeze(1)
        rois_per_level = rois[idx_in_level]
        result[idx_in_level] = pooler(per_level_feature, rois_per_level)
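
For reference, one script-friendly way to express the scatter in the last line without indexed assignment is the out-of-place Tensor.index_copy; this is just a sketch of the idea on toy data, not the custom op described above:

```python
import torch

# toy stand-in: 5 RoIs total, 3 of them belong to the current level
result = torch.zeros(5, 3)
idx_in_level = torch.tensor([0, 2, 4])
pooled = torch.arange(9, dtype=torch.float32).reshape(3, 3)  # pretend pooler output

# out-of-place replacement for: result[idx_in_level] = pooled
result = result.index_copy(0, idx_in_level, pooled)
```

Because index_copy returns a new tensor instead of mutating in place, it avoids the mutability limitations mentioned above.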

There is still a problem (a riddle?) in that script wants a tensor list passed as a list of tensors, not as a tuple, while tracing doesn't accept lists; I will have to sort that out. Maybe one could convince the JIT people to allow passing tuples of tensors where the JIT wants lists of tensors.

Mask composition

This uses PIL at the moment; it'll be replaced. @fmassa has a GPU version of this, but I will do a CPU version and a custom op for it.

Things that work at the moment

  • The backbone,
  • the rpn_head,
  • the anchor generation - from what I understand, the generated anchors only depend on the image size and are fixed for our purposes, so I didn't worry about that - and
  • the post_processing forward for a single feature map; I think it should not be too hard to make the entire postprocessing work, too.

So I'm now at the roi heads (as you can see above), the box first.

That's awesome progress @t-vi !

We were aware of the problems that BoxList would bring to the JIT; that's something we discussed with @zdevito and team, and we want to support it in the future (though I think an approach similar to namedtuples, which we were considering, might not be enough as is).
But can't torch.jit.trace work with the BoxList objects? I thought it would work...

Indexing in the JIT doesn't work very well yet (but we are improving support for it), so the approach you followed for the box_coder seems good to me. I'd hope we could avoid a custom op for the pooler, but for that we need better support for mutability and indexing in script, which is planned and being worked on, I believe.

I didn't quite understand the problem with tracing the constant parameters, but I suppose this is a bug in upstream PyTorch?

Thanks a lot for all your help!

So I filed the two JIT observations as issues with PyTorch (see above).

It seems that pytorch/pytorch#13564 had been fixed.

Yes, and we managed to do tracing in #138. There is a "regression" in 1.0 that invalidates the merge_levels script, so you'd currently need to replace it with a (very straightforward) custom op.

@t-vi, I met a core-dump bug when I executed the trace_model.py from your patch. The backtrace is below:

    (gdb) bt
    #0  0x00007f4e28193eeb in mkldnn::impl::scales_t::set(int, int, float const*) ()
        from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libmkldnn.so.0
    #1  0x00007f4e281989e2 in mkldnn_primitive_desc_create_v2 ()
        from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libmkldnn.so.0
    #2  0x00007f4e2ea1a9bc in mkldnn::convolution_forward::primitive_desc::primitive_desc(mkldnn::convolution_forward::desc const&, mkldnn::engine const&) ()
        from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libcaffe2.so
    #3  0x00007f4e2ea16516 in at::native::mkldnn_convolution(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long) ()
        from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libcaffe2.so
    #4  0x00007f4e2ebb718c in at::TypeDefault::mkldnn_convolution(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long) const ()
        from /home/user/code/maskrcnn-benchmark/maskrcnn_benchmark/libcaffe2.so
    #5  0x00007f4e2d8c7c55 in torch::autograd::VariableType::mkldnn_convolution(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long) const ()

Hello, any progress on this? I am also very interested.
Thank you!

Hi, any progress on this? Anyone who managed to do this?

I'm also interested in knowing about the progress on this.

Hi @t-vi @fmassa, I'm interested in using TVM to run one of the maskrcnn-benchmark models (specifically e2e_mask_rcnn_X-152-32x8d-FPN-IN5k_1.44x_caffe2), but during the conversion, it fails on torch.jit.trace() (because of BoxList). Any updates on this? Thx in advance.

I get a warning that torch::jit::RegisterOperators is deprecated - what is the new way to register a C++ extension for tracing?
