detectron2 🚀 - Convert models to TorchScript

TorchScript does not currently support these models.

ppwwyyxx on 13 Oct 2019

Got it. Is that because some of the ops aren't supported yet? Is there another way to deploy these models to a c++ environment? (E.g. onnx --> caffe2 or tensorrt)

Is this lack of support true for object detection models in general? Or is this more specific to the SOTA implementations in detectron?

How much work would it be to get one of these models into a c++ compatible format?

Thanks!

bfortuner on 13 Oct 2019

We're working on getting TorchScript support. onnx/caffe2 deployment support (discussed in #8) is ~~internal for now, but will also be released later~~ released already.

ppwwyyxx on 13 Oct 2019

👍11

Thank you! Do you know if this lack of support is true for object detection models in general? Or is this more specific to the SOTA implementations in detectron?

And second, does torchscript support the operations used in detectron or does torchscript require source changes to make this work?

bfortuner on 13 Oct 2019

I'm curious to know the best way to deploy object detection models trained in pytorch to an optimized format runnable in c++.

bfortuner on 13 Oct 2019

@bfortuner Using libtorch does not give much of a speedup. Exporting to ONNX and converting to a TensorRT engine is the best way to deploy these models.

Also, onnxruntime is trying to support all ops on top of its TensorRT provider, but many of them are not supported and have to run on the CPU.

jinfagang on 14 Oct 2019

👍9

@ppwwyyxx Thanks for the added clarity. Could you expand at all on what you mean with "take some time to be ready"? Is that something like for the next release or more in some unknown distant future?

nikolausWest on 14 Oct 2019

👍1

Yeah, I'm wondering if there is a tutorial/paper about recommended approaches to c++ deployment with PyTorch. It seems there are a lot of different ways, but it's not clear what the "best" way is, or what the PyTorch team recommends in the future. I'll post in PyTorch discussion!

bfortuner on 14 Oct 2019

👍2

FYI torchvision models (including Faster R-CNN and Mask R-CNN) will soon support export to both ONNX and TorchScript, see https://github.com/pytorch/vision/pull/1461 https://github.com/pytorch/vision/pull/1407 and https://github.com/pytorch/vision/pull/1401 for some representative PRs.

I believe the learnings from this conversion step done for torchvision models will be very helpful for planning detectron2 models to be exportable to TorchScript.

fmassa on 15 Oct 2019

👍6

Thanks for the update! I'm curious to know if TorchScript needs to make changes, too (are there any hard blockers)? Or is it mostly on our end to make our code compatible with the current TorchScript api?

The PRs above suggest it will still be a burden for our developers to bring their SOTA models into production

bfortuner on 15 Oct 2019

@bfortuner I think it will be a two-sided change: TorchScript support for Python features will continue improving, but users might need to adapt their code a bit to better fit what is currently supported.
This means avoiding some libraries in the inference code path (such as numpy, scipy, etc.).

As https://github.com/pytorch/vision/pull/1407 already shows, a complicated model such as Mask R-CNN can already be converted to TorchScript without changing the code too much (although the original code took some precautions to avoid using too many Python features).
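
To illustrate what "fitting the currently supported subset" can look like, here is a minimal sketch of a box-clipping helper written with torch ops only and with type hints on the non-tensor argument. The function name and the XYXY box layout are assumptions chosen for illustration; this is not code from detectron2 or from the PRs above.

```python
from typing import Tuple

import torch


@torch.jit.script
def clip_boxes(boxes: torch.Tensor, image_size: Tuple[int, int]) -> torch.Tensor:
    """Clip (N, 4) XYXY boxes to an image of size (height, width)."""
    height, width = image_size
    x = boxes[:, 0::2].clamp(min=0, max=width)   # columns x1, x2
    y = boxes[:, 1::2].clamp(min=0, max=height)  # columns y1, y2
    # Interleave back to (x1, y1, x2, y2) without leaving torch.
    return torch.stack((x, y), dim=2).reshape(boxes.shape)


boxes = torch.tensor([[-5.0, 10.0, 120.0, 90.0]])
clipped = clip_boxes(boxes, (80, 100))
print(clipped)
```

Because nothing here goes through numpy or other Python-only libraries, the same function can run in eager mode and as a compiled TorchScript function.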

cc @suo who can give a more accurate picture of TorchScript

fmassa on 16 Oct 2019

Say I want to convert a detectron2 mask-rcnn model to C++ (ideally using torchscript/libtorch), what's the current best approach? I tried various things last week but with no good solution. Things I tried (using recent detectron2, pytorch and torchvision code):

1) Naively try to convert some of the blocks to torchscript using torch.jit.script. This will fail on various things. Example stacktrace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py", line 1255, in script
    return torch.jit._recursive.recursive_script(obj)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 534, in recursive_script
    return create_script_module(nn_module, infer_methods_to_compile(nn_module))
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 296, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, cpp_module, stubs)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 336, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py", line 1593, in _construct
    init_fn(script_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 328, in init_fn
    scripted = recursive_script(orig_value)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 534, in recursive_script
    return create_script_module(nn_module, infer_methods_to_compile(nn_module))
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 296, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, cpp_module, stubs)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 340, in create_script_module_impl
    create_methods_from_stubs(concrete_type, stubs)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 259, in create_methods_from_stubs
    concrete_type._create_methods(defs, rcbs, defaults)
RuntimeError: 
Unknown type name 'torch.nn.SyncBatchNorm':
  File "/detectron2_repo/detectron2/layers/wrappers.py", line 64
            # https://github.com/pytorch/pytorch/issues/12013
            assert not isinstance(
                self.norm, torch.nn.SyncBatchNorm
                           ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            ), "SyncBatchNorm does not support empty inputs!"

I get similar errors when trying to convert other layers (for example, torchscript didn't support del statements in some of the forward passes).
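
One possible way around an isinstance check on a module class inside forward is to evaluate it once in __init__ and keep only a plain bool for the compiled code. The sketch below shows that pattern on a hypothetical ConvNorm module; it is an assumption about how such a check could be restructured, not the change that detectron2 actually made.

```python
import torch
from torch import nn


class ConvNorm(nn.Module):
    """Hypothetical conv + norm wrapper, used only to illustrate the pattern."""

    def __init__(self, norm: nn.Module):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.norm = norm
        # Decide once, in plain Python, so forward() only sees a bool attribute.
        self.norm_is_sync_bn = isinstance(norm, nn.SyncBatchNorm)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.numel() == 0:
            assert not self.norm_is_sync_bn, "SyncBatchNorm does not support empty inputs!"
        return self.norm(self.conv(x))


scripted = torch.jit.script(ConvNorm(nn.BatchNorm2d(8)).eval())
out = scripted(torch.randn(1, 3, 4, 4))
print(out.shape)
```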

2) Secondly I tried to just copy the weights from detectron2 to a mask-rcnn model defined in pytorch/torchvision. I limited myself to the backbone architecture, example definition in torchvision:

backbone = torchvision.models.detection.backbone_utils.resnet_fpn_backbone('resnet50', True)

Although I succeeded in copying the weights, of course the model.backbone.forward calls ended up giving different results. Most likely due to slightly different definitions of the two (detectron2 vs torchvision) architectures and forward passes.
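
As an illustration of how two definitions can accept the same weights and still disagree, the sketch below builds two toy blocks that differ only in which convolution carries the stride. Stride placement is one known way ResNet implementations differ; whether it explains the mismatch described here is an assumption, and make_block is a hypothetical helper, not part of either library.

```python
import torch
from torch import nn


def make_block(stride_in_1x1: bool) -> nn.Sequential:
    """Toy 1x1 -> 3x3 conv block; only the stride placement differs."""
    s1, s3 = (2, 1) if stride_in_1x1 else (1, 2)
    return nn.Sequential(
        nn.Conv2d(4, 4, kernel_size=1, stride=s1, bias=False),
        nn.Conv2d(4, 4, kernel_size=3, stride=s3, padding=1, bias=False),
    )


torch.manual_seed(0)
block_a = make_block(stride_in_1x1=True)
block_b = make_block(stride_in_1x1=False)
block_b.load_state_dict(block_a.state_dict())  # identical weights, copied without error

x = torch.randn(1, 4, 8, 8)
with torch.no_grad():
    ya, yb = block_a(x), block_b(x)

# Same output shape, different values.
print(ya.shape, yb.shape, (ya - yb).abs().max().item())
```

Comparing intermediate outputs layer by layer in this way is one option for locating where two implementations start to diverge.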

3) My third and final try was to use the onnx/caffe2 exporter. This more or less worked (i.e. I ended up with a caffe2 model, but I haven't compared the outputs of the models yet). However, only afterwards did I realize that I couldn't import onnx models into pytorch, and adding caffe2 support for our deployment would be quite cumbersome, since we just switched from caffe to libtorch... To me it seems that exporting to onnx should be quite similar to exporting to torchscript, so maybe it's quite easy to change the caffe2 exporter to torchscript?

gslotman on 13 Jan 2020

👍2

Hi all, looks like PyTorch 1.4 and torchvision 0.5 have made progress on this and a couple of related issues. When will we see the updates rolling out to detectron2? Please see my related question here on the forum: https://discuss.pytorch.org/t/pytorch-1-4-torchvision-0-5-vs-detectron/67002

cbasavaraj on 18 Jan 2020

👍10 👎1

Hi, I am also having some problems with JIT conversion. It raises an error:

RuntimeError: 
Unknown type name 'torch.nn.SyncBatchNorm':
  File "/detectron2/detectron2/layers/wrappers.py", line 67
            # https://github.com/pytorch/pytorch/issues/12013
            assert not isinstance(
                self.norm, torch.nn.SyncBatchNorm
                           ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            ), "SyncBatchNorm does not support empty inputs!"

Since ONNX conversion fixes the input size, it is not suitable in my case. Any help, please?

tengerye on 23 Jun 2020

An obvious disadvantage of ONNX is that we need to fix the input size, but some detection models can take inputs of flexible size. JIT support is necessary and urgent.
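
The limitation described here comes from tracing, which records the operations executed for one example input. A minimal sketch of the difference between tracing and scripting, using a toy function in plain TorchScript (no detectron2 or ONNX involved; the function name is made up for illustration):

```python
import warnings

import torch


def double_or_shift(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent branch: which path runs depends on the input values.
    if x.sum() > 0:
        out = x * 2
    else:
        out = x - 1
    return out


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # tracing warns about the tensor-to-bool conversion
    traced = torch.jit.trace(double_or_shift, torch.ones(3))
scripted = torch.jit.script(double_or_shift)

negative = torch.tensor([-3.0, -3.0, -3.0])
traced_out = traced(negative)      # replays the recorded branch: x * 2
scripted_out = scripted(negative)  # re-evaluates the condition: x - 1
print(traced_out, scripted_out)
```

The traced function silently keeps the branch taken for the example input, while the scripted one keeps the control flow.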

tengerye on 25 Jun 2020

any good news?

GitHubChrischen on 15 Jul 2020

Progress has been made recently (https://github.com/facebookresearch/detectron2/pulls?q=is%3Apr+author%3Achenbohua3+) on this issue and if everything goes well most models should be scriptable within a few months.

ppwwyyxx on 15 Jul 2020

@gslotman Very thorough attempt. Did you figure out a way to export the model to an onnx model that can be loaded by another runtime, or to a torchscript model?

rosebbb on 4 Aug 2020

Subscribing to the thread.

hyc-xyz on 31 Aug 2020

https://github.com/LESSuseLESS/d2

LESSuseLESS on 6 Sep 2020

Thanks a lot for all the amazing work being done on this project, it's appreciated a lot!

I understand that detectron models are currently not scriptable with TorchScript. @ppwwyyxx could you please elaborate on what exactly is missing for making Mask R-CNN and PointRend scriptable? Is it blocked by https://github.com/pytorch/pytorch/issues/36061?

tkuenzle on 21 Oct 2020

pytorch/pytorch#36061 is the main blocker

ppwwyyxx on 21 Oct 2020

👀2

Replacing the lists of modules with nn.ModuleList works pretty well and we are able to script the models (although we have to retrain them). Now we are running into https://github.com/pytorch/pytorch/issues/46944 because detectron2 relies on classes like Instances and Boxes, which are not included in the scripted model and thus the model is not runnable in a non-Python environment.
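
A minimal sketch of the nn.ModuleList pattern on a toy module (TinyBackbone is hypothetical and not detectron2's backbone). It also prints the parameter names, which are prefixed by the list attribute's name; a checkpoint saved under different names would no longer match, which may be related to the retraining mentioned above, though that link is an assumption.

```python
import torch
from torch import nn


class TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each stage and can be iterated in TorchScript.
        self.stages = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for stage in self.stages:
            x = stage(x)
        return x


model = TinyBackbone().eval()
scripted = torch.jit.script(model)

x = torch.randn(2, 4)
keys = sorted(model.state_dict().keys())
print(keys)
print(torch.allclose(scripted(x), model(x)))
```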

@ppwwyyxx do you think it makes sense to wait for proper support for classes in TorchScript or rather change the implementation of Instances, etc. to be based on e.g. NamedTuple as in https://github.com/pytorch/pytorch/issues/42258?

tkuenzle on 30 Oct 2020

I haven't got to that step yet (since we can't break pre-trained models) but I'll go double check the story around scripted classes in C++.
Btw, with latest github version, tracing (with fixed batch size) already works fine under https://github.com/facebookresearch/detectron2/blob/4ef254fbd7b5edeb93305d3937ecae469f79505b/detectron2/export/torchscript_patch.py#L196-L197 (except for some postprocessing which is often not used in deployment), and using it in C++ is probably more straightforward.

ppwwyyxx on 1 Nov 2020

It turns out that converting the Instances to a Dict[str, Tensor] as final output does the trick and the model can be run in C++ even though it uses Instances internally. Sorry for the confusion, it looks like everything is working as expected!
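
A minimal sketch of this idea, assuming a stand-in model: ToyDetector and DictOutput are hypothetical names, and the toy model returns a tuple rather than detectron2's Instances. The wrapper's only job is to expose a Dict[str, Tensor] at the boundary, and the saved archive is the kind of file that libtorch loads on the C++ side.

```python
import io
from typing import Dict, Tuple

import torch
from torch import nn


class ToyDetector(nn.Module):
    """Hypothetical stand-in for a detector; returns (boxes, scores)."""

    def forward(self, image: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        boxes = torch.zeros([1, 4]) + float(image.shape[-1])
        scores = image.mean().reshape(1)
        return boxes, scores


class DictOutput(nn.Module):
    """Wrapper that converts the inner model's output to Dict[str, Tensor]."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, image: torch.Tensor) -> Dict[str, torch.Tensor]:
        boxes, scores = self.model(image)
        return {"pred_boxes": boxes, "scores": scores}


scripted = torch.jit.script(DictOutput(ToyDetector()))

# Round-trip through an in-memory archive to check that it loads without the Python classes.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)
result = reloaded(torch.ones(3, 8, 8))
print(sorted(result.keys()))
```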

tkuenzle on 2 Nov 2020

Hi @tkuenzle Do you gain anything in terms of time per frame with the C++ / libtorch version? For a single frame, or maybe by running multiple C++ threads in parallel?
And would it be possible to either share code or outline the main steps you had to take to make scripting work? Thanks.

cbasavaraj on 5 Nov 2020

👍2

I cannot really comment on time per frame because our focus is on running the model on mobile devices. I don't think sharing code would be that helpful, because it mostly depends on what models you want to script. Thanks to the work of @chenbohua3 most of the heads are scriptable already and thus the effort to make complete models scriptable is rather small.

The main steps you have to take are the following:

1) Use export_torchscript_with_instances to export your model
2) Fix any TorchScript errors in the detectron2 repo. This will mainly consist of
- Replace lists of modules with nn.ModuleList (you will need to retrain the models because of this)
- Add python type hints for non-tensor arguments
- Replace some Python expressions which are not supported by TorchScript with equivalent supported expressions
- Ignore code branches that you do not need by adding assert not torch.jit.is_scripting()
3) Extract the needed fields of instances in the last layer. You could for example define a wrapper module that takes the original model as input and has the following forward method (assuming you are only interested in pred_masks):

   def forward(self, input):
       # Return only the tensors needed downstream instead of Instances objects.
       output = self.model(input)
       return [o["instances"].pred_masks for o in output]
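
Two of the fixes listed above, type hints for non-tensor arguments and skipping Python-only branches, can be sketched on a toy function. The names rescale and python_only_check are made up for illustration, and the sketch uses an `if not torch.jit.is_scripting()` guard together with @torch.jit.unused, which is a variation on the assert mentioned above rather than the exact detectron2 code.

```python
from typing import List

import torch


@torch.jit.unused
def python_only_check(x: torch.Tensor) -> None:
    # Never compiled by TorchScript, so arbitrary Python is fine here.
    assert any(d > 0 for d in x.shape), "expected a non-empty tensor"


def rescale(x: torch.Tensor, scale: float, out_shape: List[int]) -> torch.Tensor:
    # Without the hints, TorchScript would assume `scale` and `out_shape` are Tensors.
    if not torch.jit.is_scripting():
        python_only_check(x)  # runs in eager mode only
    return (x * scale).reshape(out_shape)


scripted = torch.jit.script(rescale)
out = scripted(torch.arange(6.0), 0.5, [2, 3])
eager_out = rescale(torch.arange(6.0), 0.5, [2, 3])
print(out)
```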

I hope this helps!

tkuenzle on 11 Nov 2020

❤1

FYI we just added support for scripting & tracing of the most common models (R-CNN and RetinaNet). They can now be exported to torchscript format successfully.
(pytorch built from master branch is required)

There aren't proper APIs & docs yet, but basic usage is now shown in unittests: https://github.com/facebookresearch/detectron2/blob/f1d0c05bc5580388348388213e0a77d722d90f17/tests/test_export_torchscript.py#L23-L150

ppwwyyxx on 11 Nov 2020

🎉2 👍2

Thanks a lot, that's great news @ppwwyyxx! Would you be willing to accept PRs for making some of the other models scriptable?

tkuenzle on 11 Nov 2020

@ppwwyyxx When I run the test, I get this error
(Python 3.6.9, torch 1.8.0.dev20201110)

RuntimeError:
Module 'ResNet' has no attribute 'stages' (This attribute exists on the Python module, but we failed to convert Python type: 'list' to a TorchScript type.):
  File "detectron2/modeling/backbone/resnet.py", line 437
        if "stem" in self._out_features:
            outputs["stem"] = x
        for name, stage in zip(self.stage_names, self.stages):
                                                 ~~~~~~~~~~~ <--- HERE
            x = stage(x)
            if name in self._out_features:

Seems to be related to the issue above. Is there something I'm supposed to do to preprocess the models so they don't have lists and instead have ModuleLists?

danielgordon10 on 11 Nov 2020

👍1

If I add

model.backbone.bottom_up.stages = nn.ModuleList(model.backbone.bottom_up.stages)
model.backbone.lateral_convs = nn.ModuleList(model.backbone.lateral_convs)
model.backbone.output_convs = nn.ModuleList(model.backbone.output_convs)

it seems to work, but only for a single image. Does batched mode not work yet?

danielgordon10 on 11 Nov 2020

❤1

@danielgordon10 your pytorch is still not new enough.

ppwwyyxx on 11 Nov 2020

@ppwwyyxx What's the minimum pytorch version? That was yesterday's nightly.

danielgordon10 on 11 Nov 2020

It now requires yesterday's pytorch commits which are supposed to be in today's nightly.
Once a few other ongoing pytorch features are implemented we expect to require them as well.

I'm closing this issue because the scope is too general (also renaming it so it only involves torchscript) and the majority of the work is done. There are some remaining TODOs about usability that should be addressed as separate issues:

- Currently the way to convert a model is slightly different for every model due to their input/output formats. Ideally they could be unified.
- Some examples to demonstrate the usage, in addition to the unittests https://github.com/facebookresearch/detectron2/blob/master/tests/test_export_torchscript.py
- C++ deployment example. For scripting this requires https://github.com/pytorch/pytorch/issues/46944 which is expected to be fixed in pytorch 1.8.
- Support more models. Feel free to make PRs. If difficulties / hacks are involved, it would be nice to discuss them in separate issues.
- Performance: As the model is new it may encounter some rough edges with pytorch's JIT optimization backend.

Thanks a lot to pytorch JIT team and @chenbohua3 @bddpqq from Alibaba for making this happen!

ppwwyyxx on 11 Nov 2020
