Do you have any examples of how to convert these models into a format runnable in C++?
Torchscript does not currently support these models.
Got it. Is that because some of the ops aren't supported yet? Is there another way to deploy these models to a c++ environment? (E.g. onnx --> caffe2 or tensorrt)
Is this lack of support true for object detection models in general? Or is this more specific to the SOTA implementations in detectron?
How much work would it be to get one of these models into a c++ compatible format?
Thanks!
We're working on getting TorchScript support. onnx/caffe2 deployment support (discussed in #8) has already been released.
Thank you! Do you know if this lack of support is true for object detection models in general? Or is this more specific to the SOTA implementations in detectron?
And second, does torchscript support the operations used in detectron or does torchscript require source changes to make this work?
I'm curious to know the best way to deploy object detection models trained in pytorch to an optimized format runnable in c++.
@bfortuner Using libtorch does not gain much in terms of speed. Exporting to onnx and converting to a TensorRT engine is the best way to deploy these models.
Also, onnxruntime is trying to support all ops on top of its TensorRT provider, but many of them are not supported and have to run on the CPU.
@ppwwyyxx Thanks for the added clarity. Could you expand at all on what you mean with "take some time to be ready"? Is that something like for the next release or more in some unknown distant future?
Yeah, I'm wondering if there is a tutorial/paper about recommended approaches to c++ deployment with PyTorch. It seems there are a lot of different ways, but it's not clear what the "best" way is, or what the PyTorch team recommends in the future. I'll post in PyTorch discussion!
FYI torchvision models (including Faster R-CNN and Mask R-CNN) will soon support exporting its models to both ONNX and TorchScript, see https://github.com/pytorch/vision/pull/1461 https://github.com/pytorch/vision/pull/1407 and https://github.com/pytorch/vision/pull/1401 for some representative PRs.
I believe the learnings from this conversion step done for torchvision models will be very helpful for planning detectron2 models to be exportable to TorchScript.
Thanks for the update! I'm curious to know if TorchScript needs to make changes, too (are there any hard blockers)? Or is it mostly on our end to make our code compatible with the current TorchScript api?
The PRs above suggest it will still be a burden for our developers to bring their SOTA models into production
@bfortuner I think it will be a two-sided change: TorchScript support for Python features will continue improving, but users might need to adapt their code a bit to better fit what is currently supported.
This means avoiding using some libraries in the inference code-path (like numpy, scipy, etc).
As https://github.com/pytorch/vision/pull/1407 already shows, a complicated model such as Mask R-CNN can already be converted to TorchScript without changing the code too much (although the original code took some precautions to avoid using too many Python features).
cc @suo who can give a more accurate picture of TorchScript
Say I want to convert a detectron2 mask-rcnn model to C++ (ideally using torchscript/libtorch), what's the current best approach? I tried various things last week but with no good solution. Things I tried (using recent detectron2, pytorch and torchvision code):
1) Naively try to convert some of the blocks to torchscript using torch.jit.script. This will fail on various things. Example stacktrace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py", line 1255, in script
return torch.jit._recursive.recursive_script(obj)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 534, in recursive_script
return create_script_module(nn_module, infer_methods_to_compile(nn_module))
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 296, in create_script_module
return create_script_module_impl(nn_module, concrete_type, cpp_module, stubs)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 336, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py", line 1593, in _construct
init_fn(script_module)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 328, in init_fn
scripted = recursive_script(orig_value)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 534, in recursive_script
return create_script_module(nn_module, infer_methods_to_compile(nn_module))
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 296, in create_script_module
return create_script_module_impl(nn_module, concrete_type, cpp_module, stubs)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 340, in create_script_module_impl
create_methods_from_stubs(concrete_type, stubs)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_recursive.py", line 259, in create_methods_from_stubs
concrete_type._create_methods(defs, rcbs, defaults)
RuntimeError:
Unknown type name 'torch.nn.SyncBatchNorm':
File "/detectron2_repo/detectron2/layers/wrappers.py", line 64
# https://github.com/pytorch/pytorch/issues/12013
assert not isinstance(
self.norm, torch.nn.SyncBatchNorm
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
), "SyncBatchNorm does not support empty inputs!"
I get similar errors when trying to convert other layers (for example, torchscript didn't support del statements in some of the forward passes, etc.)
2) Secondly I tried to just copy the weights from detectron2 to a mask-rcnn model defined in pytorch/torchvision. I limited myself to the backbone architecture, example definition in torchvision:
backbone = torchvision.models.detection.backbone_utils.resnet_fpn_backbone('resnet50', True)
Although I succeeded in copying the weights, of course the model.backbone.forward calls ended up giving different results, most likely due to slightly different definitions of the two (detectron2 vs torchvision) architectures and forward passes.
3) My third and final try was to use the onnx/caffe2 exporter. This more or less worked (eg, I ended up with a caffe2 model, but I haven't compared the outputs of the models yet), however, only afterwards I realized I couldn't import onnx models into pytorch, and adding caffe2 support for our deployment would be quite cumbersome, since we just switched from caffe to libtorch... To me it seems that exporting to onnx should be quite similar to exporting to torchscript, so maybe it's quite easy to change the caffe2 exporter to torchscript?
Hi all, looks like PyTorch 1.4 and torchvision 0.5 have made progress on this and a couple of related issues. When will we see the updates rolling out to detectron2? Please see my related question here on the forum: https://discuss.pytorch.org/t/pytorch-1-4-torchvision-0-5-vs-detectron/67002
Hi, I am also having some problems of JIT conversion. It raises an error:
RuntimeError:
Unknown type name 'torch.nn.SyncBatchNorm':
File "/detectron2/detectron2/layers/wrappers.py", line 67
# https://github.com/pytorch/pytorch/issues/12013
assert not isinstance(
self.norm, torch.nn.SyncBatchNorm
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
), "SyncBatchNorm does not support empty inputs!"
Since ONNX conversion fixes the input size, it is not suitable in my case. Any help please?
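For readers hitting the same traceback: one common way to keep an eager-only check out of the compiled graph is to guard it with torch.jit.is_scripting(). The sketch below is only an illustration of that idea. ConvNorm is a hypothetical layer written for this example (it is not the detectron2 wrapper), and it assumes the check only matters in eager mode.

```python
import torch
from torch import nn


class ConvNorm(nn.Module):
    """Hypothetical conv + norm layer, loosely modelled on the traceback above."""

    def __init__(self) -> None:
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm2d(8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The TorchScript compiler skips this branch, so the isinstance
        # check against a module class only ever runs in eager mode.
        if not torch.jit.is_scripting():
            assert not isinstance(
                self.norm, torch.nn.SyncBatchNorm
            ), "SyncBatchNorm does not support empty inputs!"
        return self.norm(self.conv(x))


model = ConvNorm().eval()
scripted = torch.jit.script(model)

# Compare eager and scripted outputs on the same random input.
x = torch.rand(2, 3, 16, 16)
same = torch.allclose(model(x), scripted(x), atol=1e-6)
```

The trade-off is that the guarded check is silently dropped in the scripted model, so it should only cover conditions that cannot occur at inference time.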
An obvious disadvantage of ONNX is that we need to fix the input size, but some detection models can take flexible-size input. JIT support is necessary and urgent.
any good news?
Progress has been made recently (https://github.com/facebookresearch/detectron2/pulls?q=is%3Apr+author%3Achenbohua3+) on this issue and if everything goes well most models should be scriptable within a few months.
Very thorough attempt. Did you figure out a way to export the model to an onnx model that can be loaded by another runtime, or to a torchscript model?
subscribe the thread
Thanks a lot for all the amazing work being done on this project, it's appreciated a lot!
I understand that detectron models are currently not scriptable with TorchScript. @ppwwyyxx could you please elaborate on what exactly is missing for making Mask R-CNN and PointRend scriptable? Is it blocked by https://github.com/pytorch/pytorch/issues/36061?
pytorch/pytorch#36061 is the main blocker
Replacing the lists of modules with nn.ModuleList works pretty well and we are able to script the models (although we have to retrain them). Now we are running into https://github.com/pytorch/pytorch/issues/46944 because detectron2 relies on classes like Instances and Boxes, which are not included in the scripted model and thus the model is not runnable in a non-Python environment.
@ppwwyyxx do you think it makes sense to wait for proper support for classes in TorchScript or rather change the implementation of Instances, etc. to be based on e.g. NamedTuple as in https://github.com/pytorch/pytorch/issues/42258?
I haven't got to that step yet (since we can't break pre-trained models) but I'll go double check the story around scripted classes in C++.
Btw, with latest github version, tracing (with fixed batch size) already works fine under https://github.com/facebookresearch/detectron2/blob/4ef254fbd7b5edeb93305d3937ecae469f79505b/detectron2/export/torchscript_patch.py#L196-L197 (except for some postprocessing which is often not used in deployment), and using it in C++ is probably more straightforward.
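As a rough illustration of the tracing route mentioned above, here is a minimal sketch with a toy module. TinyHead is a hypothetical stand-in (real detectron2 models need the patch linked above because their inputs and outputs are not plain tensors), and the batch size of 1 is just the example shape baked into the trace.

```python
import io

import torch
from torch import nn


class TinyHead(nn.Module):
    """Hypothetical stand-in for a model that maps tensors to tensors."""

    def __init__(self) -> None:
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Global average pool over the spatial dimensions.
        return self.conv(images).mean(dim=(2, 3))


model = TinyHead().eval()
example = torch.rand(1, 3, 32, 32)  # example input used to record the trace

# Record the operations executed for the example input.
traced = torch.jit.trace(model, example)

# Serialize and reload, as a deployment pipeline would do with a file.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)

with torch.no_grad():
    max_diff = (reloaded(example) - model(example)).abs().max().item()
```

On the C++ side, the saved file would typically be loaded with torch::jit::load from libtorch.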
It turns out that converting the Instances to a Dict[str, Tensor] as final output does the trick and the model can be run in C++ even though it uses Instances internally. Sorry for the confusion, it looks like everything is working as expected!
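A minimal sketch of that idea, assuming a toy model: ToyDetector and TensorDictOutput are hypothetical names made up for this example, and the toy returns a tuple internally where a real detectron2 model would build Instances. The point is only that the scripted module's final output is a Dict[str, Tensor].

```python
from typing import Dict, Tuple

import torch
from torch import nn


class ToyDetector(nn.Module):
    """Hypothetical detector standing in for a model with structured internals."""

    def forward(self, image: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        # One dummy box covering the whole image, plus one dummy score.
        boxes = torch.tensor([[0.0, 0.0, 1.0, 1.0]]) * image.shape[-1]
        scores = image.mean().reshape(1)
        return boxes, scores


class TensorDictOutput(nn.Module):
    """Wrapper that flattens the model's result into a dict of tensors."""

    def __init__(self, model: nn.Module) -> None:
        super().__init__()
        self.model = model

    def forward(self, image: torch.Tensor) -> Dict[str, torch.Tensor]:
        boxes, scores = self.model(image)
        return {"pred_boxes": boxes, "scores": scores}


scripted = torch.jit.script(TensorDictOutput(ToyDetector()))
out = scripted(torch.rand(3, 8, 8))
```

Because the return type only uses types that exist outside Python, the caller does not need any Python-side class definitions to read the result.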
Hi @tkuenzle Do you gain anything in terms of time per frame with the C++ / libtorch version? For a single frame, or maybe by running multiple C++ threads in parallel?
And would it be possible to either share code or outline the main steps you had to take to make scripting work? Thanks.
I cannot really comment on time per frame because our focus is on running the model on mobile devices. I don't think sharing code would be that helpful, because it mostly depends on what models you want to script. Thanks to the work of @chenbohua3 most of the heads are scriptable already and thus the effort to make complete models scriptable is rather small.
The main steps you have to take are the following:
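- Use export_torchscript_with_instances to export your model
- Replace lists of modules with nn.ModuleList (you will need to retrain the models because of this)
- Mark code paths that are not scriptable with assert not torch.jit.is_scripting()
- Wrap the model so that it only returns tensors (e.g. pred_masks):
  def forward(self, input):
      output = self.model(input)
      return [o["instances"].pred_masks for o in output]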
I hope this helps!
FYI we just added scripting & tracing support for the most common models (R-CNN and RetinaNet). They can now be exported to torchscript format successfully.
(pytorch built from master branch is required)
There aren't proper APIs & docs yet, but basic usage is now shown in unittests: https://github.com/facebookresearch/detectron2/blob/f1d0c05bc5580388348388213e0a77d722d90f17/tests/test_export_torchscript.py#L23-L150
Thanks a lot, that's great news @ppwwyyxx! Would you be willing to accept PRs for making some of the other models scriptable?
@ppwwyyxx When I run the test, I get this error
(Python 3.6.9, torch 1.8.0.dev20201110)
RuntimeError:
Module 'ResNet' has no attribute 'stages' (This attribute exists on the Python module, but we failed to convert Python type: 'list' to a TorchScript type.):
File "detectron2/modeling/backbone/resnet.py", line 437
if "stem" in self._out_features:
outputs["stem"] = x
for name, stage in zip(self.stage_names, self.stages):
~~~~~~~~~~~ <--- HERE
x = stage(x)
if name in self._out_features:
Seems to be related to the issue above. Is there something I'm supposed to do to preprocess the models so they don't have lists and instead have ModuleLists?
If I add
model.backbone.bottom_up.stages = nn.ModuleList(model.backbone.bottom_up.stages)
model.backbone.lateral_convs = nn.ModuleList(model.backbone.lateral_convs)
model.backbone.output_convs = nn.ModuleList(model.backbone.output_convs)
it seems to work, but only for a single image. Does batched mode not work yet?
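To illustrate why the list-to-ModuleList change matters, here is a small sketch with a toy backbone. Backbone and try_script are hypothetical helpers written for this example, and the toy differs from detectron2's ResNet, which may register its stages separately.

```python
import torch
from torch import nn


class Backbone(nn.Module):
    """Hypothetical backbone keeping its stages in a list or an nn.ModuleList."""

    def __init__(self, use_module_list: bool) -> None:
        super().__init__()
        stages = [nn.Linear(4, 4) for _ in range(3)]
        # In this toy, a plain list hides the stages from nn.Module
        # (no registered parameters) and from the TorchScript compiler.
        self.stages = nn.ModuleList(stages) if use_module_list else stages

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for stage in self.stages:
            x = stage(x)
        return x


def try_script(module: nn.Module) -> bool:
    """Return True if torch.jit.script succeeds on the module."""
    try:
        torch.jit.script(module)
        return True
    except Exception:
        return False


list_ok = try_script(Backbone(use_module_list=False))
module_list_ok = try_script(Backbone(use_module_list=True))

# Count registered parameter tensors in each variant.
num_params_list = len(list(Backbone(False).parameters()))
num_params_module_list = len(list(Backbone(True).parameters()))
```

In this toy the plain-list variant fails to script with the same "failed to convert Python type: 'list'" kind of error as above, while the nn.ModuleList variant compiles.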
@danielgordon10 your pytorch is still not new enough.
@ppwwyyxx What's the minimum pytorch version? That was yesterday's nightly.
It now requires yesterday's pytorch commits which are supposed to be in today's nightly.
Once a few other ongoing pytorch features are implemented we expect to require them as well.
I'm closing this issue because the scope is too general (also renaming it so it only involves torchscript) and the majority of the work is done. There are some remaining TODOs about usability that should be addressed as separate issues.
Thanks a lot to the pytorch JIT team and @chenbohua3 @bddpqq from Alibaba for making this happen!