Onnxruntime: Using Experimental operators (such as ATen) over C/C++ API.

Created on 18 Aug 2019 · 14 Comments · Source: microsoft/onnxruntime

Is your feature request related to a problem? Please describe.
My exported ONNX model contains an ATen operator. This operator is considered experimental and deprecated. However, there is no easy way to remove ATen from the exported ONNX file. Looking at the ONNX source, I see that ATen is implemented in the 0.4.0 version (1 April commit). Tracing it, I found that these operators are registered in a function named registerContribSchemas() in the onnxruntime/core/graph/contrib_ops/contrib_defs.cc file.
Looking at the shared library files in both the 0.4 and 0.5 releases, I don't see any ATen-related contrib symbols exported, so I cannot add the prototype myself and call it.

How can I call the internal onnxruntime::contrib::registerContribSchemas() from the C/C++ API? Or how can I use the deprecated contrib operators?

System information
Tried both 0.4 and 0.5 versions.

Describe the solution you'd like
The C/C++ API could export onnxruntime::contrib::registerContribSchemas().

Describe alternatives you've considered
-
Additional context
-

converter support

furkankirac

All 14 comments

The contrib ops should be registered by default (users don't need to register them manually); see https://github.com/microsoft/onnxruntime/blob/17c8fe44e37bfbe007a9f36fd105fb29a26181e3/onnxruntime/core/session/environment.cc#L43.

Of course, the build config must not have "--disable_contrib_ops" specified. @pranavsharma

linkerzhang on 19 Aug 2019

Thanks for the feedback.
I have already made sure that DISABLE_CONTRIB_OPS is OFF on the CMake side.
During ONNX export, the exporter prints:
Warning: ATen was a removed experimental ops. In the future, we may directly reject this operator. Please update your model as soon as possible.

I have ATen operators exported as below:
%387 : Float(1, 64, 28, 40) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[1, 1], pads=[0, 0, 0, 0], strides=[1, 1]](%385, %386), scope: FRCNNModel/GeneralizedRCNN/Sequential[backbone]/FPN[fpn]/Sequential[fpn_inner4]/Conv2d[0] # /lib/python3.7/site-packages/torch/nn/modules/conv.py:340:0
%388 : Float(64) = onnx::Constant[value=<Tensor>](), scope: FRCNNModel/GeneralizedRCNN/Sequential[backbone]/FPN[fpn]/Sequential[fpn_inner4]/GroupNorm[1] # /lib/python3.7/site-packages/torch/nn/functional.py:1692:0
%389 : Float(64) = onnx::Constant[value=<Tensor>](), scope: FRCNNModel/GeneralizedRCNN/Sequential[backbone]/FPN[fpn]/Sequential[fpn_inner4]/GroupNorm[1] # /lib/python3.7/site-packages/torch/nn/functional.py:1692:0
%390 : Float(1, 64, 28, 40) = onnx::ATen[cudnn_enabled=1, eps=1e-05, num_groups=4, operator="group_norm"](%387, %388, %389), scope: /lib/python3.7/site-packages/torch/nn/functional.py:1692:0

When I try to load this model with the OnnxRuntime C++ API, I get the following error printed to stderr:
Warning: ATen was a removed experimental ops. In the future, we may directly reject this operator. Please update your model as soon as possible.
libc++abi.dylib: terminating with uncaught exception of type Ort::Exception: Load model from model.onnx failed:Fatal error: ATen is not a registered function/op

The only thing that comes to mind is that the operator="group_norm" part of the ATen node changes the signature, so ATen cannot be matched exactly.

What would you suggest for supporting GroupNorm exported via ATen op?
Best

furkankirac on 19 Aug 2019
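Before loading an exported model into OnnxRuntime, it can help to check whether the exporter fell back to ATen and for which PyTorch operator. The following is only a minimal sketch that scans the printed graph text (the kind of dump shown above), not the .onnx file itself. The names `find_aten_nodes` and `SAMPLE_DUMP` are hypothetical, the sample is abridged from the dump in this thread, and the regular expression assumes no nested `]` inside the ATen attribute list.

```python
import re

# Matches a node such as:
#   %390 : Float(...) = onnx::ATen[..., operator="group_norm"](...)
# Assumption: the ATen attribute list contains no nested "]".
ATEN_NODE = re.compile(r'(%\w+)\s*:[^=]*=\s*onnx::ATen\[([^\]]*)\]')
OPERATOR_ATTR = re.compile(r'operator="([^"]+)"')

# Abridged from the graph dump in this thread (scopes and paths removed).
SAMPLE_DUMP = '''
%387 : Float(1, 64, 28, 40) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[1, 1], pads=[0, 0, 0, 0], strides=[1, 1]](%385, %386)
%390 : Float(1, 64, 28, 40) = onnx::ATen[cudnn_enabled=1, eps=1e-05, num_groups=4, operator="group_norm"](%387, %388, %389)
'''


def find_aten_nodes(graph_dump):
    """Return (output_name, wrapped_operator) for every onnx::ATen node."""
    found = []
    for match in ATEN_NODE.finditer(graph_dump):
        operator = OPERATOR_ATTR.search(match.group(2))
        found.append((match.group(1), operator.group(1) if operator else None))
    return found


if __name__ == "__main__":
    for output_name, operator in find_aten_nodes(SAMPLE_DUMP):
        print(f"{output_name} falls back to ATen for operator {operator!r}")
```

An empty result means the dump contains no ATen fallback nodes; any entry names a PyTorch operator that the exporter could not map to a standard ONNX op.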

The warning is there for awareness, but the model should still run.

@linkerzhang @ebarsoumMS is there a suggested alternative to this op if someone wanted to revise their model?

faxu on 19 Aug 2019

It looks like the schema for the ATen op was registered (in contrib_defs.cc). The kernel, however, wasn't registered in cpu_contrib_kernels.cc. I'll fix this.

pranavsharma on 19 Aug 2019

On further examination, we don't have an implementation for this op, which is why a kernel was never registered for it. What is the intended functionality of the op?

pranavsharma on 19 Aug 2019

This op is exported only when a Group Normalization module (instead of batch norm) is used while training a maskrcnn-benchmark model from scratch.
Standard training of this model starts from weights pretrained on ImageNet. To change the number of filters etc., one needs to train from scratch without fixed layers. Training from scratch is only stable with Group Norm. GroupNorm seems to be a more stable variant of normalization and has become more widely accepted recently.

furkankirac on 19 Aug 2019
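For readers unfamiliar with the op's intended functionality: group normalization splits the channels of each sample into `num_groups` groups, normalizes every group by its own mean and (biased) variance, and then applies a per-channel weight and bias. The sketch below is a plain-Python illustration of that definition for a single sample, not code from PyTorch or OnnxRuntime. The function name and the nested-list data layout are assumptions made for illustration.

```python
import math


def group_norm(x, num_groups, weight, bias, eps=1e-5):
    """Group-normalize one sample; x[channel] is a flat list of spatial values."""
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    out = []
    for g in range(num_groups):
        channels = range(g * per_group, (g + 1) * per_group)
        # Statistics are shared by all channels and positions in the group.
        values = [v for c in channels for v in x[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased
        inv_std = 1.0 / math.sqrt(var + eps)
        for c in channels:
            # Per-channel affine transform after normalization.
            out.append([(v - mean) * inv_std * weight[c] + bias[c] for v in x[c]])
    return out


if __name__ == "__main__":
    sample = [[1.0, 3.0], [5.0, 7.0], [0.0, 0.0], [2.0, 2.0]]  # 4 channels
    print(group_norm(sample, num_groups=2, weight=[1.0] * 4, bias=[0.0] * 4))
```

Because the statistics depend only on the sample itself and not on the batch, the result is independent of batch size, which is the usual explanation for why it behaves more stably than batch norm with small batches.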

We'll need to propose this op here after which we can implement it in onnxruntime. Would you like to propose this op? Instructions are here https://github.com/onnx/onnx/blob/master/docs/AddNewOp.md.

pranavsharma on 19 Aug 2019

I would like to work around this problem because of mission-critical deadlines.
For the time being, I could use a suggested alternative if there is one. @linkerzhang @ebarsoumMS
If in the end it comes down to proposing a new op, I'll look into it.
Best

furkankirac on 19 Aug 2019

@furkankirac, are you exporting the model with the PyTorch exporter? Re-exporting the model with the latest PyTorch can resolve this issue. The latest PyTorch exporter removed ATen.

yufenglee on 10 Sep 2019

Hi @yufenglee, thanks for the alternative. I have already tried exporting with PyTorch 1.2. The ATen operator still gets exported when Group Normalization is used. Most probably this is because GroupNorm does not have an implementation in the ONNX standard yet. The ONNX exporter doesn't know the op, so the only way to export it is as an ATen op.

furkankirac on 14 Sep 2019
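One possible stopgap, not proposed by anyone in this thread and offered here only as an assumption, is to express group normalization through operations the ONNX standard already has: Reshape, InstanceNormalization, Mul and Add. Merging the channels of each group into one row makes per-row instance normalization compute exactly the group statistics. The plain-Python sketch below stands in for those ops on a single sample to show that the arithmetic lines up; it does not build an ONNX graph, and all function names are hypothetical.

```python
import math


def instance_norm_rows(rows, eps):
    """Normalize each row to zero mean and unit variance (scale 1, bias 0)."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def group_norm_decomposed(x, num_groups, weight, bias, eps=1e-5):
    """x[channel] is a flat list of spatial values for one sample."""
    num_channels, spatial = len(x), len(x[0])
    per_group = num_channels // num_groups
    # Step 1 (Reshape): merge the channels of each group into one long row.
    flat = [v for channel in x for v in channel]
    row_len = per_group * spatial
    grouped = [flat[g * row_len:(g + 1) * row_len] for g in range(num_groups)]
    # Step 2 (InstanceNormalization with scale 1 and bias 0).
    normed = instance_norm_rows(grouped, eps)
    # Step 3 (Reshape): restore the original per-channel layout.
    flat = [v for row in normed for v in row]
    restored = [flat[c * spatial:(c + 1) * spatial] for c in range(num_channels)]
    # Step 4 (Mul, Add): apply the per-channel weight and bias.
    return [[v * weight[c] + bias[c] for v in restored[c]]
            for c in range(num_channels)]


if __name__ == "__main__":
    sample = [[1.0, 3.0], [5.0, 7.0], [0.0, 0.0], [2.0, 2.0]]
    print(group_norm_decomposed(sample, 2, [2.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0]))
```

Whether this route is practical depends on being able to change how the model is exported; the sketch only shows that the decomposition is numerically equivalent to the definition of group normalization.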

Currently ORT doesn't have an implementation for this op. So either (1) PyTorch should stop using this op in the exported ONNX model, or (2) ORT should implement this op. Since this is an experimental op that was already intended to be removed, I would prefer option 1. @spandantiwari @ebarsoumMS @linkerzhang

pranavsharma on 24 Sep 2019

@furkankirac - yes, the torch.nn.GroupNorm op is not supported in the ONNX exporter (not correctly, at least). I have opened an issue in the PyTorch repo, https://github.com/pytorch/pytorch/issues/26753, and have assigned it to myself. We will try to add support soon.

spandantiwari on 24 Sep 2019

🎉2

Closing this issue since the fix will be in PyTorch. pytorch/pytorch#26753

faxu on 25 Sep 2019

@furkankirac - PR https://github.com/pytorch/pytorch/pull/27071 has been added for supporting torch.group_norm in the PyTorch exporter.

spandantiwari on 30 Sep 2019

🚀2
