Onnxruntime: Using Experimental operators (such as ATen) over C/C++ API.

Created on 18 Aug 2019 · 14 Comments · Source: microsoft/onnxruntime

Is your feature request related to a problem? Please describe.
My exported ONNX model contains an ATen operator. This operator is considered experimental and deprecated, but there is no easy way to remove ATen from the exported ONNX file. Looking at the ONNX source, I see that ATen was implemented in version 0.4.0 (1 April commit). Tracking it down, the registration of those operators is done in a function named registerContribSchemas() in the file onnxruntime/core/graph/contrib_ops/contrib_defs.cc.
Looking at the shared library files in both the 0.4 and 0.5 releases, I don't see any exported ATen-related contrib symbols, so I cannot add the prototype and call it myself.

How can I call the internal onnxruntime::contrib::registerContribSchemas() from the C/C++ API? Or, how can I use the deprecated contrib operators?

System information
Tried both 0.4 and 0.5 versions.

Describe the solution you'd like
The C/C++ API could export onnxruntime::contrib::registerContribSchemas()

Describe alternatives you've considered
-
Additional context
-

converter support

All 14 comments

The contrib ops should be registered by default (users don't need to register them manually); see https://github.com/microsoft/onnxruntime/blob/17c8fe44e37bfbe007a9f36fd105fb29a26181e3/onnxruntime/core/session/environment.cc#L43.

Of course, the build config should not have "--disable_contrib_ops" specified. @pranavsharma

Thanks for the feedback.
I have already ensured that DISABLE_CONTRIB_OPS is OFF on the CMake side.
During ONNX export, it prints:
Warning: ATen was a removed experimental ops. In the future, we may directly reject this operator. Please update your model as soon as possible.

I have ATen operators exported as below:
%387 : Float(1, 64, 28, 40) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[1, 1], pads=[0, 0, 0, 0], strides=[1, 1]](%385, %386), scope: FRCNNModel/GeneralizedRCNN/Sequential[backbone]/FPN[fpn]/Sequential[fpn_inner4]/Conv2d[0] # /lib/python3.7/site-packages/torch/nn/modules/conv.py:340:0
%388 : Float(64) = onnx::Constant[value=<Tensor>](), scope: FRCNNModel/GeneralizedRCNN/Sequential[backbone]/FPN[fpn]/Sequential[fpn_inner4]/GroupNorm[1] # /lib/python3.7/site-packages/torch/nn/functional.py:1692:0
%389 : Float(64) = onnx::Constant[value=<Tensor>](), scope: FRCNNModel/GeneralizedRCNN/Sequential[backbone]/FPN[fpn]/Sequential[fpn_inner4]/GroupNorm[1] # /lib/python3.7/site-packages/torch/nn/functional.py:1692:0
%390 : Float(1, 64, 28, 40) = onnx::ATen[cudnn_enabled=1, eps=1e-05, num_groups=4, operator="group_norm"](%387, %388, %389), scope: /lib/python3.7/site-packages/torch/nn/functional.py:1692:0
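
A dump like the one above can be scanned for ATen nodes before the model is handed to the runtime. The following is a minimal sketch, not part of the original thread: the helper name find_aten_nodes is hypothetical, and the regular expressions assume the printed graph format shown in this issue (an ATen attribute list without nested brackets). Other PyTorch versions may print the graph differently.

```python
import re

# Assumed format (taken from the dump in this issue):
#   %390 : Float(...) = onnx::ATen[..., operator="group_norm"](%387, ...)
ATEN_NODE = re.compile(r'%(\w+)\s*:\s*[^=]*=\s*onnx::ATen\[([^\]]*)\]')
OPERATOR_ATTR = re.compile(r'operator="([^"]+)"')


def find_aten_nodes(graph_text):
    """Return (output_id, wrapped_operator) for each onnx::ATen node found.

    wrapped_operator is None when the node has no operator="..." attribute.
    """
    found = []
    for node in ATEN_NODE.finditer(graph_text):
        output_id, attributes = node.group(1), node.group(2)
        operator = OPERATOR_ATTR.search(attributes)
        found.append((output_id, operator.group(1) if operator else None))
    return found


if __name__ == "__main__":
    sample = (
        '%387 : Float(1, 64, 28, 40) = onnx::Conv[group=1](%385, %386)\n'
        '%390 : Float(1, 64, 28, 40) = onnx::ATen[cudnn_enabled=1, eps=1e-05, '
        'num_groups=4, operator="group_norm"](%387, %388, %389)'
    )
    for output_id, operator in find_aten_nodes(sample):
        print(f"node %{output_id} falls back to ATen operator: {operator}")
```

Any node reported here is one the exporter could not express with standard ONNX operators, which makes it a likely cause of a load failure.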

When I try to load this model with the ONNX Runtime C++ API, I get the following error printed to stderr:
Warning: ATen was a removed experimental ops. In the future, we may directly reject this operator. Please update your model as soon as possible.
libc++abi.dylib: terminating with uncaught exception of type Ort::Exception: Load model from model.onnx failed:Fatal error: ATen is not a registered function/op

The only thing that comes to my mind is that the operator="group_norm" part in the ATen block changes the signature, and as a result ATen cannot be matched exactly.

What would you suggest for supporting GroupNorm exported via ATen op?
Best

The warning is there for awareness, but the model should still run.

@linkerzhang @ebarsoumMS is there a suggested alternative to this op if someone wanted to revise their model?

It looks like the schema for the ATen op was registered (contrib_defs.cc). The kernel, however, wasn't registered inside cpu_contrib_kernels.cc. I'll fix this.

On further examination, we don't have an implementation for this op, which is why a kernel was never registered for it. What is the intended functionality of the op?

This op is exported only when a Group Normalization module (instead of batch norm) is used while training a maskrcnn-benchmark model from scratch.
Standard training of this model starts from weights pretrained on ImageNet. To change the number of filters etc., one needs to train from scratch without fixed layers, and scratch training is only stable with Group Norm. GroupNorm seems to be a more stable variant of normalization and has become more widely accepted recently.
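
For reference, the computation that the group_norm ATen node stands for can be sketched in plain Python. This is an illustrative reference under stated assumptions, not the PyTorch or onnxruntime implementation: it follows the usual definition of Group Normalization (channels are split into num_groups groups, each group is normalized with its own mean and biased variance, then a per-channel scale and shift are applied), and it assumes the spatial dimensions have been flattened so the input is a nested list shaped [N][C][L]. The function name group_norm and its parameters are chosen here for illustration.

```python
import math


def group_norm(x, num_groups, weight=None, bias=None, eps=1e-5):
    """Group-normalize x, a nested list shaped [N][C][L].

    weight and bias are optional per-channel lists of length C.
    """
    out = []
    for sample in x:
        channels = len(sample)
        if channels % num_groups != 0:
            raise ValueError("channel count must be divisible by num_groups")
        per_group = channels // num_groups
        normalized = []
        for g in range(num_groups):
            group = sample[g * per_group:(g + 1) * per_group]
            # Statistics are shared by all channels and positions in the group.
            values = [v for channel in group for v in channel]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)  # biased
            inv_std = 1.0 / math.sqrt(var + eps)
            for offset, channel in enumerate(group):
                c = g * per_group + offset
                scale = 1.0 if weight is None else weight[c]
                shift = 0.0 if bias is None else bias[c]
                normalized.append(
                    [(v - mean) * inv_std * scale + shift for v in channel]
                )
        out.append(normalized)
    return out


if __name__ == "__main__":
    # One sample, four channels, two positions each, split into two groups.
    x = [[[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [10.0, 10.0]]]
    print(group_norm(x, num_groups=2))
```

In the dump from this issue, the node carries num_groups=4 and eps=1e-05, and its three inputs correspond to the tensor, the per-channel weight, and the per-channel bias.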

We'll need to propose this op in ONNX first, after which we can implement it in onnxruntime. Would you like to propose this op? Instructions are here: https://github.com/onnx/onnx/blob/master/docs/AddNewOp.md.

I would like to work around this problem because of mission-critical deadlines.
For the time being, I could use a suggested alternative if there is one. @linkerzhang @ebarsoumMS
If in the end it comes down to proposing a new op, I'll look into it.
Best

@furkankirac, are you exporting the model with the PyTorch exporter? Re-exporting the model with the latest PyTorch may resolve this issue. The latest PyTorch exporter removed ATen.

Hi @yufenglee, thanks for the alternative. I already tried exporting with PyTorch 1.2. The ATen operator still gets exported when Group Normalization is used. Most probably this is because GroupNorm does not have an implementation in the ONNX standard yet. The ONNX exporter doesn't know the op, so the only way it can export it is through ATen ops.

Currently ORT doesn't have an implementation for this op. So either (1) PyTorch should stop using this op in the exported ONNX model, or (2) ORT should implement this op. Since this is an experimental op that was already intended to be removed, I would prefer option 1. @spandantiwari @ebarsoumMS @linkerzhang

@furkankirac - yes, the torch.nn.GroupNorm op is not supported in the ONNX exporter (not correctly, at least). I have opened an issue in the PyTorch repo, https://github.com/pytorch/pytorch/issues/26753, and have assigned it to myself. We will try to add the support soon.

Closing this issue since the fix will be in PyTorch. pytorch/pytorch#26753

@furkankirac - PR https://github.com/pytorch/pytorch/pull/27071 has been added for supporting torch.group_norm in the PyTorch exporter.
