Onnxruntime: Why throw exception when inferring shape using onnx

Created on 15 Apr 2019 · 7 Comments · Source: microsoft/onnxruntime

Describe the bug
ONNX shape inference is not stable yet and does not throw its exceptions in a timely way, which makes them really painful to debug.
For some models, ONNX fails to infer the shapes but onnxruntime can still execute the model.
I assume onnxruntime also checks shapes at execution time, so why doesn't onnxruntime catch these exceptions during shape inference and throw them later, when executing kernels, only if necessary?

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
ONNX Runtime installed from (source or binary): source
ONNX Runtime version: master
Python version: 3.6
Visual Studio version (if applicable):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version:
GPU model and memory:

lucienwang1009

All 7 comments

Shape inference in ONNX is designed as a "best-effort" component. It does the best it can, but nobody can rely on it. So if there is any shape inference error, it shouldn't block onnxruntime's execution.

Thanks for the quick reply.
Here is a real object detection model (YoloV3) that fails to load in onnxruntime with a shape inference error but runs successfully if shape inference is disabled.
It seems onnxruntime simply fails if it encounters any exception when invoking ONNX: https://github.com/Microsoft/onnxruntime/blob/f19d9a490798a5b9769940db238c6119907a8a67/onnxruntime/core/graph/graph.cc#L1499-L1503

lucienwang1009 on 16 Apr 2019
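
The "best-effort" behaviour requested above can be sketched as a small wrapper that falls back to the original model when an inference pass raises. This is only an illustration of the idea, not how onnxruntime's graph.cc works; `best_effort_infer`, `infer_fn` and `failing_infer` are hypothetical names introduced for the example.

```python
from typing import Callable, Optional, Tuple, TypeVar

M = TypeVar("M")


def best_effort_infer(
    model: M, infer_fn: Callable[[M], M]
) -> Tuple[M, Optional[Exception]]:
    """Run infer_fn on model; fall back to the original model on failure.

    Returns (model_to_use, error); error is None when inference succeeded.
    """
    try:
        return infer_fn(model), None
    except Exception as exc:  # inference is treated as optional in this sketch
        return model, exc


def failing_infer(model: dict) -> dict:
    # Hypothetical stand-in for an inference pass that hits a bug.
    raise RuntimeError("Input tensors of wrong rank (0)")


model, err = best_effort_infer({"name": "test-model"}, failing_infer)
print(model["name"], "| inference skipped:", err)
```

With the real library, `infer_fn` could be `onnx.shape_inference.infer_shapes`, which takes a ModelProto and returns a copy annotated with the inferred shapes. The caller keeps the error so it can still be logged or re-raised later if execution fails.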

Here is a simple example that shows the issue, tested on the master branch.

import onnx
from onnx import helper, numpy_helper, shape_inference
from onnx import AttributeProto, TensorProto, GraphProto
import numpy as np
import onnxruntime

# The protobuf definition can be found here:
# https://github.com/onnx/onnx/blob/master/onnx/onnx.proto


# Create one input (ValueInfoProto)
A = helper.make_tensor_value_info('A', TensorProto.INT32, [2])
B = helper.make_tensor_value_info('B', TensorProto.FLOAT, [4])
C = helper.make_tensor_value_info('C', TensorProto.FLOAT, [2, 2])

# Create one output (ValueInfoProto)
Y = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [2, 2])

# Create a node (NodeProto)
cast = helper.make_node(
    "Cast",
    inputs=["A"],
    outputs=["cast"],
    to=TensorProto.INT64
)
reshape = helper.make_node(
    "Reshape",
    inputs=["B", "cast"],
    outputs=["reshape"],
)
matmul = helper.make_node(
    'MatMul', # op type
    inputs=['reshape', 'C'], # inputs
    outputs=['Y'], # outputs
)

initializers = [
    numpy_helper.from_array(
        np.array([2, 2], dtype=np.int32),
        name='A'
    ),
    numpy_helper.from_array(
        np.array([1, 2, 3, 4], dtype=np.float32),
        name='B'
    ),
    numpy_helper.from_array(
        np.array([[1, 2], [3, 4]], dtype=np.float32),
        name='C'
    )
]

# Create the graph (GraphProto)
graph_def = helper.make_graph(
    [cast, reshape, matmul],
    "test-model",
    [A, B, C],
    [Y],
    initializers
)

# Create the model (ModelProto)
model_def = helper.make_model(graph_def,
                              producer_name='onnx-example')

shape_inference.infer_shapes(model_def)


print('The ir_version in model: {}\n'.format(model_def.ir_version))
print('The producer_name in model: {}\n'.format(model_def.producer_name))
print('The graph in model:\n{}'.format(model_def.graph))
onnx.checker.check_model(model_def)
print('The model is checked!')
onnx.save(model_def, "model.onnx")

sess = onnxruntime.InferenceSession("model.onnx")
lotus_result = sess.run(['Y'], {})
print(lotus_result)

This throws [ShapeInferenceError] Input tensors of wrong rank (0), which comes from https://github.com/onnx/onnx/blob/3717dc617fa06e4eea326e85dc0ccfdcdf4f4ab5/onnx/defs/math/defs.cc#L754-L763.

lucienwang1009 on 16 Apr 2019
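
To illustrate how a rank check like the one linked above can fire, here is a simplified pure-Python sketch of MatMul shape inference. It is not the ONNX implementation: it handles only 2-D inputs, uses `None` for "shape unknown", and every name in it is hypothetical. The thread does not say which input ended up with rank 0 in the real model; the sketch only shows why "unknown shape" and "known shape with zero dimensions" need to be treated differently.

```python
from typing import Optional, Tuple

Shape = Tuple[int, ...]


class ShapeInferenceError(Exception):
    """Hypothetical stand-in for the error reported by ONNX."""


def infer_matmul_shape(a: Optional[Shape], b: Optional[Shape]) -> Optional[Shape]:
    # Best effort: an unknown input shape means "cannot infer", not "error".
    if a is None or b is None:
        return None
    # A known shape with zero dimensions is a scalar, which MatMul rejects.
    if len(a) == 0 or len(b) == 0:
        raise ShapeInferenceError("Input tensors of wrong rank (0)")
    # This sketch only covers the plain 2-D case and gives up otherwise.
    if len(a) != 2 or len(b) != 2:
        return None
    if a[1] != b[0]:
        raise ShapeInferenceError("Incompatible dimensions for matrix multiplication")
    return (a[0], b[1])


print(infer_matmul_shape((2, 2), (2, 2)))  # both shapes known
print(infer_matmul_shape(None, (2, 2)))    # first shape unknown: inference skipped
```

If an upstream node reports an empty shape where it should report "unknown", a check like this raises even though the model itself is valid, which matches the symptom described in the example.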

We encountered a similar problem. Shape inference for "slice" in ONNX has a bug, and onnxruntime throws the exception directly, whereas if I disable slice's shape inference, onnxruntime can run the model and its output is correct.

zhijxu-MS on 18 Apr 2019

To avoid creating more confusion, I'll close this PR.
Any type or shape inference exception thrown from ONNX is an error, and it must be fixed in ONNX.
If you think the runtime should ignore this exception and continue to run, then please don't throw an exception.
Shape inference is optional, but type inference is a must. Type inference must finish successfully.

snnn on 19 Apr 2019
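
The distinction drawn above (type inference is mandatory, shape inference is optional and should not throw when it simply lacks information) can be sketched for a Cast node as follows. This is a minimal illustration under assumed names; `TensorInfo`, `TypeInferenceError` and `infer_cast` are hypothetical and not part of ONNX or onnxruntime.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TensorInfo:
    elem_type: str
    shape: Optional[Tuple[int, ...]] = None  # None means "shape unknown"


class TypeInferenceError(Exception):
    """Hypothetical error for a type inference failure."""


def infer_cast(input_info: TensorInfo, to: Optional[str]) -> TensorInfo:
    # Type inference must succeed: without 'to' the output type is undefined.
    if to is None:
        raise TypeInferenceError("Cast requires a 'to' attribute")
    # Shape inference is optional: pass the shape through if known,
    # otherwise leave it unknown instead of raising.
    return TensorInfo(elem_type=to, shape=input_info.shape)


print(infer_cast(TensorInfo("int32", (2,)), to="int64"))
print(infer_cast(TensorInfo("int32"), to="int64"))
```

Under this convention, an exception from the inference function always signals a real problem, and a missing shape is reported as "unknown" in the result.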

Thanks for these explanations.

lucienwang1009 on 19 Apr 2019

The shape inference bugs in the two ops mentioned in this issue have been fixed and checked in. See:
MatMul Shape Inference Fix: https://github.com/onnx/onnx/pull/1941
ConstantOfShape Shape Inference Fix: https://github.com/onnx/onnx/pull/1951