Onnxruntime: Error creating onnxruntime session

Created on 29 Apr 2020 · 8 Comments · Source: microsoft/onnxruntime

Problem Description

I'm trying to export the following PyTorch model to ONNX using TorchScript scripting.

import torch
import torch.nn as nn

@torch.jit.script
def get_count(array, thresh):
    # Count how many of the first five elements exceed thresh.
    count = torch.tensor(0)
    for i in range(5):
        if array[i] > thresh:
            count = count + 1
    return count

class test_model(nn.Module):
    def forward(self, array, thresh):
        return get_count(array, thresh)

I use the following to export my model to onnx.

array = torch.tensor([10,11,5,4,6])
thresh = torch.tensor(5)
model = test_model()

torch.onnx.export(model=model,
                  args=(array, thresh),
                  f="test_model.onnx",
                  verbose=True,
                  opset_version=11,
                  input_names=['input_data', 'threshold'])
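
As a sanity check, the value the exported model is expected to return for these sample inputs can be worked out in plain Python. This is only a reference sketch of the loop in get_count and is not part of the original report; `reference_count` is a hypothetical helper name.

```python
def reference_count(array, thresh, n=5):
    """Count how many of the first n elements are strictly greater than thresh."""
    count = 0
    for i in range(n):
        if array[i] > thresh:
            count += 1
    return count


# Same sample values as the export call: 10, 11 and 6 exceed 5.
expected = reference_count([10, 11, 5, 4, 6], 5)
print(expected)  # 3
```

Once a session can be created, its output for the same inputs should match this value.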

Here is the ONNX graph that I get after export:

graph(%input_data : Long(5),
      %threshold : Long()):
  %2 : Long() = onnx::Constant[value={1}]()
  %3 : Long() = onnx::Constant[value={5}]()
  %4 : Long() = onnx::Constant[value={0}]()
  %5 : Tensor = onnx::Cast[to=9](%2)
  %6 : Long() = onnx::Loop(%3, %5, %4) # <ipython-input-70-c5725febec4a>:4:4
    block0(%i.1 : Long(), %cond : bool, %count.10 : Tensor):
      %10 : Tensor = onnx::Gather[axis=0](%input_data, %i.1) # <ipython-input-70-c5725febec4a>:5:10
      %11 : Tensor = onnx::Greater(%10, %threshold) # <ipython-input-70-c5725febec4a>:5:10
      %12 : Tensor = onnx::If(%11) # <ipython-input-70-c5725febec4a>:5:7
        block0():
          %13 : Long() = onnx::Constant[value={1}]()
          %14 : LongTensor = onnx::Add(%count.10, %13) # <ipython-input-70-c5725febec4a>:6:16
          -> (%14)
        block1():
          -> (%count.10)
      %15 : Tensor = onnx::Cast[to=9](%2)
      -> (%15, %12)
  return (%6)

However, when I create an onnx runtime inference session using

 ort_sess = onnxruntime.InferenceSession('test_sort.onnx')

I get the following error:

---------------------------------------------------------------------------
Fail                                      Traceback (most recent call last)
<ipython-input-73-592f4c5d3a84> in <module>
      3 inputs = (array,threshold)
      4 
----> 5 ort_sess = ort.InferenceSession('test_sort.onnx')
      6 
      7 # ort_inputs = {ort_session.get_inputs()[i].name:inpt for i, inpt in enumerate(to_np(inputs))}

/opt/conda/lib/python3.7/site-packages/onnxruntime/capi/session.py in __init__(self, path_or_bytes, sess_options, providers)
     23         self._path_or_bytes = path_or_bytes
     24         self._sess_options = sess_options
---> 25         self._load_model(providers)
     26         self._enable_fallback = True
     27 

/opt/conda/lib/python3.7/site-packages/onnxruntime/capi/session.py in _load_model(self, providers)
     41             raise TypeError("Unable to load from type '{0}'".format(type(self._path_or_bytes)))
     42 
---> 43         self._sess.load_model(providers)
     44 
     45         self._session_options = self._sess.session_options

Fail: [ONNXRuntimeError] : 1 : FAIL : Exception during loading: /onnxruntime_src/onnxruntime/core/graph/graph.cc:2485 onnxruntime::common::Status onnxruntime::Graph::SetGraphInputsOutputs() node_arg was false. Graph ctor should have created NodeArg for initializer.

Urgency
Moderate.

System Information
OS: Linux Ubuntu 16.04
ONNX runtime version 1.1.0 installed from binary
Python version: 3.7.4

Expected Behavior
Successful creation of onnxruntime session.

Could someone please help me resolve this error?

bug wontfix

Most helpful comment

There are actually two edge case bugs involved here.

  1. The 'count' input to the Loop subgraph has no type information (which is valid in an ONNX model) and is not used directly in the Loop. It's used inside an If node in the Loop which is a separate nested subgraph. That creates a subtle interaction where we don't create a necessary piece in the Loop subgraph.

After fixing that I hit another issue related to the combination of Loop and If in a trivial graph.

  2. The loop state variable for 'count' is being passed directly to an If node and used directly as the output in one of the branches there. As the loop state variable may change shape across iterations we can't infer a shape, which means we can't infer the output shape from the If branch. We do have a way to handle this sort of unknown output shape so a fix was also required to use that.

Once I create some unit tests I'll put the changes in a PR.
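
The second point can be illustrated with a toy model of shape inference. This is a simplified sketch under stated assumptions, not onnxruntime's actual implementation: `merge_branch_shapes` is a hypothetical helper, and `None` stands for a shape that could not be inferred. When a branch of the If node forwards the loop state variable, whose shape is unknown, the merged output shape of the If node is unknown as well.

```python
from typing import Optional, Tuple

Shape = Optional[Tuple[int, ...]]  # None means "shape could not be inferred"


def merge_branch_shapes(then_shape: Shape, else_shape: Shape) -> Shape:
    """Return the If output shape only when both branches agree on a known shape."""
    if then_shape is None or else_shape is None:
        return None
    if then_shape != else_shape:
        return None
    return then_shape


# then-branch: count + 1, whose shape depends on the unknown shape of count.
# else-branch: forwards the loop state variable unchanged.
loop_state_shape: Shape = None
print(merge_branch_shapes(loop_state_shape, loop_state_shape))  # None

# If both branches were known to produce a scalar, the result would be known.
print(merge_branch_shapes((), ()))  # ()
```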

All 8 comments

Can you please share the ONNX model file?

@hariharans29 the code runs as is -- I reproduced in a colab notebook

Running the notebook gives you this model: onnxruntime_3755.zip

There are actually two edge case bugs involved here.

  1. The 'count' input to the Loop subgraph has no type information (which is valid in an ONNX model) and is not used directly in the Loop. It's used inside an If node in the Loop which is a separate nested subgraph. That creates a subtle interaction where we don't create a necessary piece in the Loop subgraph.

After fixing that I hit another issue related to the combination of Loop and If in a trivial graph.

  2. The loop state variable for 'count' is being passed directly to an If node and used directly as the output in one of the branches there. As the loop state variable may change shape across iterations we can't infer a shape, which means we can't infer the output shape from the If branch. We do have a way to handle this sort of unknown output shape so a fix was also required to use that.

Once I create some unit tests I'll put the changes in a PR.

Hi,
When I run the above code, I get a different error:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /onnxruntime_src/onnxruntime/core/graph/graph.cc:912 void onnxruntime::Graph::InitializeStateFromModelFileGraphProto() This is an invalid model. Graph output (count.10) does not exist in the graph.

Will this error be fixed by your new pull request #4004?

Running the notebook gives you this model: onnxruntime_3755.zip
@snsun I tested with this model and see no errors. How are you creating the model that fails?

Tracked it down to the PyTorch version. Version 1.4 emits a bad subgraph where a graph output is not produced by any node in the subgraph. Version 1.5, which produced the graph I tested with, works fine.
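
The kind of check that rejects such a subgraph can be sketched over a toy graph representation. This is an illustration only: the dict layout and `find_unproduced_outputs` are hypothetical and are not onnxruntime's data structures, and the Identity node in the second branch is an assumption about one way a valid branch could look, not a claim about what PyTorch 1.5 emits.

```python
def find_unproduced_outputs(graph):
    """Return graph outputs that no input, initializer, or node output defines."""
    defined = set(graph["inputs"]) | set(graph["initializers"])
    for node in graph["nodes"]:
        defined.update(node["outputs"])
    return [name for name in graph["outputs"] if name not in defined]


# A branch that returns the outer-scope value 'count.10' directly:
# nothing inside the subgraph produces that name.
bad_branch = {"inputs": [], "initializers": [], "nodes": [], "outputs": ["count.10"]}

# A branch that routes the value through a node defines its own output.
good_branch = {
    "inputs": [],
    "initializers": [],
    "nodes": [{"op": "Identity", "inputs": ["count.10"], "outputs": ["count.out"]}],
    "outputs": ["count.out"],
}

print(find_unproduced_outputs(bad_branch))   # ['count.10']
print(find_unproduced_outputs(good_branch))  # []
```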

Oh, I am using pytorch 1.4. I will test with pytorch 1.5. Thank you!
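
Since the result depends on the exporter version, a small guard before exporting can catch the mismatch early. The following is a minimal sketch; `parse_major_minor` and `exporter_is_recent_enough` are hypothetical helpers, and the 1.5 threshold is taken from the comment above rather than from any official compatibility table.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.5.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def exporter_is_recent_enough(version, minimum=(1, 5)):
    """Return True when the version is at least the given (major, minor)."""
    return parse_major_minor(version) >= minimum


print(exporter_is_recent_enough("1.4.0"))        # False
print(exporter_is_recent_enough("1.5.0+cu101"))  # True
```

In a real script the string would typically come from `torch.__version__`.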

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
