Describe the bug
I am trying the PyOp feature in onnxruntime. I built onnxruntime from source with "sh build.sh --config Release --enable_language_interop_ops --build_shared_lib --enable_pybind" and ran the official test code shown below. However, I got a segmentation fault at "session.run(None, dic)".
System information
To Reproduce
import os
import numpy as np
import onnxruntime
import onnx  # needed so that onnx.save / onnx.load below resolve
from onnx import helper, TensorProto
A = helper.make_tensor_value_info('A', TensorProto.FLOAT, [1, 2])
B = helper.make_tensor_value_info('B', TensorProto.FLOAT, [1, 2])
C = helper.make_tensor_value_info('C', TensorProto.FLOAT, [1, 2])
D = helper.make_tensor_value_info('D', TensorProto.FLOAT, [1, 2])
E = helper.make_tensor_value_info('E', TensorProto.FLOAT, [1, 2])
F = helper.make_tensor_value_info('F', TensorProto.FLOAT, [1, 2])
ad1_node = helper.make_node('Add', ['A','B'], ['S'])
mul_node = helper.make_node('Mul', ['C','D'], ['P'])
py1_node = helper.make_node(op_type = 'PyOp', #required, must be 'PyOp'
                            inputs = ['S','P'], #required
                            outputs = ['L','M','N'], #required
                            domain = 'pyopmulti_1', #required, must be unique
                            input_types = [TensorProto.FLOAT, TensorProto.FLOAT], #required
                            output_types = [TensorProto.FLOAT, TensorProto.FLOAT, TensorProto.FLOAT], #required
                            module = 'mymodule', #required
                            class_name = 'Multi_1', #required
                            compute = 'compute', #optional, 'compute' by default
                            W1 = '5', W2 = '7', W3 = '9') #optional, must all be strings
ad2_node = helper.make_node('Add', ['L','M'], ['H'])
py2_node = helper.make_node('PyOp',['H','N','E'],['O','W'], domain = 'pyopmulti_2',
                            input_types = [TensorProto.FLOAT, TensorProto.FLOAT, TensorProto.FLOAT],
                            output_types = [TensorProto.FLOAT, TensorProto.FLOAT],
                            module = 'mymodule', class_name = 'Multi_2')
sub_node = helper.make_node('Sub', ['O','W'], ['F'])
graph = helper.make_graph([ad1_node,mul_node,py1_node,ad2_node,py2_node,sub_node], 'multi_pyop_graph', [A,B,C,D,E], [F])
model = helper.make_model(graph, producer_name = 'pyop_model')
onnx.save(model, './model.onnx')
a=np.zeros((1,2)).astype('float32')
b=np.zeros((1,2)).astype('float32')
c=np.zeros((1,2)).astype('float32')
d=np.zeros((1,2)).astype('float32')
e=np.zeros((1,2)).astype('float32')
original_model = onnx.load('model.onnx')
session = onnxruntime.InferenceSession("model.onnx")
dic={}
dic['A']=a
dic['B']=b
dic['C']=c
dic['D']=d
dic['E']=e
pred = session.run(None, dic)
print (pred)
Expected behavior
print the prediction
Screenshots
Segmentation fault (core dumped)
Additional context
I inserted "cout" statements in all the functions in "language_interop_ops.cc", "pyop.cc" and "pywrapper.cc" and rebuilt. The fault occurs in PyObject_GetAttrString when args exist.
If there are no args, the fault occurs in PyEval_CallObject.
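As a complement to the "cout" tracing described above, Python's standard faulthandler module can dump the Python-level stack when the process receives SIGSEGV. This is only a sketch: the helper name enable_crash_traceback and the log file name are made up for illustration, and the dump shows Python frames only, not the C++ frames inside onnxruntime.

```python
import faulthandler
import os
import tempfile


def enable_crash_traceback(log_path):
    """Ask the interpreter to write the Python stack to log_path on a fatal signal.

    The returned file object must stay open for as long as the handler is active.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical log location; any writable path works.
crash_log_path = os.path.join(tempfile.gettempdir(), "pyop_crash_traceback.log")
crash_log = enable_crash_traceback(crash_log_path)

# ... build the session and call session.run(None, dic) after this point ...
```

If the crash still happens, the log file should at least show which Python line was executing, which helps tell a crash during session creation apart from one during session.run.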
@RandySheriff is this something you can take a look at? thanks!
@liuzhengzhe: would you mind sharing your Python modules with us, please? BTW, in your test they are placed in sys.path, right?
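One quick way to answer the sys.path question is to ask the import system where it would load the module from. A minimal sketch using only the standard library; locate_module is a hypothetical helper name, and 'mymodule' is the module name used in the PyOp nodes above.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints a path if mymodule.py is reachable through sys.path, otherwise None.
print(locate_module("mymodule"))
```

Running this from the same working directory and interpreter as main.py shows whether the PyOp kernel has a chance of importing the module at all.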
Thanks for your reply.
Please find the attached file; I think you can run it directly. I have put the onnxruntime folder in the same folder as main.py and mymodule.py.
custom_op.zip
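The contents of the attachment are not reproduced in this thread. For readers following along, here is a sketch of what a mymodule.py matching the graph above might look like, together with a plain NumPy walk through the same graph. The class names and the W1/W2/W3 attributes come from the PyOp nodes; the method bodies and the reference_forward helper are assumptions for illustration, not the reporter's actual code.

```python
import numpy as np


class Multi_1:
    # Assumption: node attributes W1, W2, W3 arrive as strings in the constructor.
    def __init__(self, W1, W2, W3):
        self.W1 = int(W1)
        self.W2 = int(W2)
        self.W3 = int(W3)

    def compute(self, S, P):
        # Two inputs, three outputs, matching inputs=['S','P'], outputs=['L','M','N'].
        ret = S + P
        return ret + self.W1, ret + self.W2, ret + self.W3


class Multi_2:
    def compute(self, *inputs):
        # Three inputs, two outputs, matching ['H','N','E'] -> ['O','W'].
        return sum(inputs[:-1]), sum(inputs[1:])


def reference_forward(a, b, c, d, e):
    """Evaluate the same graph in pure NumPy, without onnxruntime."""
    s = a + b                                   # Add(A, B) -> S
    p = c * d                                   # Mul(C, D) -> P
    l, m, n = Multi_1('5', '7', '9').compute(s, p)
    h = l + m                                   # Add(L, M) -> H
    o, w = Multi_2().compute(h, n, e)
    return o - w                                # Sub(O, W) -> F


zeros = np.zeros((1, 2), dtype=np.float32)
expected = reference_forward(zeros, zeros, zeros, zeros, zeros)
print(expected)
```

Under these assumed method bodies, the result gives a value to compare against once session.run stops crashing.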
@liuzhengzhe: thanks!
I am looking at the stack in gdb: the segmentation fault comes from PyUnicode_InternFromString, called from within PyObject_GetAttrString. Also, what do you mean by "if there is no args..."?
This is my output. (I inserted "cout" in "language_interop_ops.cc", "pyop.cc" and "pywrapper.cc".)
loadinterop2
loadinterop...
loadpyop...
loadpyop...
getname
input
inputtype
inputtype
output
outputtype
outputtype
outputtype
getname
getname
input
inputtype
inputtype
inputtype
output
outputtype
outputtype
getname
kernelcustom...
kernel...
getinst
proxy...
init...
scope
scope2
newins...
scope
scope2
add
add
add
add
kernelcustom...
kernel...
getinst
newins...
scope
scope2
add
add
add
add
compute...
0
gettype..
1
gettype..
getinst
invoke...
scope
scope2
Segmentation fault (core dumped)
@RandySheriffH I have modified main.py and mymodule.py by removing args. Then PyObject_GetAttrString passed, but the segmentation fault occurs in PyEval_CallObject.
@liuzhengzhe
Now we have a candidate fix in my private branch:
https://github.com/microsoft/onnxruntime/tree/rashuai/RefactorPyOp
Please compile with "./build.sh --config RelWithDebInfo --enable_language_interop_ops --build_wheel". Note that the branch abandoned libonnxruntime_pywrapper.so, meaning you only need to pip install the latest whl. I tested with your model and got the result without a segfault, but please let me know if it works on your side.
It works fine, thanks! @RandySheriffH
@liuzhengzhe
Good, will send out PR shortly.
@RandySheriffH
Can you send me the modified main.py and mymodule.py?
I get a segmentation fault with the latest onnxruntime version.
With your branch I get
self._sess.load_model(providers)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: Unknown model file format version.
thanks
Amir
@liuzhengzhe I have hit the same problem. Can you send me your modified code?
@RandySheriffH,
Hello, I am using onnxruntime-1.3.0 built from source, and I have hit the same problem: I get a segmentation fault when running inference with PyOp. Can you provide the code that solves the problem? And why has the fix not been merged into master?
@RandySheriffH, I want to know whether upgrading from onnxruntime1.13.0 to onnxruntime1.14.0 would solve the problem. Thank you.
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
This issue has been automatically closed due to inactivity. Please reactivate if further support is needed.