Support ONNXIFI interface
https://github.com/pytorch/glow/pull/1369
Adding more detail to the implementation: define the event, backend ID, backend, graph, and other entities required to implement the ONNXIFI interface.
One of the last remaining issues is that onnxGetBackendCompatibility tries to do a full parse of the input ONNX model, which is likely to fail because we use the ONNX model as a vessel for op info. It usually has only one node and no inputs/outputs. We can add inputs/outputs, but doing a full parse for each onnxGetBackendCompatibility call is likely inefficient. Plus, in the end we are going to do a full parse of the whole model anyway. It would be better to have a shortcut for this. (ref: https://github.com/onnx/onnx/blob/219aaf91f3ae62762628aff4f68d84412202bcac/onnx/onnxifi.h#L1032-L1036)
BTW, I noticed that the importer has a lot of asserts, which will just crash the program. Why don't we throw?
In order to create a specific Op, Glow needs to know all inputs to the Op ahead of time (that's why the inputs are parsed). In the compatibility case, we do not actually allocate memory for input tensors (for tensor elements), so the overhead should be very low. In addition, this does not happen on a hot path.
Would it be problematic to add inputs to the Op in the compatibility-mode call?
Regarding assert: those checks are removed in release builds but active in debug builds.
I added inputs, but it failed at the same place because it's looking for the weight, which we didn't supply in the initializer list. So we still need to avoid the full parse.
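A minimal sketch of what such a shortcut could look like, assuming the compatibility check only needs each node's operator type. `NodeInfo`, `kSupportedOps`, and `isCompatible` are hypothetical names for illustration, not Glow or ONNXIFI APIs, and the operator list is made up:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an ONNX node; only the operator type is kept.
struct NodeInfo {
  std::string opType;
};

// Hypothetical set of operators the backend can run (illustrative only).
static const std::set<std::string> kSupportedOps = {"Conv", "Relu", "MaxPool"};

// Returns true if every node's operator type is supported. Inputs, outputs,
// and weights are never inspected, so a one-node model with no initializers
// does not trip the check.
bool isCompatible(const std::vector<NodeInfo> &nodes) {
  for (const auto &node : nodes) {
    if (kSupportedOps.count(node.opType) == 0) {
      return false;
    }
  }
  return true;
}
```

The trade-off is that a check like this cannot reject a node whose attributes or shapes are unsupported; that would still surface during the later full parse.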
As for assert, I don't think it's the proper way to check things here. An assert is for something that should never happen. That is not the case when we check input, because we cannot control it. In a debug build, it will crash the program. In an opt build, it will mask the issue and may lead to later crashes if the predicate doesn't hold.
CC @nadavrot on the use of assert. In the Glow code base, what's the policy on using assert? And do we allow throwing exceptions at all?
@rdzhabarov @yinghai Maybe GLOW_ASSERT should be used? It is always checked at run time, even in Release builds, and it aborts if the assertion fails. At least it does not mask the issue and does not let the program proceed.
OK. Let's discuss error handling and propagation later. We can settle on GLOW_ASSERT for now.
Another issue I noticed, @rdzhabarov: I'm printing the tensor names in https://github.com/pytorch/glow/blob/4e7e8eca6f0236f67cfde8584e8d82fd1431da2d/lib/Importer/ONNXIFILoader.cpp#L95 and the name is sometimes empty and sometimes truncated. Another misalignment issue?
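To illustrate the point about input checks, here is a small sketch that reports a missing tensor to the caller instead of asserting on it. `TensorTable` and `findTensor` are hypothetical names, and the `int` payload stands in for a real tensor:

```cpp
#include <map>
#include <string>

// Hypothetical table of caller-supplied tensors, keyed by name.
using TensorTable = std::map<std::string, int>;

// Looks up a tensor by name. A missing name is reported through the return
// value rather than asserted, so debug and release builds behave the same.
bool findTensor(const TensorTable &tensors, const std::string &name, int &out) {
  auto it = tensors.find(name);
  if (it == tensors.end()) {
    return false; // caller decides how to report the bad input
  }
  out = it->second;
  return true;
}
```

With this shape, the caller can turn a `false` result into whatever error path the project settles on, and the check does not disappear when `NDEBUG` is defined.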
Regarding the empty or truncated names: I found the issue in PyTorch. https://github.com/pytorch/pytorch/pull/10630
Overall, resnet50 in test_onnxifi.py seems to be running. :)
GLOW_ASSERT will crash the process, which might be problematic for the ONNXIFI case. It may be fine for now as a way to catch issues, but we need to handle failures gracefully by returning an internal error or something similar.
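One possible shape for that graceful handling, sketched under the assumption that exceptions are enabled in the build. The `Status` enum, `loadModel`, and `initGraph` are hypothetical; the real ONNXIFI header defines its own status codes and entry points:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical status codes; the real ONNXIFI header defines its own.
enum class Status { Success, InvalidModel, InternalError };

// Hypothetical loader that reports bad input by throwing.
void loadModel(const std::string &bytes) {
  if (bytes.empty()) {
    throw std::invalid_argument("empty model");
  }
}

// Boundary function: no exception escapes, every failure becomes a status.
Status initGraph(const std::string &bytes) noexcept {
  try {
    loadModel(bytes);
    return Status::Success;
  } catch (const std::invalid_argument &) {
    return Status::InvalidModel;
  } catch (...) {
    return Status::InternalError;
  }
}
```

If the project builds without exceptions, the same boundary could be written with explicit error return values propagated up from the loader instead.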
Will send a PR soon.
Nice!
Note: I'm OOO today, but will look later.
Basic ONNXIFI support has been added to Glow: we can run ResNet-50 through the pytorch->onnxifi->glow pipeline.
I'll open specific issues in the future for specific concerns.
@rdzhabarov @yinghai Great work!