Glow: [onnxifi] Execute Resnet50 via ONNXIFI interface

Created on 28 Jun 2018 · 11 Comments · Source: pytorch/glow

rdzhabarov

All 11 comments

https://github.com/pytorch/glow/pull/1369
Putting more details into the implementation: define the event, backend ID, backend, graph, and other entities required for implementing the ONNXIFI interface.

rdzhabarov on 31 Jul 2018

One of the last remaining issues is that onnxGetBackendCompatibility tries to do a full parse of the input ONNX model, which is likely to fail because we use the ONNX model as a vessel for op info. It usually has only one node and no inputs/outputs. We can add inputs/outputs, but doing a full parse for each onnxGetBackendCompatibility call is likely inefficient. Plus, in the end we are going to do the full parse on the whole model anyway. It would be better to have a shortcut for this. (ref: https://github.com/onnx/onnx/blob/219aaf91f3ae62762628aff4f68d84412202bcac/onnx/onnxifi.h#L1032-L1036)

BTW, I noticed that the importer has a lot of asserts, which will just crash the program. Why don't we throw?

yinghai on 17 Aug 2018
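
The shortcut suggested above could look roughly like the sketch below: instead of building the whole graph, the compatibility check only inspects each node's operator type, so missing inputs or weights do not matter. All names here (`NodeInfo`, `isNodeSupported`, `checkCompatibility`, and the operator list) are hypothetical and are not part of Glow or ONNXIFI.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, minimal stand-in for an ONNX node. Only the operator type
// matters for this check, so inputs, outputs and weights are left out.
struct NodeInfo {
  std::string opType;
};

// Hypothetical list of operators the backend claims to support.
inline bool isNodeSupported(const NodeInfo &node) {
  static const std::unordered_set<std::string> supported = {
      "Conv", "Relu", "MaxPool", "Gemm", "BatchNormalization"};
  return supported.count(node.opType) != 0;
}

// Returns true only if every node's operator type is supported.
// No tensors are allocated and no initializers are required.
inline bool checkCompatibility(const std::vector<NodeInfo> &nodes) {
  for (const NodeInfo &node : nodes) {
    if (!isNodeSupported(node)) {
      return false;
    }
  }
  return true;
}
```

A check like this cannot catch problems that depend on shapes or attributes; those would still surface during the full parse of the whole model.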

In order to create a specific Op, Glow needs to know all inputs to the Op ahead of time (that's why parsing of inputs happens). In the compatibility case, we do not actually allocate memory for input tensors (for tensor elements), so that should have very low overhead. In addition, it does not happen on a hot path.
Would it be problematic to add inputs to the Op in the compatibility-mode call?

assert

Those are removed in release builds but are active in debug builds.

rdzhabarov on 17 Aug 2018

I added inputs, but it failed at the same place because it's looking for a weight that we didn't supply in the initialization list. So we still need to avoid a full parse.

As for assert, I don't think it's a proper way to check things here. An assert is meant for something that should never happen. That might not be the case when we check input, because we cannot control it. In a debug build, it will crash the program. In an opt build, it will mask the issue and may lead to later crashes if the predicate doesn't hold.

yinghai on 17 Aug 2018

CC @nadavrot on the use of assert. In the Glow code base, what's the policy on using assert? Or do we allow throwing exceptions at all?

yinghai on 17 Aug 2018

@rdzhabarov @yinghai Maybe GLOW_ASSERT should be used? This one is always checked at run time, even in Release builds, and it aborts if the assert fails. At least it does not mask the issue and does not let the program proceed.

opti-mix on 17 Aug 2018

👍1
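
To make the difference concrete, here is a minimal sketch of an always-on check next to the standard `assert`. The macro `ALWAYS_CHECK` and the helper `elementAt` are hypothetical illustrations; this is not Glow's actual GLOW_ASSERT definition, only a check with the behavior described above (evaluated in every build, aborts on failure).

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical always-on check. Unlike assert(), it is not compiled out
// when NDEBUG is defined, so release builds still abort on failure.
#define ALWAYS_CHECK(cond)                                              \
  do {                                                                  \
    if (!(cond)) {                                                      \
      std::fprintf(stderr, "%s:%d: check failed: %s\n", __FILE__,       \
                   __LINE__, #cond);                                    \
      std::abort();                                                     \
    }                                                                   \
  } while (false)

// Hypothetical helper: reads element `index` of a buffer of `size` ints.
inline int elementAt(const int *data, int size, int index) {
  // A plain assert(index < size) would vanish in a release (NDEBUG) build,
  // and the out-of-range read below would then go unnoticed.
  ALWAYS_CHECK(data != nullptr);
  ALWAYS_CHECK(index >= 0 && index < size);
  return data[index];
}
```

Calling `elementAt(values, 3, 5)` on a three-element buffer would print the failed condition and abort in both debug and release builds.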

OK. Let's discuss error handling and propagation later. We can settle on GLOW_ASSERT for now.

Another issue I noticed, @rdzhabarov, is that I'm printing out the tensor names in https://github.com/pytorch/glow/blob/4e7e8eca6f0236f67cfde8584e8d82fd1431da2d/lib/Importer/ONNXIFILoader.cpp#L95 and it seems that the name is sometimes empty and sometimes truncated. Another misalignment issue?

yinghai on 17 Aug 2018

and it seems that the name is sometimes empty and sometimes truncated. Another misalignment issue?

I found the issue in PyTorch: https://github.com/pytorch/pytorch/pull/10630
Overall, resnet50 in the test_onnxifi.py seems to be running. :)

yinghai on 17 Aug 2018

👍1

GLOW_ASSERT will crash the process, which might be problematic for the ONNXIFI case. It might be good for now to catch the issues, but we need to handle them gracefully by returning an internal error or something similar.

Will send a PR soon.

Nice!

Note, I'm OOO today. But will look later.

rdzhabarov on 17 Aug 2018

👍1
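
The graceful handling mentioned above could be sketched as follows: the function at the interface boundary converts any failure into a status code instead of letting the process abort. ONNXIFI entry points return an `onnxStatus` code; the `Status` enum, `loadModel`, and `initGraph` below are hypothetical stand-ins and not the real API. Whether Glow allows exceptions was left open in this thread, so their use here is only an assumption for illustration.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical status codes standing in for the values declared in onnxifi.h.
enum class Status { Success, InvalidModel, InternalError };

// Hypothetical loader that reports bad input by throwing instead of aborting.
inline void loadModel(const void *model, std::size_t size) {
  if (model == nullptr || size == 0) {
    throw std::invalid_argument("empty model");
  }
  // Real parsing would happen here.
}

// Boundary function: no exception escapes; failures become status codes.
inline Status initGraph(const void *model, std::size_t size) noexcept {
  try {
    loadModel(model, size);
    return Status::Success;
  } catch (const std::invalid_argument &) {
    return Status::InvalidModel;
  } catch (...) {
    return Status::InternalError;
  }
}
```

With this shape, a caller that passes a malformed model gets an error code back and can fall back to another backend rather than losing the whole process.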

Basic support for ONNXIFI has been added to Glow: we can run Resnet-50 through the pytorch->onnxifi->glow pipeline.
I'll open specific issues in the future for specific concerns.

rdzhabarov on 28 Aug 2018

❤2 🎉2

@rdzhabarov @yinghai Great work!