Glow: Codegen of FullyConnected (via ONNXIFI) is really suboptimal

Created on 20 Dec 2018 · 9Comments · Source: pytorch/glow

Right now using ONNXIFI to hand a Caffe2 FC to Glow ends up generating something like
Add(MatMul(X, Transpose(Weights)), Tile(Reshape(Bias)))
Which is kind of unfortunate (particularly the Bias->Reshape->Tile->Add part).

I'm debating whether to just special case this in the ONNX Gemm loader or write a GraphOptimizer pattern for it. The latter is probably cleaner (loading Gemm is already complicated), but it makes me a bit sad to produce an inefficient representation only to turn around and optimize it later :-p.

good first issue

Source

bertmaher

Most helpful comment

I'm assuming you're saying the Bias->Reshape->Tile->Add is unfortunate because the Reshape->Tile is not folded up into Bias? If so, we have https://github.com/pytorch/glow/issues/1469 open for implementing constant folding which would solve this -- and I think that is a more preferable solution over writing individual passes for specific constant folding patterns or implementing it specially in the loader.

jfix71 on 8 Jan 2019

👍3

All 9 comments

Maybe optimization pass looks better. You can't control how user generates their models. :)

yinghai on 5 Jan 2019

jfix71 on 8 Jan 2019

👍3

I think it's a bit more specialized than just constant folding -- if I understand correctly, Reshape->Tile->Add should become BatchedAdd. (And honestly the whole thing should probably just be "FC" :) ).

bertmaher on 8 Jan 2019

Oh, whoops -- I assumed when you said Add above you meant BatchedAdd. Is the FC being broken down into MatMul->Add before it gets to the Caffe2ModelLoader? Otherwise I'm not sure why this would happen, as it should be created as a FullyConnectedNode via the "FC" case in Caffe2ModelLoader.

If the FC is being broken down before the Caffe2ModelLoader gets it, then yeah this seems like a specialized pattern to write a graph optimization pass for (Reshape->Tile->Add replaced by BatchedAdd).

jfix71 on 8 Jan 2019

👍1

If we are to process C2 models through ONNXIFI we'd not have this problem and it seems that we can do this now.

Do we really need to address and fix this?

rdzhabarov on 8 Jan 2019

I think with C2 models passed over ONNXIFI this problem would just go away. The problem is the C2->ONNX conversion, which creates a bad node sequence.

bertmaher on 8 Jan 2019

👍1

I think with C2 models passed over ONNXIFI this problem would just go away. The problem is the C2->ONNX conversion, which creates a bad node sequence.

@bertmaher
The c2 loader convert the c2 fc to ir fc:
https://github.com/pytorch/glow/blob/2793672d9368f47690208acc3c8933944136cef9/lib/Importer/Caffe2ModelLoader.cpp#L621
Is it the problem that causes c2 fc suboptimal?

JerryShih on 7 Mar 2019

Is it the problem that causes c2 fc suboptimal?

Actual problem as Bert mentioned above is that if ONNXIFI uses onnx format to pass model down to Glow there is an intermediate conversion from C2 FC to something that can be represented via ONNX (ONNX does not have FC). This introduces a lot of unnecessary nodes.

rdzhabarov on 7 Mar 2019

I am looking at this issue.