Glow: Codegen of FullyConnected (via ONNXIFI) is really suboptimal

Created on 20 Dec 2018  路  9Comments  路  Source: pytorch/glow

Right now using ONNXIFI to hand a Caffe2 FC to Glow ends up generating something like
Add(MatMul(X, Transpose(Weights)), Tile(Reshape(Bias)))
Which is kind of unfortunate (particularly the Bias->Reshape->Tile->Add part).

I'm debating whether to just special case this in the ONNX Gemm loader or write a GraphOptimizer pattern for it. The latter is probably cleaner (loading Gemm is already complicated), but it makes me a bit sad to produce an inefficient representation only to turn around and optimize it later :-p.

good first issue

Most helpful comment

I'm assuming you're saying the Bias->Reshape->Tile->Add is unfortunate because the Reshape->Tile is not folded up into Bias? If so, we have https://github.com/pytorch/glow/issues/1469 open for implementing constant folding which would solve this -- and I think that is a more preferable solution over writing individual passes for specific constant folding patterns or implementing it specially in the loader.

All 9 comments

Maybe optimization pass looks better. You can't control how user generates their models. :)

I'm assuming you're saying the Bias->Reshape->Tile->Add is unfortunate because the Reshape->Tile is not folded up into Bias? If so, we have https://github.com/pytorch/glow/issues/1469 open for implementing constant folding which would solve this -- and I think that is a more preferable solution over writing individual passes for specific constant folding patterns or implementing it specially in the loader.

I think it's a bit more specialized than just constant folding -- if I understand correctly, Reshape->Tile->Add should become BatchedAdd. (And honestly the whole thing should probably just be "FC" :) ).

Oh, whoops -- I assumed when you said Add above you meant BatchedAdd. Is the FC being broken down into MatMul->Add before it gets to the Caffe2ModelLoader? Otherwise I'm not sure why this would happen, as it should be created as a FullyConnectedNode via the "FC" case in Caffe2ModelLoader.

If the FC is being broken down before the Caffe2ModelLoader gets it, then yeah this seems like a specialized pattern to write a graph optimization pass for (Reshape->Tile->Add replaced by BatchedAdd).

If we are to process C2 models through ONNXIFI we'd not have this problem and it seems that we can do this now.

Do we really need to address and fix this?

I think with C2 models passed over ONNXIFI this problem would just go away. The problem is the C2->ONNX conversion, which creates a bad node sequence.

I think with C2 models passed over ONNXIFI this problem would just go away. The problem is the C2->ONNX conversion, which creates a bad node sequence.

@bertmaher
The c2 loader convert the c2 fc to ir fc:
https://github.com/pytorch/glow/blob/2793672d9368f47690208acc3c8933944136cef9/lib/Importer/Caffe2ModelLoader.cpp#L621
Is it the problem that causes c2 fc suboptimal?

Is it the problem that causes c2 fc suboptimal?

Actual problem as Bert mentioned above is that if ONNXIFI uses onnx format to pass model down to Glow there is an intermediate conversion from C2 FC to something that can be represented via ONNX (ONNX does not have FC). This introduces a lot of unnecessary nodes.

I am looking at this issue.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jackm321 picture jackm321  路  3Comments

dati91 picture dati91  路  3Comments

mciprian13 picture mciprian13  路  3Comments

opti-mix picture opti-mix  路  4Comments

mciprian13 picture mciprian13  路  4Comments