Glow: Quantizing/profiling after lowering and DepthWise convolution

Created on 20 Feb 2019  路  3Comments  路  Source: pytorch/glow

After #2311 now everything is lowered always. This becomes and issue with MobileNet which has a few depthwise convolutions. In ONNX/GLOW they are represented as regular convolutions with groupSize > 1. In at least one case group size is 1024.

In Lower the lowerGroupConvolutionNode breaks these convolutions in to multiple convolutions with group size 1. This leads to explosion of nodes.
This leads to
1) Drastic increase in compile time
2) Failure in IRGen in uniqueName because it fails to find unique name since it's limited by arbitrary 10k limit.

Is lowerGroupConvolutionNode needed? Seems like CPU backend supports groupSize > 1.

Most helpful comment

Yeah seems like adding control that's not specifically tied to any backend implementation as @rdzhabarov proposed would be a good idea. This way profiling can still be done with generic CPU implementation, but what is lowered and what isn't would closer reflect whatever backend would use the profile.

All 3 comments

Related issue: https://github.com/pytorch/glow/issues/2395.

For this specific case, we should explore supplying nodes which must not be lowered as part of the Loader CLI arguments. Currently it's up to backend whether shouldLower or not, but in this case it should be controlled at a higher level.

Is lowerGroupConvolutionNode needed? Seems like CPU backend supports groupSize > 1.

It's currently not needed for any of our open source backends, however other backends might not support group convolutions. Perhaps we could remove the call into lowerGroupConvolutionNode() from lower(), and allow a backend to call it during transformPostLowering(). However this means we would not be able to get fine grained profiles for each convolution of the group, though this was already the case prior to https://github.com/pytorch/glow/pull/2311 anyway.

The benefit of @rdzhabarov's suggestion is that we could still allow the profile to be generated for each convolution of a group if desired. I'd prefer that here -- it'd be a simple CLI flag to specify a list of Node kinds, which would then be passed to lower().

Yeah seems like adding control that's not specifically tied to any backend implementation as @rdzhabarov proposed would be a good idea. This way profiling can still be done with generic CPU implementation, but what is lowered and what isn't would closer reflect whatever backend would use the profile.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jackm321 picture jackm321  路  3Comments

dati91 picture dati91  路  3Comments

artemrakhov-glow picture artemrakhov-glow  路  4Comments

mciprian13 picture mciprian13  路  4Comments

pjaaskel picture pjaaskel  路  4Comments