After #2311 now everything is lowered always. This becomes and issue with MobileNet which has a few depthwise convolutions. In ONNX/GLOW they are represented as regular convolutions with groupSize > 1. In at least one case group size is 1024.
In Lower the lowerGroupConvolutionNode breaks these convolutions in to multiple convolutions with group size 1. This leads to explosion of nodes.
This leads to
1) Drastic increase in compile time
2) Failure in IRGen in uniqueName because it fails to find unique name since it's limited by arbitrary 10k limit.
Is lowerGroupConvolutionNode needed? Seems like CPU backend supports groupSize > 1.
Related issue: https://github.com/pytorch/glow/issues/2395.
For this specific case, we should explore supplying nodes which must not be lowered as part of the Loader CLI arguments. Currently it's up to backend whether shouldLower or not, but in this case it should be controlled at a higher level.
Is lowerGroupConvolutionNode needed? Seems like CPU backend supports groupSize > 1.
It's currently not needed for any of our open source backends, however other backends might not support group convolutions. Perhaps we could remove the call into lowerGroupConvolutionNode() from lower(), and allow a backend to call it during transformPostLowering(). However this means we would not be able to get fine grained profiles for each convolution of the group, though this was already the case prior to https://github.com/pytorch/glow/pull/2311 anyway.
The benefit of @rdzhabarov's suggestion is that we could still allow the profile to be generated for each convolution of a group if desired. I'd prefer that here -- it'd be a simple CLI flag to specify a list of Node kinds, which would then be passed to lower().
Yeah seems like adding control that's not specifically tied to any backend implementation as @rdzhabarov proposed would be a good idea. This way profiling can still be done with generic CPU implementation, but what is lowered and what isn't would closer reflect whatever backend would use the profile.
Most helpful comment
Yeah seems like adding control that's not specifically tied to any backend implementation as @rdzhabarov proposed would be a good idea. This way profiling can still be done with generic CPU implementation, but what is lowered and what isn't would closer reflect whatever backend would use the profile.