This is more of a wishlist-type idea, but here goes: other compilers I've seen have some "optimization properties" on a per-instruction basis that let the optimizer work in terms of those flags, rather than explicitly checking node types everywhere.
I've noticed one concrete use case (there may be others): sinking transposes. Many node types allow a transpose to sink past them, including BatchNorm, ReLU, and Sigmoid; grep GraphOptimizer.cpp for "sink transpose" to find them. We could have a node flag that says "can sink transpose". (I haven't thought through the details; maybe each node type really is a special snowflake.)
This would've been useful to me recently when adding a new backend-specific node type that would benefit from transpose sinking.
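To make the flag idea concrete, here is a minimal sketch of per-kind property flags. Everything in it is hypothetical: `NodeKind`, `NodeProperty`, `propertiesOf`, and `hasProperty` are illustrative names rather than existing Glow APIs, and the flags assigned to each kind are only an example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node kinds for illustration; not the project's real kind enum.
enum class NodeKind : uint8_t {
  BatchNorm,
  Relu,
  Sigmoid,
  Concat,
  Convolution,
  NumKinds // sentinel, not a real kind
};

// Hypothetical optimization properties, stored as bit flags so that one
// node kind can carry several of them at once.
enum NodeProperty : uint32_t {
  NoProperties = 0,
  CanSinkTranspose = 1u << 0,
  IsElementwise = 1u << 1,
};

// One central table mapping each kind to its properties. The assignments
// below are assumptions made for the sake of the example.
inline uint32_t propertiesOf(NodeKind kind) {
  assert(kind != NodeKind::NumKinds && "sentinel is not a real node kind");
  switch (kind) {
  case NodeKind::BatchNorm:
    return CanSinkTranspose;
  case NodeKind::Relu:
  case NodeKind::Sigmoid:
    return CanSinkTranspose | IsElementwise;
  default:
    return NoProperties;
  }
}

// The optimizer would ask about a property instead of listing node types.
inline bool hasProperty(NodeKind kind, NodeProperty property) {
  return (propertiesOf(kind) & property) != 0;
}
```

With something like this, a transpose-sinking pass could test `hasProperty(kind, CanSinkTranspose)` once, and a new backend-specific node would opt in by adding one entry to the table.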
I've noticed this for Max and ReLU as well. Most optimizations that apply to one apply to the other, so right now we essentially have to replicate each of those optimizations for both.
I like this suggestion. I think that the interesting property here is that the operators that you mentioned operate per-element, regardless of the layout of the tensor. We already have a similar flag at the instruction level and we use it for stacking.
@bertmaher Do you want to have such a flag per instruction kind or per instruction instance?
I was thinking it would be per instruction kind (at least that's what makes sense for the transpose one). Other properties might make more sense per instance, but I don't have examples.
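If per-instance properties ever become necessary, one option is to keep the per-kind flags as the default and let an individual node opt out. The sketch below is only an assumption about how that could look; `Node`, `ReluNode`, `kindProperties`, and `disableProperty` are hypothetical names, not the project's classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical property flags (illustrative only).
enum NodeProperty : uint32_t {
  CanSinkTranspose = 1u << 0,
};

// Hypothetical node base class combining both granularities.
class Node {
public:
  virtual ~Node() = default;

  // Per-kind default: every node of the same class returns the same flags.
  virtual uint32_t kindProperties() const { return 0; }

  // Per-instance opt-out: clears a property for this node only.
  void disableProperty(NodeProperty property) {
    assert(property != 0 && "expected a real property flag");
    disabled_ |= property;
  }

  // Effective properties: the kind default minus this node's opt-outs.
  bool hasProperty(NodeProperty property) const {
    return ((kindProperties() & ~disabled_) & property) != 0;
  }

private:
  uint32_t disabled_ = 0;
};

// A kind that allows transpose sinking by default.
class ReluNode : public Node {
public:
  uint32_t kindProperties() const override { return CanSinkTranspose; }
};
```

The per-kind answer costs nothing per node, while the opt-out mask adds one integer per instance, so it is probably only worth it once a real per-instance example shows up.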
I think that this could be a good task for people who are new to the project.
It's not only about transpose sinking but also about ReLU sinking below Concat. For backends that like to fuse ReLU with the preceding Convolution, sinking ReLU below Concat is really bad, because the ReLU no longer directly follows the Convolution.
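One way to handle this would be to let the backend veto a particular sink even when the node kind allows it. This is a rough sketch under that assumption; `Backend::allowSinking`, `FusingBackend`, and `shouldSink` are hypothetical names and do not describe an existing interface.

```cpp
#include <cassert>

// Hypothetical node kinds for illustration.
enum class NodeKind { Relu, Concat, Convolution, Transpose };

// Hypothetical backend hook: return false to keep `moved` above `past`.
struct Backend {
  virtual ~Backend() = default;
  virtual bool allowSinking(NodeKind moved, NodeKind past) const {
    (void)moved;
    (void)past;
    return true; // by default the backend has no objection
  }
};

// A backend that fuses ReLU into the preceding Convolution wants the ReLU
// to stay next to it, so it refuses to sink ReLU below Concat.
struct FusingBackend : Backend {
  bool allowSinking(NodeKind moved, NodeKind past) const override {
    return !(moved == NodeKind::Relu && past == NodeKind::Concat);
  }
};

// The optimizer sinks only if the kind-level property permits it AND the
// backend does not object.
inline bool shouldSink(const Backend &backend, bool kindAllowsSinking,
                       NodeKind moved, NodeKind past) {
  return kindAllowsSinking && backend.allowSinking(moved, past);
}

// Small usage example showing the two backends disagreeing.
inline void exampleUsage() {
  Backend generic;
  FusingBackend fusing;
  assert(shouldSink(generic, true, NodeKind::Relu, NodeKind::Concat));
  assert(!shouldSink(fusing, true, NodeKind::Relu, NodeKind::Concat));
}
```

The property flag then answers "is this rewrite legal?", and the backend hook answers "is it profitable here?", which keeps the two concerns separate.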