When I was running some NN models with GLOW, I noticed that final (after all optimizations) graph contains the following sequence:
Convolution -> LRN -> RescaleQuantized -> MaxPool -> Convolution -> ...
To get rid of RescaleQuantized one can improve sinkRescaleQuantized pass the following way:
1) Allow Rescale to sink down with MaxPool/AvgPool
2) Combine Rescale down with Convolution
I made experimental change (4aa6449770d94be247175972c0f40ce6e6f42486) and it helped.
To me, both of these transformations seem legal, but I might be missing something. Can anyone confirm that they are OK?
If they are, I'll add some tests and split the changes in two pull request.
Thanks for doing this work @aturetsk. @rdzhabarov @qcolombet What do you think?
These transformations seem useful to have. MaxPool itself does not change the quantization params of input/output so it should be safe to sink Rescale down the node. The reason it's not implemented as we've not seen the use case before.
Another consideration would be to merge rescale up to the LRN node. We do not really support quantized LRN as of now, but since you were planning to add support for quantized LRN, would this be possible as well? @aturetsk
Another consideration would be to merge rescale up to the LRN node.
I think combining Rescale up with LRN will work as well. But...
I find this code a bit weird:
while (sinkRescaleQuantizedNode(F)) { // sink + combine down
DCE(F);
optimizeQuantization(F); // combine up + other optimizations
}
I wonder if it really necessary to have "combine up" rescale optimization at all, taking into account that we use rescale "sink" strategy. Because moving rescale down kind of contradicts our attempts to combine them up. If I'm not missing something, sinking rescale and combining it down with nodes should be enough to get rid of all unnecessary rescales. Do you have any use case where it's not so?
Another idea about rescale combination/sinking:
Do you think ClassGen can be used to generate sinkRescaleQuantizedNode/optimizeQuantization (add a special attribute to every node describing which strategy to chose to optimize rescales, e.g. sink, combine or custom)?
This would help to:
1) Avoid extending sinkRescaleQuantizedNode/optimizeQuantization manually for every node for trivial cases (sink and combine). Only complicated cases ('custom', e.g. for concat node, where we need to ensure all inputs have the same scale/offset) will be added manually.
2) Be able to cover backend-specific nodes in rescale optimizations as well. Backend may replace some nodes with backend specific versions and create some rescales (e.g. CPUMaxSplat). Right now, they won't be optimized, because sinkRescaleQuantizedNode/optimizeQuantization work only with generic nodes.
What do you think?
I wonder if it really necessary to have "combine up" rescale optimization at all, taking into account that we use rescale "sink" strategy. Because moving rescale down kind of contradicts our attempts to combine them up. If I'm not missing something, sinking rescale and combining it down with nodes should be enough to get rid of all unnecessary rescales. Do you have any use case where it's not so?
One case is when you cannot sink anything in the network, but you can immediately merge Rescale up. Note, we apply optimizeQuantization at the very beginning.
// Optimize things that are related to quantization.
optimizeQuantization(F);
while (sinkRescaleQuantizedNode(F)) {
DCE(F);
optimizeQuantization(F);
}
Do you think ClassGen can be used to generate sinkRescaleQuantizedNode/optimizeQuantization (add a special attribute to every node describing which strategy to chose to optimize rescales, e.g. sink, combine or custom)?
I think it's a nice idea, but probably optimization opportunities should not be built into a node itself.
Rescale, transpose and other things could bloat nodes.
@rdzhabarov: Another consideration would be to merge rescale up to the LRN node.
@aturetsk: I think combining Rescale up with LRN will work as well. But...
Looking at it closely, LRN requires output type to be quantized with the same scale/offset as input. Which means that we can't combine rescale with it.
So I'll stick to the original plan: sink rescale through pooling and combine down with convolution.
One more possible optimization to consider: sinking rescale through LRN, though I don't have a use case for this at the moment.
I think it's a nice idea, but probably optimization opportunities should not be built into a node itself.
Rescale, transpose and other things could bloat nodes.
Understood. But I'm still a bit concerned about the fact that backend-specific nodes do not participate in rescale optimizations (unless backend itself implements similar passes).
Probably providing interfaces to backend to extend sinkRescaleQuantizedNode and optimizeQuantization with sinking/combining logic for backend-specific nodes might be a good idea...
Anyway, these are just thoughts, as I don't have a use case for this at the moment.
Close the issue, as the related PRs were merged.