I've discovered a pattern slice(constant) in the AlexNet.
In principle, we could implement slice(constant1)->constant2 as a compile-time transformation in the GraphOptimizer, to save computations at run-time.
My only concern here is that it may potentially increase the amount of memory required for constants.
Opinions?
Seems like a good idea.
My only concern here is that it may potentially increase the amount of memory required for constants.
Yeah, this is another one that might be dependent on the backend's preferences, so would be great to have it be controlled by the backend (https://github.com/pytorch/glow/issues/1641).
This is also is related to constant propagation (https://github.com/pytorch/glow/issues/1469) -- I wonder if we could design this in a generic way, e.g. generate the Tensor to back the new Constant by calling into the kernels in the Interpreter.
I also think it makes sense to have it backend controlled as it is a tradeoff between performance and memory with a decision that can vary across targets. Also even a single target could have different memory configuration depending on the customer. In this case, the decision will depend on the memory constraints.
I agree that in theory there could be a tradeoff here. But I am not sure if the pattern that exposes this tradeoff between memory and compute is a common pattern. What do you say that we start with a simple peephole optimization (instead of target-specific hook) instead?
@nadavrot Could you elaborate how peephole optimizations would solve this tradeoff issue?
If I understand correctly, we would still be bound by memory constraints depending on the hardware.
I agree with @jfix71 on how constant folding is more relevant here and the broader solution.
I would like to contribute to these issues. I am starting new here. Would you be willing to give me starting pointers?
@SumedhArani I don't have a solution to the tradeoff issue. I suggest implementing one solution (just do the peephole) and give it a try. If we figure out that we do have memory issues in real workloads then we can iterate and improve things. There is a difference between what is a possible workload, and what is likely. Maybe in practice one optimization strategy is always superior? I suggest that we implement the simple peephole optimization and give it a try and evaluate how it performs on the workloads that we are testing today. What do you think?
@nadavrot A feasibility analysis sounds good to me.
I will try implementing a simple peephole optimization. We can then decide the further roadmap for this depending on the results.
Are there any specific ideas that you already have in your mind regarding this implementation? Any suggestions are much valued. :)
I suggest taking a look at the graph optimizer. The authors @jfix71 and @opti-mix can review the PR. Thank you for doing this work!
@opti-mix , @jfix71 :
I have written code for performing the compile-time optimization, as specified by @nadavrot . Where do I find the Alexnet model to test my code? (Should I submit a PR, and then continue to incrementally update it?)
Are there any other benchmarks where we can see this optimization put to use?
Also, at what stage should this optimization be called?
@opti-mix @SumedhArani Thanks for doing this work. Yes, please submit a PR. You can test your code with our unit tests and with tests in the model zoo.
@SumedhArani Yes, please submit a PR.
Where do I find the Alexnet model to test my code?
The AlexNet model can be downloaded using utils/download_onnx_models.sh
For the PR, you can easily construct a small graph with the pattern in question, apply optimizations and check at the end that slice(constant1) is not present in the graph anymore.
Also, at what stage should this optimization be called.
It should be called as part of GraphOptimizer transformations.
Thank you @nadavrot and @opti-mix ! :)
I will test for the cases as suggested by @opti-mix and then submit the PR.
I have written the optimization as a part of the GraphOptimizer transformations. I wanted to ask at a finer level though. Would performing this optimization before a specific optimization give better results?
Thank you.
@SumedhArani
I have written the optimization as a part of the GraphOptimizer transformations. I wanted to ask at a finer level though. Would performing this optimization before a specific optimization give better results?
Have a look at the optimizeSliceOfSplat. It does a very similar transformation, but for splats instead of constants. You probably want to create a similar function called optimizeSliceOfConstant and invoke it at the same place where optimizeSliceOfSplat is invoked.
You may also consider extending optimizeSliceOfSplat to handle both cases. If you go this way, you may want to rename it into something like optimizeSliceOfConstantPayload...
I did exactly that. I looked at optimizeSliceOfSplat and wrote a very similar function that handled the required case. I call this new optimizeSliceOfConstant, just after optimizeSliceOfSplat.
So I think I am headed in the right direction. I will submit a PR, as soon I have tested the code.
@SumedhArani are you still interested in working on this? Can we salvage your PR?
@opti-mix I am looking through starter issues and commit logs, I see you recently merged a commit implementing general constant folding? I assume it takes care of this / issue should be closed?
@shajrawi Yes, you are right! This is taken care of by #3161. Closing.