Glow: Perform slice(constant1) -> constant2 transformation at compile time

Created on 12 Dec 2018 · 16Comments · Source: pytorch/glow

I've discovered a pattern slice(constant) in the AlexNet.

In principle, we could implement slice(constant1)->constant2 as a compile-time transformation in the GraphOptimizer, to save computations at run-time.

My only concern here is that it may potentially increase the amount of memory required for constants.

Opinions?

good first issue

Source

opti-mix

👍3

All 16 comments

Seems like a good idea.

My only concern here is that it may potentially increase the amount of memory required for constants.

Yeah, this is another one that might be dependent on the backend's preferences, so would be great to have it be controlled by the backend (https://github.com/pytorch/glow/issues/1641).

This is also is related to constant propagation (https://github.com/pytorch/glow/issues/1469) -- I wonder if we could design this in a generic way, e.g. generate the Tensor to back the new Constant by calling into the kernels in the Interpreter.

jfix71 on 13 Dec 2018

I also think it makes sense to have it backend controlled as it is a tradeoff between performance and memory with a decision that can vary across targets. Also even a single target could have different memory configuration depending on the customer. In this case, the decision will depend on the memory constraints.

tlepley-cadence on 13 Dec 2018

👍1

I agree that in theory there could be a tradeoff here. But I am not sure if the pattern that exposes this tradeoff between memory and compute is a common pattern. What do you say that we start with a simple peephole optimization (instead of target-specific hook) instead?

nadavrot on 14 Dec 2018

@nadavrot Could you elaborate how peephole optimizations would solve this tradeoff issue?

If I understand correctly, we would still be bound by memory constraints depending on the hardware.

I agree with @jfix71 on how constant folding is more relevant here and the broader solution.

I would like to contribute to these issues. I am starting new here. Would you be willing to give me starting pointers?

SumedhArani on 18 Jan 2019

@SumedhArani I don't have a solution to the tradeoff issue. I suggest implementing one solution (just do the peephole) and give it a try. If we figure out that we do have memory issues in real workloads then we can iterate and improve things. There is a difference between what is a possible workload, and what is likely. Maybe in practice one optimization strategy is always superior? I suggest that we implement the simple peephole optimization and give it a try and evaluate how it performs on the workloads that we are testing today. What do you think?

nadavrot on 18 Jan 2019

@nadavrot A feasibility analysis sounds good to me.

I will try implementing a simple peephole optimization. We can then decide the further roadmap for this depending on the results.

Are there any specific ideas that you already have in your mind regarding this implementation? Any suggestions are much valued. :)

SumedhArani on 18 Jan 2019

👍1

I suggest taking a look at the graph optimizer. The authors @jfix71 and @opti-mix can review the PR. Thank you for doing this work!

nadavrot on 18 Jan 2019

@opti-mix , @jfix71 :
I have written code for performing the compile-time optimization, as specified by @nadavrot . Where do I find the Alexnet model to test my code? (Should I submit a PR, and then continue to incrementally update it?)

Are there any other benchmarks where we can see this optimization put to use?

Also, at what stage should this optimization be called?

SumedhArani on 12 Feb 2019

👍1

@opti-mix @SumedhArani Thanks for doing this work. Yes, please submit a PR. You can test your code with our unit tests and with tests in the model zoo.

nadavrot on 12 Feb 2019

@SumedhArani Yes, please submit a PR.

Where do I find the Alexnet model to test my code?

The AlexNet model can be downloaded using utils/download_onnx_models.sh

For the PR, you can easily construct a small graph with the pattern in question, apply optimizations and check at the end that slice(constant1) is not present in the graph anymore.

Also, at what stage should this optimization be called.

It should be called as part of GraphOptimizer transformations.

opti-mix on 12 Feb 2019

Thank you @nadavrot and @opti-mix ! :)

I will test for the cases as suggested by @opti-mix and then submit the PR.

I have written the optimization as a part of the GraphOptimizer transformations. I wanted to ask at a finer level though. Would performing this optimization before a specific optimization give better results?

Thank you.

SumedhArani on 12 Feb 2019

@SumedhArani

I have written the optimization as a part of the GraphOptimizer transformations. I wanted to ask at a finer level though. Would performing this optimization before a specific optimization give better results?

Have a look at the optimizeSliceOfSplat. It does a very similar transformation, but for splats instead of constants. You probably want to create a similar function called optimizeSliceOfConstant and invoke it at the same place where optimizeSliceOfSplat is invoked.

You may also consider extending optimizeSliceOfSplat to handle both cases. If you go this way, you may want to rename it into something like optimizeSliceOfConstantPayload...

opti-mix on 12 Feb 2019

I did exactly that. I looked at optimizeSliceOfSplat and wrote a very similar function that handled the required case. I call this new optimizeSliceOfConstant, just after optimizeSliceOfSplat.

So I think I am headed in the right direction. I will submit a PR, as soon I have tested the code.

SumedhArani on 13 Feb 2019

👍1

@SumedhArani are you still interested in working on this? Can we salvage your PR?

nickgg on 30 Apr 2019

@opti-mix I am looking through starter issues and commit logs, I see you recently merged a commit implementing general constant folding? I assume it takes care of this / issue should be closed?