Tvm: [RFC][PASS][RUNTIME] Enable Slice on LHS during concat

Created on 5 Apr 2019 · 10Comments · Source: apache/tvm

This was mentioned in https://github.com/dmlc/tvm/issues/2122#issuecomment-439581161; I also asked about it here a while back.

Basically I'd like to implement a compiler pass that replaces concat operations with simple slicing operations, where:

out[0:1] = concat(f1(), f2())

is replaced with

out[0] = f1()
out[1] = f2()

This can provide large improvements in some cases, e.g. DenseNet, which goes from having O(n^2) memory copies to having O(n) memory copies. It also reduces memory requirements in low-memory deployment scenarios. It can be implemented by partially specializing compute kernels over data size.

I'd like to implement this pass (partly just to get more used to working inside TVM), but I'm not exactly sure where in the stack this should go. It's somewhat backend-dependent, and AFAICT Relay can't really represent slicing in this way (although I might be wrong). Maybe a pass can be added that annotates data-movement operations, and backends can consume those annotations?

A place to start might be just implementing this in the graph runtime. If someone could provide some guidance that would be helpful ( @tqchen ?)

RFC

Source

kazimuth

Most helpful comment

What I mean is that runtime need to be aware of the memory layout and provide out[slice] = f(inputs). Another possible "obstacle" is that TVM's compute kernel requires the buffer to be somewhat aligned, and we need to generate a special kernel for out[slice] = f(inputs), with a known offset(so we still benefit from good alignment). This is necessary for OpenCL

tqchen on 6 Apr 2019

👍3

All 10 comments

@kazimuth Probably you could write a Relay pass. Should be easy if it is intraprocedural.

junrushao1994 on 5 Apr 2019

AFAICT Relay can't really represent slicing in this way (although I might be wrong).

It's true. But I guess the more principled approach would be "let relay support such thing"...

junrushao1994 on 5 Apr 2019

😄1

Does it help to add a pass to analyze “buffers that should be allocated continuously”, and then hint the runtime to do this?

junrushao1994 on 5 Apr 2019

Does it help to add a pass to analyze “buffers that should be allocated continuously”, and then hint the runtime to do this?

I think this makes sense. How do I hint the runtime though? Should I just add a new Relay node / a field to one of the node types, or is there a more principled way to annotate a Relay AST?

Should be easy if it is intraprocedural.

It will definitely be to start. It's the sort of analysis that's easy to start small with and expand later.

kazimuth on 5 Apr 2019

We will likely need changes in both compiler and runtime.
What possibly need to happen is to have a special annotation in the relay IR and do special codegen in the runtime. It also requires us to enhance the runtime semantics, to allow specification of slice.

tqchen on 6 Apr 2019

@tqchen Didn't get it why we need runtime codegen. In the case of intraprocedural opt, it would be fine to just hint the runtime to alloc contiguous memory and then skip concat operator

junrushao1994 on 6 Apr 2019

tqchen on 6 Apr 2019

👍3

Shouldn't title be "Slice on LHS"?

Also @tqchen what's a good way to add that annotation?

kazimuth on 6 Apr 2019

👍1

I have not yet put very deep thoughts how can we generate annotation, this is something that worth some thoughts and discussion

tqchen on 6 Apr 2019

👍1

Once you have liveness analysis, you could detect when some vars die on an instruction, while another var is created. Let call it substitution. To optimize concat, we could detect substitution on concat instruction, and extend the liveness of vars who previously die on this instruction.

junrushao1994 on 8 Apr 2019

Was this page helpful?

0 / 5 - 0 ratings