Tvm: TVM and HLO/XLA

Created on 22 May 2017 · 16 Comments · Source: apache/tvm

Any quick overview on the differences?

bhack

Most helpful comment

Here is what a deep learning system stack looks like nowadays.

1. Operator-level graph description language
   - Name whatever DL frameworks you care about
2. Tensor-primitive-level graph description language
   - NNVM, HLO, NGraph
   - It is close enough to the first layer that you can also build graph optimizations on the first layer and bypass this one
3. DSL for description and codegen
4. Hardcoded optimized kernel libraries such as NNPACK, cuDNN, libdnn
5. Device-dependent library

Most libraries go from 1 -> 4. An easy but restrictive path for compilation and fusion is to go from 2 -> 4/5, either by manually coding up fused kernels or by having rules that generate certain fused kernels. TVM sits at level 3, to make the jump from level 2 to level 5 easier and to give the user more control.

In terms of design philosophy, we want it to work together with the existing ecosystem. This includes:

- A friendly frontend that can be used directly for kernel generation
- Giving the framework full control of memory allocation, graph execution, data layout, etc.
- Generating DLPack-compatible kernels that every framework can take directly
- Making use of blackbox calls like cuDNN when the user says so
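
Two of these points, the framework keeping control of memory allocation and blackbox calls being an explicit user choice, can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: `generated_add`, `vendor_add`, and `pick_kernel` are hypothetical names invented for illustration, and nothing here is TVM's, DLPack's, or cuDNN's actual API.

```python
# Hypothetical sketch: none of these names come from TVM, DLPack, or cuDNN.

def generated_add(a, b, out):
    """Stand-in for a compiler-generated kernel.

    It writes into a buffer the caller already allocated and
    allocates nothing itself.
    """
    for i in range(len(out)):
        out[i] = a[i] + b[i]


def vendor_add(a, b, out):
    """Stand-in for an opaque vendor routine (the role a library like cuDNN plays)."""
    out[:] = [x + y for x, y in zip(a, b)]


def pick_kernel(use_blackbox=False):
    """The user, not the compiler, decides whether the black box is called."""
    return vendor_add if use_blackbox else generated_add


a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]

# The "framework" owns and allocates the output buffers.
out_generated = [0.0] * 3
out_vendor = [0.0] * 3

pick_kernel()(a, b, out_generated)
pick_kernel(use_blackbox=True)(a, b, out_vendor)
```

Both kernels share one calling convention (inputs plus a preallocated output), which is what lets a framework swap between a generated kernel and a library call without changing how it manages memory.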

I think we can expect all approaches in the stack to continue to exist. We are just designing a layer at level 3 that can incrementally transition toward automation while still transparently benefiting from the libraries at level 4.

tqchen on 22 May 2017

All 16 comments

They are orthogonal.

XLA is more high level, like NNVM. The developers of XLA need to define codegen and loop transformation rules for each operator (much like writing a kernel) that say how to generate its kernel, and the system stitches the kernels together for you.
TVM is one level below. It provides common low-level primitives for describing the computation, as well as the loop transformation rules, and lets the user apply them. You can use these to implement something like XLA (by using NNVM or another high-level graph description), or bypass the high-level description layer and use TVM directly in a framework.
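
The split between describing a computation and choosing its loop transformations can be illustrated with a toy in plain Python. This is a sketch, not TVM's API: `matmul_compute`, `run_naive`, and `run_tiled` are hypothetical helpers, and the "schedule" here is simply a different loop nest over the same element-wise description.

```python
# Toy illustration only: this is NOT TVM's API. All names are hypothetical.

def matmul_compute(A, B):
    """Describe *what* to compute: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    def element(i, j):
        return sum(A[i][k] * B[k][j] for k in range(len(B)))
    return element


def run_naive(element, rows, cols):
    """Default 'schedule': a plain row-major loop nest."""
    return [[element(i, j) for j in range(cols)] for i in range(rows)]


def run_tiled(element, rows, cols, tile=2):
    """Alternative 'schedule': split both loops into tiles.

    Only the iteration order changes; the computed values do not.
    """
    C = [[0] * cols for _ in range(rows)]
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    C[i][j] = element(i, j)
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0], [0, 1], [2, 2]]

element = matmul_compute(A, B)
naive = run_naive(element, 3, 2)
tiled = run_tiled(element, 3, 2, tile=2)
```

The description in `matmul_compute` is written once, while the loop structure is chosen separately. In the approach described above, by contrast, such loop rules are defined per operator by the system's developers.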

tqchen on 22 May 2017

What will be the role of Fabian's libdnn and the FAIR-sponsored NNPACK in this?

bhack on 22 May 2017

libdnn and NNPACK are both different; they could perhaps be used as blackbox calls. (NNPACK is not FAIR sponsored; it is just continued research/dev after FAIR.)

soumith on 22 May 2017

What is the goal here? Rewrite new kernels?

bhack on 22 May 2017

Write kernels in a new language that can be retargeted to multiple backends with great perf.
Folks can build languages or collectives to write kernels on top of TVM.

soumith on 22 May 2017

see the matrix-multiply or persistent-rnn examples, maybe?

soumith on 22 May 2017

@soumith I thought that investing FAIR work hours in NNPACK was a form of sponsoring. But it is OK if you meant that it is not officially sponsored by FAIR.

bhack on 22 May 2017

yes, we did not sponsor a grant and say: give us NNPACK.

soumith on 22 May 2017

Yes, OK. So what I meant is that we would try to supersede libdnn and NNPACK at some point if we share these DSL kernels.

bhack on 22 May 2017

Yes, slowly and incrementally we can try to move the value into the TVM backend. It will happen over time. There is some systems research that needs to be done before we get there as well, so there is a little bit of uncertainty too.

soumith on 22 May 2017

Yes of course I was just talking about the "great design"

bhack on 22 May 2017

So are you trying to do what TF team didn't want to do?

bhack on 22 May 2017

@soumith by collectives, do you mean different frameworks (like the ones we represent) sharing kernel code?

edgarriba on 22 May 2017

Can we put some of this info in a file so that we can close it?

bhack on 23 May 2017

Yes, let us have an FAQ file https://github.com/dmlc/tvm/blob/master/docs/faq.md

tqchen on 23 May 2017
