Tvm: TVM and HLO/XLA

Created on 22 May 2017 · 16 Comments · Source: apache/tvm

Any quick overview on the differences?

Most helpful comment

Here is what a deep learning system stack looks like nowadays.

  • 1 Build operator level graph description language

    • Name whatever dl frameworks you care about

  • 2 Tensor primitive level graph description language

    • NNVM, HLO, NGraph

    • This layer is close enough to the first one that you can also build graph optimizations on the first layer and bypass this one

  • 3 DSL for description and codegen.
  • 4 Hardcoded optimized kernel library like nnpack, cudnn, libdnn
  • 5 Device dependent library

Most libraries go from 1 -> 4. An easy but restrictive path for compilation and fusion is to go from 2 -> 4/5, either by manually coding up fused kernels or by having rules that generate certain fused kernels. TVM sits at level 3, to make the jump from level 2 to level 5 easier and to give users more control.
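To make the fusion point concrete, here is a minimal sketch in plain Python. Lists stand in for device tensors, and every name in it (`scale`, `add_bias`, `relu`, `fuse_elementwise`) is hypothetical. It is not TVM, NNVM, or XLA code. It only contrasts chaining separate kernels with a simple rule that generates one fused kernel.

```python
# Illustrative only: plain Python lists stand in for device tensors.

def scale(xs, a):
    """Standalone kernel: multiply every element by a."""
    return [a * x for x in xs]

def add_bias(xs, b):
    """Standalone kernel: add b to every element."""
    return [x + b for x in xs]

def relu(xs):
    """Standalone kernel: clamp negative elements to zero."""
    return [x if x > 0.0 else 0.0 for x in xs]

def unfused(xs, a, b):
    # Three separate passes over the data, two intermediate buffers.
    return relu(add_bias(scale(xs, a), b))

def fuse_elementwise(*scalar_fns):
    """Hypothetical fusion rule: compose scalar functions into one pass."""
    def fused_kernel(xs):
        out = []
        for x in xs:
            for fn in scalar_fns:
                x = fn(x)
            out.append(x)
        return out
    return fused_kernel

# One generated kernel, one pass, no intermediate buffers.
fused = fuse_elementwise(
    lambda x: 2.0 * x,
    lambda x: x - 3.0,
    lambda x: max(x, 0.0),
)

data = [0.5, 1.0, 2.0, 4.0]
assert fused(data) == unfused(data, 2.0, -3.0)
```

A rule like this only covers elementwise chains, which is roughly why the comment calls the 2 -> 4/5 path easy but restrictive.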

In terms of design philosophy, we want it to work together with the existing ecosystem. This includes:

  • A friendly frontend that can be used directly for kernel generation
  • Giving the framework full control of memory allocation, graph execution, data layout, etc.
  • Generating DLPack compatible kernels that every framework can take directly.
  • Making use of blackbox calls like cudnn when the user says so.
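As a rough illustration of two of these points (the framework owning memory, and opting into blackbox calls), here is a small plain-Python sketch. The names `generated_add`, `vendor_add`, `KERNELS`, and `run_add` are made up for this example and do not correspond to any TVM, DLPack, or cuDNN API.

```python
def generated_add(a, b, out):
    """Stand-in for a generated kernel: writes into a caller-owned buffer."""
    for i in range(len(out)):
        out[i] = a[i] + b[i]

def vendor_add(a, b, out):
    """Stand-in for an opaque library routine (the 'blackbox')."""
    out[:] = [x + y for x, y in zip(a, b)]

# Hypothetical registry: both implementations share one calling convention.
KERNELS = {"generated": generated_add, "blackbox": vendor_add}

def run_add(a, b, out, prefer="generated"):
    # The caller (the framework) allocates `out` and picks the implementation;
    # the kernel never allocates memory itself.
    KERNELS[prefer](a, b, out)
    return out

buf = [0.0, 0.0, 0.0]  # allocated by the framework, not by the kernel
run_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], buf, prefer="blackbox")
```

Because both kernels take the same arguments and write into a buffer they do not own, the framework can swap one for the other without changing its memory planning.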

I think we can expect that all approaches in the stack will continue to exist. We are just designing a layer at level 3 that can incrementally transition toward automation while still transparently benefiting from the things at level 4.

All 16 comments

They are orthogonal.

  • XLA is higher level, like NNVM. The developers of XLA need to define codegen and loop transformation rules (like writing a kernel) for each operator, specifying how to generate kernels, and the system stitches the kernels together for you.
  • TVM is one level below. It provides common low-level primitives for describing the computation, as well as the loop transformation rules, and lets the user apply them. You can use these to implement something like XLA (by using NNVM or a high-level graph description), or simply bypass the high-level description layer and use TVM directly in a framework.
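The split between describing a computation and choosing its loop transformation can be sketched in plain Python. This is only an analogy with hypothetical names (`matmul_compute`, `schedule_naive`, `schedule_tiled`, `run`). It does not use or imitate TVM's actual API.

```python
def matmul_compute(A, B):
    """Describe what each output element is, independent of loop order."""
    k_extent = len(B)
    return lambda i, j: sum(A[i][k] * B[k][j] for k in range(k_extent))

def schedule_naive(n, m):
    """Visit output indices in plain row-major order."""
    for i in range(n):
        for j in range(m):
            yield i, j

def schedule_tiled(n, m, tile=2):
    """Visit the same indices tile by tile (a simple loop transformation)."""
    for io in range(0, n, tile):
        for jo in range(0, m, tile):
            for i in range(io, min(io + tile, n)):
                for j in range(jo, min(jo + tile, m)):
                    yield i, j

def run(compute, schedule, n, m):
    """Evaluate the description in the order the schedule dictates."""
    out = [[0] * m for _ in range(n)]
    for i, j in schedule:
        out[i][j] = compute(i, j)
    return out

A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]     # 2 x 3
C_naive = run(matmul_compute(A, B), schedule_naive(3, 3), 3, 3)
C_tiled = run(matmul_compute(A, B), schedule_tiled(3, 3), 3, 3)
assert C_naive == C_tiled  # the schedule changes the order, not the result
```

The description is written once. Only the iteration order changes, which is the kind of control the second bullet says is handed to the user.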

What will be the role of Fabian's libdnn and the FAIR-sponsored NNPACK in this?

both libdnn and nnpack are different, they can maybe be used as blackbox calls. (NNPACK is not FAIR sponsored, it's just continued research/dev after FAIR)

What is the goal here? Rewrite new kernels?

write kernels in a new language that can be retargeted to multiple backends with great perf.
folks can build languages or collectives to write kernels on top of TVM.

see the matrix-multiply or persistent-rnn examples, maybe?
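A toy sketch of what retargeting could mean: one elementwise description emitted as source text for two backends. The emitters `emit_c` and `emit_cuda` and the `build` helper are hypothetical and far simpler than a real code generator. The generated strings are only printed or inspected here, not compiled.

```python
def emit_c(name, expr):
    """Emit a serial C loop for an elementwise expression over a[i], b[i]."""
    return (
        f"void {name}(const float* a, const float* b, float* out, int n) {{\n"
        f"  for (int i = 0; i < n; ++i) out[i] = {expr};\n"
        f"}}\n"
    )

def emit_cuda(name, expr):
    """Emit a CUDA kernel computing the same expression, one thread per element."""
    return (
        f"__global__ void {name}(const float* a, const float* b, float* out, int n) {{\n"
        f"  int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        f"  if (i < n) out[i] = {expr};\n"
        f"}}\n"
    )

# Hypothetical target table: the kernel description is shared across targets.
BACKENDS = {"c": emit_c, "cuda": emit_cuda}

def build(name, expr, target):
    return BACKENDS[target](name, expr)

ELEMENTWISE_EXPR = "a[i] * b[i] + 1.0f"
c_src = build("fma1", ELEMENTWISE_EXPR, "c")
cuda_src = build("fma1", ELEMENTWISE_EXPR, "cuda")
```

The kernel author writes the expression once, and the choice of backend becomes an argument rather than a rewrite.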

@soumith I thought that investing FAIR work hours in NNPACK was like sponsoring. But it is ok if you meant that it is not officially sponsored by FAIR

yes, we did not sponsor a grant and say: give us NNPACK.

Yes ok.. so what I meant is that we would try to supersede libdnn and NNPACK at some point if we share these DSL kernels

yes, slowly and incrementally we can try to move the value into the TVM backend. It will happen over time. There's some systems research that needs to be done before we get there as well, so there's a little bit of uncertainty too.

Yes of course I was just talking about the "great design"

So are you trying to do what TF team didn't want to do?

@soumith by collectives, do you mean different frameworks (like the ones we represent) sharing kernel code?

Can we put some of this info in a file so that we can close it?
