TVM v0.5 Roadmap

Created on 13 Aug 2018 · 27 Comments · Source: apache/tvm

This is the roadmap for TVM v0.5. TVM is a community-driven project, and we love your feedback and proposals on where we should be heading. Please open discussions in the discussion forum and bring RFCs.

  • Feel free to volunteer if you are interested in trying out some items (they do not have to be on the list).
  • Please also check the "help wanted" list in the GitHub issues for things that need help.

Features

  • Fully featured 8-bit network support

    • [x] 8bit quantizer

    • [x] arbitrary-bit quantization algorithm

    • [x] ARM support

    • [x] Intel CPU support

  • NVIDIA GPU 8-bit kernel

    • [x] int8 gemm recipe

    • [x] int8 conv2d

    • [x] autotvm integration

  • Automated tuning and scheduling

    • [x] AutoTVM optimizations for mobile GPUs

    • [x] AutoTVM optimizations for CUDA

    • [x] AutoTVM for x86

    • [ ] graph-level automated optimization

  • Ultra low-bit support

    • [ ] tutorials of low-bit ops

    • [ ] customized accelerator support

  • VTA enhancements

    • [ ] support generic high level models

    • [ ] Enhanced operator/model coverage

    • [ ] Ultra-96, ZCU102 support

    • [ ] Amazon F1 preliminary support

    • [ ] Low-bit support, bit serial support

    • [ ] Chisel version

  • High level IR improvements

    • [x] A design more tightly coupled with the TVM runtime system

    • [x] support control flow

    • [x] Type system support

  • Runtime

    • [x] Heterogeneous runtime

  • Micro-asm kernel exploration

    • [ ] Core micro-asm primitives for certain ops

  • Hybrid python programming model

    • [ ] transition of vision operators to hybrid mode.

  • RPC and Device API

    • [ ] Support a C++ version of cross-platform RPC

  • Security

    • [x] tutorials on how to use SGX backend

  • Tutorials and docs

    • [x] How to write a pass in python

    • [x] General lowering flow of TVM

  • Language runtime

    • [x] Golang runtime

    • Rust support

      • [x] Rust runtime

      • [x] Rust frontend

All 27 comments

Shall we add the heterogeneous graph runtime? @zhiics is working on that.

I am interested in implementing Intel CPU support for INT8 quantization.

I'm interested in implementing the Rust runtime.

@tqchen @siju-samuel My Rust runtime (dylib) support, which follows the same generic API as, for example, the Java runtime (CPU, GPU, etc.), is about 70% done! I still need to finish callback support, add docs, and clean up. Any contributions are welcome!

@nhynes Rust static support is in good shape as well, but it is CPU-specific, with a custom allocator etc.

@ehsanmok OK.
Is anyone working on "Support a C++ version of cross-platform RPC"? If not, I'm interested in taking it up.

@tqchen I have started working on the 8-bit quantizer and its operator support for conv2d, dense, and relu. To avoid duplicate work, please let me know if anyone else is working on this.

The PR for the static Rust runtime is at https://github.com/dmlc/tvm/issues/1597.

@ehsanmok I'm not sure what you mean by "custom allocator etc." It uses whatever GlobalAlloc you care to use.

@nhynes I meant that you've defined your own allocator, threading, and parallel backend for CPU-only use in a staticlib compiled with xargo, while I've taken a different route that relies on existing layouts, for example, and it seems to work for GPU. Though I admit I did the project for my own enrichment first.

@PariksheetPinjari909 the UW SAML team is working on a generic n-bit quantizer, and hopefully it will be RFCed and upstreamed in this release cycle.
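For readers following the quantizer items, here is a minimal sketch of generic n-bit symmetric linear quantization with a single per-tensor scale. It only illustrates the general idea; it is not the UW SAML algorithm or TVM's quantizer, and the names `qmax_for_bits`, `symmetric_scale`, `quantize`, and `dequantize` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest magnitude of a signed n-bit symmetric code, e.g. 127 for n = 8 and
 * 7 for n = 4. */
static int32_t qmax_for_bits(int bits) {
    assert(bits >= 2 && bits <= 16);
    return (1 << (bits - 1)) - 1;
}

/* Per-tensor scale chosen so that the largest |x| maps exactly to qmax. */
static float symmetric_scale(const float *x, int n, int bits) {
    float max_abs = 0.0f;
    for (int i = 0; i < n; ++i) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > max_abs) max_abs = a;
    }
    /* An all-zero tensor gets scale 1 to avoid dividing by zero later. */
    return max_abs > 0.0f ? max_abs / (float)qmax_for_bits(bits) : 1.0f;
}

/* q = clamp(round(x / scale), -qmax, qmax), rounding half away from zero. */
static int32_t quantize(float x, float scale, int bits) {
    int32_t qmax = qmax_for_bits(bits);
    float r = x / scale;
    if (r > (float)qmax) return qmax;   /* clamp before converting to int */
    if (r < (float)-qmax) return -qmax;
    return (int32_t)(r >= 0.0f ? r + 0.5f : r - 0.5f);
}

/* Map an integer code back to an approximate real value. */
static float dequantize(int32_t q, float scale) {
    return (float)q * scale;
}
```

Clamping to [-qmax, qmax] instead of [-qmax-1, qmax] keeps the range symmetric so negating a code never overflows; this is a common choice, not the only one.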

Please feel free to open new issues to track the work items. @siju-samuel the standalone RPC is tracked by https://github.com/dmlc/tvm/issues/1496

The first post contains an initial list of items based on community feedback. Please also feel free to propose new items, and we will add them to the roadmap.

Will the new graph runtime make it into this release? I'd love to upstream some training code, but it all depends on the semi-kludge FExpandCompute.

@nhynes it belongs to the "high-level IR improvements"

@tqchen OK. Let me know what support I can give for 8-bit quantization. I am interested in contributing here.

I would like to take up the control flow ops. Let me know if someone is working on that.

@PariksheetPinjari909 We will make a major RFC to upgrade the IR system, including control flow ops and the type system, and after the first-phase proposal is done, everyone is welcome to contribute.

Sorry for being late. I'd like to add preliminary support for an HLS scheduler to allow compiling actual neural networks with the AOCL and SDAccel backends.

@tqchen from TVM perspective, any comments on ONNXIFI? I'm thinking about how TVM stack can fit into it.

Re microkernels/tensorization, I've been looking at that stuff the last few months or so. There's some WIP stuff in https://github.com/ajtulloch/tvm/tree/tvm-using-val/tensorize, notably well-tuned assembly versions of:

  • FP32 GEMM kernels (ARMv7, AVX2)
  • Int8 x Int8 -> Int32 GEMM kernels (AVX2, adding ARMv7 shortly)

My hypothesis is that we can get a pretty decent part of the way with just GEMM microkernels for a lot of these dense workloads, but it's to-be-tested currently.

Some examples of using them in GEMM-based convs and for the batch gemm of a minimal F(6x6, 3x3) Winograd (~2-3x faster than current trunk on most configurations for ARMv7) are in that dir as well. For folks interested in the "Micro-asm kernel exploration" and "8-bit network stuff" (esp on CPUs), it'd be good to collaborate :).
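A plain reference loop is handy for validating tuned microkernels such as the Int8 x Int8 -> Int32 GEMM kernels mentioned above. This sketch assumes row-major A (M x K), B (K x N), and C (M x N); the name `gemm_s8s8s32_ref` is hypothetical, and this is not the code in the linked branch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference (untuned) C = A * B with int8 inputs and int32 accumulation.
 * All matrices are row-major. A tuned kernel is expected to match this
 * loop nest bit for bit. */
static void gemm_s8s8s32_ref(int M, int N, int K,
                             const int8_t *A, const int8_t *B, int32_t *C) {
    assert(M > 0 && N > 0 && K > 0);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            int32_t acc = 0;
            for (int k = 0; k < K; ++k) {
                /* Widen to int32 before multiplying and accumulating. */
                acc += (int32_t)A[m * K + k] * (int32_t)B[k * N + n];
            }
            C[m * N + n] = acc;
        }
    }
}
```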

@ajtulloch I am working on an Intel 8-bit conv implementation using Intel Skylake AVX512 instructions (with the long-term goal of using VNNI instructions). I am not using GEMM-based convolution, though. I am starting from the NCHWc-format direct convolution in the current conv2d TOPI implementation. I should have some numbers for the conv operator by next weekend and can share them.
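For context on the NCHWc format, here is a minimal sketch of repacking a float tensor from NCHW to a blocked NCHW[x]c layout, assuming C is divisible by the block size x. The function name `nchw_to_nchwc` is hypothetical, and this is not TVM's implementation of the layout transform.

```c
#include <assert.h>
#include <stddef.h>

/* Repack src (shape [N][C][H][W]) into dst (shape [N][C/x][H][W][x]).
 * The x channels of each block become the innermost, contiguous dimension,
 * which lets a direct convolution load them as one SIMD vector. */
static void nchw_to_nchwc(const float *src, float *dst,
                          int N, int C, int H, int W, int x) {
    assert(x > 0 && C % x == 0);
    int CB = C / x; /* number of channel blocks */
    for (int n = 0; n < N; ++n)
        for (int cb = 0; cb < CB; ++cb)
            for (int h = 0; h < H; ++h)
                for (int w = 0; w < W; ++w)
                    for (int ci = 0; ci < x; ++ci) {
                        int c = cb * x + ci; /* original channel index */
                        size_t s = (((size_t)n * C + c) * H + h) * W + w;
                        size_t d = ((((size_t)n * CB + cb) * H + h) * W + w) * x + ci;
                        dst[d] = src[s];
                    }
}
```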

@ajtulloch It would be great if you could send a tutorial or TOPI recipe.

@anijain2305 you might find https://github.com/ajtulloch/tvm/blob/tvm-using-val/tensorize/gemm__avx2.c#L424-L531 or a similar microkernel for AVX512 useful on Skylake (same as MKL-DNN's vpmaddubsw/vpmaddwd/vpaddd sequence on AVX2/AVX512 pre VNNI).
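As a rough guide to that vpmaddubsw/vpmaddwd/vpaddd sequence, here is a scalar model of what it computes for a single 32-bit lane (four u8 x s8 products). It models the instruction semantics only; it is not the code in gemm__avx2.c, and `sat_s16` and `dot4_u8s8_acc` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate to the int16 range, as vpmaddubsw does for each pairwise sum. */
static int16_t sat_s16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of ONE 32-bit lane of the pre-VNNI u8 x s8 dot-product idiom:
 *   vpmaddubsw: u8 * s8 products, adjacent pairs summed into saturated s16
 *   vpmaddwd  : each s16 times 1, adjacent pairs summed into s32
 *   vpaddd    : s32 result added to the accumulator (wrapping)
 * a holds 4 unsigned bytes, b holds the matching 4 signed bytes. */
static int32_t dot4_u8s8_acc(int32_t acc, const uint8_t a[4], const int8_t b[4]) {
    assert(a != NULL && b != NULL);
    int16_t p0 = sat_s16((int32_t)a[0] * b[0] + (int32_t)a[1] * b[1]);
    int16_t p1 = sat_s16((int32_t)a[2] * b[2] + (int32_t)a[3] * b[3]);
    int32_t s = (int32_t)p0 + (int32_t)p1; /* vpmaddwd against a vector of ones */
    /* Add in unsigned arithmetic to mirror vpaddd's wrap-around without
     * signed-overflow UB; the conversion back is implementation-defined in C
     * but wraps on common compilers. */
    return (int32_t)((uint32_t)acc + (uint32_t)s);
}
```

Note the int16 saturation in the first step. When both operands are near their extremes (e.g. 255 and 127), the pairwise sum exceeds the int16 range, so the result can differ from exact int32 arithmetic.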

@merrymercy what would be useful to have documented/tutorialized or made into a recipe?

I think making a simple runnable conv2d example and showing its speedup will be very useful.

+1 to a runnable conv2d example. Besides ARMv7 / AVX2, I think we should also add SSE. Some embedded platforms use Intel Atom processors, which support at most SSE4.2, not AVX2.

0.5 release note candidate is now up at #2448

v0.5 is now tagged, next cycle roadmap issue is available at https://github.com/dmlc/tvm/issues/2623
