Tvm: TVM 0.4 Release Note

Created on 9 Aug 2018 · 9 Comments · Source: apache/tvm

This release features several major improvements. The high-level graph optimizer is now part of the TVM repo. Highlights include initial support for AutoTVM, which automates optimization, and VTA, a customized accelerator backend. Please also check out tvm.ai for the latest blog posts.

The community welcomes new reviewers @kazum @alex-weaver @masahi @zhreshold @PariksheetPinjari909 @srkreddy1238 @eqy, new code owner @merrymercy, and new committer @yzhliu.

Change List

Tensor Expression and Optimization

Tensor operator primitives
- Introduce an attrs field on operator primitives (e.g. compute) to store additional metadata; the attrs can be used as hints for scheduling
Enable embedding of asm micro-kernels
Hybrid python programming model
- python AST based IR builder interface
- support GPU programs
AutoTVM, Automated tuning, and scheduling
- basic autotvm infra
- GPU IR verifier
- basic autotuning tutorial
- topi integration
ARM support
- winograd support
- initial support of ARM autotuning records
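For readers unfamiliar with the Winograd support listed under ARM above, the underlying idea can be sketched in one dimension. This is the textbook F(2,3) algorithm, which produces two outputs of a 3-tap filter with four multiplications instead of six. It is an illustration only and not TVM's ARM schedule; the function names are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap 1-D correlation over 4 inputs, using 4 multiplies."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # In practice the filter-side terms (g0 + g1 + g2) / 2 and
    # (g0 - g1 + g2) / 2 are precomputed once per filter.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference result: the same two outputs computed with 6 multiplies."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


d = [1, 2, 3, 4]
g = [1, 0, -1]
assert winograd_f23(d, g) == direct_f23(d, g)
```

The 2-D variant used for convolution layers nests the same transform over rows and columns.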
TOPI Vision
- Generic GPU sort support (useful for vision)
- SSD operator support
TOPI numpy consistency
- Rename all binary operators for numpy consistency: broadcast_add -> add, broadcast_sub -> subtract, broadcast_mul -> multiply, broadcast_div -> divide
- New operators: slice, LRN, equal, not_equal, less, greater
- tutorials on topi
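When porting older scripts, the binary operator renames listed above can be captured in a small lookup table. The sketch below uses the numpy spelling `subtract`. The helper `migrate_op_name` is hypothetical and not part of TVM; it only maps names and does not call any TVM API.

```python
# Old-to-new TOPI binary operator names, following the renames in this release.
TOPI_RENAMES = {
    "broadcast_add": "add",
    "broadcast_sub": "subtract",
    "broadcast_mul": "multiply",
    "broadcast_div": "divide",
}


def migrate_op_name(name):
    """Return the numpy-consistent name for an old TOPI operator name.

    Hypothetical helper: names that were not renamed are returned unchanged.
    """
    return TOPI_RENAMES.get(name, name)


print(migrate_op_name("broadcast_mul"))
```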
Initial low-bit operator support
- Optimized popcount generation on ARM
- general bit-serial convolution and GEMM
- optimized low bit kernels
- parallel optimization
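As background for the low-bit items above, the following sketch shows why popcount matters for bit-serial kernels: a dot product of low-bit unsigned vectors can be computed from bit planes with AND, popcount, and shifts. This is a plain-Python illustration of the general technique, not TVM's implementation, and every name in it is hypothetical.

```python
def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def pack_bitplanes(values, bits):
    """Pack unsigned `bits`-bit integers into one integer per bit plane.

    Bit j of plane i holds bit i of values[j].
    """
    planes = []
    for i in range(bits):
        plane = 0
        for j, v in enumerate(values):
            plane |= ((v >> i) & 1) << j
        planes.append(plane)
    return planes


def bitserial_dot(a, b, a_bits, b_bits):
    """Dot product of two unsigned low-bit vectors via AND + popcount."""
    total = 0
    for i, pa in enumerate(pack_bitplanes(a, a_bits)):
        for k, pb in enumerate(pack_bitplanes(b, b_bits)):
            # Plane i of a and plane k of b contribute with weight 2**(i + k).
            total += popcount(pa & pb) << (i + k)
    return total


a = [3, 1, 2, 0]
b = [1, 2, 3, 3]
assert bitserial_dot(a, b, 2, 2) == sum(x * y for x, y in zip(a, b))
```

On hardware, each plane is a machine word or vector register, so one AND plus one popcount handles many elements at once.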
New topi backend optimization for intel graphics
Adapt AVX schedules for SSE target

Backend

VTA: customized accelerator backend
- custom hardware backend example
- tutorials on how to use customized accelerator
Initial experimental support for HLS backend
Bugfix in SPIRV code generator for vulkan
libdevice support, enable NVPTX backend

Runtime

Introduce NDArrayContainer for managed NDArray
RPC and Device API
- Support communication between big- and small-endian machines.
- RPC and device API protocol upgrade to support big/small-endian communication. This is a non-backward-compatible change: the latest version of the TVM runtime must be used with the RPC.
- Graduate rpc from contrib: tvm.contrib.rpc -> tvm.rpc
- Support tracker in Android RPC, add fault tolerance for AutoTVM
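To illustrate the endianness issue behind the protocol upgrade above: two machines with different native byte orders agree on a message only if the wire format fixes the byte order explicitly. The sketch below assumes a little-endian wire order and a made-up message layout (a 32-bit count followed by 64-bit dimensions). It is not TVM's actual RPC wire format, and the function names are hypothetical.

```python
import struct
import sys


def encode_shape(shape):
    """Serialize a tensor shape with an explicit little-endian layout.

    The "<" prefix forces little-endian, standard-size fields regardless
    of the host's native byte order.
    """
    header = struct.pack("<I", len(shape))
    body = struct.pack("<%dq" % len(shape), *shape)
    return header + body


def decode_shape(payload):
    """Inverse of encode_shape; gives the same result on any host."""
    (ndim,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from("<%dq" % ndim, payload, 4))


payload = encode_shape([1, 3, 224, 224])
print("host byte order:", sys.byteorder)
print("decoded shape:", decode_shape(payload))
```

Because both sides have to agree on the layout, a change of this kind is inherently incompatible with older runtimes, which is consistent with the compatibility note above.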
BIG.LITTLE aware threadpool
tvm4j graph runtime that runs end to end workload in java
DLPack support
- Support from_dlpack and to_dlpack
- Enables bridges to pytorch
Enable link of stackvm in runtime

NNVM

Tensorflow graphdef frontend
Keras frontend
- improved to support reused layers, added activations
ONNX
- gather, LRN
CoreML frontend
- Support C-RNN and activation functions
Fix grads for sum and expand_like
Enhanced operator fusion for multiple elemwise branches
Separate nnvm fusion and compilation pass

Misc

Unified build system to cmake, customizable cmake path for vulkan, rocm, cuda

Contributors

See the complete list here. Thanks to everyone who contributed to this release.

Code reviewers

@yzhliu topi, tvm4j, nnvm
@kevinthesun nnvm
@Huyuwei topi operators
@tmoreau89 hardware backends
@comaniac fpga backends
@kazum nnvm, opencl backend, fpga
@nishi-t nnvm, opencl backend
@merrymercy topi, arm
@vinx13 gpu backend
@masahi nnvm, topi
@eqy autotvm
@jroesch runtime
@PariksheetPinjari909 frontends, topi
@srkreddy1238 frontends, topi
@FrozenGene autotvm

Compiler

@alex-weaver vulkan
@were hybrid script mode
@nishi-t CUDA, fp16, int8 support
@ktabata intel FPGA support
@kazum xilinx fpga support
@cowanmeg arm optimized popcount
@tmoreau89 VTA customized accelerator

TOPI, graph optimization

@merrymercy AutoTVM
@yzhliu tvm4j graph runtime, x86
@Laurawly intel graphics
@abergeron conda build fix
@nhynes sgx random
@masahi topi, more robust op fusion
@kevinthesun vision ops
@grwlf argmax/min ops
@cowanmeg bit-serial operator
@ehsanmok topi tutorial
@zhiics refactor fusion and compilation into separate pass
@liangfu binary logical operators

Frontends

@srkreddy1238 tutorials for deployment, tensorflow frontend
@siju-samuel coreml, tf frontend
@PariksheetPinjari909 nnvm, slice
@kazum keras
@nishi-t mxnet, nnvm

Deploy

@eqy rpc, thread runtime
@dayanandasiet android tutorials

roadmap

tqchen

👍13

All 9 comments

Thanks to everyone who has contributed to the last release cycle over the past three months. We would like to propose the release of v0.4 on Aug 13th.

We encourage everyone in the community to weigh in, review, and vote on the release. @dmlc/tvm-team

Please reply to this thread with:

- Things that we missed in the release note
- Bugfixes that need to be included in this release

tqchen on 9 Aug 2018

❤4 👍1

The operator fusion enhancement to nnvm is missing from the release note!

masahi on 9 Aug 2018

👍1

@masahi just added that

tqchen on 9 Aug 2018

👍1

@tqchen fusion is now a separate pass.

zhiics on 9 Aug 2018

@zhiics thanks for pointing this out, just added that to the release note

tqchen on 9 Aug 2018

👍1

GraphRuntime support for tvm4j - E2E inference in Java!

yzhliu on 10 Aug 2018

👍1

Broadcast operators like not_equal, greater_equal and less_equal are now supported in both nnvm and topi.

liangfu on 11 Aug 2018

👍1

v0.4 has been tagged https://github.com/dmlc/tvm/releases/tag/v0.4

tqchen on 13 Aug 2018

v0.5 roadmap is available at https://github.com/dmlc/tvm/issues/1596

tqchen on 13 Aug 2018
