Tvm: [RFC][DEBUG]Support a debug framework for TVM Runtime

Created on 22 Jun 2018  ·  13 Comments  ·  Source: apache/tvm

OBJECTIVE
Provide a debugging tool for TVM's computation graphs that gives access to internal graph structures, ops, and input and output values at TVM runtime.

In TVM's current computation-graph framework, computation after graph construction happens inside a Python function (graphruntime.run). Basic Python debugging tools such as pdb cannot be used to debug graphruntime.run because TVM's graph execution happens in the underlying C++ layer. C++ debugging tools such as gdb are not ideal either, because they cannot recognise and organise stack frames and variables in a way that is relevant to TVM's operations, tensors and other graph constructs.

Runtime debug will fulfil the following objectives.

  • Easy access: debugging is enabled by setting a flag when creating the graph runtime.
  • Inspection of runtime op output values and node connections.

TODOs

  • [ ] Show fused graph summary
  • [ ] Perform debug run and show node details including inputs & outputs tensors
  • [ ] Provide flexibility to run without debug
  • [ ] Call graph run-time n times from UI
  • [ ] Support check for NaN during computation and break
  • [ ] Support check for Inf during computation and break
  • [ ] Support step debugging (step node by node over the graph)
  • [ ] Inject a specific graph node value as a numpy array through the CLI and re-run the dependent nodes explicitly
  • [ ] Inject a graph node value from dump file through CLI
  • [ ] Support dumping of node outputs to a file
  • [ ] Support comparison of node output with a dump output
  • [ ] Support profiler for performance debugging
  • [ ] Test framework for tvmdbg
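
As a rough illustration of the NaN/Inf check items in the list above, the sketch below scans per-node outputs and reports the first node that produced a non-finite value, which is the point where a debug run would break. It is a minimal pure-Python sketch, not TVM code: the function name find_bad_value and the (name, values) layout are assumptions made for illustration, and outputs are flat lists of floats rather than tensors.

```python
import math


def find_bad_value(node_outputs):
    """Return (node_name, kind) for the first node whose output holds NaN or Inf.

    node_outputs is a list of (node_name, flat list of floats) in execution
    order (a hypothetical layout). Returns None if every value is finite.
    """
    for name, values in node_outputs:
        for v in values:
            if math.isnan(v):
                return name, "nan"
            if math.isinf(v):
                return name, "inf"
    return None


outputs = [
    ("conv1", [0.5, 1.25]),
    ("relu1", [0.0, float("inf")]),
    ("fc1", [float("nan")]),
]
first_bad = find_bad_value(outputs)  # the debugger would stop at this node
```

Reporting the first offending node in execution order matters because NaN and Inf values propagate to every dependent node.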

Proposed API Changes
tvm.contrib.graph_runtime.create gains a new Boolean flag, debug, that makes the runtime debuggable. This API is exposed to the user to enable or disable the debug functionality.
In class GraphModule, two members, debug and dbgobj, are added. The debug flag stores whether debugging is enabled for this module, and dbgobj holds the debugruntime object (including the UI framework).

tvm.contrib.graph_runtime.set_input is modified to pass the input data set by the script to the debugruntime if the debug flag is enabled.

tvm.contrib.graph_runtime.run is modified to invoke _debug_cli_run, which brings up the ncurses framework.
The ncurses framework waits for user input for the run operation. Once the user gives the input, it invokes runtime.GraphRuntime.DebugRun() in graph_runtime.cc if the user selects to run with debug. Otherwise the usual runtime.GraphRuntime.Run() in graph_runtime.cc is invoked. DebugRun can execute a specific node only if all of its inputs are ready.
c_runtime_api.h is modified to add a new struct that holds the output information.

/*!
 * \brief Debug output of a node: the output tensor and its timestamp.
 */
typedef struct {
  /*! \brief DL Tensor to collect the output. */
  DLTensor out_tensor;
  /*! \brief The timestamp of each output */
  int64_t time_stamp;
} TVMDbgTensor;

tvm.contrib.graph_runtime.set_debug_buffers is a new API introduced to collect the run output of each node. In GraphRuntime, a new field std::vector<TVMDbgTensor*> debug_buffers_; is introduced to store the pointers to the output buffers.

After each operation finishes executing in runtime.GraphRuntime.DebugRun(), its output is copied to the debug buffer and the outputs are dumped to a temporary directory. The UI framework reads these outputs from the temporary directory and shows them in the display.
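
The dump-and-read-back flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names dump_node_output and load_node_output, the file naming, and the JSON record layout are hypothetical, and values are plain Python lists rather than DLTensor buffers.

```python
import json
import os
import tempfile
import time


def dump_node_output(dump_dir, node_index, node_name, values):
    """Write one node's output plus a timestamp to a file in dump_dir."""
    record = {
        "node": node_name,
        "time_stamp": time.time_ns(),  # mirrors the time_stamp field of TVMDbgTensor
        "values": list(values),
    }
    path = os.path.join(dump_dir, f"{node_index}_{node_name}.json")
    with open(path, "w") as f:
        json.dump(record, f)
    return path


def load_node_output(path):
    """What a UI would do: read a dumped record back from disk."""
    with open(path) as f:
        return json.load(f)


dump_dir = tempfile.mkdtemp(prefix="tvmdbg_")  # temporary dump directory
path = dump_node_output(dump_dir, 0, "conv1", [0.5, 1.25])
loaded = load_node_output(path)
```

Writing through the file system keeps the runtime side and the UI side independent: the UI only needs to know the directory and the record layout.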

tvm.contrib.graph_runtime.inject_value is used to inject a node tensor value during execution.
Stepper functionality is supported to run the graph node by node.
The stepper is invoked with 'invoke_stepper' from 'tvm.tools.debug.wrapper.ui_framework', based on the user's run option.
invoke_stepper in tvm.tools.debug.wrapper.ui_wrapper creates a DebugStepper class (in tvm.tools.debug.ui.ui_stepper) for the stepper UI and handlers.
tvm.tools.debug.runtime.debug_runtime uses tvm.contrib.graph_runtime to create the following stepper interfaces:

  • step: perform step-by-step execution from the current node
  • goto: specify the node to be executed next; step will continue from this node
  • inject_value: inject a node tensor value during execution

A wrapper interface layer will be created in tvm.tools.debug.wrapper.ui_wrapper for the above interfaces.
Based on DebugStepper user events, the stepper runtime interfaces will be called through tvm.tools.debug.wrapper.ui_wrapper.
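
To make the step, goto and inject_value interfaces concrete, here is a minimal sketch of a stepper over a toy graph. It is not the proposed DebugStepper: the class ToyStepper, the (name, fn, input_names) node layout and the scalar values are assumptions for illustration. It also shows the rule that a node can run only when all of its inputs are ready.

```python
class ToyStepper:
    """Steps through nodes given in topological order (hypothetical layout).

    Each node is a tuple (name, fn, input_names).
    """

    def __init__(self, nodes, inputs):
        self.nodes = nodes
        self.values = dict(inputs)  # node name -> computed or injected value
        self.cursor = 0             # index of the node that step() runs next

    def step(self):
        """Run the node at the cursor; return (name, value) or None when done."""
        if self.cursor >= len(self.nodes):
            return None
        name, fn, input_names = self.nodes[self.cursor]
        missing = [n for n in input_names if n not in self.values]
        if missing:
            raise RuntimeError(f"inputs not ready for {name}: {missing}")
        self.values[name] = fn(*[self.values[n] for n in input_names])
        self.cursor += 1
        return name, self.values[name]

    def goto(self, name):
        """Make the named node the next one to be executed by step()."""
        self.cursor = [n for n, _, _ in self.nodes].index(name)

    def inject_value(self, name, value):
        """Overwrite a node's value; dependants see it when they are re-run."""
        self.values[name] = value


nodes = [
    ("add", lambda x, y: x + y, ["x", "y"]),
    ("mul", lambda a: a * 2, ["add"]),
]
stepper = ToyStepper(nodes, {"x": 1, "y": 2})
first = stepper.step()            # runs "add"
stepper.inject_value("add", 10)   # replace the output of "add"
second = stepper.step()           # "mul" now consumes the injected value
finished = stepper.step()         # no nodes left
```

In this sketch an injected value is overwritten if the injected node itself is stepped again, so the caller is expected to goto a dependent node after injecting.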

The TVMDBG profiler can be used to profile a model in terms of its TVM kernels.
The objective is to report the execution time of each graph node and map the node to its source in the TVM kernels.
This can be used to identify time-consuming nodes and analyse their kernel source.
It helps identify the areas that need further analysis and optimisation.
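
A per-node timing report of the kind described above could look like the following sketch. It is an assumption-laden illustration rather than the TVMDBG profiler: profile_nodes and the (name, callable) node layout are hypothetical, and plain Python callables stand in for TVM kernels.

```python
import time


def profile_nodes(nodes, repeat=3):
    """Time each (name, callable) node; return [(name, seconds)] slowest first.

    The best of `repeat` runs is kept to reduce the effect of timer noise.
    """
    timings = []
    for name, fn in nodes:
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        timings.append((name, best))
    return sorted(timings, key=lambda item: item[1], reverse=True)


nodes = [
    ("cheap", lambda: sum(range(10))),
    ("expensive", lambda: sum(range(200_000))),
]
report = profile_nodes(nodes)  # the first entry is the most time-consuming node
```

Sorting slowest first puts the nodes most worth optimising at the top of the report.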

WIP

All 13 comments

cc @dmlc/tvm-team, please discuss this proposal, also cc @yidawang @jroesch . Everyone is welcome to chime in about their experience and expectations on this. I would like to focus on the specific requirements we want and on possible design suggestions. I would like to invite @zihaolucky, who worked on tensorboard lite, to join the discussion.

First of all, it would indeed be helpful to introduce a profiling mode into the runtime. There are three major technical issues that I would like to see addressed.

Zero Overhead

We should design the debug runtime or profiler to be zero cost, which means that we should not have to worry about it when it is switched off. That would likely mean we need a common implementation with two subclasses of the graph runtime, with the debugger only linked in when debug is switched on.

Ideally, the debugger/profiler should not introduce changes to data structures and should keep everything internal.

Clear Log Data Schema and Separation of Logger and UX

It is important to have a clear separation between the UX and the data logging. A clear data schema for the generated log data is therefore extremely important, as we may want to switch the UX and keep it de-coupled from the logger itself.

Choice of UX

When possible, could we reuse an existing UX framework? For example, logging data in tensorboard format and reusing tensorboard's infrastructure seems better than the current approach. Of course, this design choice can be deferred as long as a clear data schema provides the de-coupling.

This sounds great! cc @mnuyens

@siju-samuel please try to address these points by updating the design doc and post them in this issue.

WIP, will update soon

After the NNVM build, a tvm.contrib.GraphModule object is created to interact with runtime.graph.GraphRuntime.
To enable debugging, instead of using from tvm.contrib import graph_runtime, the user needs to use from tvm.tools.debug.runtime import debugruntime.

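import tvm
# Normal (non-debug) runtime:
# from tvm.contrib import graph_runtime
# Debug runtime, imported under the same name so the rest of the script is unchanged:
from tvm.tools.debug.runtime import debugruntime as graph_runtime

# graph, lib, ctx, params, data, dtype and out_shape come from the earlier NNVM build step.
m = graph_runtime.create(graph, lib, ctx)
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)
m.run()
tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).asnumpy()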

This creates a tvm.tools.debug.runtime.DebugGraphModule, which holds a tvm.contrib.GraphModule object to interact with runtime.graph.GraphRuntime.

The DebugGraphModule is responsible for end-to-end data logging and UX interactions.

Data Logging
Currently we choose JSON as the data logging format in order to support different UXs such as curses_ui/tensorboard, since the TVM graph and params can easily be converted to JSON. It can also be changed to protobuf to support tensorboard.
The UX triggers DebugGraphModule to perform a runtime operation (debugging, stepping, profiling) based on user input, and the output is sent back to the UX.
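
The value of a JSON log is that several front ends can consume the same data. The sketch below illustrates this with one producer and two consumers. The schema (version, nodes, index, name, op, time_us) and all function names are hypothetical and not part of this proposal; they only show the kind of decoupling a fixed schema allows.

```python
import json


def make_log(records):
    """Logger side: serialise per-node records (hypothetical schema)."""
    return json.dumps({"version": 1, "nodes": records})


def render_table(log_json):
    """A text front end: one line per node."""
    nodes = json.loads(log_json)["nodes"]
    return [f"{n['index']} {n['name']} {n['op']} {n['time_us']}us" for n in nodes]


def slowest_node(log_json):
    """A second consumer of the same log: the node with the largest time."""
    nodes = json.loads(log_json)["nodes"]
    return max(nodes, key=lambda n: n["time_us"])["name"]


log = make_log([
    {"index": 0, "name": "conv1", "op": "conv2d", "time_us": 120},
    {"index": 1, "name": "relu1", "op": "relu", "time_us": 8},
])
table = render_table(log)
```

Neither consumer imports anything from the logger; both depend only on the record layout.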

Interfaces
1. DebugGraphModule handles the following operations.

  1. DebugGraphModule extends tvm.contrib.GraphModule. The frontend script receives a DebugGraphModule object when debugging is enabled.
  2. DebugGraphModule handles the bifurcation of data dumping for curses_ui/tensorboard.
  3. It invokes the interfaces of graph_runtime.cc to run and to inject values (for the stepper).

2. graph_runtime.cc needs two new interfaces to support debugging, stepping and profiling.

  1. DebugRun runs a single node on behalf of the DebugGraphModule.
  /*!
   * \brief Run a particular operation and return its output.
   * \param index The index of the node to run.
   * \param data_out The output data of the node.
   * \return The time taken to run the current op.
   */
  int DebugRun(int index, DLTensor **data_out);
  2. InjectValue injects a node tensor value during execution; it can be used to change the value of any node.
  /*!
   * \brief Set the value of a tensor during execution.
   * \param eid The eid of the node whose value is to be set.
   * \param data_in The input data to be set.
   */
  void InjectValue(int eid, DLTensor* data_in);

No data structures need to change, and the runtime is not affected when debug is off (zero overhead for normal operation).

This looks like a great direction. Even having a rough dump of time-per-stage (like Halide's) would be super helpful for performance tuning.

@ajtulloch are you asking for something different from this proposal? The current proposal is mainly about per-step timing of an end-to-end graph, rather than per-stage timing.

There are some useful ways to get the per stage timing. One way to do so is to use debug_skip_region pragma to skip certain stages.

@siju-samuel Thanks for the updated discussion. To follow-up on the points:

  • We don't want the user to pay the size of the debug code when the debug is not used(affects the binary size of libtvm_runtime.so), here is a way to do so

    • Move the class definition of graphruntime into graph_runtime.h

    • Add graph_runtime_debug.cc, add class DebugGraphRuntime : public GraphRuntime;

    • Override GetFunction, to dispatch debug functions, by default, dispatch to GraphRuntime::GetFunction

    • Only link this in when USE_GRAPH_RUNTIME_DEBUG = ON

  • Please include a specification of the json schema that the UX is proposed to take
  • Please include a proposal on how we can interface this json schema with tensorboard
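
The subclassing approach in the first bullet above can be sketched as follows. The real classes would be C++ (GraphRuntime and DebugGraphRuntime with GetFunction); this Python version is only an illustration of the dispatch pattern, and the method bodies and the None return for unknown names are assumptions.

```python
class GraphRuntime:
    """Stand-in for the base runtime, which knows nothing about debugging."""

    def run(self):
        return "run all nodes"

    def get_function(self, name):
        # Look up a function by name; None means "not provided here".
        if name == "run":
            return self.run
        return None


class DebugGraphRuntime(GraphRuntime):
    """Adds debug functions and falls back to the base class for the rest."""

    def debug_run(self, index):
        return f"run node {index}"

    def get_function(self, name):
        if name == "debug_run":
            return self.debug_run
        # Default: dispatch to GraphRuntime.get_function.
        return super().get_function(name)


plain = GraphRuntime()
debug = DebugGraphRuntime()
debug_fn = debug.get_function("debug_run")
```

Because the base class contains no debug code at all, a build that leaves out the subclass pays nothing for it, which is the point of linking it only when USE_GRAPH_RUNTIME_DEBUG is ON.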

We don't want the user to pay the size of the debug code when the debug is not used(affects the binary size of libtvm_runtime.so), here is a way to do so

This will be done as you suggested; I have made these modifications. Since the code is guarded by the compile-time macro USE_GRAPH_RUNTIME_DEBUG, the user needs to change the makefile, then rebuild and reinstall TVM to enable debugging, whereas on most other platforms the user can enter the debug environment without much overhead.

Please include a specification of the json schema that the UX is proposed to take

  1. GRAPH. No special format change is needed for the graph. curses_ux can directly use the NNVM graph JSON format to visualise the graph data, so we do not need to convert twice. If a new UX is added, it can add a wrapper to convert to its own format.
  2. TENSOR data. For tensor data, I suggest using the numpy format for dumping from TVM and loading in the UX; this is how it is done currently. Any other thoughts?

Please include a proposal on how we can interface this json schema with tensorboard

To visualise the NNVM graph using tensorboard, similar to MXBOARD, TVM can introduce a new component, TVMBOARD (this can either be part of the tvm repo or live in another repo, like mxboard). TVMBOARD will convert the graph and tensors from TVM format to tensorboard format.
The graphical visualisation, tensor data and profiling data will be transformed by TVMBOARD for loading in tensorboard. How to achieve the debugger functionality via tensorboard needs further analysis.

Proposed Folder Structure
Under tvm/python/tvm/, a new folder tools will be added:
tvm
├── contrib
├── exec
├── ffi
└── tools
    └── tvmdbg
        ├── runtime
        ├── tvmboard
        └── curses_ux

Under tvm/src/runtime/graph add new folder debug

tvmdbg/runtime :- Consists of exposed debug interfaces to frontend; Distributing data to curses_ux/tvmboard.
tvmdbg/tvmboard :- Tvmboard will convert the graph and tensors from tvm format to tensorboard format.
tvmdbg/curses_ux :- An ncurses based ux tool to visualize the graph, tensor data and debugging.
runtime/graph/debug :- A debug wrapper over GraphRuntime class to extend the runtime debug functionalities.

@siju-samuel The general proposal looks good. A few minor comments

  • Let us put debugger at tvm/contrib/debugger, the location can be moved once the
  • The ux can be put under tvm/contrib/debugger/curses

Marking this RFC as WIP, as we have a good step forward. When opening PR, please list detailed actionable items, and include regression test cases for the components

@jroesch has some interesting thoughts on how to use pdb on the TVM runtime. @jroesch, can you comment on that?

The first debugger version has been merged in #1378, with profiling statistics for each layer
