I have begun experimenting with a new library called astgen. Its aims are to replace the large amount of boilerplate the AST requires today and to let us evolve the node system and its APIs more flexibly.
The first version of this tool will take a Python file like this:
```python
import astgen
import tvm

class Expr:
    pass

@astgen.astgen
class Constant(Expr):
    """
    \\brief Constant tensor, backed by an NDArray on the cpu(0) device.
    \\note Scalar constants are represented by rank-0 constant tensors,
    enabling uniform constant folding over scalars and tensors.
    """
    """The data of the tensor."""
    data: tvm.ndarray.NDArray

astgen.generate_all("expr.h", "tvm::relay")
```
and produce this C++ file:
```cpp
namespace tvm {
namespace relay {

/*!
 * \brief Constant tensor, backed by an NDArray on the cpu(0) device.
 * \note Scalar constants are represented by rank-0 constant tensors,
 * enabling uniform constant folding over scalars and tensors.
 *
 */
class Constant;

/*!
 * \brief Constant container.
 *
 */
class ConstantNode : public ExprNode {
 public:
  /*! \brief The data of the tensor. */
  runtime::NDArray data;

  void VisitAttrs(tvm::AttrVisitor* v) final {
    v->Visit("data", &data);
  }

  TVM_DLL static Constant make(runtime::NDArray data);

  static constexpr const char* _type_key = "relay.Constant";
  TVM_DECLARE_NODE_TYPE_INFO(ConstantNode, ExprNode);
};

RELAY_DEFINE_NODE_REF(Constant, ConstantNode, Expr);

} // relay
} // tvm
```
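The schema above has one subtle detail. The field docstring `"""The data of the tensor."""` is a bare string expression, so Python discards it at runtime. It never shows up in `Constant.__annotations__` or `__doc__`. A generator that wants to copy it into the C++ comment would likely have to parse the schema source instead.

The sketch below does this with the standard `ast` module (Python 3.9+ for `ast.unparse`). `read_fields`, its tuple output, and the "string before a field documents that field" convention are hypothetical. They are not the API of the astgen branch.

```python
import ast
import textwrap

# Abridged copy of the schema above; it is only parsed, never executed,
# so `Expr` and `tvm` do not need to be importable.
SCHEMA = textwrap.dedent('''
    class Constant(Expr):
        """Constant tensor, backed by an NDArray on the cpu(0) device."""
        """The data of the tensor."""
        data: tvm.ndarray.NDArray
''')

def _is_string(stmt: ast.stmt) -> bool:
    return (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str))

def read_fields(source: str, class_name: str):
    """Return (name, annotation, doc) for each annotated field of class_name.

    Hypothetical convention: a bare string literal directly before a field
    is that field's documentation.
    """
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            body = node.body
            if body and _is_string(body[0]):
                body = body[1:]  # the class docstring, not a field doc
            fields, pending_doc = [], None
            for stmt in body:
                if _is_string(stmt):
                    pending_doc = stmt.value.value.strip()
                elif isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                    fields.append((stmt.target.id, ast.unparse(stmt.annotation), pending_doc))
                    pending_doc = None
            return fields
    raise KeyError(class_name)

print(read_fields(SCHEMA, "Constant"))
# [('data', 'tvm.ndarray.NDArray', 'The data of the tensor.')]
```

Parsing rather than importing also means the schema can reference types such as `tvm.ndarray.NDArray` without the generator needing a working TVM build.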
This complements Tianqi's recent proposal to evolve the low-level IR (see #3474).
Specifically, by not hand-writing all AST code, we should be able to change the representation flexibly without extensive refactors. It should also make unifying TVM's IRs less effort over time.
A secondary goal of mine is to allow any language with a C-ABI-compatible FFI to construct and manipulate TVM ASTs.
Supporting this would let users build tools in their language of choice without changing how we develop TVM's core.
Furthermore, it would improve Python interop, since we would no longer have to deal with hidden C++ fields, as we do today.
Unfortunately, we rely heavily on C++ objects and C++ data types such as std::string, and replacing these is essential to providing an FFI-friendly AST.
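One FFI-friendly alternative to std::string is a plain pointer-plus-length pair that any C FFI can read. The sketch below models such a struct with Python's `ctypes` so it runs without a compiler. The name `StringView` and its layout are assumptions for illustration, not a settled design.

```python
import ctypes

class StringView(ctypes.Structure):
    """Hypothetical C-ABI string: a byte pointer plus an explicit length."""
    _fields_ = [
        ("data", ctypes.POINTER(ctypes.c_char)),
        ("size", ctypes.c_size_t),
    ]

def make_view(payload: bytes):
    """Return (view, storage); the caller must keep storage alive."""
    storage = ctypes.create_string_buffer(payload)  # owned, NUL-terminated copy
    view = StringView(ctypes.cast(storage, ctypes.POINTER(ctypes.c_char)), len(payload))
    return view, storage

def read_view(view: StringView) -> bytes:
    # Read exactly `size` bytes, so embedded NULs survive (as with std::string).
    return ctypes.string_at(view.data, view.size)

view, storage = make_view(b"relay.Constant")
print(read_view(view))  # b'relay.Constant'
```

The explicit length keeps std::string's ability to hold embedded NUL bytes, which a bare `char*` would lose. Ownership (who frees `storage`) is left to the caller in this sketch.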
I hope the community can help come up with a design for Relay's AST using a code generation based approach.
My goal is to first replace the AST today with little to no changes, and then incrementally evolve it over time.
I will follow up with more details on my proposed solutions over the next few days.
See this branch for more details: https://github.com/jroesch/tvm/tree/astgen.
cc @jermainewang @kazimuth @junrushao1994 @icemelon9 @ajtulloch @yzhliu @merrymercy who might be interested in this. Some initial thoughts:
tvm.schema.expr.py -> include/IR/expr.h

Hey Jared,
Nice proposal!
I am mostly interested in using the node system across the C ABI.
First, I would love to understand:
1) how member methods could be generated, and
2) their usability across C ABIs.
If we wrap the data fields of generated nodes in pure C structs, and if the packed functions' global registry can be used across the C ABI (which is not possible today), we could wrap the methods systematically:
1) For virtual methods, we could reserve a function-pointer field in the pure C struct, as DLManagedTensor does with its deleter, which acts as a virtual destructor.
2) For non-virtual methods, we could register them as packed functions, using a name-mangling scheme of our own design.
3) Each instance with a vtable probably needs a type key indicating its type. We could then implement the methods in a thin C++ wrapper.
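Here is a rough model of points 1) to 3), written with Python's `ctypes` so it runs without a compiler:

- Every node struct starts with a header holding a type key and a function-pointer slot, the analogue of DLManagedTensor's deleter.
- Non-virtual methods are looked up in a registry by mangled name.

The struct names, the `"<type_key>::<method>"` mangling, and the Python dict standing in for a C-accessible packed-function registry are all hypothetical, not existing TVM APIs.

```python
import ctypes

# Slot type for the one "virtual" method, in the spirit of DLManagedTensor's
# deleter field: a C function pointer taking the node's address.
Deleter = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

class NodeHeader(ctypes.Structure):
    """Hypothetical common prefix of every generated pure-C node struct."""
    _fields_ = [
        ("type_key", ctypes.c_char_p),  # e.g. b"relay.Constant"
        ("deleter", Deleter),           # per-type "virtual destructor"
    ]

class ConstantNodeC(ctypes.Structure):
    """Hypothetical C layout of ConstantNode: header first, then data fields."""
    _fields_ = [("header", NodeHeader), ("data", ctypes.c_void_p)]

# Non-virtual methods are looked up by mangled name (stand-in registry).
REGISTRY = {}

def mangle(type_key: str, method: str) -> str:
    return f"{type_key}::{method}"  # hypothetical mangling scheme

def register(type_key: str, method: str):
    def wrap(fn):
        REGISTRY[mangle(type_key, method)] = fn
        return fn
    return wrap

def call_method(addr: int, method: str, *args):
    # Dispatch on the type key stored at the start of every node.
    header = ctypes.cast(addr, ctypes.POINTER(NodeHeader)).contents
    return REGISTRY[mangle(header.type_key.decode(), method)](addr, *args)

def destroy(addr: int) -> None:
    # Call through the function-pointer slot, like a virtual destructor.
    header = ctypes.cast(addr, ctypes.POINTER(NodeHeader)).contents
    header.deleter(addr)

# Usage: a Constant node whose methods are provided from Python.
freed = []

@Deleter
def constant_deleter(addr):
    freed.append(addr)

@register("relay.Constant", "type_name")
def constant_type_name(addr):
    return ctypes.cast(addr, ctypes.POINTER(NodeHeader)).contents.type_key.decode()

node = ConstantNodeC(NodeHeader(b"relay.Constant", constant_deleter), None)
addr = ctypes.addressof(node)
print(call_method(addr, "type_name"))  # relay.Constant
destroy(addr)
```

Putting the header first is what makes the cast from any node address to `NodeHeader*` valid. The same trick would let a thin C++ wrapper recover the type key before dispatching.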
Second, basic data structures such as tvm::Array, tvm::Map, and std::string are still written in C++. This may be an opportunity to rewrite them in C.
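As an illustration of what a C rewrite of tvm::Array might expose at the ABI boundary, the `ctypes` sketch below models a contiguous buffer of opaque node handles plus a length. `HandleArray` and its two-field layout are assumptions. Reference counting, growth, and element types are deliberately left out.

```python
import ctypes

class HandleArray(ctypes.Structure):
    """Hypothetical pure-C stand-in for tvm::Array of node handles."""
    _fields_ = [
        ("data", ctypes.POINTER(ctypes.c_void_p)),  # contiguous node handles
        ("size", ctypes.c_size_t),                   # number of valid entries
    ]

def make_array(handles):
    """Return (array, storage); storage owns the memory and must stay alive."""
    storage = (ctypes.c_void_p * len(handles))(*handles)
    array = HandleArray(ctypes.cast(storage, ctypes.POINTER(ctypes.c_void_p)), len(handles))
    return array, storage

def to_list(array: HandleArray):
    # Index the raw pointer; each element comes back as a Python int.
    return [array.data[i] for i in range(array.size)]

array, storage = make_array([0x1000, 0x2000, 0x3000])
print([hex(h) for h in to_list(array)])  # ['0x1000', '0x2000', '0x3000']
```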
BTW, packed functions can be made to work across the ABI if we need to, simply by adding a few C APIs. We don't have to make std::function itself ABI-stable. Instead, we can register pointers to the functions together with their execution context, as DLPack does with deleter and manager_ctx.
Yeah, I strongly agree with the point that we need to decouple schema reading and the generation.
This is somewhat like LLVM's TableGen, which keeps repetitive, regular code in a centralized description file to minimize the changes needed to add new IR nodes.
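The sketch below shows one way to separate the two stages. It assumes a reader (for example, one walking the Python schema) produces a small language-neutral description, and that each generator consumes only that description.

`NodeDef`, `FieldDef`, the two emitters, and their type tables are all hypothetical. The point is that adding a C backend for the FFI goal would not touch the reader or the C++ backend.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FieldDef:
    name: str
    schema_type: str  # neutral type name; each backend maps it itself

@dataclass
class NodeDef:
    """Language-neutral node description produced by a schema reader."""
    name: str
    base: str
    type_key: str
    fields: List[FieldDef] = field(default_factory=list)

CPP_TYPES = {"NDArray": "runtime::NDArray"}
C_TYPES = {"NDArray": "void*"}  # hypothetical opaque handle in the C view

def emit_cpp(node: NodeDef) -> str:
    """Backend 1: an abridged C++ node class."""
    lines = [f"class {node.name}Node : public {node.base}Node {{", " public:"]
    lines += [f"  {CPP_TYPES[f.schema_type]} {f.name};" for f in node.fields]
    lines.append("  void VisitAttrs(tvm::AttrVisitor* v) final {")
    lines += [f'    v->Visit("{f.name}", &{f.name});' for f in node.fields]
    lines.append("  }")
    lines.append(f'  static constexpr const char* _type_key = "{node.type_key}";')
    lines.append(f"  TVM_DECLARE_NODE_TYPE_INFO({node.name}Node, {node.base}Node);")
    lines.append("};")
    return "\n".join(lines) + "\n"

def emit_c_struct(node: NodeDef) -> str:
    """Backend 2: a pure-C struct view of the same node."""
    members = "".join(f"  {C_TYPES[f.schema_type]} {f.name};\n" for f in node.fields)
    return f"typedef struct {{\n{members}}} {node.name}NodeC;\n"

constant = NodeDef("Constant", "Expr", "relay.Constant", [FieldDef("data", "NDArray")])
print(emit_cpp(constant))
print(emit_c_struct(constant))
```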
Closing for now due to inactivity. @yzhliu will follow up once he has a more specific proposal.