Concisely describe the proposed feature
A few users want to ship compiled Taichi kernels so that they can run them without Python.
Describe the solution you'd like
We can add a method like ti.export_all(lang='C'/'C++'), to dump
Then users can basically do something like:
#include "mpm99_exported.h"
int main() {
    initialize_taichi();
    mpm99_substep();
    finalize_taichi();
}
... or use a more object-oriented C++ version.
Additional comments
If you also need this or have any suggestions, please feel free to comment! :-)
I would really like the ability to do this, especially with Rust support.
Yeah, I think we can start with a standard C interface, and most other languages like C++/Rust/Go/Ruby can make use of it as well.
Makes sense. I'd be happy to help out with this but I'm personally not sure where to start.
For CPU code we can just dump LLVM IR and use llc to compile it into a .obj. Not sure what else we should do to make it a loadable shared object. The place where the optimized LLVM IR is emitted: https://github.com/taichi-dev/taichi/blob/abbf5b13537ab5e0f1d951fb3502b65a4f509bb1/taichi/backends/codegen_llvm_x86.cpp#L104
We should explore this direction, and a good starting point is to compile a simple Taichi kernel, such as:
for i in range(n):
    a[i] += 1
Some related discussions on SO: https://stackoverflow.com/questions/22956761/generate-binary-code-shared-library-from-embedded-llvm-in-c
For GPU code we can simply dump the compiled PTX and invoke the CUDA runtime to load and run the PTX code.
Possible solution:
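First, dump the LLVM IR (currently printed to llvm::errs()) into /tmp/a.ll.
Second, call llc -filetype=obj -relocation-model=pic /tmp/a.ll -o /tmp/a.o (llc emits assembly by default, and a shared object needs position-independent code).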
Third, call gcc -fPIC -shared -o /tmp/a.so /tmp/a.o.
How can the output written to llvm::errs() be captured in a std::string?
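The three steps above could be scripted roughly as follows. This is a minimal sketch, not part of Taichi: `build_commands` and `build_shared_object` are hypothetical helper names, and it assumes that `llc` and `gcc` are on `PATH` and that the IR has already been written to a `.ll` file. Since `llc` emits assembly by default, the sketch passes `-filetype=obj`, and it passes `-relocation-model=pic` because the object ends up in a shared library.

```python
import os
import shutil
import subprocess


def build_commands(ll_path, cc="gcc"):
    """Return the llc and linker command lines for turning LLVM IR into a .so.

    Hypothetical helper: the output names are derived from the input path,
    e.g. /tmp/a.ll -> /tmp/a.o -> /tmp/a.so.
    """
    base, _ = os.path.splitext(ll_path)
    obj_path = base + ".o"
    so_path = base + ".so"
    return [
        # Compile the IR to a position-independent object file.
        ["llc", "-filetype=obj", "-relocation-model=pic", ll_path, "-o", obj_path],
        # Link the object file into a loadable shared object.
        [cc, "-fPIC", "-shared", "-o", so_path, obj_path],
    ]


def build_shared_object(ll_path, cc="gcc"):
    """Run both commands in order and return the path of the shared object."""
    for cmd in build_commands(ll_path, cc):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        # check=True raises CalledProcessError if the tool fails.
        subprocess.run(cmd, check=True)
    return os.path.splitext(ll_path)[0] + ".so"
```

Calling `build_shared_object("/tmp/a.ll")` would then produce `/tmp/a.so`, provided both tools are installed. As for capturing the IR as a string: LLVM provides `llvm::raw_string_ostream`, which wraps a `std::string`, so the module can be printed into that stream rather than to `llvm::errs()`.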
Warning: The issue has been out-of-update for 50 days, marking stale.
Note: The experimental C backend is already capable of creating a .so file, which can later be linked into user programs.