Currently the wasm32-wasi target is tied to the libc implementation for compatibility with C/C++ code. But for pure-Rust applications the C environment is unnecessary and bloats the resulting binaries.
For example, very simple code like this:
fn main() {
    let mut buf = [0u8; 4];
    unsafe {
        let _ = wasi::random_get(buf.as_mut_ptr(), buf.len());
        let c = wasi::Ciovec { buf: buf.as_mut_ptr(), buf_len: buf.len() };
        let _ = wasi::fd_write(1, &[c]);
    }
}
This compiles to a 64 KB binary (after strip); I guess a significant part of that comes from correctly setting up and tearing down the C environment.
Thus it could be beneficial to add a target like wasm32-wasi-rust intended for pure-Rust WASI applications. Most of the preliminary work has already been done, and the dependence of wasm32-wasi on libc is minimal (allocator and environment functions, abort, exit, __wasilibc_find_relpath), so IIUC adding such a target should be relatively simple. Depending on how fast WASI evolves, the addition could be postponed until WASI reaches a certain level of stabilization, to reduce the maintenance burden.
Previously it was proposed in #63676.
cc @alexcrichton @sunfishcode
I would personally see the addition of a duplicate target as a very heavyweight operation that needs to be very well motivated. I don't really know what benefit we stand to gain other than shaving off some binary size, which could also arguably just be construed as a bug to fix in upstream wasi-libc. Are there other reasons than binary size to introduce a new target?
I am not sure that setting up the C environment can be called a libc bug.
I guess in addition to smaller sizes, having a pure-Rust target could:
Also note that there is a general interest in having libc-independent targets if platform allows it (let's call it "ideological reasons"?): rust-lang/rfcs#2610
Also there is this issue: WebAssembly/WASI#24
I guess it could be solved by tweaking libc, but I think the proposed target would probably be a more natural solution.
wasi-libc actually already has a change to help with https://github.com/WebAssembly/WASI/issues/24 (see https://github.com/CraneStation/wasi-libc/pull/74), so at this point I think it's just a matter of Rust taking advantage of it. I managed to figure out how to hack the Rust compiler to get it to work: it involved setting wasi-root in Rust's config.toml (to point to a recent build of wasi-libc) and changing the crt object linked for the wasi target in librustc_target from crt1.o to crt1-reactor.o. Then I was able to produce a wasm file using a binary target with #![no_main] and rustflags = ["-C","link-args=--no-entry"] (creating a cdylib didn't work). The resulting wasm file has an _initialize export which does the libpreopen setup (this is provided by crt1-reactor.o). I was able to run the wasm file using wasmtime and call other functions that did file I/O successfully. I got it working in wasmer as well, though wasmer doesn't call _initialize for you like wasmtime does right now, so you have to add that call yourself.
The first step here is to optimize Rust to avoid pulling in as much code. Right now with a default cargo init --bin project compiled to wasm32-wasi, Rust is still pulling in the environment-variable code and preopen code, even though neither should be needed just to print "hello world". Fixing these will be a code size win, whether we link with libc or a Rust library providing similar functionality, and whether one uses the new reactor ABI or not.
@sunfishcode
Maybe I am missing something, but shouldn't a pure-Rust target solve it automatically? IIUC right now a variable containing the environment data gets populated unconditionally on startup as part of setting up the C environment, while for a pure-Rust target it would be done as part of the std::env functions (we could even remove the caching altogether). Thus, if a program does not use environment data, the compiler will be able to trivially remove the associated code in its dead-code elimination pass.
WASI libc is structured such that it avoids pulling in the environment variable code if getenv, environ, etc. are not referenced, so it should work automatically for Rust too.
I've now debugged this a little more: the env is called from within std::panicking::default_hook. I briefly tried panic="abort" and using a completely empty fn main() {}, and Rust still linked in its own Rust panic support code, env code, UTF-8 decoding, backtracing, thread management, some fmt things, and other stuff.
The source of bloat there is panicking. The runtime thinks it can panic in a few places during fn main(){} (and generally in the internals of println! there's at least one path that LLVM can't prove won't panic). Panicking, as discovered, brings in the fmt machinery. Switching to a pure-Rust target would not solve this (and might make it worse if the Rust code is bigger than the equivalent C!)
I don't think the preopen bits are needed by fn main() {}, though, so maybe it's a bug that they're pulled in?
Some preopen bits are being pulled in because:
The linker sees a fn main() {} program which calls various functions in libstd.
Some function in libstd that isn't known to be dead at link time references __wasilibc_find_relpath.
__wasilibc_find_relpath then causes the linker to read in the .o file from libc.a which defines it.
The .o file that defines __wasilibc_find_relpath also contains the constructor __wasilibc_populate_libpreopen.
The linker then determines that nothing actually needs __wasilibc_find_relpath and removes that function.
But it keeps __wasilibc_populate_libpreopen because it doesn't ever DCE constructor functions, once they're pulled in.
Ah indeed that makes sense.
That seems unlikely to get fixed (it's just the behavior of the linker), but @sunfishcode perhaps we could refactor the APIs in wasi-libc? Could the population of libpreopen have an explicit entry point which we call from Rust's std::fs functions (the one that calls __wasilibc_find_relpath), and that way we'd avoid pulling in a constructor function?
Basically Rust really has no need for constructor functions, and we'd rather lazily initialize them ourselves, and that way the linker GC should prevent everything from getting pulled in.
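To make the "explicit entry point instead of a constructor" idea concrete, here is a minimal sketch in plain Rust. Everything in it is an assumption for illustration: PreopenTable, build_table, preopens and find_relpath are hypothetical names, the table contents are made up, and the prefix matching is deliberately naive. It is not the wasi-libc or libstd implementation. It only shows the table being built on first use via std::sync::OnceLock rather than at startup.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the table is built (demonstration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the preopen table: (fd, directory prefix).
struct PreopenTable {
    entries: Vec<(u32, &'static str)>,
}

/// Builds the table. A real implementation would query the host;
/// the entries here are invented for the example.
fn build_table() -> PreopenTable {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    PreopenTable {
        entries: vec![(3, "/"), (4, "/tmp")],
    }
}

/// Explicit entry point: the table is built on the first call only.
/// Nothing runs at startup, so there is no constructor to keep alive.
fn preopens() -> &'static PreopenTable {
    static TABLE: OnceLock<PreopenTable> = OnceLock::new();
    TABLE.get_or_init(build_table)
}

/// Hypothetical path lookup: picks the longest matching prefix and
/// returns the descriptor plus the path relative to it (naive matching).
fn find_relpath(path: &str) -> Option<(u32, &str)> {
    preopens()
        .entries
        .iter()
        .filter(|(_, prefix)| path.starts_with(*prefix))
        .max_by_key(|(_, prefix)| prefix.len())
        .map(|(fd, prefix)| (*fd, path[prefix.len()..].trim_start_matches('/')))
}
```

With this shape, find_relpath("/tmp/a.txt") triggers the one-time build and later calls reuse the table. A program that never calls find_relpath has no reference to preopens or build_table, which is what would let ordinary dead-code elimination drop them.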
Refactoring libc to do the libpreopen initialization lazily is tricky because once user code is running, it may have renumbered the file descriptors, which would break the way libpreopen currently discovers them.
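The renumbering concern can be illustrated with a toy model. This sketch assumes that discovery scans descriptor numbers upward from 3 and stops at the first number that is not a preopened directory. That scanning rule, the FdTable type, and the discover_preopens and renumber helpers are all assumptions made for the example, not the actual libpreopen code.

```rust
use std::collections::BTreeMap;

/// Toy descriptor table: fd number -> preopened directory path.
/// (Hypothetical model, not a wasi-libc data structure.)
type FdTable = BTreeMap<u32, &'static str>;

/// Assumed discovery rule: scan upward from fd 3 and stop at the
/// first number that is not a preopened directory.
fn discover_preopens(table: &FdTable) -> Vec<(u32, &'static str)> {
    (3u32..)
        .map_while(|fd| table.get(&fd).map(|dir| (fd, *dir)))
        .collect()
}

/// Models user code moving a descriptor to a different number.
fn renumber(table: &mut FdTable, from: u32, to: u32) {
    if let Some(dir) = table.remove(&from) {
        table.insert(to, dir);
    }
}
```

In this model, a scan at startup over a table holding fds 3 and 4 finds both directories. If user code first moves fd 3 to fd 10, a later scan stops immediately at the gap and finds nothing, even though both directories are still open.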
I've now submitted https://reviews.llvm.org/D85062 which is a patch to wasm-ld which allows it to DCE the __wasilibc_populate_libpreopen constructor when it isn't needed. [edit: link to the right patch]
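For readers following along, the constructor-retention behavior discussed in this thread can be modeled with a small toy linker. This is a simplified sketch, not wasm-ld: the Object type, the link function, and the std_fs_open symbol are hypothetical. With ctors_are_roots set to false, the model uses one possible alternative rule (keep a constructor only if some ordinary function from its object file is live), which is an assumption for illustration and not a description of what the linked patch implements.

```rust
use std::collections::{BTreeSet, HashMap};

/// A function: its name and the symbols it references.
type Func = (&'static str, Vec<&'static str>);

/// Toy object file: ordinary functions plus constructor functions.
struct Object {
    funcs: Vec<Func>,
    ctors: Vec<Func>,
}

impl Object {
    /// Every function in this object, ordinary or constructor.
    fn symbols(&self) -> impl Iterator<Item = &Func> {
        self.funcs.iter().chain(self.ctors.iter())
    }
}

/// Returns the names of the functions that survive linking.
fn link(
    program: &Object,
    archive: &[Object],
    entry: &'static str,
    ctors_are_roots: bool,
) -> BTreeSet<&'static str> {
    // Step 1: load archive members that define any referenced symbol,
    // whether or not the referencing function turns out to be live.
    let mut loaded: Vec<&Object> = vec![program];
    let mut pending: Vec<&Object> = archive.iter().collect();
    loop {
        let referenced: BTreeSet<&'static str> = loaded
            .iter()
            .flat_map(|o| o.symbols())
            .flat_map(|(_, refs)| refs.iter().copied())
            .collect();
        let (hit, rest): (Vec<&Object>, Vec<&Object>) = pending
            .into_iter()
            .partition(|o| o.symbols().any(|(name, _)| referenced.contains(name)));
        pending = rest;
        if hit.is_empty() {
            break;
        }
        loaded.extend(hit);
    }

    // Step 2: garbage-collect by marking from the roots.
    let defs: HashMap<&'static str, &Vec<&'static str>> = loaded
        .iter()
        .flat_map(|o| o.symbols())
        .map(|(name, refs)| (*name, refs))
        .collect();
    let mut live: BTreeSet<&'static str> = BTreeSet::new();
    let mut work = vec![entry];
    loop {
        while let Some(sym) = work.pop() {
            if live.insert(sym) {
                if let Some(refs) = defs.get(sym) {
                    work.extend(refs.iter().copied());
                }
            }
        }
        for obj in &loaded {
            // Thread behavior: every loaded constructor is a root.
            // Hypothetical alternative: only if its object is otherwise live.
            let keep = ctors_are_roots || obj.funcs.iter().any(|(n, _)| live.contains(n));
            if keep {
                work.extend(obj.ctors.iter().map(|(n, _)| *n).filter(|n| !live.contains(n)));
            }
        }
        if work.is_empty() {
            break;
        }
    }
    live.retain(|s| defs.contains_key(s));
    live
}

/// The fn main() {} scenario, with a made-up libstd function name.
fn example(ctors_are_roots: bool) -> BTreeSet<&'static str> {
    let program = Object {
        funcs: vec![
            ("main", vec![]),
            // Not reachable from main, but still present at load time.
            ("std_fs_open", vec!["__wasilibc_find_relpath"]),
        ],
        ctors: vec![],
    };
    let archive = vec![Object {
        funcs: vec![("__wasilibc_find_relpath", vec![])],
        ctors: vec![("__wasilibc_populate_libpreopen", vec![])],
    }];
    link(&program, &archive, "main", ctors_are_roots)
}
```

In this model, example(true) keeps main and __wasilibc_populate_libpreopen while dropping __wasilibc_find_relpath, matching the sequence described earlier in the thread. example(false) keeps only main.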