Hi, I wrote a simple C application and compiled it to a wasm file with the latest version of emcc, using the SIDE_MODULE=1 option:
emcc -O3 -s SIDE_MODULE=1 -s TOTAL_MEMORY=65536 -s TOTAL_STACK=4096 -s "EXPORTED_FUNCTIONS=['_main']" -o test.wasm main.c
Strangely, the imported memory size is 0 in the generated wasm file, no matter what memory size and stack size are set with "-s TOTAL_MEMORY=xxx" and "-s TOTAL_STACK=xxx":
(import "env" "__memory_base" (global $6 i32))
(import "env" "memory" (memory $5 0))
But if I use a previous version of emcc, e.g. 1.38.42, the imported memory is set correctly:
(import "env" "__memory_base" (global $5 i32))
(import "env" "memory" (memory $4 1))
Is this a bug in the latest emcc version, or is there something I am missing?
When I use the latest emscripten to run the above command I get "1" in the imported memory.
Note that in emscripten the wasm memory object is created by the JS code and then imported into the WebAssembly module. The 0 or 1 you see there in the import is not the size of the memory but rather the minimum size that the module will accept. Still, in the above case I would expect a 1 there. Which version of emscripten exactly are you running? Are you using fastcomp or the llvm backend?
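To make the "minimum, not size" point concrete, here is a minimal Python sketch that reads the limits of the first imported memory straight from a wasm binary. It assumes the standard import section layout (section id 2) and ignores the threads/memory64 limit flags. The helper names and the hand-built sample module are made up for illustration; none of this is part of emscripten.

```python
WASM_HEADER = b"\0asm\x01\0\0\0"  # magic number + binary format version 1


def read_uleb(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_name(buf, pos):
    """Read a length-prefixed UTF-8 name; return (name, next_pos)."""
    length, pos = read_uleb(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def read_limits(buf, pos):
    """Read a limits record; return ((minimum, maximum_or_None), next_pos)."""
    flags = buf[pos]
    minimum, pos = read_uleb(buf, pos + 1)
    maximum = None
    if flags & 0x01:  # bit 0 set: a maximum follows
        maximum, pos = read_uleb(buf, pos)
    return (minimum, maximum), pos


def imported_memory_limits(wasm):
    """Return (module, field, (min_pages, max_pages)) for the first memory import."""
    if wasm[:8] != WASM_HEADER:
        raise ValueError("not a wasm binary")
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = read_uleb(wasm, pos + 1)
        end = pos + size
        if section_id == 2:  # import section
            count, p = read_uleb(wasm, pos)
            for _ in range(count):
                module, p = read_name(wasm, p)
                field, p = read_name(wasm, p)
                kind = wasm[p]
                p += 1
                if kind == 0:    # function import: type index
                    _, p = read_uleb(wasm, p)
                elif kind == 1:  # table import: element type + limits
                    _, p = read_limits(wasm, p + 1)
                elif kind == 2:  # memory import: limits
                    limits, _ = read_limits(wasm, p)
                    return module, field, limits
                elif kind == 3:  # global import: value type + mutability
                    p += 2
                else:
                    raise ValueError("unknown import kind")
        pos = end
    return None


def satisfies_minimum(host_pages, limits):
    """True if a host memory of host_pages pages meets the import's minimum."""
    return host_pages >= limits[0]


def name(text):
    """Encode a short name as a length-prefixed byte string (sample data only)."""
    raw = text.encode("utf-8")
    return bytes([len(raw)]) + raw


# Hand-built sample mirroring the two imports quoted above:
#   (import "env" "__memory_base" (global i32))
#   (import "env" "memory" (memory 0))
imports = (bytes([2])
           + name("env") + name("__memory_base") + bytes([3, 0x7F, 0])
           + name("env") + name("memory") + bytes([2, 0, 0]))
sample = WASM_HEADER + bytes([2, len(imports)]) + imports

found = imported_memory_limits(sample)
print(found)
print(satisfies_minimum(8, found[2]))
```

With a minimum of 0, any memory the host supplies passes this check, which is why the 0 in the import says nothing about how large the memory is at runtime.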
There appears to be a difference with the llvm backend. It looks like the llvm backend ignores the memory size when building a side module. This makes some sense, since it is the main module and the runtime linker that loads the side module that determine the size of the memory (the memory that is shared between the main module and side module).
Thanks a lot, @sbc100. How can I tell which backend emcc is using, and how do I switch backends? Also, can we dump all the arguments that emcc passes to the llvm backend?
If you use emsdk then simply installing latest will give you the llvm backend. If you installed latest-fastcomp you will get the old fastcomp compiler.
I suggest you switch from SIDE_MODULE=1 to STANDALONE_WASM=1, if that fits with what you are trying to do.
To see the arguments that emcc passes, the easiest way is probably to run with -v. For even more information (probably way more than you want) you can set EMCC_DEBUG=1.
Hi, using STANDALONE_WASM=1 creates a wasm binary which imports wasi APIs, and most of the libc source code is compiled into wasm bytecode, which greatly increases the binary size. We are currently developing a wasm VM for embedded systems, where footprint is very important, so it is better for us to use the SIDE_MODULE mode.
I tested the latest-fastcomp version, and it did generate the wasm binary with a memory page count of 1, as expected. The backend is asm.js: "check tells us to use asm.js backend".
For the latest version (or upstream version), I set EMCC_DEBUG=1 and dumped the log. The backend is wasm: "check tells us to use wasm backend". I found that when it uses wasm-ld to generate the wasm binary from the .o file (see line 25 in the uploaded log file), it only passes the "--import-memory" argument to wasm-ld, but does not pass "--initial-memory=
I am confused about why emcc doesn't pass the "--initial-memory" and "--max-memory" arguments to wasm-ld. Who should determine the memory size and create the memory? If it is the main module, then the main module might first initialize the memory with its own memory init data. When it then loads the side module, should it write the side module's memory init data to the memory as well? If not, the side module cannot work correctly; if yes, the memory data may be overwritten, so can the main module still work correctly? And why does the behavior differ between the latest version and the latest-fastcomp version?
When importing memory, the "initial memory size" acts more like a "minimum memory size". As you can see, emscripten will normally pass --import-memory. In this case the "initial"/"minimum" memory is not that important, since JS creates the memory and passes it to the wasm. In the case of the side module, the side module data is loaded at some address that is not known until runtime, so the side module has no idea how big the memory should be, and it doesn't make much sense to specify a size. For example, the side module might have 1k of static data, but then get loaded at address 500k in the actual memory at runtime.
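The 1k-at-500k example can be worked through with a toy model. The Python sketch below is not emscripten's loader; it treats a `bytearray` as the shared linear memory and uses hypothetical helper names. It shows that the number of pages needed depends on the load address chosen at runtime, and that data placed at the side module's base does not touch the main module's data at lower addresses.

```python
PAGE_SIZE = 65536  # one WebAssembly page is 64 KiB


def pages_needed(memory_base, static_size):
    """Pages required so that [memory_base, memory_base + static_size) fits."""
    end = memory_base + static_size
    return (end + PAGE_SIZE - 1) // PAGE_SIZE  # round up to whole pages


def place_side_module(memory, memory_base, side_data):
    """Copy the side module's static data to its runtime base address."""
    end = memory_base + len(side_data)
    if end > len(memory):
        raise MemoryError("shared memory too small for this load address")
    memory[memory_base:end] = side_data
    return end


# The host (main module plus runtime linker) owns the memory and picks its size.
memory = bytearray(8 * PAGE_SIZE)

# The main module's static data sits at the start of memory in this toy model.
main_data = b"main module data"
memory[0:len(main_data)] = main_data

# The side module has 1k of static data and happens to be loaded at 500k.
side_data = b"\x01" * 1024
memory_base = 500 * 1024
place_side_module(memory, memory_base, side_data)

print(pages_needed(0, len(side_data)))
print(pages_needed(memory_base, len(side_data)))
```

The same 1k of data needs one page if loaded at address 0 but eight pages if loaded at 500k, so a size baked into the side module at link time would not be meaningful.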
Is there some reason why having the initial size (of the imported memory) set to zero is a problem for your use case? It's not a problem in general in emscripten. Perhaps you are trying to find out how much static memory the SIDE_MODULE contains? If that is the case, you can look at the initial 'dylink' section, which tells you this (in bytes, which is much more accurate than the memory page granularity of 64k).
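As a rough sketch of reading that value, the Python below looks for a custom section named "dylink" and decodes its first four fields. The field order (memory size, memory alignment, table size, table alignment, each an unsigned LEB128) is an assumption based on the legacy dylink layout; newer toolchains may emit a differently structured section, so check it against your emscripten version. The sample bytes are hand-built, not emcc output.

```python
WASM_HEADER = b"\0asm\x01\0\0\0"  # magic number + binary format version 1


def read_uleb(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_uleb(value):
    """Encode a non-negative integer as unsigned LEB128 (sample data only)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def find_custom_section(wasm, wanted):
    """Return the payload of the first custom section named `wanted`, or None."""
    if wasm[:8] != WASM_HEADER:
        raise ValueError("not a wasm binary")
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = read_uleb(wasm, pos + 1)
        end = pos + size
        if section_id == 0:  # custom section: name, then payload
            name_len, p = read_uleb(wasm, pos)
            if wasm[p:p + name_len] == wanted.encode("utf-8"):
                return wasm[p + name_len:end]
        pos = end
    return None


def read_dylink(wasm):
    """Decode the leading fields of an (assumed legacy-layout) dylink section."""
    payload = find_custom_section(wasm, "dylink")
    if payload is None:
        return None
    fields, pos = {}, 0
    # Assumed field order; the alignments are stored as powers of two.
    for key in ("memory_size", "memory_align_log2",
                "table_size", "table_align_log2"):
        fields[key], pos = read_uleb(payload, pos)
    return fields


# Hand-built sample: 1024 bytes of static data, 16-byte alignment, no table,
# followed by an empty list of needed libraries.
payload = (encode_uleb(1024) + encode_uleb(4)
           + encode_uleb(0) + encode_uleb(0) + encode_uleb(0))
body = encode_uleb(len(b"dylink")) + b"dylink" + payload
sample = WASM_HEADER + bytes([0]) + encode_uleb(len(body)) + body

print(read_dylink(sample))
```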
I do think that ultimately what you want is STANDALONE_WASM. The parts of libc that get linked in should be exactly those that are needed and no more. Are you seeing parts of libc being linked in that you are not using? If you use SIDE_MODULE then the libc functions you use will end up as imports, and you will have to somehow implement them on the embedder side. Is that what you hope to do?
@wenyongh
Hi, using STANDALONE_WASM=1 creates a wasm binary which imports wasi APIs, and most of the libc source code is compiled into wasm bytecode, which greatly increases the binary size. We are currently developing a wasm VM for embedded systems, where footprint is very important, so it is better for us to use the SIDE_MODULE mode.
Possibly we should add some kind of -nostdlib support? But you can do something very similar right now, with EMCC_ONLY_FORCED_STDLIBS=1 in the environment. That will only include the system libs that you mention, and if you mention none, it won't add any. But if you use any libc functions then you'll get undefined symbol errors. You can ignore those with -s ERROR_ON_UNDEFINED_SYMBOLS=0, and then they end up as just imports in the wasm.
Doing EMCC_ONLY_FORCED_STDLIBS=1 ./emcc tests/hello_world.c -o a.wasm -s ERROR_ON_UNDEFINED_SYMBOLS=0 -O3, for example, I get a 359 byte wasm file, with an import for puts from libc.
Thanks a lot, @sbc100, @kripken. We do need something like clang's "-nostdlib" feature to reduce the binary size and let the wasm binary call native libc wrapper APIs, which do some address conversion and then call the actual libc API. The STANDALONE_WASM option needs wasi support; it is another option, but not one for embedded systems. We are also trying to enable it.
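To illustrate the "wrapper with address conversion" idea for an imported libc function such as `puts`, here is a minimal host-side sketch in Python. It assumes the embedder exposes the module's linear memory as a `bytearray` and that the wasm side passes a 32-bit offset into it; `read_c_string` and `make_puts` are hypothetical names, and a real VM would do the same thing in its native language.

```python
PAGE_SIZE = 65536  # one WebAssembly page is 64 KiB


def read_c_string(memory, offset):
    """Convert a wasm offset into a host byte string, with bounds checks."""
    if not 0 <= offset < len(memory):
        raise IndexError("string pointer is outside linear memory")
    end = memory.find(b"\0", offset)  # locate the NUL terminator
    if end == -1:
        raise ValueError("unterminated string in linear memory")
    return bytes(memory[offset:end])


def make_puts(memory, out):
    """Build a host function suitable for satisfying an import of puts."""
    def puts(offset):
        text = read_c_string(memory, offset)
        out.append(text.decode("utf-8", errors="replace"))
        return 0  # C's puts returns a non-negative value on success
    return puts


# Toy setup: one page of linear memory with a string placed at offset 16.
memory = bytearray(PAGE_SIZE)
message = b"hello, world\0"
memory[16:16 + len(message)] = message

lines = []
puts = make_puts(memory, lines)
puts(16)
print(lines)
```

The bounds checks matter here because the offset comes from untrusted wasm code; the wrapper must never read past the end of the linear memory.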
The EMCC_ONLY_FORCED_STDLIBS=1 option works: after running "emsdk activate latest-fastcomp" and then emcc, the wasm binary only imports some libc APIs, and the imported memory size is not 0. But with the latest version (emsdk activate latest), the wasm binary imports some wasi functions, including "args_get", "args_sizes_get" and "proc_exit", and also some other functions like "iprintf", even when I set "-s STANDALONE_WASM=0". Is there any option to disable wasi for the "latest" version?
STANDALONE_WASM implies the use of wasi for low level syscalls. That's kind of the point of it. It makes the wasm module independent of emscripten if possible. This is designed for embedded environments that don't want to run the emscripten JS code. If we didn't use wasi syscalls we would need to use custom ones, which seems strictly worse.
We may be able to eliminate the argv syscalls when the application doesn't need argv at all. wasi-libc was able to do this. But honestly, implementing those syscalls seems like part and parcel of being a wasm embedder at this point. If you don't want wasi syscalls or emscripten syscalls then you can design your own libc and just use clang + lld directly.
OK, thanks a lot. We will also try using clang + lld, and test the STANDALONE_WASM option as part of our wasi feature development.
I opened #9812 now to optimize out the argv/argc stuff, if a program doesn't need it.
Seems like we can close this now.