When I flipped the switch from fastcomp to upstream a couple months ago on the Binaryen.js buildbot, I noticed that builds produced by upstream on any CI setup (Travis, GitHub Actions, various OS versions, with and without Docker) yield a broken JS file reliably, which I cannot reproduce locally (Windows, Ubuntu VM in WSL) using the same versions of Emsdk/Emscripten/etc., Something about the CI environments appears special.
I made an issue on the buildbot at that time describing the problem in more detail, but haven't been able to pinpoint what's going on since then. The same problem affects the Binaryen testsuite itself btw, which still uses fastcomp for that reason and hasn't ever seen a successful CI run against upstream.
There are quite a few build logs for reference meanwhile as I made the buildbot run both fastcomp and upstream since then in the hope that upstream magically unbreaks, which it didn't. The interesting ones are those where fastcomp passes and upstream does not.
Now i worry that letting this issue slip while slowly fading out fastcomp will become problematic at some point, so any help to resolve this for good would be much appreciated :)
Oh yes, sorry, I remember you mentioned it, but for some reason I thought it had been resolved...
What I'd recommend is this: make a CI build using a specific version (say, 1.39.10) on linux, with --profiling (hopefully the bug still reproduces with that), then get that build from CI and verify it fails locally. (If it doesn't fail, maybe it's a memory limit or such on the CI?)
If that fails, please send me that together with the full build command (and simplified steps to see the problem). Then I'll build it locally (with the same version, and on linux), confirm the CI one fails and my build works, and then I'll investigate how the builds differ (in general they should be identical as we have deterministic builds, however, for some reason the builds on different machines are deterministic on each machine but not between them).
Alright, pinned a branch of the bot to build with just upstream and let it blindly push the broken files:
The bot reuses the build command provided by Binaryen, which builds with --profiling iiuc (Edit: will try to monkey-patch). In a recent binaryen repo (currently https://github.com/WebAssembly/binaryen/commit/5d7248b1613017af4f26d176e6ee725210b0fa88):
mkdir build
cd build
emcmake cmake .. -DCMAKE_BUILD_TYPE=Release
emmake make -j2 binaryen_js
cp bin/binaryen_js.js path/to/binaryen.js/index.js
Build environment is:
ubuntu-latest (18.04)latest (1.39.10)The error can be triggered by
git clone --single-branch --branch upstream https://github.com/dcodeIO/binaryen.js.git
cd binaryen.js
npm install
node tests/sanity.js
which runs tests/sanity.js (a very basic test) against the broken index.js (the binaryen_js build linked above that the bot pushed), yielding the following output for me locally:
Error: abort(Error: abort(Assertion failed: int(_id) == int(T::SpecificId), at: /home/runner/work/binaryen.js/binaryen.js/binaryen/src/wasm.h,565,cast).
These errors tend to differ between Binaryen commits (here: https://github.com/WebAssembly/binaryen/commit/5d7248b1613017af4f26d176e6ee725210b0fa88), sometimes leading to assertions being hit and sometimes missing JS functions.
With monkey-patched --profiling the build fails with
shared:WARNING: disabling closure because debug info was requested
shared:ERROR: treating warnings as errors (-Werror)
which is most likely fixable. Going to give this another shot in a few hours.
Thanks @dcodeIO , I can confirm the CI build fails while a local build for me works.
Without --profiling it will be very hard to compare the builds though, so hopefully you can build with that.
Another possible issue is that the emsdk will install its own closure compiler now locally - which may differ by system. Removing --closure 1 would also help quite a lot here.
Oh, another thing that may help, separately from all this, is if you can make a CI build with -s WASM=2. That will emit both wasm2js and wasm output, in *.wasm, *.wasm.js, *.mem files (plus the usual *.js). If you can get me a build with that, and we can confirm that as expected the wasm2js one fails but the wasm one works, that might be the quickest path to debugging this.
Managed to make a build with --profiling by disabling ENABLE_WERROR:
Yields quite a few build warnings, yet this build works for me locally and doesn't trigger an error. Matches my experience so far that the error goes away as soon as debug info is enabled. Going to try more variants.
Here is a build with just --closure 0 (everything else default, no profiling):
This one appears to work up to a certain point before hitting another error:
requiring binaryen
awaiting ready
constructing a module
creating an expression
creating a statement
adding a function
creating a multi-value type
adding a function import
adding a function export
adding a memory import
adding a memory export
validating the module
emitting text
Fatal: Module::getExport: asyncify_stop_unwind does not exist
@dcodeIO Yeah, I see the same with the noclosure build. Interesting the error changes...
Trying to compare it to a noclosure build of mine (that works), the differences are still substantial. The size of the table is different, even (without closure that's easier to read). That suggests the difference is at the LTO level.
A non-LTO build, build without -s WASM_OBJECT_FILES=0 and also without --llvm-lto 1, would be interesting to compare.
But that assumes the problem is with LLVM, yet if the wasm build works but not wasm2js, then a single WASM=2 build as mentioned earlier would be the best thing to debug that.
Next one is a build with WASM=2 and SINGLE_FILE disabled (everything else default, no profiling):
Since the build is named binaryen_js.js et al., running it requires changing a line in tests/sanity.js:
-var binaryen = isWasm ? require("../wasm") : require("..");
+var binaryen = require("../binaryen_js");
which seems to work (probably picking up the .wasm file). Monkey-patching binaryen_js.js with
-"undefined"===typeof WebAssembly
+true
(not sure if that's the correct way to force the .js variant) yields
ReferenceError: asmLibraryArg is not defined
at new Instance (eval at <anonymous> (path\to\binaryen.js\binaryen_js.js:8:75), <anonymous>:89:3)
That should have worked. I see the same issue locally. It's possible WASM=2 is not yet compatible with closure compiler (it's pretty new) - maybe try a build that way?
A non-LTO build, build without -s WASM_OBJECT_FILES=0 and also without --llvm-lto 1, would be interesting to compare.
Here's a build analogous to --closure 0 with removed WASM_OBJECT_FILES=0 / --llvm-lto 1:
Like --closure 0, this one works up to a point, but interestingly fails a little earlier (?!) with
requiring binaryen
awaiting ready
constructing a module
creating an expression
creating a statement
adding a function
creating a multi-value type
adding a function import
adding a function export
adding a memory import
adding a memory export
validating the module
[wasm-validator error in module] unexpected false: module function exports must be found, on
main
AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
assert(mod.validate())
That should have worked. I see the same issue locally. It's possible WASM=2 is not yet compatible with closure compiler (it's pretty new) - maybe try a build that way?
This one is analogous to WASM=2, but also with --closure 0:
Requires the same changes to tests/sanity.js as above (binaryen_js.js et al. again). Works by default (probably picking up the .wasm variant), while patching binaryen_js.js to use the JS variant (as above) yields:
requiring binaryen
awaiting ready
constructing a module
creating an expression
creating a statement
adding a function
creating a multi-value type
adding a function import
adding a function export
adding a memory import
adding a memory export
validating the module
emitting text
(module
optimizing the module
Fatal: Could not find pass: duplicate-function-elimination
Before erroring, it also hangs for a bit.
Great, I see the same! That last build should be all I need. Now the hard work for me begins :smile:
Ah, I'm afraid it's still quite hard...
Any chance the last (WASM=2, no closure) build still shows the error with --profiling-funcs? (not regular --profiling) If it does I'd have a much easier time with it. If it doesn't, that's actually interesting too, as the only difference should be function names and nothing else - comparing with and without that flag could help.
Also please check if you can if EMCC_AUTODEBUG=1 in the env still lets the bug show itself. If it does, a WASM=2 no closure build could be hugely helpful.
Any chance the last (WASM=2, no closure) build still shows the error with --profiling-funcs?
Alright, on it again :) Here's WASM=2 with --closure 0 plus --profiling-funcs:
Has similar build warnings to the earlier --profiling build, and like anything debug before appears to work (using the simple sanity test, perfectly possibly that it becomes too simple at some point) both by default and when patched to run the JS variant, hmm.
Also please check if you can if EMCC_AUTODEBUG=1 in the env still lets the bug show itself. If it does, a WASM=2 no closure build could be hugely helpful.
This one is WASM=2 with --closure 0 and EMCC_AUTODEBUG=1:
By default yields a 300mb+ output log, and patching to use the JS variant fails at 38 KB with
Assertion failed: id < typeLists.size(), at: /home/runner/work/binaryen.js/binaryen.js/binaryen/src/wasm/wasm-type.cpp,132,expand
log_execution 29735
My local output: out.txt
Thanks @dcodeIO ! That last build is very useful. After debugging for a while I found an issue, which has a fix in #10682. I am not sure it will fix this, though. While the problem definitely seems to be in that pass (based on all the above builds), that PR fixes something that I don't think was a correctness bug (just a silly bug leading to unoptimal results).
Any chance you can make a build like that very last one but with --profiling-funcs? That should prevent the bug from happening, and then I can compare it to the one that does have name minification on.
Sure :) Here's WASM=2 with --closure 0 and EMCC_AUTODEBUG=1 plus --profiling-funcs:
Is happily printing instrumentation when patched to use the JS variant, so appears to work.
Whoops, appears Binaryen had a commit recently, so this is not exactly the same anymore. Going to update the non --profiling-funcs build as well so there's one to compare on equal https://github.com/WebAssembly/binaryen/commit/5c0bf993bc530a5b69caac1a9d70eec13c887d70.
Updated WASM=2 with --closure 0 and EMCC_AUTODEBUG=1 plus --profiling-funcs minus --profiling-funcs:
Now fails with
TypeError: o[LX] is not a function
at Lec (eval at <anonymous> (path\to\binaryen.js\binaryen_js.js:6:1844), <anonymous>:67:1721192)
Thanks for the builds! Sorry this was so long, but it was one of the trickier bugs to figure out (only on CI, nondeterministic, and vanishes when any debug info is used).
I think #10682 should fix this. The issue is that in a big-enough project we got confused between local and global names, and the name minifier made us call the wrong target.
If there isn't an easy way to test that PR on CI, we can land it and check a tot build after that.
Testing on CI will be a challenge as building LLVM on these restricted environments requires quite a bit of hackery. Last time I checked I needed to link with gold due to memory, and re-trigger multiple times using automated commit messages, catching up on ccache state until it finishes ^^. Pre-built binaries are ideal there. The buildbot should automatically unbreak itself and show a successful upstream run if it works.
Always glad to help, thanks for being awesome :)
Oh, wait, since this is just a JS change, it might be possible to copy the updated file into the repo and replace it within Emscripten before building. Need to check.
No worries, if it's not easy though, I'd just wait for the tip of tree build in a few hours (and then just emsdk install tot on the CI will use that, if you aren't using tot already).
And thanks again! Without your help I'd have had almost no chance at all to figure this out.
Oh wow, great! :smile:
The buildbot shows that all of latest, latest-fastcomp and tot are working, so closing :)