Discussion started in https://github.com/juj/emsdk/issues/144#issuecomment-389682105
I opened next-merge branches in the 3 repos and got things to compile pretty easily. I'm running the tests now; so far one SIMD test has started to fail, but that could just be autovectorization heuristics, and most other stuff seems to be passing.
It might make sense to wait for 6.0.1, which is due in a few months. However, given this would fix at least 2 bugs (also https://github.com/kripken/emscripten/issues/6536 ) it seems like doing this sooner might make sense.
Failures:
- other.test_linpack_autovectorize: no longer seeing vectorization happen in the LLVM IR on that code.
- default.test_floatvars: unoptimized fmin(NAN, 3.3) returns NaN now. The man page suggests that is a bug.
- other.test_wasm_backend: fails in s2wasm; perhaps it expects a still newer LLVM than this? The error is `[unknown directive:]`:
```
= exp10@FUNCTION
.ident "clang version 6.0.1 (https://chrom
```
The full line s2wasm fails on is

```
pow10 = exp10@FUNCTION
```

(from wasm_libc_rt.a). I guess this is an alias or something like that? Is it expected that it would not work in LLVM 6.0 but work in LLVM trunk, i.e. has that changed in s2wasm in recent months? cc @jgravelle-google
Other than that (and the autovectorization issue which I think we can safely ignore) all tests pass!
It would appear to be https://github.com/WebAssembly/binaryen/pull/1491
Thanks @jgravelle-google ! Well, in that case, we can probably just disable the test. The point of it was to help avoid regressing the wasm backend while working on asm2wasm in a simple way, but if it's not practical to check that using fastcomp, we'll just need to be more diligent to test properly.
A serious problem has shown up in the benchmark suite. On the one hand, there are some tiny code size wins across the board, and some big ones: 5% on box2d (!), 3% on bullet. But there are also several big slowdowns that must be investigated.
Given these are all floating-point benchmarks, it seems possible there is a single shared cause here. (Perhaps related to the fmin/fmax changes?)
Perf issue looks like a hard problem. The main change is that LLVM inlines a little less in the new version, and in e.g. skinning we no longer inline all the calls in calculateVerticesAndNormals_x87. The binaryen optimizer does inline them later, but that only removes the call overhead, we still have the stack usage (more loads and stores).
Options seem to be
In box2d, a suspicious issue is that in the new version we import sin/cos from JS and use them, which could cause slowness. I didn't confirm that, but it is wrong in itself. Looking in the IR, LLVM used to emit cosf and now emits llvm.cos.f32, which we apparently don't handle properly.
edit: Appears to be the linking-of-intrinsics issue. When we see cosf we link in that libc stuff, but when we see the intrinsic we don't and we end up falling back to JS support code...
I added a pass to lower llvm.cos.f32 etc. to libc calls after optimization, and then linking brings in libc properly etc. This practically fixes the box2d and primes regressions. That leaves skinning and bullet, and perhaps a tiny regression on box2d, all of which I suspect come down to the inlining issue.
The inlining issue was that something in TTI (LLVM's TargetTransformInfo cost model) changed, and it wasn't using the asm.js TTI, so it thought float operations were always expensive (the default in LLVM now?), which meant inlining didn't make as much sense.
Fixed now, so I think we can merge this soon.
Running the full benchmark suite, major speed regressions are gone. Some minor changes in both directions, and one big speedup on copy. On the other hand, all the size improvements are gone now that we inline properly again ;) and now we have tiny size regressions across the board, oh well.
I verified the entire test suite passes now properly. If there are no concerns I can merge this tomorrow.
Sounds good to me! +1
Pretty sure this should be closed.
Yes, thanks @VirtualTim - we landed 6.0.1 separately (and newer LLVM updates will happen on the wasm backend path, which we have 9.0.0svn on).