Emscripten: BINARYEN_IMPRECISE, on or off by default?

Created on 14 Oct 2016  路  43Comments  路  Source: emscripten-core/emscripten

wasm will trap on e.g. 1/0 in integers, but asm.js didn't. Binaryen's asm2wasm can compile code in two modes, "precise" which should behave exactly like before (and exactly like a native build), by avoiding traps carefully, and an "imprecise" mode which ignores that things might trap, which is faster (but might trap).

  • Precise by default (current setting) means people aren't surprised by correctness bugs, but are surprised by perf issues.
  • Imprecise by default would mean people are surprised by correctness bugs, but are not surprised by perf issues.
  • A third option might be to try to get the precise mode up to the same speed as imprecise.

No obvious answer what is better here. Thoughts?

wontfix

All 43 comments

Maybe it depends on exactly what sort of operations are included here aside from to division by 0. But "exactly like a native build" isn't really a thing; at least it's not just one thing: there is more than one native platform, and they have different behaviors (even different OSes on the same hardware can have different behavior). This is exactly why C has undefined behavior in the first place. For division by 0 (and other places where the standard has undefined behavior) I would say we should let it trap. In fact, for most cases I would really prefer trapping to some other behavior, but maybe that depends on the specifics. Of course this is slightly complicated by the fact that users may be migrating existing codebases from asm.js, so there's potential desire for compatibility; but again, we are talking about broken code, so I'd say it should be fine to have a compatibility option, but off by default.

But what other behaviors are affected byBINARYEN_IMPRECISE? (and what exactly is meant by "precise", anyway? To me, trapping seems more "precise" than some other behavior, in which you might not discover your bug until some random time further down the line, sort of like why sanitizers are useful in cases of memory safety bugs).

Hmm, I find the terms "precise" and "imprecise" a bit confusing here. Does this only control the behavior of what happens on an integer div by zero and an integer modulus of zero? If so, could the option be something explicit like BINARYEN_INT_DIV0_THROWS=0/1 or similar?

How do we avoid traps? Manually by inserting if (divisor == 0) checks in the emitted code before each integer division?

Yeah, the term isn't the best I guess. The idea behind it was "precise = behave precisely the same as we always have and the way most native builds do".

Things affected:

  • i32, i64 div and rem, signed and unsigned
  • float-to-int truncation

This does affect real-world code, e.g. some things in the emscripten test suite, as well as the csmith fuzzer generates code that hits these.

We currently avoid traps by calling out into another function. I think right now that does an ffi into JS, which is easiest - we are guaranteed to get the same semantics as JS. Instead we could check for 0 or the float-to-int trunc limits and carefully do what JS would do, in wasm, which would be faster. Note that we don't and probably shouldn't do this inline because of code size concerns.

I can think of the following build modes could be useful:

  • abort: Abort execution on each integer div/mod by zero, and print a callstack where it occurs.
  • fast: "I know my code will not do any bad div/mod by zeros, give me the fastest possible code, and if it happens to do a div/mod by zero, I don't care what ever happens (can abort or produce garbage or whatever)"
  • quiet: If my code happens to do a div/mod by zero, under any circumstances don't abort execution, but generate an undefined result from that operation and keep on going. (garbage in, garbage out principle)
  • strict: If my code does a div/mod by zero, throw me a C++ exception that I can try { } catch() and choose how to proceed. (c.f. on Windows SEH handling)

The first three sound like we can do, but not sure if wasm will have facilities (yet?) for the fourth strict mode.

Is the float-to-int truncation about float not being representable as an int? E.g. converting, say, 1e90 to an integer? Perhaps that could have a separate flag to control, but with the same modes as above?

(Currently asm.js only has build mode quiet).

Generally div/mod by zeroes have generated 0 as result on asm.js, but can't recall if we actually strongly defined them that way?

I'm not sure about float->int truncation; but I think for wasm div/rem, abort and fast are the same, and they are just equivalent to letting wasm trap (browsers will give you a callstack in any case there I think). I think quiet is the current "precise" behavior? We could maybe do strict with some support from the compiler; not sure how much work it would be or who would want benefit from it.

@juj yes, trunc operations trap if the FP value is NaN or doesn't round to an integer in range: see https://github.com/WebAssembly/design/blob/master/Semantics.md#datatype-conversions-truncations-reinterpretations-promotions-and-demotions

For that we could continue to thunk to JS, or probably just compile something from libm into wasm and call it before each trunc operation, which might (?) be faster. zero checks would probably be faster compiled into wasm if it's just making the result 0 when the dividend is 0.

I think the default should be to trap on zero dividends and truncation range errors, which would be the fastest, and probably would be what I'd want as a developer (since it gives you immediate feedback when you have the problem and tells you exactly where it is).

We could also have an asm.js compatibility mode, which would have zero-checks on divide and emulate whatever the asm.js behavior is for truncation. Depending on how much work we think that is, it might be worth finding out if anyone wants that; but if it's really simple then maybe it's easy just to do it.

Perhaps strict might not be such a good term, but calling it throw might be better.

In that case I vote for having four build modes abort, fast, quiet, throw. It could be that fast ends up meaning the same as e.g. abort, but it's nice to have the developer be able to semantically say what they want, even though the result might be identical. That would retain a flexibility to future proof optimizations under the hood.

As for the default, I think abort would be fine behavior. Then people can opt in to fast when they get to optimization land.

If we are going do to anything beyond "do the fastest thing that is standards-compliant" or "get asm.js compatibility" then we are designing a debugging feature. If we are designing a debugging feature then we should give it some more thought about what exactly we want it to accomplish, the scope, the users and uses cases, etc. For example there's no reason it has to be limited to the arbitrary class of "things that just happen to be different between asm.js and wasm". We can also ask what other tools are available for the same purpose, and what other toolchain features we have that might be related, etc.

https://github.com/WebAssembly/binaryen/pull/832 suggests we optimize our precise code, avoiding traps by checking for the conditions in a "safe" function in wasm that we call. This is a 3x improvement over the call out to JS we used to have, but still slow when the critical operations are done in a tight loop. Inlining might help, but it's tricky.

And on the other side of things, a project reported that they can't fix the traps - our recommended solution - because it's in third-party code for them that they don't want to maintain a fork of.

So I'm still not sure we have a good answer here. Precise mode will continue to be slow; imprecise mode requires people fix their source which in practice isn't always doable. This area remains one in which wasm is a regression compared to asm.js, sadly.

I propose we give the following linker flags:

  • -ffloat-to-int-overflow=abort: Lets the wasm trap exception from float to int conversion being out of range get thrown unhandled to JS. In our JS shell, we could catch these and print out a message "Use -ffloat-to-int-overflow=abort|clamp|fast to specify the desired behavior." to guide users to the option.
  • -ffloat-to-int-overflow=clamp: Clamps out of bound values to INTMIN and INTMAX.
  • -ffloat-to-int-overflow=fast: Don't care what the behavior is, just make float to int conversions run as fast as possible when overflows are assumed not to occur in the program.
  • -fdiv-by-zero=abort: Lets the wasm trap exception from div by zero and remainder by zero get thrown unhandled to JS. In our JS shell, we could catch these and print out a message "Use -fdiv-by-zero=abort|quiet|fast to specify the desired behavior." to guide users to the option.
  • -fdiv-by-zero=quiet: Div by zero and remainder by zero should not abort in any condition, but always keep on going, with undefined result from the computation.
  • -fdiv-by-zero=fast: Don't care what the behavior is, just make float divisions and remainder ops run as fast as possible when div/mod by zero are assumed not to occur in the program.

Also, I believe we have a limitation that these flags can't be controlled at compilation unit level, but they always need to be specified for the final link, so we should make the compiler yell out and abort if these are being passed when not producing a final link (as opposed to silently ignoring them).

This kind of options set would allow nicely later expanding to throw modes where we'd throw a C++ exception, but no need to implement that now. As for defaults, it would be sensible to have -ffloat-to-int-overflow=clamp be a default, because that option is well defined (although also -ffloat-to-int-overflow=abort would also be a sensible default), and for div/mod by zeroes, -fdiv-by-zero=abort is good default, albeit noted that it's not the same behavior as with asm.js. I don't think we need to follow the same defaults that asm.js had, especially when it will be easy and explicit to guide users to which flags to set if they want to have the asm.js behavior. The issue #3836 was about adding this option for debugging earlier, so the old asm.js behavior wasn't too good as a default I think.

So for wasm, I guess fast is the same as abort (since wasm traps in both of these cases), and for asm.js there would be extra code? What's the current behavior on asm.js for float->int overflow?

@dschuff: asm.js does what JS does for float->int overflow, which is to do a mod of 2^32 (!).

@juj Overall that sounds ok, some notes though:

  • "fast" is ambiguous, because in theory, it might be faster to do one of the others (e.g. the VM could fold out its own check for trapping if it sees the code guarded against it). What we can say with certainty is that the "natural" or "compact" way to do those things is to just emit the wasm operation, no guards, no special handling around it.
  • I do still think we need an option for "do precisely what asm.js does". That's a massive benefit when debugging in some cases.

So another possible suggestion is:

  • WASM_F2INT_OVERFLOW_MODE and WASM_IDIVMOD_ZERO_MODE with options

    • "natural" or "fast" or "aborting" or something like that - do the simple, natural thing in wasm, which is compact and usually fast and can abort.

    • "js" - do exactly what JS and asm.js do (which, in particular, also means we never abort).

    • "silent" - never abort, do something "reasonable" here, which would be clamping and 0, respectively.

As for defaults, I kind of want "js" to be the default in both, but I realize that's probably just me. I agree with @juj that we should make div/mod by 0 abort, like native, and clamping for float2int overflow which is also like native (but we'd need to decide which native).

I think "trapping" is maybe a better name that "aborting" since "trapping" matches the wasm notion of trapping (as opposed to calling libc abort which we aren't doing).
I'm still not sure it's worth supporting a mode which isn't fast/trapping and "js"...
Someone who cares about the possibility of division by 0 would/should already have code that checks, especially since div0 and its collection of different behaviors is so well known (was this user with the third-party code getting div0 traps or float->int overflow traps?) float->int overflow is maybe different because IIUC our behavior is more different from other native platforms?

Agreed on "trapping", sounds better than "aborting". However I somewhat prefer "natural" to both. But not by much.

I believe that third-party code issue was indeed float to int stuff, not int div0. Yeah, maybe we just need 2 modes for int div0, but for floats to int, I do think we need the third, since

  • "trapping/natural' is not usable if it's in third-party code, and this isn't so rare actually, like in a game engine it could have triangles that are not going to be shown, but it transforms them anyhow, so garbage in garbage out. If it aborts, they'd need to add range checks which no other platforms needs.
  • "js" is slow and cumbersome, as it needs to do an fmod of 2^32.

So a "silent" mode that does something similar to native platforms, quickly, seems like what many people would need.

And if we have 3 modes for float to int, maybe for symmetry we should have it for int div0 too?

I think "trapping" sounds better than "aborting", but that is because they describe different things: "trapping" hints that it should be possible to catch and recover from those traps, which isn't true for wasm as it currently stands and probably won't be true for the MVP. My suggestion would be to call this wasm mode.

Otherwise, I agree that as wasm currently stands, it probably is best to use compiler switches for at least four modes (aborting/wasm, fast, js/asmjs, throw).

I'd really rename this. "Precise" is a very... imprecise... name.

I think it should only be the default if the codegen is such that a Sufficiently Smart Compiler could:

  • Look at the code generated by emscripten
  • Generate fewer instructions than wasm's semantics mandate

So basically, if you're "undoing" wasm's UB avoidance and giving x86 / ARM semantics to the user's code then I'm happy. Our compiler will detect this pattern and optimize it.

Yeah, all the proposals above rename it (e.g. the values are "trapping", "js", "silent" in my last proposal, and the setting name is "imod_overflow" or such). But I don't think we have a decision on the terms, nor on which should be the default.

I don't know enough about CPU behavior to know whether the SSC stuff can work. Perhaps you can elaborate or point me to how to find the docs? But regardless, how can that work if x86 and ARM do different things? Seems like generating less code for both is hard, but maybe that's my lack of knowledge.

I suggest making "trapping" the default. For integer div/rem, x86 raises SIGFPE (etc.) on divide by zero and signed overflow, so it's reasonable to assume that portable code doesn't depend on it being silent, so trapping is fine. That's not the case for float-to-int conversion, but I still suggest trapping so that overflow/invalid errors will be quickly identified, and developers can choose how they want to handle them (by switching to "silent" or by changing the code).

I think the question of which is the default is one of convenience for developers:

  • IIUC @kripken thinks seamless transition from asm.js is most important: if they had UB before they still do now and they're used to (some) of that behavior "just working" on native.
  • IIUC @sunfishcode thinks the wasm semantics are the right ones: we're actually helping developers by pointing out their UB, and they can easily revert to non-trapping if they want.

I don't care which is chosen. I'd like the name changed.

Whichever default is chosen, I'd like it if the non-trapping approach were optimizable to a single instruction on x86 and/or ARM. I care about this being fast, and I can't make it fast if you call into JS instead of "undoing" what wasm's spec mandates.

On code: can you just dump what SM generates for each of these opcodes? Agreed with @sunfishcode that div/rem probably doesn't matter for the reasons he states (also, it's already slow so whatever).

Trying to collect together the thoughts above, how about the following linker flags, with proposed documentation:

When compiled code performs float to int conversions, it is possible for the floating point value to lie outside the range of the integer. The following linker flag controls how generated code will handle this case:

  • -s F2INT_OVERFLOW_MODE='trap': Out of bounds values generate a WebAssembly trap condition. This mode generates no extra WebAssembly instructions besides the conversion operation.
  • -s F2INT_OVERFLOW_MODE='clamp': Instead of trapping, too small out of bounds values are rounded up to INT_MIN, and too large out of bounds values are rounded down to INT_MAX.
  • -s F2INT_OVERFLOW_MODE='js': In this mode, floats are converted to integers with the same semantics that the JavaScript expression Number | 0 provides. That is, the float to int conversion is performed on the number computed modulo 2^32. This mode aligns with the behaviour that asm.js provides.

Likewise, when performing integer division and integer modulo arithmetic, it is possible for the divisor to be zero. This can be handled in the following ways.

  • -s IDIVMOD_ZERO_MODE='trap': Integer division by 0 and modulo by 0 operations generate a WebAssembly trap condition. This mode generates no extra WebAssembly instructions besides the arithmetic operation.
  • -s IDIVMOD_ZERO_MODE='zero': Instead of trapping, integer division by 0 and modulo by 0 operations both return zero. This mode aligns with the behaviour that asm.js provides and how JavaScript division and modulus operators work.

Rationales I am considering here is that C/C++ developers do not know about the underlying wasm machinery, which is why terms like "natural" and "trap" are difficult. However I think the word trap can be used as a technical term if it is introduced and formally defined in the documentation. In the long run, comparing asm.js vs wasm has less value, so terms that arise from that perspective, "precise"/"imprecise" are less illustrative also. Comparisons against asm.js and wasm will be very critical, but can be done in documentation, like attempted above. My earlier proposal of using words that would carry developer intent: "quiet", "fast", "strict" didn't get much attraction, so dropped those and instead using words above that have strict technical meaning, which is probably best. Also dropped the WASM_ prefix (but used Alon's setting name wording), because I don't think this needs to be Wasm specific, and in the limit, everything will be Wasm specific. (Even if we didn't implement the same options for asm.js, we could abort if an unsupported mode was used to build for asm.js)

How does that sound?

Sounds great to me, except for

  1. What should the defaults be?
  2. @jfbastien, you thumbs-upped that post. Does that mean it addresses all your concerns above?
  3. If we drop the WASM_ prefix, we should document briefly that this has no effect on asm.js.

It address the naming concerns.

It doesn't address the codegen concerns: 'trap' means "generate a single wasm op" and I'm cool with that, but I want others to be optimizable by a wasm implementation.

I have no opinion on what the default should be.

An extra note here with respect to i32.div_s is that in addition to generating divide by zero exceptions, it can also generate a integer overflow exception on the input -2147483648 / -1, which can't fit in an i32. So a -s IDIVMOD_ZERO_MODE='zero' mode would have to also check for this special case in addition to division by zero. Having a build mode that does if (y != 0 && (x != -2147483648 || y != -1)) return x / y; else return 0; for every i32.div_s feels a bit rough.

If there was a i32.div_s_or_zero instruction, then wasm implementation could do the division without checks, and if it traps, have the trap handler report 0 as the result, and resume execution? This would have best performance as well as minimal extra generated code size? The other options would be to emit the if checks in wasm code before each division (bloats up code), or call a function each time to do the divide (slower).

If there was a i32.div_s_or_zero instruction, then wasm implementation could do the division without checks, and if it traps, have the trap handler report 0 as the result, and resume execution? This would have best performance as well as minimal extra generated code size? The other options would be to emit the if checks in wasm code before each division (bloats up code), or call a function each time to do the divide (slower).

It's worth filing an issue in the design repo if one doesn't already exist.

A PR optimizing integer div/rem landed, which was noncontroversial. Current status is

  • In precise mode, we now emit calls to wasm methods for integer div/rem, guarding against div/rem by 0. This avoids the previous ffis.
  • In precise mode, we still emit ffis for float to int.
  • In imprecise mode, we still emit a single wasm instruction for both.

What is left is the rest of the stuff discussed here,

  • Separate flags for integer and float ops. However, now that we do integer ops quickly either way (no ffis), do we actually need a separate flag? Integer ops should no longer be an issue here (although they do incur a function call, so perhaps worth inlining, for that there is https://github.com/WebAssembly/binaryen/tree/optimize-precise-2-inline which I am not sure is worth it or not).
  • Renaming "precise/imprecise" to something. Depends on whether we have separate int/float opts here.
  • Renaming the values. I think we all agreed on trap/clamp/js/zero as @juj summarizes in the last comment.
  • What to do with float to int. The discussion in this issue ended with @jfbastien suggesting we emit code that CPUs can optimize well. As I said earlier, I can't do that part without help - I don't know CPUs that well nor do I know what is best for VMs to pattern match for CPUs.

For float-to-int, something simple like:

fabsf(x) < 2147483648.0f ? (int32_t)x : -2147483648

should be sufficient for reasonably portable C/C++ code (x86, ARM, and JS are all wildly different from each other, not to mention that it's UB in C/C++, so it's unlikely that portable C/C++ code depends on the overflow/invalid value being anything in particular). This should be enough to avoid the ffi cost, which would be interesting to measure.

(In general, I'd encourage Binaryen not to worry too much about emitting code that VMs can pattern-match, because that gets hardware-specific.)

Cool, thanks. Here's a PR that does that: https://github.com/WebAssembly/binaryen/pull/929

Optimizing non-trapping integer div/rem is good, and WebAssembly/binaryen#929 seems fine too, so now that we have the 3 modes, I think that leaves the following:

  1. Whether to have separate flags for div/rem and float->int, or the 3 modes for both, (where JS and clamp are the same for div/rem)
  2. What they should be named
  3. What the defaults should be

For 1. I don't have strong opinions, but I'd guess that a user most likely will have been careful about avoiding UB and/or will want have trapping in both cases, or will just want the JS behavior in both cases. Or put another way, do we have any ideas (or have we heard from any users) about why someone would want one without the other?

For 2, it probably depends on the answer to 1, but @juj's proposal seems fine if we split them. If they stay together, then maybe something like -mtrapping-op-mode=js or -mtrapping-math=trap if we like GNU-style machine flags or -s TRAPPING_MATH_MODE=clamp if we like emscripten-style flags.

For 3, I still really think that we should default to trapping mode, at least when generating only wasm code (e.g. the same cases where we'd use I64). I think I expressed that opinion before but since then I encountered an example in the wild (https://github.com/google/draco) where the existing PRECISE flag made a dramatic difference in performance, and the devs had no idea it existed or that they had accidentally left so much performance on the table.

The value for generating JS semantics has is probably mostly useful when migrating existing asm.js code over and something doesn't work - it allows to remove one source of discrepancy from the equation (debugging asm.js -> wasm migration failures, and wasm in general is hard at present).

Splitting the div/rem by 0 and float-to-int ops to separate flags it useful for micro-optimizing when one knows a certain type of operation is ok in a codebase, but other one isn't. For example, I see that in UE4, div/rem by 0 never occurs, but float-to-int converting infs and nans does happen in multiple places. So there's a tiny perf and code size win in allowing trapping div/rem by 0, we don't need to keep checking those, but float-to-ints need to be checked so they don't trap.

As for what the defaults are, I don't have a strong preference either way. Experience with this shows that each codebase will likely want to do a different thing here. I feel like I would lean towards trapping in both cases, because that is the ideal state and gives explicit failure so people will be forced to pay attention. That paired with good FAQ item in migration guide as well as descriptive error message in browsers should avoid too many spurious bug reports from coming in "why doesn't my wasm code work, it did work in asm.js?". If we defaulted to JS or clamp, I worry that a lot of people will generate code that they don't know is slower than it needs to be.

there's a tiny perf and code size win in allowing trapping div/rem by 0

If it's really tiny, maybe it's not worth the complexity of another option? Measuring on BananaBread, which seems similar (no div0, but there are float2int overflows), I see a 0.25% code size difference. I don't have an easy way to measure perf though.

I'm not opposed to adding separate options, but worry that it's less clear for users, and that we may need to add yet more if we go down this path - e.g., why not split up div0 from rem0, etc.

Does anyone know of any code that needs silent div0 or silent rem0?

@sunfishcode

Does anyone know of any code that needs silent div0 or silent rem0?

Yes chill(div) and chill(mod) from B3:

Chill
Applies to: Div, Mod. You can create a chill Div/Mod by saying chill(Div). This creates a Kind that has the Chill flag set. This can only be used with interer types. An operation is said to be chill if it returns a sensible value whenever its non-chill form would have had undefined behavior. Chill Div turns x/0 into 0 and -231/-1 into -231. We recognize this in IR because it's exactly the semantics of division on ARM64, and it's also exactly the semantics that JavaScript wants for "(x/y)|0". Chill Mod turns x%0 into 0 and -231%-1 into 0. This matches the semantics of JavaScript "(x%y)|0".

@jfbastien More specifically, does anyone know of any real-world C/C++ code which does integer division or remainder by zero, in C/C++, and depends on it returning a value silently?

@sunfishcode I'm not sure what you're trying to get at. C++ or not isn't relevant here: if we translate B3 to WebAssembly then keeping chill would be useful. What's the purpose of trying to find a C++ codebase which relies on it? Is it to evaluate how important this is compared to other features?

You're right, Binaryen should consider other languages too. In other languages, throwing a specific exception (eg. Java, C#, Python, etc.) seems at least as popular, if not more so, than silently returning 0, so for maximum convenience, perhaps Binaryen should have a way to call a handler function on divide by zero and remainder by zero.

I am also curious about the question for C/C++ regardless.

I was only half-kidding about splitting up div and rem of 0. Why should those be merged as opposed to split? And together with what @sunfishcode is saying, perhaps another perspective is that there is the float2int stuff, and everything else (as there may be other things than div/rem of 0). So we could have an option for float2int specifically and a flag for everything (which is what just landed)?

A further question is whether or not to break up float2int into 4 options of f32/f64 * signed/unsigned. It seems like if splitting float2int from int-div/rem helps, then this could help more, as codebases may just have traps in one of the float2int ops of the 4.

Recent confusion among users suggests to me that 'allow' is a better default. Details in #5134.

This issue has been automatically marked as stale because there has been no activity in the past 2 years. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kripken picture kripken  路  4Comments

rpellerin picture rpellerin  路  3Comments

JCash picture JCash  路  3Comments

lokpoi888 picture lokpoi888  路  4Comments

nerddan picture nerddan  路  4Comments