Design: Proposal: Fp IEEE compliance level flags for Wasm (FP-Fast-Math for Wasm Scalar & SIMD)

Created on 12 Jan 2021 · 7 Comments · Source: WebAssembly/design

Native compilers such as gcc, clang, and msvc let developers set the fp IEEE compliance level through pragmas or compile-time flags such as /fp:strict, /fp:fast, -ffast-math, and -ffp-contract. These flags benefit a wide range of applications (e.g. ML convolutions, DSP, low-precision graphics/physics engines) where developers prefer to trade portable precision for performance.

The performance gains can be significant, depending on the specific settings and compilers used. These developer hints/preferences allow compilers to safely perform more fp math optimizations, select better instructions (fusing/fma), restrict value ranges, and relax validations.
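To make the fusing/fma point concrete, here is a minimal Python sketch (not part of the original proposal) of how contracting a multiply and an add into one fused operation can change the result. The fused path is emulated with exact rational arithmetic followed by a single rounding, which is an assumption standing in for a hardware FMA on finite inputs; the function names are illustrative only.

```python
from fractions import Fraction


def mul_add_unfused(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to a double, then the sum is.
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    # One rounding: a*b + c is computed exactly with rationals and rounded
    # once at the end. Assumption: this models what an FMA instruction
    # returns for finite inputs.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# a * b is exactly 1 - 2**-60, which is not representable as a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = mul_add_unfused(a, b, c)  # product rounds to 1.0, so the sum is 0.0
fused = mul_add_fused(a, b, c)      # keeps the low bits: -2**-60

print(unfused, fused)
```

Both answers are valid outcomes of "a * b + c" depending on whether contraction is allowed, which is why the choice is visible in program results and not only in performance.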

Currently, when developers specify these flags for Wasm, the flags are consumed by the developer toolchain, which may honor only a few of them depending on the specific tool used (e.g. https://github.com/WebAssembly/binaryen/pull/3155). The preference information is then discarded, so it is not available or visible to runtimes that want to use it.

Wasm runtimes can benefit from access to these developer flags/hints to make their own decisions on optimization and instruction selection when it is safe to do so. This will be particularly useful for performing additional runtime optimizations, especially in AOT Wasm compilers. It also helps address a few of the known performance concerns in FP Wasm SIMD codegen, such as rounding and min/max. At a high level, this will let runtimes avoid depending on developer tools for certain FP optimizations and removes a blocker for Wasm to track native performance more closely.

There is the precedent of JVM 1.2 relaxing IEEE compliance as the default mode and introducing the 'strictfp' modifier to ensure portability at class/interface/method granularity. There is an opportunity to explore a more backward-compatible approach for Wasm.

I would like to propose a mechanism to encode fp IEEE compliance flags in the Wasm binary, to be consumed by the runtime engines. As is the case in native languages, the flags themselves can be treated as optional, and their use can be a choice of the runtimes. The impact will manifest as improved performance, consistent semantics on a given platform, and reduced portability across platforms. The proposed mechanism will enable unambiguously marking code sections within a Wasm binary with these preferences/hints at the granularity of a block of instructions.

The design can be ironed out in detail as we proceed. One option is to introduce a new custom section with entries marking preferences and code segment offsets; another option is to introduce a new instruction to mark code segments with these developer preferences, as in the JVM. The specific flags to support can be discussed and incorporated as we proceed (e.g. -fp-finite-math-only, -fp-no-signed-zeros).
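As a rough illustration of the custom-section option, the sketch below encodes a list of hint entries into a Wasm custom section. Only the outer framing follows the Wasm binary format (section id 0, LEB128 size, length-prefixed name). Everything inside the payload is hypothetical: the section name "fp_hints", the entry layout (function index, start offset, end offset, flags), and the flag bits are assumptions made up for this example, not anything defined by a spec or by this proposal.

```python
from typing import List, Tuple

# Hypothetical flag bits; not defined by any Wasm specification.
NO_NANS = 1 << 0
NO_SIGNED_ZEROS = 1 << 1
ALLOW_REASSOC = 1 << 2
ALLOW_CONTRACT = 1 << 3

# Hypothetical entry layout: (func_index, start_offset, end_offset, flags)
Hint = Tuple[int, int, int, int]


def encode_u32(value: int) -> bytes:
    """Unsigned LEB128, the integer encoding used by the Wasm binary format."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("value out of u32 range")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_u32(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_fp_hints_section(hints: List[Hint], name: str = "fp_hints") -> bytes:
    name_bytes = name.encode("utf-8")
    payload = bytearray(encode_u32(len(name_bytes)) + name_bytes)
    payload += encode_u32(len(hints))
    for entry in hints:
        for field in entry:
            payload += encode_u32(field)
    # Custom sections have id 0, followed by the byte size of their contents.
    return b"\x00" + encode_u32(len(payload)) + bytes(payload)


def decode_fp_hints_section(section: bytes) -> Tuple[str, List[Hint]]:
    if section[0] != 0:
        raise ValueError("not a custom section")
    size, pos = decode_u32(section, 1)
    if pos + size != len(section):
        raise ValueError("section size mismatch")
    name_len, pos = decode_u32(section, pos)
    name = section[pos:pos + name_len].decode("utf-8")
    pos += name_len
    count, pos = decode_u32(section, pos)
    hints = []
    for _ in range(count):
        fields = []
        for _ in range(4):
            value, pos = decode_u32(section, pos)
            fields.append(value)
        hints.append(tuple(fields))
    return name, hints


hints = [(3, 10, 42, ALLOW_CONTRACT | ALLOW_REASSOC), (7, 0, 300, NO_NANS)]
section = encode_fp_hints_section(hints)
name, decoded = decode_fp_hints_section(section)
```

Because engines are free to ignore custom sections, a layout like this would keep the hints optional, which matches the "choice of the runtimes" framing above. It does not by itself settle whether hints should be allowed to change observable results.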

This mechanism will complement the discussions to add Scalar/Vector FMA, and FP approximation instructions to Wasm and/or SIMD spec.

This issue is to track the interest in this topic and to discuss this in the CG sync.

All 7 comments

Since the strictness of fp operations affects the semantics of a program, I would expect that the best way to make more permissive semantics possible would be to introduce new instructions rather than a custom section or block construct. We already have the spec mechanisms necessary to handle float nondeterminism due to our NaN propagation rules, so I expect it would be straightforward to introduce new floating point instructions that allow for more possible results.

As additional context, the JVM, for its part, is considering removing strictfp and only supporting the strict semantics.

> the best way to make more permissive semantics possible would be to introduce new instructions rather than a custom section or block construct.

Introducing new instructions is useful, but it will not scale well or completely remove the toolchain dependency. Instructions with good platform support, such as fma and reciprocal, are good candidates for new instruction additions. There is considerable variety in the hints/flags offered by native compilers:
-fassociative-math -ffast-math -fno-honor-nans -ffinite-math-only -fdenormal-fp-math -fno-strict-float-cast-overflow -fno-math-errno -fno-trapping-math ...
Most of these are hints that allow more aggressive fp optimizations by lifting restrictions on algebraic transformations, NaNs, signed zeros, traps, rounding, etc. Expressing all the useful flags as new instructions is not ideal imo.
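For readers less familiar with what "lifting restrictions" on signed zeros and NaNs means in practice, here is a small Python sketch. The "relaxed" functions are hand-written stand-ins for what a compiler is permitted to emit once it is told signed zeros or NaNs do not matter; they are assumptions for illustration, not the output of any particular compiler.

```python
import math


def add_zero_strict(x: float) -> float:
    # IEEE 754: (-0.0) + (+0.0) is +0.0 under round-to-nearest,
    # so x + 0.0 is not the identity for x == -0.0.
    return x + 0.0


def add_zero_no_signed_zeros(x: float) -> float:
    # With a no-signed-zeros style hint, x + 0.0 may be folded to x.
    return x


def is_nan_strict(x: float) -> bool:
    # The classic self-comparison NaN test.
    return x != x


def is_nan_finite_math_only(x: float) -> bool:
    # With a finite-math-only style hint, x != x may be folded to a constant.
    return False


strict_sign = math.copysign(1.0, add_zero_strict(-0.0))            # +1.0
relaxed_sign = math.copysign(1.0, add_zero_no_signed_zeros(-0.0))  # -1.0

nan = float("nan")
print(strict_sign, relaxed_sign, is_nan_strict(nan), is_nan_finite_math_only(nan))
```

Each hint removes a corner case the optimizer would otherwise have to preserve, and each one is also a place where two engines honoring the hint differently could disagree.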

> As additional context, the JVM, for its part, is considering removing strictfp and only supporting the strict semantics.

Thanks @sunfishcode for this added context! I didn't know about this update; it seems the motivation is to consolidate math library variants. On a closer look, Java's strictfp appears to be a bit more specialized and is not a good representation of the range of control points offered by gcc/clang. -ffast-math and its associated flags seem to be quite popular in native repos, judging from a quick GitHub search. I plan to look into the uses of the above flags and the associated pragmas.

Typical projects which use gcc/clang's -ffast-math are compiling to native code. Application developers using them typically test all the native code variants that they themselves build. And since all popular hardware ISAs have fully deterministic floating-point behavior, once the developers have tested those variants, they can be fairly sure that the behavior won't change for their users. Currently, WebAssembly's floating-point works this way too; it's fully deterministic, just like hardware ISAs, so existing long-standing developer assumptions about being able to add -ffast-math and test that it "works for them" are upheld.

A WebAssembly-level -ffast-math flag would mean that WebAssembly no longer resembles a hardware ISA in this regard, and no longer upholds these assumptions. Developers would test the binary they produce, but when shipping wasm to their users, their users should expect to be able to run it on different hardware or different engines. With fast-math-like flags at the WebAssembly level, WebAssembly wouldn't behave like an ISA, and users could see different floating-point results than the developer tested with.

It's also worth pointing out that projects using these flags aren't missing out when compiling to WebAssembly today. For example, many of the optimizations enabled by -fassociative-math are loop-oriented optimizations that LLVM is able to do before producing WebAssembly.
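A loop-oriented reassociation of the kind mentioned here can be sketched in Python as a sum reduction split across four independent accumulators, the shape a vectorizer typically wants. This is only an illustration of why the transformation needs an associativity hint; it is not LLVM's actual output, and the function names and the input data are made up for the example.

```python
import math
from typing import Sequence


def sum_sequential(values: Sequence[float]) -> float:
    # Strict source order: one accumulator, one rounding per element.
    total = 0.0
    for v in values:
        total += v
    return total


def sum_four_lanes(values: Sequence[float]) -> float:
    # Reassociated order: four running sums (one per SIMD-style lane),
    # combined at the end. Only legal if addition may be reassociated.
    acc = [0.0, 0.0, 0.0, 0.0]
    for i, v in enumerate(values):
        acc[i % 4] += v
    return (acc[0] + acc[2]) + (acc[1] + acc[3])


big = 2.0 ** 60  # neighbouring doubles are 256 apart here, so adding 0.5 is lost
data = [big, 0.5, -big, 0.5] * 2

sequential = sum_sequential(data)  # 0.5
four_lanes = sum_four_lanes(data)  # 2.0
exact = math.fsum(data)            # 2.0, the correctly rounded sum

print(sequential, four_lanes, exact)
```

On this input the reassociated sum happens to be the more accurate one, but that is an accident of the data. The point is that the two orders give different answers, and once the toolchain has picked one, the resulting Wasm computes it deterministically on every engine.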

Yes, any decisions that affect float precision/behavior should be baked into the Wasm module by the tools. New instructions can be added for behavior that cannot be expressed with the current ones.

We had a discussion on this topic at the last CG meeting, with the addition of new instructions as the alternative means to the goal. There seems to be general interest in solving this through the latter direction. I don't have objections if developer expectations on performance gains can be upheld by toolchain optimizations and by introducing the missing instruction variants. In that case, there is no strong justification for propagating these flags to the runtime and compromising Wasm's level of abstraction. It will be good to understand which semantic-variant instructions are necessary to reach this goal. We have a few new instructions identified in the context of SIMD, which may need to be extended to match the instructions that tools need to express fast-math flags fully. I will continue looking in that direction.

Thanks for the feedback.

Content used for CG discussion. Wasm ffast-math.pptx
