Design: Proposal: Variable Length SIMD

Created on 6 Nov 2017 · 12 Comments · Source: WebAssembly/design

The normal SIMD standard is likely too far along, but I'd like people to consider including a vector length parameter with the SIMD arguments. This would eliminate the need to add opcodes for 128-, 256-, and 512-bit vectors. Non-standard vector lengths could be emulated efficiently in the interpreter itself.

Example:
SIMD.ADD 5:

  • Pop 5 values off the stack for vector 1
  • Pop 5 values off the stack for vector 2
  • Add the vectors
  • Push the 5 value result onto the stack
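The four steps above can be sketched as a tiny interpreter routine. This is only an illustration of the proposed semantics, not part of any WebAssembly specification: the function name `simd_add`, the use of a Python list as the value stack, and the lane ordering (lanes keep their stack order) are all assumptions.

```python
def simd_add(stack, n):
    """Apply the proposed `SIMD.ADD n` to a list used as a value stack (top = end)."""
    if len(stack) < 2 * n:
        raise ValueError("stack underflow")
    # Pop n values for vector 1 (the top n entries), then n values for vector 2.
    v1 = stack[len(stack) - n:]
    del stack[len(stack) - n:]
    v2 = stack[len(stack) - n:]
    del stack[len(stack) - n:]
    # Add lane by lane and push the n-value result back onto the stack.
    stack.extend(a + b for a, b in zip(v2, v1))
    return stack


stack = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
simd_add(stack, 5)
print(stack)  # [11, 22, 33, 44, 55]
```

Because `n` is fixed in the instruction, an engine could in principle pick registers for the operation at compile time; nothing here depends on a length known only at run time.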

All 12 comments

FYI, there's a SIMD design repo here: https://github.com/WebAssembly/simd .

Generally, WebAssembly's instructions map one-to-one with CPU hardware features. I'm not aware of any CPU that supports variable-length SIMD...

It's actually the way SIMD was initially implemented in machines such as the CDC Star. In any case, WASM will run on machines with varying capabilities. It would be nice to have code work as efficiently on AVX-512 machines as it would on smaller NEON-based ARM devices.

Interesting, I didn't know about that. Are you writing a compiler for that 42-year-old CPU as a research project?

Understanding the history is an important part of understanding why SIMD operations are organized the way they are. The Cray 1 went to fixed length SIMD because it was a register machine and required limited sizes for that reason. WASM is not a register machine but a stack machine. A stack machine need not have the limitations of a register machine.

Wasm is really both a stack machine and a register machine. Locals are meant to be used by compilers more-or-less as registers are; and implementations typically treat them that way too (allocate them to registers, spilling as necessary). It seems unlikely that we'd want to have an entire feature class that can be expressed only via the stack-machine nature of wasm. You could of course argue that the multi-value proposal is a step in that direction, but it's a pretty small extension compared to gating access to all of SIMD on a toolchain's ability to efficiently use the stack.

The Cray 1's SIMD was variable-length; registers had a maximum length, but the machine could use any length up to that maximum. Today's AVX-512, ARM SVE, and even GPUs could theoretically support similar programming models. So, a true "variable-length SIMD" feature is not entirely implausible, though it would face some big questions (Who would use it? Who would implement it? Is it better than alternatives?).

That said, the example of SIMD.ADD 5 isn't variable-length -- the length is an immediate, which is quite different.
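To make the distinction concrete, here is a minimal sketch of the runtime-length model described above, in which each step processes up to a hardware maximum and the actual length is chosen while the program runs. The names `vector_add` and `max_vl` are hypothetical, and the default maximum of 8 lanes is an arbitrary assumption rather than a figure from any real machine.

```python
def vector_add(a, b, max_vl=8):
    """Add two equal-length lists in chunks of at most max_vl lanes.

    Returns the result and the list of vector lengths used per step.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    out, lengths = [], []
    i = 0
    while i < len(a):
        # The vector length is decided at run time, bounded by the maximum.
        vl = min(max_vl, len(a) - i)
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        lengths.append(vl)
        i += vl
    return out, lengths


result, lengths = vector_add(list(range(10)), [1] * 10, max_vl=4)
print(result)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(lengths)  # [4, 4, 2]
```

The same code runs unchanged for any `max_vl`, which is the property a true variable-length design would offer. With an immediate such as the 5 in `SIMD.ADD 5`, the length is baked into the instruction instead.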

Concerning the stack-machine nature of wasm, consider code like this:

  SIMD.ADD 5
  iconst 1
  SIMD.ADD 5

Engines would need to detect when things like this happen, as they'd need to generate shuffle code on all modern architectures. It seems simpler for engines, and humans too, if SIMD values are first-class and shuffles are explicit operations.
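A symbolic trace shows where the shuffles come from. The sketch below assumes the pop order from the original example (the top values form vector 1) and uses made-up slot names: `a*` for older stack values, `x*` and `y*` for the first operands, `r*` and `s*` for results. It models only which stack slots each operation consumes, not any real engine.

```python
def simd_add_trace(stack, n, result_name):
    """Symbolically run `SIMD.ADD n`; return the two operand vectors it consumed."""
    v1 = stack[len(stack) - n:]
    del stack[len(stack) - n:]
    v2 = stack[len(stack) - n:]
    del stack[len(stack) - n:]
    # Push n named result lanes in place of the consumed operands.
    stack.extend(f"{result_name}{i}" for i in range(n))
    return v2, v1


stack = ([f"a{i}" for i in range(4)]
         + [f"x{i}" for i in range(5)]
         + [f"y{i}" for i in range(5)])
simd_add_trace(stack, 5, "r")            # SIMD.ADD 5
stack.append("const1")                   # iconst 1
v2, v1 = simd_add_trace(stack, 5, "s")   # SIMD.ADD 5
print(v2)  # ['a0', 'a1', 'a2', 'a3', 'r0']
print(v1)  # ['r1', 'r2', 'r3', 'r4', 'const1']
```

The first result `r0..r4` ends up split across both operands of the second add and shifted by one lane, so an engine that kept `r` in a vector register would have to rearrange lanes before it could issue the second add.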

The other obvious advantage of a stack-oriented design would be to make non-power-of-two vector sizes more convenient. As a reference point, the current wasm SIMD proposal used to contain some explicit support for 3-element vectors (load3 and store3); however, it was removed because these features are somewhat complex to implement on popular architectures (they interact with common sandboxing techniques), and because there wasn't much perceived demand for them. So at least on the current path, non-power-of-two sizes don't seem to be a high priority.

Agreed with @dschuff that a stack based interface is not desirable. Rather, if you'd have this feature, it would be mapped to N SIMD registers, in this case e.g. 2 SSE2 registers or a single AVX register.
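The register counts mentioned above follow from simple arithmetic, sketched here under the assumption of 32-bit lanes (the thread does not state a lane width). The helper name `registers_needed` is hypothetical; 128 and 256 are the register widths of SSE2 and AVX in bits.

```python
def registers_needed(lanes, lane_bits, register_bits):
    """Number of fixed-width registers needed to hold `lanes` values."""
    total_bits = lanes * lane_bits
    # Ceiling division using integers only.
    return -(-total_bits // register_bits)


print(registers_needed(5, 32, 128))  # 2 (SSE2, 128-bit registers)
print(registers_needed(5, 32, 256))  # 1 (AVX, 256-bit registers)
```

In both cases some lanes go unused (three of eight in either mapping), which is part of the cost of non-power-of-two vector sizes on fixed-width hardware.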

I personally think having free-form variable-length SIMD would be rather elegant, but it is also a bit high-level, in the sense that it would require a lot of wasm-specific code generation in language backends to make full use of it.

Does anyone know why variable-length SIMD architectures fell out of favor? Understanding that might be informative in deciding whether we should add variable-length SIMD to Wasm.

A declared bit width facilitates selecting an appropriate host-machine implementation and allows sequential logical operations to be composed into the same hardware operation.

@eholk

The original vector machines had CPU speeds closely matched to memory speeds, so they loaded values directly from memory. When CPU speed started to outrun memory, the fix was to use registers, which had faster access times.

WASM doesn't know how many registers its target will have, so it's better to let the underlying implementation decide how to map to registers, as is done with scalars now.

I think @JohnSully is unfairly being mocked in this thread. The recently published ARM Scalable Vector Extension scales automatically across all vector lengths without recompilation. The RISC-V vector extension proposal also offers the same kind of model ... https://github.com/riscv/riscv-v-spec

This being said, the vector model of old, though it has clear benefits, has also problems of its own. Try to do a vectorized JSON parser (https://github.com/lemire/simdjson) with this model... could be possible, but it brings about difficulties.

Definitely, the RISC-V vector ISA is gaining momentum and will have to be supported at some point.
