Design: Request for opcode: dup

Created on 16 Aug 2020 · 25Comments · Source: WebAssembly/design

Hi,

I would like to request a webassembly opcode: dup. The semantics would be that the top of stack is duplicated.

Regards,
Windel

Source

windelbouwman

❤1 👍1

Most helpful comment

Besides the typing issue discussed above, there's probably no downside in the sense you're thinking of. It takes a surprising amount of work and time to spec new instructions and get them implemented in every tool and engine out there, though, so generally only changes with significant benefits get all the way through the process. Unfortunately that means that there are a lot of "nice to have" proposals, even tiny ones like adding a single dup instruction, that don't make the cut. I'm not saying we'll never add dup, but if we do it will because it solves an important problem so lots of folks agree it's important to add and will be motivated to implement and maintain it throughout the ecosystem.

This is one of the costs of standards-based work; if Wasm were controlled by a single party, it would be easy to add a single instruction like dup. Since it's not, you first have to get a lot of different people with different priorities and opinions to agree that adding dup is both a good idea and worth their time and effort. Because of this extra consensus-building work, the community can have more confidence in the robustness and benefits of the proposals that do make it through the process.

tlively on 22 Feb 2021

👍3

All 25 comments

Hi @windelbouwman, I took the liberty of transferring this issue to the design repo, where spec proposals usually start. You can see the full process all changes would need to go through to become standardized here.

Can you explain the motivation for this proposal a bit more? dup can currently be emulated by a local.tee instruction followed by a local.get instruction, so as far as I can see this proposal would only be a code size improvement. Do you have an estimate on what the code size savings could be if we introduced a dup instruction?

tlively on 16 Aug 2020

Not only the code size is improved, but also the readability of the code, the speed of the direct interpretation, also there will be no need of a local variable (that further obscures the simple operation of a duplication). This dup instruction is discussed in few other issues, maybe someone can link them. It is really interesting that instructions like clz ctz found their place into the basic WASM byte code instruction set, but not dup and i32.not_equal_z.

verbessern on 17 Aug 2020

Hi!

Thanks for moving this request. I understand the request for a concrete use case for this instruction.

The specific use case where this would be handy, is when implementing a for loop. As a gimmick, I tried to make a python -> wasm compiler, and when implementing the for loop, it would be handy to leave the loop index value on the stack. Now I solved this by creating a new local per for loop, but this results potentially in sloppy code, when for loops are not nested. Some extra logic is required to generate the least amount of locals for the for loops.

Exact spot of the use case is here:

https://github.com/windelbouwman/corepython/blob/master/corepython/src/compile.rs#L205

This is one use case, but I can imagine there are more use cases.

Notice also the JVM has a dup instruction: https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.dup

It might be worth noting that the JVM machine and webassembly machine are pretty similar, so there might be more ideas to copy from the JVM.

Hope this clarifies the request a bit.

Regards,
Windel

windelbouwman on 20 Aug 2020

Do you have an estimate on what the code size savings could be if we introduced a dup instruction?

No, I do not. Interesting question though, how would one create such an estimation?

I can see compilers exploiting this instruction, and spec implementations implementing this pretty straight forward.

windelbouwman on 20 Aug 2020

The function-references proposal introduces a let construct that provides the ability to implement dup. In essence, it allows you to create new locals from stack values. Some have also suggested a pick instruction, (where dup is equivalent to pick 0). This doesn't give you quite what you want though, since you still may need to do some kind of rotate or swap to get the appropriate stack layout.

binji on 20 Aug 2020

I remember discussing with @rossberg a few years ago that the typing of syntactically dead code would become much more interesting if a dup instruction of type [t] -> [t,t] were to be introduced, because one would have to ensure that multiple uses of the same "unknown type" stay consistent with each other.

e.g. the code fragment

i32.clz
drop
i64.clz

is well-typed, but the code fragment

dup
i32.clz
drop
i64.clz

would be ill-typed. But this requires some more sophisticated book-keeping than has been necessary up to this point,

The smallest solution might be to require that dup is explicitly type-annotated. At a quick glance, it looks like Java avoided this problem by being more restrictive in its validation of dead code, in a way that probably wouldn't work for Wasm (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.return).

conrad-watt on 20 Aug 2020

👀1

The function-references proposal introduces a let construct that provides the ability to implement dup. In essence, it allows you to create new locals from stack values. Some have also suggested a pick instruction, (where dup is equivalent to pick 0). This doesn't give you quite what you want though, since you still may need to do some kind of rotate or swap to get the appropriate stack layout.

You are right, a swap operation would also be very convenient. Can I upgrade my request to include both the swap and the dup operations? swap could arguably we called rotate, as is done in the python bytecode, but that might be confusing with the bit rotation operations.

windelbouwman on 21 Aug 2020

Still my primary use case for this would be implementation of a for loop without having to generate locals.

windelbouwman on 21 Aug 2020

I think that a combination of the validation logic that applies to local.tee + local.get should be the base for creating the validation logic of dup, by making the local implicit.

verbessern on 21 Aug 2020

Similar discussion in the interface-types proposal: https://github.com/WebAssembly/interface-types/issues/118

mikevoronov on 21 Aug 2020

@verbessern local.[tee/get] only avoid the issue I outlined above because these instructions explicitly refer to a type-annotated local. This is analogous to requiring an explicit type annotation for dup.

Another solution, avoiding the explicit type annotation, would be to weaken Wasm's validation so that syntactically dead code is no longer validated. i.e. introduce a ⊥ stack type and the typing rules

C ⊢ unreachable : t* -> ⊥
<similar for return, br, br_table>

C ⊢ e* : ⊥ -> t*

I've been told that there were extensive discussions over this during Wasm's initial design.

FWIW, the same issue applies to pick, but not swap.

conrad-watt on 21 Aug 2020

Ignoring any dead code would be a wonderful simplification to make, yes.

Failing that, add i32.dup and friends? :)

That wastes opcodes, but certainly it being polymorphic wasn't useful anyway.

aardappel on 21 Aug 2020

Ignoring any dead code would be a wonderful simplification to make, yes.

Failing that, add i32.dup and friends? :)

That wastes opcodes, but certainly it being polymorphic wasn't useful anyway.

It might be most natural to have it be one opcode with a static parameter, otherwise it wouldn't scale to reference/GC types. This slightly reduces the impact of code size savings though.

Also, in the scheme I gave above, dead code would still have to be parsed for "malformedness"/encoding errors. The spec's distinction between parse/validation-time errors has cropped up as a gotcha in tests before and this would become another example.

Flinging out 4 (really 3) ways forward:

Pursue dup with a static type parameter akin to a local declaration.
Pursue polymorphic dup, relaxing dead code validation (maybe arouses strong opinions?).
Don't pursue dup.
Pursue polymorphic dup as-is, requiring validators to handle some sort of ad-hoc parameterised type? (strongly against)

(also, all of the above but for pick; as @binji pointed out, it generalises dup so it might be preferable)

conrad-watt on 22 Aug 2020

One thing I didn't initially think about. For pick, option (1) is much less ideal, since the following would have to be a type error:

i32.pick 900
drop
i64.pick 900

In an archetypal single-pass validator like the OCaml reference implementation, typing the i32.pick 900 line would require creating ~900 elements at the head of the current stack type in one step. Having to think about typing dead code to this extent seems like a bad smell.

conrad-watt on 22 Aug 2020

I'm not convinced that the losing of the stack polymorphism is a good idea. I think its a nice thing to have.

verbessern on 22 Aug 2020

What's the advantage of stronger dead code validation? It's a corner case that shouldn't come up very often in actual wasm uses, right?

taralx on 22 Aug 2020

@taralx I believe the original motivation for the current approach to typing dead code was to ensure that a well-typed sequence of instructions can be arbitrarily split, and the resulting contiguous subsequences are guaranteed to be well-typed. I don't know if any tooling is relying on this.

@rossberg might have a first-hand memory of the decision-making process.

conrad-watt on 22 Aug 2020

Some browser vendors raised concerns at the time against allowing unvalidated code. I believe there were fears that it could become part of a potential attack vector, which is a reasonable concern. Another option that was discussed was disallowing dead code altogether. The rationale for the current design is summarised here.

The simplest solution would be a dup with a type annotation or a pick instruction with a stack type annotation of the appropriate depth. However, let is more general and can express all stack manipulation operators like pick, swap, or rotate (which you'll need otherwise).

rossberg on 24 Aug 2020

What about the use case of wanting to square or cube a number that is the result of the previous operation? One example is computing the distance between two points. It seems odd to have to put the result of x1 - x2 in a local variable and then get it twice in order to computing the square.

mvolkmann on 20 Feb 2021

Since WebAssembly is meant to be a compilation target rather than a convenient language to write by hand, it’s generally not a problem that certain code patterns require the use of locals. Using stack instructions instead of locals would not affect the performance of optimizing engines. The primary reasons to potentially add stack duplications instructions are to handle values that could not reasonably be stored in normal locals, such as non-nullable reference types (depending on your point of view).

tlively on 21 Feb 2021

I understand that WebAssembly is not typically hand written, but what is the down-side in making that practice easier/nicer by adding an instruction? Beyond dup there could be a sqr instruction to square a number since it is a relatively common operation.

mvolkmann on 21 Feb 2021

tlively on 22 Feb 2021

👍3

Thanks for explaining that Thomas!

mvolkmann on 22 Feb 2021

There may be a hidden benefit to not supporting a dup instruction: it makes reasoning with linear types a great deal easier. Of course, linear types is also a thing for the potential future.