Rust: Reject OOB shufflevector intrinsics

Created on 20 Jun 2020 · 12Comments · Source: rust-lang/rust

Looks like there are some open questions regarding the semantics of shufflevector intrinsics in LLVM:

When the mask is out of bounds, the current semantics are that the resulting element is undef. If shufflevector is used to remove an element from the input vector, then the instruction cannot be removed, as the input might have been poison. The solution is to switch to give poison instead.

So my question is, what are the semantics of those intrinsics in MIR? We had some discussion on Zulip and it seems like it should be one of the two:

OOB shufflevector should be statically ruled out. We could even have the new MIR validation pass check that.
OOB shufflevector returns "uninitialized memory" for the affected elements. (Following this paper, MIR does not have a poison vs undef distinction; our "uninit" is closest to LLVM's poison.)

The former seems definitely the safest ;) but currently this is not enforced. @hanna-kruppe was concerned that stdarch might rely on OOB shufflevector somewhere. If we have consensus that for now, we should statically rule out OOB indices to avoid UB/uninit here, I guess the main open issue is (a) implementing that check and (b) fixing the fallout, if any.

Cc @rust-lang/lang @rust-lang/libs @gnzlbg @Lokathor (not sure who else to ping for stdarch)

A-LLVM A-mir A-simd E-needs-test

Source

RalfJung

Most helpful comment

Do we have tests that ensure this? I could not find any.

RalfJung on 20 Jun 2020

👍2

All 12 comments

Another option is that OOB values wrap to be inbounds.

I think that we should use static denial of bad values, but wrapping is at least _possible_ I suppose.

Lokathor on 20 Jun 2020

If the indices are always known statically then yes, we could statically wrap them.

RalfJung on 20 Jun 2020

I mentioned it because that's how intel shuffles are defined to work by intel. The intrinsic looks at particular bits within the control value and ignores the rest of the bits, so they end up being wrapping.

I'm not yet familiar enough with ARM intrinsics to say how the hardware handles it.

Lokathor on 20 Jun 2020

We can restrict attention to constant shuffle indices. Architectures with "variable" shuffles exist but those capabilities are accessed by separate intrinsics (certainly today, and I think this makes sense in the long term too). However, I don't see any reason to build wrapping into the intrinsics. Hardware behavior is not a good reason IMO, since:

The general shufflevector operation is different from and quite a lot more flexible than many real shuffle instructions, so there's no 1:1 correspondence anyway.
The rustc intrinsics in question are purely internal and perma-unstable, so if index wraparound is desirable in code targeting a particular platform, it can just as well be built into the core::arch functions and other wrappers, without affecting all other platforms.

In addition, although I just said we can focus on constant shuffle indices, I do think it would be nice to have some consistency between the intrinsics with constant indices and those with variable indices (current or future), which is easier to achieve with "hard error"/UB semantics than with implicit wrapping.

hanna-kruppe on 20 Jun 2020

I mentioned it because that's how intel shuffles are defined to work by intel. The intrinsic looks at particular bits within the control value and ignores the rest of the bits, so they end up being wrapping.

That is not immediately helpful though as LLVM makes this undef/poison. We'd still have to specify and perform the wrapping ourselves.

Architectures with "variable" shuffles exist but those capabilities are accessed by separate intrinsics (certainly today, and I think this makes sense in the long term too).

Do we support something like that in Rust already, and are there any open questions around OOB semantics?

RalfJung on 20 Jun 2020

Do we support something like that in Rust already, and are there any open questions around OOB semantics?

I am not sure. Certainly LLVM does not have a target-independent representation of such shuffles today, only target-specific intrinsics whose semantics may be unclear but probably follows the hardware in question (which generally have some defined behavior for those cases, e.g. ARM SVE and RISC-V V make those elements zero). If a target-independent representation is ever introduced, I expect the preferable behavior for LLVM will be "poison in the lanes where the index is OOB" and for Rust/MIR, anything that's at least as strict should be fine.

hanna-kruppe on 20 Jun 2020

So I'm not familiar with LLVM's details here, is it the case that if a shuffle constant is "OOB" then _all_ SIMD lanes become poison after the operation?

Lokathor on 20 Jun 2020

No, it is lane-wise undef (at least according to the quote in the first comment; I haven't verified this).

hanna-kruppe on 20 Jun 2020

Well, I guess I'm a little lost on when you could have one lane be OOB but another not be OOB.

For example, _mm_shuffle_ps uses two bits per lane index and lets you pick from among 4 lanes when shuffling. It's fairly plain that a two bit value can't be OOB within a 4 lane selection. You could of course set bits above the 8th bit, but that doesn't affect the bits used to pick the source lanes for output lanes 0 through 3.

So am I just looking at the wrong intrinsic when trying to come up with an example? Is there a different intrinsic I should be looking at? or put another way, could you show an example call to any intrinsic with a partial OOB control value?

Lokathor on 20 Jun 2020

This is about the simd_shuffleN (for N = 2, 4, 8, ...) intrinsics, which is used in the implementation of many core::arch shuffle/permute intrinsics (including _mm_shuffle_ps). Like I said before, simd_shuffleN is a fair bit more general than many specific instructions (such as shufps), representing the shuffle indices as N i32 constants. See the example linked in the first post for a small self-contained example.

hanna-kruppe on 20 Jun 2020

👍1

Oops, the example I constructed is actually wrong because I forgot about the consecutive numbering of lanes of the two input vectors. Indices 4 and 5 (in that example) aren't OOB, they just refer to the other input vector. Indices that are actually OOB are already rejected: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=c44648853090164522811e89c4937abc

hanna-kruppe on 20 Jun 2020

Do we have tests that ensure this? I could not find any.

RalfJung on 20 Jun 2020

👍2

Was this page helpful?

0 / 5 - 0 ratings