Looks like there are some open questions regarding the semantics of shufflevector intrinsics in LLVM:
When the mask is out of bounds, the current semantics are that the resulting element is undef. If shufflevector is used to remove an element from the input vector, then the instruction cannot be removed, as the input might have been poison. The solution is to switch to give poison instead.
So my question is, what are the semantics of those intrinsics in MIR? We had some discussion on Zulip and it seems like it should be one of the two:
The former seems definitely the safest ;) but currently this is not enforced. @hanna-kruppe was concerned that stdarch might rely on OOB shufflevector somewhere. If we have consensus that for now, we should statically rule out OOB indices to avoid UB/uninit here, I guess the main open issue is (a) implementing that check and (b) fixing the fallout, if any.
Cc @rust-lang/lang @rust-lang/libs @gnzlbg @Lokathor (not sure who else to ping for stdarch)
Another option is that OOB values wrap to be inbounds.
I think that we should use static denial of bad values, but wrapping is at least _possible_ I suppose.
If the indices are always known statically then yes, we could statically wrap them.
I mentioned it because that's how intel shuffles are defined to work by intel. The intrinsic looks at particular bits within the control value and ignores the rest of the bits, so they end up being wrapping.
I'm not yet familiar enough with ARM intrinsics to say how the hardware handles it.
We can restrict attention to constant shuffle indices. Architectures with "variable" shuffles exist but those capabilities are accessed by separate intrinsics (certainly today, and I think this makes sense in the long term too). However, I don't see any reason to build wrapping into the intrinsics. Hardware behavior is not a good reason IMO, since:
shufflevector operation is different from and quite a lot more flexible than many real shuffle instructions, so there's no 1:1 correspondence anyway.In addition, although I just said we can focus on constant shuffle indices, I do think it would be nice to have some consistency between the intrinsics with constant indices and those with variable indices (current or future), which is easier to achieve with "hard error"/UB semantics than with implicit wrapping.
I mentioned it because that's how intel shuffles are defined to work by intel. The intrinsic looks at particular bits within the control value and ignores the rest of the bits, so they end up being wrapping.
That is not immediately helpful though as LLVM makes this undef/poison. We'd still have to specify and perform the wrapping ourselves.
Architectures with "variable" shuffles exist but those capabilities are accessed by separate intrinsics (certainly today, and I think this makes sense in the long term too).
Do we support something like that in Rust already, and are there any open questions around OOB semantics?
Do we support something like that in Rust already, and are there any open questions around OOB semantics?
I am not sure. Certainly LLVM does not have a target-independent representation of such shuffles today, only target-specific intrinsics whose semantics may be unclear but probably follows the hardware in question (which generally have some defined behavior for those cases, e.g. ARM SVE and RISC-V V make those elements zero). If a target-independent representation is ever introduced, I expect the preferable behavior for LLVM will be "poison in the lanes where the index is OOB" and for Rust/MIR, anything that's at least as strict should be fine.
So I'm not familiar with LLVM's details here, is it the case that if a shuffle constant is "OOB" then _all_ SIMD lanes become poison after the operation?
No, it is lane-wise undef (at least according to the quote in the first comment; I haven't verified this).
Well, I guess I'm a little lost on when you could have one lane be OOB but another not be OOB.
For example, _mm_shuffle_ps uses two bits per lane index and lets you pick from among 4 lanes when shuffling. It's fairly plain that a two bit value can't be OOB within a 4 lane selection. You could of course set bits above the 8th bit, but that doesn't affect the bits used to pick the source lanes for output lanes 0 through 3.
So am I just looking at the wrong intrinsic when trying to come up with an example? Is there a different intrinsic I should be looking at? or put another way, could you show an example call to any intrinsic with a partial OOB control value?
This is about the simd_shuffleN (for N = 2, 4, 8, ...) intrinsics, which is used in the implementation of many core::arch shuffle/permute intrinsics (including _mm_shuffle_ps). Like I said before, simd_shuffleN is a fair bit more general than many specific instructions (such as shufps), representing the shuffle indices as N i32 constants. See the example linked in the first post for a small self-contained example.
Oops, the example I constructed is actually wrong because I forgot about the consecutive numbering of lanes of the two input vectors. Indices 4 and 5 (in that example) aren't OOB, they just refer to the other input vector. Indices that are actually OOB are already rejected: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=c44648853090164522811e89c4937abc
Do we have tests that ensure this? I could not find any.
Most helpful comment
Do we have tests that ensure this? I could not find any.