This supersedes https://github.com/rust-lang/rust/issues/44367 - after https://github.com/rust-lang/rust/pull/47743 the unsoundness has been restricted to C FFI. The Rust ABI for vector types currently passes all types by memory.
There are sadly many C libraries that use vector types in their ABIs, some of which are pretty much "fundamental" like some of the short-vector math libraries: libmvec, SVML, etc.
As a summary of the previous issue, currently, the behavior of calling bar, `the following snippet of Rust code is _sometimes_ undefined:
#![feature(repr_simd, simd_ffi)]
#[repr(simd)]
struct F32x8(f32, f32, f32, f32, f32, f32, f32, f32);
impl F32x8 { fn splat(x: f32) -> Self { F32x8(x, x, x, x, x, x, x, x) } }
#[allow(improper_ctypes)]
extern "C" {
fn foo(x: F32x8) -> F32x8;
}
fn main() {
unsafe {
foo(F32x8::splat(0.)); // UB ?
}
}
When both the Rust program and the C library exposing foo are compiled with the same set of target-features such that their ABIs match, then the program above will work as expected.
When the C library is compiled with say AVX2, but the Rust program is compiled with SSE4.2, then Rust will try to pass the F32x8 in two 128-bit wide vector registers to C, while the C code only expects a single 256-bit wide vector. A similar problem occurs in the opposite case.
cc @rkruppe @parched @alexcrichton @eddyb - did I correctly represent the problem ?
A potential solution discussed in https://github.com/rust-lang/rust/issues/44367 would be to completely forbid vector types in FFI functions that do not specify their vector ABI:
extern "C" {
extern "vector-256" fn foo(x: F32x8) -> F32x8;
extern "vector-128" fn bar(x: F32x8) -> F32x8;
// fn baz(x: F32x8) -> F32x8; // ERROR: repr(simd) in C-FFI with unspecified vector ABI
}
let x = F32x8::splat(0.);
foo(x); // x is passed in a single 256-bit wide register
bar(x); // x is passed in two 128-bit wide registers
If the C library linked does not expose the specified ABIs for foo and bar, the program would fail to link, preventing undefined behavior.
If the C library linked does not expose the specified ABIs for foo and bar, the program would fail to link, preventing undefined behavior.
How do we expect linker to know about ABI? Target features or vector-ness is not encoded into symbols at all.
How do we expect linker to know about ABI? Target features or vector-ness is not encoded into symbols at all.
No idea, maybe this isn't possible at all.
Another issue here is how does the programmer know which ABI to use when writing a extern function declaration. The ABI of the C functions is up to whoever compiles the C library.
@gnzlbg the problems here look spot on, thanks for writing this up! I'm personally at a loss of how to solve it :(
Probably, the poorest possible solution to this problem would be to only enable the architecture specific vector types on C FFI, specify their ABI to be a single register, and only allow them when they are guaranteed to work. That is, the above example would look like this:
extern "C" {
#[cfg(target_feature = "avx")]
fn foo(x: __m256) -> _m256; // OK
#[cfg(target_feature = "sse")]
fn bar(x: __m128) -> __m128; // OK
fn baz(x: __m256); // ERROR IFF cfg!(target_feature="avx") is false
}
AFAICT this is guaranteed to work because the __m256 type is only available in C and C++ iff AVX is enabled at compile-time. This is not the case for the "portable packed vectors" extensions of the different C compilers which are always available.
Since the purpose of the C FFI is first and foremost to interface with C, and the architecture specific vector types are the SIMD types that most C libraries use on their APIs this would be enough to cover that use case.
If we ever figure out how to handle "portable packed vectors" (e.g. f32x8) in C FFI, I don't see any reasons why we couldn't relax this approach to enable them. In the mean time, those wanting to interface the portable packed vectors with C would just need to convert from/to the appropriate architecture-specific vector type in the Rust side of things.
Right now the architecture-specific vector types are just normal packed vector types. This might mean that we would need to make them "special".
I want to submit an RFC to fix this, I'd like feedback on the general approach (@parched @Gankro @alexcrichton @rkruppe ):
Some unknowns:
the layout of SIMD types in Rust is unspecified, I don't think this pre-RFC has to specify this in any way, the only thing it says is that we have to be able to know, in the Rust side, what's the C layout, so that we can insert shims when required
the pre-RFC uses extern "C" + #[target_feature] to control the C ABI - I don't know how I feel about this.
simd_ffiThe architecture-specific SIMD types provided in [core::arch] cannot currently
be used in C FFI. That is, Rust programs cannot interface with C libraries that
use these in their APIs.
One notable example would be calling into vectorized [libm] implementations
like [sleef], [libmvec], or Intel's [SVML]. The [packed_simd] crate
relies on C FFI with these fundamental libraries to offer competitive
performance.
Consider the following example
(playground):
extern "C" fn foo(x: __m256);
fn main() {
unsafe {
union U { v: __m256, a: [u64; 4] }
foo(U { a: [0; 4] }.v);
}
}
In this example, a 256-bit wide vector type, __m256, is passed to an extern
"C" function via C FFI. Is the behavior of passing __m256 to the C function
defined?
That depends on both the platform and how the Rust program was compiled!
First, let's make the platform concrete and assume that it follows the x64 SysV
ABI which states:
3.2.1 Registers and the Stack Frame
Intel AVX (Advanced Vector Extensions) provides 16 256-bit wide AVX registers
(%ymm0-%ymm15). The lower 128-bits of%ymm0-%ymm15are aliased to
the respective 128b-bit SSE registers (%xmm0-%xmm15). For purposes of
parameter passing and function return,%xmmNand%ymmNrefer to the same
register. Only one of them can be used at the same time.3.2.3 Parameter Passing
SSE The class consists of types that fit into a vector register.
SSEUP The class consists of types that fit into a vector register and can
be passed and returned in the upper bytes of it.
Second, in C, the __m256 type is only available if the current translation
unit is being compiled with AVX enabled.
Back to the example: __m256 is a 256-bit wide vector type, that is, wider than
128-bit, but it can be passed through a vector register using the lower and
upper 128-bits of a 256-bit wide register, and in C, if __m256 can be used,
these registers are always available.
That is, the C ABI requires two things:
__m256 via a 256-bit wide registerfoo has the #[target_feature(enable = "avx")] attribute !And this is where things went wrong: in Rust, __m256 is always available
independently of whether AVX is available or not1, but we haven't specified how we are
actually compiling our Rust program above:
if we compile it with AVX globally enabled, e.g., via -C
target-feature=+avx, then the behavior of calling foo is defined because
__m256 will be passed to C in a single 256-bit wide register, which is what
the C ABI requires.
if we compile our program without AVX enabled, then the Rust program cannot
use 256-bit wide registers because they are not available, so independently of
how __m256 will be passed to C, it won't be passed in a 256-bit wide
register, and the behavior is undefined because of an ABI mismatch.
1: its layout is currently unspecified but that
is not relevant for this issue since if 256-bit registers are not available they
cannot be used anyways, which is what matters here.
So, first of all, is this a big deal?
Currently, one cannot use SIMD types in C FFI in stable Rust, so technically,
nothing is broken yet, and no, this is not a big deal: stable Rust is still
safe! However, we would like to be able to call C FFI functions without
introducing undefined behavior independently of which -C target-features are
passed, so the example code shown above has to be rejected by the compiler.
Second, you might be wondering: why is __m256 available even if AVX is not
available? That's a good question and the answer is probably that nobody thought
about this much, and we didn't have the proper tools for this back then anyways.
Ideally, one should only be able to use __m256 and operations on it if AVX
is available. Which leads to how can we fix this ?
The most trivial solution would be to just always require
#[target_feature(enable = X)] in C FFI functions using SIMD types, where
"unblocking" the use of each type requires one or two particular feature to be
enabled, e.g., avx or avx2 in the case of __m256.
That is, the compiler would reject the example above with an error:
error[E1337]: `__m256` on C FFI requires `#[target_feature(enable = "avx")]`
--> src/main.rs:7:15
|
7 | fn foo(x: __m25a6) -> __m256;
| ^^^^^^^
And the following program would always have defined behavior
(playground):
#[target_feature(enable = "avx")]
extern "C" fn foo(x: __m256) -> __m256;
fn main() {
unsafe {
union U { v: __m256, a: [u64; 4] }
if is_x86_feature_detected!("avx") {
foo(U { a: [0; 4] }.v);
}
}
}
Note here that:
extern "C" foo is compiled with AVX enabled, so foo takes an __m256foo is guarded with an is_x86_feature_detected, that is, fooAVX is available at run-timeAVX, Rust will insert shims in thefoo to pass it as a 256-bit register. Rust already does this, and#[target_feature] is what allows it to do it. Without the#[target_feature] annotation, Rust does not know that C expects this. @gnzlbg requiring the correct #[target_feature] for any extern function imported sounds like a great solution to me, and I'd be totally down for supporting that.
FWIW allowing __m256 anywhere in a Rust program is intentional because we want to all some parts of the program to use it and other parts to not use it (e.g. just some functions have avx enabled). You could otherwise phrase this as "we don't have great infrastructure for conditionally only allowing it in some parts of the program and not others", alas!
I've submitted an RFC: https://github.com/rust-lang/rfcs/pull/2574
@rust-lang/wg-triage This is a soundness issue, I believe it should have the appropriate label added.
Most helpful comment
@gnzlbg requiring the correct
#[target_feature]for anyexternfunction imported sounds like a great solution to me, and I'd be totally down for supporting that.FWIW allowing
__m256anywhere in a Rust program is intentional because we want to all some parts of the program to use it and other parts to not use it (e.g. just some functions haveavxenabled). You could otherwise phrase this as "we don't have great infrastructure for conditionally only allowing it in some parts of the program and not others", alas!