Rust: Tracking issue for stable SIMD in Rust

Created on 26 Feb 2018 · 70 Comments · Source: rust-lang/rust

This is a tracking issue for RFC 2325, adding SIMD support to stable Rust. There are a number of components here, including:

The initial implementation of this is being added in https://github.com/rust-lang/rust/pull/48513 and the next steps would be:


Known issues

  • [ ] [is_target_feature_detected! takes different arguments than #[target_feature]](https://github.com/rust-lang-nursery/stdsimd/issues/348)
B-RFC-approved C-tracking-issue T-lang T-libs final-comment-period


My one request for the bikeshed (which the current PR already does and may be obvious, but I'll write it down anyway): Please ensure they're not all in the same module as things like undefined_behaviour and [un]likely, so that those rust-defined things don't get lost in the sea of vendor intrinsics.

What will be the story for external LLVM? (lacking MCSubtargetInfo::getFeatureTable())

@scottmcm certainly! I'd imagine that if we ever stabilized Rust-related intrinsics they'd not go into the same module (they probably wouldn't even be platform-specific).

@cuviper currently it's an unresolved question, so if it doesn't get fixed it means that using an external LLVM would basically mean that #[cfg(target_feature = ...)] would always expand to false (or the equivalent thereof)
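For context on the two kinds of check in play: `#[cfg(target_feature = ...)]` and `cfg!` are resolved by rustc when the crate is compiled, from the feature set the compiler believes is enabled, while run-time detection asks the CPU the program is actually running on. The sketch below contrasts the two for AVX2; the function names are made up for illustration, and the run-time path assumes an x86/x86_64 target.

```rust
/// Compile-time check: true only if this crate was built with AVX2 enabled,
/// e.g. via `-C target-feature=+avx2`.
pub fn avx2_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "avx2")
}

/// Run-time check on x86/x86_64: asks the CPU the program is running on.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn avx2_available_at_run_time() -> bool {
    is_x86_feature_detected!("avx2")
}

/// Other architectures have no AVX2, so the answer is always false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn avx2_available_at_run_time() -> bool {
    false
}
```

If compile-time support were always reported as false, code gated on `cfg!` would silently take its fallback path, while the run-time check would still work.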

I'd imagine that if we ever stabilized Rust-related intrinsics they'd not go into the same module (they probably wouldn't even be platform-specific).

One option raised in the RFC thread (that I personally quite like) was stabilizing std::intrinsics (only the module), keep the stable rust intrinsics in that module (they can already be imported from that location due to a long-standing bug in stability checking) and put these new platform-specific intrinsics in submodules. IIUC this would also satisfy @scottmcm's request.

To be explicit, under that plan the rustdoc page for std::intrinsics would look like this:


Modules

  • x86_64
  • arm
  • ...

Functions

  • copy
  • copy_nonoverlapping
  • drop_in_place
  • ...

Another naming idea I've just had. Right now the feature detection macro is is_target_feature_enabled!, but since it's so target specific it may be more apt to call it is_x86_target_feature_enabled!. This'll make it a pain to call on x86/x86_64, though, which could be a bummer.

Why keep all the leading underscores for the intrinsics? Surely even if we keep the same names as what the vendors chose, we can still remove those signs, right?

The point is to expose vendor APIs. The vendor APIs have underscores. Therefore, ours do too.

It is debatable that those underscores are actually part of the name. They only have one because C has no modules and namespacing, AFAICT.

I would be happy to drop the topic if it had already been discussed at length, but I couldn't find any discussion specific to the leading underscores.

@nox https://github.com/rust-lang-nursery/stdsimd/issues/212 --- My comment above is basically a summary of that. I probably won't say much else on the topic.

@nox, @BurntSushi Continuing the discussion from there... since it hasn't been mentioned before:

Leading _ for identifiers in Rust often means "this is not important", so just taking the names directly from the vendors may wrongly give this impression.

@nox @Centril the recurring theme of stabilizing SIMD in Rust is "it's not our job to make this nice". Any attempt made to make SIMD different than what the vendors define has ended with uncomfortable questions and intrinsics that end up being left out. To that end the driving force for SIMD intrinsics in Rust is to get anything compiling on stable.

Crates like faster are explicitly targeted at making SIMD usage easy, fast, and ergonomic. The standard library's intrinsics are not intended to be widely used, nor used for "intro level" problems. Leveraging the SIMD intrinsics is quite unsafe (due to target feature detection/selection) and can come at a high cost if used incorrectly.

Overall, again, the goal right now is not to enable ergonomic SIMD in Rust, but any SIMD in Rust. Following exactly what the vendors say is the easiest way for us to guarantee that all Rust programs will always have access to vendor intrinsics.

I agree that the leading underscores are a C artifact, not a vendor choice (the C standard reserves identifiers of this form, so that's what C compilers use for intrinsics). Removing them is neither "trying to make it nicer/more ergonomic" (it's really only a minor aesthetic difference) nor involves any per-intrinsic judgement calls. It's a dead simple mechanical translation for a difference in language rules, almost as much as __m128 _mm_foo(); is mechanically translated to fn _mm_foo() -> __m128;.
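To make the "mechanical translation" concrete: Intel's C header declares `__m128 _mm_add_ps(__m128 a, __m128 b)`, and `std::arch::x86_64` exposes an intrinsic with the same name that takes and returns `__m128`. A minimal sketch, assuming an x86_64 target (where SSE is part of the baseline) and falling back to scalar code elsewhere; `add4` is a made-up helper name.

```rust
/// Adds two groups of four f32 lanes element-wise.
#[cfg(target_arch = "x86_64")]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    // Same names as the C header, leading underscores included.
    use std::arch::x86_64::{__m128, _mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    let mut out = [0.0f32; 4];
    // SAFETY: SSE is part of the x86_64 baseline, and the unaligned load/store
    // intrinsics read/write exactly four f32 values from/to valid arrays.
    unsafe {
        let va: __m128 = _mm_loadu_ps(a.as_ptr());
        let vb: __m128 = _mm_loadu_ps(b.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
    }
    out
}

/// Scalar fallback for every other architecture.
#[cfg(not(target_arch = "x86_64"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```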

@rkruppe do we have a rock solid guarantee that no vendor will ever in the future add the same name without underscores?

@alexcrichton

@rkruppe do we have a rock solid guarantee that no vendor will ever in the future add the same name without underscores?

Can't speak for CPU vendors, but the probability seems very very low. Why would they add an intrinsic where the difference is only an underscore..? Further, as Rust's influence grows, they might not do this simply because of Rust.

A name like mm_foo (no leading underscore at all) is not reserved in the C language, so it can't be used for compiler-supplied extensions without breaking legal C programs. There are a few theoretical possibilities for a vendor to nevertheless create intrinsics without leading underscores:

  • they could expose it only in C++ (with namespacing) -- or, for that matter, another language that isn't C
  • they could break legal C programs (very unlikely, and I'll eat my hat if GCC or Clang developers accept this)
  • A future version of C adds some way of doing namespacing, and people start using it for intrinsics

All extremely unlikely. The first one seems like the only one that doesn't sound like science fiction to me, and if that happens we'd have other problems anyway (such intrinsics may use function overloading and other features Rust doesn't have).

It is debatable that those underscores are actually part of the name. They only have one because C has no modules and namespacing, AFAICT.

This. The whole point is that the underscore-leading names were chosen so as to specifically not clash with user-defined functions. Which means they should never be using non-underscore names. It's against well-established C conventions. Hence, we should just rename them to follow Rust conventions, with no real chance there will be any name clash in the future, provided the vendors stay sane and respect C conventions.

@Centril "probability seems very very low" is what I would say as well, but we're talking about stability of functions in the standard library, so "low probability" won't cut it unfortunately.

@rkruppe I definitely agree, yeah, but "extremely unlikely" to me says "follow the vendor spec to the letter and we can figure out ergonomics later".

Another point worth mentioning for staying exactly to the upstream spec is that I believe it actually boosts learnability. You'll have instant familiarity with any SIMD/intrinsic code written in C, of which there's already quite a lot!

If we stray from the beaten path then we'll have to have a section of the documentation which is very clear about defining the mappings between intrinsic names and what we actually expose in Rust.

I don't think renaming (dropping the leading underscore or any other alteration) is useful. This is simply not the goal and only introduces pain points. I cannot think of a reason other than "I like that more" to justify it. It only introduces the possibility of naming clashes, and "very very unlikely" is not convincing because we can prevent this 100% by not doing it at all.

I think it's the best choice to follow the vendor naming scheme as closely as possible, and I think we should even break compatibility if we ever introduce an error in the "public API", rather than doing some renaming like _mm_intr_a to _mm_intr_a2 and starting to diverge from the exact naming scheme introduced by the vendor.

@alexcrichton But as @rkruppe said, removing the leading underscore isn't about ergonomics, it's about not porting C defects to Rust blindly.

Sorry for the double post, but I also want to add that arguing that a vendor may release an unprefixed intrinsic with the same name as a prefixed one is to me as hypothetical as arguing that bool may not be a single byte on some platform we would like to support.

@nox but why stop by the _? We could also fully rename the functions, turning ps and pd into f32 and f64, which would be something "more Rust". It's somewhat arbitrary to just remove the leading underscore. And we could argue back and forth about what is ergonomics and what isn't, but I don't think there is a line between the two that everybody would agree on.

@pythoneer Because the name is what the vendor decided, with a leading underscore because of nondescript limitations of C.

@nox and the explicit goal of stdsimd is to expose this (however defect) vendor defined interface.

@nox and the explicit goal of stdsimd is to expose this (however defect) vendor defined interface.

Interface, sure, but not necessarily the naming conventions!

@alexreg ps is also a naming convention, do you want that also to be changed?

@alexcrichton

"low probability" won't cut it unfortunately.

I think it should. This low probability isn't like 10% or even 1%, but like 0.00001% or so (yeah; I added a bunch of 0s, but I think it is justified). We can also make the probability 0% by notifying vendors of our naming convention so that they never add both _abc and abc.

@Centril I tend to agree... I mean, have you ever seen a version of the C stdlib or a compiler that defines intrinsics without using underscores to prefix their names? I haven't.

@Centril
Somehow I doubt vendors care what the Rust community thinks about how they should name their intrinsics. You think they're gonna keep a list of how everyone wants things to be named and follow that strictly?

Who cares if it has underscores? All this vendor specific stuff will just be wrapped by cleaner nicer to use APIs anyways. If you want it to have Rust-like names for your types, publish a crate with type aliases over the vendor intrinsic names, put it on your resume and call it a day.

There is also tons of documentation using the names those vendors created. Do we really want to create extra confusion and add the burden of having to maintain our own documentation on the SIMD intrinsics? Seems like there's a lot more useful things people could be working on than arguing over two characters preceding the type names... Like maybe... implementing SIMD.

@tdbgamer

Somehow I doubt vendors care what the Rust community thinks about how they should name their intrinsics.

All I can say is that we shouldn't underestimate the pull a language like Rust can have, especially as our reach grows.

@Centril
Sure, and if this were something intended for everyday users of Rust I'd agree that intuitive, rust specific names matter. But my understanding is that this isn't really intended to be used by everyone. This will need to be wrapped in a much more user-friendly library for us mortals to use regardless of how these types are named. The whole point of this is to just give people access to it now so we can start building more ergonomic libraries on top of it.

I want to emphasize again that I don't think there are any "ergonomics" or "accessibility" gains, or anything like that, to be had from removing the leading underscore from these identifiers. The names are far, far down the list of problems with this kind of intrinsic, and removing the underscore is only a minuscule change to the names. It would be solely a consistency thing.

And I'm increasingly unconvinced whether that's worth doing. Muscle memory might make it so that the kind of person most likely to use these intrinsics in Rust would be more annoyed than anything by the difference, even though it is completely mechanical.

Porting naming conventions simply does not make sense though.

but why stop by the _

Because you can google for mm_add_pi32 and it'll include _mm_add_pi32. If you google for _mm_add_f32 you get nothing.

But I do think overall that having the exact same name as https://software.intel.com/sites/landingpage/IntrinsicsGuide/ is a good plan.

@alexreg

have you ever seen a version of the C stdlib or a compiler that defines intrinsics without using underscores to prefix their names?

FWIW the Arm NEON intrinsics don't have an underscore but, AFAIK, they're only available if you include the appropriate header.

@alexcrichton Hypothetically speaking, if a vendor specified how they preferred the intrinsics to be modularized and named in Rust, would you be fine with that?

Underscore prefixes aren't just a convention, they are reserved by the C Standard for compilers and vendors to use. So if vendors used any other names, that would straight up be considered breaking, as those identifiers are not reserved:

1 Each header declares or defines all identifiers listed in its associated subclause, and
optionally declares or defines identifiers listed in its associated future library directions
subclause and identifiers which are always reserved either for any use or for use as file
scope identifiers.

— All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
— All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
— Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise (see 7.1.4).
— All identifiers with external linkage in any of the following subclauses (including the
future library directions) and errno are always reserved for use as identifiers with
external linkage.184)
— Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.

2 No other identifiers are reserved. If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.

Therefore this is clearly just a C artifact.

Thanks for providing the supporting evidence, @CryZe. I still strongly believe we shouldn't be porting a "C artifact", as you correctly point out.

Underscore prefixes aren't just a convention, they are reserved by ...

Do the historical details of why the underscores are there really matter?
We are implementing a well-known third-party interface and they are part of the spec, that's what is important.

The point is that the underscore prefixes are a namespacing scheme introduced in the C standard for the compilers and vendors. So for all intents and purposes, that's just how C mangles its namespaces into the identifiers.

I'm definitely not entirely sure if it really does make sense to remove those prefixes in Rust, as all the C-based documentation has them. However, I also want to point out that when looking at C documentation you already have to do a certain amount of "translation" to actually use it in Rust, as there are not only syntactical differences but also differences in the type names, as seen in Rust's libc crate: c_int, c_short, c_long, ...

So it wouldn't be unprecedented to remove the namespacing from the vendor intrinsics, as 1. we already have different names in libc, at least for the types, and 2. the underscore already has a different meaning ascribed to it in Rust, which could lead to potential footguns where you forget to use a value, but no warning is shown because you thought you were supposed to use an underscore prefix with these vendor intrinsics.

  2. the underscore already has a different meaning ascribed to it in Rust, which could lead to potential footguns where you forget to use a value, but no warning is shown because you thought you were supposed to use an underscore prefix with these vendor intrinsics.

If it weren't for this, I'd be in favour of keeping them for documentation consistency since they're so low-level.

However, given all of the arguments on the topic of how harmless it would be to remove them, I think that integrating properly with Rust's "meant to be unused" conventions is more important.

(After all, the consistency and comprehensiveness of these sorts of compile-time checks and lints are the reason Rust's strengths can't simply be retrofitted into established languages like C++ which need to retain backwards compatibility with their old design shortcomings.)
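For reference, the "meant to be unused" convention mentioned above: a leading underscore on a local binding tells rustc the binding is intentionally unused, which silences the `unused_variables` lint. A tiny sketch with a made-up function name; `#[deny(unused_variables)]` turns the usual warning into an error, so the function compiles only because of the underscore.

```rust
// Deny the lint so that an unprefixed unused binding would fail to compile.
#[deny(unused_variables)]
pub fn lint_demo() -> i32 {
    // Never read, but no diagnostic: the leading underscore opts out of the lint.
    let _scratch = 40;
    let answer = 2;
    answer
}
```

Renaming `_scratch` to `scratch` turns this into a compile error, which is the kind of signal the comment above worries could be muddied.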

Procedural note: this tracking issue has gained ~40 comments about leading underscores in the space of less than 24 hours. Many of the posts appear to be re-iterating points that have already been made earlier in the thread. Before commenting, please consider whether your argument (or something very similar) has already been made.

I've posted a PR for renaming the is_target_feature_detected! macro.

@rfcbot fcp merge

While we merged this into the standard library relatively recently, SIMD has been in the works for a very long time now and I think we're in a very good place to stabilize it. I'd like to propose that this tracking issue be stabilized, namely the std::arch module and the x86 and x86_64 submodules. Language features stabilized here include #[target_feature], with the full list of things being stabilized still at the top of this issue.

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

  • [x] @Kimundi
  • [x] @SimonSapin
  • [x] @alexcrichton
  • [x] @aturon
  • [x] @cramertj
  • [x] @dtolnay
  • [x] @eddyb
  • [x] @joshtriplett
  • [ ] @nikomatsakis
  • [x] @nrc
  • [x] @pnkfelix
  • [x] @scottmcm
  • [x] @sfackler
  • [x] @withoutboats

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot approved

@rfcbot reviewed

(Whoops)

:bell: This is now entering its final comment period, as per the review above. :bell:

Assuming the FCP progresses smoothly I've posted a PR implementing the stabilization here at https://github.com/rust-lang/rust/pull/49664

Will it be possible to migrate to something like #[target(feature="..")] and related macros in the future? I know it's a bit late to rename things, but it will be unfortunate if we'll end up with duplicated functionality.

Is cfg(target_feature) still supposed to be context sensitive? That is still a terrible idea, but it doesn't appear to be so on nightly.

@Zoxc you may wish to comment on https://github.com/rust-lang/rust/issues/42515, the issue dedicated to that.

@newpavlov at this point I think it's a bit late to hold up stabilization of SIMD on a pre-RFC, but if that ends up being stabilized we can always rename via deprecation.

The final comment period is now complete.

I don't know if it's too late to still tune things here, but the original RFC had two features that were changed during the discussion over there:

  • the submitted RFC put all intrinsics in std::arch::*, the revised RFC in std::arch::{arch_name}.
  • the submitted RFC used is_feature_detected! for run-time feature detection, the revised RFC uses is_{arch_name}_feature_detected!

The RFC was accepted before those changes were made. The changes were made in the RFC at the end of February, implemented at the beginning of March, and the FCP went through mid April. Right now we have ~2 months of experience with these changes.

In any case, going through the RFC, I cannot pinpoint any concrete argument for why:

  • the intrinsics of each architecture should be in a different std::arch::{arch_name} module,
  • the architecture name should be part of the is_..._feature_detected! macros.

In particular, std::arch only contains one single module, the one of the current architecture, and that's it. Also, there is only one is_..._feature_detected! macro re-exported, the one of the current architecture.

These last-minute changes make it more painful than necessary to write code even for x86, where one has to:

#[target_feature(enable = "sse3")]
unsafe fn foo() {
    // Import the intrinsics for whichever x86 flavor is being compiled.
    #[cfg(target_arch = "x86")] use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")] use core::arch::x86_64::*;
    /* ... */
}

all over the place, or at the top level, to avoid having to do this all over the place. Things don't get better when targeting multiple architectures. What before was horrible:

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "sse4.2"))]
#[cfg_attr(any(target_arch = "arm", target_arch = "aarch64"), target_feature(enable = "neon"))]
unsafe fn foo() {
    use core::arch::*;

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
        if is_feature_detected!("avx2") { ... } else { ... }
    }
    #[cfg(any(target_arch = "arm", target_arch = "aarch64"))] {
        if is_feature_detected!("crypto") { ... } else { ... }
    }
}

now is worse:

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "sse4.2"))]
#[cfg_attr(any(target_arch = "arm", target_arch = "aarch64"), target_feature(enable = "neon"))]
unsafe fn foo() {
    #[cfg(target_arch = "x86")] use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")] use core::arch::x86_64::*;
    #[cfg(target_arch = "arm")] use core::arch::arm::*;
    #[cfg(target_arch = "aarch64")] use core::arch::aarch64::*;

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
        if is_x86_feature_detected!("avx2") { ... } else { ... }
    }
    #[cfg(target_arch = "arm")] {
        if is_arm_feature_detected!("crypto") { ... } else { ... }
    }
    #[cfg(target_arch = "aarch64")] {
        if is_aarch64_feature_detected!("crypto") { ... } else { ... }
    }
}

This is particularly worrying if we want to add new "feature sets" for ergonomics like simd128 and simd256 since before the changes the above would just become:

#[target_feature(enable = "simd128")]
unsafe fn foo() {
    use core::arch::*;
    if is_feature_detected!("crypto") { ... } else { ... }
}

I remember that they sounded like a potentially good idea to me back then, so I did not give them more thought (I was more in the "I want SIMD now" mood). But now that the love story has faded and I've had the chance to use them a couple of times, I've clashed against them every single time:

Anyways, can somebody summarize why doing those two changes were a good idea?

In particular for the first change of putting the intrinsics in std::arch::{arch_name}, AFAIK we are never going to add more modules to std::arch because that would mean that the current code is being compiled for two archs at the same time, and in that case, one arch shouldn't be able to access the intrinsics of the other anyways. For the run-time feature detection macros, the benefits are smaller (but still there), since each arch has different intrinsics. But one idiom I would like to use is:

#[cfg(target_arch = "arm")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { ... }

#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { ... }

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { ... }

fn foo() {
    // `bar` is an `unsafe fn`, so calling it requires an `unsafe` block.
    if is_feature_detected!("simd128") { unsafe { bar() } } else { fallback() }
}

and the named macros wouldn't allow that.
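For comparison, this is roughly what run-time dispatch looks like with the named macro that was stabilized, written for x86/x86_64 only. A minimal sketch: `sum`, `sum_avx2`, and `sum_fallback` are hypothetical names, and the AVX2 version calls no intrinsics directly; it only lets the compiler use AVX2 inside that one function.

```rust
/// Portable fallback: wrapping sum of all elements.
fn sum_fallback(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Same loop, but compiled with AVX2 enabled so the optimizer may vectorize it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Picks an implementation at run time.
pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: we just checked that the running CPU supports AVX2.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_fallback(xs)
}
```

Supporting another architecture means adding another `cfg` block with that architecture's own macro, which is the duplication the comment above objects to.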


There are two ways of fixing this in a backwards compatible way:

  • re-exporting all of std::arch::{arch_name}::* via, e.g., std::arch::current::*
  • adding an is_feature_detected!("...") macro that dispatches to the named ones depending on the architecture.
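Both conveniences can be approximated in user code. A rough sketch, assuming only x86, x86_64, and aarch64 matter: `current_arch` and the crate-local `is_feature_detected!` macro are hypothetical names defined here, not part of `std`, and the macro simply forwards to the architecture-specific macro that `std` provides.

```rust
// Hypothetical crate-local alias for "the intrinsics of the target being compiled".
#[cfg(target_arch = "x86")]
pub use std::arch::x86 as current_arch;
#[cfg(target_arch = "x86_64")]
pub use std::arch::x86_64 as current_arch;
#[cfg(target_arch = "aarch64")]
pub use std::arch::aarch64 as current_arch;

// Hypothetical portable detection macro. The feature name is captured as a `tt`
// so the std macro can still match it against its literal patterns.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
macro_rules! is_feature_detected {
    ($feature:tt) => {
        is_x86_feature_detected!($feature)
    };
}

#[cfg(target_arch = "aarch64")]
macro_rules! is_feature_detected {
    ($feature:tt) => {
        std::arch::is_aarch64_feature_detected!($feature)
    };
}

// Any other architecture: report every feature as unavailable.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
macro_rules! is_feature_detected {
    ($feature:tt) => {
        false
    };
}
```

Feature names still differ per architecture, so `is_feature_detected!("sse2")` compiles only where the underlying macro knows that name; the shim unifies the spelling of the call, not the feature sets.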

So I don't think we should block landing this on these ergonomic issues. In any case, I don't feel I understand the real reasons behind the change, so maybe adding these conveniences defeats their purpose.


cc @alexcrichton @rkruppe @eddyb @hsivonen @BurntSushi @Ericson2314 (those who had opinions about this in the RFC)

@gnzlbg this was something I forgot about in the original RFC personally. In the standard library anything that isn't portable currently stylistically requires the "non portable part of it" to appear in the path you use it. For example Windows-specific functionality is at std::os::windows. Following suit for SIMD, architecture-specific intrinsics, was natural to place in submodules of std::arch as a warning that what you're using is indeed not portable and specific to only one platform.

The name of the macro was the same rationale, ensuring that you aren't tricked to thinking it can be invoked in a portable context but rather explicitly specifying that it's not portable.

In the standard library anything that isn't portable currently stylistically requires the "non portable part of it" to appear in the path you use it. For example Windows-specific functionality is at std::os::windows. Following suit for SIMD, architecture-specific intrinsics, was natural to place in submodules of std::arch as a warning that what you're using is indeed not portable and specific to only one platform.

Is this something that will be covered with the new portability lint? Also, by that rationale, should everything in std::arch be in target feature submodules?

@parched ideally, yes! If that exists we could perhaps consider moving everything wholesale to different modules.

we could perhaps consider moving everything wholesale to different modules.

For x86/x86_64 this should be easily doable since we already do this internally in stdsimd. For other platforms we can do this on a best-effort basis.

core::simd::FromBits still points to this issue. Shouldn't it point to an open issue?

So should we do the changes? (add is_x86_64_feature_detected, expose the feature submodules instead of all intrinsics directly, ...) We don't have much time to do this if we want to, and I could do this on Friday this week.

Er, sorry, I think I misread. I do not think we should change anything. Perhaps one day intrinsics can live directly in std::arch and be easier to use with the portability lint, but we don't have the portability lint.

Is there any word on when we can stabilize intrinsics like https://doc.rust-lang.org/core/arch/x86_64/fn.cmpxchg16b.html ?
I am running into some issues implementing some lockfree algorithms without it.

Would stabilizing AtomicU128 (theoretically tracked in #32976) satisfy your use case, or is there some reason you specifically need the x86 intrinsic?

That would do it as long as it has weak compare-and-exchange or compare-and-swap. I really just need a 128-bit compare-and-swap to fit a pointer and a refcount. How is that implemented on archs like SPARC and PPC that don't support it as easily? LL/SC?

AtomicU128 will only be available on targets that support it. AFAIK that's only x86_64 and AArch64.

Ah, I think it could theoretically be implemented with double-width LL/SC on other architectures. Is that a possible thing to do?

Only AArch64 has 2x64-bit LL/SC.
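Until a 128-bit atomic is available, one common workaround (a different technique from what was asked for, and only viable when the data fits) is to pack a smaller index and a counter into a single 64-bit word and update it with a `compare_exchange_weak` retry loop. A minimal sketch; the 32/32-bit split and the helper names are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical layout: high 32 bits = slot index, low 32 bits = refcount.
pub fn pack(index: u32, count: u32) -> u64 {
    ((index as u64) << 32) | count as u64
}

pub fn unpack(word: u64) -> (u32, u32) {
    ((word >> 32) as u32, word as u32)
}

/// Atomically bumps the refcount half while leaving the index half untouched,
/// returning the new (index, count) pair.
pub fn incr_refcount(slot: &AtomicU64) -> (u32, u32) {
    let mut current = slot.load(Ordering::Acquire);
    loop {
        let (index, count) = unpack(current);
        let next = pack(index, count.wrapping_add(1));
        match slot.compare_exchange_weak(current, next, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => return unpack(next),
            // Another thread won the race, or the weak CAS failed spuriously:
            // retry with the value actually observed.
            Err(actual) => current = actual,
        }
    }
}
```

The weak variant may fail spuriously on LL/SC architectures, which is why it is always paired with a retry loop.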

Are the half-precision x86/64 functions intended to remain unstable? The compiler errors and the documentation point to this issue, but it was closed quite a while ago along with the stabilization PR.

EDIT: I also noticed that the f16c feature isn't reported in CARGO_CFG_TARGET_FEATURE in the stable compiler when it's explicitly requested: RUSTFLAGS="-C target-cpu=x86-64 -C target-feature=+sse3,+sse4.1,+avx,+f16c" cargo test. However, it does show up in nightly.
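For anyone reproducing this: Cargo passes the enabled target features to build scripts as a comma-separated list in `CARGO_CFG_TARGET_FEATURE`, so a `build.rs` can check for `f16c` directly. A minimal sketch; `parse_features` and `has_f16c` are made-up helper names, and outside a build script the variable is simply absent.

```rust
use std::env;

/// Splits Cargo's comma-separated `CARGO_CFG_TARGET_FEATURE` value into names.
pub fn parse_features(raw: &str) -> Vec<&str> {
    raw.split(',').filter(|f| !f.is_empty()).collect()
}

/// True if Cargo reports `f16c` among the enabled target features.
/// Only meaningful when called from a build script (build.rs).
pub fn has_f16c() -> bool {
    let raw = env::var("CARGO_CFG_TARGET_FEATURE").unwrap_or_default();
    parse_features(&raw).contains(&"f16c")
}
```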

I think someone just needs to send a stabilization PR for that feature. But first we need to ensure that all the intrinsics covered by the f16c feature are properly implemented.

Any updates on stabilizing the F16C instructions?
