Rfcs: More Exotic Enum Layout Optimizations

Created on 31 Jul 2015 · 69 Comments · Source: rust-lang/rfcs

There are several things we currently don't do that we could. It's not clear that these would have practical effects for any real program (or wouldn't negatively affect real programs by making llvm super confused), so all I can say is that these would be _super cool_.

Use Undefined Values in Other Primitives

  • bool is a byte but can only contain 0 or 1
  • char is 4 bytes but can only contain values in the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF]

Use Multiple Invalid-Value Fields

(&u8, &u8) can be the same size as Option<Option<(&u8, &u8)>>
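A quick way to see how much of this a given compiler already does is to compare sizes directly. This is a small sketch of my own (not from the issue); which of the printed sizes coincide depends on the compiler version, since some of these optimizations have since been implemented (see the end of the thread).

use std::mem::size_of;

fn main() {
    // Primitives with "undefined" values: whether Option reuses them is
    // up to the compiler version being tested.
    println!("bool: {}, Option<bool>: {}", size_of::<bool>(), size_of::<Option<bool>>());
    println!("char: {}, Option<char>: {}", size_of::<char>(), size_of::<Option<char>>());

    // Multiple invalid-value fields: the idea above is that the two nested
    // Options could reuse the two null niches of the pair.
    println!(
        "(&u8, &u8): {}, Option<Option<(&u8, &u8)>>: {}",
        size_of::<(&u8, &u8)>(),
        size_of::<Option<Option<(&u8, &u8)>>>()
    );
}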

Support More than a Single Bit

  • bool can support 254 enum variants
  • char can support tons

Use all other fields as free bits when there exists another forbidden field

(&u8, u64) supports 2^64 variants via the encoding (0, x)

Use the Fact that Void is Statically Unreachable

enum Void { } cannot be instantiated, so any variant that contains Void is statically unreachable, and therefore can be removed. So Option<Void> can only be None, and therefore can be zero-sized.
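A sketch of my own for checking this; the first size is guaranteed, the other two are the hoped-for results and depend on the compiler removing uninhabited variants:

use std::mem::size_of;

enum Void {}

fn main() {
    // `Void` has no values, so `Some(Void)` can never be constructed and the
    // optimization described above would let these collapse.
    println!("Void: {}", size_of::<Void>());                            // 0
    println!("Option<Void>: {}", size_of::<Option<Void>>());            // ideally 0
    println!("Result<u32, Void>: {}", size_of::<Result<u32, Void>>());  // ideally 4
}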

Support Enums that have More than One Non-C-Like Variant

enum Foo {
  A(&u8, &u8),
  B(&u8),
}

Can be represented without any tag as (&u8, *const u8) via the following packing:

  • self.1 is not null: A
  • self.1 is null: B
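A size-check sketch for this case (mine; the two-word figure is what the packing above would allow, not necessarily what a given compiler produces):

use std::mem::size_of;

#[allow(dead_code)]
enum Foo<'a> {
    A(&'a u8, &'a u8),
    B(&'a u8),
}

fn main() {
    // With the packing described above, Foo would need no separate tag and
    // could be the same size as the A payload.
    println!("(&u8, &u8): {} bytes", size_of::<(&u8, &u8)>());
    println!("Foo: {} bytes", size_of::<Foo<'static>>());
}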

Treat any True Discriminant Tags as a Type with Undefined Values

Option<Option<u32>> can be "only" 8 bytes if the second Option stores its discriminant in the first Option's u32 tag.
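A sketch one could use to probe this on a given compiler (my own; the 8-byte figure is the hoped-for outcome, so treat the printed values as compiler-dependent):

use std::mem::size_of;

fn main() {
    // Option<u32> needs a real tag byte (u32 has no forbidden values),
    // but that tag itself only uses the values 0 and 1, so the outer
    // Option could store its None as a third tag value.
    println!("u32: {}", size_of::<u32>());                                  // 4
    println!("Option<u32>: {}", size_of::<Option<u32>>());                  // 8 (tag + padding)
    println!("Option<Option<u32>>: {}", size_of::<Option<Option<u32>>>());  // 8 if the tag niche is reused
}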

T-compiler

All 69 comments

:+1: to any of these, even if just as an experiment to see if it would be worth it in practice.

rust issue: rust-lang/rust#5977

Be wary of optimizations that are incompatible with pointers to the payload.

For example

let mut x: Option<Option<u32>> = Some(None);
let y: &mut Option<u32> = x.as_mut().unwrap();

y is a reference to just an Option<u32>, so we can't fudge its discriminant from the outer layer.

Option<u32> is isomorphic to (bool, u32) which should be null-pointer optimizable (well - 2'd pointer optimizable).

(bool, u32) is a nice way to think about it, it should work.

A general-purpose "ForbiddenValues" wrapper as you described in rust-lang/rust#27573 would be nice, but how exactly would you tell the compiler which values are forbidden? Even just for primitive types it's difficult, considering that as yet the type system has no support for variadics or constant parameters.

Supporting compound types is even trickier. To be fully general, you have to be able to describe any possible subset of any possible compound type, and do so in a concise way which the compiler can easily reason about, and do it all without relying on any particular layout choice. Short of using a compiler plugin, I have no idea how you do that.

Basically: maybe it's worth stabilizing simple cases like NonZero now - it may well be that in future it will simply be a type alias to a ForbiddenValues type, but that should be a backwards compatible change.

(&u8, &u8) can support 3 variants

It seems like one could also null the first field and use the second as a tag, allowing 2^64 (or 2^32) variants.

@rkjnsn Oh yeah, great point!

Another idea: Make use of padding bytes in nested enums:

enum A {
    Aa(u32, B),
    Ab(u32, B),
}
enum B {
    Ba(u32),
    Bb(u32)
}

Today, this leads to this kind of memory layout:

A [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
B                                         [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
  [  Apadding  ] Atag [       u32       ] [  Bpadding  ] Btag [       u32       ]

Now, we can't just combine those two tag ranges and padding areas into one because we need to support interior references. But what we can do is store the A tag in the B padding area:

A [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
B                     [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
  [       u32       ] [ ABpad ] Atag Btag [       u32       ]

The idea here being that a B value never reads or writes its padding area, so it's in principle free to be used by an _outer_ enum's tag.

This would require that the compiler ensures that padding in an enum doesn't get overwritten with junk bytes though, which might be tricky or inefficient in combination with mutable references to such an enum (you couldn't just overwrite all bytes on reassignment).
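For reference, a minimal sketch (mine, not from the comment) to observe the sizes the diagrams above assume; the concrete numbers are typical but not guaranteed:

use std::mem::size_of;

#[allow(dead_code)]
enum A {
    Aa(u32, B),
    Ab(u32, B),
}

#[allow(dead_code)]
enum B {
    Ba(u32),
    Bb(u32),
}

fn main() {
    // With the layout drawn above, B is tag + padding + u32, and A adds its
    // own tag + padding on top; storing A's tag inside B's padding would shrink A.
    println!("B: {} bytes", size_of::<B>()); // 8 in the diagram
    println!("A: {} bytes", size_of::<A>()); // 16 in the diagram
}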

You could also use bit 0/1 in pointers, which are always zero under some alignment conditions, like
_if bit 0 is 1 then it's not a pointer and bits 7-1 are the actual tag_. This is a bit more efficient than just having a null pointer as an additional marker (e.g. in the (&u8, &u8) case).
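A manual sketch of this kind of low-bit tagging, purely for illustration (the type and method names are made up, and nothing like this is done automatically by the compiler): a single word that holds either a reference to a 2-byte-aligned value or a small integer tag, distinguished by bit 0.

use std::marker::PhantomData;

// Either a reference to a 2-byte-aligned `u16` or a small integer tag,
// packed into one word.
struct PtrOrTag<'a> {
    raw: usize,
    _marker: PhantomData<&'a u16>,
}

impl<'a> PtrOrTag<'a> {
    fn from_ref(r: &'a u16) -> Self {
        // `&u16` is 2-byte aligned, so its low bit is always 0.
        PtrOrTag { raw: r as *const u16 as usize, _marker: PhantomData }
    }
    fn from_tag(tag: usize) -> Self {
        // Shift the tag up and set the low bit to mark "not a pointer".
        PtrOrTag { raw: (tag << 1) | 1, _marker: PhantomData }
    }
    fn get(&self) -> Result<&'a u16, usize> {
        if self.raw & 1 == 0 {
            // Low bit clear: the word is the original reference, unmodified.
            Ok(unsafe { &*(self.raw as *const u16) })
        } else {
            Err(self.raw >> 1)
        }
    }
}

fn main() {
    let x = 42u16;
    assert_eq!(PtrOrTag::from_ref(&x).get(), Ok(&42));
    assert_eq!(PtrOrTag::from_tag(7).get(), Err(7));
}

In this sketch the pointer case is stored unmodified, so only the tag case pays for the shift; the replies below discuss whether that kind of trade-off should ever be applied implicitly.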

@naicode That may be more space efficient, but it's less time efficient because of the bit shifting and masking required. Rust probably shouldn't make that kind of change automatically.

It's a neat idea, though. Reminds me of Ruby's internal VALUE type which if the LSB is 0 then it's a pointer to an object and if the LSB is 1 then the rest of the bits encode an immediate value like true, false, nil, a symbol, or an integer (there is a further way to differentiate those). So Ruby values are like a super-compact Rust enum in a way.

It's true that such bit-fiddling slows down the program at runtime and therefore should never be applied implicitly.

If more exotic layout optimisations are implemented it might still be possible to opt in to such optimisations on a per-struct basis, e.g. a #[repr(compact_rust)] attribute which uses the Rust representation plus space optimisations which have a runtime cost.

(E.g. for using Rust in embedded environments with small RAM, or for enums allocated in massive amounts on the heap.)

cc rust-lang/rust#24041

Another idea: compress nested enum variant ids (thanks Mutabah over IRC!).

enum A { A1, A2 }
enum B { BA(A), B1, B2 }

This currently requires space in B for the variant id of A, but one could assign BA+A1 = 0, BA+A2 = 1, B1 = 2, B2 = 3, and both save space and maintain the ability to take a reference to the stored A (because its payload may presumably be stored immediately following its variant id [which is also B's] as usual).

@soltanmm Those kinds of optimizations are not generally possible, because 'A' must have the same layout regardless of whether it is used within another enum (otherwise references to the 'A' within 'BA' would have a different layout from normal references to 'A').

Furthermore, you can't adjust the layout of all 'A's in advance because 'B' might be in a different crate.

What you _can_ do is treat the discriminant in 'A' as a restricted value (either 0 or 1 for A1/A2), in which case 'B' is free to use the disallowed values as IDs for the other variants (B1, B2), but that kind of optimization has already been covered in this issue.

@Diggsey note that @soltanmm's suggestion doesn't adjust the layout of A (in both standalone and optimised form A1 is 0, and A2 is 1, although the particular values don't matter). AFAICT, your last paragraph is exactly what @soltanmm is saying.

@huonw EDIT: I just realized I might have interpreted that incorrectly.
Almost exactly, I think, but not quite (specifically the bit about this being already covered).

@Diggsey I believe the most similar optimizations discussed were using forbidden values in payloads or placing discriminants in padding, as opposed to outright summing the discriminants in the same location. Having the discriminant be a value for which forbidden values might be used wasn't mentioned, and this has slightly different space-time trade-offs to @Kimundi's example with the discriminants being split into different sequences of bit positions.

Arguably, though, everything in this discussion is some variant of, "Use forbidden values to encode variants," so as long as everyone's kicking the horse I guess I concede that to @Diggsey. :-)

Would it be possible to enable NonZero for the common Rust enum? Like

enum Status {
    Alive, // discriminant would start at 1 instead of 0
    Suspect,
    Dead,
}

size_of::<Status>() == size_of::<Option<Status>>()

@arthurprs No, that's not a good choice: first off, you can't depend on knowing whether Option<Status> is used in the program, and secondly, what if I have Option<Result<Status, ()>>?
The general niche optimization would use values _above_ those of the original enum, which scales.

The general optimization scales better, but @arthurprs's suggestion seems like it would technically work fine. You don't need to depend on knowing about Option<Status>, you can just always start from 1, unless we have existing reasons to prefer starting from 0.

Rust relies so much on these patterns that any gain would be huge.

Option<Result<Status, ()>>

Can't it be the usual 2 discriminants + Status?

If you manually wrote enum Status { Alive=1, ... }, could it decide this is NonZero? But generally speaking, I would agree that broader Forbidden Value analysis is more promising.

I'm surprised enums like

enum E {
    A = 1,
    B = 2,
    C = 3,
}

don't use non-zero optimization right now.
If you need to save space you have to add an artificial None-like variant instead of using idiomatic Option<E>.
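A sketch for probing this on a given compiler (mine; whether Option<E> stays one byte depends on the compiler exploiting the discriminant range):

use std::mem::size_of;

#[allow(dead_code)]
enum E {
    A = 1,
    B = 2,
    C = 3,
}

fn main() {
    // The discriminant can only ever be 1, 2 or 3, so 0 (and everything above 3)
    // is available as a niche for Option's None, if the compiler exploits it.
    println!("E: {} byte(s)", size_of::<E>());
    println!("Option<E>: {} byte(s)", size_of::<Option<E>>());
}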

From the reference:

If a discriminant isn't specified, they start at zero, and add one for each variant, in order.

So it'd have to not be visible to the user, which means doing a bunch of operations to hide it.

I'm surprised enums like ... don't use non-zero optimization right now.

@cuviper @petrochenkov That one's an easy fix, compared to the others, likely just a couple extra lines.

The problem we have is there aren't enough people to work on improvements like these, once they become possible. There's much bigger fish that have more serious implications and we have to pick.
Worse, every time we compromise, cleaning it up later becomes harder, which slows us down further.

@camlorn has recently cleaned up the duplication in rustc_trans and he might be interested in taking a look at niche optimizations, or I can also mentor someone else too. (Find me in #rustc on IRC)

From the reference:

If a discriminant isn't specified, they start at zero, and add one for each variant, in order.

:cry:

But this only affects C-style enums. The others (the vast majority) still have potential for improvement.

I might take some of these after I finish my current work to optimize field order for anything represented as Univariant, probably from most impact to least.

Making enums which fall into the CEnum case of Layout without being explicitly C enums start at 1 should be an incredibly tiny change.

But this only affects C-style enums. The others (the vast majority) still have potential for improvement.

Is there any reason None couldn't be defined to be 3? That seems less hacky. When you have generic enums wrapping generic enums, etc. rustc can just reserve discriminant ranges as needed. For example, None: Option<Option<E>> could be 4:

E::A: E == 0;
E::B: E == 1;
E::C: E == 2;
Some(E::A): Option<E> == 0;
Some(E::B): Option<E> == 1;
Some(E::C): Option<E> == 2;
None: Option<E> = 3;
Some(Some(E::A)): Option<Option<E>> == 0;
Some(Some(E::B)): Option<Option<E>> == 1;
Some(Some(E::C)): Option<Option<E>> == 2;
Some(None): Option<Option<E>> == 3;
None: Option<Option<E>> == 4;
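The scheme above is easy to probe with size checks; a small sketch of mine (the discriminant values themselves are unspecified, so only sizes are inspected here):

use std::mem::size_of;

#[allow(dead_code)]
enum E { A, B, C }

fn main() {
    // With discriminants allocated "upwards" as described above, every level
    // of Option can reuse the unused values 3, 4, ... of the same tag byte.
    println!("E: {}", size_of::<E>());
    println!("Option<E>: {}", size_of::<Option<E>>());
    println!("Option<Option<E>>: {}", size_of::<Option<Option<E>>>());
}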

@Stebalien That is the optimization @eddyb was referring to, which places values above the original enum's values. It's what I think we should ultimately go with, as well.

@solson Ah. Sorry, I missed that when skimming his reply.

I spoke to @eddyb on IRC and we discussed algorithms for these cases. The conclusion is that it's not actually hard now that repr is deduplicated. We think nothing outside layout has to change. I'm still on the fence, but am leaning towards trying unless someone else wants to take it.

I tend to think of things at like 2 in the morning. The thing I thought of at like 2 in the morning last night is how much optimizing a tree of types to use one discriminant could benefit futures-rs.

CC rust-lang/rust#36237 (a rebase of rust-lang/rust#31215): an attempt to optimize two special cases of Option.

Indeed, one of @eddyb's comments on that issue echoes discussion here.

@mrhota See also this more recent comment - while I realized later that it's not exactly on-topic in that thread, I'm stating the situation as I see it - rust-lang/rust#39999 followed by some refactors is necessary.

Found myself wanting this for creating a small vector of min size 2, e.g.

enum SmallVec<'a> {
    A2(&'a u64, &'a u64),
    V(Vec<&'a u64>),
}

(since the first member of a Vec is the Unique pointer, i.e. non-null). Though this example seems like a clear win, it's less clear what Rust should do if A0 and A1 variants were added, since doing the optimisation makes it harder to figure out which variant the enum is.

Null pointer optimization in the two-variant case could be used even if both variants contain values. For example, Result<(&i32, usize), usize> can be stored like (Option<&i32>, usize).

@xfix
I was going to say that I don't think this can be done easily, then I started thinking of ways we might be able to do so. So maybe.

But I do think it wouldn't be trivial.

Also. That's really only applicable in a small number of cases. Let (a, n, b) denote a type with a segment of fields (a) before a non-null pointer (n) followed by a segment of fields (b).

We can only do the optimization if the other variant has either sizeof(other) <= sizeof(b) or sizeof(other) <= sizeof(a). In any other case, the enum itself becomes larger, which we don't want.

It may be possible to tweak the layout algorithms to make this work reliably by always putting references at the beginning of structs--we sort of already do.

Today I am writing unsafe code to represent, as (*const u8, usize, PhantomData<&'a str>) (two words of memory), a type that is effectively:

pub enum BorrowedOrSharedString<'a> {
    Borrowed(&'a str),
    Shared(Rc<String>),
}

The compiler could do that optimization: represent Borrowed as &'a str’s (&'a u8, usize) directly (with a non-null pointer), and Shared as (null, Rc<String>)

@SimonSapin Nice! Did you publish it as a separate crate? Seems very similar to Cow and I think it'd be good to enable code reuse. Similarly with Arc.

It’s not done yet, and I’m not sure I’ll bother with a separate crate. It is somewhat similar to Cow in the dual representation, but I don’t care much about the copy-on-write aspect.

As to code reuse, I feel that every time I want something like this I want something slightly different.

but I don’t care much about the copy-on-write aspect.

What exactly do you mean by that?

Could you at least throw raw code in the wild, so someone could take it and create the crate?

I mean that I haven’t implemented DerefMut, make_mut, to_mut, or anything that switches from one variant to the other. The code is in a work-in-progress branch, I’ll link it here when I open a PR.

I haven't read the whole thread here but I did attempt to grep through to see if this has been mentioned so far - can we store way more tags in a pointer because of its alignment? We know for sure that a pointer of 2-byte alignment can never be odd, so we can store one tag in P=0 and then P=1, P=3, P=5 etc. I think that for n-byte alignment, we have 1 + (n - 1) * 2.pow(size_of::<usize>() - (n - 1)) possible tags, which should mean that we can have zero tag overhead for almost any pointer-containing enum (since how often do you have a pointer to a single-byte-alignment datatype).

I should say that I didn't come up with this myself, I stole it from a really clever trick in the bytes crate.

@Jef while this reduces memory consumption it increases CPU consumption, as accessing the discriminant requires bit-masking operations. It's also heavily dependent on the alignment a pointer has for a given type on a given platform.

I think optimizations trading CPU cycles for memory should not necessarily be done without any way for the programmer to control them. Maybe something like #[optimize(<what>)] with what being e.g. size or cpu/access_time might be interesting. Note that optimizing for size is still different from packed/compact, which might create unaligned fields (as it does not use any padding at all), but it might be used to determine which of multiple fields is made unaligned when one of them has to be.


@dathinab Accessing the discriminant would never require bit masking, although accessing the pointer may require bit masking unless you're smart about it. In cases where bit masking would be necessary I would say that yes, that should be explicit opt-in, but it's by no means always the case. For example:

enum ContainsPointer<'a> {
    A(&'a Foo), // Assume `Foo` has 2-byte alignment
    B,
    C,
}

match contains_pointer {
    ContainsPointer::A(thing) => { ... },
    ContainsPointer::B => { ... },
    ContainsPointer::C => { ... },
}

Could be converted to:

struct ContainsPointer(*const Foo);

if contains_pointer.0 as usize == 0 {
    // B's path
} else if contains_pointer.0 as usize == 1 {
    // C's path
} else {
    // A's path
    // We know that any value that is not 0 or 1 is valid since we control what gets written
    // to a value of type `ContainsPointer`
}

Could you at least throw raw code in the wild, so someone could take it and create the crate?

Here it is.

/// A string that is either shared (heap-allocated and reference-counted) or borrowed.
///
/// Equivalent to `enum { Borrowed(&'a str), Shared(Rc<String>) }`, but stored more compactly.
///
/// FIXME(https://github.com/rust-lang/rfcs/issues/1230): use an actual enum if/when
/// the compiler can do this layout optimization.
pub struct CowRcStr<'a> {
    /// FIXME: https://github.com/rust-lang/rust/issues/27730 use NonZero or Shared.
    /// In the meantime we abuse `&'static _` to get the effect of `NonZero<*const _>`.
    /// `ptr` doesn’t really have the 'static lifetime!
    ptr: &'static (),

    /// * If `borrowed_len_or_max == usize::MAX`, then `ptr` represents `NonZero<*const String>`
    ///   from `Rc::into_raw`.
    ///   The lifetime parameter `'a` is irrelevant in this case.
    ///
    /// * Otherwise, `ptr` represents the `NonZero<*const u8>` data component of `&'a str`,
    ///   and `borrowed_len_or_max` its length.
    borrowed_len_or_max: usize,

    phantom: PhantomData<Result<&'a str, Rc<String>>>,
}

fn _static_assert_same_size<'a>() {
    // "Instantiate" the generic function without calling it.
    let _ = mem::transmute::<CowRcStr<'a>, Option<CowRcStr<'a>>>;
}

impl<'a> From<&'a str> for CowRcStr<'a> {
    #[inline]
    fn from(s: &'a str) -> Self {
        let len = s.len();
        assert!(len < usize::MAX);
        CowRcStr {
            ptr: unsafe { &*(s.as_ptr() as *const ()) },
            borrowed_len_or_max: len,
            phantom: PhantomData,
        }
    }
}

impl<'a> From<String> for CowRcStr<'a> {
    #[inline]
    fn from(s: String) -> Self {
        CowRcStr::from_rc(Rc::new(s))
    }
}

impl<'a> CowRcStr<'a> {
    #[inline]
    fn from_rc(s: Rc<String>) -> Self {
        let ptr = unsafe { &*(Rc::into_raw(s) as *const ()) };
        CowRcStr {
            ptr: ptr,
            borrowed_len_or_max: usize::MAX,
            phantom: PhantomData,
        }
    }

    #[inline]
    fn unpack(&self) -> Result<&'a str, *const String> {
        if self.borrowed_len_or_max == usize::MAX {
            Err(self.ptr as *const () as *const String)
        } else {
            unsafe {
                Ok(str::from_utf8_unchecked(slice::from_raw_parts(
                    self.ptr as *const () as *const u8,
                    self.borrowed_len_or_max,
                )))
            }
        }
    }

    /// Convert into `String`, re-using an existing memory allocation if possible.
    #[inline]
    pub fn into_owned(self) -> String {
        let unpacked = self.unpack();

        // Inhibit destructor: we’re taking ownership of this strong reference (if any)
        mem::forget(self);

        match unpacked {
            Ok(s) => s.to_owned(),
            Err(ptr) => {
                let rc = unsafe {
                    Rc::from_raw(ptr)
                };
                Rc::try_unwrap(rc).unwrap_or_else(|rc| (*rc).clone())
            }
        }
    }
}

impl<'a> Clone for CowRcStr<'a> {
    #[inline]
    fn clone(&self) -> Self {
        match self.unpack() {
            Err(ptr) => {
                let rc = unsafe {
                    Rc::from_raw(ptr)
                };
                let new_rc = rc.clone();
                mem::forget(rc);  // Don’t actually take ownership of this strong reference
                CowRcStr::from_rc(new_rc)
            }
            Ok(_) => {
                CowRcStr { ..*self }
            }
        }
    }
}

impl<'a> Drop for CowRcStr<'a> {
    #[inline]
    fn drop(&mut self) {
        if let Err(ptr) = self.unpack() {
            mem::drop(unsafe {
                Rc::from_raw(ptr)
            })
        }
    }
}

impl<'a> Deref for CowRcStr<'a> {
    type Target = str;

    #[inline]
    fn deref(&self) -> &str {
        self.unpack().unwrap_or_else(|ptr| unsafe {
            &**ptr
        })
    }
}

// Omitted: a bunch of trivial impls of standard library traits, deferring to str impls

Nice, thanks! I just feel like playing a bit, so I'm gonna try to clean it up a bit.

@SimonSapin
I think doing this automatically may be difficult because you are effectively changing the type by removing a level of pointer indirection only in some highly specific cases.

It feels like this would be on the order of struct field reordering in terms of the kinds of weird bugs it'd end up with and the effort required, with not nearly as much impact.

A more general optimization here would be to realize that the refcount of Rc is never zero and that the first variant never extends to cover it, but this relies on Rc not being reordered which isn't guaranteed anymore.

(FWIW the refcount is not stored in Rc<T> directly, it’s in the heap-allocated RcBox<T> that Rc<T> points to.)

Yeah, you're right. My bad. I still think my point stands, though.

So in https://github.com/rust-lang/rust/pull/45225#issuecomment-336474332 I mentioned some hard cases, but @trentj on IRC brought it up again and I've realized there's a simpler way to look at things that is not type-based, as

enum E {
    A(ALeft..., Niche, ARight...),
    B(B...),
    C(C...)
}

can be seen as a generalization of the case where sizeof B and sizeof C are 0.

The challenge is to fit B and C in ALeft and/or ARight, which doesn't have to be too clever if Niche has either the smallest (e.g. bool, Option<ZST>, etc.) or largest (e.g. &T, Option<usize>, etc.) alignment in E, because we can hint the field reordering to place it at one end of the variant, giving B and C only one run of bytes to fit, instead of two.

EDIT: now over at https://github.com/rust-lang/rust/issues/46213.

One possible optimisation that hasn't been mentioned yet: NaN tagging. It's impossible to implement with current Rust types, however; floating-point types treat all possible NaN bit patterns as valid. Doing this would require adding NonZero-like wrapper types or some equivalent mechanism to restrict valid representations of floating-point values.

I've actually got something written up on this topic, I might submit it as an RFC soon...

@fstirlitz With const generics we could have a generalization of NonZero and use it to implement a type like NonNaN, by removing a specific range of values for f32 and another for f64.

@eddyb Will it be possible to use const generics to disallow NaNs with certain payloads, but not others? You'd have to have to_bits() be const fn, which is currently impossible, because it's implemented with transmute. (Use case for this: an interpreter for a dynamically-typed language which supports NaNs, but doesn't expose payloads.)

And even if sufficiently expressive const generics do arrive, it will be beneficial to have a canonical form of this feature in the standard library.

I'm not talking about CTFE to compute the layout, but a wrapper type with two integer parameters expressing a range of values that the inner type can't use but optimizations can.

You mean, like...

// possibly wrapped in an opaque struct;
// could probably be generic over float types
// by means of associated consts and types,
// but using f64 for readability
enum NaNaN64 {
    Positive(IntInRange<u64, 0x0000000000000000, 0x7ff0000000000000>),
    Negative(IntInRange<u64, 0x8000000000000000, 0xfff0000000000000>),
}

impl From<NaNaN64> for f64 { /* f64::from_bits */ }
/* other impls */

That's... not especially convenient, is it?

Plus, without compiler support it won't be able to additionally take advantage of LLVM's fast-math annotations.

More like:

struct NaNaN64 {
    float: WithInvalidRange<WithInvalidRange<f64,
        0x7ff0000000000000, 0x7fffffffffffffff>,
        0xfff0000000000000, 0xffffffffffffffff>
}

But internally we can't represent the two ranges anyway for now, so for NaN-boxing you'd have to choose only one of them, in the near future.
As for fast-math... It could work if LLVM actually understood range metadata as "can't possibly be a NaN" and optimizing based on that.

It's much more straightforward when everything is offsets and bitpatterns and ranges than "types".

I'm pretty sure that a safe "forbidden NaN" float type would undermine its own benefits with NaN-masking-cmov's (or worse) everywhere.

Note: If someone is willing to mentor me on this bug, I'm interested in tackling it.

@Gankro How are we going to track that most of the optimizations mentioned in this issue have been implemented, and the various tricks required to make any further changes?

@eddyb are any of the optimizations in the OP not implemented? It looks like at least most are, and if so we might want to turn this into a metabug that points at sub-issues for the remainder, or just close it.

@Gankro: re NaNs, you mean checking after arithmetic operations whether the result was NaN? Maybe. On the other hand, such types could at least implement Ord and Eq; and a 'designated canonical NaN' type might even elide most of these checks (if the canonical NaN is chosen wisely), thanks to NaN propagation semantics of IEEE 754.

@eddyb Could Cow<str> be represented on three words? Cow::Owned(String) would be String (with a non-zero pointer, assuming it is first) and Cow::Borrowed(&str) would be (0, &str). I half expected https://github.com/rust-lang/rust/pull/45225 to do this but it doesn’t.

Yeah, I only figured out how to later. Also, it requires rustc_trans to generate LLVM constants from miri allocations, for us to be able to combine the niche with additional data.

@SimonSapin another alternative: since capacity >= len, &str can be represented exactly as a String with capacity == 0. The con is we can't distinguish between a zero-sized &str and a zero-sized String. The great thing is that for all practical purposes we don't have to. My suggestion also keeps the pointer non-zero, so even Option<MyCow> has the same layout.

I've actually started writing such a crate for fun.

Same holds for Vec
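A rough sketch of that representation, assuming the capacity == 0 convention described above (names and details are mine, not from any published crate; a real version would store the pointer as NonNull so the niche survives for Option):

use std::marker::PhantomData;
use std::mem::ManuallyDrop;
use std::{slice, str};

// (ptr, len, capacity) stored like a `String`; `capacity == 0` marks "borrowed".
// Empty owned strings carry no allocation, so treating them as borrowed is fine.
struct CowStr<'a> {
    ptr: *const u8, // a real version would use NonNull<u8> to keep the niche
    len: usize,
    cap: usize, // 0 => borrowed
    marker: PhantomData<&'a str>,
}

impl<'a> From<&'a str> for CowStr<'a> {
    fn from(s: &'a str) -> Self {
        CowStr { ptr: s.as_ptr(), len: s.len(), cap: 0, marker: PhantomData }
    }
}

impl<'a> From<String> for CowStr<'a> {
    fn from(s: String) -> Self {
        let s = ManuallyDrop::new(s); // take over ownership of the buffer
        CowStr { ptr: s.as_ptr(), len: s.len(), cap: s.capacity(), marker: PhantomData }
    }
}

impl<'a> CowStr<'a> {
    fn as_str(&self) -> &str {
        // Valid in both cases: the borrowed pointer/length, or the owned buffer.
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.ptr, self.len)) }
    }
}

impl<'a> Drop for CowStr<'a> {
    fn drop(&mut self) {
        if self.cap != 0 {
            // Owned: reassemble the String so its buffer gets freed.
            unsafe { drop(String::from_raw_parts(self.ptr as *mut u8, self.len, self.cap)) }
        }
    }
}

fn main() {
    let b = CowStr::from("borrowed");
    let o = CowStr::from(String::from("owned"));
    assert_eq!(b.as_str(), "borrowed");
    assert_eq!(o.as_str(), "owned");
}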

But the compiler doesn’t know about capacity >= len. It does already know that NonZero<*mut u8> inside String is non-null, though.

Yes. The compiler will probably never be able to do the optimization using capacity >= len because of the weird empty case, which we know doesn't matter but explaining that to the compiler would be a nightmare.

Since a part of the original desired optimizations have been implemented in rust-lang/rust#45225, and most of the remaining ones have various trade-offs that need to be explored by RFC for each category separately (with the exception of https://github.com/rust-lang/rust/issues/46213), I'm going to close this central issue.
