Rfcs: More Exotic Enum Layout Optimizations

Created on 31 Jul 2015 · 69 Comments · Source: rust-lang/rfcs

There are several things we currently don't do that we could. It's not clear that these would have practical effects for any real program (or wouldn't negatively affect real programs by making llvm super confused), so all I can say is that these would be _super cool_.

Use Undefined Values in Other Primitives

  • bool is a byte but can only contain 0 or 1
  • char is 4 bytes but can only contain values in the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF]

Use Multiple Invalid-Value Fields

(&u8, &u8) can be the same size as Option<Option<(&u8, &u8)>>
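A quick way to see how much of this a given compiler already does is to compare sizes directly. This is a small sketch of my own (not from the issue); which of the printed sizes coincide depends on the compiler version, since some of these optimizations have since been implemented (see the end of the thread).

use std::mem::size_of;

fn main() {
    // Primitives with "undefined" values: whether Option reuses them is
    // up to the compiler version being tested.
    println!("bool: {}, Option<bool>: {}", size_of::<bool>(), size_of::<Option<bool>>());
    println!("char: {}, Option<char>: {}", size_of::<char>(), size_of::<Option<char>>());

    // Multiple invalid-value fields: the idea above is that the two nested
    // Options could reuse the two null niches of the pair.
    println!(
        "(&u8, &u8): {}, Option<Option<(&u8, &u8)>>: {}",
        size_of::<(&u8, &u8)>(),
        size_of::<Option<Option<(&u8, &u8)>>>()
    );
}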

Support More than a Single Bit

  • bool can support 254 enum variants
  • char can support tons

Use all other fields as free bits when there exists another forbidden field

(&u8, u64) supports 2^64 variants via the encoding (0, x)

Use the Fact that Void is Statically Unreachable

enum Void { } cannot be instantiated, so any variant that contains Void is statically unreachable, and therefore can be removed. So Option<Void> can only be None, and therefore can be zero-sized.
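A sketch of my own for checking this; the first size is guaranteed, the other two are the hoped-for results and depend on the compiler removing uninhabited variants:

use std::mem::size_of;

enum Void {}

fn main() {
    // `Void` has no values, so `Some(Void)` can never be constructed and the
    // optimization described above would let these collapse.
    println!("Void: {}", size_of::<Void>());                            // 0
    println!("Option<Void>: {}", size_of::<Option<Void>>());            // ideally 0
    println!("Result<u32, Void>: {}", size_of::<Result<u32, Void>>());  // ideally 4
}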

Support Enums that have More than One Non-C-Like Variant

enum Foo {
  A(&u8, &u8),
  B(&u8),
}

Can be represented without any tag as (&u8, *const u8) via the following packing:

  • self.1 is not null: A
  • self.1 is null: B
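A size-check sketch for this case (mine; the two-word figure is what the packing above would allow, not necessarily what a given compiler produces):

use std::mem::size_of;

#[allow(dead_code)]
enum Foo<'a> {
    A(&'a u8, &'a u8),
    B(&'a u8),
}

fn main() {
    // With the packing described above, Foo would need no separate tag and
    // could be the same size as the A payload.
    println!("(&u8, &u8): {} bytes", size_of::<(&u8, &u8)>());
    println!("Foo: {} bytes", size_of::<Foo<'static>>());
}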

Treat any True Discriminant Tags as a Type with Undefined Values

Option<Option<u32>> can be "only" 8 bytes if the second Option stores its discriminant in the first Option's u32 tag.
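A sketch one could use to probe this on a given compiler (my own; the 8-byte figure is the hoped-for outcome, so treat the printed values as compiler-dependent):

use std::mem::size_of;

fn main() {
    // Option<u32> needs a real tag byte (u32 has no forbidden values),
    // but that tag itself only uses the values 0 and 1, so the outer
    // Option could store its None as a third tag value.
    println!("u32: {}", size_of::<u32>());                                  // 4
    println!("Option<u32>: {}", size_of::<Option<u32>>());                  // 8 (tag + padding)
    println!("Option<Option<u32>>: {}", size_of::<Option<Option<u32>>>());  // 8 if the tag niche is reused
}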

T-compiler

All 69 comments

:+1: to any of these, even if just as an experiment to see if it would be worth it in practice.

rust issue: rust-lang/rust#5977

Be wary of optimizations that are incompatible with pointers to the payload.

For example

let mut x: Option<Option<u32>> = Some(None);
let y: &mut Option<u32> = x.as_mut().unwrap();

y is a reference to just an Option<u32>, so we can't fudge its discriminant from the outer layer.

Option<u32> is isomorphic to (bool, u32) which should be null-pointer optimizable (well - 2'd pointer optimizable).

(bool, u32) is a nice way to think about it, it should work.

A general-purpose "ForbiddenValues" wrapper as you described in rust-lang/rust#27573 would be nice, but how exactly would you tell the compiler which values are forbidden? Even just for primitive types it's difficult, considering that as yet the type system has no support for variadics or constant parameters.

Supporting compound types is even trickier. To be fully general, you have to be able to describe any possible subset of any possible compound type, and do so in a concise way which the compiler can easily reason about, and do it all without relying on any particular layout choice. Short of using a compiler plugin, I have no idea how you do that.

Basically: maybe it's worth stabilizing simple cases like NonZero now - it may well be that in future it will simply be a type alias to a ForbiddenValues type, but that should be a backwards compatible change.

(&u8, &u8) can support 3 variants

It seems like one could also null the first field and use the second as a tag, allowing 2^64 (or 2^32) variants.

@rkjnsn Oh yeah, great point!

Another idea: Make use of padding bytes in nested enums:

enum A {
    Aa(u32, B),
    Ab(u32, B),
}
enum B {
    Ba(u32),
    Bb(u32)
}

Today, this leads to this kind of memory layout:

A [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
B                                         [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
  [  Apadding  ] Atag [       u32       ] [  Bpadding  ] Btag [       u32       ]

Now, we can't just combine those two tag ranges and padding areas into one because we need to support interior references. But what we can do is store the A tag in the B padding area:

A [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
B                     [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
  [       u32       ] [ ABpad ] Atag Btag [       u32       ]

The idea here being that a B value never reads or writes its padding area, so it's in principle free to be used by an _outer_ enum's tag.

This would require that the compiler ensures that padding in an enum doesn't get overwritten with junk bytes though, which might be tricky or inefficient in combination with mutable references to such an enum (you couldn't just overwrite all bytes on reassignment).
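For reference, a minimal sketch (mine, not from the comment) to observe the sizes the diagrams above assume; the concrete numbers are typical but not guaranteed:

use std::mem::size_of;

#[allow(dead_code)]
enum A {
    Aa(u32, B),
    Ab(u32, B),
}

#[allow(dead_code)]
enum B {
    Ba(u32),
    Bb(u32),
}

fn main() {
    // With the layout drawn above, B is tag + padding + u32, and A adds its
    // own tag + padding on top; storing A's tag inside B's padding would shrink A.
    println!("B: {} bytes", size_of::<B>()); // 8 in the diagram
    println!("A: {} bytes", size_of::<A>()); // 16 in the diagram
}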

You could also use bit 0/1 in pointers, which are always zero under some alignment conditions, like
_if bit 0 is 1 then it's not a pointer and bits 7-1 are the actual tag_. This is a bit more efficient than just having a null pointer as an additional marker (e.g. in the (&u8, &u8) case).
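A manual sketch of this kind of low-bit tagging, purely for illustration (the type and method names are made up, and nothing like this is done automatically by the compiler): a single word that holds either a reference to a 2-byte-aligned value or a small integer tag, distinguished by bit 0.

use std::marker::PhantomData;

// Either a reference to a 2-byte-aligned `u16` or a small integer tag,
// packed into one word.
struct PtrOrTag<'a> {
    raw: usize,
    _marker: PhantomData<&'a u16>,
}

impl<'a> PtrOrTag<'a> {
    fn from_ref(r: &'a u16) -> Self {
        // `&u16` is 2-byte aligned, so its low bit is always 0.
        PtrOrTag { raw: r as *const u16 as usize, _marker: PhantomData }
    }
    fn from_tag(tag: usize) -> Self {
        // Shift the tag up and set the low bit to mark "not a pointer".
        PtrOrTag { raw: (tag << 1) | 1, _marker: PhantomData }
    }
    fn get(&self) -> Result<&'a u16, usize> {
        if self.raw & 1 == 0 {
            // Low bit clear: the word is the original reference, unmodified.
            Ok(unsafe { &*(self.raw as *const u16) })
        } else {
            Err(self.raw >> 1)
        }
    }
}

fn main() {
    let x = 42u16;
    assert_eq!(PtrOrTag::from_ref(&x).get(), Ok(&42));
    assert_eq!(PtrOrTag::from_tag(7).get(), Err(7));
}

In this sketch the pointer case is stored unmodified, so only the tag case pays for the shift; the replies below discuss whether that kind of trade-off should ever be applied implicitly.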

@naicode That may be more space efficient, but it's less time efficient because of the bit shifting and masking required. Rust probably shouldn't make that kind of change automatically.

It's a neat idea, though. Reminds me of Ruby's internal VALUE type which if the LSB is 0 then it's a pointer to an object and if the LSB is 1 then the rest of the bits encode an immediate value like true, false, nil, a symbol, or an integer (there is a further way to differentiate those). So Ruby values are like a super-compact Rust enum in a way.

It's true that such bit-fiddling slows down the program at runtime and therefore should never be applied implicitly.

If more exotic layout optimisations are implemented it might still be possible to opt in to such optimisations on a per-struct basis, e.g. a #[repr(compact_rust)] attribute which uses the Rust representation plus space optimisations which have a runtime cost.

(E.g. for using Rust in embedded environments with small RAM, or for enums allocated in massive amounts on the heap.)

cc rust-lang/rust#24041

Another idea: compress nested enum variant ids (thanks Mutabah over IRC!).

enum A { A1, A2 }
enum B { BA(A), B1, B2 }

This currently requires space in B for the variant id of A, but one could assign BA+A1 = 0, BA+A2 = 1, B1 = 2, B2 = 3, and both save space and maintain the ability to take a reference to the stored A (because its payload may presumably be stored immediately following its variant id [which is also B's] as usual).

@soltanmm Those kinds of optimizations are not generally possible, because 'A' must have the same layout regardless of whether it is used within another enum (otherwise references to the 'A' within 'BA' would have a different layout from normal references to 'A').

Furthermore, you can't adjust the layout of all 'A's in advance because 'B' might be in a different crate.

What you _can_ do is treat the discriminant in 'A' as a restricted value (either 0 or 1 for A1/A2), in which case 'B' is free to use the disallowed values as IDs for the other variants (B1, B2), but that kind of optimization has already been covered in this issue.

@Diggsey note that @soltanmm's suggestion doesn't adjust the layout of A (in both standalone and optimised form A1 is 0, and A2 is 1, although the particular values don't matter). AFAICT, your last paragraph is exactly what @soltanmm is saying.

@huonw EDIT: I just realized I might have interpreted that incorrectly.
Almost exactly, I think, but not quite (specifically the bit about this being already covered).

@Diggsey I believe the most similar optimizations discussed were using forbidden values in payloads or placing discriminants in padding, as opposed to outright summing the discriminants in the same location. Having the discriminant be a value for which forbidden values might be used wasn't mentioned, and this has slightly different space-time trade-offs to @Kimundi's example with the discriminants being split into different sequences of bit positions.

Arguably, though, everything in this discussion is some variant of, "Use forbidden values to encode variants," so as long as everyone's kicking the horse I guess I concede that to @Diggsey. :-)

Would it be possible to enable NonZero for the common Rust enum? Like

enum Status {
    Alive, // discriminant would start at 1 instead of 0
    Suspect,
    Dead,
}

size_of::<Status>() == size_of::<Option<Status>>()

@arthurprs No, that's not a good choice: first off, you can't depend on knowing whether Option<Status> is used in the program, and secondly, what if I have Option<Result<Status, ()>>?
The general niche optimization would use values _above_ those of the original enum, which scales.

The general optimization scales better, but @arthurprs's suggestion seems like it would technically work fine. You don't need to depend on knowing about Option<Status>, you can just always start from 1, unless we have existing reasons to prefer starting from 0.

Rust relies so much on these patterns that any gain would be huge.

Option<Result<Status, ()>>

Can't it be the usual 2 discriminants + Status?

If you manually wrote enum Status { Alive=1, ... }, could it decide this is NonZero? But generally speaking, I would agree that broader Forbidden Value analysis is more promising.

I'm surprised enums like

enum E {
    A = 1,
    B = 2,
    C = 3,
}

don't use non-zero optimization right now.
If you need to save space you have to add an artificial None-like variant instead of using idiomatic Option<E>.
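A sketch for probing this on a given compiler (mine; whether Option<E> stays one byte depends on the compiler exploiting the discriminant range):

use std::mem::size_of;

#[allow(dead_code)]
enum E {
    A = 1,
    B = 2,
    C = 3,
}

fn main() {
    // The discriminant can only ever be 1, 2 or 3, so 0 (and everything above 3)
    // is available as a niche for Option's None, if the compiler exploits it.
    println!("E: {} byte(s)", size_of::<E>());
    println!("Option<E>: {} byte(s)", size_of::<Option<E>>());
}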

From the reference:

If a discriminant isn't specified, they start at zero, and add one for each variant, in order.

So it'd have to not be visible to the user, which means doing a bunch of operations to hide it.

I'm surprised enums like ... don't use non-zero optimization right now.

@cuviper @petrochenkov That one's an easy fix, compared to the others, likely just a couple extra lines.

The problem we have is there aren't enough people to work on improvements like these, once they become possible. There's much bigger fish that have more serious implications and we have to pick.
Worse, every time we compromise, cleaning it up later becomes harder, which slows us down further.

@camlorn has recently cleaned up the duplication in rustc_trans and he might be interested in taking a look at niche optimizations, or I can also mentor someone else too. (Find me in #rustc on IRC)

From the reference:

If a discriminant isn't specified, they start at zero, and add one for each variant, in order.

:cry:

But this only affects C-style enums. The others (the vast majority) still have potential for improvement.

I might take some of these after I finish my current work to optimize field order for anything represented as Univariant, probably from most impact to least.

Making enums which fall into the CEnum case of Layout without being explicitly C enums start at 1 should be an incredibly tiny change.

But this only affects C-style enums. The others (the vast majority) still have potential for improvement.

Is there any reason None couldn't be defined to be 3? That seems less hacky. When you have generic enums wrapping generic enums, etc. rustc can just reserve discriminant ranges as needed. For example, None: Option<Option<E>> could be 4:

E::A: E == 0;
E::B: E == 1;
E::C: E == 2;
Some(E::A): Option<E> == 0;
Some(E::B): Option<E> == 1;
Some(E::C): Option<E> == 2;
None: Option<E> = 3;
Some(Some(E::A)): Option<Option<E>> == 0;
Some(Some(E::B)): Option<Option<E>> == 1;
Some(Some(E::C)): Option<Option<E>> == 2;
Some(None): Option<Option<E>> == 3;
None: Option<Option<E>> == 4;
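The scheme above is easy to probe with size checks; a small sketch of mine (the discriminant values themselves are unspecified, so only sizes are inspected here):

use std::mem::size_of;

#[allow(dead_code)]
enum E { A, B, C }

fn main() {
    // With discriminants allocated "upwards" as described above, every level
    // of Option can reuse the unused values 3, 4, ... of the same tag byte.
    println!("E: {}", size_of::<E>());
    println!("Option<E>: {}", size_of::<Option<E>>());
    println!("Option<Option<E>>: {}", size_of::<Option<Option<E>>>());
}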

@Stebalien That is the optimization @eddyb was referring to, which places values above the original enum's values. It's what I think we should ultimately go with, as well.

@solson Ah. Sorry, I missed that when skimming his reply.

I spoke to @eddyb on IRC and we discussed algorithms for these cases. The conclusion is that it's not actually hard now that repr is deduplicated. We think nothing outside layout has to change. I'm still on the fence, but am leaning towards trying unless someone else wants to take it.

I tend to think of things at like 2 in the morning. The thing I thought of at like 2 in the morning last night is how much optimizing a tree of types to use one discriminant could benefit futures-rs.

CC rust-lang/rust#36237 (a rebase of rust-lang/rust#31215): an attempt to optimize two special cases of Option.

Indeed, one of @eddyb's comments on that issue echoes discussion here.

@mrhota See also this more recent comment - while I realized later that it's not exactly on-topic in that thread, I'm stating the situation as I see it - rust-lang/rust#39999 followed by some refactors is necessary.

Found myself wanting this for creating a small vector of min size 2, e.g.

enum SmallVec<'a> {
    A2(&'a u64, &'a u64),
    V(Vec<&'a u64>),
}

(since the first member of a Vec is the Unique pointer, i.e. non-null). Though this example seems like a clear win, it's less clear what Rust should do if A0 and A1 variants were added, since doing the optimisation makes it harder to figure out which variant the enum is.

Null pointer optimization in the two-variant case could be used even if both variants contain values. For example, Result<(&i32, usize), usize> can be stored like (Option<&i32>, usize).

@xfix
I was going to say that I don't think this can be done easily, then I started thinking of ways we might be able to do so. So maybe.

But I do think it wouldn't be trivial.

Also. That's really only applicable in a small number of cases. Let (a, n, b) denote a type with a segment of fields (a) before a non-null pointer (n) followed by a segment of fields (b).

We can only do the optimization if the other variant has either sizeof(other) <= sizeof(b) or sizeof(other) <= sizeof(a). In any other case, the enum itself becomes larger, which we don't want.

It may be possible to tweak the layout algorithms to make this work reliably by always putting references at the beginning of structs--we sort of already do.

Today I am writing unsafe code to represent, as (*const u8, usize, PhantomData<&'a str>) (two words of memory), a type that is effectively:

pub enum BorrowedOrSharedString<'a> {
    Borrowed(&'a str),
    Shared(Rc<String>),
}

The compiler could do that optimization: represent Borrowed as &'a str’s (&'a u8, usize) directly (with a non-null pointer), and Shared as (null, Rc<String>)

@SimonSapin Nice! Did you publish it as a separate crate? Seems very similar to Cow and I think it'd be good to enable code reuse. Similarly with Arc.

It’s not done yet, and I’m not sure I’ll bother with a separate crate. It is somewhat similar to Cow in the dual representation, but I don’t care much about the copy-on-write aspect.

As to code reuse, I feel that every time I want something like this I want something slightly different.

but I don’t care much about the copy-on-write aspect.

What exactly do you mean by that?

Could you at least throw raw code in the wild, so someone could take it and create the crate?

I mean that I haven’t implemented DerefMut, make_mut, to_mut, or anything that switches from one variant to the other. The code is in a work-in-progress branch, I’ll link it here when I open a PR.

I haven't read the whole thread here but I did attempt to grep through to see if this has been mentioned so far - can we store way more tags in a pointer because of its alignment? We know for sure that a pointer of 2-byte alignment can never be odd, so we can store one tag in P=0 and then P=1, P=3, P=5 etc. I think that for n-byte alignment, we have 1 + (n - 1) * 2.pow(size_of::<usize>() - (n - 1)) possible tags, which should mean that we can have zero tag overhead for almost any pointer-containing enum (since how often do you have a pointer to a single-byte-alignment datatype).

I should say that I didn't come up with this myself, I stole it from a really clever trick in the bytes crate.

@Jef while this reduces memory consumption it increases CPU consumption, as accessing the discriminant requires bit-masking operations. It's also heavily dependent on the alignment a pointer has for a given type on a given platform.

I think optimizations trading CPU cycles for memory should not necessarily be done without any way for the programmer to control them. Maybe something like #[optimize(<what>)] with what being e.g. size or cpu/access_time might be interesting. Note that optimizing for size is still different from packed/compact, which might create unaligned fields (as it does not use any padding at all), but it might be used to determine which of multiple fields is made unaligned when one of them has to be.


@dathinab Accessing the discriminant would never require bit masking, although accessing the pointer may require bit masking unless you're smart about it. In cases where bit masking would be necessary I would say that yes, that should be explicit opt-in, but it's by no means always the case. For example:

enum ContainsPointer<'a> {
    A(&'a Foo), // Assume `Foo` has 2-byte alignment
    B,
    C,
}

match contains_pointer {
    ContainsPointer::A(thing) => { ... },
    ContainsPointer::B => { ... },
    ContainsPointer::C => { ... },
}

Could be converted to:

struct ContainsPointer(*const Foo);

if contains_pointer.0 as usize == 0 {
    // B's path
} else if contains_pointer.0 as usize == 1 {
    // C's path
} else {
    // A's path
    // We know that any value that is not 0 or 1 is valid since we control what gets written
    // to a value of type `ContainsPointer`
}

Could you at least throw raw code in the wild, so someone could take it and create the crate?

Here it is.

/// A string that is either shared (heap-allocated and reference-counted) or borrowed.
///
/// Equivalent to `enum { Borrowed(&'a str), Shared(Rc<String>) }`, but stored more compactly.
///
/// FIXME(https://github.com/rust-lang/rfcs/issues/1230): use an actual enum if/when
/// the compiler can do this layout optimization.
pub struct CowRcStr<'a> {
    /// FIXME: https://github.com/rust-lang/rust/issues/27730 use NonZero or Shared.
    /// In the meantime we abuse `&'static _` to get the effect of `NonZero<*const _>`.
    /// `ptr` doesn’t really have the 'static lifetime!
    ptr: &'static (),

    /// * If `borrowed_len_or_max == usize::MAX`, then `ptr` represents `NonZero<*const String>`
    ///   from `Rc::into_raw`.
    ///   The lifetime parameter `'a` is irrelevant in this case.
    ///
    /// * Otherwise, `ptr` represents the `NonZero<*const u8>` data component of `&'a str`,
    ///   and `borrowed_len_or_max` its length.
    borrowed_len_or_max: usize,

    phantom: PhantomData<Result<&'a str, Rc<String>>>,
}

fn _static_assert_same_size<'a>() {
    // "Instantiate" the generic function without calling it.
    let _ = mem::transmute::<CowRcStr<'a>, Option<CowRcStr<'a>>>;
}

impl<'a> From<&'a str> for CowRcStr<'a> {
    #[inline]
    fn from(s: &'a str) -> Self {
        let len = s.len();
        assert!(len < usize::MAX);
        CowRcStr {
            ptr: unsafe { &*(s.as_ptr() as *const ()) },
            borrowed_len_or_max: len,
            phantom: PhantomData,
        }
    }
}

impl<'a> From<String> for CowRcStr<'a> {
    #[inline]
    fn from(s: String) -> Self {
        CowRcStr::from_rc(Rc::new(s))
    }
}

impl<'a> CowRcStr<'a> {
    #[inline]
    fn from_rc(s: Rc<String>) -> Self {
        let ptr = unsafe { &*(Rc::into_raw(s) as *const ()) };
        CowRcStr {
            ptr: ptr,
            borrowed_len_or_max: usize::MAX,
            phantom: PhantomData,
        }
    }

    #[inline]
    fn unpack(&self) -> Result<&'a str, *const String> {
        if self.borrowed_len_or_max == usize::MAX {
            Err(self.ptr as *const () as *const String)
        } else {
            unsafe {
                Ok(str::from_utf8_unchecked(slice::from_raw_parts(
                    self.ptr as *const () as *const u8,
                    self.borrowed_len_or_max,
                )))
            }
        }
    }

    /// Convert into `String`, re-using an existing memory allocation if possible.
    #[inline]
    pub fn into_owned(self) -> String {
        let unpacked = self.unpack();

        // Inhibit destructor: we’re taking ownership of this strong reference (if any)
        mem::forget(self);

        match unpacked {
            Ok(s) => s.to_owned(),
            Err(ptr) => {
                let rc = unsafe {
                    Rc::from_raw(ptr)
                };
                Rc::try_unwrap(rc).unwrap_or_else(|rc| (*rc).clone())
            }
        }
    }
}

impl<'a> Clone for CowRcStr<'a> {
    #[inline]
    fn clone(&self) -> Self {
        match self.unpack() {
            Err(ptr) => {
                let rc = unsafe {
                    Rc::from_raw(ptr)
                };
                let new_rc = rc.clone();
                mem::forget(rc);  // Don’t actually take ownership of this strong reference
                CowRcStr::from_rc(new_rc)
            }
            Ok(_) => {
                CowRcStr { ..*self }
            }
        }
    }
}

impl<'a> Drop for CowRcStr<'a> {
    #[inline]
    fn drop(&mut self) {
        if let Err(ptr) = self.unpack() {
            mem::drop(unsafe {
                Rc::from_raw(ptr)
            })
        }
    }
}

impl<'a> Deref for CowRcStr<'a> {
    type Target = str;

    #[inline]
    fn deref(&self) -> &str {
        self.unpack().unwrap_or_else(|ptr| unsafe {
            &**ptr
        })
    }
}

// Omitted: a bunch of trivial impls of standard library traits, deferring to str impls

Nice, thanks! I just feel like playing a bit, so I'm gonna try to clean it up a bit.

@SimonSapin
I think doing this automatically may be difficult because you are effectively changing the type by removing a level of pointer indirection only in some highly specific cases.

It feels like this would be on the order of struct field reordering in terms of the kinds of weird bugs it'd end up with and the effort required, with not nearly as much impact.

A more general optimization here would be to realize that the refcount of Rc is never zero and that the first variant never extends to cover it, but this relies on Rc not being reordered which isn't guaranteed anymore.

(FWIW the refcount is not stored in Rc<T> directly, it’s in the heap-allocated RcBox<T> that Rc<T> points to.)

Yeah, you're right. My bad. I still think my point stands, though.

So in https://github.com/rust-lang/rust/pull/45225#issuecomment-336474332 I mentioned some hard cases, but @trentj on IRC brought it up again and I've realized there's a simpler way to look at things that is not type-based, as

enum E {
    A(ALeft..., Niche, ARight...),
    B(B...),
    C(C...)
}

can be seen as a generalization of the case where sizeof B and sizeof C are 0.

The challenge is to fit B and C in ALeft and/or ARight, which doesn't have to be too clever if Niche has either the smallest (e.g. bool, Option<ZST>, etc.) or largest (e.g. &T, Option<usize>, etc.) alignment in E, because we can hint the field reordering to place it at one end of the variant, giving B and C only one run of bytes to fit, instead of two.

EDIT: now over at https://github.com/rust-lang/rust/issues/46213.

One possible optimisation that hasn't been mentioned yet: NaN tagging. It's impossible to implement with current Rust types, however; floating-point types treat all possible NaN bit patterns as valid. Doing this would require adding NonZero-like wrapper types or some equivalent mechanism to restrict valid representations of floating-point values.

I've actually got something written up on this topic, I might submit it as an RFC soon...

@fstirlitz With const generics we could have a generalization of NonZero and use it to implement a type like NonNaN, by removing a specific range of values for f32 and another for f64.

@eddyb Will it be possible to use const generics to disallow NaNs with certain payloads, but not others? You'd have to have to_bits() be const fn, which is currently impossible, because it's implemented with transmute. (Use case for this: an interpreter for a dynamically-typed language which supports NaNs, but doesn't expose payloads.)

And even if sufficiently expressive const generics do arrive, it will be beneficial to have a canonical form of this feature in the standard library.

I'm not talking about CTFE to compute the layout, but a wrapper type with two integer parameters expressing a range of values that the inner type can't use but optimizations can.

You mean, like...

// possibly wrapped in an opaque struct;
// could probably be generic over float types
// by means of associated consts and types,
// but using f64 for readability
enum NaNaN64 {
    Positive(IntInRange<u64, 0x0000000000000000, 0x7ff0000000000000>),
    Negative(IntInRange<u64, 0x8000000000000000, 0xfff0000000000000>),
}

impl From<NaNaN64> for f64 { /* f64::from_bits */ }
/* other impls */

That's... not especially convenient, is it?

Plus, without compiler support it won't be able to additionally take advantage of LLVM's fast-math annotations.

More like:

struct NaNaN64 {
    float: WithInvalidRange<WithInvalidRange<f64,
        0x7ff0000000000000, 0x7fffffffffffffff>,
        0xfff0000000000000, 0xffffffffffffffff>
}

But internally we can't represent the two ranges anyway for now, so for NaN-boxing you'd have to choose only one of them, in the near future.
As for fast-math... It could work if LLVM actually understood range metadata as "can't possibly be a NaN" and optimizing based on that.

It's much more straightforward when everything is offsets and bitpatterns and ranges than "types".

I'm pretty sure that a safe "forbidden NaN" float type would undermine its own benefits with NaN-masking-cmov's (or worse) everywhere.

Note: If someone is willing to mentor me on this bug, I'm interested in tackling it.

@Gankro How are we going to track that most of the optimizations mentioned in this issue have been implemented, and the various tricks required to make any further changes?

@eddyb are any of the optimizations in the OP not implemented? It looks like at least most are, and if so we might want to turn this into a metabug that points at sub-issues for the remainder, or just close it.

@Gankro: re NaNs, you mean checking after arithmetic operations whether the result was NaN? Maybe. On the other hand, such types could at least implement Ord and Eq; and a 'designated canonical NaN' type might even elide most of these checks (if the canonical NaN is chosen wisely), thanks to NaN propagation semantics of IEEE 754.

@eddyb Could Cow<str> be represented on three words? Cow::Owned(String) would be String (with a non-zero pointer, assuming it is first) and Cow::Borrowed(&str) would be (0, &str). I half expected https://github.com/rust-lang/rust/pull/45225 to do this but it doesn’t.

Yeah, I only figured out how to later. Also, it requires rustc_trans to generate LLVM constants from miri allocations, for us to be able to combine the niche with additional data.

@SimonSapin another alternative: since capacity >= len, &str can be represented exactly as a String with capacity == 0. The con is we can't distinguish between a zero-sized &str and a zero-sized String. The great thing is that for all practical purposes we don't have to. My suggestion also keeps the pointer non-zero, so even Option<MyCow> has the same layout.

I've actually started writing such a crate for fun.

Same holds for Vec
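A rough sketch of that representation, assuming the capacity == 0 convention described above (names and details are mine, not from any published crate; a real version would store the pointer as NonNull so the niche survives for Option):

use std::marker::PhantomData;
use std::mem::ManuallyDrop;
use std::{slice, str};

// (ptr, len, capacity) stored like a `String`; `capacity == 0` marks "borrowed".
// Empty owned strings carry no allocation, so treating them as borrowed is fine.
struct CowStr<'a> {
    ptr: *const u8, // a real version would use NonNull<u8> to keep the niche
    len: usize,
    cap: usize, // 0 => borrowed
    marker: PhantomData<&'a str>,
}

impl<'a> From<&'a str> for CowStr<'a> {
    fn from(s: &'a str) -> Self {
        CowStr { ptr: s.as_ptr(), len: s.len(), cap: 0, marker: PhantomData }
    }
}

impl<'a> From<String> for CowStr<'a> {
    fn from(s: String) -> Self {
        let s = ManuallyDrop::new(s); // take over ownership of the buffer
        CowStr { ptr: s.as_ptr(), len: s.len(), cap: s.capacity(), marker: PhantomData }
    }
}

impl<'a> CowStr<'a> {
    fn as_str(&self) -> &str {
        // Valid in both cases: the borrowed pointer/length, or the owned buffer.
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.ptr, self.len)) }
    }
}

impl<'a> Drop for CowStr<'a> {
    fn drop(&mut self) {
        if self.cap != 0 {
            // Owned: reassemble the String so its buffer gets freed.
            unsafe { drop(String::from_raw_parts(self.ptr as *mut u8, self.len, self.cap)) }
        }
    }
}

fn main() {
    let b = CowStr::from("borrowed");
    let o = CowStr::from(String::from("owned"));
    assert_eq!(b.as_str(), "borrowed");
    assert_eq!(o.as_str(), "owned");
}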

But the compiler doesn’t know about capacity >= len. It does already know that NonZero<*mut u8> inside String is non-null, though.

Yes. The compiler will probably never be able to do the optimization using capacity >= len because of the weird empty case, which we know doesn't matter but explaining that to the compiler would be a nightmare.

Since a part of the original desired optimizations have been implemented in rust-lang/rust#45225, and most of the remaining ones have various trade-offs that need to be explored by RFC for each category separately (with the exception of https://github.com/rust-lang/rust/issues/46213), I'm going to close this central issue.
