Rust: Tracking issue for std::num::NonZero* types

Created on 18 Mar 2018 · 49Comments · Source: rust-lang/rust

This is a tracking issue for the RFC "Add std::num::NonZeroU32 and friends, deprecate core::nonzero" (rust-lang/rfcs#2307).

Steps:

[x] Implement the RFC: https://github.com/rust-lang/rust/pull/48265
[x] Adjust documentation (see instructions on forge)
[x] Stabilization PR (see instructions on forge) https://github.com/rust-lang/rust/pull/50808

Unresolved questions:

[x] Should the signed types be included?
[x] Decide on naming -- NonZeroU32 or PositiveU32? Should the U be dropped?
[ ] Should the memory layout of e.g. Option be a language guarantee?
[ ] Settle issues wrt. const generics before stabilization.
-- We should either end up with a generic type or with a clear explanation of why a generic type can't work or doesn't work well.

B-RFC-implemented C-tracking-issue T-lang T-libs disposition-merge finished-final-comment-period

Source

Centril

👍1

Most helpful comment

This is probably more amusing than significant, but another real-world use case of signed-numbers-without-zero is years.

glaebhoerl on 25 Nov 2018

😄6

All 49 comments

Implementation PR: https://github.com/rust-lang/rust/pull/48265

Documentation: @Centril, do doc-comments in this PR look sufficient to you?

My opinion on unresolved questions:

Signed types (e.g. NonZeroI32): as discussed in the RFC thread, they are included in the RFC and implementation PR because it was easy and because core::nonzero::NonZero<T> supports them too. I see a downside to having them, but I also haven’t had a use case for them. And they are easy to build outside of std in a crate that uses the unsigned types internally and does the conversion in new, new_unchecked, and get. (Unlike e.g. non-zero integers of different sizes if only NonZeroUsize were available.)

Positive* names: The fact that zero is not positive in English is subtle, and it is not shared by some other languages. (For example zero is both positive and negative in French, and the correct translation for “nombre positif” is “non-negative number”.) I don’t think relying on such a subtlety to communicate the core property of these types is a good idea.

Dropping the U in names: this is only possible if we also remove the signed types. But even so, I think that including the full name of the type accepted by new and returned by get is desirable.

Separate types v.s. one generic type: this was discussed at length in the RFC. The former has precedent with std::sync::atomic::Atomic*. The latter brings other unresolved questions that I feel don’t have satisfactory answers: trait bounds on that type parameter, and implementations of relevant traits for composite for types outside of std.

SimonSapin on 18 Mar 2018

Making the memory layout of Option<num::NonZero*> a language guarantee: I explicitly made the RFC not make this guarantee in order to keep the RFC within the realm of the standard library (and hopefully easier to get accepted). But if @rust-lang/lang feels comfortable making that guarantee, that sounds good to me.

SimonSapin on 18 Mar 2018

@SimonSapin

Documentation: @Centril, do doc-comments in this PR look sufficient to you?

Pretty much, yeah =)

On the unresolved questions:

Maybe we should skip the signed types but add them when someone asks for them with an issue or PR or on IRC (don't need to go to another RFC then...)?
I agree that NonZeroU32 is best.

Centril on 18 Mar 2018

👍1

For what it’s worth the implementation of the 6 signed types costs exactly an additional 111 characters in a macro invocation. They’re not a maintenance burden.

SimonSapin on 18 Mar 2018

For what it’s worth the implementation of the 6 signed types costs exactly an additional 111 characters in a macro invocation. They’re not a maintenance burden.

This is true; but they cost in compile times -- but probably not a noteworthy amount of time.

Centril on 18 Mar 2018

👍1

Honestly, compile time of end-user crates matters a lot more than compile time of libstd and co. because for most users libstd will already be compiled. So, even adding a minute of compile time to libstd isn't super bad.

clarfonthey on 19 Mar 2018

👍3

Sounds like I'm the only one who likes Positive, so let's mark that resolved and stick with NonZeroU32.

Maybe just put the signed ones under a different feature gate? That way we'll find out which ones people use, and there's still nobody "attached" to them, stabilize/remove them separately. (Personally I think 0 is a weird hole in a signed type, and would much rather have nice named types for symmetric integers [not INT_MIN] than ones without zero. [Until we get const generics and can make our own holes.])

scottmcm on 20 Mar 2018

👍3

I honestly think that Positive8 is a bit ambiguous considering how it could mean only 1 to 127 instead of 255.

NonZeroed8 may be a bit (heh) cleaner as it indicates that the bits aren't all zeroed, rather than the fact that 0 is an excluded value.

clarfonthey on 20 Mar 2018

The implementation https://github.com/rust-lang/rust/pull/48265 has landed, it should reach Nightly in a few hours.

SimonSapin on 23 Mar 2018

🎉2

@SimonSapin how do you feel about @scottmcm's note that we should:

Maybe just put the signed ones under a different feature gate?

Centril on 24 Mar 2018

How useful is this to do now? (v.s. changing the status later of only a subset)

My thoughts lately on the signed types is that there isn’t really a strong case against them, but there isn’t much of a use case either. Adding that they can be easily be built out of tree on top of the unsigned types, I’d be ok with removing or deprecating them and see if anyone objects. They’re easy to add (back) later if we decide to do so.

SimonSapin on 24 Mar 2018

@SimonSapin

I’d be ok with removing or deprecating them and see if anyone objects. They’re easy to add (back) later if we decide to do so.

👍

Centril on 24 Mar 2018

Deprecation of NonZeroI* https://github.com/rust-lang/rust/pull/49324

Support for NonZeroU* in serde: https://github.com/serde-rs/serde/pull/1191

SimonSapin on 24 Mar 2018

Sorry, I missed the RFC on this.

I implement Zeroable on a few structs that are defined like this struct A(usize). Sometimes I would stick an A in a NonZero and sometimes I wouldn't depending on the context. A has some invariants it maintains. Is the recommended solution, in light of the deprecation, to have two different types with the same interface? A and ANonZero - mirroring the RFC?

mtak- on 24 Mar 2018

Although it probably happened to work out in this specific case (struct with a single field that is a primitive Zeroable) I don’t know if implementing Zeroable for other types was ever officially well-defined.

I’d recommend either A + ANonZero as you said, or a generic A with a trait that you implement for both usize and NonZeroUsize.

SimonSapin on 24 Mar 2018

👍1

JFYI: I'm excited for this to become stable someday. =)

Though what I truly want is NonMaxU32 (which cannot be u32::MAX), so that I can model vector indices without having to subtract one. But I can live with NonZeroU32. =)

nikomatsakis on 8 Apr 2018

@nikomatsakis JFYI (and for anyone else looking around): The crate optional is on crates.io, and provides Optioned<T> with size_of::<Optioned<T>>() == size_of::<T>, for all of the primitive numeric types, reserving u*::MAX, i*::MIN, and f*::is_nan as the None value.

(Just don't use the char one in 0.3.0, it's unsound in the presence of niche-filling chars, that's my fault, and I'm trying to get it yanked.)

CAD97 on 23 Apr 2018

👍1

While the current form of the API has been in Nightly for only a month, this functionality has existed for several years now. I’d like to stabilize:

@rfcbot fcp merge

@nikomatsakis

Though what I truly want is NonMaxU32

Do you feel that’s a reason not to stabilize what’s implemented today?

Nominating for the @rust-lang/lang team to discuss whether the memory layout of Option<num::NonZeroU32> should be a language guarantee like Option<&T>. (And Option<ptr::NonNull<T>> as well?)

@Centril

We should either end up with a generic type or with a clear explanation of why a generic type can't work or doesn't work well.

I feel that this has been discuss at length in the RFC. What would you like to see here?

SimonSapin on 27 Apr 2018

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

[x] @Kimundi
[x] @SimonSapin
[x] @alexcrichton
[x] @aturon
[x] @dtolnay
[x] @sfackler
[ ] @withoutboats

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot on 27 Apr 2018

If we do this more generally later we could always make these type aliases. Considering how far off const generics are, I don't mind this.

clarfonthey on 27 Apr 2018

@SimonSapin

I feel that this has been discuss at length in the RFC. What would you like to see here?

Just lift / summarize the explanation from the RFC into here should be fine IMO.

PS: Should the FCP include the lang team's ticky boxes?

@clarcharr Are there no possible issues wrt. coherence there?

Centril on 27 Apr 2018

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot on 2 May 2018

I'm so excited for these features. I want Option<MyIndex> to be one word big darn it!

nikomatsakis on 2 May 2018

@SimonSapin The main reason to make such a guarantee would be to make sure you can use it in FFI, just like the null-pointer optimization.

That seems pretty reasonable to want. I'm not sure we should promise that yet, but it'll be true in practice, and we should plan on making that promise stable in the future.

joshtriplett on 10 May 2018

❤1

Note that for guaranteeing ABI, you'll want #[repr(tranparent)] on all the types involved.
Memory layout is not enough if you plan to pass these by-value from/to C.

eddyb on 11 May 2018

The final comment period, with a disposition to merge, as per the review above, is now complete.

rfcbot on 12 May 2018

Stabilization PR: https://github.com/rust-lang/rust/pull/50808, leaving this tracking issue open for memory layout and ABI guarantee discussion.

SimonSapin on 16 May 2018

I'm coming from the warning added in PR #49324. My use case for NonZeroI32 is an optimisation for specs. Specs marks each entity with a i32 generation, where the positive, negative and zero cases all have special meaning (code link). I would like to replace this with Generation(NonZeroI32) to enable some optimisations but instead am forced to use NonZeroU32 which is a bit less ergonomic.

andrewhickman on 2 Jun 2018

cc @SimonSapin on @andrewhickman's use case.

Centril on 9 Jun 2018

@Centril I was in favor of just adding the signed types together with the unsigned ones because it was easy, I’m not the one that needs convincing.

@andrewhickman Given that this is closed, consider filing a PR to add the signed types. In the meantime, maybe you can have a type with a NonZeroU32 private field that casts to/from signed on access.

SimonSapin on 9 Jun 2018

I arrived here through the deprecation notice for NonZeroI32. I am working with OpenGL and want to wrap uniform locations which are represented by an i32 in the OpenGL API. Uniform location values are non-negative except for the sentinel value -1 representing a location that wasn't found. I would liked to have defined the wrapper as struct UniformLocation(NonZeroI32), storing the raw uniform location value + 1. Currently I use NonZeroU32 and a few casts which is fine but I wanted to report my use case regardless. Alternatively I would be able to specify that -1 can represent None as mentioned in https://github.com/rust-lang/rust/issues/49137#issuecomment-374445092.

mickvangelderen on 26 Jul 2018

Alternatively I would be able to specify that -1 can represent None

There seems to be consensus that a more general feature, where you can specify what (ranges of) values are forbidden, would be desirable. But it would likely be based on const generics which is not implemented in the language yet.

In the meantime, NonZeroI32 with an offset would allow values -2, -3, etc which you’re saying should not be allowed in this context. So it sounds like it wouldn’t do what you want. If I missed something, could you give a more detailed example?

SimonSapin on 26 Jul 2018

Would using an unsafe trait to specify invalid ranges be viable? I think traits are a great way to specify interfaces, so that idea appeals to me at least from this point of view.

Example:

/// Allows for a type to specify invalid bit pattern in order to allow enum layout optimization.
#[some_magic_lang_item_annotation]
unsafe trait InvalidValues: Sized {
    /// How many invalid values can be written into the same memory Self fits in.
    const INVALID_VALUE_COUNT: usize;

    /// Must map discriminant to some invalid bit pattern and write it to `dest`.
    /// The caller must guarantee that `discriminant < Self::INVALID_VALUE_COUNT`.
    /// The caller must also provide a valid pointer. The pointer must point to uninitialized memory.
    const unsafe fn write_invalid_value(dest: NonNull<Self>, discriminant: usize);

    /// This function must inspect the memory pointed to by `value` and return the same discriminant as previously written by `write_invalid_value()`. In case the memory contains valid value, this function must return `Self::INVALID_VALUE_COUNT`.
    const unsafe fn get_discriminant(value: NonNull<Self>) -> usize;
}

It would allow users to specify arbitrary invalid bit patterns. That way we could even have things like:

#[repr(transparent)] // not necessary, but why not?
struct LessThan253(u8);

impl InvalidValues for LessThan254 {
    const INVALID_VALUE_COUNT: usize = 2;

    unsafe fn write_invalid_value(dest: NonNull<Self>, discriminant: usize) {
        mem::write(dest.as_ptr() as *mut u8, discriminant as u8 + 254);
    }

    unsfe fn get_discriminant(value: NonNull<Self>) -> usize {
        let value = mem::read(value.as_ptr() as *mut u8) as usize;
        if value < 254 {
            Self::INVALID_VALUE_COUNT
        } else {
            value - 254
        }
    }
}

fn test() {
    enum Foo {
        Value(LessThan254),
        Bar,
        Baz,
    }

    assert_eq!(mem::size_of<Foo>() == mem::size_of<LessThan254>());
    assert_eq!(mem::size_of<u8>() == mem::size_of<LessThan254>());
}

What do you think?

Kixunil on 25 Aug 2018

@Kixunil Using just associated consts could maybe work, but I really really do not want to involve the trait system during layout computation and risk weird inconsistencies because of how people wrote the impls, or have to do complex checking ahead of time, when the impls are first written.

Using methods is a no-no, it means you're getting no LLVM optimization (), no nested reuse (for the same reason Option<Option<(&T, &U)>> is not the same size as Option<(&T, &U)>), miri wouldn't work, and probably a myriad of other things that escape me right now.

IMO the only reliable way to get this is const generics and a wrapper type that removes a values of a certain width (or just full width), at a certain location (or just offset 0), from the type.
The name is (obviously) up for bikeshed, but WithInvalid<u8, 254..=255> is one option.

I think traits are a great way to specify interfaces

Sure. But this is not an interface, it's a fundamental representational property of the data type.
You can totally emulate Option<T> using some chosen value of T for None, in a library, but this is not that - the invalid values the compiler knows about are UB to exist.

In your case, who's stopping me from creating LessThan254(255)? You have to write checks in unsafe code, and manually maintain the invariants, whereas a wrapper type would do all that for you.

eddyb on 26 Aug 2018

❤1

@eddyb Could you give an example of weird inconsistency that might happen? People can already shoot themselves in the foot in unsafe code. I don't see a problem with careful use of unsafe when it's really needed.

Using methods is a no-no, it means you're getting no LLVM optimization

Why? If LLVM can optimize some functions, what's preventing it from optimizing other functions?

miri wouldn't work

Why one const fn works in miri but not other const fn?

the invalid values the compiler knows about are UB to exist.

would using &mut MaybeUninitialized<Self> in the signature help?

In your case, who's stopping me from creating LessThan254(255)

What's preventing me to implement e.g. Mutex incorrectly? Nothing. I have to use the unsafe keyword, though. If someone impls that trait for their type incorrectly, it's the same thing as screwing up other unsafe code. I don't understand, why this unsafe code should be special.

WithInvalid could be implemented in terms of InvalidValues. Additionally, InvalidValues can be used for things like OddNumber, which can't be achieved with simple range.

Kixunil on 9 Sep 2018

Could you give an example of weird inconsistency that might happen?

The problem is a trait impl is not "inherent" to a type. We'd have to pile on a bunch of checks, to ensure impls of this trait are "boring" wrt to when they hold. Most importantly, the implementation must not depend on type parameters, nor introduce relationships between lifetime parameters (or just not have generic parameters at all). In a way, this is similar to the restrictions of the Drop trait.

With all of that work, and const generics on the horizon, I prefer an "obviously correct" solution using const generics, instead.

Why? If LLVM can optimize some functions, what's preventing it from optimizing other functions?

I mean we can't tell LLVM anything about value validity, if you have functions doing encoding and decoding. Our existing contiguous "(in)valid ranges" actually lower to LLVM !range metadata.

Why one const fn works in miri but not other const fn?

We don't have const fn in traits yet.

would using &mut MaybeUninitialized<Self> in the signature help?

Sure, but you actually want a custom union between Self and a type with the same primitive field but no validity semantics, because otherwise you can't access the primitive.

Additionally, InvalidValues can be used for things like OddNumber, which can't be achieved with simple range.

I personally have no intention of supporting those kinds of optimizations. they can significantly complicate discriminant decoding (which happens at runtime, unlike encoding), whereas, currently, all of our enums, even when using invalid values, take at most one subtraction and one comparison, to decode the discriminant.

eddyb on 9 Sep 2018

That makes a lot more sense to me now, thanks for explaining!

Kixunil on 16 Sep 2018

Somewhere else (the PR that deprecated them) one is requested to drop in here and indicate if you use (or would use, I guess) signed NonZero variants. I do / would, lots. The most common use case is in differential dataflow (but it is probably more common) where you record changes to things. You don't ever need to record a zero change, and indeed often you want to use types to communicate the non-zero'd ness.

For example, I'd love to be using a trait

trait Addition {
    /// returns the result of the summation if non-zero.
    fn add(self, other: Self) -> Option<Self>;
}

The important point for me is that I'd like the opportunity for the zero value to not inhabit the types, so that e.g. () could implement GF(2), which is important for zero-overhead implementation of incremental updates to collections with set semantics (vs tracking a 0/1 multiplicity as an associated integer).

Without non-zero signed integers, the above Option<isize> ends up being a bit bigger and probably a bit more complicated logically (vs simply addition with the zero value interpreted as None).

Anyhow, I can work around forever, obvs, but I thought you all might want to know! :D

frankmcsherry on 25 Nov 2018

@frankmcsherry FWIW this should be sufficient:

#[derive(Copy, Clone)]
pub struct NonZeroI32(NonZeroU32);

impl NonZeroI32 {
    pub fn new(x: i32) -> Option<Self> {
        NonZeroU32::new(x as u32).map(NonZeroI32)
    }
    pub fn get(self) -> i32 {
        self.0.get() as i32
    }
}

eddyb on 25 Nov 2018

There is also a crate implementing signed NonZero type

andrewhickman on 25 Nov 2018

❤2

Yup, thank you! I'm def happy working around, but equally happy to let you folks know about use cases (and if other people have them, put something like this next to NonZeroU32 in the stdlib).

frankmcsherry on 25 Nov 2018

Given the existence of a work-around that preserves the enum layout optimization, this doesn’t seem very pressing. On the other hand, adding them is very easy, and not really any additional maintenance burden. So I’m really neutral on this one. Consider nagging some other member of the libs team :)

SimonSapin on 25 Nov 2018

This is probably more amusing than significant, but another real-world use case of signed-numbers-without-zero is years.

glaebhoerl on 25 Nov 2018

😄6

Another use case for signed non-zero numbers: I'm currently working with a C library where the number 0 is significant. Representing this as an enum would probably work well I reckon.

libnghttp2_sys priority_spec.stream_id

API Example

An example of how to represent the libnghttp API above:

pub type StreamId = NonZeroI32;

pub enum StreamDependency {
  Independent,
  Dependent(StreamId),
}

yoshuawuyts on 6 Jan 2019

For now, can't signed nonzero types just be implemented as a separate crate? Then people can judge inclusion into libstd based upon how many people use it.

clarfonthey on 6 Jan 2019

Yes, https://crates.io/crates/nonzero_signed was mentioned upthread.

SimonSapin on 6 Jan 2019

...oh. In that case, perhaps this issue could be locked from future comments on that and a larger note could be made to mention that crate?

clarfonthey on 6 Jan 2019

https://github.com/rust-lang/rust/pull/57475 Add signed num::NonZeroI* types

SimonSapin on 9 Jan 2019

Will this ever be available to users?
i.e. https://play.rust-lang.org/?gist=5c8f2f2438da9952963f88449bf0239e