Rust: Tracking issue for `NonZero`/`Unique`/`Shared` stabilization

Created on 12 Aug 2015  Â·  114Comments  Â·  Source: rust-lang/rust

We currently have three internal types, NonZero, Unique and Shared, that are very useful when working with unsafe Rust code. Ultimately, the hope is to have a clear vision for semi-high-level, but unsafe programming using types like these.

None of the current APIs have gone through the RFC process, and a comprehensive RFC should be written before any stabilization is done.

cc @Gankro

B-unstable C-tracking-issue T-libs

Most helpful comment

I still want the "common" constructor of NonZero to be T -> Option<NonZero<T>>.
It's safe, a noop and you're doing the check somewhere anyway (except for some unsafe APIs).
We could really be using NonZero<usize> -> Option<Unique<u8>> as the malloc signature.

All 114 comments

I'm not a fan of NonZero. I would like a more robust system that can hook into the things described in https://github.com/rust-lang/rfcs/issues/1230.

Unique is great, but I would like to rename it to Owned to better reflect what its actual semantics are (e.g. it's fine to have other pointers into it just like it's fine for Box).

I would like to add a Shared equivalent for Rc/Arc to use. This is blocked by https://github.com/rust-lang/rust/issues/26905

I expanded this issue to cover the new Shared type (https://github.com/rust-lang/rust/pull/29110).

CC @SimonSapin

Does Shared/Unique cover all your NonZero usecases?

I commented about this in #29110, reposting here for visibility:

The Deref impls on these types are bad interfaces because they remove the optimizations NonZero can trigger, specifically the control-flow ones (not the layout optimizations, which are unaffected).

They also allow writing *p where p is a ptr::Unique<T> or a ptr::Shared<T> to obtain a *mut T, which might make unsafe code more confusing (you need **p to get at the T lvalue).

NonZero and wrappers such as Unique and Shared, should use a Cell-like API (.get() and .set() - or just .get(), really), allowing LLVM to reason about their non-nullability.

@eddyb alternatively, we could just promote these types to compiler primitives, right? Have them coerce to raw pointers or summat? This may also improve debug perf, as there's less abstraction in the way if it's really "just" a raw pointer, and not a Foo(Bar(*const T)) that is accessed through two levels of methods.

I agree with @eddyb's comment about using a Cell-like API for NonZero (and other types). It makes semantic sense.

The libs team decided to not move these APIs into FCP this cycle because the NonZero type is at the heart of all of them, and we're the _least_ sure about NonZero.

CC @SimonSapin

Does Shared/Unique cover all your NonZero usecases?

(Answering with a small delay… :))

Unfortunately no. We would like to use NonZero<u64> instead of u64 in string_cache::Atom, but we can’t at the moment because NonZero’s field being private prevents matching on it in a pattern: https://internals.rust-lang.org/t/matching-on-struct-patterns-with-private-read-only-fields/3499

I don't know if this is the right place to comment, but I'm binding some C code and Unique appears perfect - except that it exposes the unsafe rather than that being internal - meaning that I need a further wrapper around it - since I know the C ptr in question isn't going to be freed randomly, and I know that its not null, I want to hide the unsafe entirely from further out users.

@rbtcollins My suggestion would be to write a wrapper struct that has a Unique as a member, and that frees the pointer in the Unique in its Drop impl.

N.B. *move from https://github.com/rust-lang/rfcs/pull/1646 would almost surely replace this.

If we keep NonZero or something like it, I would love a safe (checked) way to construct one. Currently the single line of unsafe code in Serde is constructing a NonZero.

if value == Zero::zero() {
    return Err(Error::invalid_value("expected a non-zero value"))
}
unsafe {
    Ok(NonZero::new(value))
}

I'm making changes to this API in this PR https://github.com/rust-lang/rust/pull/41064

Would it make sense for NonZero to be an attribute of fields rather than a wrapper type? That would enable string_cache::Atom to use it. Currently it can’t because we have a macro that expands to an Atom struct literal that can be used as a pattern in a match expression, so all transitive fields must be public.

I would like to propose we stabilize these APIs now that my changes in #41064 have landed. They have varying levels of controversy, so here's my priority's prefs:

  • Shared should absolutely be stabilized ASAP. I don't think it needs any more work.

  • Unique should probably be stabilized, but doesn't really need to be, and might want some more scrutiny. Anyone who wants to use it can use Shared, because they provide the same API and there's no actual optimization benefits between the two today. They should probably also use Shared anyway to avoid running into UB that isn't even specified yet.

  • NonZero could be stabilized, but it's always going to be a lame hack API, and I don't think it really matters (Shared does most of its work). That said I don't think NonZero will conflict with any future system that would theoretically obsolete it, so it's probably fine to stabilize it with a shrug.

You can view the current APIs here:

I discuss some rationale for their design here

The major change is that the Deref impls are dead, as they confused LLVM. Now you explicitly call get on NonZero, or as_ptr on Shared/Unique.

Shared and Unique are now exactly the same (covariant, nonzero, takes/produces *mut), except for this distinction:

Shared is similar to Unique, except that it doesn't make any aliasing guarantees, and doesn't derive Send and Sync. Note that unlike &T, Shared has no special mutability requirements. Shared may mutate data aliased by other Shared pointers. More precise rules require Rust to develop an actual aliasing model.

This basically leaves Shared spec'd as "*mut if it worked the way you wished it would". There's potentially room for something between Unique and Shared to capture the subtle aliasing semantics that Rc and Arc have, but we desperately need a type like Shared for the broader ecosystem, and it's not like we have immediate plans to add more aliasing metadata to Shared/Unique.

Unique on the other hand is spec'd as having aliasing guarantees "as if" the Unique<T> were a T, whatever we decide those aliasing guarantees are.

Basically I punted on needing to spec Rust's aliasing rules by giving these two types the aliasing rules of other types.

Some controversial points to note during FCP:

  1. Shared/Unique are now Copy. This is basically an ergonomics thing, to make them as similar to *mut as possible. It's unsafe code, we should be getting out of our users' way. I have mild concern this is sketchy for Unique with its new aliasing model, but mostly I don't think it will matter.

  2. Shared is now covariant, even though it's "aliased and mutable". Why? Because it would literally be useless otherwise. Arc, Rc, LinkedList, and every other type that uses Shared needs covariance, and there's no way to "undo" invariance. Yes this is a foot gun. No I don't think it's a big deal (it's almost always right -- literally always right for the stdlib). Shared's documentation highlights the issue.

  3. Shared/Unique don't require/guarantee alignment when they dangle. This means enums potentially miss out on Even More Tagging Optimizations. However it means that users can do their own by-hand tagging optimizations. For instance HashMap does this today.

@nikomatsakis briefly mused on having a TaggedUnique type to allow misalignment during storage that would provide conveniences for masking off or extracting the align-bits.

I updated the stdlib so everyone but HashMap is now setting their dangling Unique/Shared pointers to mem::align_of::<T>, and deprecated heap::EMPTY. Shared and Unique now have ::empty() constructors that will handle this for you (bonus: they're safe, unlike ::new()). It would be backwards compatible to require alignment now, and abandon it later if it turns out to be more trouble than it's worth. However all current nightly users of Shared/Unique may miss this is a new requirement.

  1. Big +1 to stabilizing Shared
  2. If Unique can always be replaced with Shared (plus maybe impls for Send or Sync), why shouldn’t it be removed entirely?
  3. Of all the potential uses of NonZero that I know of, all but one can/should use Shared (which in turn uses NonZero internally). The remaining one is string_cache, it can not use NonZero in its current API either because NonZero’s field is private, which prevents it from being matched in a pattern.

    So I have no desire to see the current NonZero stabilized. I’d like to see something else that would allow pattern matching, but I don’t know how to do that without also introducing unsoundness. (A public field could be set to zero.)

I'm actually most excited about NonZero myself. It's very useful for integer ID values, where a value of 0 is an invalid ID. Wrapping these types in NonZero would allow them to be wrapped in an Option at no extra cost in space. Using an Option is much more ergonomic than checking for 0 everywhere (it's basically the same issue as nullable pointers).

I frequently use NonZero with integers as well; though what I really want is NonMax (since I am always doing this with indices, and 0 is a perfectly good index). I tend to write some newtype to mask it for me.

Or -1, which is commonly used to indicate an invalid index. One way to do this with the current NonZero is to add 1 to whatever value you are storing (0 => None, 1 => Some(0), 2 => Some(1), etc).

Stabilizing NonZero probably involves stabilizing the Zeroable trait and re-exporting them in std. Is core::nonzero / std::nonzero the desired module name?

Zeroable shouldn't need to be stabilized, I don't think?

I think that stabilizing APIs that have bounds on a trait without stabilizing that trait should not become a habit. It’s been done a couple times, and is maybe acceptable when we have yet to figure out some details of the trait before stabilizing it, but I don’t think we should do it if the trait does not have a path to stabilization at all.

@Amanieu @nikomatsakis

I'm actually most excited about NonZero myself. It's very useful for integer ID values, where a value of 0 is an invalid ID. Wrapping these types in NonZero would allow them to be wrapped in an Option at no extra cost in space. Using an Option is much more ergonomic than checking for 0 everywhere (it's basically the same issue as nullable pointers).

I frequently use NonZero with integers as well; though what I really want is NonMax (since I am always doing this with indices, and 0 is a perfectly good index). I tend to write some newtype to mask it for me.

At some point, doesn't this turn into a discussion of what generalized properties should be encoded in the type system (with the extreme case being full-on refinement types)? I'm not trying to shut this discussion down, but in fact do the opposite - I think it'd be an interesting exercise to consider the total scope of properties of the form "this variable can only take values in this subset of its domain" that we might want to encode in the type system. We've already mentioned non-zero, non-max, and non-negative. What others would be reasonable/useful?

This is very out of scope for this issue, IMO.

I still want the "common" constructor of NonZero to be T -> Option<NonZero<T>>.
It's safe, a noop and you're doing the check somewhere anyway (except for some unsafe APIs).
We could really be using NonZero<usize> -> Option<Unique<u8>> as the malloc signature.

I still want the "common" constructor of NonZero to be T -> Option>.

Yes. I’ve opened https://github.com/rust-lang/rust/pull/42959 to do this.


With that change, I’d like to propose FCP to stabilize Shared. It’s been around for more than long enough, and I think we’ll have addressed all known concerns.

I’m not proposing to stabilize (yet) Unique because its value over Shared seems uncertain. And not NonZero either because I think we should not stabilize an API that involves a trait bound without stabilizing that trait, and I feel Zeroable is not polished enough (though I don’t have a concrete proposal for it, sorry).

A hack to get the equivalent of NonZero<usize>, NonZero<*const T>, or NonZero<*mut T> on stable Rust is to abuse &'static T (in a properly encapsulated struct):

#[derive(Copy, Clone)]
pub struct NonZeroUsize(&'static ());

impl NonZeroUsize {
    #[inline]
    pub unsafe fn new(value: usize) -> Self {
        NonZeroUsize(&*(value as *const ()))
    }

    #[inline]
    pub fn get(self) -> usize {
        self.0 as *const () as usize
    }
}

(I’m not sure who invented/discovered it first. @bholley? @Manishearth?)

I think it was emilio or mystor.

It was actually @nox, though he may be embarrassed to admit it. ;-)

@SimonSapin I invented that technique independently too. However, in case of &'static () I fear that compiler could "optimize" the reference into NULL, since () doesn't actually exist.

There is no fear to have, this can't happen.

Seems like I got confused by this RFC.

That RFC would break too much code to justify implementing it in Rust 3.0.

Back on topic:

I still want the "common" constructor of NonZero to be T -> Option>.

Yes. I’ve opened https://github.com/rust-lang/rust/pull/42959 to do this.

Something came up in this PR’s discussion. I renamed the existing new constructors to new_unchecked and changed all usage with sed. It turns out that most uses, at least in the standard library, did not do a zero/null check themselves because the value is known to be non-zero / non-null, for example a pointer from Box::into_raw. So if changed to the checked constructor they would use .unwrap(). It’s debatable which of an unsafe block or using .unwrap() is preferable.

I think we should have both checked and unchecked constructors. I don’t mind much which one is "the default" (which one gets to be named new). Thoughts?

When creating newtypes with invariants I (almost?) always provide fn new(val: Type) -> Option<Self> and unsafe fn new_unchecked(val: Type) -> Self. I see no reason why standard library should do it differently.

Yes, foo + foo_unchecked is the precedent in the standard library too. Thought to be honest I care more about pushing this stuff to stabilization than about exact naming. I wouldn’t really mind new_checked + new.

because the value is known to be non-zero / non-null

Then the type should reflect that (e.g. in allocators). Box::into_raw is a bit unfortunate I suppose.

We could add a Box::into_nonzero method for unsafe-but-not-FFI use cases like https://github.com/rust-lang/rust/pull/42959

@Ixrec Unique and/or Shared are a bit closer to what Box is.

Perhaps also add conversions between Unique<T>, Shared<T>, and NonZero<*mut T>?

I'm proposing FCP for Shared, though feel free to argue for stabilization of other pieces as well.

@rfcbot fcp merge

While I’m happy to see progress happening, we should decide on https://github.com/rust-lang/rust/pull/42959#issuecomment-312186551 first.

Any reason why there's no impl<T> From<NonZero<T>> for T?

I think NonZero might be used in safe code too to guarantee that some number is not zero. That would mean unsafe fn new_unchecked() is probably preferable.

@SimonSapin ah, hadn't gotten there in my inbox triage yet.

I'm basically with @alexcrichton: I think it's fine to have both variants, don't have a strong opinion on which should be the "default", except that it should match expected/existing usage. @Gankro, do you have thoughts?

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

  • [ ] @BurntSushi
  • [ ] @Kimundi
  • [x] @alexcrichton
  • [x] @aturon
  • [ ] @brson
  • [ ] @dtolnay
  • [ ] @sfackler

Concerns:

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot concern new-unchecked

Just to make sure we don't lose this...

I'm not even sure if @aturon's proposed stabilization covers this method, but I'm sort of assuming it does. This method has not landed yet and is proposed in #42959. I feel that it's not ready yet because it was added but all existing uses of new now use new_unchecked, which to me seems like we're calling the "idiomatic constructor" new_unchecked and the rarely used constructor new. I think we may want a better balance of concerns here, more conversions as mentioned above, or something before stabilizing such a method.

The status quo is that new is unchecked (and unsafe) and is the only constructor. I think that adding checked constructors is uncontroversial. So options are:

  • fn new_checked(…) -> Option<Self> and unsafe fn new(…) -> Self
  • fn new_checked(…) -> Option<Self> and unsafe fn new_unchecked(…) -> Self
  • fn new(…) -> Option<Self> and unsafe fn new_unchecked(…) -> Self

They correspond respectively to the first commit of #42959, the first two commits, and all three commits.

Given what I've currently seen (everything in libstd uses the "unchecked constructor") I'd avocate omitting a checked constructor from a first pass at stabilization.

Given what I've currently seen (everything in libstd uses the "unchecked constructor")

Just to be clear, this is intended to mean "nothing currently in libstd wants the checked constructor", rather than "nothing uses it because it doesn't yet exist"?

@glaebhoerl There’s some of both.

Alright, I went through every call site changed in #42959 (which I initially made mechanically with sed). Results below

It’s not as clear-cut as @alexcrichton suggests. First, most calls could be made unnecessary. Of the few remaining ones, most could be either unsafe unchecked or checked + unwrapping. I think we generally prefer safety by default (for example with checked slice indexing), resorting to unsafety when some piece of code turns out to be part of hot loops and the optimizer turns out to be unable to prove enough invariants to eliminate checks. But it’s not clear at all that this trade-off was made consciously in any case here, since the checked constructors did not exist before that PR.


I counted 31 call sites outside of tests.

  • 21 can be replaced with new APIs that encode non-zeroness in their signature, removing the need for checking or unsafely assuming:

    • Adding conversions between Unique<T>, Shared<T>, NonZero<*mut T>, and possibly NonZero<*const T>, with up to 6 new impls of the From trait. This would replace 2 calls directly, and more when combined with following items.
    • Various memory allocations APIs like Alloc::alloc return a pointer that is guaranteed to be non-null. They should encode this in their signature by returning e.g. Unique<T> instead of *mut T. Would replace 5 calls.
    • Box::into_raw also returns a *mut T that is guaranteed to be non-null. A new Box::into_unique(Box<T>) -> Unique<T> method could be added, replacing 9 calls.
    • 2 calls have a parameter of the form ptr.offset(offset_of!(…)) perhaps offset_of! should return Shared<T>?
    • 2 calls convert their argument from &mut T to *mut T. More From impls?
    • Vec::into_iter uses Shared::new_unchecked(self.as_mut_ptr()). This could be replace by a new RawVec method that returns its Unique<T> field as such, not just as *mut T.
  • One call (in btree::node::NodeRef::ascend) does an explicit null check, and could match on the return value of a checked constructor instead.

  • Three calls are in functions that are themselves unsafe, take a *mut T parameter, and document that the caller must not pass null pointers. Since unsafety is already involved, these probably should keep using the unchecked constructor. (If these APIs were not stable already, we should consider making them take Shared<T> or Unique<T> instead.)

  • 6 calls use the unchecked constructor in an unsafe block, but could just as well use the checked one and then use .unwrap().It’s a subjective call whether unsafety or unwrapping is preferable.

    • Shared::empty and Unique::empty are constructors based on mem::align_of, whose return value is always >= 1.
    • One call has an argument of the form (x as u32) + 1.
    • Three calls do manual pointer tagging (starting from &T or Unique<T>), so From impls wouldn’t help there.

By the way, this shows that Unique<T> and Shared<T> are mostly interchangeable. According to https://github.com/rust-lang/rust/issues/27730#issuecomment-300530242 the only difference is that Unique<T> has additional non-aliasing semantics that could potentially enable future optimizations. But such optimizations apparently don’t exist yet, so they’re effectively the same in today’s compilers.

Is it really worth having two separate types? The name Shared mostly only makes sense when opposed to Unique. If the two types were to be merged into one, what should it be called?

@SimonSapin I like your analysis very much! More conversion impls and some refactoring seem like way to go for me.

Hm with that information I'm not sure I understand the rationale for having a checked constructor. It seems like there's one btree location that wants a checked part (maybe) and others don't? Otherwise if we have to overhaul everything and add dozens of conversions, why don't we just stick with the status quote as an unsafe function? Are usage contexts for these types not already incredibly unsafe regardless?

The unsafe unchecked constructor is not going away. I’m proposing adding additional APIs to make unsafe unnecessary in most cases.

@alexcrichton I've encountered many instances when I knew a value can't be zero and wanted to express it. NonZero<T> is perfect fit for this because it also enables compiler optimization. It'd suck if all the code that want's to create NonZero<T> would have to use unsafe - that is exactly what is not the design goal of Rust.

For one example, look at this recently merged PR. There is Pid newtype and probably could be NonZero. (Well, according to Wikipedia PID 0 can be scheduler, but I don't know how useful is that in practice.) Why using unsafe there?

Also slice::windows() (and chunks(_mut)) accepts non-zero size (sadly, changing the API to accept NonZero<usize> would be breaking change.)

I wonder how much hits would we get from running rg ' != 0' stdlib.

I believe that NonZero<T> is basically a newtype and it definitely should have safe constructor.

21 can be replaced with new APIs that encode non-zeroness in their signature, removing the need for checking or unsafely assuming

I’ve added commits to #42959 to do some of this.

@SimonSapin

According to #27730 (comment) the only difference is that Unique<T> has additional non-aliasing semantics that could potentially enable future optimizations. But such optimizations apparently don’t exist yet, so they’re effectively the same in today’s compilers.

To clarify, does this mean that the only difference between *mut T and Shared<T> is that the latter cannot be null?

@joshlf Shared<T> also has the correct Send/Sync impls and the correct variance via PhantomData.

Shared doesn't have Send/Sync impls

Oh, right

@Manishearth Looks like variance isn't the issue, just ownership of a T for the purposes of the dropck rule.

The libs team talked about this issue a bit in our meeting today:

  • We agreed with @SimonSapin that proving checked default constructors, and unchecked variants, is the right way to go. In particular, it would be easy to think that Shared was just another form of raw pointer, and that using it with null is perfectly fine. If you're doing so within already-unsafe blocks, you could get UB without ever realizing it. At the very least, the _unchecked variants give you a heads up that there's a relevant contract.

  • As to the Unique/Shared distinction: this seems to come down, to some degree, on what we're ultimately trying to achieve with these types.

    • The original vision, as I understand it, was for these types to be aids in writing unsafe code correctly. The distinction signals intent, and at least aids with auto traits.
    • On the other hand, we could instead see these types as essentially a way to get a stable NonZeroPtr type, in which case unifying them would be fine.

Writing this up, it occurs to me that these two perspectives on the types are not mutually-exclusive: we could stabilize NonZeroPtr now and consider the others later.

Once we reach consensus on this point, I think we're ready to go.

@rfcbot resolved new-unchecked

The associated PR has been updated with From and such conversions, and looks good to me.

As I believe I've stated above. I am happy with publically stabilizing Shared (what you call NonZeroPtr) and keeping Unique alive in the stdlib. It's nice-to-have and has no cost for us.

I am massively apathetic about naming at this point. Shared could be called UnsafePtr or RawPtr or Ptr. I don't want NonZero in the name, though (if you want a Totally Honest name you need to call it NonZeroCovariantPtrThatMaybeOwns).

On the other hand, we could instead see these types as essentially a way to get a stable NonZeroPtr type, in which case unifying them would be fine.

if you want a Totally Honest name you need to call it NonZeroCovariantPtrThatMaybeOwns

Would NonZeroPtr have the same ownership semantics as Shared, or would it literally just be "*mut T except that it can't be zero"?

I am like 50% sure the ownership part is actually meaningless now. It only mattered for dropck when we did it, and I think the dropck aspect has been removed.

They still both contain PhantomData<T> whereas NonZero<*const T> doesn’t. Does that not affect dropck, if T contains a reference with a lifetime?

I'm saying the latest version of dropck doesn't contain the same magic that made this matter. Specifically PhantomData turned off magic, and now it's just always off, with unsafe opt-in attributes.

I'm verifying this now.

Shared appears to behave identically to *const T now, meaning PhantomData is no longer important for Shared. Its only uses are making something invariant and deriving Send/Sync, neither of which are used by Shared.

The only question is if this will be true for all eternity. I think so, but I need to verify with @pnkfelix and @nikomatsakis.

#![feature(shared, generic_param_attrs, dropck_eyepatch)]

use std::ptr::Shared;

struct FooMayPeak<T, U, V> {
    t: *const T,
    u: Shared<U>,
    v: V,
}

struct FooBlind<T, U, V> {
    t: *const T,
    u: Shared<U>,
    v: V,
}

impl<T, U, V> Drop for FooMayPeak<T, U, V> { fn drop(&mut self) { } }

unsafe impl<
    #[may_dangle] T, 
    #[may_dangle] U, 
    #[may_dangle] V,
> Drop for FooBlind<T, U, V> { fn drop(&mut self) { } }

fn main() {
    let (t, mut u, v, fooMayPeak, fooBlind);

    t = 0;
    u = 1;
    v = 2;
    unsafe {
        fooMayPeak = FooMayPeak { t: &&t as *const &_, u: Shared::new(&mut &u), v: &v };
        fooBlind   = FooBlind   { t: &&t as *const &_, u: Shared::new(&mut &u), v: &v };
    }
}
error[E0597]: `t` does not live long enough
  --> src/main.rs:35:1
   |
32 |         fooMayPeak = FooMayPeak { t: &&t as *const &_, u: Shared::new(&mut &u), v: &v };
   |                                        - borrow occurs here
...
35 | }
   | ^ `t` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created

error[E0597]: `u` does not live long enough
  --> src/main.rs:35:1
   |
32 |         fooMayPeak = FooMayPeak { t: &&t as *const &_, u: Shared::new(&mut &u), v: &v };
   |                                                                             - borrow occurs here
...
35 | }
   | ^ `u` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created

error[E0597]: `v` does not live long enough
  --> src/main.rs:35:1
   |
32 |         fooMayPeak = FooMayPeak { t: &&t as *const &_, u: Shared::new(&mut &u), v: &v };
   |                                                                                     - borrow occurs here
...
35 | }
   | ^ `v` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created

error: aborting due to 3 previous errors

https://play.rust-lang.org/?gist=5a36b477ff3291a72bc9878de2795f5a&version=nightly

I’m in favor of replacing both Shared<T> and Unique<T> with NonZeroPtr<T> NonNullPtr<T>, and stabilizing that.

I'd use NonNullPtr as the name, but agree with the rest.

Is this implicitly admitting that the *const vs. *mut distinction was a mistake? (considering that we don't seem intent on preserving the distinction here)

(For what it’s worth, the distinction between Shared and Unique is already unrelated to that between *const and *mut.)

(Yeah I know, collapsing down to a single type just brings it into even starker relief.)

The entire point of Shared and Unique is "wow everything about raw pointers suck for writing actual Rust code, let's redo them". There's reasons for every aspect of the raw pointer design, but the final result is unpleasant.

*const vs *mut is broadly useless in practice, but this is mostly a factor of their variance, which forces us to use *const and cast everywhere. In principle the variance they have is good, in that it's a safe default, but it's ultimately frustrating because of how variance works (invariance is irreversible). Supposedly having both is useful for FFI; I have no strong opinions on this.

The fact that they're non-null is frustrating, as it makes us jump through hoops for optimizations, while making unsafe code less rustic. Them being nullable is ostensibly a safe default for their use in FFI. Swift notably made its unsafe pointers non-nullable, but it makes Optionals a lot more ergonomic too.

The (now dead?) "owns a T" design was a bad default because it forced developers to know about some obscure thing and opt out of it or risk UB.

It might be interesting if we could guarantee the the memory layout of Option<NonNullPtr<T>> (or whatever it ends up being called) for FFI.

Correction: felix has produced an example where PhantomData implying ownership is live-and-well (which I've edited down for clarity).

// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case, 
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)

#![feature(dropck_eyepatch, generic_param_attrs)]

use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;

// Debugging code to verify UAF-like behaviour

#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }

#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);

impl<T: fmt::Debug> PrintOnDrop<T> {
    fn new(name: &'static str, t: T) -> Self {
        PrintOnDrop(name, t, State::Valid)
    }
}

impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
    fn drop(&mut self) {
        println!("drop PrintOnDrop({}, {:?}, {:?})",
                 self.0,
                 self.1,
                 self.2);
        self.2 = State::INVALID;
    }
}



// Box impls (identical except for PhantomData)

struct MyBox2<T> { v: *const T }
struct MyBox3<T> { v: *const T, _pd: PhantomData<T> }

// We want this to be *legal*. This destructor is not 
// allowed to call methods on `T` (since it may be in
// an invalid state), but it should be allowed to drop
// instances of `T` as it deconstructs itself.
//
// (Note however that the compiler has no knowledge
//  that `MyBox2<T>` owns an instance of `T`.)

unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
    fn drop(&mut self) { unsafe { Box::from_raw(self.v as *mut T); } }
}
unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
    fn drop(&mut self) { unsafe { Box::from_raw(self.v as *mut T); } }
}
impl<T> MyBox2<T> {
    fn new(t: T) -> Self { unsafe { MyBox2 { v: Box::into_raw(Box::new(t)) } } }
}
impl<T> MyBox3<T> {
    fn new(t: T) -> Self { unsafe { MyBox3 { v: Box::into_raw(Box::new(t)), _pd: Default::default() } } }
}



fn f2() {
    {
        let (v2a, _mb2a); // Sound, and compiles!
        v2a = PrintOnDrop::new("v2a", 13);
        _mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
    }

    {
        let (_mb2b, v2b); // Unsound and compiles!
        v2b = PrintOnDrop::new("v2b", 13);
        _mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
        // namely, v2b dropped before _mb2b, but latter contains
        // value that attempts to access v2b when being dropped.
    }
}

fn f3() {
    let v3; let _mb3;
    // let (_mb3, v3); // Unsound, won't compile due to dropck
    v3 = PrintOnDrop::new("v3", 13);
    _mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}

fn main() {
    f2(); f3();
}

Produces:

drop PrintOnDrop(mb2a, PrintOnDrop("v2a", 13, Valid), Valid)
drop PrintOnDrop(v2a, 13, Valid)
drop PrintOnDrop(v2b, 13, Valid)
drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
drop PrintOnDrop(mb3, PrintOnDrop("v3", 13, Valid), Valid)
drop PrintOnDrop(v3, 13, Valid)

But if you toggle the commented out line in v3:

error[E0597]: `v3` does not live long enough
   --> src/main.rs:106:1
    |
105 |     _mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
    |                                                 -- borrow occurs here
106 | }
    | ^ `v3` dropped here while still borrowed
    |
    = note: values in a scope are dropped in the opposite order they are created

https://play.rust-lang.org/?gist=d0fd8a737281cb70a7d6f85fd0862dec&version=nightly

This is a very interesting quote that I want to investigate, though:

(However, in the absence of #[may_dangle], the compiler will constrain
things in a manner that may indeed imply that PhantomData is unnecessary;
pnkfelix is not 100% sure of this claim yet, though.)

I want to ask about the use of this in FFI function definitions.

Say I have:

fn (ptr: *mut i64)

It would be nice (for some form of nice which probably can't exist in FFI land...) if you could express this as:

fn(ptr: Shared<i64>)

Instead. Is there a reason this isn't possible? Is this something that could be pursued or not?

@Firstyear I think this would be valid if we add #[repr(transparent)] to the type definitions… once that’s implemented: https://github.com/rust-lang/rust/issues/43036. And I think we should.

Status update: this is currently waiting on @Gankro tracking down the details relevant to the question he raised.

Forgive my ignorance, but NonZero is what allows Option of references to be optimised as nullable pointers?

Would it be possible to do a similar thing for Option of floating point types, using the special case of NaN? I understand that NaN technically isn't a kind of Zero but it can represent the absence of a value.

Not by default, no. NaN is a totally valid floating point value. Null is not a valid value for a reference.

You could have a NonNaN<T> that wraps around floats to mark them as "never NaN". I'm skeptical this will be very useful.

Please direct all future enum optimization requests to https://github.com/rust-lang/rfcs/issues/1230

@rfcbot fcp cancel

Calling off FCP until @Gankro has a chance to sort out details around PhantomData.

@aturon proposal cancelled.

The PhantomData concern has been resolved in https://github.com/rust-lang/rust/pull/46749.

So I’d like to propose now:

  • Rename Shared<T> to NonNull<T> and stabilize it. (Being in the ptr module is enough to say that it’s a pointer. I’m not very attached to this specific name though.)
  • Rename Box<T> methods from_unique/into_unique to from_nonnull/into_nonnull (or whatever names are deemed appropriate), replace Unique<T> with NonNull<T> in their signatures, and stabilize them.
  • Replace Unique<T> with NonNull<T> in the signatures of methods of the Alloc trait.
  • Mark Unique “permanently-unstable” by replacing remaining occurrences of #[unstable(feature = "unique", issue = "27730")] with:

    #[unstable(feature = "ptr_internals", issue = "0", reason = "\
      use NonNull instead and consider PhantomData<T> (if you also use #[may_dangle]), \
      Send, and/or Sync")]
    

    (Maybe the reason string is only useful on the struct definition.) Ideally it would be made private to some crate instead, but it needs to be used in both liballoc and libstd.

  • (Leave NonZero and Zeroable unstable for now, and subject to future bikeshedding.)

Instead of NonZero, now that we have https://github.com/rust-lang/rust/pull/45225, would it be possible to make 0xFFFF... (-1 as *const _) the invalid value for pointers? It's anyway usually not possible to put anything meaningful or even allocate something there, and it would automatically allow optimisations for pointers without wrappers.

  • Option<&T>::None is documented as being represented as NULL: https://doc.rust-lang.org/nomicon/other-reprs.html Changing that would like break code that uses FFI or/and unsafe. Similarly for &mut T and Box<T>, and maybe also other smart pointers.
  • Representing in the type system that a pointer is non-null is useful, regardless of memory layout.

So I’d like to propose now: […]

https://github.com/rust-lang/rust/pull/46952 implements this.

Instead of NonZero, now that we have #45225, would it be possible to make 0xFFFF... (-1 as *const _) the invalid value for pointers?

That would make -1 as usize as *const _ undefined behavior, wouldn't it?

@RalfJung creating raw pointer with invalid value isn't UB. Did you mean something else?

They meant that 0xFFFF cannot be used for the None value in Option<*const T> because Some(0xFFFF as *const _) is a totally legit Rust value.

@nox Well, yes, that would be breaking change in exchange of allowing Some(NULL) to become a valid value. Currently it's not, even though NULL is very popular and much more likely to occur as a real pointer through FFI, so each slice::from_raw_parts, str::from_raw_parts etc. call currently needs to be guarded with if (!....is_null()) ... because officially neither of these APIs accept NULL yet don't reject it.

Although I realise that such breaking change is probably impossible at this point and unlikely to be accepted, the FFI situation with NULL pointers is currently somewhat fragile and I'd love if it would be possible to accept them as any others in such APIs without guards.

@RReverser I’m not sure if you already know that and are only using a writing shorthand, but Some(ptr::null()) is a valid value of Option<*const T> today, and that won’t change. NULL is only forbidden for types that contain NonZero<T>, &T or &mut T. And for &T/&mut T NULL not really a special case: callers of for example slice::from_raw_parts need to ensure that the pointer points to valid memory, not just that it is non-null.

NULL is very popular and much more likely to occur as a real pointer through FFI

This isn't about probabilities. "Likely" and "unlikely" do not matter, only "possible" and "impossible" do. Raw pointers represent C pointers and hence every value is possible.

so each slice::from_raw_parts, str::from_raw_parts etc. call currently needs to be guarded with if (!....is_null()) ... because officially neither of these APIs accept NULL yet don't reject it

slice::from_raw_parts requires much more than non-NULL-ness. It requires the pointer to also be aligned and pointing to a valid allocation and whatever it points to must not be mutated for the duration of the lifetime. The last two points cannot be tested for during execution. That's why these functions are unsafe: They accept way fewer arguments than their type gives away.

@SimonSapin Yes, thanks, I was using NULL shorthand for reference; should've clarified that.

The thing special about null is that zero-length slices quite commonly don't have such requirement in C libraries, and Rust doesn't read memory from zero-length slices either, but because of that requirement one needs to wrap each ::from_raw_parts call into if len != 0 { ...::from_raw_parts(ptr, len) } else { &[] }. That's the only thing that bothers me as it's easy to miss even with unsafe block, and it's not very clear why Rust itself can't enforce this by creating &[] internally instead of having a mention in docs (that would have almost zero overhead while would make these functions somewhat easier to use).

Just wonder why NonZero::new() is not const fn as Rust's NPO guarantees that this is no-op.

It happens to compile to a no-op after optimizations for primitive integer and pointer/reference types, but it is implemented with an if which is currently not allowed in const fn. With Miri integration hopefully it can become a const fn later.

Why it isn't implemented as [None, Some(val)][val.is_null() as usize]? That'd allow const evaluation even now.

Or just transmute, as it's guaranteed to be no-op by language.

Edit: Nevermind, for transmute, I found this comment that transmute requires miri integration to be const fn

The point is to conditionally return Some or None

@Kixunil that’s an… interesting idea. But I just tried and hit error[E0379]: trait fns cannot be declared const.

@HyeonuPark Zeroable is a public trait that any crate can implement for its types, but that doesn’t change the memory layout of e.g. Option<Foo>, so transmute would be incorrect. In fact, the size of Foo and Option<Foo> might not be the same, and trying to transmute does cause error[E0512]: transmute called with types of different sizes.

Zeroable should be unsafe trait FWIW, but transmute is checked too early for this, I guess.

Ah yeah that's the point, I think Zeroable should be unsafe trait, and also auto-impl.

As I understand, Option<NonZero<Foo>> has same size as Foo and Option<NonZero<Foo>>::None is identical with std::mem::zeroed(), as NonZero is used by compiler to perform NPO(Please tell me if this is wrong). Then something like this can be possible.

struct Foo(usize);

impl Zeroable for Foo {
    fn is_zero(&self) -> bool {
        self.0 != 0 // really!
    }
}

fn main() {
    let zero = Foo(0);
    let one = Foo(1);

    assert_eq(NonZero::new(zero), None); // as underlying memory is all zero
    assert_eq(NonZero::new(one), None);  // as `one.is_zero()` is true
}

Does this code makes sense?

We should maybe abuse trait privacy (if we can, at all) to make it impossible to implement the trait outside of libcore - we don't really support more types than primitives, in NonZero.

I was hoping to land https://github.com/rust-lang/rust/pull/46952 and get Shared / NonNull out of the way first, but my plan is to propose removing the Zeroable trait and replace NonZero with 12 types like NonZeroUsize and NonZeroI32, similar to atomic integer types. That way we avoid the questions of making the trait an auto trait or attempting to prevent more impls of it, since there is no trait.

Assuming you mean via a hidden private NonZero, then you don't even need to change to compiler.

Right, I meant "remove" as far as the public API is concerned. We can have a private generic type (with or without a trait bound) in the implementation, or whatever’s easier to maintain for the compiler.

Shared is now renamed NonNull and stable; Unique is a permanently-unstable internal implementation detail of std: https://github.com/rust-lang/rust/pull/47631

This only leaves NonZero for this tracking issue, which is now mostly only useful for integers.

I just published https://github.com/rust-lang/rfcs/pull/2307 which propose adding 12 concrete types like NonZeroU32, deprecating NonZero, and later making it private. That last step would (finally!) close this tracking issue.

Also some NonNull follow up: https://github.com/rust-lang/rust/pull/47631

I published https://github.com/rust-lang/rfcs/pull/2307 which propose adding 12 concrete types like NonZeroU32, deprecating NonZero, and later making it private. That last step would (finally!) close this tracking issue.

Implementation PR, ready to merge: https://github.com/rust-lang/rust/pull/48265

  • Add new #[unstable] types in core::num and std::num
  • Deprecate core::nonzero

After whatever time is appropriate for the deprecation period of unstable features, we can land https://github.com/SimonSapin/rust/pull/1 to make core::nonzero private and leave just enough in it to support its public wrappers.

Hm, this probably way too late, but are we sure that the name should be NonNull<T>? It seems to me that NonNull overemphasizes, well, non-nullness of this type, while in fact the real trick is, in my understanding, covariant *mut. At least, some folks on Reddit emphasize optimizations over variance: https://www.reddit.com/r/rust/comments/7zmfuu/when_should_nonnull_be_used/ :)

Would just a Ptr<T> be a better name?

EDIT: nvm, see that this already was discussed in another issue: https://github.com/rust-lang/rust/pull/46952#issuecomment-354726205

The enum optimization is my motivation for pushing this feature to stabilization, FWIW.

I think you could already get covariance before by using *const T (or wrapping it in some other type outside of std). This aspect of ptr::NonNull is just library code, there is no compiler magic. Also it hasn’t happened to me to actively change some code to make it covariant. (Though maybe the stuff I used was already covariant and I just didn’t realize that was important?)

Tracking issue for the new num::NonZero* types: https://github.com/rust-lang/rust/issues/49137

Was this page helpful?
0 / 5 - 0 ratings