Rust: Tracking issue for Weak::into_raw/from_raw & similar

Created on 11 May 2019 · 32Comments · Source: rust-lang/rust

Changing this to a tracking issue for the feature:

[x] Pull request #60766
[ ] :question: What to do about dangling Weak (created by Weak::new()) and T: ?Sized? These seem to be incompatible.
[ ] Stabilization

The original proposal

Hello

The Arc has the into_raw and from_raw methods. I think it would be technically possible to have the same on Weak. Obviously, as Weak is non-owning, it would be up to the caller to make sure the pointer is not dangling when used.

Would adding these (and maybe as_raw too ‒ one can get a reference out of Arc, but not from Weak) require an RFC, because the handling of these methods might be even a bit more delicate than the ones on Arc, or is this considered small enough for just a pull request & later stabilization?

Motivation

I've written the arc-swap crate that allows to keep an Arc around but make it point to some other object atomically. It uses the mentioned methods. It would make sense to have weak version of this atomic storage too.

Additionally, the as_raw would make it possible to compare if eg an Arc and Weak point to the same object (which would also come handy in some alternative of the Cache that doesn't hold the object alive needlessly).

B-unstable C-tracking-issue T-libs disposition-merge finished-final-comment-period

Source

vorner

👀1 👍1

Most helpful comment

When you bring it up, what needs to be done for stabilizing it all? The API follows the one on Arc, so I don't think any of this could look differently.

There's the problem with unsized Ts ‒ currently, they are simply not supported (because of the interaction with dangling & null pointers). But relaxing the restriction later would be backwards compatible change, so I don't think a decision what exactly to do about that has to block stabilization.

vorner on 4 Oct 2019

👍3

All 32 comments

Would adding these (and maybe as_raw too ‒ one can get a reference out of Arc, but not from Weak) require an RFC, because the handling of these methods might be even a bit more delicate than the ones on Arc, or is this considered small enough for just a pull request & later stabilization?

A PR is sufficient; the motivation can be justified in there.

Centril on 11 May 2019

@rust-lang/libs is there any opposition to stabilizing at least .as_raw()? It's hard to imagine wanting that function looking any different than it does today, and it's pretty useful (I don't know another way to deduplicate weak pointers to the same object).

cramertj on 3 Oct 2019

When you bring it up, what needs to be done for stabilizing it all? The API follows the one on Arc, so I don't think any of this could look differently.

vorner on 4 Oct 2019

👍3

Relaxing the restriction of T to ?Sized I don't think is compatible with the current safety requirements. On top of that, I think that as is currently written, it's self-conflicting.

[...]

It takes ownership of one weak count. In case a null is passed, a dangling Weak is returned.

Safety

The pointer must represent one valid weak count. In other words, it must point to T which is or _was_ managed by an (A)Rc and the weak count of that (A)Rc must not have reached 0. It is allowed for the strong count to be 0.

[...]

The safety requirement requires that the pointer point to a valid ArcInner/RcBox (side note: probably should make those naming consistent). But then it also says that it accepts null? The reason for this is that as_raw/into_raw return null when ~~the strong count is 0~~ when the weak is fake (self.inner() returns None).

But on top of that, this is wrong (I just realized writing this up)! Because from_raw(into_raw(_)) on a dangling Weak turns it into the singleton dangling Weak, the weak count of the original will never be decreased! The correct behavior is for Weak::into_raw to return the dangling pointer such that Weak::from_raw can recover the original pointer to allocation and decrease the weak count. This then is fully compatible with T: ?Sized as it's doing the exact same as the non-weak version.

As for as_raw... it's "fine" for it to return null as a helpful hint (as dereferencing null is a faster error), but I think it's more important for as_raw(_); forget(_) to be equivalent to into_raw(_).

CAD97 on 21 Nov 2019

It's been a while since I wrote it, so I went back to have a look. I think it is slightly different than how you describe it. If I read through it correctly, there are three kinds of weak pointer values:

Fully valid ones, pointing to RcBox with strong_count() > 0.
Dead ones, pointing to RcBox, but with strong_count() = 0. They've run the destructor on the T, but the RcBox is still there.
Dangling ones. These were created with Weak::new(). They don't have any RcBox allocated for them and their pointer is set to a sentinel value (usize::MAX casted to pointer).

The into_raw()/from_raw() return null() only for the last case. There's no weak count to speak of in that case, so nothing is forgotten to be decreased.

The code needs to special-case the sentinel value somehow anyway, it can't just add the offset to it, because it would overflow and that is AFAIK forbidden thing to do with pointers. It could also return the sentinel value (usize::MAX), but null seems to be as good sentinel value as that other one, with the difference users recognize it better. It also has proper alignment for the type, which other code might rely on (by putting flags into the low bits). The downside is that pointers to unsized types don't have null() and if I remember correctly, this was the reason why they are not currently supported.

I can try changing the sentinel value and see if I can add support for unsized types instead.

As for the documentation, if I just remove all notions of null and keep it to the fact the pointer must come from previous Weak and not specify what value is returned in case it is from Weak::new(), would it be better?

vorner on 24 Nov 2019

Ah, I admit, I read the code wrong.

Ok, I agree returning null for "fake" weak pointers is probably the correct sentinel. (The use of usize::MAX as a sentinel for the inner pointer only works because the RcBox/ArcInner has align(usize) > 1, but the T doesn't have that guarantee, so usize::MAX is a _theoretically_ valid pointer. I think a T pointer within an RcBox/ArcInner is guaranteed to have at least align(inner) because of the alignment of the container, but it'd be better to not rely on that.)

But the nonexistence of ptr::null for ?Sized types isn't really an issue. The reason it doesn't exist is the inability for it to conjure pointer metadata out of thin air. We have that!

I don't know the exact way std would accomplish this, but "all" that needs to be done is to pull the metadata off of the inner pointer, and put it onto a null pointer. (And then do the same thing in from_raw.)

For docs, I'd just specify that the pointer given to Weak::from_raw must have come from Weak::as_raw or Weak::into_raw.

CAD97 on 24 Nov 2019

I've tried updating the docs, I hope it's for the better.

But the nonexistence of ptr::null for ?Sized types isn't really an issue. The reason it doesn't exist is the inability for it to conjure pointer metadata out of thin air. We have that!

I don't know the exact way std would accomplish this, but "all" that needs to be done is to pull the metadata off of the inner pointer, and put it onto a null pointer. (And then do the same thing in from_raw.)

It's not the problem with null pointers per se, because Weak::new exists for Sized types only too. It's simply illegal to put null pointer into into_raw for unsized type for that reason. However the sized implementation needs to check for null equality. It would be fine if this turned out to be false every time for unsized types, but as the null() doesn't even exist, it doesn't compile :-(. I think specialization would help here but AFAIK it's not ready to be used yet. Or is it?

But, after clarifying the documentation, is there anything that would prevent adding that specialization/relaxing the bounds in the future in backwards compatible manner?

vorner on 24 Nov 2019

Hello? I know this is probably not a high priority feature, but it also seems somewhat small and I'd really like to push it over the finish line somehow. Is this waiting on something from me? I believe the raised concerns about docs were hopefully addressed.

vorner on 6 Apr 2020

@rfcbot fcp merge

Amanieu on 12 Apr 2020

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

[x] @Amanieu
[x] @SimonSapin
[x] @dtolnay
[ ] @sfackler
[ ] @withoutboats

Concerns:

~~as_raw should be as_ptr~~ resolved by https://github.com/rust-lang/rust/issues/60728#issuecomment-624354619
~~dangling Weak are represented using null~~ resolved by https://github.com/rust-lang/rust/issues/60728#issuecomment-624354619

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot on 12 Apr 2020

I would prefer if we made no guarantee as to what pointer value is generated for a dangling Weak. This could allow for a slightly more efficient implementation since we could just blindly apply the offset of usize::MAX to convert to/from a raw pointer.

@rfcbot concern dangling Weak are represented using null

The convention for returning raw pointers to a value is as_ptr. There is no precedent for as_raw in the standard library.

@rfcbot concern as_raw should be as_ptr

Amanieu on 12 Apr 2020

@Amanieu Consider not only that other types use as_ptr, but that Arc/Rc already have both into_raw and from_raw. I don't think any type actually has both the "as_ptr" and "into_raw" operations yet:

as_ptr: slice, str, Cell, RefCell, CStr, MaybeUninit, NonNull, Vec
into_ptr: (none)
from_ptr: CStr
as_raw: (unstable) rc::Weak, sync::Weak
into_raw: Box, CString, Rc, Arc, (unstable) rc::Weak, sync::Weak
from_raw: ExitStatus, ExitStatusExt, Box, CString, Rc, Arc, Waker, (unstable) rc::Weak, sync::Weak

I think sticking to _raw methods for the Rc family is more consistent than using as_ptr here. Consider also: does from_raw(as_ptr()) look like it should be allowed? from_raw(as_raw()) is clearly using the same "raw representation".

There's also the fact that I would think "as_ptr" would be &*rc, i.e. deref but to a pointer, not a ref. as_raw is more subtle because it keeps mutable provenance over the location, and as such can be used for from_raw, unlike a deref. (The standard library got this wrong previously!)

Some small-ish concerns:

Arc/Rc should have as_raw/as_ptr to mirror the API surface of Weak. Note that &*arc is _not enough_ for as_raw, see the link above where the standard library got this wrong.

@rfcbot concern api should be mirrored on Arc/Rc as well

I want to make sure that these functions on Weak are forwards-compatible with supporting T: ?Sized in the future. When the requirements on pointer metadata for invalid (non-derefable) pointers are clear, the standard library should be able to manufacture some arbitrary pointer metadata for the dangling Weak case (at least for some subset of T: ?Sized).

@rfcbot concern forwards-compatible to T: ?Sized

I don't know if the rfcbot commands will work for me (they probably won't), but might as well try.

CAD97 on 12 Apr 2020

👍1

Consider also: does from_raw(as_ptr()) look like it should be allowed? from_raw(as_raw()) is clearly using the same "raw representation".

No it doesn't look like it should be allowed, and that's a good thing! You really should be using into_raw instead of as_ptr for this.

In fact I would be in favor of removing as_ptr + forget as a valid way of constructing a pointer that can be used with from_raw, specifically because of the provenance issues you mentioned earlier. The only valid pointer that can be passed to from_raw is one created by into_raw. I don't think this will affect any practical use cases but do let me know if I'm wrong.

I want to make sure that these functions on Weak are forwards-compatible with supporting T: ?Sized in the future.

It is relatively easy to support T: ?Sized if we drop the guarantee that dangling Weaks return a null pointer. We can just let the pointer dangle in that case.

Amanieu on 12 Apr 2020

👍1

I would prefer if we made no guarantee as to what pointer value is generated for a dangling Weak

I would prefer to be able to create dangling pointer from null (or, as dangling weak pointers exist, it's reasonable to assume that one will be created if I feed it null). But considering any pointer returned from weak may already be dangling because the count had dropped to 0, I think the other guarantee didn't really exist. I'll go over the docs and see if there are any mentions of guarantees.

Do you think I should also change the implementation, so nobody relies on it by accident?

I don't think this will affect any practical use cases but do let me know if I'm wrong.

I'll have to review my code in arc-swap. I suspect I might actually be using it, but I'm not 100% sure. There are some cases around lock-free algorithms when I need the pointer but I don't know yet if storing it into AtomicPtr will work out or not. So I store the as_ptr and then do forget or into_raw when it's confirmed. I probably could do into_raw first and then, if it fails, do from_raw again to recover.

vorner on 12 Apr 2020

Do you think I should also change the implementation, so nobody relies on it by accident?

Yes, I think that's a good idea. Note that you'll need to use wrapping_offset instead of offset to support the dangling pointers.

Amanieu on 12 Apr 2020

👍1

@Amanieu

In fact I would be in favor of removing as_ptr + forget as a valid way of constructing a pointer that can be used with from_raw, specifically because of the provenance issues you mentioned earlier. The only valid pointer that can be passed to from_raw is one created by into_raw. I don't think this will affect any practical use cases but do let me know if I'm wrong.

I use Arc::as_raw (currently spelled Arc::into_raw(unsafe { ptr::read(this) })) rather pervasively to implement ArcBorrow<'_, _>, which is effectively "&T but can be upgraded to Arc<T>." This is "+0" passing for Arc, as opposed to double-indirect passing (&Arc<T>) or "+1" passing (Arc<T>).

I'm _not_ doing as_raw + forget, I'm doing as_raw + clone_raw.

If it can't be used for from_raw, it definitely should be as_ptr instead of as_raw. But I see real use cases (well, mostly just +0 passing) for as_raw, but not much use for as_ptr beyond implementing ptr_eq. (What can you do with a Weak::as_ptr? Not deref it, not without knowing you have a strong ref around somewhere, and you can't check that from the ptr if you can't turn it back into a Weak.)

CAD97 on 12 Apr 2020

Fair enough, let's leave the semantics of as_raw as they are right now and just rename it to as_ptr for consistency with the rest of libstd. Since this is clearly something that would be useful for Rc/Arc it would make sense to add a method there as well.

Amanieu on 12 Apr 2020

Yes, I think that's a good idea. Note that you'll need to use wrapping_offset instead of offset to support the dangling pointers.

While implementing it, I've noticed one thing. The current implementation will return a pointer with invalid alignment. Is that Ok?

Since this is clearly something that would be useful for Rc/Arc it would make sense to add a method there as well.

Here, as part of this feature?

vorner on 13 Apr 2020

I think it's fine to return an unaligned pointer (you're not allowed to dereference it anyways). We should document it however.

Here, as part of this feature?

Sure, might as well.

Amanieu on 13 Apr 2020

May I ask what happens now? I believe I've addressed the concerns and it was merged, but they are still listed here. How are they removed?

vorner on 5 May 2020

@rfcbot resolve dangling Weak are represented using null
@rfcbot resolve as_raw should be as_ptr

Amanieu on 6 May 2020

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot on 6 May 2020

Hmm, I don't know about that as_ptr. Forcing the user into using into_ptr is a great lint that would've prevented some bug's I experienced in C++ code had C++ have it. @CAD97 use case is legit, though. Does that one use case justify "removing a lint"?

Kixunil on 15 May 2020

@Kixunil Can you explain what you mean? It was renamed to as_ptr to hint that it should not be interchangeable with the into_raw. The documentation no longer suggests it should be possible to use as_ptr to re-create it with from_raw and then forget it. So I think your concern was already resolved?

vorner on 15 May 2020

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

rfcbot on 16 May 2020

@vorner I mean not having it at all. :) into_raw is much more useful in my experience. If not for @CAD97 use case, I would believe as_ptr to be completely useless.

Kixunil on 16 May 2020

There are other use cases. Like, hashing it, equivalence checking, etc.

vorner on 16 May 2020

@vorner aren't those cases better served by using a reference and converting it? Seems more obvious what it does.

Kixunil on 16 May 2020

What reference do you have in mind? &Weak<T> is not useful for these cases (where you want to check if eg. an atomic pointer still points to the same thing as the Weak you have cached or if you want to recreate a new one). You don't get &T out of a Weak and have to upgrade it to the Arc. But that is both more expensive and might return None ‒ but you might want to know that the Weak you have is still the one you should have even if it is no longer valid. The as_ptr on Arc is there mostly for symmetry.

But I don't really understand your issue. What kind of misuse are you afraid of? as_ptr doesn't seem to allow more foot guns than into_raw ‒ you still get a raw pointer out of that and can do whatever you like with it. You also need to use unsafe to do anything dangerous with it, which should be a lint enough to stop people from doing stupid things by accident.

I assume that when the FPC is over, I can enjoy submitting a stabilization PR.

vorner on 16 May 2020

You don't get &T out of a Weak

Ah, yeah, I missed that. I had strong ones in mind mostly. Might be a good reason to have ptr_eq and hash_ptr methods instead, but pointer is definitely more flexible. But the docs should probably explain the pointer is useless for anything else than comparison.

I was mostly wondering about as_ptr being confused with into_ptr in case of strong versions. A sad thing about these kinds of bugs is when combined with multithread environment they can be really hard to debug.

(Personal experience: code sending a pointer from one thread to another waiting on select over a Unix pipe, the sender forgot to ref the pointer. Took years to find the cause of mysterious crashes.)

Kixunil on 17 May 2020

Might be a good reason to have ptr_eq and hash_ptr methods instead

That wouldn't allow customizing how you hash or against what you compare the pointer (I'd say ptr_eq should take another Weak as a parameter and in case of storing the pointer out-of-bound, you have *const T, maybe).

But the docs should probably explain the pointer is useless for anything else than comparison.

But that is not true. The pointer is fully usable as long as you make sure by some other means that the lifetime of the object is still valid, same as with any other pointer.

vorner on 17 May 2020

Yeah, it's possible to do it by other means, I just can't imagine a real-life situation in which one would have a Weak and know that it's valid without also having Rc/Arc and using a reference to those instead.

That all being said, it's not a huge deal, I guess. Mainly wanted this to be considered.

Kixunil on 17 May 2020

Was this page helpful?

0 / 5 - 0 ratings