Rust: Tracking issue for Cell::update

Created on 23 Apr 2018  路  37Comments  路  Source: rust-lang/rust

This issue tracks the cell_update feature.

The feature adds Cell::update, which allows one to more easily modify the inner value.
For example, instead of c.set(c.get() + 1) now you can just do c.update(|x| x + 1):

let c = Cell::new(5);
let new = c.update(|x| x + 1);

assert_eq!(new, 6);
assert_eq!(c.get(), 6);
B-unstable C-tracking-issue Libs-Small Libs-Tracked T-libs

Most helpful comment

@oli-obk

I like the second option (taking a &mut T in the closure). There's an interesting variation on this option - instead of requiring T: Copy we could require T: Default to support updates like this one:

let c = Cell::new(Vec::new());
c.update(|v| v.push("foo"));

Perhaps we could have two methods with different names that clearly reflect how they internally work?

fn get_update<F>(&self, f: F) where T: Copy, F: FnMut(&mut T);
fn take_update<F>(&self, f: F) where T: Default, F: FnMut(&mut T);

All 37 comments

Not a blocking issue for merging this as an unstable feature, so I'm registering my concern here:

The update function returning its new value feels very ++i-ish. Similar to += returning the result of the operation (which rust luckily does not!). I don't think we should be doing this. Options I see:

  • return () (that was the suggestion in the PR)
  • make the closure argument a &mut T, and allow the closure to return any value.

    • if users want the "return new/old value" behaviour, they can now encode it themselves via |x| {*x += 1; *y }

Maybe there are other options?

Data point: the volatile crate's update method returns () (https://github.com/embed-rs/volatile/blob/master/src/lib.rs#L134)

@oli-obk

I like the second option (taking a &mut T in the closure). There's an interesting variation on this option - instead of requiring T: Copy we could require T: Default to support updates like this one:

let c = Cell::new(Vec::new());
c.update(|v| v.push("foo"));

Perhaps we could have two methods with different names that clearly reflect how they internally work?

fn get_update<F>(&self, f: F) where T: Copy, F: FnMut(&mut T);
fn take_update<F>(&self, f: F) where T: Default, F: FnMut(&mut T);

What do you think, @SimonSapin? Would two such methods, get_update and take_update, be a more appropriate interface?

The T: Default flavor feels kinda awkward but I don鈥檛 know how to put that in more concrete terms, sorry :/

It looks like there is a number of slightly different possible APIs for this. I don鈥檛 have a strong opinion on which is (or are) preferable.

Taking &mut in this method would be awesome, I would love to use it.

Unfortunately it seems that it would have to be unsafe, because AFAIK there is no way in Rust to restrict usage like this:

let c = Cell::new(vec![123]);
c.update(|vec| {
    let x = &mut vec[0];            // x references 123
    c.update(|vec| vec.clear());    // x references uninitialised memory
    *x = 456;                       // BOOM! undefined behavior
})

I hope, I'm wrong.

@CodeSandwich it is a fundamental restriction of Cell<T> that a reference &T or &mut T (or anything else inside of T) must not be given to the value inside the cell. It鈥檚 what makes it sound. (This is the difference with RefCell.)

We could make an API that takes a closure that takes &mut T, but that reference would not be to the inside of the cell. It would be to value that has been copied (with T: Copy) or moved (and temporarily replaced with some other value, with T: Default) onto the stack and will be copied/moved back when the closure returns. So code like your example would be arguably buggy but still sound: the inner update() would operate on a temporary value that would be overwritten at the end of the outer one.

I think, that a modifier method might be safe if we just forbid usage of reference to Cell's self inside. Rust does not have mechanism for forbidding usage of a SPECIFIC reference, but it does have mechanism for forbidding usage of ALL references: 'static closures.

I've created a simple implementation of such method and I couldn't find any hole in it. It's a limited solution, but it's enough for many use cases including simple number and collections modifications.

A 'static closure could still reach the Cell, for example by owning a Rc.

That's absolutely right :(

It seems like, the semantic are a bit different between Default and Copy: the later is more light weighted, as it does not have to fill the hole behind with some temporary rubbish.

What shall we name those variants? update_move for the Default case?

Also there is another variation: update_unsafe which will be unsafe and does not require a trait bound: it only moves the value and leave uninitialized value behind, before the method returns. This will make any nested calls UB and so have to be marked as unsafe.

pub unsafe fn update_unsafe<F>(&self, f: F) -> T where F: FnOnce(T) -> T;

if you鈥檙e gonna use unsafe anyway you can use .get() and manipulate the raw pointer. I don鈥檛 think we should facilitate it more than that.

IMO, closure receiving a &mut to a stack copy isn't that advantageous ergonomically in terms of offering flexibility. You still need to jump through hoops to get the old value if you want to. Consider:

let id = next_id.update(|x| { let y = *x; *x += 1; y });

I think the most universally applicable thing to do would be to offer all of update, get_and_update and update_and_get:

things.update(|v| v.push(4)); // returns ()
// Alternative with v by-move. I'd guess this one isn't as useful, and you can easily use
// mem::swap or mem::replace in the by-ref version instead if you need it.
things.update(|v| { v.push(4); v });
let id = next_id.get_and_update(|x| x + 1);
let now = current_time.update_and_get(|x| x + 50);

These are 3 additional names, but I think they'll make the function more useful. A by-mut-ref update would only be useful for complex types where you're likely to do in-place updates, but I find myself doing the .get() -> update/keep value -> .set() pattern a lot with Cell.

Related conversation on a similar PR (for RefCell): https://github.com/rust-lang/rust/pull/38741

Has it been considered to use an arbitrary return type on top of updating the value? That is (naming aside),

fn update1<F>(&self, f: F) where F: FnOnce(T) -> T;
fn update2<F, R>(&self, f: F) -> R where F: FnOnce(T) -> (T, R);

The second method has a very functional feel to it and appears both flexible and sound (i.e. you can't return &T). You could get either ++i or i++ behavior out of it. Also it is the same whether T: Copy or not. Though ergonomics are of course debatable.

EDIT: Something to chew on:

fn dup<T: Copy>(x: T) -> (T, T) { (x, x) }
let i = i.update2(|i| (i + 1, i)); // i++
let i = i.update2(|i| dup(i + 1)); // ++i

+1 for a T version, as opposed to &mut T. It's easy to construct demo code that is more expressive with either version (ie, x + 1 vs vec.push(..)), but unless we add both, I think the more general version is preferable, since under the hood they both require a swap-out swap-in for soundness. Also +1 for an extra version with a return value.

Finally, I think Default is preferable to Copy. My gut is that it covers a wider range of cases; that is, it's more likely for a type to implement Default but not Copy than the other way around, and hopefully for simple cases the compiler should be able to optimize out the extra writes anyway.

It sounds like accepting &mut T isn't a viable option.

I'm sort of combining previous ideas here. How about this?

fn update<F>(&self, f: F) where T: Copy, F: FnOnce(T) -> T;
fn update_map<F, U>(&self, f: F) -> U where T: Copy, F: FnOnce(T) -> (T, U);
fn take_update<F>(&self, f: F) where T: Default, F: FnOnce(T) -> T;
fn take_update_map<F, U>(&self, f: F) -> U where T: Default, F: FnOnce(T) -> (T, U);

A completely different option could be to not take a function as argument, but return a 'smart pointer' object that contains the taken/copied value and writes it back to the cell when it is destructed.

Then you could write

    a.take_update().push(1);

    *b.copy_update() += 1;

[[Demo implementation](https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=7ab3c9318a91794d7041c66b6fb9f1fe)]

@m-ou-se I think that would be unsafe suprising because multiple instances of the 'smart pointer' / guard could exist simultaneously.

        let mut cu1 = b.copy_update();
        let mut cu2 = b.copy_update();
        *cu1 += 1;
        *cu2 += 1;
        *cu1 += *cu2;

https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=f3fc67425f1b7f1dba1220fb961ad7d9

The proposed implementation appears to move the value out of the cell and into the smart pointer; pointer in this case is a misnomer. This will lead to the old c++ auto_ptr problem: not unsound, but surprising in nontrivial use cases (ie, if two exist at the same time, one will receive a garbage value)

None of the proposed solutions (with a T -> T lambda, with a &mut T -> () lambda, or with a 'smart pointer'/RaiiGuard/younameit, etc.) are unsafe. All of them can be implemented using only safe code.

All of them have this 'update while updating' problem. That's not inherent to a specific solution, but inherent to Cell. There's just no way to 'lock' access to a Cell. That's the entire point of a Cell. Rust has a built-in solution for this problem in general: only allowing mutation through a provably uniquely borrowed &mut. Cell is the way to explicitly disable that feature.

With the current (unstable) update() function: [[Demo](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=239e7392e2b94402c49614f0ec32faa2)]

    let a = Cell::new(0);
    a.update(|x| {
        a.update(|x| x + 1);
        x + 1
    });
    assert_eq!(a.get(), 1); // not 2

With a &mut:

    let a = Cell::new(0);
    a.update(|x| {
        a.update(|x| *x += 1);
        *x += 1;
    });
    assert_eq!(a.get(), 1); // not 2

With a copy-containing 'smart pointer':

    let a = Cell::new(0);
    {
        let mut x = a.copy_update();
        *a.copy_update() += 1;
        *x += 1;
    }
    assert_eq!(a.get(), 1); // not 2

There is no way to avoid this problem. Every possible solution we come up with will have this problem. All we can do is try to make it easier/less verbose to write the commonly used .get() + .set() (or .take() + .set()) combination.

I think Cell::update should be marked as unsafe (I know this idea was already discarded, but I currently don't know about alternatives, or do we have any other "fitting" attribute which marks functions as potentially suprising/non-trivial with edge cases, e.g. something that marks a function call as "review me more intense, I may have suprising side-effects, but I'm not unsafe"?) even though it doesn't contain unsafe code, because of this non-trivial/possible suprising behavoir, which (I think) doesn't happen when using normal .get/.set methods (at least not "hidden").

// e.g.:
  // current Cell::update function, edge-case behavoir isn't obvious
  let a = Cell::new(0);
  a.update(|x| {
    a.update(|x| *x += 1);
    *x += 1;
  });
  assert_eq!(a.get(), 1);

// versus
  // same workflow without the "blackbox" Cell::update, edge-case behavoir is obvious
  let a = Cell::new(0);
  {
    let old = a.get();
    let new2 = {
      let new1 = old + 1;
      a.set(new1);
      old + 1
    };
    a.set(new2);
  }
  assert_eq!(a.get(), 1);

Do we have some existing use case which would hugely benefit from the current Cell::update?

My gut is that it covers a wider range of cases; that is, it's more likely for a type to implement Default but not Copy than the other way around

There are quite a few types that implement Copy but not Default. For example: &T, Instant, SystemTime, IpAddr, NonNull<T>, NonZero*, MaybeUninit<T>, pointers, ThreadId, ...

(And of course, there are a lot of types that implement Default that don't implement Copy, such as Vec<T> and many more.)

Temporarily placing the Default value in the Cell (e.g. 0 for i32) while updating might lead to harder to debug issues: if a Cell is accessed while it is being updated, it then contains an unexpected value (0) instead of just the old value.

So I think we really need both options.

Do we have some existing use case which would hugely benefit from the current Cell::update?

I was working on a variant on Rc<T> recently, and the source contains lines like:

if self.inner.counter().strong.update(|x| x - 1) == 0 {
    ...
}

That'd be a lot more verbose to write with separate .get() and .set().

The only way to make absolutely sure 'update-while-updating' doesn't happen, is to not allow the user to provide arbitrary code to update the value. Then we would basically end up with the same restrictive interface as the atomics: fetch_add(1), fetch_sub(1), etc. for a few 'approved' operations like |x| x + y and |x| x - y, etc. But that would be way less flexible. (Atomics are usually integers, but Cells are used for all kind of types.)

Edit: Nevermind, this doesn't work.

To put the problem in other words, what we need is for the borrow checker to treat &self as &mut self so that the lambda is not allowed to borrow self.

Okay, I'm just trying to think outside the box here...

What if we created a way to tell the borrow checker that a borrow is "exclusive", similar to a mutable borrow?

fn update<F>(#[exclusive_borrow] &self, f: F) -> T where F: FnOnce(T) -> T
cell.update(|x| {
    cell.update(|x| x + 1); // Error: cell is exclusively borrowed above
    x + 1
});

Is it an option to leave the update-while-updating problem to the user? We could put a clear warning in the documentation.

To put the problem in other words, what we need is for the borrow checker to treat &self as &mut self so that the lambda is not allowed to borrow self.

That's not really possible. It could be some function deep within some function call that also happens to have a reference to the cell.

The entire point of the Cell is to disable this part of the borrow checker. If there was a way to prove to the borrow checker you're only modifying it from one place, you didn't need a Cell<T> to begin with. Then you could just have used a T instead.

Is it an option to leave the update-while-updating problem to the user? We could put a clear warning in the documentation.

I think that's fine. It's also our only option, other than not providing any update mechanism at all.

Then I suppose the behavior of nested updates needs to at least be well defined.

I would think that the update method opens the door for compiler optimizations, and that could be part of the motivation for adding the method. For example, shouldn't we be able to optimize away Default::default() in some cases?

With all proposed solutions, 'nested' updates are well defined.

It isn't needed for optimization. The optimizer will understand a .get()+.set() (or .take()+.set()) just as well. An update function is just to make it shorter to write, and possibly to make it easier to write correct code.

In most cases, the optimizer will be able optimize the Default::default() value away, sure. However, if you're calling a (non-inline) function from another crate, it can not always know whether the function can panic or read from the Cell, so it can't optimize away writing the temporary placeholder.

[[Demo](https://godbolt.org/z/PJ6WvB)]

The "smart pointer" idea is really neat. It just seems a little too obfuscated since the write back to the Cell is hidden. As soon as somebody decides to bind the "smart pointer" to a variable, it gets more confusing and the user probably should have used get and set instead. With a lambda, the timing of the update to the Cell is clear.

I second the motion that we replace update with get_update and take_update, but with lambda type T -> T. The need for having both has been demonstrated. The value must be copied or taken to supply the lambda. I think the API should just be upfront about that fact. This will alleviate the risk that "nested" updates cause surprising behavior. It's also very consistent with the current API.

The smart pointer idea could be considered as a separate enhancement, unless people feel that that should be the main implementation of update.

What if we created a way to tell the borrow checker that a borrow is "exclusive", similar to a mutable borrow?

By definition, a mutable borrow is an exclusive borrow.

Can't we require closure to be Send to make variant with &mut T sound?

fn update<F>(&self, f: F) where F: Send + FnOnce(&mut T);

Since Cell is !Sync anything that can contain reference to Cell must be !Send which prevents any kind of reference to Cell from appearing inside the closure, thus trivially preventing any recursive calls on the same Cell. Here is a playground that fails to compile on "update while updating". Error message is confusing, though, because threads are not involved here, but with proper documentation it shouldn't be a problem.

This is an abuse of Send to get desired results. Send should only be used in the context of threading, and so it doesn't fit here. Secondly, it would prevent many valid use-cases for example, you can't capture a Rc<_> (which you would probably do if you are using a Cell).

@rrevenantt I brought up the same idea several years ago; unfortunately it's still not sufficient for soundness. The closure could still refer to a Cell through a thread-local variable, which is the only specific example I remember (but one is sufficient to demonstrate unsoundness), but IIRC, there were other ways to circumvent the restriction as well.

@RustyYato Please try to keep an open mind. I'm sure there have been other cases in Rust's history where something originally intended for one purpose turned out to be useful for another one as well, and in general it can be a fruitful and enlightening direction to explore. This also would not have removed any of Cell's existing methods, only added a new one with a different tradeoff.

you can't capture a Rc<_> (which you would probably do if you are using a Cell)

(And furthermore, if you are using a Cell then you couldn't call the method, so this logic is backwards.)

Any reason why the API for RefCell and Cell is so inconsistent?

RefCell has a method replace_with similar to the method update for Cell but return the old value instead.

Shouldn't it be named update_with and be available on both or at least have the methods update and replace_with available for both?

Was this page helpful?
0 / 5 - 0 ratings