Rust: Tracking issue for Cell::update

Created on 23 Apr 2018 · 37Comments · Source: rust-lang/rust

This issue tracks the cell_update feature.

The feature adds Cell::update, which allows one to more easily modify the inner value.
For example, instead of c.set(c.get() + 1) now you can just do c.update(|x| x + 1):

let c = Cell::new(5);
let new = c.update(|x| x + 1);

assert_eq!(new, 6);
assert_eq!(c.get(), 6);

B-unstable C-tracking-issue Libs-Small Libs-Tracked T-libs

Source

stjepang

👍4

Most helpful comment

@oli-obk

I like the second option (taking a &mut T in the closure). There's an interesting variation on this option - instead of requiring T: Copy we could require T: Default to support updates like this one:

let c = Cell::new(Vec::new());
c.update(|v| v.push("foo"));

Perhaps we could have two methods with different names that clearly reflect how they internally work?

fn get_update<F>(&self, f: F) where T: Copy, F: FnMut(&mut T);
fn take_update<F>(&self, f: F) where T: Default, F: FnMut(&mut T);

stjepang on 24 Apr 2018

👍9

All 37 comments

Not a blocking issue for merging this as an unstable feature, so I'm registering my concern here:

The update function returning its new value feels very ++i-ish. Similar to += returning the result of the operation (which rust luckily does not!). I don't think we should be doing this. Options I see:

return () (that was the suggestion in the PR)
make the closure argument a &mut T, and allow the closure to return any value.
- if users want the "return new/old value" behaviour, they can now encode it themselves via |x| {*x += 1; *y }

Maybe there are other options?

Data point: the volatile crate's update method returns () (https://github.com/embed-rs/volatile/blob/master/src/lib.rs#L134)

oli-obk on 24 Apr 2018

👍3

@oli-obk

let c = Cell::new(Vec::new());
c.update(|v| v.push("foo"));

Perhaps we could have two methods with different names that clearly reflect how they internally work?

fn get_update<F>(&self, f: F) where T: Copy, F: FnMut(&mut T);
fn take_update<F>(&self, f: F) where T: Default, F: FnMut(&mut T);

stjepang on 24 Apr 2018

👍9

What do you think, @SimonSapin? Would two such methods, get_update and take_update, be a more appropriate interface?

stjepang on 2 May 2018

👍1

The T: Default flavor feels kinda awkward but I don’t know how to put that in more concrete terms, sorry :/

It looks like there is a number of slightly different possible APIs for this. I don’t have a strong opinion on which is (or are) preferable.

SimonSapin on 14 May 2018

Taking &mut in this method would be awesome, I would love to use it.

Unfortunately it seems that it would have to be unsafe, because AFAIK there is no way in Rust to restrict usage like this:

let c = Cell::new(vec![123]);
c.update(|vec| {
    let x = &mut vec[0];            // x references 123
    c.update(|vec| vec.clear());    // x references uninitialised memory
    *x = 456;                       // BOOM! undefined behavior
})

I hope, I'm wrong.

CodeSandwich on 6 Aug 2018

👍3

@CodeSandwich it is a fundamental restriction of Cell<T> that a reference &T or &mut T (or anything else inside of T) must not be given to the value inside the cell. It’s what makes it sound. (This is the difference with RefCell.)

We could make an API that takes a closure that takes &mut T, but that reference would not be to the inside of the cell. It would be to value that has been copied (with T: Copy) or moved (and temporarily replaced with some other value, with T: Default) onto the stack and will be copied/moved back when the closure returns. So code like your example would be arguably buggy but still sound: the inner update() would operate on a temporary value that would be overwritten at the end of the outer one.

SimonSapin on 7 Aug 2018

👍2

I think, that a modifier method might be safe if we just forbid usage of reference to Cell's self inside. Rust does not have mechanism for forbidding usage of a SPECIFIC reference, but it does have mechanism for forbidding usage of ALL references: 'static closures.

I've created a simple implementation of such method and I couldn't find any hole in it. It's a limited solution, but it's enough for many use cases including simple number and collections modifications.

CodeSandwich on 13 Aug 2018

A 'static closure could still reach the Cell, for example by owning a Rc.

SimonSapin on 13 Aug 2018

That's absolutely right :(

CodeSandwich on 13 Aug 2018

It seems like, the semantic are a bit different between Default and Copy: the later is more light weighted, as it does not have to fill the hole behind with some temporary rubbish.

What shall we name those variants? update_move for the Default case?

Also there is another variation: update_unsafe which will be unsafe and does not require a trait bound: it only moves the value and leave uninitialized value behind, before the method returns. This will make any nested calls UB and so have to be marked as unsafe.

pub unsafe fn update_unsafe<F>(&self, f: F) -> T where F: FnOnce(T) -> T;

earthengine on 3 Oct 2018

if you’re gonna use unsafe anyway you can use .get() and manipulate the raw pointer. I don’t think we should facilitate it more than that.

SimonSapin on 3 Oct 2018

👍2

IMO, closure receiving a &mut to a stack copy isn't that advantageous ergonomically in terms of offering flexibility. You still need to jump through hoops to get the old value if you want to. Consider:

let id = next_id.update(|x| { let y = *x; *x += 1; y });

I think the most universally applicable thing to do would be to offer all of update, get_and_update and update_and_get:

things.update(|v| v.push(4)); // returns ()
// Alternative with v by-move. I'd guess this one isn't as useful, and you can easily use
// mem::swap or mem::replace in the by-ref version instead if you need it.
things.update(|v| { v.push(4); v });
let id = next_id.get_and_update(|x| x + 1);
let now = current_time.update_and_get(|x| x + 50);

These are 3 additional names, but I think they'll make the function more useful. A by-mut-ref update would only be useful for complex types where you're likely to do in-place updates, but I find myself doing the .get() -> update/keep value -> .set() pattern a lot with Cell.

yuriks on 15 Oct 2018

👍1

Related conversation on a similar PR (for RefCell): https://github.com/rust-lang/rust/pull/38741

peterjoel on 13 May 2019

Has it been considered to use an arbitrary return type on top of updating the value? That is (naming aside),

fn update1<F>(&self, f: F) where F: FnOnce(T) -> T;
fn update2<F, R>(&self, f: F) -> R where F: FnOnce(T) -> (T, R);

The second method has a very functional feel to it and appears both flexible and sound (i.e. you can't return &T). You could get either ++i or i++ behavior out of it. Also it is the same whether T: Copy or not. Though ergonomics are of course debatable.

EDIT: Something to chew on:

fn dup<T: Copy>(x: T) -> (T, T) { (x, x) }
let i = i.update2(|i| (i + 1, i)); // i++
let i = i.update2(|i| dup(i + 1)); // ++i

matthew-mcallister on 6 Jan 2020

👍2

+1 for a T version, as opposed to &mut T. It's easy to construct demo code that is more expressive with either version (ie, x + 1 vs vec.push(..)), but unless we add both, I think the more general version is preferable, since under the hood they both require a swap-out swap-in for soundness. Also +1 for an extra version with a return value.

Finally, I think Default is preferable to Copy. My gut is that it covers a wider range of cases; that is, it's more likely for a type to implement Default but not Copy than the other way around, and hopefully for simple cases the compiler should be able to optimize out the extra writes anyway.

Lucretiel on 17 Jan 2020

It sounds like accepting &mut T isn't a viable option.

I'm sort of combining previous ideas here. How about this?

fn update<F>(&self, f: F) where T: Copy, F: FnOnce(T) -> T;
fn update_map<F, U>(&self, f: F) -> U where T: Copy, F: FnOnce(T) -> (T, U);
fn take_update<F>(&self, f: F) where T: Default, F: FnOnce(T) -> T;
fn take_update_map<F, U>(&self, f: F) -> U where T: Default, F: FnOnce(T) -> (T, U);

camsteffen on 25 Feb 2020

A completely different option could be to not take a function as argument, but return a 'smart pointer' object that contains the taken/copied value and writes it back to the cell when it is destructed.

Then you could write

    a.take_update().push(1);

    *b.copy_update() += 1;

[[Demo implementation](https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=7ab3c9318a91794d7041c66b6fb9f1fe)]

m-ou-se on 29 Feb 2020

@m-ou-se I think that would be ~~unsafe~~ suprising because multiple instances of the 'smart pointer' / guard could exist simultaneously.

        let mut cu1 = b.copy_update();
        let mut cu2 = b.copy_update();
        *cu1 += 1;
        *cu2 += 1;
        *cu1 += *cu2;

https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=f3fc67425f1b7f1dba1220fb961ad7d9

zserik on 2 Mar 2020

The proposed implementation appears to move the value out of the cell and into the smart pointer; pointer in this case is a misnomer. This will lead to the old c++ auto_ptr problem: not unsound, but surprising in nontrivial use cases (ie, if two exist at the same time, one will receive a garbage value)

Lucretiel on 2 Mar 2020

None of the proposed solutions (with a T -> T lambda, with a &mut T -> () lambda, or with a 'smart pointer'/RaiiGuard/younameit, etc.) are unsafe. All of them can be implemented using only safe code.

All of them have this 'update while updating' problem. That's not inherent to a specific solution, but inherent to Cell. There's just no way to 'lock' access to a Cell. That's the entire point of a Cell. Rust has a built-in solution for this problem in general: only allowing mutation through a provably uniquely borrowed &mut. Cell is the way to explicitly disable that feature.

With the current (unstable) update() function: [[Demo](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=239e7392e2b94402c49614f0ec32faa2)]

    let a = Cell::new(0);
    a.update(|x| {
        a.update(|x| x + 1);
        x + 1
    });
    assert_eq!(a.get(), 1); // not 2

With a &mut:

    let a = Cell::new(0);
    a.update(|x| {
        a.update(|x| *x += 1);
        *x += 1;
    });
    assert_eq!(a.get(), 1); // not 2

With a copy-containing 'smart pointer':

    let a = Cell::new(0);
    {
        let mut x = a.copy_update();
        *a.copy_update() += 1;
        *x += 1;
    }
    assert_eq!(a.get(), 1); // not 2

There is no way to avoid this problem. Every possible solution we come up with will have this problem. All we can do is try to make it easier/less verbose to write the commonly used .get() + .set() (or .take() + .set()) combination.

m-ou-se on 2 Mar 2020

👍3

I think Cell::update should be marked as unsafe (I know this idea was already discarded, but I currently don't know about alternatives, or do we have any other "fitting" attribute which marks functions as potentially suprising/non-trivial with edge cases, e.g. something that marks a function call as "review me more intense, I may have suprising side-effects, but I'm not unsafe"?) even though it doesn't contain unsafe code, because of this non-trivial/possible suprising behavoir, which (I think) doesn't happen when using normal .get/.set methods (at least not "hidden").

// e.g.:
  // current Cell::update function, edge-case behavoir isn't obvious
  let a = Cell::new(0);
  a.update(|x| {
    a.update(|x| *x += 1);
    *x += 1;
  });
  assert_eq!(a.get(), 1);

// versus
  // same workflow without the "blackbox" Cell::update, edge-case behavoir is obvious
  let a = Cell::new(0);
  {
    let old = a.get();
    let new2 = {
      let new1 = old + 1;
      a.set(new1);
      old + 1
    };
    a.set(new2);
  }
  assert_eq!(a.get(), 1);

Do we have some existing use case which would hugely benefit from the current Cell::update?

zserik on 2 Mar 2020

My gut is that it covers a wider range of cases; that is, it's more likely for a type to implement Default but not Copy than the other way around

There are quite a few types that implement Copy but not Default. For example: &T, Instant, SystemTime, IpAddr, NonNull<T>, NonZero*, MaybeUninit<T>, pointers, ThreadId, ...

(And of course, there are a lot of types that implement Default that don't implement Copy, such as Vec<T> and many more.)

Temporarily placing the Default value in the Cell (e.g. 0 for i32) while updating might lead to harder to debug issues: if a Cell is accessed while it is being updated, it then contains an unexpected value (0) instead of just the old value.

So I think we really need both options.

m-ou-se on 2 Mar 2020

Do we have some existing use case which would hugely benefit from the current Cell::update?

I was working on a variant on Rc<T> recently, and the source contains lines like:

if self.inner.counter().strong.update(|x| x - 1) == 0 {
    ...
}

That'd be a lot more verbose to write with separate .get() and .set().

m-ou-se on 2 Mar 2020

The only way to make absolutely sure 'update-while-updating' doesn't happen, is to not allow the user to provide arbitrary code to update the value. Then we would basically end up with the same restrictive interface as the atomics: fetch_add(1), fetch_sub(1), etc. for a few 'approved' operations like |x| x + y and |x| x - y, etc. But that would be way less flexible. (Atomics are usually integers, but Cells are used for all kind of types.)

m-ou-se on 2 Mar 2020

👍2

Edit: Nevermind, this doesn't work.

To put the problem in other words, what we need is for the borrow checker to treat &self as &mut self so that the lambda is not allowed to borrow self.

Okay, I'm just trying to think outside the box here...

What if we created a way to tell the borrow checker that a borrow is "exclusive", similar to a mutable borrow?

fn update<F>(#[exclusive_borrow] &self, f: F) -> T where F: FnOnce(T) -> T

cell.update(|x| {
    cell.update(|x| x + 1); // Error: cell is exclusively borrowed above
    x + 1
});

camsteffen on 2 Mar 2020

Is it an option to leave the update-while-updating problem to the user? We could put a clear warning in the documentation.

camsteffen on 2 Mar 2020

To put the problem in other words, what we need is for the borrow checker to treat &self as &mut self so that the lambda is not allowed to borrow self.

That's not really possible. It could be some function deep within some function call that also happens to have a reference to the cell.

The entire point of the Cell is to disable this part of the borrow checker. If there was a way to prove to the borrow checker you're only modifying it from one place, you didn't need a Cell<T> to begin with. Then you could just have used a T instead.

m-ou-se on 2 Mar 2020

👍1

Is it an option to leave the update-while-updating problem to the user? We could put a clear warning in the documentation.

I think that's fine. It's also our only option, other than not providing any update mechanism at all.

m-ou-se on 2 Mar 2020

Then I suppose the behavior of nested updates needs to at least be well defined.

I would think that the update method opens the door for compiler optimizations, and that could be part of the motivation for adding the method. For example, shouldn't we be able to optimize away Default::default() in some cases?

camsteffen on 2 Mar 2020

With all proposed solutions, 'nested' updates are well defined.

It isn't needed for optimization. The optimizer will understand a .get()+.set() (or .take()+.set()) just as well. An update function is just to make it shorter to write, and possibly to make it easier to write correct code.

In most cases, the optimizer will be able optimize the Default::default() value away, sure. However, if you're calling a (non-inline) function from another crate, it can not always know whether the function can panic or read from the Cell, so it can't optimize away writing the temporary placeholder.

[[Demo](https://godbolt.org/z/PJ6WvB)]

m-ou-se on 2 Mar 2020

👍2

The "smart pointer" idea is really neat. It just seems a little too obfuscated since the write back to the Cell is hidden. As soon as somebody decides to bind the "smart pointer" to a variable, it gets more confusing and the user probably should have used get and set instead. With a lambda, the timing of the update to the Cell is clear.

camsteffen on 2 Mar 2020

👍1

I second the motion that we replace update with get_update and take_update, but with lambda type T -> T. The need for having both has been demonstrated. The value must be copied or taken to supply the lambda. I think the API should just be upfront about that fact. This will alleviate the risk that "nested" updates cause surprising behavior. It's also very consistent with the current API.

The smart pointer idea could be considered as a separate enhancement, unless people feel that that should be the main implementation of update.

camsteffen on 2 Mar 2020

What if we created a way to tell the borrow checker that a borrow is "exclusive", similar to a mutable borrow?

By definition, a mutable borrow is an exclusive borrow.

Lucretiel on 5 Mar 2020

Can't we require closure to be Send to make variant with &mut T sound?

fn update<F>(&self, f: F) where F: Send + FnOnce(&mut T);

Since Cell is !Sync anything that can contain reference to Cell must be !Send which prevents any kind of reference to Cell from appearing inside the closure, thus trivially preventing any recursive calls on the same Cell. Here is a playground that fails to compile on "update while updating". Error message is confusing, though, because threads are not involved here, but with proper documentation it shouldn't be a problem.

rrevenantt on 9 Jun 2020

👎1 👍1

This is an abuse of Send to get desired results. Send should only be used in the context of threading, and so it doesn't fit here. Secondly, it would prevent many valid use-cases for example, you can't capture a Rc<_> (which you would probably do if you are using a Cell).

RustyYato on 9 Jun 2020

@rrevenantt I brought up the same idea several years ago; unfortunately it's still not sufficient for soundness. The closure could still refer to a Cell through a thread-local variable, which is the only specific example I remember (but one is sufficient to demonstrate unsoundness), but IIRC, there were other ways to circumvent the restriction as well.

@RustyYato Please try to keep an open mind. I'm sure there have been other cases in Rust's history where something originally intended for one purpose turned out to be useful for another one as well, and in general it can be a fruitful and enlightening direction to explore. This also would not have removed any of Cell's existing methods, only added a new one with a different tradeoff.

you can't capture a Rc<_> (which you would probably do if you are using a Cell)

(And furthermore, if you are using a Cell then you couldn't call the method, so this logic is backwards.)

glaebhoerl on 10 Jun 2020

Any reason why the API for RefCell and Cell is so inconsistent?

RefCell has a method replace_with similar to the method update for Cell but return the old value instead.

Shouldn't it be named update_with and be available on both or at least have the methods update and replace_with available for both?

Kevan0v on 23 Aug 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Use #[repr(C)] HList's to infer type-erased fmt fn pointers in format_args!'s static data.

eddyb · 3Comments

Callback parameter names are missing from rustdoc

dtolnay · 3Comments

ThreadRng performance bug

SharplEr · 3Comments

Documentation: Unclear difference between "string slices" and "string literals" for newcomers

defuz · 3Comments

#![allow(unknown_lints)] does not work if there is a unknow lint at the same level since nightly-2017-08-11-x86_64-unknown-linux-gnu

overvenus · 3Comments