#![feature(generators, generator_trait)]
use std::ops::Generator;

struct Foo([u8; 1024]);

impl Drop for Foo {
    fn drop(&mut self) {}
}

fn simple() -> impl Generator<Yield = (), Return = ()> {
    static || {
        let first = Foo([0; 1024]);
        let _second = first;
        yield;
    }
}

fn complex() -> impl Generator<Yield = (), Return = ()> {
    static || {
        let first = Foo([0; 1024]);
        { foo(); fn foo() {} }
        let _second = first;
        yield;
    }
}

fn main() {
    dbg!(std::mem::size_of_val(&simple()));
    dbg!(std::mem::size_of_val(&complex()));
}
The two generators returned by simple and complex should be equivalent, but complex takes twice as much space:
[foo.rs:29] std::mem::size_of_val(&simple()) = 1028
[foo.rs:30] std::mem::size_of_val(&complex()) = 2056
Dumping out the MIR (with rustc 1.34.0-nightly (f66e4697a 2019-02-20)) shows an issue with how unwinding from foo interacts with the two stack slots for first and _second: because a dynamic drop flag is used, first is considered "live" on the path that goes through the yield, even though the drop flag is guaranteed to be false there. (The graph below shows the basic blocks, with the pseudo-code run in them and which variables are live when exiting each block):

@rustbot modify labels: A-generators and T-compiler.
Is this related to #52924?
@MSleepyPanda in as much as it's about generators being too big. The specific optimisation proposed there won't help here as first and second are both live over the same yield point. What should really happen is that first is not kept live across the yield at all and it should be allocated in the resume function stack instead of the generator state. (And then some sort of copy-elision optimisation might eliminate that allocation and use the allocation in the generator state directly, but IMO that's less important (and probably more difficult) than ensuring the memory usage is reduced).
cc @tmandry
Another example, without drop:
#![feature(generators, generator_trait)]
use std::ops::Generator;

struct Foo([u8; 1024]);

fn simple() -> impl Generator<Yield = (), Return = ()> {
    static || {
        let first = Foo([0; 1024]);
        let _second = first;
        yield;
    }
}

fn complex() -> impl Generator<Yield = (), Return = ()> {
    static || {
        fn foo(_: &mut Foo) {}
        let mut first = Foo([0; 1024]);
        foo(&mut first);
        let mut second = first;
        foo(&mut second);
        let mut third = second;
        foo(&mut third);
        let mut _fourth = third;
        yield;
    }
}

fn main() {
    dbg!(std::mem::size_of_val(&simple()));
    dbg!(std::mem::size_of_val(&complex()));
}
outputs
[src/main.rs:32] std::mem::size_of_val(&simple()) = 4
[src/main.rs:33] std::mem::size_of_val(&complex()) = 3076
I was thinking we can solve this by adding the following rules to our MaybeStorageLive dataflow analysis (possibly being renamed to RequiresStorage):
We must not optimize away storage of locals that are mutably borrowed, because as @matthewjasper notes in #61430, it isn't decided that the following is UB:
let mut x = String::new();
let p = &mut x as *mut String;
let y = x;
p.write(String::new());
It's an open question of whether we can say "the local hasn't been mutably borrowed _up to here_" when evaluating rule 3. I'd prefer to make the optimization as smart as we can, but MIR probably allows borrowing a moved-from value and mutating it.
Is this sound?
cc @cramertj @eddyb @matthewjasper @RalfJung @Zoxc
The "mutably" of "mutably borrowed" is a red herring IMO, unless you want to check for Freeze, which will conservatively default to "may contain interior mutability" once generic parameters are thrown into the mix.
@tmandry Interesting. How bad would it be to relax this to "if a local is moved from and never has had its address taken"? Then we can be sure without any assumptions about Stacked Borrows that direct accesses to the local variable are the only way to observe it, and those will be prevented after a move. This would also alleviate @eddyb's concern I think.
Also, what is the granularity here? Without Stacked Borrows assumptions we can only do this on a per-local level, not on a per-field-of-a-struct level. Taking the address of one field leaks the address of other fields (if layout assumptions are made).
Taking the address of one field leaks the address of other fields (if layout assumptions are made).
Oh, really? I suppose that makes sense in a "list of things you can do" sense, but it's surprising and seems like it would make optimizations like SROA hard.
Well, Stacked Borrows might restrict this. But then we get into https://github.com/rust-lang/unsafe-code-guidelines/issues/134. So for now I'd say as compiler authors we should assume that it does leak the entire local.
The "mutably" of "mutably borrowed" is a red herring IMO, unless you want to check for Freeze, which will conservatively default to "may contain interior mutability" once generic parameters are thrown into the mix.
Oh right, well Freeze is an option. I don't see why we need to rule out types with generic parameters, at least based on this definition. But if you want to maintain flexibility on the definition then I suppose we could.
if a local is moved from _and never has had its address taken_
Yes, but how would we enforce this? A borrow can be passed to another function, which can convert to a pointer.
I'd rather not disable the optimization when a borrow escapes the function. One practical reason for this is that std::mem::size_of_val(&x) is used to benchmark the size of futures, and I don't want that to have an impact on optimizations. I don't think Rust futures should have their own uncertainty principle.
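For context, the measurement pattern in question looks like the following (an illustrative sketch using today's async/await syntax rather than the generator syntax above, not code from the issue):

```rust
use std::mem::size_of_val;

// An async fn with no state held across an await compiles to a
// near-empty state machine, so its future should be tiny.
async fn tiny() {}

fn main() {
    let fut = tiny();
    // Per the comment above, taking this shared reference to measure
    // the future should not itself affect the layout of `fut`.
    println!("{} bytes", size_of_val(&fut));
}
```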
Also, what is the granularity here? Without Stacked Borrows assumptions we can only do this on a per-local level, not on a per-field-of-a-struct level. Taking the address of one field leaks the address of other fields (if layout assumptions are made).
I'm only proposing to do this on a per-local level, for now.
@tmandry
Yes, but how would we enforce this? A borrow can be passed to another function, which can convert to a pointer.
I think that @RalfJung was including & as an operator which would "take the address" of a local. That said, I agree with you that we can do better than this and only include & if T: ?Freeze.
Yes, but how would we enforce this? A borrow can be passed to another function, which can convert to a pointer.
"the address taken" = "has the & operator executed". That should be trivial to enforce?
Basically what @cramertj said. Doing "better than this" requires committing to parts of Stacked Borrows. I am not sure if @rust-lang/lang is ready for that. For sure it shouldn't happen en passant as part of a PR like this.
"the address taken" = "has the & operator executed". That should be trivial to enforce?
Then yes, that's easy to enforce.
Doing "better than this" requires committing to parts of Stacked Borrows.
Can you expand on this? How would checking for Freeze require committing to Stacked Borrows?
@tmandry I'm guessing @RalfJung is referring to the fact that it commits us to things like this being UB:
let mut x = String::new();
let addr_x: *const String = reference_to_pointer(&x);
drop(x);
ptr::write(addr_x as *mut String, String::new());
@cramertj indeed that's what I mean. This would be the first new exploitation of "reference-based UB" since... forever, I think. It should happen deliberately and prominently and have its own discussion.
@RalfJung Hmm okay, then I think we should start having that discussion soon (in a different issue, probably).
This relates to the maturity of await. It's common practice to debug the size of your futures by calling size_of_val on the inner futures. In this state of the world, doing so would increase the size of your outer future! I'm uncomfortable shipping a feature in that state, at least without a plan for making it better.
@tmandry notice that the moment your future is generic or has a Cell local, taking a shared reference will have to pessimize such optimizations due to the necessary is_frozen check.
Also, won't it usually await the inner future, so anyway its address has been taken mutably?
is generic
I'm not sure I understand this bit-- couldn't the optimization be monomorphization-dependent?
(possibly being renamed to RequiresStorage):
FWIW I'm probably reverting all changes to the generator transform in that PR. The MaybeStorageLive analysis is also used for determining what variables need to have StorageLive when we enter resume, so it probably shouldn't be modified.
Also, won't it usually await the inner future, so anyway its address has been taken mutably?
@RalfJung What I'm trying to optimize is this:
let mut x = get_future();
dbg!(std::mem::size_of_val(&x));
x.await
which today expands to
let mut x = get_future();
dbg!(std::mem::size_of_val(&x));
{
    let mut pinned = x; // <--- This move
    loop {
        // use `&mut pinned`
        yield;
    }
}
The move let mut pinned = x; doubles the size of the future that contains it today, because we reserve space for both x and pinned. What we need is for the generator transform to recognize that x does not need to be saved across the yield at all.
Based on our discussion, it seems pretty straightforward to allow this optimization when the size_of_val line is not present.
For what it's worth, that move (let mut pinned = x;) is just to prevent the user from using x after calling .await, and could be omitted if we had some way to mark a local as "moved" without actually moving it (drop doesn't work because it can cause the value to move to a different place in memory).
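The effect of such a move can be observed with a sketch like the following (our own example using stable async/await rather than the generator code from the issue; function names are ours):

```rust
use std::mem::size_of_val;

// Baseline: the buffer is created and used across the await directly.
async fn direct() -> u8 {
    let buf = [1u8; 1024];
    std::future::ready(()).await;
    buf[0]
}

// The pattern discussed above: the buffer is moved into a second
// binding before the await, mirroring `let mut pinned = x;`.
async fn moved_first() -> u8 {
    let buf = [1u8; 1024];
    let pinned = buf;
    std::future::ready(()).await;
    pinned[0]
}

fn main() {
    // Whether the two sizes differ depends on whether the compiler
    // manages to overlap the storage of `buf` and `pinned`.
    println!("direct:      {} bytes", size_of_val(&direct()));
    println!("moved_first: {} bytes", size_of_val(&moved_first()));
}
```

Either way, both futures must be at least 1024 bytes, since a buffer is held across the await.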
A secondary question I have is, can we rely on the ordering between the move and the borrow? e.g.
let mut x = String::new();
move_from(x);
// ...later...
x = bar();
let addr_x = &x as *const String;
move_from(x);
ptr::write(addr_x as *mut String, String::new());
Since the _first_ instance of move_from(x) happens before any borrow, can we consider x as not needing storage between that and when it is re-assigned? (Clearly, today we cannot do this for the second instance of move_from(x).)
I don't want to get bogged down in this question though, ~since I'm unlikely to rely on it at first anyway~.
For what it's worth, that move (let mut pinned = x;) is just to prevent the user from using x after calling .await, and could be omitted if we had some way to mark a local as "moved" without actually moving it (drop doesn't work because it can cause the value to move to a different place in memory).
Actually I think we can do this today-- ptr::drop_in_place followed by mem::forget XD
@cramertj Hmm, good to know, but this also affects the size of join! because we are moving all the sub-futures into a single joined future.
@tmandry we can get away with a similar thing there by only putting references in the new state machine. It's higher overhead because of the additional pointers, but it'll prevent ever having to copy into a new location, and save space for large types, even those whose address had been taken.
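A minimal sketch of that idea (not the real futures-rs join! implementation; the struct and names are ours): the combined state machine borrows its sub-futures instead of moving them, so its own state is two pointers regardless of how large the sub-futures are.

```rust
use std::mem::size_of_val;

// A join-like combinator holding references to its sub-values instead
// of owning them. Its size is just two pointers.
struct JoinByRef<'a, A, B> {
    a: &'a mut A,
    b: &'a mut B,
}

impl<'a, A, B> JoinByRef<'a, A, B> {
    fn inner_sizes(&self) -> (usize, usize) {
        (size_of_val(self.a), size_of_val(self.b))
    }
}

fn main() {
    let mut big_a = [0u8; 1024];
    let mut big_b = [0u8; 1024];
    let join = JoinByRef { a: &mut big_a, b: &mut big_b };
    println!(
        "join refers to {:?} bytes but is only {} bytes itself",
        join.inner_sizes(),
        size_of_val(&join)
    );
}
```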
Still catching up, some comments on the way:
I'm not sure I understand this bit-- couldn't the optimization be monomorphization-dependent?
Oh so everything we are discussing here happens post-monomorphization? Never mind that part then.
Based on our discussion, it seems pretty straightforward to allow this optimization when the size_of_val line is not present.
Agreed. And if it's just size_of_val we could let the analysis know that it doesn't actually escape the pointer. (Or we could do inlining and then that becomes obvious.)
Even with full Stacked Borrows, we'd still have an "uncertainty principle" (the physicist in me dies a little in the face of this very inaccurate analogy but whatever ;) when there is any interior mutability in a local of the future. That still seems bad enough for debugging, does it not?
Actually I think we can do this today-- ptr::drop_in_place followed by mem::forget XD
Aaaand this doesn't work because it requires that the binding be mutable:
macro_rules! drop_with_guaranteed_move_elision {
    ($val:ident) => {
        unsafe {
            std::ptr::drop_in_place(&mut $val);
            std::mem::forget($val);
        }
    }
}

fn main() {
    let mut x = "foo".to_string(); // errors if `x` isn't `mut`
    drop_with_guaranteed_move_elision!(x);
}
Agreed. And if it's just size_of_val we could let the analysis know that it doesn't actually escape the pointer. (Or we could do inlining and then that becomes obvious.)
Even with full Stacked Borrows, we'd still have an "uncertainty principle" (the physicist in me dies a little in the face of this very inaccurate analogy but whatever ;) when there is any interior mutability in a local of the future. That still seems bad enough for debugging, does it not?
Mm yeah, I guess the true fix for this rough edge is MIR inlining, with a special case for size_of_val being an acceptable interim solution.
Still, with regard to optimization I would like to support as many cases as possible.
Still, with regard to optimization I would like to support as many cases as possible.
Sure you do, we all do. Just remember that you are paying for this optimization with the ~blood~ sweat of Rust programmers because you made their programs UB. ;)
One thing I am wondering about: in a situation like
let mut x = get_future();
dbg!(std::mem::size_of_val(&x));
{
    let mut pinned = x; // <--- This move
    loop {
        // use `&mut pinned`
        yield;
    }
}
why can't MIR building insert the StorageDead(x) right after the move? That would also make the optimization trivial. It wouldn't change the MIR semantics at all. It would, however, change the surface language semantics by adjusting StorageDead generation (which so far I have no good idea how to even specify, but with the impact it is having we will have to specify it some day).
Aaaand this doesn't work because it requires that the binding be mutable:
Closures have a similar problem with bindings having to be mutable for some stuff, and that's why we have "unique" borrows in the compiler. So maybe a similar technique could be used here?
A secondary question I have is, can we rely on the ordering between the move and the borrow? e.g.
Since the first instance of move_from(x) happens before any borrow, can we consider x as not needing storage between that and when it is re-assigned?
let mut x = String::new();
move_from(x);
// ...later...
x = bar();
let addr_x = &x as *const String;
move_from(x);
ptr::write(addr_x as *mut String, String::new());
You can rely on there not being any access to x between the first move_from and x = bar, yes. The address has not been observed yet and hence the program should be unable to notice that x actually did not have a stable address all the time.
This is subtle but I see nothing wrong with this reasoning.^^
Sure you do, we all do. Just remember that you are paying for this optimization with the ~blood~ sweat of Rust programmers because you made their programs UB. ;)
Fair enough :) Personally I'm just as wary of painting ourselves into a corner in terms of classes of optimizations we can do, though. One is a higher immediate cost, and the other is a smaller cost which will be paid by all future users of Rust. But anyway, I'll stop waxing philosophical about optimizations for now ;)
why can't MIR building insert the StorageDead(x) right after the move?
To me it seems just as hard as doing the analysis, because you have to make sure not to do this in instances where x has already been borrowed. (And make sure to issue StorageLive(x) again before re-assignment, but that seems easy.)
This is subtle but I see nothing wrong with this reasoning.
Great. In MIR is it possible to write code that takes the address of a local after it's been moved from (before it's reinitialized)? If so it should be easy to handle, we just mark locals as needing storage as soon as any part of them is borrowed.
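The per-local rule being proposed could be sketched as a toy model like this (our own illustration, not the actual rustc dataflow analysis; all names are hypothetical): a local stops needing storage after it is moved out, regains it on reassignment, but once any borrow of it has been seen we conservatively keep it live for the rest of the body.

```rust
#[derive(Clone, Copy)]
enum Stmt {
    Borrow(usize),  // `&x` or `&mut x` of the given local
    MoveOut(usize), // the local is moved from
    Assign(usize),  // the local is (re)initialized
}

// For each statement, report whether `local` requires storage after it.
fn requires_storage_after(stmts: &[Stmt], local: usize) -> Vec<bool> {
    let mut borrowed = false;
    let mut moved_out = false;
    stmts
        .iter()
        .map(|s| {
            match *s {
                Stmt::Borrow(l) if l == local => borrowed = true,
                Stmt::MoveOut(l) if l == local => moved_out = true,
                Stmt::Assign(l) if l == local => moved_out = false,
                _ => {}
            }
            // Once borrowed, always needs storage; otherwise only
            // while it holds a value.
            borrowed || !moved_out
        })
        .collect()
}

fn main() {
    // The example from the discussion: move, reassign, borrow, move.
    let stmts = [Stmt::MoveOut(0), Stmt::Assign(0), Stmt::Borrow(0), Stmt::MoveOut(0)];
    // Storage is not needed between the first move and the
    // reassignment, but stays needed after the borrow, even post-move.
    println!("{:?}", requires_storage_after(&stmts, 0));
}
```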
Agreed. And if it's just size_of_val we could let the analysis know that it doesn't actually escape the pointer. (Or we could do inlining and then that becomes obvious.)
Even with full Stacked Borrows, we'd still have an "uncertainty principle" (the physicist in me dies a little in the face of this very inaccurate analogy but whatever ;) when there is any interior mutability in a local of the future. That still seems bad enough for debugging, does it not?
Mm yeah, I guess the true fix for this rough edge is MIR inlining, with a special case for size_of_val being an acceptable interim solution.
Still, with regard to optimization I would like to support as many cases as possible.
I guess this might already be understood, but size_of_val shouldn't be the only statement that gets looked at and improved. It was mainly an instrument to surface the issue, and it seems like the size increase happens in other places too.
There is quite a bit of code around which moves futures just in order to make sure users can't fiddle with them anymore, e.g. join! in futures-rs.
With @tmandry's last optimization I already got a not-too-large future-based program from a 70kB Future down to a 30kB Future without further changes. However, by e.g. replacing some await-based mechanisms with manual combinators I brought it down further to less than 10kB - and I'm pretty sure there is a lot more potential, since the program is actually pretty tiny.
I can't comment on what can be done in the compiler to improve this, but I definitely encourage all further investigations and improvements on this. The impact seems to be huge!
To me it seems just as hard as doing the analysis, because you have to make sure not to do this in instances where x has already been borrowed. (And make sure to issue StorageLive(x) again before re-assignment, but that seems easy.)
No, that's not what I mean. The idea is to aggressively insert StorageDead after moves, to make such programs UB. What I am more worried about is that the program might move out only in some conditional branches, and what that would mean for the storage of those locals.
The reason I think we need this is that validity of pointers is not the only concern here. Currently, the following (entirely safe) code will definitely return true:
let mut x = String::new();
let addr_x = &x as *const _ as usize;
drop(x);
// later
x = String::new();
let addr_x2 = &x as *const _ as usize;
return addr_x == addr_x2;
If we want to do optimizations like yours here (and I am totally sympathetic to that), we have to explain in the "Rust Abstract Machine" (and in Miri) why this program might return false. And the answer cannot be "there is UB", because this is safe code.
This is a topic that @nikomatsakis, @eddyb and me have touched on several times already, without ever hashing out a full plan. But in the current state of affairs, the only mechanism we have to "defeat" pointer equality tests like the above is to make sure that this is not the same allocation any more.
So, one thing we might do is to do StorageDead(x); StorageLive(x); immediately after every move. This "re-allocates" x and thus definitely kills any existing pointers and also "defeats" pointer comparisons. The immediate StorageLive is to keep the liveness state in sync in both branches of a conditional (which might or might not be relevant -- unfortunately LLVM's semantics for these intrinsics is less than clear). I guess the StorageLive could be moved down in cases where there is no merging control flow, which should give you your optimization in many cases.
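Incidentally, the pointer-equality snippet above can be turned into a runnable check (the function name is ours; under today's semantics, the local keeps a single stack slot, so the two observed addresses compare equal):

```rust
// Runnable version of the safe pointer-comparison program discussed
// above: the address of `x` is observed before the move and after the
// reassignment, and the two addresses are compared.
fn address_reused() -> bool {
    let mut x = String::new();
    let addr_x = &x as *const String as usize;
    drop(x);
    // later
    x = String::new();
    let addr_x2 = &x as *const String as usize;
    addr_x == addr_x2
}

fn main() {
    println!("same address: {}", address_reused());
}
```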
Great. In MIR is it possible to write code that takes the address of a local after it's been moved from (before it's reinitialized)? If so it should be easy to handle, we just mark locals as needing storage as soon as any part of them is borrowed.
Maybe. Better assume so. Just look out for any & in the MIR.
So, one thing we might do is to do
StorageDead(x); StorageLive(x);immediately after every move.
Okay, that sounds like something worth pursuing, but it also seems like it would have potential fallout we need to check for. I'm not sure I understand all the implications myself. Given that I have an optimization pass mostly working without it, maybe it would be good to set up a separate issue to track all the considerations involved. EDIT: Opened #61849
Also, I forgot to mention.. just adding StorageDead(x) isn't enough because of drop flags. @eddyb wanted to add a MIR sanity check that every use (including drop) is dominated by StorageLive(x). Adding StorageDead(x) after every move would cause that sanity check to fail when we're branching on drop flags.
We would need something that adds StorageLive(x) in a strategic place, in order to both pass this check and make the optimization "just work" based on StorageLive/StorageDead alone.
Moving to "deferred" by the same logic as https://github.com/rust-lang/rust/issues/52924#issuecomment-502266293. @rust-lang/lang feel free to put the labels back if you disagree.
I'd personally like to see this fixed before stabilization. It can cause the same exponential size effects as we were seeing before #60187, albeit in different contexts.
I'm hoping to have a fix up for review soon.
@RalfJung But wasn't the point of "Operand::Move doesn't invalidate borrows" that source-level moves don't invalidate borrows?
We could certainly emit pairs of llvm.lifetime.{end,start} calls after an Operand::Move, without adding pairs of Storage{Dead,Live} statements into the MIR, if the goal is to make it UB to reuse an old pointer. But I thought you wanted to allow reinitialization after a move, with an old pointer?
The only thing I want is a precise definition of the semantics in a way that can be dynamically checked (e.g. in Miri), and ideally I also want the semantics to not be full of weird special cases. ;)
We could certainly emit pairs of llvm.lifetime.{end,start} calls after an Operand::Move, without adding pairs of Storage{Dead,Live} statements into the MIR, if the goal is to make it UB to reuse an old pointer.
How would that work? Wouldn't that mean that legal MIR code (that uses some kind of trick to reinitialize after a move) becomes UB in LLVM?
How would that work? Wouldn't that mean that legal MIR code (that uses some kind of trick to reinitialize after a move) becomes UB in LLVM?
No, I was referring to the case where we want the semantics of Operand::Move to be that they invalidate any outstanding borrows, similar to Storage{Dead,Live} but without bloating the MIR/impacting analyses which rely on some sort of dominance relationship.
I don't understand your position now. Are you saying you don't mind if source-level moves invalidate borrows, you just don't want it be be encoded into Operand in MIR, but rather something more like StorageDead? That would make sense, I just kept thinking you were worried about source-level moves.
Let's continue at https://github.com/rust-lang/rust/issues/61849#issuecomment-503250520.