Rust: StorageLive (and even StorageDead) may be unnecessary in MIR.

Created on 29 Jan 2020 · 11Comments · Source: rust-lang/rust

A while back I was discussing Storage{Live,Dead} and dominators, with @tmandry (in the context of generator layout optimizations), and came to the conclusion that StorageLive pretty much has to dominate all uses (I doubt we ever added a check that it does so, though).

More recently, I was trying to figure out what the simplest "StorageLive sinking" (i.e. moving the statement "later" in the CFG) optimization we could do was.

The conclusion I came to was that we might not need StorageLive at all, because there might be a deterministic "best placement" we could compute (assuming we need exactly one llvm.lifetime.start per alloca).

That best placement would be the least (common) dominator of all mentions of a MIR local.
Even indirect accesses require a direct borrow beforehand, so this should cover everything.

(Assuming that, given CFG points x, y, z, "x is a common dominator of y and z" means "x dominates both y and z", i.e. "to reach either y or z you must go through x first", and the "least" such x is the one not dominating other common dominators of y and z, i.e. it's "the closest to y and z")

This could be:

just before the single assignment of that local
- let x = x + y;
at the end of a block branching into paths which all assign that local
- let x = if c { a } else { b };
- let x; if c { x = a; } else { x = b; } (roughly equivalent)

I am not sure about interactions with loops, though.

But this doesn't have to remain theoretical, we could compute this "ideal StorageLive position" and then compare it with the existing one (presumably one would dominate the other? not sure this would catch any loop issues though).

StorageDead could also be similar ("least (common) post-dominator"?).

However, it also has the effect of invalidating borrows, so we would need to keep an InvalidateBorrows(x) statement around, and consider it one of the mentions of x.

Then "Storage{Live,Dead} range shrinking" would simply boil down to hoisting InvalidateBorrows(x) up past statements which couldn't indirectly access x.

cc @nikomatsakis @ecstatic-morse @rust-lang/wg-mir-opt

A-mir C-cleanup T-compiler

Source

eddyb

👍4

All 11 comments

cc @rust-lang/wg-mir-opt (yay we have a github team now)

wrt borrow invalidation, would this be a statement emitted by borrowck?

oli-obk on 29 Jan 2020

wrt borrow invalidation, would this be a statement emitted by borrowck?

No, lifetime-based reasoning is an easy way to break unsafe code.

It would be emitted where StorageDead is emitted today (based on scope), and could only be hoisted (by some optimization) up past statements we can prove don't interact with the local (either trivially if no indirection, or by alias analysis otherwise).

eddyb on 29 Jan 2020

👍1

Regarding loops, I'm thinking it might be hard to tell apart variables declared outside the loop vs inside the loop body, if they are only accessed inside.

I think the problem is that domination might be flawed for reasoning about the entire lifecycle of a local when it might be live across a backedge.

I wonder if we can rely on the requirement of initialization, i.e. no part of a local can be accessed inside the body without being initialized before the first iteration (making cycles in the MIR CFG more structural than it would be for e.g. C).

So a local only accessed inside a loop body shouldn't actually be able to live across the backedge without relying on references, and in that case the placement of InvalidateBorrows (or StorageDead, today) can be relied upon.

We would just need to be careful to not consider CFG points inside a loop body to be able to dominate something after the loop, e.g. in:

loop {
    x = 0;
    if cond { break; }
}
f(x);

the block containing the statement x = 0; shouldn't dominate (for our purposes here) the one containing f(x);, as the former can "execute more times" than the former.

I think we can still use a dominator tree, we would just construct it differently around loops.
(What I want to avoid is some sort of dataflow pass)

I wonder if there is a formal notion of domination that takes cycles into account in the same way.

cc @sunfishcode (I remember there is a similar distinction for SSA, where sometimes you don't want values created inside a loop body to be used directly outside of the loop, but rather you want to pass them through BB args or similar)

eddyb on 3 Feb 2020

StorageDead could also be similar ("least (common) post-dominator"?).

This sounds suspicious to me. At minimum, you would I think want to take loop carried dependencies into account, no?

i.e.,

x = 0;
loop {
    x += 1;
    if x > 100 { break; }
}
println(...);

here the postdominator of all uses of x is going to be the use in x>100, but since x is still live around the loop, the "storage dead" surely has to go outside of loop.

The other concern of course is the usual one of not knowing when there might be unsafe code trying to access that memory, which means we can't get an accurate set of all accesses. (Still, I'm only mildly sympathetic on this point, it seems like we want to be able to assume that unsafe code doesn't go too crazy here repopulating values that are moved etc, since that seems unusual.)

nikomatsakis on 3 Feb 2020

At minimum, you would I think want to take loop carried dependencies into account, no?

Loops are my main concern as well, see https://github.com/rust-lang/rust/issues/68622#issuecomment-581270676 above.

The other concern of course is the usual one of not knowing when there might be unsafe code trying to access that memory, which means we can't get an accurate set of all accesses.

Even if we remove StorageDead(x), we'd have an InvalidateBorrows(x) in the same spot, so we would not emit the llvm.lifetime.end(x) call before the end of the lexical scope today.

I would also not hoist such a InvalidateBorrows(x) statement up past any statement directly mentioning x nor any statement which could perform indirect accesses.

(I have no intent of breaking any unsafe code which works today, only to make the placement a deterministic computation)

eddyb on 4 Feb 2020

@eddyb What is the deference between StorageDead and InvalidateBorrows? (It is also worth noting that NLL borrowck relies on StorageDead to emit errors for code that returns borrows to things owned by current stack frame.)

nikomatsakis on 10 Feb 2020

@nikomatsakis Storage{Live,Dead} form a "CFG range" (to the extent that ranges of graphs make sense), and using statements for them is a hack IMO.

OTOH, InvalidateBorrows can just be a plain statement, meaning there can be multiple of them for every local, and the StorageDead location would take all of them into account.

Although there's a subtle implication there that having a borrow that's not guaranteed to reach a InvalidateBorrows before a function returns, is UB in some sense.
(Or maybe what's UB is using the borrow in parts of the CFG reachable from InvalidateBorrows, even if the InvalidateBorrows didn't "execute". Which kind of still makes it a non-statement)
And in compile-time MIR, it would mean the local is leaked into 'static.

eddyb on 10 Feb 2020

I guess from the perspective of the borrow checker, we treat StorageDead as if it were a kind of "deallocate" statement. We don't really care about the "CFG range", which is more LLVM's problem. In other words, I think the borrow checker is compatible with your view point, though as you point out there is some sort of implication (independent from borrow checking, but required if you want to allocate overlapping stack ranges and the like) that the uses of a borrow are post-dominated by the set of StorageDead blocks or whatever.

nikomatsakis on 14 Feb 2020

👍1

Related to #61849

tmandry on 22 Mar 2020

So... as an minimal first step, we could remove just StorageLive and rename StorageDead to InvalidateBorrows.cc @simonvandel this is almost the same analysis as the one you wrote in #78928.

oli-obk on 14 Nov 2020

InvalidateBorrows sounds rather high-level, when this also invalidates raw pointers and really has little to do with the borrow checker. So I'd vote for something more like ReallocateStorage or so -- something that describes what this actually does.