Rust: Box<dyn FnOnce> doesn't respect self alignment

Created on 17 Jan 2020  路  53Comments  路  Source: rust-lang/rust

The following code fails on stable:
When compiled in release mode, the "optimized out" assert is optimized out by LLVM even though it checks the exact same condition as the other assert, as can be verified by testing in debug mode.

Note that it's theoretically possible for the stack to be aligned correctly such that the bug is suppressed, but that's not likely. The 256's can be replaced with larger alignments if that happens.

The bug is caused by the impl of Box<dyn FnOnce> copying self to the stack with the default alignment of 16.

#![allow(dead_code)]
#[repr(align(256))]
struct A {
    v: u8
}

impl A {
    fn f(&self) -> *const A {
        assert_eq!(self as *const A as usize % 256, 0, "optimized out");
        self
    }
}

fn f2(v: u8) -> Box<dyn FnOnce() -> *const A> {
    let a = A { v };
    Box::new(move || {
        a.f()
    })
}

#[test]
fn test() {
    let addr = f2(0)();
    assert_eq!(addr as usize % 256, 0, "addr: {:?}", addr);
}

(Playground)

A-cranelift A-dst C-bug F-unsized_locals I-unsound 馃挜 P-high T-compiler

Most helpful comment

I just checked and, a function call r = f(a, b, c) turns into roughly this (touched up for clarity):

_1 = move a;
_2 = move b;
_3 = move c;
_4 = f(move _1, move _2, move _3);
r = move _4; 

Some of my previous comments assumed the r = move _4 wasn't there and the call would use the destination directly, but given that there is a move, that simplifies my suggestion.


Summary of my proposal

For virtual by-value self calls, we can skip one move when building the MIR, specifically the move out of the dereference of a Box<dyn Trait> for the self argument.
Given that we already move the result of the call, there should be no aliasing caused by this, but if we decide to change anything about how we build MIR calls, we'd need to be extra careful.

We can also make this pattern be gated by a separate feature-gate (rather than unsized_locals), and make sure the standard library only uses that feature-gate and not unsized_locals.

This should solve any alignment issues Box<dyn FnOnce(...) -> _> might have, because we'd not be using dynamic alloca unless the broader unsized_locals feature is used.

All 53 comments

61042 and #48055 seem related

This is a soundness hole.

So... I guess the problem is that the alloca that rustc emits for the unsized local doesn't use align_of_val? That's... kind of bad, because it doesn't look like alloca even supports dynamically setting the alignment: https://llvm.org/docs/LangRef.html#alloca-instruction

If this is correct, the issue should be renamed to something like "unsized locals do not respect alignment".

A way to fix this would be to change the ABI for dynamic trait calls that take self by value to always pass self by address. When the trait object is created is where the trampoline function would be generated, so there is no additional cost for static calls.

This can also be fixed by not allowing dynamically-sized local variables with an alignment more than some maximum value and having all dynamically-sized locals use that maximum alignment unless optimizations can prove that a smaller alignment suffices.

A way to fix this would be to change the ABI for dynamic trait calls that take self by value to always pass self by address. When the trait object is created is where the trampoline function would be generated, so there is no additional cost for static calls.

Messing with the ABI wouldn't help with unsized locals in general (assuming @RalfJung is correct), since e.g. let f: dyn FnOnce() = ...; will also need an alloca with dynamic alignment in general.

This can also be fixed by not allowing dynamically-sized local variables with an alignment more than some maximum value and having all dynamically-sized locals use that maximum alignment unless optimizations can prove that a smaller alignment suffices.

Given only a Box<dyn Trait> value, there is no static way to say anything about its alignment, so how would you distinguish whether calling a self method on it is OK?

Can we call alloca(align + size) and then align the pointer for the local ourselves as a workaround until we come up with a good solution?

In clang there's a function __builtin_alloca_with_align() which might or might not be useful in this case.

@crlf0710

In clang there's a function __builtin_alloca_with_align() which might or might not be useful in this case.

__builtin_alloca_with_align requires a constant alignment. Non constant alignments are rejected by Clang. This matches semantics of alloca LLVM instruction, so it's not helping, as alignment of value in dyn FnOnce is unknown at compile time when moving from Box<dyn FnOnce>.

int main(void) {
    int alignment = 16;
    __builtin_alloca_with_align(16, alignment);
}

I think that overallocating for unsized locals of unknown alignment may make sense. You can alloca(sizeof + alignof - 1) and then make use of <*>::align_offset to compute the offset.

Sure, it increases stack requirements, but it seems acceptable for Box<dyn FnOnce>, and in most cases you end up with extra 7 bytes on stack or less. 256 alignment seems like an edge case where wasting 255 bytes doesn't matter.

if we keep setting 16 as the default alignment we can skip the overallocating and offset dance for types with an alignment below that.

Given only a Box<dyn Trait> value, there is no static way to say anything about its alignment, so how would you distinguish whether calling a self method on it is OK?

My idea was that rustc would prohibit coercing overaligned types to a dyn trait that takes self by value. So:

#[repr(align(1048576))] // 1MiB
struct Overaligned(u8);

#[repr(align(16))]
struct Normal(u8);

trait TakeSelfByValue {
    fn f(self) {
        todo!()
    }
}

impl TakeSelfByValue for Overaligned {}
impl TakeSelfByValue for Normal {}

fn make_overaligned() -> Box<dyn TakeSelfByValue> {
    Box::new(Overaligned(0)) // errors due to too strict alignment
}

fn make_normal() -> Box<dyn TakeSelfByValue> {
    Box::new(Normal(0)) // doesn't error -- alignment is under limit
}

if we keep setting 16 as the default alignment we can skip the overallocating and offset dance for types with an alignment below that.

The alignment is dynamically determined though, not sure if saving 15 bytes of stack space is worth the conditional branch?

Oh right. Yea that's probably not helpful.

Given only a Box<dyn Trait> value, there is no static way to say anything about its alignment, so how would you distinguish whether calling a self method on it is OK?

My idea was that rustc would prohibit coercing overaligned types to a dyn trait that takes self by value. So:

Since Fn : FnMut : FnOnce, this would be backwards incompatible (at latest) once trait object upcasting is finally implemented: dyn Fn and dyn FnMut trait objects would have to have the same alignment restriction (because you can upcast to dyn FnOnce), but today you can freely create such objects with any alignment.

I was under the impression that calling Box<dyn FnOnce()> didn't require dynamic alloca, is that related to missing MIR optimizations, meaning we have unnecessary moves which end up duplicating the data on the stack using a dynamic alloca?

Correctness shouldn't rely on optimizations to fire...

@RalfJung Sure, I was just hoping we weren't relying on "unsized locals", only unsized parameters (i.e. self), for what we're exposing as stable. And I was hoping an optimization wouldn't need to be involved to get there.

I wonder if it would even be possible to restrict "unsized locals" to the subset where we can handle it entirely through ABI, and not actually have dynamic allocas anywhere, since that would make me far more comfortable with Box<dyn FnOnce()>'s implementation or even stabilization.

Ironically, if that works, it's also the subset we could've implemented years earlier than we did, since most of the complexity, unknowns (and apparently bugs) are around dynamic allocas, not calling trait object methods on a dereference of Box.

I wonder if it would even be possible to restrict "unsized locals" to the subset where we can handle it entirely through ABI, and not actually have dynamic allocas anywhere

That should be possible. I think everything that is necessary is performing the deref of the box as argument (call FnOnce::call_once(move *_1, _2)) instead of moving the box contents to a local first (_3 = move _1; call FnOnce::call_once(move _3, _2)) That would also make it possible to call Box<FnOnce> in cg_clif. Cranelift doesn't have alloca support, so the current MIR is untranslatable.

@bjorn3 Hmm, but that's not trivially sound because it could lead to MIR calls where argument and return places overlap, which is UB.

We'd potentially have to special-case it to Box derefs, but even then what's to stop someone from doing foo = (*foo.boxed_closure)();?

Maybe we could introduce a temporary for the return value instead, for these kinds of virtual calls? That should remove any potential overlap while allowing the call.

(Unrelatedly, I just realize we use a copying-out shim for vtable entries that take self by value, even when Self would be passed as *Self at the call ABI level. We would probably have to make computing the call ABI a query for this microopt to not be expensive)

We'd potentially have to special-case it to Box derefs, but even then what's to stop someone from doing foo = (*foo.boxed_closure)();?

That would result in foo.boxed_closure: Box<FnOnce> being moved to a temp first before dropping foo and calling the boxed closure. If it wasn't moved first before the drop of foo, then borrowck would give an error.

(I meant for this change to be done at mir construction level, not as an optimization pass, so borrowck should catch any soundness issues)

If it wasn't moved first before the drop of foo, then borrowck would give an error.

We can try it, I guess, but right now a lot relies on many redundant copies/moves being created by MIR building and may not catch edge cases like this.

But also I don't think moving out of a Box moves the Box itself today.

(I meant for this change to be done at mir construction level, not as an optimization pass, so borrowck should catch any soundness issues)

Me too, but I was going with the "correct by construction" approach of creating a redundant copy/move, just on the return side, instead of the argument side, of the call.

After revisiting on triage, removing nomination.

pre-triage: nominated for discussion in the next weekly meeting.

I just checked and, a function call r = f(a, b, c) turns into roughly this (touched up for clarity):

_1 = move a;
_2 = move b;
_3 = move c;
_4 = f(move _1, move _2, move _3);
r = move _4; 

Some of my previous comments assumed the r = move _4 wasn't there and the call would use the destination directly, but given that there is a move, that simplifies my suggestion.


Summary of my proposal

For virtual by-value self calls, we can skip one move when building the MIR, specifically the move out of the dereference of a Box<dyn Trait> for the self argument.
Given that we already move the result of the call, there should be no aliasing caused by this, but if we decide to change anything about how we build MIR calls, we'd need to be extra careful.

We can also make this pattern be gated by a separate feature-gate (rather than unsized_locals), and make sure the standard library only uses that feature-gate and not unsized_locals.

This should solve any alignment issues Box<dyn FnOnce(...) -> _> might have, because we'd not be using dynamic alloca unless the broader unsized_locals feature is used.

OK, if I understand what's being proposed here, it is to excerpt a subset of the "original unsized-locals plan" -- specifically, to permit locals with unsized types, but only in contexts where we can be guaranteed that we can point to the original storage and don't actually have to move any value (and hence we don't require variable size stack frames).

I'm in favor of this proposal, but I do think we want to be slightly careful to ensure we're not violating soundness. The main concern would be if there is some way to access or re-use the variable that was moved. I attempted to do so with this example (playground) but it fails to compile:

#![feature(unsized_locals)]

trait Foo {
    fn you_move_me(self, g: &dyn FnMut());
}

impl Foo for () {
    fn you_move_me(self, g: &dyn FnMut()) { }
}

fn call_me(mut f: Box<dyn Foo>) {
    f.you_move_me(&|| {
        f = Box::new(());
    })
}

fn main() { }

The error here arises because the closure would need to capture a &mut f to do anything, which prevents a later move. But maybe there is some kind of tricky example that can arise where we re-assign the value during argument evaluation?

#![feature(unsized_locals)]

trait Foo {
    fn you_move_me(self, g: ());
}

impl Foo for () {
    fn you_move_me(self, g: ()) { }
}

fn call_me(mut f: Box<dyn Foo>) {
    f.you_move_me({
        f = Box::new(());
    })
}

fn main() { }

compiles and does _2 = move (*_1); before _1 = move _4;, where _4 is the result of Box::new(()).

I'm in favor of this proposal, but I do think we want to be slightly careful to ensure we're not violating soundness. The main concern would be if there is some way to access or re-use the variable that was moved.

Borrowck should prevent that, right?

borrowck may rely on MIR building creating extraneous copies, so we can't just assume it helps us.

IMO we should limit it to functions taking unsized parameters and function calls passing such parameters or dereferences of Boxes (any other pointer/reference doesn't support moving out of).

Well, I think @bjorn3 is right that borrowck would catch violations of soundness in some strict sense, but I guess what I meant is that we might be breaking backwards compatibility.

The example that @bjorn3 gave here feels pretty relevant. If I understood your proposal, @eddyb, it would no longer compile.

If I recall (thinking back), I had a plan to handle this by

  • save the Box<F> to an alternate location
  • in the actual call, we pass ownership of *x and then do a shallow free of the Box

This still seems like it would work, though I'm not sure how easy/elegant it is to formulate.

In a sense we'd be rewriting x.you_move_me(...) to {tmp = x; tmp.you_move_me(...)}.

Isn't @bjorn3's example dependent on call_me doing a direct virtual call?

My understanding is that you can't do this on stable with Box<dyn FnOnce()> because the virtual call exists in one place, in libstd, and it doesn't do anything weird, so it would work with a restricted ruleset.

EDIT: if it can work on stable, it's because calling impl<A> FnOnce<A> for Box<dyn FnOnce<A>>'s call_once will move the whole Box<dyn FnOnce<A>> before the assignment is evaluated, which is fine, because the shallow drop is actually inside that call_once, not its caller.

@eddyb I agree that this code doesn't compile on stable today, but I'm not sure if that's the point. I guess you're saying it's ok for it to stop compiling?

I think what I would like is that our overall behavior is as consistent as possible between Sized and Unsized parameters.

I think we could actually do a more general change that doesn't special case Sized vs Unsized or anything else. I think I am basically proposing the same thing as what @bjorn3 described earlier, but with a twist:

Today, when building the parameters, we introduce a temporary for each parameter. So if we had foo(*x.y, *z) we might do:

let tmp0: dyn Trait = *x.y;
let tmp1: usize = *z;
f(move tmp0, copy tmp1)

What I am saying is that we could say: If the argument is moved (not copied), and the "place" has outer derefs (e.g., we are passing *P where P is some place), then we assign the place P to a temporary and we make the argument to the function be *tmp:

let tmp0: Box<dyn Trait> = x.y; // this argument is moved, so move the `*` to the call site
let tmp1: usize = *z; // no change here, this is copied
f(move *tmp0, copy tmp1)

I don't believe this is observable to end-users, except potentially for when the destructor on Box runs. I believe that it accepts all the same code we used to accept. It also has nothing specific to Sized or Unsized types at all.

Does that make sense? I'm very curious to know if anyone sees any problems with it.

(One obvious thing: if we ever add DerefMove, its design would have to accommodate this requirement, but I think that's fine, in fact the complications around getting moves right for precisely this case are one of the reasons we've held off on DerefMove thus far.)

Can you not reinitialize *(x.y) today? If you move the whole x.y, it is indeed more conservative (and should let us skip unnecessary moves even in Sized cases).

But also, do you want to move the Sized locals check from typeck to borrowck?
I don't have an intuition for when your approach wouldn't create unsized MIR locals, from the perspective of typeck. However, this is only relevant if we want a stricter feature-gate.

At the very least it seems like it would be easier to implement a change only to MIR building, and it should fix problems with Box<dyn FnOnce()> for the time being, even without a separate feature-gate.

Can you not reinitialize *(x.y) today?

Hmm, that's a good point, I think you can do that. Sigh. I knew that seemed too easy. I remember we used to try and prevent that, but we never did it fully, and in the move to NLL we loosened those rules back up, so this compiles:

fn foo() {
    let mut x = Box::new(vec![22]);
    let y = take(*x);
    *x = vec![44];
}

fn take<T>(x: T) -> T { x }

Still, you could do my proposal for unsized types, though it would introduce an incongruity.

UPDATE: Oh, wait, I remember now. For unsized types, reinitialization is not possible, because you can't do *x = ... with an unsized rvalue (we wouldn't know how many bytes to move). So in fact there isn't really the same incongruity.

@eddyb

At the very least it seems like it would be easier to implement a change only to MIR building, and it should fix problems with Box<dyn FnOnce()> for the time being, even without a separate feature-gate.

To be clear, I wasn't proposing changing anything apart from MIR building either, I don't think.

IIUC, what you proposed is that foo(*x, ...) would be compiled to:

let tmp1 = ..;
..
let tmpN = ..;
foo(*x, tmp1 .. tmpN)

whereas under my proposal (limited to apply to unsized parameters, instead of moved parameters) would yield:

let tmp0 = x;
...
let tmpN = ...;
foo(*tmp0, tmp1..tmpN)

I think I mis-stated the problem earlier. It's not so much that your proposal will introduce compilation errors, it's that I think it changes the semantics of code like this in what I see as a surprising way:

x.foo({ x = Box::new(...); ... })

I believe that this code would compile to the following under your proposal:

// evaluate argument1 into tmp1:
x = Box::new(...);
tmp1 = ...;

// invoke:
foo(*x, tmp1)

which means that foo would be invoked with the new value of x, right?

Under my proposal, the code would still compile, but it would behave in what I see as the expected fashion. Specifically, it would generate:

tmp0 = x;

x = Box::new(...);
tmp1 = ...;

foo(*tmp0, tmp1)

which means that foo would be invoked with the new value of x, right?

Isn't there a way to "lock" that value through the borrow-checker, so further accesses would be disallowed, until the call returns? I'd rather be stricter here.


UPDATE: Oh, wait, I remember now. For unsized types, reinitialization is not possible, because you can't do *x = ... with an unsized rvalue (we wouldn't know how many bytes to move). So in fact there isn't really the same incongruity.

My point was that if your proposal affects sized moves, you're breaking stable reinitialization.
If it doesn't affect sized moves, I don't see how this applies anymore:

I think what I would like is that our overall behavior is as consistent as possible between Sized and Unsized parameters.

UPDATE: Oh, wait, I remember now. For unsized types, reinitialization is not possible, because you can't do *x = ... with an unsized rvalue (we wouldn't know how many bytes to move). So in fact there isn't really the same incongruity.

Let me just chime in here and say that this is a pain in Miri. Unsized locals are pretty different from sized locals, I had to introduce a bunch of special hacks to make them work. MIR locals even needed a new state just for them:

https://github.com/rust-lang/rust/blob/6050e523bae6de61de4e060facc43dc512adaccd/src/librustc_mir/interpret/eval_context.rs#L123-L127

Lazily allocating locals seems odd to me, but at least we can consistently do that for all locals. Elsewhere however we needed some special treatment to make this lazy initialization actually work out and catch mistakes where an unsized local is "overwritten" with a value of different size:

https://github.com/rust-lang/rust/blob/6050e523bae6de61de4e060facc43dc512adaccd/src/librustc_mir/interpret/place.rs#L891-L901

This all feels really unprincipled, and it would be great if at some point we could come up with a coherent semantics for unsized locals in MIR.
The worst part is that the size of a local is implicitly determined on the first write to it. It would be much better to be explicit about that, e.g. at StorageLive time.

@eddyb

Isn't there a way to "lock" that value through the borrow-checker, so further accesses would be disallowed, until the call returns? I'd rather be stricter here.

I don't think we have anything like this today, but presumably we could add such a thing, but I'm not quite sure what you have in mind. When does this prohibition on further accesses take effect?

In particular, the IR for foo(*x, ...) under your proposal, if I understood, would be

... // evaluate remaining arguments
foo(*x, ...) // invoke `foo` with `*x` directly, no temporary

But the access to x that (I believe) we wish to prevent takes place in the "evaluate remaining arguments" part of things. So I think we'd have to add some sort of "pseudo-move" that tells borrowck that x should not be assignable from that point forward, right?

My point was that if your proposal affects sized moves, you're breaking stable reinitialization. If it doesn't affect sized moves, I don't see how this applies anymore: [unsized should be congruent with sized]

Well, it depends on your perspective. It's true that, given the need to permit reinitialization, the check is now specific to unsized values -- i.e., MIR construction will be different depending on whether we know the type to be sized or not. That's unfortunate.

However, I think we still maintain a degree of congruence, from an end-user perspective. In particular, users could not have reinitialized a Box<T: ?Sized> anyhow.

However, under my proposal, users can write things like

x.foo({x = ...; ...})

and the code works as expected (versus either having an unexpected result or getting an error).

In any case, it seems like we're sort of "narrowing in" on a proposal here:

  • when we see a function call with a parameter that is ?Sized, and the value being passed is a deref, we want to modify the IR to not introduce a temporary and instead pass that parameter directly from its location,
  • either we do some sort of "no more assignments" lock,
  • or we move the pointer and introduce a temporary (and then pass in a deref of that temporary)

Does that sound right so far?

If it's easier to do the more flexible approach (by moving the pointer) than the "locking" approach (which is a bit like borrowing the pointer until all arguments are evaluated, then releasing the borrow and doing a move instead), I don't mind it, I was only arguing with the "uniformity" aspect.

However, under my proposal, users can write things like
```rust
x.foo({x = ...; ...})
````
and the code works as expected (versus either having an unexpected result or getting an error).

Something important here: your approach is stabilizable, while mine is meant more for that one impl<A> FnOnce<A> for Box<dyn FnOnce<A>> in the standard library, and not much else.

Yes, I was kind of shooting for something that we could conceivably stabilize.

Removing nomination, this was already discussed in triage meeting and it's under control.

So should we maybe try out the approach I proposed and see how well it works?

@nikomatsakis yes, going to try that out.

For the record, the "new plan" is this.

Change to MIR construction

When generating the arguments to a call, we ordinarily will make a temporary tmp containing the value of each argument:

tmp0 = <arg expr>
...
tmpN = <arg expr>
call(tmp0, .., tmpN)

But we will modify this such that, for arguments that meet the following conditions:

  • The argument is moved (not copied),
  • and its type is not known to be Sized,
  • and the "place" has outer derefs (e.g., we are passing *P where P is some place),

then we assign the place P (instead of *P) to a temporary tmp and we make the argument to the function be *tmp (instead of tmp).

This means that given x(...) where x: Box<dyn FnOnce(...)>, we used to generate

let tmp0 = *x; // tmp0: dyn FnOnce(), requires an alloca
...
FnOnce::call(tmp0, ...)

but now we generate

let tmp0 = x; // tmp0: Box<dyn FnOnce()>, no alloca
...
FnOnce::call(*tmp0, ...)

Rationale

We are now moving the box itself, which is slightly different than the semantics for values known to be sized. In particular, it does not permit "re-initialization" of the box after having moved its contents. But that is not supported for unsized types anyway.

The change does permit modifications to the callee x during argument evaluation and they will work in an analogous way to how they would work with sized values. So given something like:

x.foo({ x = Box::new(...); ... })

we will find that we (a) save the old value of x to a temporary and then (b) overwrite x while evaluating the arguments and then (c) invoke the FnOnce with the old value of x.

FnOnce::call(*tmp0, ...)

This needs MIR datastructure changes, right? It is not currently representable I think as fn arguments are Operand. (Cc https://github.com/rust-lang/rust/issues/71117#issuecomment-613459418 where @eddyb argues that these "operands" should already have a weird semantics in case of move.)

move *tmp0 is a perfectly normal operand, I don't know what you mean.
It's the same kind of Operand as we currently have in @nikomatsakis' let tmp0 = *x; (on the RHS)

Oh sorry, somehow I thought operands could only refer directly to locals, not also dereference them.

I think I am beginning to see the pieces here -- together with @eddyb-style Operand::Move function argument handling, this entirely avoids having to dynamically determine the size of a local. I have to agree that is quite elegant, even though baking "pass-by-reference" into MIR semantics still seems like a heavy hammer to me.

On the other hand this places a severe restriction on now unsized locals may be used, basically taking back large parts of https://github.com/rust-lang/rfcs/pull/1909. Is there some long-term plan to bring that back (probably requires alloca with dynamic alignment in LLVM)? Or should we amend the RFC or at least the tracking issue with the extra restrictions?

I would keep the more general feature but use two different feature-gates (if we can).

And just like with a few other feature-gates (#![feature(const_generics)]) we could warn that #![feature(unsized_locals)] (the general one) is incomplete and could lead to misaligned variables (or something along those lines).

We could also add runtime asserts that the (dynamic) alignment is respected, once we know Box<dyn FnOnce()> could never trigger them (e.g. from stable code).

Is there some long-term plan to bring that back (probably requires alloca with dynamic alignment in LLVM)?

I was wondering about that too. It seems like it'd be good to review the more general feature in any case.

This is partly why I brought up the idea of wanting a guaranteed "no alloca" path -- this was something I remember us discussing long ago, and so in some sense I feel like the work we're doing here isn't just working around LLVM's lack of dynamic alignment support (though it is doing that, too) but also working to complete the feature as originally envisioned.

I added two notes to the tracking issue #48055 so we don't overlook this in the future.

This is probably expected, but the fix failed to actually make unsized_locals sound: https://github.com/rust-lang/rust/issues/71416.

Was this page helpful?
0 / 5 - 0 ratings