In this program, the local variable `_a` is leaked:

```rust
struct Foo;

impl Drop for Foo {
    fn drop(&mut self) { fail!(); }
}

fn main() {
    let _a = box 1;
    let _b = Foo;
}
```
Nominating, I feel uncertain about our story of failing destructors.
This was one of the motivations for the original algebraic effects system (and the RFC I wrote). But that was a much bigger abstraction than merely fixing this scenario. I guess it was somewhat similar to C++'s nothrow exception guarantee.
I'm a little shocked that this problem still exists.
P-high, not 1.0
So:

```rust
fn main() {
    let b = Bar;
    let f = Foo;
}
```

compiles down to:

```llvm
define internal void @_ZN4main20hd8096a747d8d732eIaa4v0.0E() unnamed_addr #0 {
entry-block:
  %b = alloca %"struct.Bar<[]>"
  %f = alloca %"struct.Foo<[]>"
  %0 = getelementptr inbounds %"struct.Bar<[]>"* %b, i32 0, i32 0
  store i1 true, i1* %0
  %1 = getelementptr inbounds %"struct.Foo<[]>"* %f, i32 0, i32 0
  store i1 true, i1* %1
  call void @_ZN3Foo14glue_drop.114317h8e9a26d4f7ff5149E(%"struct.Foo<[]>"* %f)
  call void @_ZN3Bar14glue_drop.114517hb6d883694ad7fcd5E(%"struct.Bar<[]>"* %b)
  ret void
}
```
To me it looks like at least all but the last one should be Invoke instructions with a landing pad.
So the desired behaviour if we fail in a destructor is to return and execute any other destructors and then continue to fail? If that sounds right, this should just be a matter of adding landing pads for the dtor calls as @jakub- suggests, right? I assume that if one of the other destructors fails, then we will go down the double-fail path automatically?
That's what I would expect at least! We don't support more than double-failure at the library level, so we can probably avoid any more nesting after the first landing pad.
It would be memory unsafe to run nested destructors, so there's not much else that can be done.
This is more complex than I thought. If we have n objects with cleanup, we need to introduce n landing pads, because any of the dtor calls might fail. In the landing pad for dtor i we need to run the remaining n-i cleanups, but those dtor calls don't have to be invokes (a failure there is a double fail and will crash). We are basically turning each scope into n scopes - one scope per object with cleanup. On the unwinding path we don't need to do this, since the first fail there is already a double fail - that's the n-i case above with i = 0, I guess.
This seems like a lot of extra code being generated :-(. I wonder if we should treat fail in a dtor like a double fail and just crash.
If we do want to try a bit harder, perhaps an implementation strategy is to have cleanup treat dtor calls (on normal exit of a scope) as belonging to their own scopes. I guess we could do some kind of transformation of the CleanupScope data structure.
One thing I've noticed is that our existing notion of landing pads in the compiler isn't quite what you want when you execute the drop glue. Right now a landing pad will never ever rejoin the original code path (they're completely divergent), but when invoking drop glue for local variables we want both the unwinding and the normal case to fall through to the next instruction. I think that this would inflate the debuginfo metadata, but I don't think that this would incur a runtime cost.
As in, I think that each local variable's destructor should be its own basic block, and then each basic block is terminated with an invoke where both edges go to the next destructor (and the final basic block at the end just has a return). The runtime would continue to hard abort on double failure, of course.
It would be memory unsafe to continue invoking the inner drop glue after failure. It is often assumed that the outer destructors succeeded. A lot of redesign would need to be done to cope with it and it _would_ have a performance hit. Splitting into basic blocks will also have a performance hit because LLVM can't optimize across them well.
@thestinger, this is the second time you have said this, but can you give a concrete example as to where it would be memory unsafe?
When a type does something like setting `vec.cap = 0` to prevent destruction after an operation that could fail. There are numerous examples in the standard library. There's already a lot of memory unsafety from failure under the current implementation, but this would bring more problems. It doesn't seem to be something that's considered in most of the unsafe code.
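For context, the pattern being described is a destructor that deliberately disarms an inner value's destructor once ownership has been handed off elsewhere. A minimal sketch in modern Rust (the `Handoff` type and its fields are hypothetical; `ManuallyDrop` stands in for the old `vec.cap = 0` trick):

```rust
use std::mem::ManuallyDrop;

// Hypothetical wrapper: once `transferred` is set, ownership of the
// buffer has been handed to someone else (e.g. a raw pointer passed
// to C), so the Drop impl must NOT free it again.
struct Handoff {
    data: ManuallyDrop<Vec<u8>>,
    transferred: bool,
}

impl Handoff {
    fn new(data: Vec<u8>) -> Self {
        Handoff { data: ManuallyDrop::new(data), transferred: false }
    }
}

impl Drop for Handoff {
    fn drop(&mut self) {
        if !self.transferred {
            // We still own the buffer: free it exactly once.
            unsafe { ManuallyDrop::drop(&mut self.data) };
        }
        // If `transferred` were true and the inner Vec's destructor ran
        // anyway after a failure, we'd get the double free described above.
    }
}

fn main() {
    let h = Handoff::new(vec![1, 2, 3]);
    drop(h); // frees the buffer exactly once
    println!("ok");
}
```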
I'm not sure we're talking about the same thing - it sounds like @thestinger is talking about nested dtors, which we would continue to abandon, whilst @alexcrichton and I are talking about 'sibling' dtors - e.g., if you have two variables in a scope and the dtor for the first fails, then we continue to execute the dtor for the second. I guess this _could_ be memory unsafe if the second has references to the first, but those can't be owning refs, so I don't think the dtor should rely on them being consistent. Perhaps that is not strong enough, though.
I did have the performance worry, but I also worry about code bloat.
nominating since if we abort on fail in dtors that will be a backwards incompatible change
I discussed this a bit on IRC with @alexcrichton, @pcwalton, and @sfackler. To summarise, there are really two options - either we terminate the process if we fail in a dtor, or we treat each variable as having its own scope for destructors, as outlined above by @alexcrichton and me. To expand on the latter a bit - each dtor call would invoke rather than call; on the normal return path we execute the next dtor, and the landing pad would set a flag and then likewise continue with the next dtor. A double fail would be handled only by the runtime, not in the landing pads. After executing all the cleanups, we would check the flag mentioned before to see whether we need to resume unwinding or do a normal return.
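Purely to illustrate the semantics of that flag scheme (this is not the proposed codegen, which would use invokes and landing pads), the behaviour can be emulated at the library level with today's `catch_unwind`:

```rust
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

// Library-level emulation of the scheme: run every "dtor"; if one
// panics, record the failure (the "flag") and keep going; after all
// cleanups, resume unwinding if anything failed.
fn run_dtors(dtors: Vec<Box<dyn FnOnce()>>) {
    let mut failure = None;
    for dtor in dtors {
        if let Err(payload) = catch_unwind(AssertUnwindSafe(dtor)) {
            // "Landing pad": stash the first failure, continue with the
            // next dtor. Real double-fail handling lives in the runtime.
            failure.get_or_insert(payload);
        }
    }
    if let Some(payload) = failure {
        resume_unwind(payload); // the "check the flag and resume" step
    }
}

fn main() {
    use std::sync::{Arc, Mutex};
    let log = Arc::new(Mutex::new(Vec::new()));
    let dtors: Vec<Box<dyn FnOnce()>> = vec![
        Box::new({ let l = log.clone(); move || l.lock().unwrap().push(1) }),
        Box::new({ let l = log.clone(); move || { l.lock().unwrap().push(2); panic!("dtor 2"); } }),
        Box::new({ let l = log.clone(); move || l.lock().unwrap().push(3) }),
    ];
    // The panic from dtor 2 is re-raised only after dtor 3 has run.
    assert!(catch_unwind(AssertUnwindSafe(|| run_dtors(dtors))).is_err());
    assert_eq!(*log.lock().unwrap(), vec![1, 2, 3]);
    println!("ok");
}
```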
The cost of the properly failing dtors would be a bunch more basic blocks (which may or may not be a cost, depending on LLVM optimisation), a branch at the end of any function with cleanups, a stack slot for a flag in these functions, and updating that flag after a failed dtor call.
The terminate-on-fail method has no runtime cost and is much simpler to implement; however, failing in a dtor is common enough that it might cause problems.
I don't believe that aborting on failure in destructors is P-backcompat-lang since it causes havoc at the moment anyway, so we'd be within our rights to change it post 1.0.
It compiles and runs fine, so it's a backwards compatibility issue. Code that leaks resources still works fine in production (just look at Firefox and Chromium) and crashing instead would be a break of the contract with user code.
leaving at P-high, not 1.0.
We are choosing to leave open the option of adopting "abort on fail! from within a dtor", because code that is failing from within a dtor is breaking a linguistic contract. However, we are also unlikely to actually implement the "abort on fail! from within a dtor" option. We are just leaving that door open.
I think that we (Servo) are OK with leaking the resources for now, though in conversations with @jdm and others some of the interactions with potentially leaking pointers to SpiderMonkey, native graphics resources, channels, etc. are not obviously either safe or deadlock-free. If possible, it would be nice to leave the door open to having "abort on fail! from a dtor," even if it's just as a non-default compiler flag that Servo might have to compile its codebase with in the future.
Another issue with abort on fail in a dtor is that it is not free to implement - we would need some kind of flag somewhere to indicate that we're in a dtor and thus should abort.
We discussed this at triage/overflow from the weekly meeting today. The conclusion is that we are not sure what we want to do, but aborting on panic in a dtor seems pretty undesirable. It seems that the best fix is to carry on calling dtors after a failure in one. Perhaps things will change if we change any behaviour around dtors and unwinding. But we're not going to do any of this soon, so we will live with the memory-leak bug for 1.0.
Calling the inner destructors after failing would be a _very_ backwards incompatible change. At the moment, a common pattern in destructors is to manually prevent the inner destructors from running. However, there's an assumption that a panic (such as from a contained element) will not result in the inner destructors running.
For long-running server-side apps, dying on a panic in a dtor would be safer. If a dtor survives a panic with only part of its code executed, or the chain of destructors is interrupted, then not only are resource leaks possible, but the whole program will be left in a fragile state. So I hope you'll decide to kill destructors which can't leave this world kindly.
Can be trivially "fixed" in MIR, so adding the A-mir label.
> It compiles and runs fine, so it's a backwards compatibility issue. Code that leaks resources still works fine in production (just look at Firefox and Chromium) and crashing instead would be a break of the contract with user code.
I think people are aware of this, but: It depends on what the resource is, right? If the resource is a mutex or equivalent then deadlocks or worse could definitely occur.
Also, keep in mind that some code in a destructor (Drop implementation) might assume that, if it is reached, some previously-dropped item's destructor ran successfully - e.g. nulling a pointer. Thus, running a destructor after another destructor failed could result in all kinds of badness, including potentially even a use-after-free.
With MIR, a panic inside your destructor will no longer prevent other destructors in the scope from running. Since MIR is the default and the AST-based translator has been removed, this is technically fixed now.
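A quick way to check this on a current toolchain (the type names here are just for the demo):

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

static SIBLING_DROPPED: AtomicBool = AtomicBool::new(false);

struct Panics;
impl Drop for Panics {
    fn drop(&mut self) {
        panic!("panic in dtor");
    }
}

struct Sibling;
impl Drop for Sibling {
    fn drop(&mut self) {
        SIBLING_DROPPED.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let result = catch_unwind(AssertUnwindSafe(|| {
        let _sibling = Sibling; // declared first, so dropped second
        let _panics = Panics;   // dropped first; its dtor panics
    }));
    assert!(result.is_err());
    // With MIR-based drop elaboration, the sibling's dtor still ran:
    assert!(SIBLING_DROPPED.load(Ordering::SeqCst));
    println!("sibling dropped: {}", SIBLING_DROPPED.load(Ordering::SeqCst));
}
```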
Just want to add, for the record, that allowing failing destructors to leak other destructors can and _will_ cause nasty use-after-free bugs for scoped threads, since they rely on finalization code to join the child threads before the borrowed stack frame expires.
@NXTangl it was agreed quite a while ago that relying on destructors running for safety is not sound. There's a reason mem::forget() is safe, and safe mem::forget() is in turn a reason not to rely on destructors for safety.
This is also why destructor-based scoped threads were removed from std: https://github.com/rust-lang/rust/issues/24292. Crossbeam now uses closures to make sure threads are joined after the closure returns.
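For reference, this closure-based shape later landed in the standard library as well, as `std::thread::scope` (stabilized long after this discussion); a minimal example:

```rust
// std::thread::scope joins all spawned threads when the closure
// returns (or unwinds), so the join does not depend on a guard's
// destructor being run.
fn main() {
    let data = vec![1, 2, 3];
    let sum: i32 = std::thread::scope(|s| {
        let handle = s.spawn(|| data.iter().sum());
        handle.join().unwrap()
    });
    assert_eq!(sum, 6);
    println!("sum = {sum}");
}
```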
Yeah, I know that...but, AFAIK, we rely on being able to use the destructors of stack objects for finalization, no? What happens if the closure throws an exception? Or am I completely misunderstanding how it works?
Destructors are run when unwinding from a panic.
Which is my point. This says that current behavior means that panicking in a destructor during a panic can cause those destructors to be leaked, thereby causing UB in the child threads.
Leaking isn't undefined behaviour; it may yield unexpected behaviour, which is (I believe) what you mean.
If Drop::drop panics, one has to make sure the value is leaked. Otherwise some fields might be double-dropped when the destructor is invoked a second time.
Is this behavior documented anywhere? I've only found this issue - no RFC, no internals/mailing-list post, nothing. I've filed #60611 to track documenting this, so if someone has any sources about the current drop behavior it would be nice to post them there.