In this program, the local variable `_a` is leaked:

```rust
struct Foo;

impl Drop for Foo {
    fn drop(&mut self) { fail!(); }
}

fn main() {
    let _a = box 1;
    let _b = Foo;
}
```
Nominating, I feel uncertain about our story of failing destructors.
This was one of the motivations for the original algebraic effects system (and the RFC I wrote). But that was a much bigger abstraction than merely fixing this scenario. I guess it was somewhat similar to C++'s nothrow exception guarantee.
I'm a little shocked that this problem still exists.
P-high, not 1.0
So:

```rust
fn main() {
    let b = Bar;
    let f = Foo;
}
```

compiles down to:

```llvm
define internal void @_ZN4main20hd8096a747d8d732eIaa4v0.0E() unnamed_addr #0 {
entry-block:
  %b = alloca %"struct.Bar<[]>"
  %f = alloca %"struct.Foo<[]>"
  %0 = getelementptr inbounds %"struct.Bar<[]>"* %b, i32 0, i32 0
  store i1 true, i1* %0
  %1 = getelementptr inbounds %"struct.Foo<[]>"* %f, i32 0, i32 0
  store i1 true, i1* %1
  call void @_ZN3Foo14glue_drop.114317h8e9a26d4f7ff5149E(%"struct.Foo<[]>"* %f)
  call void @_ZN3Bar14glue_drop.114517hb6d883694ad7fcd5E(%"struct.Bar<[]>"* %b)
  ret void
}
```
To me it looks like at least all but the last one should be Invoke instructions with a landing pad.
So the desired behaviour if we fail in a destructor is to return and execute any other destructors and then continue to fail? If that sounds right, this should just be a matter of adding landing pads for the dtor calls as @jakub- suggests, right? I assume that if one of the other destructors fails, then we will go down the double-fail path automatically?
That's what I would expect at least! We don't support more than double-failure at the library level, so we can probably avoid any more nesting after the first landing pad.
It would be memory unsafe to run nested destructors, so there's not much else that can be done.
This is more complex than I thought. If we have n objects with cleanup, we need to introduce n landing pads, because any of the dtor calls might fail. In the landing pad for dtor i we need to run the remaining n-i cleanups, but those dtor calls don't have to be invokes (a failure there is a double fail and will crash). We are basically turning each scope into n scopes - one scope per object with cleanup. On the unwinding path we don't need to do this, since the first fail there is already a double fail - that's the n-i case above with i = 0, I guess.
This seems like a lot of extra code being generated :-(. I wonder if we should treat fail in a dtor like a double fail and just crash.
If we do want to try a bit harder, perhaps an implementation strategy is to have cleanup treat dtor calls (on normal exit of a scope) as belonging to their own scopes. I guess we could do some kind of transformation of the CleanupScope data structure.
One thing I've noticed is that our existing notion of landing pads in the compiler isn't quite what you want when you execute the drop glue. Right now a landing pad will never ever rejoin the original code path (they're completely divergent), but when invoking drop glue for local variables we want both the unwinding and the normal case to fall through to the next instruction. I think that this would inflate the debuginfo metadata, but I don't think that this would incur a runtime cost.
As in, I think that each local variable's destructor should be its own basic block, and then each basic block is terminated with an invoke where both edges go to the next destructor (and the final basic block at the end just has a return). The runtime would continue to hard abort on double failure, of course.
It would be memory unsafe to continue invoking the inner drop glue after failure. It is often assumed that the outer destructors succeeded. A lot of redesign would need to be done to cope with it and it _would_ have a performance hit. Splitting into basic blocks will also have a performance hit because LLVM can't optimize across them well.
@thestinger, this is the second time you have said this, but can you give a concrete example as to where it would be memory unsafe?
When a type does something like setting `vec.cap = 0` to prevent destruction after an operation that could fail. There are numerous examples in the standard library. There's already a lot of memory unsafety from failure under the current implementation, but this would bring more problems. It doesn't seem to be something that's considered in most of the unsafe code.
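For context, the pattern being described is a destructor that deliberately disarms an inner value's destructor once ownership has been handed off elsewhere. A minimal sketch in modern Rust (the `Handoff` type and its fields are hypothetical; `ManuallyDrop` stands in for the old `vec.cap = 0` trick):

```rust
use std::mem::ManuallyDrop;

// Hypothetical wrapper: once `transferred` is set, ownership of the
// buffer has been handed to someone else (e.g. a raw pointer passed
// to C), so the Drop impl must NOT free it again.
struct Handoff {
    data: ManuallyDrop<Vec<u8>>,
    transferred: bool,
}

impl Handoff {
    fn new(data: Vec<u8>) -> Self {
        Handoff { data: ManuallyDrop::new(data), transferred: false }
    }
}

impl Drop for Handoff {
    fn drop(&mut self) {
        if !self.transferred {
            // We still own the buffer: free it exactly once.
            unsafe { ManuallyDrop::drop(&mut self.data) };
        }
        // If `transferred` were true and the inner Vec's destructor ran
        // anyway after a failure, we'd get the double free described above.
    }
}

fn main() {
    let h = Handoff::new(vec![1, 2, 3]);
    drop(h); // frees the buffer exactly once
    println!("ok");
}
```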
I'm not sure we're talking about the same thing - it sounds like @thestinger is talking about nested dtors, which we would continue to abandon, whilst @alexcrichton and I are talking about 'sibling' dtors - e.g., if you have two variables in a scope and the dtor for the first fails, then we continue to execute the dtor for the second. I guess this _could_ be memory unsafe if the second has references to the first, but those can't be owning refs, so I don't think the dtor should rely on them being consistent. Perhaps that is not strong enough, though.
I did have the performance worry, but I also worry about code bloat.
nominating since if we abort on fail in dtors that will be a backwards incompatible change
I discussed this a bit on IRC with @alexcrichton, @pcwalton, and @sfackler. To summarise, there are really two options - either we terminate the process if we fail in a dtor, or we treat each variable as having its own scope for destructors, as outlined above by @alexcrichton and me. To expand on the latter a bit - each dtor call would invoke rather than call; on the normal return path we execute the next dtor, and the landing pad would set a flag and then likewise continue with the next dtor. A double fail would be handled only by the runtime, not in the landing pads. After executing all the cleanups, we would check the flag mentioned before to see whether we need to resume unwinding or do a normal return.
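Purely to illustrate the semantics of that flag scheme (this is not the proposed codegen, which would use invokes and landing pads), the behaviour can be emulated at the library level with today's `catch_unwind`:

```rust
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

// Library-level emulation of the scheme: run every "dtor"; if one
// panics, record the failure (the "flag") and keep going; after all
// cleanups, resume unwinding if anything failed.
fn run_dtors(dtors: Vec<Box<dyn FnOnce()>>) {
    let mut failure = None;
    for dtor in dtors {
        if let Err(payload) = catch_unwind(AssertUnwindSafe(dtor)) {
            // "Landing pad": stash the first failure, continue with the
            // next dtor. Real double-fail handling lives in the runtime.
            failure.get_or_insert(payload);
        }
    }
    if let Some(payload) = failure {
        resume_unwind(payload); // the "check the flag and resume" step
    }
}

fn main() {
    use std::sync::{Arc, Mutex};
    let log = Arc::new(Mutex::new(Vec::new()));
    let dtors: Vec<Box<dyn FnOnce()>> = vec![
        Box::new({ let l = log.clone(); move || l.lock().unwrap().push(1) }),
        Box::new({ let l = log.clone(); move || { l.lock().unwrap().push(2); panic!("dtor 2"); } }),
        Box::new({ let l = log.clone(); move || l.lock().unwrap().push(3) }),
    ];
    // The panic from dtor 2 is re-raised only after dtor 3 has run.
    assert!(catch_unwind(AssertUnwindSafe(|| run_dtors(dtors))).is_err());
    assert_eq!(*log.lock().unwrap(), vec![1, 2, 3]);
    println!("ok");
}
```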
The cost of the properly failing dtors would be a bunch more basic blocks (which may or may not be a cost, depending on LLVM optimisation), a branch at the end of any function with cleanups, a stack slot for a flag in these functions, and updating that flag after a failed dtor call.
The terminate-on-fail method has no runtime cost and is much simpler to implement; however, failing in a dtor is common enough that it might cause problems.
I don't believe that aborting on failure in destructors is P-backcompat-lang since it causes havoc at the moment anyway, so we'd be within our rights to change it post 1.0.
It compiles and runs fine, so it's a backwards compatibility issue. Code that leaks resources still works fine in production (just look at Firefox and Chromium) and crashing instead would be a break of the contract with user code.
leaving at P-high, not 1.0.
We are choosing to leave open the option of adopting "abort on fail! from within a dtor", because code that is failing from within a dtor is breaking a linguistic contract. However, we are also unlikely to actually implement the "abort on fail! from within a dtor" option. We are just leaving that door open.
I think that we (Servo) are OK with leaking the resources for now, though in conversations with @jdm and others some of the interactions with potentially leaking pointers to SpiderMonkey, native graphics resources, channels, etc. are not obviously either safe or deadlock-free. If possible, it would be nice to leave the door open to having "abort on fail! from a dtor," even if it's just as a non-default compiler flag that Servo might have to compile its codebase with in the future.
Another issue with abort on fail in a dtor is that it is not free to implement - we would need some kind of flag somewhere to indicate that we're in a dtor and thus should abort.
We discussed this at triage/overflow from the weekly meeting today. The conclusion is that we are not sure what we want to do, but aborting on panic in a dtor seems pretty undesirable. It seems that the best fix is to carry on calling dtors after a failure in one. Perhaps things will change if we change any behaviour around dtors and unwinding. But we're not going to do any of this soon, so we will live with the memory-leak bug for 1.0.
Calling the inner destructors after failing would be a _very_ backwards incompatible change. At the moment, a common pattern in destructors is to manually prevent the inner destructors from running. However, there's an assumption that a panic (such as from a contained element) will not result in the inner destructors running.
For long-running server-side apps, dying on a panic in a dtor would be safer. If a dtor survives a panic with only part of its code executed, or the chain of destructors is interrupted, then not only are resource leaks possible, but the whole program will be left in a fragile state. So I hope you'll decide to kill destructors which can't leave this world kindly.
Can be trivially "fixed" in MIR, so adding the A-mir label.
> It compiles and runs fine, so it's a backwards compatibility issue. Code that leaks resources still works fine in production (just look at Firefox and Chromium) and crashing instead would be a break of the contract with user code.
I think people are aware of this, but: It depends on what the resource is, right? If the resource is a mutex or equivalent then deadlocks or worse could definitely occur.
Also, keep in mind that some code in a destructor (Drop implementation) might assume that, if it is reached, some previously-dropped item's destructor ran successfully - e.g. nulling a pointer. Thus, running a destructor after another destructor failed could result in all kinds of badness, including potentially even a use-after-free.
With MIR, a panic inside your destructor will no longer prevent other destructors in the scope from running. Since MIR is the default and the AST-based translator has been removed, this is technically fixed now.
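A quick way to check this on a current toolchain (the type names here are just for the demo):

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

static SIBLING_DROPPED: AtomicBool = AtomicBool::new(false);

struct Panics;
impl Drop for Panics {
    fn drop(&mut self) {
        panic!("panic in dtor");
    }
}

struct Sibling;
impl Drop for Sibling {
    fn drop(&mut self) {
        SIBLING_DROPPED.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let result = catch_unwind(AssertUnwindSafe(|| {
        let _sibling = Sibling; // declared first, so dropped second
        let _panics = Panics;   // dropped first; its dtor panics
    }));
    assert!(result.is_err());
    // With MIR-based drop elaboration, the sibling's dtor still ran:
    assert!(SIBLING_DROPPED.load(Ordering::SeqCst));
    println!("sibling dropped: {}", SIBLING_DROPPED.load(Ordering::SeqCst));
}
```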
Just want to add, for the record, that allowing failing destructors to leak other destructors can and _will_ cause nasty use-after-free bugs for scoped threads, since they rely on finalization code to join the child threads before the borrowed stack frame expires.
@NXTangl it was agreed quite a while ago that relying on destructors running for safety is not sound. There's a reason mem::forget() is safe, and safe mem::forget() is in turn a reason not to rely on destructors for safety.
This is also why destructor-based scoped threads were removed from std: https://github.com/rust-lang/rust/issues/24292. Crossbeam now uses closures to make sure threads are joined after the closure returns.
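For reference, this closure-based shape later landed in the standard library as well, as `std::thread::scope` (stabilized long after this discussion); a minimal example:

```rust
// std::thread::scope joins all spawned threads when the closure
// returns (or unwinds), so the join does not depend on a guard's
// destructor being run.
fn main() {
    let data = vec![1, 2, 3];
    let sum: i32 = std::thread::scope(|s| {
        let handle = s.spawn(|| data.iter().sum());
        handle.join().unwrap()
    });
    assert_eq!(sum, 6);
    println!("sum = {sum}");
}
```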
Yeah, I know that...but, AFAIK, we rely on being able to use the destructors of stack objects for finalization, no? What happens if the closure throws an exception? Or am I completely misunderstanding how it works?
Destructors are run when unwinding from a panic.
Which is my point. This says that current behavior means that panicking in a destructor during a panic can cause those destructors to be leaked, thereby causing UB in the child threads.
Leaking isn't undefined behaviour; it may yield unexpected behaviour, which is (I believe) what you mean.
If Drop::drop panics, one has to make sure the value is leaked. Otherwise some fields might be double-dropped when the destructor is invoked a second time.
Is this behavior documented anywhere? I've only found this issue - no RFC, no internals/mailing-list post, nothing. I've filed #60611 to track documenting this, so if someone has any sources about the current drop behavior it would be nice to post them there.