Rust: Default behavior of unwinding in FFI functions

Created on 28 Feb 2019  ·  133Comments  ·  Source: rust-lang/rust

This is the tracking issue for the behavior of unwinding through FFI functions.

There are two choices here: we can abort if unwinding occurs through an extern "C" boundary. We abort on beta 1.34 and nightly 1.35, but will permit unwinding in stable 1.33.

We previously attempted this change in 1.24 and reverted in 1.24.1. We attempted to do so again in 1.33, but reverted once again pending lang team discussion on the topic.

There has been discussion on this topic in #52652, #58760, and #55982.

The stable behavior of permitting unwinding is UB, and can be triggered in safe code (https://github.com/rust-lang/rust/issues/52652#issuecomment-408123653). Notably, mozjpeg depends on this behavior and seems to have no good stable alternatives; there's been some discussion on internals.

There is an RFC discussing this: https://github.com/rust-lang/rfcs/pull/2699.

C-tracking-issue T-lang

Most helpful comment

I wanted to apologize. I was called out by using the unconstructive word malice here. In that sentence, by us, I meant rustc, the tool, as in, it would be "evil" for rustc to continue emitting nounwind if we aren't going to land a fix for this soon.

@joshtriplett I agree with you about just trying to remove nounwind. There is precedence about doing this, e.g., when LLVM was miscompiling noalias, we stopped emitting it, for a long time.

In retrospect, we have had a mis-optimization that impacts all Rust users and a PR to fix it for over a year. During this whole time, we always had the chance to just stop emitting nounwind and protect all Rust users. Instead, we didn't do that. First, because the fix was landing "real soon", so why waste time on a workaround. Second, because we didn't wanted to land something that would justify to continue to exploit the undefined behavior. For some reason, for some of us, banning the exploit of UB here was more important than protecting all Rust users. Both were mistakes, and I hope that we can learn from them and, in the future, try to land the quickest fix possible that protects all users first, and only start worrying about proper solutions once that is done. You never know when a simple fix is going to take 1 day or 1 year to land.

All 133 comments

It would be helpful for the discussion if someone knowledgeable could write a summary covering the following:

  • What are the issues with allowing unwinding through foreign languages on major platforms? What is the interaction with C++ exceptions? (Saying it's "UB" doesn't cut it)
  • What is the interaction between setjmp/longjmp and unwinding (in general, and on major platforms)?
  • Are there any concrete plans for writing a custom unwinder/"tweaking the unwinder"? Has anyone tried this/is anyone working on this? (The answer to this question may be used to determine how Rust restricts itself by locking into the standard unwinder on major platforms.)

When I say major platforms, I mean GNU libunwind, Windows SEH, possibly others.


Is it possible to have different default behavior depending on which unwinder you're using? Say, unwind normally on major platforms, abort on others?

  • What are the issues with allowing unwinding through foreign languages on major platforms? What is the interaction with C++ exceptions? (Saying it's "UB" doesn't cut it)

In another thread, @alexcrichton wrote:

From a technical perspective this is pretty feasible, but from a stabilization perspective is historically something we've never wanted to provide. We want the technical freedom to tweak unwinding as we see fit, which means it's not guaranteed to match what C++ does across every single platform.

As for the current implementation, my understanding is: on Unix it works; on Windows it mostly works, with some issues that could be solved. See my comment in that thread for more details.

  • What is the interaction between setjmp/longjmp and unwinding (in general, and on major platforms)?

On Unix, longjmp just resets the stack pointer and ignores unwinding.

On Windows, longjmp triggers SEH unwinding and so will run Rust destructors, AFAIK. *

* (I said in other threads that it didn't, because I misread the description of this PR and thought that it changed things so destructors wouldn't run when unwinding via longjmp; in reality, it only did that to the abort-on-unwind handler itself.)

On Windows, longjmp triggers SEH unwinding and so will run Rust destructors

As far as the last part of that is concerned, that is an implementation-specific and unspecified behaviour.

The current behavior on stable amounts to a soundness hole. For example, based on https://github.com/rust-lang/rust/issues/52652#issuecomment-408123653, we can write (playground):

extern "C" fn bad() {
    panic!()
}

fn main() {
    bad()
}

The behavior of this program is undefined on stable because we attach the nounwind LLVM attribute to bad.

Soundness is non-negotiable and as such we landed https://github.com/rust-lang/rust/pull/55982 to close this soundness hole. However, since there was no explicit confirmation of this step by the language team the change was reverted on 1.33 pending confirmation. The change is still seen in beta and nightly compilers.

Based on notes by @alexcrichton in https://github.com/rust-lang/rust/issues/52652#issuecomment-439072829, https://github.com/rust-lang/rust/pull/55982#issue-231227251, https://github.com/rust-lang/rust/pull/55982#issuecomment-445368032, and https://github.com/rust-lang/rust/pull/55982#issuecomment-447036128, I propose that we go ahead with and confirm the change in https://github.com/rust-lang/rust/pull/55982.

@rfcbot merge

Team member @Centril has proposed to merge this. The next step is review by the rest of the tagged team members:

  • [x] @Centril
  • [ ] @aturon
  • [x] @cramertj
  • [x] @eddyb
  • [ ] @joshtriplett
  • [x] @nikomatsakis
  • [x] @pnkfelix
  • [x] @scottmcm
  • [ ] @withoutboats

Concerns:

  • blocked-on-rfc-2699 resolved by https://github.com/rust-lang/rust/issues/58794#issuecomment-529620280
  • blocked-on-rfc-2753 (https://github.com/rust-lang/rust/issues/58794#issuecomment-529620280)
  • need-documented-replacement (https://github.com/rust-lang/rust/issues/58794#issuecomment-472436182)

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

This change was tried twice, and twice it was reverted because important parts of the ecosystem broke. Do we really want to try merge it again without any changes?

The discussion on this topic is pretty fragmented, the internals thread mentioned in the top post has been quite active as well. I'm still waiting for the summary I requested https://github.com/rust-lang/rust/issues/58794#issuecomment-468105235. I thought the scope of this issue is much bigger than what @Centril just mentioned. If you only care about the soundness issue at the IR level, another way to fix it is to never emit the nounwind attribute.

This change was tried twice, and twice it was reverted because important parts of the ecosystem broke.

This is untrue. The second time it was reverted it was reverted __only__ because of the lack of a completed T-Lang FCP (which we are doing now).

The second time it was reverted it was reverted only because of the lack of a completed T-Lang FCP

I don't think so. If no one in the community would've complained about the change, I don't think this would've been reverted a day before the stable release even though no lang team discussion had happened yet. That might've been used as justification to actually do the revert, but it's certainly not the only reason.

@jethrogb It most definitely was the only reason; the release team cannot undo language team decisions and had there been one we would not have reverted.

@Centril I'm saying that if there hadn't been any backlash no one would've even proposed to undo the change.

@jethrogb Yes, there was backlash, but that was irrelevant to the acceptance or non-acceptance of the undo-PR itself. The sole reason for accepting the undo-PR was the lack of a completed T-Lang FCP.

I don't have enough information to argue about procedural details and people's rationale for r+'ing this or that PR, and I wouldn't be very interested in doing so anyway. I just want to say that in the light of the the ongoing discussions and continued lack of consensus on how to address the legitimate needs of some projects to unwind through FFI, it seems premature to me to take this step now, just as it was premature the last times. Soundness is ultimately not negotiable, but there can absolutely be bad times and ways to roll out soundness fixes.

If you only care about the soundness issue at the IR level, another way to fix it is to never emit the nounwind attribute.

It was suggested to me in a private conversation that this might lead to performance loss. I'd like to see some numbers on that. Because things “work” most of the time right now, it seems to me that LLVM currently generates code that would be similar to the code it would generate without nounwind.

I wholeheartedly agree with @rkruppe. I feel like not emitting nounwind is a good alternative to fix the unsoundness now (although not solving UB in general), while keeping users happy, and it gives us time to search for a real solution. For this real solution, I'd like to see an RFC-style discussion with a solid motivation and discussion of alternatives.

As long as the soundness hole is closed one way or the other (aborting, not emitting nounwind, ...) I think it's fine.

However, I think we should separate discussion about new mechanisms like #[unwind(...)] from fixing the soundness hole. It cannot be that people knowingly depend on UB (and some do it unknowingly) and that therefore, we are forced to accept more additions to the language as so that the soundness hole can be closed. If "soundness is ultimately not negotiable" is to have any meaning a possible outcome must be that the hole is fixed but there's no #[unwind(...)]. I think it is long overdue to fix the hole as it was reported first in 2014! We have also communicated about the problem in two separate release notes.

However, I think we should separate discussion about new mechanisms like #[unwind(...)] from fixing the soundness hole. It cannot be that people knowingly depend on UB (and some do it unknowingly) and that therefore, we are forced to accept more additions to the language as so that the soundness hole can be closed.

I don't feel it is helpful to draw such an antagonistic picture. There literally is no way to do FFI unwinding safely in Rust currently, and some people got frustrated enough by that that they went with something that "happens to work". I have hacked around limitations in ugly ways often enough that I can totally sympathize. Sure, they should instead have written an RFC to provide a defined way to do what they needed to do, but that's a lot of work and not everyone is up for that kind of contribution.

The #[unwind] attribute was planned anyway, so there are no forced additions to the language here---just an adjustment to the transition plan. I hope this discussion kickstarts the RFC process for #[unwind], that's the most constructive outcome I can imagine here.

@Centril It is indeed a possible outcome that the relevant teams ultimately decide "damn those programs and use cases, we won't provide a way to unwind through FFI". However, that would be a decision with severe downsides (more social ones than technical ones) which I don't think should be taken lightly, and IMO not at this very moment but rather after the other options have been explored and rejected -- as @RalfJung said, the trajectory of #[unwind] so far is rather the opposite!

While it's all good and well to say "this is UB, we've always said so, and programs with UB are completely invalid", the Rust project really made its bed itself here by not acting on the subject for years and in particular not providing an alternative way to address the very reasonable needs that cause users to write programs with this UB. We now have the situation that people trying to do certain (fairly reasonable!) things with Rust not only have no way to achieve it without writing programs that have UB, they do not even have an alternative in sight that they could switch to when those programs break.

Rust is well within its rights to break those programs, and I am definitely not arguing that the de facto behavior of today should be ad-hoc blessed as defined behavior, but it will cause users serious problems to not provide some alternative way to do what they need to do. We should not cause users such problems if we can reasonably avoid it, even if it means delaying a soundness fix. For comparison, some type system soundness bugs get a long grace period of time where the compiler warns instead of erroring on wrong programs to help people fix it before they get broken. Such a warning is not possible in this case (as it's about runtime behavior), but we should similarly do our best to ease the pain. Holding off pulling the trigger for another couple months (peanuts compared to how long the soundness issue has been open!) while waiting on other issues get worked out is a quite easy way to do that.

However, that would be a decision with severe downsides (more social ones than technical ones) which I don't think should be taken lightly, and IMO not at this very moment but rather after the other options have been explored and rejected

I think there are also severe social downsides to not going ahead with this. Namely, we legitimize "There's no way to do X currently, so we'll do something that happens to work".

as @RalfJung said, the trajectory of #[unwind] so far is rather the opposite!

I first heard of the existence of #[unwind(...)] during the T-Release meeting where we decided to revert the change on 1.33. It also didn't have a tracking issue until 12 days ago. Moreover, @alexcrichton said this in https://github.com/rust-lang/rust/pull/55982#issuecomment-445368032:

The #[unwind] attribute was added as a necessary evil when this first came up (but wasn't supposed to be necessary long-term) and is otherwise only tweaking a very low-level detail of LLVM that doesn't relate to the correctness of the API.

For #[unwind] to become stable we'd need to provide a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++, which I don't really see happening any time soon, especially when we want to leave ourselves to implement unwinding via checked return values on targets where necessary.

None of this suggests that "The #[unwind] attribute was planned anyway, [...]" or "the trajectory of #[unwind] so far is rather the opposite!".

While it's all good and well to say "this is UB, we've always said so, and programs with UB are completely invalid", the Rust project really made its bed itself here by not acting on the subject for years [...]

Yes, I'm quite unhappy about the inaction here. I think the reason for the inaction has precisely been that we didn't want to break anyone. In the future I hope that we set deadlines for and better track soundness holes and C-future-compatibility issues.

We should not cause users such problems _if we can reasonably avoid it_, even if it means delaying a soundness fix.

I think it is entirely reasonable that people use nightly until such time and help test the #[unwind(...)] attribute in the process. As @alexcrichton noted:

Additionally I don't think this can really bake without actually testing, this will remain virtually undetected unless everyone opts-in to testing it, so the only real way to get testing is to actually flip the defaults and see what happens. That's what we did last time and it's easy to always flip the defaults back if something comes up!

For comparison, some type system soundness bugs get a long grace period of time where the compiler warns instead of erroring on wrong programs to help people fix it before they get broken.

I'm well aware of C-future-compatibility issues and but I think we let them sit around for far too long without actionable and well-triaged plans to address them. I think we are in need of schedules and deadlines.

I think there are also severe social downsides to not going ahead with this. Namely, we legitimize _"There's no way to do X currently, so we'll do something that happens to work"_.

On a philosophical level, I disagree. It's not a question of "legitimizing". It's a fact of life that people will rely on implementation details whether they're supposed to or not, unless you actively prevent them from doing so. Ideally you do prevent them, like rustc does with #[feature] flags, or at least actively assist them in avoiding it, like the hypothetical undefined behavior checker will do for violations of the memory model in unsafe code. But if you don't (and in some situations you can't), you can't shrug off responsibility for the breakage that ensues when the implementation changes.

On a more practical note, among the links in the original post, I think this (from here) is a key quote:

For #[unwind] to become stable we'd need to provide a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++, which I don't really see happening any time soon, especially when we want to leave ourselves to implement unwinding via checked return values on targets where necessary.

My thoughts:

  • The nounwind LLVM attribute is a red herring, since sticking it on a handful of functions in a larger codebase is very unlikely to have a measurable effect on performance. (If you want it on all your functions, you can always use panic=abort.) If abort-on-unwind is not merged, the default nounwind should just be removed for now. Likewise, if we wanted to make extern "C" functions unwindable by default and the only downside was that we wouldn't be able to automatically mark them nounwind, IMO it would easily be worth it.
  • However, the suggestion of unwinding via implicit return values changes the story. It doesn't necessarily conflict with unwinding across C code, since you could implement the transformation at the LLVM level and apply it to both C and Rust code. But if the C code is already compiled for an existing ABI (e.g. because it's a system library, or just using a different compiler toolchain), there's clear value in being able to interoperate with that while still supporting unwinding within Rust code.
  • There do exist targets where unwinding across arbitrary stack frames is not just unimplemented or difficult to implement, but impossible. WebAssembly in its current state is a partial example; you can implement unwinding using a helper written in JavaScript, but there are some pure-WebAssembly environments that don't have JS at all.
  • That implies that it is arguably beneficial to (eventually) require an explicit #[unwind] for functions that want to unwind into C, so that it can produce a compile error on such targets. On the other hand, if unwindable were the default, it could still produce a runtime error.
  • That also implies that making "a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++" is impossible. If that's the requirement for #[unwind] to exist, then it can never exist.

    • I don't think we have to be that strict. After all, there is already panic=abort mode where unwinding is not supported at all, yet we still have catch_unwind. Using it makes your code less than fully portable, but you can still use it. The same should be true for unwinding across FFI.

    • Indeed, I believe it would be against Rust's philosophy to not expose useful and necessary platform functionality just because it is not portable.

Of course, this needs to go through an RFC. I don't think it needs to be a particularly "hard" RFC, at least if we're just stabilizing unwinding across C; it could be accepted quickly enough that there would be basically no benefit in changing the implementation to abort by default in the meantime. But this seems to have been rather controversial so far, so who knows...

For that to happen, someone needs to write the RFC. Does anyone want to volunteer to do that? Should I?

I also think it would be useful to fix and stabilize unwinding across C++, as a feature which might have an even narrower set of supported platforms, but which is not at all hard to implement on most existing platforms and could be quite useful for mixed C++-Rust codebases. But that comes later.

Oh, one more thing (I'd edit this in, but that doesn't help people reading via email):

If the unwind attribute is stabilized, rather than #[unwind(allowed)], I'd like to see something like #[unwind(C)], indicating that you want to unwind across C code. In the future there could be a separate #[unwind(C++)] and perhaps others. There wouldn't necessarily be any way to verify that you chose the right language, but it would make it more evident what exactly the implementation is guaranteeing to be safe, and would allow targets that supported unwinding across C but not C++ to make #[unwind(C++)] a compile error.

@rfcbot concern need-documented-replacement

In the absence of a documented replacement for how people should handle errors in C libraries that only support handling through unwinding, closing this would break a common use case.

Another way to fix the UB might be to drop the LLVM "nounwind" attribute. We could also add #[unwind], though it'll take some exploration of the details there.

I'm happy to support this as a sensible default after we document exactly what we expect people to do when interacting with inflexible C libraries that expect unwind-based error handling.

Whenever you have incompatible exception handling machinery you can always wrap it on both sides (using panic! & catch_unwind in Rust, throw & try/catch in C++, longjmp & setjmp or compiler extensions in C) and pass something closer to Result across the FFI boundary.

There's nothing special about exceptions when doing this, they're like any other "unrepresentable ABI" problem with FFI: you translate to something that is representable in both languages and pass that.

Opt-in language functionality like #[unwind], #[ffi_returns_twice], etc. can help in the long-term, but is IMO harder to get right, since you still have UB hazards left and right (sometimes more than translating a language feature to C ABI in that language) and can be less portable (e.g. incompatible unwinding mechanisms), so it's not a silver bullet.

I don't think that dropping nounwind will solve anything in the general case, as incompatible forms of exception handling are still UB by their very nature, it'd just make some code appear to work one some platforms.

EDIT: just saw https://github.com/rust-lang/rust/issues/58794#issuecomment-471367940, and I agree with @comex (who went into more detail than me).

@eddyb Are you suggesting that any user of this should write C code to wrap any Rust callbacks and handle unwind there, and translate to/from a Rust Result?

@joshtriplett Yes, if they want to be portable.
Also, if the non-Rust library is using C++ exceptions, the glue code should be in C++, not C.

If you need to "unwind through C code", you either need to compile the C code yourself with a compiler that supports a form of unwinding as an extension, or try to find another way, I believe that's one of the things that's impossible/UB in the general case.

(setjmp/longjmp might work in many cases with already-compiled C code but I can't in good faith recommend that approach)

@eddyb

(setjmp/longjmp might work in many cases with already-compiled C code but I can't in good faith recommend that approach)

Why not? This is precisely what those C libraries that only support unwind-based error handling expect callbacks to do.

In any case, unwinding works in Rust today for this particular use case (Rust -> C -> Rust callback)

you either need to compile the C code yourself with a compiler that supports a form of unwinding as an extension

Crates can reasonably do that by doing the build as part of build.rs.

@joshtriplett One thing off the top of my head (but there might be more reasons): libraries written without unwinding support in mind and/or not compiled with -fexceptions (or equivalent) may be UB if certain code is skipped (e.g. a loop waiting for a mutex to be unlocked / a thread to exit).

I'd say in general it's really UB, however we can have defined behavior with certain known conditions. For example, in Elbrus architecture frames are explit on the CPU level, and stack unwinding is guaranteed to actually jump through the stack frames (no longjmp there). So, on Elbrus, for the whole platform is seems reasonable to assume that the bahevior is known, and that the unwinding works for everything by default. I'm not an expert in Elbrus though, I just head it somewhere. (Please validate my before actually relying on this!) The point is, maybe it can be different per-platform?
I'm not an expert in Rust internals either, but maybe there could be a certain autotrait a compiler can generate for Fn on a case by case basis if it has an indication that unwinding is safe? Just a wild idea...

Certainly unsafe code is allowed to assume its destructors won't be skipped when unrolling up the stack, otherwise things like the crossbeam scoped API would be completely unsound. I think the only thing being discussed here is whether it's okay to unwind past specific, known chunks of Rust code.

We are mostly worried about unwinding past C code.

I'm confused. Unwinding Rust code can be made safe within the sequence of Rust frames, being that sequence the top-level app stack, or a callback inside of a C/C++/whatever passed the control to the Rust code, right? Problems and uncertantries arise when we unwind across Rust and non-Rust stack frames, like from Rust code (running in callback), to C++ code (that invoked the callback), running in Rust code (in the Rust app, that uses C++ library that has a call that takes a callback). In this case it might be tricky to guarantee that unwinding will correctly pass through the C++ layer (Rust -> C++ -> Rust). And so on with other combinations. My point about Elbrus is all such cases baheve very similarly with their architecture with multiple stacks (they have separate frames and data stacks).
Also, any part of the code is allowed to catch the unwinding, in theory. Yeah, that's not as strightforward as I anticipated initially.

C code compiled with C++ exceptions enabled is not C. ABI-wise it is probably more C++ than C.

If the aim is to convert Rust panics to C++ exceptions, and then catch C++ exceptions and convert them to Rust panics, supporting that via extern "c++" makes more sense to me.

What's the status on this? We reverted the abort in #58795 for the stable branch, but AFAICS it's still there on beta and master. If the decision is still pending, we should also revert on beta so it doesn't land in 1.34 next week!

Rust programs have UB without the PR, we want to exploit that UB to improve the performance of Rust panics in the near future (e.g. https://github.com/CraneStation/cranelift/issues/553), and the PR makes those programs have defined behavior by guaranteeing an abort instead which is IMO a big improvement.

The revert was intended to buy us more time to explore some solutions and we have done so. Somebody needs to put in the work to write RFCs, implement them, etc.

Programs affected by this, like mozjpeg, have a migration path available: catch panics in the Rust code, pass error codes through FFI, re-raise as exceptions/longjmps/etc. in whatever other language is at the other side of the FFI. This is what those programs should have done in the first place.

Therefore I think we should revert the revert.

While it is sad that Rust cannot directly interoperate with C++ in C FFI (automatically inserting shims to convert from Rust panics to C++ exceptions and vice-versa), the right way to solve that problem is to submit an RFC with a solution.

I think we should revert the revert.

The revert was only on stable, so if we do nothing right now, the unwind-abort will be in 1.34. Maybe that's fine, but it doesn't seem like there's consensus per rfcbot https://github.com/rust-lang/rust/issues/58794#issuecomment-471281244.

I agree that we should propagate this to stable to avoid a stable regression, yes. This should never have been broken in the first place without discussion before changing the behavior. This worked in prior stable versions, and that it happens to not be well defined doesn't change that it worked in many cases and people were able to successfully use it in those cases.

I'd be all for seeing RFCs, to propose alternatives that would allow optimizations like the proposed one in cranelift. And in the meantime, let's not break stable users.

I'm aware that "this is unsound" is a permissible justification for breaking stable. However, there's a difference between "this is unsound" and "this is undefined (but people know how it works)". By all means, let's find a solution for this, and until then let's not stick a crowbar through the engine of a running vehicle to stop it for maintenance. ;)

I opened #59640 on beta to preserve the current stable behavior.

This worked in prior stable versions, and that it happens to not be well defined doesn't change that it worked in many cases and people were able to successfully use it in those cases.

It is not that the behavior "wasn't well defined" - the behavior is _undefined_ by design. The main reason being that we can change the implementation to alert users that they are invoking undefined behavior, as well as to enable optimizations in FFI code and in the Rust panic implementation.

By all means, let's find a solution for this,

There is already a straightforward solution to this. People arguing that it's not what they wish it would be does not change that fact.

and until then let's not stick a crowbar through the engine of a running vehicle to stop it for maintenance. ;)

LLVM is allowed to optimize all this code under the assumption that it won't unwind because of the nounwind attribute. If we don't alert users that their code is broken loudly, the next LLVM upgrade could silently break their production code, potentially introducing security vulnerabilities.

I'd rather have users complaints of the form "You changed the implementation of some code that had undefined behavior and now I have to fix my code" than complaints of the form "You knew my code had undefined behavior, had an implementation of a way to alert me, yet decided not to do so, which resulted in my software having a security vulnerability".

I'd be all for seeing RFCs, to propose alternatives that would allow optimizations like the proposed one in cranelift. And in the meantime, let's not break stable users.

The current stable C FFI already allows these optimizations. If people want language features to more ergonomically interface with the unwinding strategies of other programming languages, like C++, they should open RFCs to do that.

Process question: Should we revert this on nightly too so we don't have to keep reverting this until the final decision has been made?

Rust programs have UB without the PR, we want to exploit that UB to improve the performance of Rust panics in the near future (e.g. CraneStation/cranelift#553), and the PR makes those programs have defined behavior by guaranteeing an abort instead which is IMO a big improvement.

That link does not show that we want to exploit it "in the near future", given that Cranelift does not perform optimizations and so any small performance benefit from changing the calling convention would be purely academic – not to mention that rustc doesn't even support Cranelift yet.

Arguably it shows that we want to exploit it in the far future; that gives us plenty of time to add FFI unwinding attributes first, rather than rushing to break things and, at best, making crates like mozjpeg write pointless workarounds that will soon become obsolete.

I'd rather have users complaints of the form "You changed the implementation of some code that had undefined behavior and now I have to fix my code" than complaints of the form "You knew my code had undefined behavior, had an implementation of a way to alert me, yet decided not to do so, which resulted in my software having a security vulnerability".

Then remove nounwind.

Then remove nounwind.

As discussed yesterday on discord (cc @Centril @joshtriplett ), I believe that while these libraries have knowingly decided to rely on a particular implementation of undefined behavior, we might have failed to communicate during the last cycle that it's up to them to put in the work. Iff these libraries show willingness to fix their code, and require more time to do that, I'll be fine with delaying the landing of abort for another 6 weeks, as long as it then lands unconditionally.

Penalizing correct C FFI code (e.g. by removing nounwind) would set IMO a bad precedence (what's next? remove noalias, dereferenceable and align from Rust references because some other code exploits UB and these break that?). Also, I'm more sympathetic of incorrect-by-accident C FFI code that would be alerted by abort, and then fixed, than of libraries knowingly abusing UB.

making crates like mozjpeg write pointless workarounds that will soon become obsolete.

No other programming language allows unwinding through C FFI. The only standard way of propagating exceptions in C++ and D through FFI boundaries is catching all exceptions at the FFI boundary, and passing error codes instead - depending on the language at the other side, those error codes can be re-raised as panics/exceptions/sjlj/etc. The behavior of throwing from extern "C" functions in those languages is undefined as well, and aborting is what C++'s noexcept also chooses to do if an exception is thrown.

In the last 6 week, no pre-RFC or RFC has been filled to extend the language to support this use case, this issue has received very little attention, the design work required to extend the language with a more ergonomic solution is IMO significant (should we have #[unwind(c++/c-sjlj/c++itanium/c++seh/c++sjlj/rust)] vs extern "C++/extern "Rust" etc.), and it is unclear to me whether the added complexity of doing so is worth the improved ergonomics given that a solution to this problem that works 100% reliably between Rust and all other PLs (not only C++ or C, but also Python, Ruby, and everything else) has been available since Rust 1.0, and that the number of affected crates is small.

So I am not as certain as you are about "how soon" this workarounds will land.


There have been cases where there just was no way to do something in Rust without invoking UB (e.g. taking the address of a packed struct field), but this is not one of those cases. These libraries had a perfectly valid stable Rust alternative available, yet decided to pursue the UB route instead. I think that's unfortunate, but I think it would be more unfortunate to penalize those using C FFI correctly, as well as leaving vulnerable those who are invoking undefined behavior by accident instead of by design.

@joshtriplett

This worked in prior stable versions, and that it happens to not be well defined doesn't change that it worked in many cases and people were able to successfully use it in those cases.

However, there's a difference between "this is unsound" and "this is undefined (but people know how it works)".

I agree in principle. However, notice that the same is true for the kind of UB that we will eventually introduce with whatever derivative of or alternative to Stacked Borrows becomes "the real thing": we currently definitely do not exploit many kinds of UB in this space (we only emit noalias on function boundaries), so one could say "this is undefined (but people know how it works)".

I appreciate that unwinding across FFI might be different, but "it is UB just on paper, not in practice" is a slippery slope.

In the last 6 week, no pre-RFC or RFC has been filled to extend the language to support this use case,

I started one (which is largely based on jcranmer's post). If jcranmer's ideas sound reasonable and interesting to people, I can try expediting writing the RFC (at the cost of interacting with my family...).

I have a crate (objrs) that makes Rust code adhere to the Objective-C ABI, allowing Rust and Objective-C to interop. Throwing exceptions across Rust+Objective-C boundaries is a critical part of my crate (libobjc even has a C FFI function (objc_exception_throw) for throwing Objective-C exceptions). And I'm not the only one doing this: the more popular rust-objc crate also supports Objective-C exceptions across the Rust boundaries. I bring this up to show that there is interest in not only Rust → FFI → Rust exceptions (e.g., mozjpeg), but also C++ → Rust → C++ and Objective-C → Rust → Objective-C (e.g., objrs and rust-objc).

In short, there's a good argument to make for exceptions across FFI boundaries. It definitely needs an RFC to iron out the details and make clear what is and is not permissible. I just hope people are at least open to the idea.

No other programming language allows unwinding through C FFI. The only standard way

Focusing on a "standard way" for a language is a red herring. This kind of stuff is in the land of implementation-defined behavior, not standard-defined behavior. There's a reason GCC has -fexceptions and allows it to be used with C (to quote their docs: "you may need to enable this option when compiling C code that needs to interoperate properly with exception handlers written in C++"). With Rust, I imagine this being a mix of standard- and implementation-defined behavior (meaning Rust supports some common exception ABIs for FFI, but not necessarily on all platforms, and the onus is on the programmer to make sure the ABI used by Rust matches the ABI used by the FFI).

@mjbshaw jcranmers' post is one of the many directions in which we could go, I think the direction is worth exploring, but not at the cost of interacting with your family. Before writing an RFC, it would probably make sense to first open an internal threads to collect all the use cases we want to support and all the constraints that we have, and try to reach a consensus on that, since those are going to restrict the design space for the RFC.


On discord we also discussed that maybe we could add a -Z no-ffi-abort-on-panic-Rust-1.34.0 feature flag to allowing users to temporarily, and for particular toolchain versions, turn off abort on panic through C FFI.

@gnzlbg

Iff these libraries show willingness to fix their code, and require more time to do that, I'll be fine with delaying the landing of abort for another 6 weeks, as long as it then lands unconditionally.

I take this as an ultimatum for the lang team -- from the release team perspective, IMO we should keep maintaining the status quo on stable until this issue is decided.

On discord we also discussed that maybe we could add a -Z no-ffi-abort-on-panic-Rust-1.34.0 feature flag to allowing users to temporarily, and for particular toolchain versions, turn off abort on panic through C FFI.

Would you make this flag available to stable users? -Z options usually aren't, and if users have to play games with RUSTC_BOOTSTRAP, they might as well go further and use #[unwind(allowed)].

Would you make this flag available to stable users?

That's a good question. mozjpeg is used where stable Rust is needed, but Firefox is using RUSTC_BOOTSTRAP anyways. So... we should just ask them if they need it. While we usually don't do this, I think that as long as we clearly delimit for which Rust versions the flag is available, and make it clear that the flag will be removed in the future, allowing using such a temporary flag from 1-2 stable Rust toolchains is a good compromise.

cc @cuviper @kornelski @alexcrichton

Based on discussion in the language team meeting: We'd like to address the undefined behavior, and in the process of doing so we want to provide a stable Rust alternative for this. We'd like to set a reasonable deadline for a stable release making undeclared unwinds through FFI abort, something in the ~12 week range. In order to make a plan that seems likely to succeed, we'd like to have the folks who need this feature (either unwind-through-FFI or some manner of well-defined setjmp/longjmp) involved in the conversation and specifying the replacement feature.

Could we please get some positive confirmation, from the Rust-bindings-to-mozjpeg folks or others who need this, that it seems reasonable to develop a replacement for this in the near-term future?

I'm still interested in a solution.

I've prototyped a "plan B" of wrapping each FFI call in a C++ try/catch and translating caught errors to C-compatible error codes. It's doable, so if you decide to kill unwinding in 12 weeks, I could reluctantly switch to the workaround. But the workaround is not great (a ton of boilerplate that is hard to automate, and needs changes on both sides of the FFI).
Most importantly, the C++ workaround AFAIK is also technically UB that happens to work, for exactly the same reasons as in Rust, so that's a churn that doesn't actually solve UB for libjpeg wrappers.

A solution that would work for me would be, e.g: a function attribute that allows unwinding through FFI (#[unsafe_may_unwind]), with an implementation-defined behavior that:

  • aborts if it's known that the target doesn't support it (doesn't have to abort if "it depends")
  • it's not UB on principle, and it's not UB when used "properly", where the exact meaning of "properly" is implementation-defined. I'd expect this to be something like forbidding the foreign language from having its own unwinding/destructors/longjmp/using SEH on Windows.

Personally I don't need interoperability with C++ or longjmp, so I'm fine if these are left out of scope as UB.

Most importantly, the C++ workaround AFAIK is also technically UB that happens to work,

This is false. The behavior is defined and supported by C++ language features. If you want to make sure that no exceptions propagate from C++ into C, just make your extern "C" C++ wrappers noexcept - that will result in std::terminate being called if an exception would have escaped the extern "C" function, preventing UB if that exception were to have escaped into C or some other PL.

About libjpeg, considering the library is written in C, it may make sense to write a C wrapper which passes panics through longjmp. For example, assuming the library function is called extlib_magic.

From the Rust side:

use std::any::Any;
use std::os::raw::{c_char, c_void};
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

struct State<'a, F> {
    panic_store: &'a mut Option<Box<dyn Any + Send + 'static>>,
    function: F,
}

fn magic<F: FnMut()>(f: F) {
    unsafe extern "C" fn rust_side_magic_call<F: FnMut()>(state: *mut c_void) -> c_char {
        let state = &mut *(state as *mut State<F>);
        *state.panic_store = catch_unwind(AssertUnwindSafe(&mut state.function)).err();
        state.panic_store.is_some() as c_char
    }
    let mut store = None;
    let mut state = State {
        panic_store: &mut store,
        function: f,
    };
    unsafe {
        c_side_magic_call(
            rust_side_magic_call::<F>,
            &mut state as *mut State<F> as *mut c_void,
        );
    }
    if let Some(panic) = store {
        resume_unwind(panic);
    }
}

extern "C" {
    fn c_side_magic_call(f: unsafe extern "C" fn(*mut c_void) -> c_char, state: *mut c_void);
}

fn main() {
    magic(|| {});
}

From the C side:

#include <setjmp.h>

extern void extlib_magic(void f(void *), void *);

struct RustFunctionDetails {
    char (*f)(void *);
    void *state;
    jmp_buf jump_buffer;
};

static void c_side_magic_call2(void *state) {
    struct RustFunctionDetails *details = state;
    if (details->f(details->state)) {
        longjmp(details->jump_buffer, 1);
    }
}

void c_side_magic_call(char f(void *), void *state) {
    struct RustFunctionDetails details;
    details.f = f;
    details.state = state;
    if (!setjmp(details.jump_buffer)) {
        extlib_magic(c_side_magic_call2, &details);
    }
}

longjmp never passes Rust - everything is plain C, so this shouldn't cause UB. I didn't test whether this does work, so this may not compile or be subtly broken.

@xfix: This is not a solution, because I specifically want to avoid per-FFI-call setjmp calls, and handling panic in the callback is not the goal, but a method to achieve the goal of unwinding. If you prevent unwinding, you keep the particular solution, but miss the main goal entirely. I don't want to repeat the whole explanation here, so please see the earlier thread: https://internals.rust-lang.org/t/unwinding-through-ffi-after-rust-1-33/9521

@kornelski the method proposed by @xfix and variants of it (if the C library is designed properly, there is no need to use SJLJ) is probably the best way to do this in Rust today.

if the C library is designed properly, there is no need to use SJLJ

That's exactly the point. libjpeg is not "designed properly". It has code like

if (!ptr) err->callback(); ptr[x] = y;

which leaves no safe options other than abort() and unwinding of some sort.

I've even asked if it was possible to fix that in libjpeg, and the answer was no: the change would be too big, and the maintainer won't risk ABI breakage that comes with it.

That's exactly the point. libjpeg is not "designed properly".

There are many solutions that work for those, like the one that @xfix gave you. Some of them work for even worse library designs, e.g., a thread local error context (just like libc's errno), a global context protected by a mutex, etc. There are many options that have always worked on stable Rust correctly.

@gnzlbg We may be talking past each other? @xfix's solution uses per-call setjmp, and my goal is to avoid per-call setjmp. @xfix's solution requires wrapping of all FFI calls, and I'd prefer not to do that. That's where Rust adds both runtime cost and code complexity, and I don't see how that's any better than the still-not-so-great "plan B" solution I described earlier.

Is there a way to avoid per-call setjmp for such a C library design in stable Rust today that does not invoke undefined behavior ?

Assuming C++ unwinding through C is not UB (and I'm finding answers like these that it is UB, but happens to work through use of the same compiler that in practice is compatible), the current stable-compatible solution is to wrap each call in a C++ wrapper, use C++ exceptions for unwinding, and pass errors to Rust in some non-unwinding way.

Unwinding is going to be slow when it happens, it's by design. The cost will be here, whatever you do. Also, keep in mind there is no unwinding in C, and unwinding isn't going to work on any platform, for instance it may not work on Windows when a library was compiled with an /EHsc option (i.e. an option to consider C++ exceptions in C functions to be undefined behaviour - this DOES improve the performance).

Being a C feature, longjmp should work, but this says nothing about unwinding.

You definitely do need the compiler to add unwinding support on the C code, e.g. gcc -fexceptions.

Assuming C++ unwinding through C is not UB

But that assumption is incorrect. It is UB, for many reasons:

  • a Rust extern "C" function panics, which is something that is UB in Rust, independently of whether that function is used on C FFI or not
  • an extern "C" function throws a Rust panics in C++, which is something that C++ does not expect from an extern "C" function, and is therefore UB in C++ (C++ doesn't expect such a function to even throw a C++ exception, much less a Rust panic, which is something that C++ does not even know exists)
  • an extern C functions throws a Rust panic in C, which is something that C does not expects (C does not even have a concept of stack unwinding, C++ exceptions, Rust panics, etc.)

So this "approach" is UB on all programming languages and implementations involved, most of them are completely outside our control. Sure, one possible outcome of UB is that the program appears to work correctly, but that behavior is often followed by the realization that when one changes the compiler, optimization-level, etc. the program breaks, which is exactly what happened here.

So I'll ask my question again:

Is there a way to avoid per-call setjmp for such a C library design in stable Rust today that does not invoke undefined behavior ?

AFAICT, there isn't, but this is not an excuse to invoke undefined behavior, particularly in this case where there are solutions that do not do so (other open issues don't have this luxury). If the run-time cost is unacceptable, write an RFC, push it through the process, etc. If the issue is critical those steps can be fast-tracked, grace periods (like the one proposed here) can be given, etc.

But if there is a viable alternative and a project decides to rely on UB instead, then IMO, it's their own fault. In this case, mozjpeg has two reverse dependencies, so the number of projects affected is at least small.

But if there is a viable alternative and a project decides to rely on UB instead, then IMO, it's their own fault. In this case, mozjpeg has two reverse dependencies, so the number of projects affected is at least small.

You assume a lot here:

  • The OSS ecosystem is the world?
  • The number of reverse dependencies is proportional to a crate's importance?

@gnzlbg I think the Rust factors in your reply wouldn't apply to what I infer @kornelski is doing:

  • Rust code calls a C++ shim

    • C++ wraps a try around the C call

    • C encounters an error and calls the callback



      • C++ callback throws an exception



    • C++ catches the exception, translates to a plain error return value

  • safely return to Rust

On Fri, Apr 05, 2019 at 04:26:33AM -0700, Kornel wrote:

A solution that would work for me would be, e.g: a function attribute that allows unwinding through FFI (#[unsafe_may_unwind]), with an implementation-defined behavior that:

  • aborts if it's known that the target doesn't support it,
  • it's not UB on principle, and it's not UB when used "properly", where the exact meaning of "properly" is implementation-defined. I'd expect this to be something like forbidding the FFI language from having its own unwinding/destructors/longjmp/using SEH on Windows.

This sounds potentially viable to me. There is, as I understand it,
currently such an attribute available on nightly Rust, so one route
would be to see if that attribute works for you, and if so, stabilize
it.

If that attribute doesn't work for you, let's figure out what would.

I'm happy to help review RFCs with you.

@sanmai-NL

You assume a lot here:

The OSS ecosystem is the world?
The number of reverse dependencies is proportional to a crate's importance?

The argument was made that because mozjpeg is used in Firefox, it is an important crate, and we shouldn't break it.

My point is that if we let this to factor into this decision we are setting the horrible precedent that if you make an "important" project like Firefox depend on undefined behavior, then the language team will define the behavior in your favor.

With the statement that mozjpeg has two reverse dependencies my point was that the currently approved change does not cause ecosystem wide breakage. So while it might be breaking an important project, that's IMO their own fault, and totally ok. Firefox is not special.

EDIT: this does not mean that we should break things "just because", even if only one crate is affected I'm the first one that's always against breaking code if there is no migration path available. I know that the migration path here is painful, and I think that offering a better solution is worth doing. But there are many other things worth doing.

I think the Rust factors in your reply wouldn't apply to what I infer @kornelski is doing:

@cuviper I don't see there anything about unwinding through Rust code, so maybe we are talking past each other like @kornelski suggested.

AIUI the current implementation unwinds through Rust. The C++ shims are what they would have to otherwise use to avoid this, but that's cumbersome to write for every possible call.

Perhaps using the cpp crate could at least keep this inline.

Assuming C++ unwinding through C is not UB

But that assumption is incorrect. It is UB, for many reasons:

* a Rust `extern "C"` function panics, which is something that is UB in Rust, independently of whether that function is used on C FFI or not

That is something that can be changed (as is proposed to be changed).

* an `extern "C"` function throws a Rust panics in C++, which is something that C++ does not expect from an `extern "C"` function, and is therefore UB in C++ (C++ doesn't expect such a function to even throw a C++ exception, much less a Rust panic, which is something that C++ does not even know exists)

* an `extern` C functions throws a Rust panic in C, which is something that C does not expects (C does not even have a concept of stack unwinding, C++ exceptions, Rust panics, etc.)

These two points are actually false. The C specification says nothing about unwinding, but we're in the realm of relying on compiler options that are in effect changing the version of C we're using to a (admittedly poorly-documented) variant of the C standard that isn't vanilla C11 or what have you. Similarly, C++ doesn't say that a C++ exception being thrown through a C function is undefined behavior--at best it's implementation-defined, and extern "C" in C++ is de facto implemented as "drop name mangling, all other ABI is the same." So the question isn't what C/C++ allows, but what the ABI allows and the ability of your compiler to adhere to that ABI.

@jcranmer

These two points are actually false. The C specification says nothing about unwinding, [...]

You are stating that the behavior of these things is defined in all languages involved (by stating that my claim is false), yet continue right in the next sentence to state that the behavior is indeed undefined.

Note that I wrote (emphasis mine):

  • an extern "C" function throws a Rust panic in C++, [...]
  • an extern "C" functions throws a Rust panic in C

The concept of a Rust panic does not exist anywhere in the C and C++ standards (the concept of unwinding doesn't even exist in the C standard), nor do we define in Rust that Rust panics are guaranteed to match C++ exceptions or C SJLJ ABI wise (both are incompatible with each other).

Similarly, C++ doesn't say that a C++ exception being thrown through a C function is undefined behavior--at best it's implementation-defined,

For this to be implementation-defined there must be a standard clause that says so. AFAIK such clause does not exist, and this shows: on Linux / MacOSX the exception ABI is thoroughly documented, but on Windows the C++ exception ABI is not documented at all and the recommendation is not to use them on ABI boundaries because their implementation might change on any toolchain upgrade.


What many have argued is that, while the behavior of all of this is (or might be) undefined, this isn't a KO criteria because iff one controls all code and toolchains involved at both sides of the FFI, one can tune compiler options to make things work. This is true, but this IMO hints that we should expand the Rust language / toolchain with similar features / knobs, which is different from other approaches being argued for, where we defined the behavior to "somehow" guarantee that this works or that at least its not broken by default (e.g. removing nounwind, not aborting on panics through FFI by default, etc.). I disagree with that. As is, this code has UB today, and is therefore already broken.

I also don't think this is a bad or a time critical thing, because from this same argument it follows that Iff you control the code at both sides of the FFI, you can just modify the code to not have UB.

Is this solution optimal? Of course not, it sucks, and it should be improved, but that's what the RFC process is there for. We can fast-track such a process, but AFAICT we have an infinite amount of time to improve this particular issue because there is at least a working alternative that works properly. Other open issues (e.g. creating a pointer to a packed struct field) do not have that luxury (there is no way to avoid UB on those).

Other open issues (e.g. creating a pointer to a packed struct field) do not have that luxury (there is no way to avoid UB on those).

Strictly speaking, creating a pointer to a packed struct field is safe (as long there is C representation, for non-C representations, it should be possible, but Rust doesn't provide a syntax for that purpose). For #[repr(packed, C)] structures, it's possible to get a pointer to a structure, and add sizes of all fields before the field you want to access (it's packed, so alignment is non-issue, and C representation guarantees ordering). It can be done without using unsafe at all or triggering any warnings, so if it causes UB then it's a soundness bug.

Then, it's possible to read the value using std::ptr::read_unaligned function. This is unsafe, because it takes a pointer, but considering the function includes unaligned in its name, i don't think it would be concerned with unaligned pointers (and if it is, then what's the point of this function?).

use std::mem::size_of;
use std::ptr;

#[repr(C, packed)]
struct Magic {
    a: u8,
    b: u32,
    c: u16,
}

fn main() {
    let magic = Magic {
        a: 1,
        b: 2,
        c: 0x3333,
    };
    // I'm not sure whether you can use add here, so using wrapping_add
    // to be safe. Miri doesn't complain, but this is complicated.
    let c_ptr = (&magic as *const Magic as *const u8)
        .wrapping_add(size_of::<u8>() + size_of::<u32>()) as *const u16;
    println!("{:x}", unsafe { ptr::read_unaligned(c_ptr) });
}

@xfix I wasn't talking about safety, I was mentioning that it is impossible to do so in general without invoking undefined behavior (e.g. by creating a misaligned reference first). This is fully offtopic, but it is not possible to do this for all repr(C) types either (we can take this discussion offline though).

You are stating that the behavior of these things is defined in all languages involved (by stating that my claim is false), yet continue right in the next sentence to state that the behavior is indeed undefined.

No, I'm stating that the behavior is implementation-defined. Go read the Itanium ABI, for example--it very clearly explains how to catch foreign exceptions in C++ adhering to the Itanium ABI.

For this to be implementation-defined there must be a standard clause that says so. AFAIK such clause does not exist, and this shows: on Linux / MacOSX the exception ABI is thoroughly documented, but on Windows the C++ exception ABI is not documented at all and the recommendation is not to use them on ABI boundaries because their implementation might change on any toolchain upgrade.

That clause does exist, at least in C++. Quoting §7.5 ¶9:

Linkage from C++ to objects defined in other languages and to objects defined in C++ from other languages is implementation-defined and language-dependent.

No, I'm stating that the behavior is implementation-defined. Go read the Itanium ABI,

Implementation-defined behavior is required by the standard to be documented. The standard does not mention anything about the exception ABI, and it does not require implementations to document it.

This does not mean that implementations can't do so, some tier-1 platforms (like the ones using the Itanium ABI) do, and they are standard compliant. Other tier-1 platforms do not document it, and they are standard compliant as well.

That clause does exist, at least in C++. Quoting §7.5 ¶9:

That paragraph of the dcl.link section is about object linkage, not about the exception ABI. The section makes it clear that extern "C" is for linking against the C programming language.

So, for terminology's sake: if something is defined by the implementation, but is not described by the standard as implementation-defined, it's not Implementation-defined. So what is it? What terminology should we be using for this (as it clearly exists)?

It's not really a vendor extension (those are dunderscore macro/keywords and/or vendor compiler flags).

I think everyone agrees that the "best" solution is to introduce the knobs such that at least the extern fn foo() can be set to allow to return/unwind/terminate with a Rust panic from the same rustc toolchain. IIRC, we said in the Chucklefish (rlua) case that reverted (this? very similar?) change that we want to allow setjmp/longjmp _over_ Rust code (thus mem::forgetting everything).

Where there seems to be disagreement is the order of things. The standard position, IIRC, is that we'll try not to break things until there exists a stable alternative. There are two major positions here: there is (use C++ and C++ exceptions, which then pass an error code back to Rust through extern "C"), and that that is an ugly workaround that shouldn't be necessary, we should provide #[unwind] (or whatever mechanism) first.

Proposal: delay changing the extern fn behavior until an implementation to fix this mess is stable. This means waiting on an RFC effort so long as it is making demonstrable process.

Proposal: stabilize some subset of #[unwind] to get the not-LLVM-UB solution of removing nounwind and the catches short term, move towards a more complete solution through the RFC process.

This kind of language lowering is extremely annoying and never leads to a productive solution.
From the C++ standard point of view even threads didn't exist before the year 2011, we should not rely on the standard's opinion on something it doesn't really cover.

At high level, unwinding is largely a platform capability (especially on Windows) rather than a language feature, and Rust as a low level language should provide access to platform capabilities, at least by not standing in the way.
So, we should focus on how to achieve that, IMO, rather than the differences between implementation-defined and UB in standards.

So, for terminology's sake: if something is defined by the implementation, but is not described by the standard as implementation-defined, it's not Implementation-defined. So what is it?

intro.def defines those terms, worth a read.

extern fn foo() can be set to allow to return/unwind/terminate with a Rust panic from the same rustc toolchain.

I think that adding an #[unwind(panic_abi)] attribute that states that an FFI function is allowed to panic, and that the panic will be shimmed to a particular ABI, is worth doing, but by itself it would not solve @kornelski 's issue.

@kornelski has expressed the requirement for the approach to have zero-overhead. If the #[unwind(panic_abi)] attribute specifies the panic abi, only #[unwind(rust)] would have zero-overhead. However, we don't guarantee Rust panics to be implemented like C++ panics, so unwinding through C or C++ is not guaranteed to work, and we don't want to tie Rust panic implementation to any of the existing ABIs, particularly because that would prevent Rust from improving its panic implementation in the future, as well as from interfacing across multiple incompatible ABIs from Rust.

We could add #[unwind(c++-native)], #[unwind(c++-itanium)], #[unwind(seh)], #[unwind(rust)] etc. where Rust would shim in/out panics to those ABIs, but that would add a run-time overhead if the ABIs differ.

To solve @kornelski problem we need to combine this with globally selecting the Rust panic ABI. We currently already allow that with -C panic=abort/unwind, but we could extend this with -C panic=c++-native. For @kornelski code to be correct, its library should probably contain a:

#![cfg(not(panic = "c++-native"))] 
compile_error!("this crate requires -C panic=c++-native");

so we might want to extend the language with such a cfg(panic =) attribute to do conditional-compilation based on the global panic implementation (EDIT: note that we explicitly decided during the panic=abort RFC that we did not want Rust code to be able to rely on the panic strategy, but maybe we can just forbid using "abort" as a key here). Then just marking its FFI functions with the #[unwind(rust)] attribute, to state that they allow unwinding with the Rust ABI, would be enough. C FFI functions without #[unwind] attribute would still be nounwind, avoiding a regression on all current C code.

So adding #[unwind(rust)] and supporting -C panic=c++-native might be an MVP for this, that we can extend in the future to other use cases.

From a type-system POV, #[unwind(rust)] extern "C" fn() -> () is not the same type as extern "C" fn() -> (), so we need to make this type-system change (EDIT: unwinding is an effect, so this would impact an effect system, which might want to allow code to be generic over unwinding, or specialize depending on the unwinding abi, etc.). We might want to add a coercion from extern "C" fn()->() to #[unwind(ANY)] extern "C" fn() -> () since extern "C" fn()->() never unwinds, which should be compatible with all unwinding ABIs.

From a C++ point-of-view, we need to make sure that Rust "panics" with c++-native work properly as C++ exceptions, and IMO C++ code should be able to catch and re-throw them. I think we can add support for this later in a backwards compatible way by, at some point in the future, exposing a C++ support library that defines C++ types for handling Rust panics. Whether this library uses a C++ std type, or its own, would be an implementation-detail. In the mean time, if a Rust function panics in C++ code containing try { } catch { } the behavior would probably still be undefined, since whether the exception will be caught or not depends on which C++ exception type the Rust panic would have, and that is something that the MVP would not define. We probably want to guarantee somehow that Rust exceptions can only be caught by using the appropriate Rust exception type from the Rust C++ support library.

For me personally, the biggest open question is how this would interact with a future extern "C++" Rust language feature. AFAICT from the comments in these and the other threads, people don't want to define C FFI functions that throw. What they want is to define a C++ function with C name mangling. In C++ these are just extern "C" functions, and in Rust these would be #[no_mangle] extern "C++" (#[unwind(c++-native)] is implied by extern "C++"). Combined with -C panic=c++-native, that would mean that Rust panics get shimmed as C++ panics in those functions, but the shimming is a nop because the implementation is the same. This is not incompatible with allowing extern "C" functions to panic, so we can do both things, but I'm not sure whether we want to.

Would Rust ever have a panic implementation that isn't zero overhead for function calls?
(of course unwinding can be slow, because that's supposed to be rare. Overhead of setting up catch_unwind in my case can be amortized, but would be pretty bad for libraries like Rayon that wrap everything in it.)

I'm worried that C++ or SJLJ interop would unnecessarily complicate the issue. These are less in Rust's control than its own unwinding implementation.

It's hard to imagine a situation where shimming between an "outer" unwinding ABI and an "inner" one couldn't be implemented as zero-overhead in the non-panicking case.

That is, of course, zero-overhead compared to the normal cost of calls using that ABI. In the SJLJ (setjmp-longjmp) ABI, for example, every single function that needs to do any cleanup on unwinding (e.g. having variables with destructors, or catching exceptions) must call a variant of setjmp at its start. A function that shimmed from a different outer ABI to an inner SJLJ ABI would need that call, since it would have to catch exceptions in order to rethrow them using the outer ABI. But so would probably the majority of functions written in Rust, since Rust tends to use RAII a lot. That doesn't count as overhead for these purposes.

In the panicking case, converting the panic from the inner form to the outer form might require some extra work. For example, raising a panic in a libunwind-based ABI involves allocating a new exception record on the heap. But the most commonly used unwinding mechanisms are already very slow, so I don't think this is worth worrying about.

Therefore, I don't think a global unwind-ABI toggle needs to be a prerequisite for an #[unwind(c++-native)] feature.

Of course, that doesn't mean such a toggle would be a bad feature to add for its own sake, if and when rustc actually develops new alternate unwinding ABIs. However, since changing the ABI requires recompiling all dependencies including the standard library, I think it would be best to treat it as part of the target specification.

It's hard to imagine a situation where shimming between an "outer" unwinding ABI and an "inner" one couldn't be implemented as zero-overhead in the non-panicking case.

Independently of how one defines zero-overhead, doing nothing has often less overhead than doing something. If the panics ABI differ, the shims must include catch_unwinds to catch a potential panic, and re-raise it using the other ABI. Even if that does not incur a run-time overhead in the non-panicking case when compared with not having those shims at all, one still pays a code-size overhead, a compile-time overhead, an overhead in the panicking case, and potentially also an overhead due to missed-optimizations caused by an increase in complexity (which could result in overhead in the non-panicking case due to missed-optimizations).

Therefore, I don't think a global unwind-ABI toggle needs to be a prerequisite for an #[unwind(c++-native)] feature.

That depends on the constraints that we have. Without a global unwind-ABI toggle, then this solution is a zero-overhead abstraction over the manual solution that @kornelski argued (here) had an unacceptable overhead. Therefore I don't understand @kornelski 's thumbs up on your idea, unless @kornelski is now of the opinion that the overhead is tolerable.

If the overhead is tolerable, then I agree with you @comex that we don't need a switch for the global panic ABI, but note that if that's the case, then the only valid argument against the current stable Rust solution to this problem is that it is not very ergonomic.

I think it would be best to treat it as part of the target specification.

+1

That depends on the constraints that we have. Without a global unwind-ABI toggle, then this solution is a zero-overhead abstraction over the manual solution that @kornelski argued (here) had an unacceptable overhead. Therefore I don't understand @kornelski 's thumbs up on your idea, unless @kornelski is now of the opinion that the overhead is tolerable.

If the overhead is tolerable, then I agree with you @comex that we don't need a switch for the global panic ABI, but note that if that's the case, then the only valid argument against the current stable Rust solution to this problem is that it is not very ergonomic.

I'm a bit confused. The comment you linked to criticizes a proposed solution involving wrapping each API call in a C function that calls setjmp, which is quite slow.

A different solution, which @kornelski also mentioned in this thread, is to wrap each API call in a C++ function that does try/catch (where the try/catch itself is typically zero-runtime-overhead in the success case – assuming you're not on a weird target that uses SJLJ exceptions!). I don't think he ever said the overhead of that solution was intolerable. For the record, it would typically have at least the overhead of an extra non-inlined function call per API call, since the Rust-to-C++-wrapper and C++-wrapper-to-underlying-library calls both cross library boundaries and thus normally can't be inlined. They can be inlined if you are using LTO; in particular, the currently unstable -Zcross-lang-lto would allow inlining the Rust-to-C++-wrapper call and thus replicating the performance characteristics of directly calling from Rust to C. But even if not inlined, the overhead of an extra function call is fairly small; I think the main issue with this solution is ergonomics.

I'm a bit confused. The comment you linked to criticizes a proposed solution involving wrapping each API call in a C function that calls setjmp, which is quite slow.

A different solution, which @kornelski also mentioned in this thread, is to wrap each API call in a C++ function that does try/catch

The comment I linked criticizes the approach being discussed, which requires the user to write the _appropriate_ shim for the compiled library they are interfacing. There is typically just a single strategy that makes sense per compiled library (all other alternatives are often just UB), and that's what the shims have to do. There is not really much choice there.

(where the try/catch itself is typically zero-runtime-overhead in the success case – assuming you're not on a weird target that uses SJLJ exceptions!)

MinGW on Windows uses - or can easily use, depending on the MinGW selected - SJLJ. Isn't that a Rust Tier-1 target?

But even if not inlined, the overhead of an extra function call is fairly small; I think the main issue with this solution is ergonomics.

I think the overhead of the shims is acceptable, but I don't think it is easy to reason about what the overhead actually is in practice (as mentioned in my previous comment, the shims add different types of overhead).

The comment I linked criticizes the approach being discussed, which requires the user to write the _appropriate_ shim for the compiled library they are interfacing. There is typically just a single strategy that makes sense per compiled library (all other alternatives are often just UB), and that's what the shims have to do. There is not really much choice there.

The comment you linked says:

@xfix's solution uses per-call setjmp, and my goal is to avoid per-call setjmp. @xfix's solution requires wrapping of all FFI calls, and I'd prefer not to do that.

This presumably refers to @xfix's only comment in this thread sent prior to that comment, which starts with:

About libjpeg, considering the library is written in C, it may make sense to write a C wrapper which passes panics through longjmp.

As for UB, the C++ spec makes longjmp UB only if there are objects with destructors in the stack frames that are skipped over, so there are two strategies that make sense for it: longjmp and exceptions. C compiled with -fexceptions on GNU compilers, or without /EHsc (non-default) under MSVC, behaves like C++.

By the way,

on Windows the C++ exception ABI is not documented at all and the recommendation is not to use them on ABI boundaries because their implementation might change on any toolchain upgrade.

Are you sure about this? I don't have much experience with Windows, but I did some Googling and I wasn't able to find a recommendation to this effect from Microsoft. I was able to find recommendations elsewhere citing:

  • ABI problems when throwing an exception in one DLL and catching it in another, if they have different C++ runtimes – okay, but doesn't affect where you're merely unwinding across another DLL
  • Incompatibility with MinGW – not an issue if you're not using MinGW.

MinGW on Windows uses - or can easily use, depending on the MinGW selected - SJLJ. Isn't that a Rust Tier-1 target?

The Rust MinGW target does not itself support SJLJ unwinding, so an #[unwind(c++-native)] feaure would not have that overhead, but if you build actual C++ wrappers using the SJLJ variant of MinGW, then yeah, they might be slow...

Are you sure about this? I don't have much experience with Windows, but I did some Googling and I wasn't able to find a recommendation to this effect from Microsoft.

This is a general C++ recommendation, not a Microsoft specific one. For an authoritative source, see "C++ Coding Standards" by Alexandrescu and Sutter (ISO WG21 chair, Microsoft C++ language extensions lead) covers this recommendation in "Item 62. Don’t allow exceptions to propagate across module boundaries." (this SO answer contains the "gist" of the advice: https://stackoverflow.com/a/55057868/1422197 ). Platform vendors all mention this in their documentation.

For example, GCC on Linux breaks the C++ABI every now and then, e.g., binaries compiled against libstdc++.so.5 and libstdc++.so.6 (current version) are incompatible with each other. This last breaking change was actually due to an ABI breaking change in the C++ exception implementation from SLJL to Itanium. On Linux, these kind of changes do happen, but Linux goes to great extent to make sure that users largely do not need to worry about them. The last such "breaking" change was the migration to the C++11 ABI, and Linux "mostly" avoided breaking the world (https://developers.redhat.com/blog/2015/02/05/gcc5-and-the-c11-abi/).

This is what MSVC2019 says about C++ABI compatibility:
https://docs.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=vs-2019 . Until MSVC2017 all MSVC releases were ABI incompatible with the previous version. In MSVC2017 Microsoft started giving same ABI version numbers to MSVC releases that were "almost" ABI compatible with each other, allowing _newer_ toolchains to use "C++" DLLs compiled with older ones. MSVC2015 and MSVC2017 are "almost" binary compatible with each other, including the exception ABI. MSVC2019 is incompatible with both, so if you want to link a DLL that throws C++ exceptions, the behavior is undefined. However, if that DLL exposes a C API, and at most uses SEH, then everything works fine.

The difference between Linux and Windows is that Linux breaks the ABI once / twice per decade, and Microsoft does so... at least once every two years.

so there are two strategies that make sense for it: longjmp and exceptions.

This is only the case if libjpeg was compiled with exception support and the FFI code it calls will always be compiled with the same exception ABI. If that's the case, then indeed both options hold and exceptions will be faster. If this does not hold, the only option is longjmp, which also happens to work independently of the exception ABI. To know which approach one should use in a wrapper, one needs to consider this. longjmp is more portable, but has a higher run-time cost in the non-failure path.

The Rust MinGW target does not itself support SJLJ unwinding,

Are you sure? Unwinding through Rust FFI is forbidden, so whether a C++ library with a C API is compiled with mingw-sjlj or mingw-seh cannot be observed from Rust. Therefore I believe Rust pc-windows-gnu targets should be able to link and use SJLJ libraries without issues, unless there is some linker issues that I'm not aware of.

This is what MSVC2019 says about C++ABI compatibility:

The question is not whether the ABI is fully compatible, but whether library A can throw an exception which unwinds across library B before being caught by another function from library A. In practice I'd expect it to work on Windows across multiple versions of MSVC, since the all use SEH and SEH unwinding itself is handled by kernel32.dll.

Are you sure? Unwinding through Rust FFI is forbidden, so whether a C++ library with a C API is compiled with mingw-sjlj or mingw-seh cannot be observed from Rust. Therefore I believe Rust pc-windows-gnu targets should be able to link and use SJLJ libraries without issues, unless there is some linker issues that I'm not aware of.

Indeed, and that is why I said:

if you build actual C++ wrappers using the SJLJ variant of MinGW, then yeah, they might be slow...

The question is not whether the ABI is fully compatible, but whether library A can throw an exception which unwinds across library B before being caught by another function from library A. In practice I'd expect it to work on Windows across multiple versions of MSVC, since the all use SEH and SEH unwinding itself is handled by kernel32.dll

Given:

// library B - MSVC2015
void foo(void(fn)(void)) {
    try {
       fn();
    } catch(std::string const& e) {
        std::cout << "foo" << std::endl;
        std::terminate();
    }
}

// library A - MSVC2019
void bar() { throw std::string{"bar"}; }
void baz() { 
    try {
        foo(bar); 
    } catch (std::string const& e) {
        std::cout << e << std::endl;
    }
}

where libraries A and B are compiled with two MSVC toolchains that are C++ABI incompatible, but use the same underlying unwinding implementation, what does calling baz print?

"C++ Exceptions" != "Unwinding implementation". The layout of C++ types can change across toolchains (e.g. due to a layout optimization or an internal change in the implementation of std::string), the implementation of RTTI, vtables, virtual dispatch can all change, and one can throw an user defined type that has virtual functions, uses inheritance, etc. and one can catch derived exceptions by slicing them as any of their parent base classes.

So I don't think it makes sense to talk about "C++ exceptions" without talking about the C++ABI as a whole. If instead of adding a #[unwind(c++-native)] or similar, we add a #[unwind(seh)] which raises a panic as a SEH (which is not the same as a C++ exception), then as long as that SEH does not collide with any other C++ exceptions then everything "should be fine" (even though this is still probably a bad idea). If the SEH raised collides with a C++ exception, and some C++ code catches it under the assumption that it is some other C++ type, then BOOM.

We have the same problem on Linux. A C++ exception type of libc++ does not need to have the same layout as the same exception type of libstdc++, so if we are re-raising panics as C++ exceptions on Linux, we'd need to be careful about the incompatibilities in ABI as well even though the underlying exception handling implementation is the same (itanium for both).

EDIT: as long as the FFI code does not try to catch anything between the Rust code that raise the panic, and the Rust code that catches the panic, everything should be "ok".

EDIT: as long as the FFI code does not try to catch anything between the Rust code that raise the panic, and the Rust code that catches the panic, everything should be "ok".

That's the case that we (are trying to) primarily care about here. Obviously "full" control would be great in the long term, but short term we for a FFI that promises to both a) be unwind safe and b) not catch unwinding, allow unwinding through it. I personally can't think of _any_ unwinding implementation that should have any reason to break there (obviously a Return sugaring wouldn't work but I don't consider that true unwinding).

This guarantee requires control over the compiling of the library or at the very least the compiler to uphold those promises with the correct compile flags.

The #[unwind(allowed)] attribute is not working for me. I'm attempting to unwind a c++ exception through an ffi barrier into rust and then across another ffi barrier back into c++, but I'm getting weird illegal instruction crashes, but only on nightly. Stable works just fine.

Quoting https://github.com/rust-lang/rust/issues/58794#issuecomment-479339046:

Process question: Should we revert this on nightly too so we don't have to keep reverting this until the final decision has been made?

What's the @rust-lang/lang team feeling on this? Should we revert on master and backport for 1.35 beta? Or leave the abort behavior on master / nightlies as a warning shot, but keep reverting on each beta until the decision is final?

@lachlansneff

I'm getting weird illegal instruction crashes, but only on nightly.

That's probably the ud2 trap instruction, which is the raw mechanism to abort on x86. I guess you need the attribute in more places? Or maybe it's just not sufficient yet...

@lachlansneff To avoid confusion, that's a very different case than the one I've been talking about. Unwinding over Rust is much more complicated, and currently probably won't work due to conflict between unwinding mechanisms (and where there's no conflict, it'll leak all the memory). This doesn't apply to the case of unwinding over C that doesn't have any own unwinding mechanism and doesn't have destructors.

So with the minimal change I'm advocating for, your case still won't work.

@cuviper @kornelski I actually got it to work. @cuviper was right, I needed the attribute in more places.

There are no destructors that need to be called in my case, so I'm calling it good enough.

@gnzlbg @CAD97

So, for terminology's sake: if something is defined by the implementation, but is not described by the standard as implementation-defined, it's not Implementation-defined. So what is it?

intro.def defines those terms, worth a read.

Per the C++ definitions:

  • Anything that the standard doesn't impose restrictions on is "undefined", even if particular implementations (or every existing implementation!) defines and documents it. (Side note: I'm not convinced this specific use of the term "undefined" is helpful, so I would not advocate that Rust adopt it.)
  • Anything the standard requires implementations to define but not necessarily to document is "unspecified".
  • "Implementation defined" behavior is required by the standard to be documented.

Note that it has been a while since this was previously discussed. Is development or discussion towards a solution taking place somewhere?

@kornelski?

I'm not sure what's the current status and whether the Rust team has made any decisions about this.

As for me the status is:

  • I still use Rust → C → Rust unwinding via panic!() → C → catch_unwind().
  • I have an option of making an intrusive rewrite to use C++ → C → C++ instead via throw → C → try/catch. On Windows (only) C++ throw can be replaced with SEH.
  • It's still unclear to me whether C++ → C → C++ unwinding is UB or not.
  • I don't have any use for C++ → Rust or Rust → C++ unwinding that was discussed at length here.

@kornelski

I've asked a question on SO about whether such unwinding is UB in the limited case where the C code is compiled with -fexceptions. I am further assuming that the C++ and C code will always be compiled with the same LLVM version.

@kornelski You may have seen this already, but on the Internals thread, @gnzlbg mentions another alternative, using C wrappers instead of C++ wrappers. This seems to be unambiguously well-defined.

https://internals.rust-lang.org/t/unwinding-through-ffi-after-rust-1-33/9521/67?u=batmanaod

On Tue, Apr 23, 2019 at 06:18:00PM +0000, Kornel wrote:

I'm not sure what's the current status and team decisions are about this.

As for me the status is:

  • I still use Rust → C → Rust unwinding via panic!() → C → catch_unwind().
  • I have an option of making an intrusive rewrite to use C++ → C → C++ instead via throw → C → try/catch.

The ideal goal, here, would be if you could help specify a pure-Rust
alternative that doesn't require modifying the underlying library. I
don't know what form that would take; I've seen suggestions for an
unwind-related attribute, for instance. The goal would be to get
something specified that will actually work for your use case, so that
we have an alternative.

@BatmanAoD the problem with the C wrappers approach like @joshtriplett mentions is that it is a pain.

@kornelski

An MVP could just be an #[unwind(Rust)] attribute that allows Rust panics to propagate. @kornelski would need it on the extern "C" fn callbacks and the extern "C" { ... } declarations. That would just allow Rust panics to propagate, removing the nounwind assumption only on those, leaving the nounwind attribute for most FFI which does not need to unwind enabled by default. If C code called from Rust unwinds with something that isn't a Rust panic, the behavior is undefined.

This would allow Rust->Rust interfacing via extern "C" FFI, e.g., for DLLs using a "C-like" ABI, and it fixes the soundness bug of nounwind functions unwinding, so it would be a feature that might be considered to stand on its own.

The Rust panic implementation remains unspecified, and whether the language at the other side of FFI can unwind Rust panics or not isn't really our problem, e.g., Rust->Rust FFI with two incompatible Rust toolchains might not work, and if the Rust implementation changes @kornelski use case might break.

This might mean that @kornelski might need a better solution in the future (#[unwind(c++)] attribute, global target options for the Rust panic implementation, etc.), and they should not wait for things to break to start working on pre-RFCs for those, but with #[unwind(Rust)] things would continue to work and end up a bit better than they are now.

In my case I'm not worried about having to deal with multiple different Rust toolchains, because the same code sets up the panicking callback and catches it. So #[unwind(Rust)] would work well. #[unwind(c++)] is not useful in my case.

[unwind(c++)] is not useful in my case.

What would be required in your case for this to always work independently of how Rust panics are implemented? (what exception ABI does your C code use?)

Oh, I assumed it doesn't matter what C "uses", because it doesn't have its own unwinding, so it's entirely passive here.

For C it would not matter, but IIUC you are not using C, but "C with -fexceptions".

I'm building the C part in my sys crate, so I'm able to add/remove compile flags as necessary.

The -fexceptions is documented as "this implies GCC generates frame unwind information for all functions", so it seems fine?

There are many different incompatible frame unwind information formats, and GCC supports many of them (dwarf2-eh, sjlj, seh, etc.). #[unwind(Rust)] tells Rust to use the Rust format, and as long as you compile the C code to match Rust's format, then sure that's fine.

My point is that we can change Rust's format any time, and if that's changed, then you'd need to recompile your C code to use that new format, potentially having to submit a GCC patch for it first.

If you want to avoid doing that, you might better off writing #[unwind(FOO)] where FOO is the format that Rust panics need to be converted from/to. As long as FOO is the same as Rust, this should be free, but if the Rust implementation changes, then you'd start paying a conversion cost, but at least your software would still work correctly instead of silently having undefined behavior (we can't diagnose this from Rust side). Chances are that "C + -fexceptions" does by default whatever C++ does natively in the target, so #[unwind(c++)] could be used for that.

@gnzlbg @kornelski

Unwinding is not possible without the right metadata, so I believe that no matter what Rust does, Kornel's use case in a non-Windows, non-SEH environment is only well-behaved when -fexceptions is used.

There are many different incompatible frame unwind information formats, and GCC supports many of them (dwarf2-eh, sjlj, seh, etc.).

This is a subtle point, and one I didn't realize, since it is not mentioned in the -fexceptions description. The -fexceptions documentation would seem to indicate (for both GCC and LLVM) that the same mechanism is used as would be used for C++, whatever that happens to be for the given target/options/etc. (That is, I think your guess that it "does by default whatever C++ does natively" is in fact the documented behavior; indeed, that appears to be the entire point of the feature, since it doesn't actually add any means of throwing exceptions to the C language.)

So it would seem that for this to be well-behaved, Rust must use the same mechanism as C++. On Windows with MSVC, this would presumably be SEH. Otherwise, it would be whatever Clang/LLVM would use.

Since we only have 8 weeks left till abort is enabled again, I think that just starting with #[unwind(Rust)] would be enough.

This would mean that @kornelski's code using #[unwind(Rust)] would be brittle, but less brittle than what it is today. Adding more #[unwind(...)] attributes that makes @kornelski use case more robust can be done later, maybe as part of the WG-FFI roadmap for this year, but at least we would be able to work on that without the time pressure.

On Thu, Apr 25, 2019 at 07:50:42AM +0000, gnzlbg wrote:

Since we only have 8 weeks left till abort is enabled again, I think that just starting with #[unwind(Rust)] would be enough.

This would mean that @kornelski's code using #[unwind(Rust)] would be brittle, but less brittle than what it is today. Adding more #[unwind(...)] attributes that makes @kornelski use case more robust can be done later, maybe as part of the WG-FFI roadmap for this year, but at least we would be able to work on that without the time pressure.

This sounds reasonable to me.

Would someone be willing to write up an RFC for #[unwind(Rust)]?

@gnzlbg Hmm. Since the Rust unwind implementation is not specified, isn't adding an annotation that explicitly exposes a potentially non-forward-compatible API fairly dangerous? I know it'll be unsafe, and that it will only break if the panic implementation changes to not use the platform-default stack unwind, but it still seems less helpful than something like unwind(exception) (to match the -fexceptions/SEH terminology) or unwind(c++) (which make it easier to guess the meaning of the annotation for people who aren't already familiar with it).

Since the Rust unwind implementation is not specified, isn't adding an annotation that explicitly exposes a potentially non-forward-compatible API fairly dangerous?

Doing Rust->Rust via FFI is a reasonable thing to do (e.g., DLLs, hot reloading in games, ...), Rust does not have a stable ABI, and for all we know that might never happen and extern "C" + #[unwind(Rust)] could be the only ABI Rust will ever get (as long as all toolchains match, etc.).

Rust programs are already doing that, this code would break if we abort in C FFI, and this is the problem that #[unwind(Rust)] solves. It does not intend to solve the Rust->X->Rust problem, it just so happens that, by accident, it would allow those doing Rust->X->Rust to continue to "work" as long as their assumptions about implementation details still hold.

My priority is for the soundness bug to get fixed without regressing on Rust->C FFI and #[unwind(Rust)] allows that. I believe Rust->X->Rust is a problem worth solving, but I also believe that the problem won't be solved in the reasonable future because no stakeholders are working on it.

Would someone (@kornelski, @centril, @comex) be willing to write up an RFC for an unwind(Rust) attribute, to give us a path forward for unwinding past FFI functions? I'd be happy to review and sponsor such an RFC.

@joshtriplett I'll take at a stab at it. Not having written an RFC before, is there anything I should read first other than the template? Also, should I just leave the "RFC PR" and "Rust issue" out of my drafts for now?

Edit: here's what I have so far: https://github.com/rust-lang/rfcs/blob/annotate-unwind-rust/text/XXX-annotate-unwind-rust.md

@joshtriplett @gnzlbg @kornelski

I've finished a draft of the RFC. Please take a look! https://github.com/rust-lang/rfcs/blob/annotate-unwind-rust/text/XXX-annotate-unwind-rust.md

I assume that I submit this just by changing "XXX" and "???" to have the next number in the RFC sequence, and then opening a PR?

On May 10, 2019 6:27:49 PM EDT, BatmanAoD notifications@github.com wrote:

@joshtriplett @gnzlbg @kornelski

I've finished a draft of the RFC. Please take a look!

I assume that I submit this just by changing "XXX" and "???" to have
the next number in the RFC sequence, and then opening a PR?

You can leave it templated, and the person who merges it will number it.

So what's the timeline for this?

Iff the ~12 weeks started April 04, we have ~5 weeks left. There is an RFC, but iff FCP will take ~2-3 weeks, then this RFC will need to go into FCP by the end of next week. The timeline will still be tight if the solution needs to be implemented, minimally tested by the stakeholders on nightly, and go into FCP for stabilization (which will also take ~2 weeks).

Once we have a proposal in the works and accepted, I think it'd be
reasonable to tie the re-enabling of abort-on-unwind-through-FFI to the
availability and use of this feature, so that they become stable at the
same time.

What's the status of this FCP ? The 12 week grace period expired a couple of weeks ago with little to no progress. The only documented concern is a lack of a documented replacement. I've documented two replacements here: https://github.com/gnzlbg/cffi-panic . These contain building and working examples of how to properly wrap horribly designed C APIs such that they can be used in Rust->C->Rust FFI without invoking undefined behavior, both using unwinding/exceptions and setjmp/longjmp.

If there are any other cases that should be documented, please let me know. It shouldn't be too hard to do so, but AFAICT this should resolve the concern. Is anything else blocking this?

The 12-week pseudo-deadline was intended to ensure forward progress, learning from the last time this came up. It had that desired effect, and generated an RFC, RFC 2699, which is currently in progress. (@Mark-Simulacrum, could you please add a link to that RFC in the top post of this issue?) Thus, the status is currently "let's make sure that RFC continues to make forward progress".

@rfcbot concern blocked-on-rfc-2699

Once that RFC goes through and gets implemented, we can document it as a direct replacement for the existing functionality, and then schedule an appropriate transition period after which we go back to aborting on unwind through FFI (when the appropriate attribute isn't present).

I appreciate your attempt to document how to rewrite the previous solutions that use Rust code to use C or C++ wrappers instead. However, as stated multiple times through many different threads of discussion, that is not a reasonable replacement for what previously worked with no additional C/C++ helpers, and it's incredibly onerous if you have numerous callbacks. The point of the concern here is that we don't yet have a replacement for the existing functionality yet, and the repository you linked to doesn't change that.

Any response along the lines of "but what people are doing today isn't a solution because undefined behavior" should see this post from @RalfJung. In particular, I'd like to quote this highlight:

I hope I did not give the impression that when a soundness issue occurs, it will be closed and shut at all cost and disregard valid concerns. I am well aware that there are trade-offs here, and while I personally feel very strongly about soundness, I sympasize with developers that just need to “get something done”.

I feel very strongly about soundness as well, and I hope that I have not given the impression that I don't care about the underlying issue. Quite the contrary; I do want this fixed, which is why I'm working to make sure RFC 2699 makes forward progress.

RFC 2699 will have a much easier time making forward progress if we can agree on the goal of being able to continue interfacing with these types of C APIs from pure Rust, and we're just trying to figure out the best method of enabling that.

This soundness hole was reported years ago. The lang team knows that it introduces undefined behavior on all extern functions in the presence of panics, and that this causes us to misoptimize user programs. Delaying the fix for years due to a single (or a couple) of crates which knowingly decided to exploit undefined behavior is a horrible attitude toward all other Rust users. Continuing to silently emit nounwind is just malice at this point (*). Needless to say, I'm truly disappointed.

(*) EDIT: if the plan is to delay a proper fix for an indeterminate period of time we should, at the very least, stop emitting nounwind. This would impact all Rust users using extern functions - that is, _all_ Rust users, which are calling functions like memcpy every time Rust moves a variable - but at least we wouldn't be miss-optimizing their programs.


Any response along the lines of "but what people are doing today isn't a solution because undefined behavior" should see this post from @RalfJung.

The use case @RalfJung is discussing in that thread has no Rust alternative. The users exploiting undefined behavior here have a safe documented Rust alternative, yet they are explicitly deciding to invoke undefined behavior instead. So no, these are not the same situation at all.

The 12-week pseudo-deadline was intended to ensure forward progress, learning from the last time this came up.

What we said was:

We'd like to set a reasonable deadline for a stable release making undeclared unwinds through FFI abort, something in the ~12 week range.


However, as stated multiple times through many different threads of discussion, that is not a reasonable replacement for what previously worked with no additional C/C++ helpers, and it's incredibly onerous if you have numerous callbacks. The point of the concern here is that we don't yet _have_ a replacement for the existing functionality yet, and the repository you linked to doesn't change that.

I disagree with this description. @gnzlbg's repository is both well documented and it does not seem "incredibly onerous" at all and if you have numerous callbacks it seems to me that things could be automated through code generation (e.g. a bindgen-like approach).

Any response along the lines of "but what people are doing today isn't a solution because undefined behavior" should see this post from @RalfJung. In particular, I'd like to quote this highlight:

I hope I did not give the impression that when a soundness issue occurs, it will be closed and shut at all cost and disregard valid concerns. I am well aware that there are trade-offs here, and while I personally feel very strongly about soundness, I sympasize with developers that just need to “get something done”.

I don't think "at all costs" applies here. @gnzlbg has kindly stepped up to provide a solution which should work in the short term while we can take our time to consider an in-language option.

RFC 2699 will have a much easier time making forward progress if we can agree on the _goal_ of being able to continue interfacing with these types of C APIs from pure Rust, and we're just trying to figure out the best method of enabling that.

It depends... I'm not in favor of the goal at any cost if the solution is too invasive or carries too much complexity with it when we are talking about interfacing with legacy code.

Another consideration is that even in the case of mozjpeg-sys, the library that motivated the decision to hold off (for the second time) on stabilizing the abort behavior, the crate still exhibits problematic behavior (possibly due to nounwind):

I’m currently using mozjpeg-sys as well as a C shim to the libjpeg API, but even with the error handler in C the setjmp/longjmp behavior leads to horrible-to-debug problems.

Aborting would be strictly better here, even without a guarantee that Rust will some day support a way to safely cause another language's stack frames to be unwound.

Additionally, @kornelski, the author of that crate, wrote:

For the libjpeg case, if the binary wants to abort, that's perfectly fine. libjpeg's default error handler even calls abort() itself. libjpeg only cares that the error handler function doesn't return normally, not whether it does it via abort or unwind.

If I understand this correctly, it means that mozjpeg-sys would be useful and usable even if it aborted on errors, although that would not be ideal.

If I understand this correctly, it means that mozjpeg-sys would be useful and usable even if it aborted on errors, although that would not be ideal.

I think that it means that as long as the unwind vs abort behavior is controllable by the binary, then it's acceptable. For some binaries aborting on a JPEG error is okay, but for others it is not acceptable.

@Centril

if you have numerous callbacks it seems to me that things could be automated through code generation (e.g. a bindgen-like approach).

Those tools don't exist today, and I don't think that's going to change in the immediate future. So, this can't be automated today.

I'm not in favor of the goal at any cost

I never said "at any cost". We're currently working to produce a minimal solution that should have zero impact outside this use case.

The simplest solution seems like an attribute to turn off the "nounwind" LLVM attribute on an extern function, to address undefined behavior. There's a lot of discussion on how to better specify panic behavior and exactly what kind of unwinding we do, but in the interim, I wonder whether we could quickly address the underlying undefined behavior and if that would make people happier about taking a little more time to solve the "panic runtime" aspects of the RFC?

Moderation note: Let's please try to keep our cool folks. It's understandable that some may feel disappointed here---which is a totally valid thing---but let's keep commentary more productively focused instead of accusing others of malice.

I wanted to apologize. I was called out by using the unconstructive word malice here. In that sentence, by us, I meant rustc, the tool, as in, it would be "evil" for rustc to continue emitting nounwind if we aren't going to land a fix for this soon.

@joshtriplett I agree with you about just trying to remove nounwind. There is precedence about doing this, e.g., when LLVM was miscompiling noalias, we stopped emitting it, for a long time.

In retrospect, we have had a mis-optimization that impacts all Rust users and a PR to fix it for over a year. During this whole time, we always had the chance to just stop emitting nounwind and protect all Rust users. Instead, we didn't do that. First, because the fix was landing "real soon", so why waste time on a workaround. Second, because we didn't wanted to land something that would justify to continue to exploit the undefined behavior. For some reason, for some of us, banning the exploit of UB here was more important than protecting all Rust users. Both were mistakes, and I hope that we can learn from them and, in the future, try to land the quickest fix possible that protects all users first, and only start worrying about proper solutions once that is done. You never know when a simple fix is going to take 1 day or 1 year to land.

@gnzlbg Thank you for taking the time to explain in more detail, and thank you for taking the time to understand the problem and past stresses involved.

It sounds like we're now in agreement on the right ways forward here?

Let's introduce the simplest possible mechanism to just remove nounwind from a function. I don't know if that should be a simplified version of @BatmanAoD's RFC 2699, or if we should just introduce a new RFC, but either way we should get this in place ASAP.

Once we have that in place, we also need to decide how long after that we start aborting again on unwind of an extern function call tagged with nounwind. At this point, with the amount of attention this has received, I wouldn't have an objection to just doing so at the same time, and providing some very clear release notes explaining the issue.

@rust-lang/lang and @gnzlbg: Do you believe that the easiest path forward is a new RFC or some updates to RFC 2699? I'm starting to think that a new RFC might be better, and then we can leave RFC 2699 for the problem of standardizing panic/unwind ABIs.

@joshtriplett Note that the #[unwind(allowed)] annotation already exists; an RFC would just need to stabilize it.

Having recommended in my RFC that the attribute itself be dependent on the compiler's ability to verify that the panic implementation has been explicitly selected (rather than using the default, which may be a moving target as rustc continues to evolve), I'm not sure I'd be in favor of actually stabilizing the attribute as-is. But it would be sufficient to permit mozjpeg-sys et al. continue to work, I think.

@joshtriplett I don't want to create a divide where there is none, but it seems to me that @gnzlbg is advocating to just stop emitting nounwind to LLVM altogether while we wait for a solution. I don't see them expressing an opinion on stabilizing (or removing) the unwind(allowed) attribute.

@BatmanAoD I would like to see your RFC continue, to develop a more precise specification of panic behavior. But I'd also like to see #[unwind(allowed)] stabilized more-or-less as-is, as soon as reasonably possible.

@jethrogb If we can't trivially get #[unwind(allowed)] stabilized, then I'm all for fixing the problem that way, but if we can get #[unwind(allowed)] stabilized quickly then let's do that.

@joshtriplett Here is the tracking issue for stabilizing #[unwind(allowed)]: https://github.com/rust-lang/rust/issues/58760

For those watching, @joshtriplett just proposed a simple #[unwind] specification.

@rfcbot resolved blocked-on-rfc-2699
@rfcbot concern blocked-on-rfc-2753

Summarizing much further discussion: RFC 2699 is a much more general "how do we handle cross-language panics/unwinds/etc", while RFC 2753 is a much simpler proposal to just provide an #[unwind] attribute to disable nounwind and fix the UB.

@rfcbot fcp cancel

So a lot has happened since this FCP was created. In RFC https://github.com/rust-lang/rfcs/pull/2797, we created the ffi-unwind project group, for one thing. One of the first things we've been doing is debating this exact topic in some depth. Therefore, I'm going to cancel this FCP and close this issue -- we expect to try and resolve this issue very soon with a firmer decision about the direction we're going, and we'll re-evaluate in light of that.

@nikomatsakis proposal cancelled.

If you'd like to participate in the group, btw, most of the discussion has been taking place in Zulip in the #wg-ffi-unwind stream.

Was this page helpful?
0 / 5 - 0 ratings