The recover
function is able, on a single thread, to recover from a panic. There are numerous controversies around this API, including the bounds it applies to its closure, its safety, and its location within std
. An open RFC seeks to settle these issues and open the door to stabilization.
cc @alexcrichton
Iām playing with making Python bindings for a Rust library, and I got some segfaults when a Rust panic tried to unwind into the Python interpreterās call stack. I understand that using catch_panic
at the FFI boundary is the right thing to do, but what should I do with an Err
result? I can establish conventions so that the other side triggers a Python exception, but Iād like to have some more helpful information in it than "Something bad happened somewhere". At least (a string representation of) the message that was given to panic!
, ideally also a back trace of the unwinded stack frames.
catch_panic
returns Result<T, Box<Any + Send + 'static>>
, but Any
is not very helpful here. It implements Debug
, but that impl only prints "Any"
(literally), which isnāt that helpful.
@wycats, how do you deal with panics in Rust-inside-Ruby?
@SimonSapin Looking at the guts of unwinding, Box<Any + Send + 'static>
is always going to either be String
(if panic!()
had formatting arguments) or &'static str
(if panic!()
was invoked with just a string literal), so you basically downcast it to one, and if that fails you can probably assume it's the other.
I agree, though, the type could be a lot more informative here.
Yeah, I ended up finding this code: https://github.com/rust-lang/rust/blob/d2047bc97d/src/libstd/panicking.rs#L33-L39 , but itās not very discoverable. And though String
and &'static str
are the most common cases, panic!()
with a single argument can take any type:
SimonSapin: panic!(4)
playbot: thread '<main>' panicked at 'Box<Any>', <anon>:9
playbot: playpen: application terminated with error code 101
So I wanted to propose changing from this (in std::thread
):
type Result<T> = Result<T, Box<Any + Send + 'static>>;
to this:
type Result<T> = Result<T, PanicValue>;
struct PanicValue(Box<Any + Send + 'static>);
impl PanicValue {
fn as_str(&self) -> Option<&str> {
match self.0.downcast_ref::<&'static str>() {
Some(s) => Some(*s),
None => match self.0.downcast_ref::<String>() {
Some(s) => Some(&s[..]),
None => None,
}
}
}
}
⦠but std::thread::Result
is already stable :(
Would this as_str
belong as a default method on the Any
trait?
Why not have impl Debug for Any (+ Send)
test for the value being String
or &'static str
before printing "Any"
? It would add negligible overhead, and I don't think anyone should be relying on the exact value it prints. Right now, it's pretty much useless anyways.
In fact, why not cover all the primitives (except obviously &[T]
, *const T
, *mut T
, etc)?
Why are we making it easier to use and abuse panics, instead of removing inappropriate ones from the standard library?
@alexchandel Even if you donāt choose to use panics, itās very hard to be 100% sure that a non-trivial program will not panic ever. (Similar to being 100% sure that it doesnāt have any bug.) And since unwinding through an FFI boundary can be undefined behavior, something like catch_panic
is necessary at least to protect against that.
Why are we making it easier to use and abuse panics,
There is a whole "Motivations" section of the related RFC which describes this https://github.com/alexcrichton/rfcs/blob/stabilize-catch-panic/text/0000-stabilize-catch-panic.md#motivation
About panic::recover
: need of Mutexes is limiting usage of this new feature very much. Or you can share resources between threads (without making whole program single-threaded by Mutex), or you can use panic::recover.
@e-oz Why would you need a Mutex
? You can use any cell type if you know the interior data won't be corrupted by a caught panic. The AssertRecoverSafe
wrapper was designed for exactly that.
@cybergeek94 after reading this: https://github.com/michaelwoerister/rust/commit/f899ea9d1d98734ff3122c4129de3d99deb00d52#diff-f0a475ebd45e8b84e91e006a89b8c321R79
@e-oz That merely explains the difference between RefCell
and Mutex
when talking about recover-safety. RefCell
wasn't designed with poisoning because it's usually not possible to access it after a panic, and it'd be a breaking change to add it now. Those same docs mention AssertRecoverSafe
in a later paragraph for use when you either don't care about recover-safety or can guarantee it yourself.
@cybergeek marker trait will just disallow to use without Mutexes.
@e-oz You can wrap your type in AssertRecoverSafe
or manually implement RecoverSafe
and RefRecoverSafe
for it if it uses some RefCell
internally but won't have logic problems if it sees a panic and continues being used.
You can also put an &mut
reference in an AssertRecoverSafe
if you want to mutate your type within the closure, as mutable references aren't generally recover-safe.
These new marker traits aren't enforced like Copy
is, where you can't manually implement it unless your contained types are Copy
, nor are they unsafe like Send
or Sync
. Bypassing them won't (or shouldn't) create the risk of undefined-behavior; they're just to help prevent users from unknowingly introducing really frustrating bugs in their code.
@cybergeek94
These new marker traits aren't enforced like Copy is, where you can't manually implement it unless your contained types are Copy
After reading docs (in comments) I think it's indeed enforced:
a struct is recover safe if all of its components are recover safe
https://github.com/michaelwoerister/rust/commit/f899ea9d1d98734ff3122c4129de3d99deb00d52#diff-f0a475ebd45e8b84e91e006a89b8c321R70
And I really hope I just don't understand something and you are right and we will be able to use panic::recover
AND shared resources (with multithreading).
@e-oz https://github.com/michaelwoerister/rust/commit/f899ea9d1d98734ff3122c4129de3d99deb00d52#diff-f0a475ebd45e8b84e91e006a89b8c321R113
@sfackler thanks, will try.
I've added a nightly
feature to rust-openssl that uses panic::{recover,propagate}
: https://github.com/sfackler/rust-openssl/commit/fd6454f625bb7efde3772ae106180cc6ac7c02dd.
~75% of the code and ~90% of the implementation time went into "dealing with" RecoverSafe
. It really does not seem like a good trade off.
@sfackler hm interesting, although that mostly seems like it's to straddle the nightly/stable boundary? It doesn't look like there's that much code related to the RecoverSafe
bound itself if that's all excluded
Without RecoverSafe
, the recover
portion of the change would just be
match state.stream.read(buf) {
Ok(len) => len as c_int,
Err(err) => {
if retriable_error(&err) {
BIO_set_retry_write(bio);
}
state.error = Some(err);
-1
}
}
to
match recover(|| state.stream.read(buf)) {
Ok(Ok(len)) => len as c_int,
Ok(Err(err)) => {
if retriable_error(&err) {
BIO_set_retry_write(bio);
}
state.error = Some(err);
-1
}
Err(err) => {
state.panic = Some(err);
-1
}
}
That is a pretty nice, ergonomic delta for something that you have to do as a developer wrapping C in a robust library.
With RecoverSafe
, my workflow was then
RecoverSafe
let result = { let state = AssertRecoverSafe(state); recover(|| state.stream.read(buf)) }
&mut [u8]
isn't panic safe. Add in let buf = AssertRecoverSafe(buf);
and change state.stream.read(buf)
to state.stream.read(&mut *buf)
.&mut AssertRecoverSafe
isn't panic safe, so slap on a move
and the thing finally works.Given that this kind of thing is, I believe, the primary use case for recover
, this seems like a rather significant obstacle to actually using it. If it's too annoying to use, people simply _won't_ and users of their library will be in a bad place if something panics in the wrong place.
Hm yes that's an interesting sequence of events. The step of adding move
seems like it should be unnecessary, not sure if we can get away with it though.
Otherwise I'm somewhat ok with the level of friction vs what's happening here. It seems reasonable for you to explicitly need to assert the state is recover safe, and somewhat tangentially the buffer is as well (e.g. the data inside may be corrupt).
This is an interesting use case though which I don't think has been explored much before of Rust -> C -> Rust (e.g. Rust passes callbacks to C which then call into Rust). There's very little "need for exception safety" in this situation because you're just going to propagate the exceptions around the C boundaries. It appears in this case RecoverSafe
may just get in the way because it's mostly geared towards _catching_ exceptions...
So thinking about this more, I see 3 main use cases for recover
:
In the first case, as you've noted, recover safety doesn't matter since we're going to be restarting the panic anyway.
In the second case, recover safety arguably doesn't matter since theoretically bogus internal state after a panic is no worse than the undefined behavior of letting the panic hit C. You could alternatively abort the entire process, but that seems like "bad library manners" and you don't need recover
for that in any case.
In the case of a thread pool, the jobs are going to commonly be one-offs so recover safety won't factor into it since you can move the job into the recover closure.
This has been brought up before, but keep in mind that in all of these cases, thread locals are a potential source of recover safety issues that aren't covered by RecoverSafe
, as is code running in destructors during the unwinding inside of recover
.
What's the use case in which RecoverSafe
will help someone discover that what they were trying to do has issues? It feels to some extent like the "making unsafe more safe" proposals that pop up every once in a while that tend to boil down to throwing boilerplate at people working with unsafe code.
I somewhat disagree that exception safety is not an issue in the second or third cases you mention above, you can certainly envision situations where you want to make sure that corrupt state isn't witnessed by accident. For example with C -> Rust, the C code may call back into Rust to finalize an object (or finish some request) after an operation which panicked, and if the point of catching the panic didn't indicate that the mutable state could be witnessed that could in theory be bad.
Also, there are indeed vectors to subvert RecoverSafe
, but that's why the function isn't unsafe
. The intention was for it to be a "speed bump", not an ironclad guarantee.
Finally, along the lines of being a speed bump, it was definitely not the intention that RecoverSafe
adds _that_ much overhead. I was thought that you'd simply throw a wrapper around any variables you cared about and be done with it, but if it's more involved than that it may be the case that this is too heavily geared towards the "just throwing boilerplate at you" situation indeed.
It's also worth bearing in mind that the same design may be perceived as tedious boilerplate by someone who's fully cognizant of the involved issues and knows what they are doing, and as a helpful and important guard-rail by someone who isn't yet. (Emphasis on _may_ - that's a general observation which may or may not apply to this specific case.)
~75% of the code and ~90% of the implementation time went into "dealing with" RecoverSafe. It really does not seem like a good trade off.
I tried to use recover
for the first time over the weekend and the types I needed to capture (ThreadPool) was not RecoverSafe. I gave up and wrote a FIXME instead.
I see 3 main use cases for recover
My use case was just that I wanted to report that the thread panicked. I wanted to do this with channels and not JoinHandle because I have more flexibility with channels (I needed something like select for JoinHandle). I didn't want to construct a Drop
implementation just to do it, but some kind of defer
function would also have worked (though I wouldn't have been able to capture the panic reason).
Another case for panic is protecting event loops - imagine a webserver (like I was working on) responding to events, some of which are flaky. In this case you probably will be passing around some state associated with the event loop that could get corrupted.
The intention was for it to be a "speed bump", not an ironclad guarantee.
That we have a notion of static recover safety to ensure logical consistency, but don't guarantee it, could be very misleading.
@brson Could you expand on that thought? What (negative) scenario do you envision?
What (negative) scenario do you envision?
I don't envision anything in particular beyond what was already mentioned about TLS not being recover-safe. If a coder jumps through hoops to make their code recover-safe, they expect it to be true, but there's a loophole.
:bell: This issue is now entering a cycle-long final comment period :bell:
The libs team discussed this in the triage meeting this past Wednesday, and we had quite a few things to say about this function! We're sympathetic to experiences like those @sfackler and others are having where RecoverSafe
is a bit _too_ much of a road block, so there are a few directions we can go in from here. The libs team was primarily considering one of three options:
The FCP for this API is neither for stabilization or deprecation this cycle, but we would rather like to mainly emphasize soliciting feedback about this API. It's quite important for the FFI use case of Rust and it's something that we need an answer for!
Along those lines, I'm going to cc a bunch of people who either participated in the RFC discussion or appear to be using this on crates.io. If you'd like to unsubscribe from the discussion though you may certainly feel free to! Also note this is by no means intended to be an exhaustive list, so everyone should feel free to jump in and discuss!
cc @rust-lang/libs, @pythonesque, @tomaka, @kentonv, @glaebhoerl, @bluss, @jamwt, @mitsuhiko, @hansihe, @dpc, @sfackler
For the record, I'm still in favor of making both TLS and the Drop
trait unsafe and removing AssertRecoverSafe
, just because of this issue.
Exception-safety is one of the reasons why many critical apps & libraries choose C over C++. It's not something to take lightly.
I don't like the current compromise, but it's still better than ignoring it entirely.
Naively, I'm wondering if a possible compromise might be to drop the bounds on recover
and enforce recover-safety with a lint which warns, by default, for types captured in the passed closure which might not be recover-safe, such ones containing RefCell
or &mut
references. The lint could then be silenced by marking user types (or the expression containing recover
) with #[recover_safe]
(and maybe #[ref_recover_safe]
and #[mut_recover_safe]
), or crate-wide with #![allow(recover_unsafe)]
.
That way we get the intended "speedbump" effect without sacrificing ergonomics, or making any promises we can't (or don't intend to) keep.
As for TLS, what if we destroy it by default at the end of unwinding, so it gets reinitialized when next accessed, and add a way to define TLS which can survive this process when needed?
@tomaka
making [ā¦] the Drop trait unsafe
The point of unsafe
is that some invariants that are needed for safety can not be verified by the compiler, so users assert that theyāve verified it themselves. What invariants would a Drop
impl need to maintain to be safe?
āDonāt use Drop
everā is not tenable.
@SimonSapin
It would be safe to implement Drop
if the RecoverSafe
trait is implemented on the type, but I'm not sure how doable this is on the language level.
Alternatively, Drop
could remain safe but not be called during the unwinding.
Since the leakapocalypse, I have changed the way I see the Drop
trait. You can't use it for anything important, because it could just be ignored. Therefore I don't think it's that extreme to not implement Drop or to not call it.
Unwinding would be basically useless if it didn't run destructors. If you catch it then you're almost certainly leaking OS resources. If you don't catch it then you're just aborting anyway.
@tomaka What I can see, you should link it with ownership. "The owner is responsible for freeing a resource". And only the owner can suppress drop of a value.
The libs team was primarily considering one of three options:
- Consider the bounds stable as-is today, and continue moving forward with them.
- Consider the bounds stable for now, but concede that we may eventually remove the bounds after stabilizing the function itself if the bounds are too onerous.
- Remove the bounds for now, and consider stabilizing a bare signature.
I'm in favor of 2. Bounds can be loosened without breaking BC, and even a too-restrictive bound is still a step forward from no stable solution at all.
Remember also that part of the impetus here was for the existence of a bound to send a message that we don't intend this function to be used as an alternative mechanism for recoverable error handling (though to that end I still wish @alexcrichton had chosen a name other than recover
... (still better than catch
, anyway)*). Having the bound be temporary would still suffice to send that message, while also giving the community time to develop idioms for its proper use that will prevent abuse later (and I suppose also giving time for the try!
sugar to arrive to keep people happy with ADT-based error handling).
* Last ditch hail-mary bikeshed for renaming it to guard_from_unwinding
!
@tomaka
I think that splitting Drop
to a safe Drop
trait that is not called by unwinding and an unsafe Destroy
trait that _is_ called by unwinding would be a nice compromise, but this fight is for another day.
Until then, you can compile with -Z no-landing-pads
and stay out of the panic game C-style (I figure that Servo will eventually want to run in that mode with process-per-tab).
My sentiments are similar to @tomaka's, for what it's worth. (Although "don't run destructors at all" seems like throwing the baby out with the bathwater.) I really wish we could find a way (and a will) to actually, definitively _solve_ these panic-safety issues somehow, not just put a band-aid on them and keep moving forward. But failing that, a band-aid is better than nothing.
@sfackler Isn't the only reason this was even a problem in your code that you did the transmute and let buf = slice::from_raw_parts_mut(buf as *mut _, len as usize);
outside the recover? Or am I missing something? I feel like the entirety of that code being inside the recover would make this just work. So this doesn't seem like a good argument against the bounds to me.
The prose docs say &Mutex<T>
should be recover-safe, but that's not actually true: the implementation is only impl<T: ?Sized> RecoverSafe for Mutex<T>
and there's no impl RefRecoverSafe
. Not sure if the docs or the implementation is incorrect here.
Mutex
is recover-safe thanks to its poisoning mechanism.
If you try to use a mutex that was in use when a panic occurred, you will get a poison error. This prevents you from viewing the actual content, which may be in an invalid state.
@tomaka sure, Mutex<T>
is. I'm asking about &Mutex<T>
.
Oh sorry I thought you were saying that &Mutex
does implement RecoverSafe, but shouldn't in theory.
diff --git a/src/libstd/panic.rs b/src/libstd/panic.rs
index 83df54f..6c943bf 100644
--- a/src/libstd/panic.rs
+++ b/src/libstd/panic.rs
@@ -186,6 +186,8 @@ impl<T: RefRecoverSafe + ?Sized> RecoverSafe for Arc<T> {}
impl RefRecoverSafe for .. {}
impl<T: ?Sized> !RefRecoverSafe for UnsafeCell<T> {}
impl<T> RefRecoverSafe for AssertRecoverSafe<T> {}
+impl<T: ?Sized> RefRecoverSafe for Mutex<T> {}
+impl<T: ?Sized> RefRecoverSafe for RwLock<T> {}
impl<T> AssertRecoverSafe<T> {
/// Creates a new `AssertRecoverSafe` wrapper around the provided type.
diff --git a/src/test/run-pass/panic-safe.rs b/src/test/run-pass/panic-safe.rs
index 9949b79..585ef3f 100644
--- a/src/test/run-pass/panic-safe.rs
+++ b/src/test/run-pass/panic-safe.rs
@@ -36,6 +36,8 @@ fn main() {
assert::<Box<i32>>();
assert::<Mutex<i32>>();
assert::<RwLock<i32>>();
+ assert::<&Mutex<i32>>();
+ assert::<&RwLock<i32>>();
assert::<Rc<i32>>();
assert::<Arc<i32>>();
Well, this fixes it, at least.
AssertRecoverSafe
should have a into_inner()
method to consume its contents. This would allow it to wrap FnOnce
objects without needing to use an Option
.
Example:
fn run<F: FnOnce()>(f: F) {
let mut wrapper = AssertRecoverSafe::new(Some(f));
panic::recover(move || (*wrapper).take().unwrap()());
}
The libs team discussed this during triage yesterday, and we reached a few conclusions!
An interesting point from the discussion since the first FCP was that it seems that almost no one is happy with the situation we have today. It seems to clearly be a set of unfortunate compromises between various strategies, and this certainly appears to need remedy. Some of the various alternatives that have been proposed along the way are:
thread::spawn
, aka Send + 'static
. This is considered unacceptable, however, due to the heavy usage of this function in FFI where nothing is Send
.Along those lines, the only real clear choice for the next thing to pursue was to remove the bounds altogether from the recover
function. This has been discussed many times, of course, with the obvious drawback that it is easy to conflate panic::recover
with a valid error handling strategy and therefore increase the risk of panic safety bugs induced by recover
to exist.
As the next line of reasoning to pursue we decided that we may want to rename the function away from recover
. By adding some artificial "syntactic salt" we may be able to retain a certain level of automatic avoidance of the API, but ergonomically that's the only hindrance to using the method (no need to deal with AssertRecoverSafe
). The name recover
is also quite "sweet" in the sense that it's easy to reach for and considered a first class construct for managing errors.
So, tl;dr; the libs team has decided to pursue the avenue of removing the bounds on this function today and attempt to rename it to a more "unergonomic" name.
In my personal opinion, I honestly feel that @bstrie's suggestion of guard_from_unwinding
isn't actually the worst thing in the world. We independently want to pursue a mode of compiling where panics are translated to aborts, in which case panic::recover
wouldn't actually recover anyway, so all it's really doing is guarding from unwinding the stack past the frame with that function call.
Also nominating for another FCP next cycle.
Another thing we discussed in the meeting is that it's actually possible to implement the RecoverSafe
system externally to the standard library, and use your own wrapping of recover
that retains those bounds. So those that _do_ find the bounds useful for discovering potential panic safety problems can still reap the benefits in their own crates.
Why is the "no bounds" solution more acceptable than the compromise? "no one is happy with it" is a bit vague.
Another thing we discussed in the meeting is that it's actually possible to implement the RecoverSafe system externally to the standard library, and use your own wrapping of recover that retains those bounds.
As usual, nobody is going to use something that's not in the stdlib. The stdlib is supposed to guide the way in terms of safety. It would be more logical to keep the bounds in the stdlib and remove them in the external library for those who find them too restrictive.
Why is the "no bounds" solution more acceptable than the compromise?
My own personal opinion is that stabilizing a design no one is happy with is just a bad decision. If no one's happy with it unstable then no one's gonna be happy with it when it's stable, and it's a signal that it's not done being designed.
To me, when I say "no one is happy with it", I mean that no one has argued in favor of the compromise bounds over one of the other solutions (at least to my knowledge). We would very much welcome some discussion that argued in favor of this though!
As usual, nobody is going to use something that's not in the stdlib.
I think this is an exaggerated statement, for example there's lots of users of mio. We also weren't thinking that _idiomatic_ usage of this function would be through an external crate with bounds. In the libs meeting we mainly concluded that those who felt they wanted the extra safety would reach for such a crate.
I don't know for others, but I very much prefer the current bounds to no bound at all.
An alternative solution could be to add two functions:
recover
, with the current bounds.weak_recover
, with no bounds.I think no bounds at all is the best solution. It'll be the best for FFI, and while it's a dangerous function if used without care, I believe that "syntactic salt" is the wrong way to stop people from using it, especially when there's a _really_ good reason to; i.e., prevention of undefined behavior.
I'm not against recover
being unsafe
without bounds since it's meant for FFI already. I think that would be enough of a deterrent for casual use and make it easy to spot in code review as well.
Making the name long as a deterrent won't prevent it being used, it'll just make it tedious to use. It'll only make it feel unnecessarily clunky to naive users, who will then complain that Rust deliberately makes things tedious.
"recover" can be useful not only for ffi, but also for daemons (web
server), and memory safety is very important in this case.
On Fri, 4 Mar 2016 at 07:39, Austin Bonander [email protected]
wrote:
I'm not against recover being unsafe since it's meant for FFI already. I
think that would be enough of a deterrent for casual use and make it easy
to spot in code review.ā
Reply to this email directly or view it on GitHub
https://github.com/rust-lang/rust/issues/27719#issuecomment-192135836.
In any case, it's already commonly recognized as a "speed-bump" that signals you should only use an API if you know what you're doing.
One example is String::as_mut_bytes()
which is unsafe
yet does not have much to do with memory-safety AFAICT, just sanity in maintaining properly formed UTF-8.
@cybergeek94 I disagree. Words like "commonly recognized" are more suitable for world of Ruby, with all of their implicit magic. In Rust things should be explicit.
Exactly. The message that unsafe
sends is clear: if you don't do your homework--if you don't cross your T's and dot your I's and make sure all invariants are upheld--bad things _will_ happen.
Right, and I've tried to say that unsafe behavior is not desirable for daemons, where recover
could be very very useful.
@cybergeek94: unsafe
is about memory-safety and type-safety (which are interlinked). Type-safety includes upholding a type's invariants, so String::as_mut_bytes()
is unsafe
because we allow unsafe code to rely on strings being properly formed UTF-8.
It's important that there's a clear boundary between what's possible in safe Rust and what needs unsafe Rust. When that boundary is unclear, unsafe code may start relying on safe Rust code to uphold some property that it doesn't actually uphold. Classic example: the leakpocalypse. std::mem::forget()
was marked unsafe
, so unsafe code (the scoped threads implementation and the Vec::drain
method) started relying on safe code's inability to leave a scope without calling the destructor of an object. Except that safe code could do this anyway by leaking objects -> leakpocalypse. One of its outcomes was RFC 1066, which made mem::forget()
safe.
recover
is in a very similar situation: recover
allows observing broken invariants that were temporarily broken and then not properly restored due to a panic. But you don't need recover
for that, you can also do it with Drop
impls. Making recover
unsafe would lead to a false sense of security. If that then leads to unsafe code incorrectly assuming that it doesn't need to care about panic safety, it might even cause security vulnerabilities. Better to keep recover
safe without any bounds, so that it's obvious to authors of unsafe code that their objects might still be accessed after they panicked.
It's unfortunate that safe code has this ability (and thus unsafe code is forced to protect against it). I'd prefer a world where Drop
is an unsafe trait and recover
an unsafe function, but we can't do that without breaking every Drop
impl out there.
So @wycats reminded me of something over IRC. The original intention was that RecoverSafe<F>
would implement FnOnce()
iff F: FnOnce()
. This then implies that one can "assert recover safety" by wrapping the closure and not by wrapping the upvars. In other words, recover(|| foo)
in a world where recover had no bounds would be equivalent to recover(AssertRecoverSafe(|| foo))
in a world with bounds. This is then used a lot more like a targeted version of unsafe
-- basically a lexical scoping of the code that you are assering is recover safe.
@sfackler, in the specific examples that you presented, this means that e.g. where you wrote the following:
let result = {
let mut youre_not_my_supervisor = AssertRecoverSafe::new(&mut *state);
recover(move || youre_not_my_supervisor.stream.write(buf))
};
You might instead have written:
let result = recover(AssertRecoverSafe(|| {
state.stream.write(buf)
}));
In fact, since you are wrapping recover anyway, you could simply have put AssertRecoverSafe
into the wrapper function, and added a comment that this is only to be used when the panics are translated into FFI errors. Perhaps calling it something like recover_ffi
.
I think this goes a very long way towards addressing the ergonomic objections, at least. I'll try to write up something more general about the impact of these bounds in a bit (I also want to read more of the thread; I'm not fully caught up).
@cybergeek94
One example is String::as_mut_bytes() which is unsafe yet does not have much to do with memory-safety AFAICT, just sanity in maintaining properly formed UTF-8.
This is a common misconception. In fact, there is code for str
and String
that assumes the input is valid utf-8; if you violate that, you can cause memory safety violations. That said, I don't think anybody argues that unsafe
should be strictly intended for memory safety. Rather, the argument is that unsafe
should be used to protect strong invariants. I think @aturon articulated it best in the original thread, so I'll just quote him:
As @nikomatsakis said, we made a deliberate decision as a community to tie uses of unsafe to iron-clad guarantees only. I think it's vital to do so, because it makes crystal clear what you can rely on, and what you must guarantee, when you're writing unsafe Rust code...Note that the set of iron-clad guarantees is not closed: you can use encapsulation to boostrap new guarantees around a particular API -- say, that some invariant holds -- and can provide unsafe APIs that potentially violate those guarantees. Then you can say that "any client written in safe Rust may assume that such-and-such". But the key point is "any client written in safe Rust" is following precisely the language-wide safety guarantees, no more, no less, so it's important that we be extremely clear on what those are.
I was also thinking about the name. I do think that "recover" is perhaps not the right emphasis. When using recover
, you aren't necessarily expecting to _recover_ from a panic -- typically you are going to propagate that panic, whether that be by further unwinding (perhaps translating the panic to a return code, or translating the panic across threads) or by poisoning out data that may be affected. So perhaps we could change the name from recover
to manually_propagate
or something like that -- something to give the impression that this is not a way to make a panic anything other than "game over", but rather just a way to customize how that is done.
Anyway, regarding the AssertRecoverSafe
bound, I will say the following. I think people tend to view AssertRecoverSafe
as a burden, but I don't see if that way. I see it as an opportunity -- basically the compiler is reminded me that I should add some comments to explain that, since I am taking manual responsibility for propagating panics, I should describe how I plan to do that, and justify to myself that it is reasonable.
I have found that these kinds of manual "audit points" are incredibly valuable. For example, if I were to go read @sfackler's code now, in the absence of any AssertRecoverSafe
bound, there is no obvious single place to write out how panic propagation is supposed to work. This is not only bad for readers of the crate, it's bad _for the current authors_ -- because if you don't write the comment, you won't think carefully about whether your system is sound. To continue with the example of the "rust-openssl" crate: how do we know that returning -1
will indeed ensure for us that the stream
, which is now poisoned, will no longer be used? If we don't write a comment, we might never even ask the question.
I know that @wycats' strong opinions on the topic of recover are due to direct experience of this kind. That is, they were not even aware of "exception safety" as a thing, they just knew that they did not want to crash. So they manually implemented some form of recover, but did not think carefully about poisoning. The result was a lot of assertion failures clogging people's logs and causing them general discomfort. So they instituted a more careful policy, and now those problems are basically solved. (@wycats, correct me if this summary is wrong.)
To make all this work though we have to control the AssertRecoverSafe
experience carefully. In particular, we ought to have a decent error message, and we ought to have actionable docs. They should suggest ergonomic solutions (such as wrapping the closure) and when those methods are appropriate (e.g., if you are "translating" the unwind into an unwind through some other mechanism) and when they are risky (e.g., if you don't really control what will happen afterwards). Probably @wycats has more concrete suggestions here.
On the topic of unsafe, it's true that exception safety is a kind of "soft guarantee" in Rust. Given that drop
is safe, and that the mutex APIs and so forth are stable, this ship has basically sailed. But that doesn't make it unimportant. I still think library crates should be careful to propagate errors carefully and ensure that they use poisoning or other schemes to try and discard "damaged goods".
Altering AssertRecoverSafe
to work on the closure itself does take care of my ergonomic concerns!
When discussing a rename, it's worth noting that the panic::propagate
function also exists to pair with panic::recover
, so renaming recover
to manually_propagate
might not fit exactly.
I am concerned about AssertRecoverSafe
becoming an increasingly complex web of impls and !impls over time.
The original intent of AssertRecoverSafe
was something like the following:
recover
through abstractions like thread::spawn
. Those abstractions handle recover safety through the Send + 'static
bound.thread::scoped
API handles recover safety through re-propagation.By "poisoning" I mean remembering not to use the value anymore because using it again will likely result in another panic.
In my opinion, all of these are totally valid strategies to use when writing a raw recover
function.
The idea was:
AssertRecoverSafe(closure)
and šš.AssertRecoverSafe(closure)
, communicate the panic over the FFI boundary (we use the return position for this in Skylight) and null out your C cells.AssertRecoverSafe
on specific upvars, and then null out just the C cells containing the values.The idea was that you could use the compiler warning to drive your strategy for becoming exception safe, not just as a click-through EULA.
In my experience with Skylight, we started off with a ĀÆ_(ć)_/ĀÆ approach, which resulted in continuing to use objects that had broken invariants after the panic occurred. In practice, this resulted in users seeing a flood of scary-looking logs that could have been avoided if we had simply poisoned the values. We have gotten so many reports of this from people looking in their logs and wondering whether something was critically wrong that it has become a somewhat serious priority for us.
(we don't want to stop emitting errors in the logs, since learning that a panic occurred is pretty critical to fixing our own bugs; since we don't control the environments our library is deployed to, logs are all we have)
We are transitioning to "poison all the values", but that is, in our case, quite over-eager as there are a large number of FFI functions that take values that cannot become "broken" (because they're &Instrumenter
, let's say). We would prefer to restrict the poisoning to the cases where there is a real reason to stop using a value.
The idea behind AssertRecoverSafe
was that you could go from ĀÆ_(ć)_/ĀÆ to "poison everything" and then eventually "poison exactly what I need to" in a gradual fashion. As Niko points out, we didn't actually implement AssertRecoverSafe(closure)
, so the entire plan didn't work and just felt like a pointless EULA.
We also didn't document the operational details ("when I see this error, here are my options") so it was further unclear what someone was supposed to do.
My general assumption here is that we will eventually have a number of off-the-shelf libraries that provide isolation boundaries, so people use something like skylight_style_ffi(closure)
(which would come with an associated C library) rather than recover(closure)
. In my preferred long-term endgame, almost nobody ever needs to write recover
in applications, because the reasonable strategies are encapsulated and available as crates.
Of course, we need to get there first.
@nikomatsakis I've recently started thinking that we need to stop using the term "memory safe" in regards to the unsafe
keyword. What we really mean is "defined behavior". If something is safe, it cannot cause undefined behavior in the presence of no unsafe code. Marking something unsafe
means "With only this thing, you could cause undefined behavior in safe code". As an example of something which invokes undefined behavior, but which has nothing to do with memory safety; matching on an uninitialized value. The following is undefined:
fn main() {
let x: Option<u8> = unsafe { std::mem::uninitialized() };
match x {
Some(v) => println!("{}", v),
None => println!("None"),
}
}
(and, for example, when I tested it gives an opposite answer to the following)
fn main() {
let x: Option<u8> = unsafe { std::mem::uninitialized() };
match x {
None => println!("None"),
Some(v) => println!("{}", v),
}
}
but doesn't have anything to do with memory safety.
no, memory safety is not only defined behavior, but also data safety.
On Sat, 5 Mar 2016 at 00:15, Nicole Mazzuca [email protected]
wrote:
@nikomatsakis https://github.com/nikomatsakis I've recently started
thinking that we need to stop using the term "memory safe". What we really
mean is "defined behavior". If something is safe, it cannot cause undefined
behavior in the presence of no unsafe code. Marking something unsafe
means "With only this thing, you could cause undefined behavior in safe
code". As an example of something which could invoke undefined behavior,
but which has nothing to do with memory safety; matching on an
uninitialized value. The following is undefined:fn main() {
let x: Option= unsafe { std::mem::uninitialized() };
match x {
Some(v) => println!("{}", v),
None => println!("None"),
}
}(and, for example, in practice will give an opposite answer to the
following)fn main() {
let x: Option= unsafe { std::mem::uninitialized() };
match x {
None => println!("None"),
Some(v) => println!("{}", v),
}
}ā
Reply to this email directly or view it on GitHub
https://github.com/rust-lang/rust/issues/27719#issuecomment-192515676.
@e-oz If your definition of "data safety" includes prevention of memory leaks, safe Rust does not guarantee that.
@mzabaluev it does. And data safety is not about memory leaks, but about... data safety. Race conditions, etc.
@ubsan
After we let LLVM go over this code, it can turn UB to memory unsafety.
@arielb1 Using memory unsafety in this context doesn't make much sense to me. Your program isn't defined. We're allowed to do anything including making demons fly out of your nose; why should we be worried about memory safety in that context?
I've added the FnOnce
impl for AssertRecoverSafe
in https://github.com/rust-lang/rust/pull/32102 so we can help the ergonomic concerns for now, but I've got a few more thoughts on some of the discussion recently as well.
I think that the point brought up by @nikomatsakis about "if we don't write a comment, we might never even ask the question" is certainly important. To me this acts like unsafe
where it's a clear indicator that you need to think about what's going on. This I think is similar to @wycats's concern about how this shouldn't be "just as a click-through EULA".
To some extent, however, although this is similar to unsafe
it's also different. Unsafe code has an iron-clad guarantee that it requires an unsafe
block to call, but code that must be recover-safe does not have such a guarantee about requiring AssertRecoverSafe
(or an actually safe implementation). Additionally, both already suffer from the "I just want this to compile problem". We've already seen plenty of unsafe
code that, well, was unsafe. There's no doubt that if we add this then in any world there will be non-recover-safe code regardless of what we do.
In that sense what we're trying to do is to minimize the set of non-recover-safe code via this wrapper. So called "diligent programmers" will see the error message, read some documentation, and then go think appropriately about what they need to do. I would think, however, that this set of programmers isn't really any different than those who are already reading documentation carefully and are considering the repercussions of using the recover function.
Overall, my personal opinion is that the bounds here aren't worth it still. The ergonomic problems from the current incarnation are bad enough that I felt we needed to push back on them entirely. I think that the suggestion of FnOnce for AssertRecoverSafe
is enough to fix that concern, however. With that I'm basically where I was before which is ok with the current state of affairs. It's relatively easy enough to get around the bounds when necessary, and perhaps this'll indeed be a forcing function for someone to read the documentation and rethink usage.
In terms of naming and in light of https://github.com/rust-lang/rfcs/pull/1513 I'd be ok with a names like catch_unwind
, prevent_unwind
, or the ever illustrious guard_from_unwinding
. To me it sounds like putting "unwind" in the name is a good idea to be clear that this isn't catching or recover from all panics, just unwinding panics.
My humble +1 to @alexcrichton words.
:bell: This issue is now entering its cycle-long final comment period for stabilization in 1.9 :bell:
With the recent addition of FnOnce
to AssertRecoverSafe
there's consensus among the libs team that the ergonomics here are good enough that we're willing to stabilize this. It may still be the case that we can remove these bounds one day, but that's an API-wise backwards-compatible change that can be made (and isn't currently planned).
This cycle we'd like to stabilize both the bounds _and_ the function at hand. There wasn't complete consensus about the name "recover", but it was pointed out that with the bounds remaining we likely don't need the syntactic salt of a long function name. Overall we didn't quite reach a conclusion for a recommendation for the name, but we figured that FCP could duke it out!
A bit more detail from the lengthy discussion we had:
FnOnce
impl, the API here is very much like a kind of custom unsafe
marker -- you always have the option of simply wrapping the whole closure with AssertRecoverSafe
. Thus, we have a clear but ergonomic audit marker, with a very clear separation from unsafe
(and the hard guarantees connected with that).propagate
), however, _is_ intrinsically tied to unwinding. Therefore, we'd like unwinding
to play some role in the naming, either directly in the function name, or perhaps by introducing an unwind
submodule in std::panic
to clarify the relationship.On Fri, Mar 11, 2016 at 10:16:55AM -0800, Alex Crichton wrote:
This cycle we'd like to stabilize both the bounds _and_ the function at hand. There wasn't complete consensus about the name "recover", but it was pointed out that with the bounds remaining we likely don't need the syntactic salt of a long function name. Overall we didn't quite reach a conclusion for a recommendation for the name, but we figured that FCP could duke it out!
I feel like there are two things arguably wrong with the name 'recover':
I wonder if halt_unwinding
would be a better name? It certainly
tells you just what it does.
Would AssertRecoverSafe
then also be called AssertHaltUnwindingSafe
?
I think catch_unwind
and AssertUnwindSafe
would make the most sense. The fundamental operation here is halting unwinding and every other language calls it "catch". I don't think we should invent a new term for such a well known operation.
My concern with the current bounds is still less with anything about the API itself than with the fact that this changes the rather strange nature of rustc's OIBIT handling (#27554) from something that was probably rarely encountered in practice to something exposed in a stable API in the standard library. This is going to make changing the OIBIT semantics without breaking the world really hard, and I don't think enough thought has been put into the current form to make locking the language into it desirable.
Naming
I would refrain from using "unwind": It's not currently exposed in the stdlib, and also, it could mean "to relax", which is the opposite of panicking, really :-) And if "catch" is undesirable (I don't remember exactly why) then maybe std::panic::capture
could be an option? And std::panic::release
to re-throw the panic. E g, consider a herd of panicking animals, which you capture in a cage, move them safely over some boundary, and then release them so they panic wildly again. RecoverSafe then becomes either std::panic::CaptureSafe
or possibly just std::panic::Safe
.
PanicInfo
The panic handlers added the PanicInfo struct, and recover/propagate has been some kind of parallel process. It would be nice if this API could be a little more consistent, so that recover would actually return a panicinfo somehow, and propagate allowed to take one as a parameter. I'm not exactly sure if this means redesigning PanicInfo, so I'm just going to leave that as a thought for now.
Landing pads for extern fn
I was wondering if we need an abort-on-panic landing pad on extern fns. Even with this interface stablized, the code is going to end up like:
extern fn callback_from_c( /* ... */ ) -> /* ... */ {
let r = std::panic::capture(|| /* some complicated handling */ );
if r.is_err() { /* convert to some sort of error */ }
}
...it's going to be hard to guarantee to 100% that the "convert to some sort of error" part is never going to panic. Since we don't want UB, maybe we need to insert an abort-on-panic landing pad as part of all extern fn?
On Sat, Mar 12, 2016 at 12:13:56PM -0800, wthrowe wrote:
My concern with the current bounds is still less with anything about the API itself than with the fact that this changes the rather strange nature of rustc's OIBIT handling (#27554) from something that was probably rarely encountered in practice to something exposed in a stable API in the standard library.
I think the most prominent use of auto traits (aka OIBIT) is Send
and Sync
, not this relatively obscure function.
This is going to make changing the OIBIT semantics without breaking the world really hard, and I don't think enough thought has been put into the current form to make locking the language into it desirable.
I am not sure what changes you are describing (there are none in the
works that I am aware of, just clarifications -- most of which in fact
behave the same as the current implementation), but as I said, I would
expect the real problem for compatibility with any change is going to
be Send
and Sync
!
I'm happy to see the progress here, and just as happy that we've managed to smooth over the ergonomics without throwing out the bounds entirely.
Unsurprisingly, I'm +1 to a more descriptive name than "recover". This topic comes up _so much_ in the wild that we're going to be perpetually fielding responses to questions like "why does Rust now have exceptions?" and having self-explanatory naming would make that so much less tedious.
To reiterate:
catch_unwind
).I like @nikomatsakis 's halt_unwind
and @alexcrichton 's prevent_unwind
out of the options presented so far, and @Amanieu 's AssertUnwindSafe
.
@pnkfelix semi-jokingly proposes recoil
:)
what about "intercept"?
Looks like the discussion is still open for a name so I'll throw out a few ideas. This thread's a little long and these are somewhat generic so it's a little too tedious this late at night to Ctrl-F to see if any of them have already been mentioned, so bear with me.
recover
->
guard
(like guard_from_unwinding
but not intentionally unwieldy)barrier
(panic::barrier
is pretty descriptive, methinks)contain
(like containing a panicked crowd or herd of animals)sequester
(if you're feeling pretentious)propagate
->
release
(I know this one's already been mentioned but it parallels well with contain
)resume
(heavily implies continuation of a previously caught panic, which I think is generally desired)I kind of agree with the abort landing pads in extern fn
, though it should be trivial to disable them so the people who do their homework and use recover
(or what-have-you) properly don't have pay for them. I'm thinking a function attribute, like #[recover_safe]
so that it's right next to documentation explaining how panics are handled properly. Or maybe, like integer overflow traps, they should only be emitted in debug builds. It could also be a lint that warns whenever possibly fallible operations are undertaken, namely calling into non-extern
functions and methods, but only once per extern fn
.
So far the front-runner in terms of naming to me is catch_unwind
as it invokes both the idea that something is being caught but _only_ unwinds as it won't catch panics-as-aborts. The verb "catch" seems interchangeable to me, so something like guard_unwind
also seems fine.
For the propagation function, I'd expect something similar like continue_unwind
or resume_unwind
which clearly indicates that it's only intended for unwinding and isn't useful in a panics-as-aborts world.
I like catch_unwind
and resume_unwind
unwind personally.
It is worth noting, though, that resume_unwind
can still be _called_ in an panic-as-aborts context, it'll just abort without running the panic hook. I don't really think that's too big of a deal though.
When considering names, the module path is relevant because functions are typically called via it. Things like panic::catch_unwind
seem awkward to me: it's three verbs in a row.
Returning https://github.com/rust-lang/rust/issues/27719#issuecomment-195494040, particularly its suggestion of introducing a submodule to std::panic
, unwind::catch_unwind
is bad, but maybe unwind::catch
is OK? (Modulo the negative connotations of the catch
word, of course, and the possibility that catch
will become a keyword.)
Like @sfackler, I prefer catch_unwind
and resume_unwind
, scoped directly in the panic
module, rather than introducing a new submodule.
I think I'm happy as long as unwind
is in the name.
I thought the barrier
suggestion from @cybergeek94 was good. Especially if we put it into an unwind
submodule, yielding unwind::barrier
To me barrier
carries the notion that the unwinding will stop here, but there's the hint that you should perhaps trying to "restart it", to keep it going across the boundary introduced by the barrier
, if possible, or at least signal that it occurred.
Update: unwind::guard
seems okay too, but for some reason I prefer unwind::barrier
; not sure why.
My vote against "barrier", sounds too non-related to execution flow.
On Wed, 6 Apr 2016 at 22:32, Felix S Klock II [email protected]
wrote:
I thought the barrier suggestion from @cybergeek94
https://github.com/cybergeek94 was good. Especially if we put it into
an unwind submodule, yielding unwind::barrierTo me barrier carries the notion that the unwinding will stop here, but
there's the hint that you should perhaps trying to "restart it", to keep it
going across the boundary introduced by the barrier, if possible, or at
least signal that it occurred.ā
You are receiving this because you were mentioned.Reply to this email directly or view it on GitHub
https://github.com/rust-lang/rust/issues/27719#issuecomment-206524672
In system programming, barrier
usually means "memory barrier", which has nothing to do with unwinding.
The libs team discussed this during triage yesterday and the decision was to stabilize with the following names:
recover
-> catch_unwind
propagate
-> resume_unwind
AssertRecoverSafe
-> AssertUnwindSafe
Thanks again for all the discussion everyone!
yay :)
Most helpful comment
The libs team discussed this during triage yesterday and the decision was to stabilize with the following names:
recover
->catch_unwind
propagate
->resume_unwind
AssertRecoverSafe
->AssertUnwindSafe
Thanks again for all the discussion everyone!