The before_exec method should be marked unsafe because after a fork from a multi-threaded process, POSIX only allows async-signal-safe functions to be called. Otherwise, the result is undefined behavior.
The only way to enforce this in Rust is to mark the function as unsafe.
So to be clear, this should not compile:
use std::process::Command;
use std::os::unix::process::CommandExt;

fn main() {
    let status = Command::new("sh")
        .arg("-c")
        .arg("echo world")
        .before_exec(|| { println!("hello"); Ok(()) })
        .status().unwrap();
    println!("status: {}", status);
}
Could you clarify the unsafety that can happen here? We chose to stabilize this function on the grounds that we couldn't come up with any memory safety reasons this would cause problems, just deadlocks and such.
It's probably best to treat anything that is undefined according to POSIX as a memory safety violation. Once POSIX says that you cannot assume anything about the process's behavior, you need to assume that memory safety is violated as well.
I'll try to give two examples why this facility is unsafe in practice. I have to argue based on the glibc implementation, but we can hopefully assume that it is a valid implementation, matching the POSIX requirements in play here.
First, assume that you have a PRNG for use in cryptography written in Rust. This PRNG uses locks or thread-local variables to protect its internal state. It does not have fork protection because there is no way to implement that in Rust. This PRNG will return the same sequence of bytes in the before_exec callback and in the parent thread, violating the expectations associated with a PRNG. Maybe this does not count as a memory safety violation, but I think it is still a problem for PRNGs.
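To make this concrete, here is a minimal sketch of how the duplicated state shows up. The "PRNG" here is a stand-in (next_byte and its constant are made up for illustration, not a real generator), and it assumes a toolchain where before_exec is still callable from safe code:

use std::cell::Cell;
use std::process::Command;
use std::os::unix::process::CommandExt;

thread_local! {
    // Stand-in for PRNG state with no fork protection.
    static STATE: Cell<u64> = Cell::new(0x1234_5678);
}

fn next_byte() -> u8 {
    STATE.with(|s| {
        // A trivial LCG step; a real PRNG would be cryptographic, but the fork
        // problem is the same: the state is just memory, and fork duplicates it.
        let x = s.get().wrapping_mul(6364136223846793005).wrapping_add(1);
        s.set(x);
        (x >> 56) as u8
    })
}

fn main() {
    Command::new("true")
        // The forked child sees a copy of the parent's PRNG state...
        .before_exec(|| { eprintln!("child:  {}", next_byte()); Ok(()) })
        .status().unwrap();
    // ...so the parent's next call yields the very same byte the child already used.
    eprintln!("parent: {}", next_byte());
}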
For the second problem, we look at the implementation of recursive mutexes in libpthread. They embed a PID/TID, a small numeric value which gets reused over time. Let's look at this sequence of events:

1. A thread acquires a recursive mutex.
2. Another thread runs an external command with a before_exec callback.
3. The before_exec callback invokes another external command with another before_exec callback.
4. The second before_exec callback spawns a new thread.
5. The TID of that thread happens to match the TID of the thread in step 1.
6. The new thread attempts to acquire the recursive mutex.

The attempt in step 6 will succeed because libpthread incorrectly assumes that the same thread tried to acquire the mutex. But the process was copied while the thread in step 1 was holding the mutex, so the data protected by it may not be consistent, and the thread in steps 5 and 6 may access intermediate results which are not supposed to be visible. This is a thread safety violation, and hence a memory safety violation.
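To illustrate the failure mode, here is a toy model of a TID-keyed recursive lock (not libpthread's actual code) showing where the recycled TID slips through:

use std::sync::atomic::{AtomicU64, Ordering::SeqCst};

// 0 means "unlocked"; otherwise the value is the owner's TID.
pub struct ToyRecursiveMutex {
    owner_tid: AtomicU64,
    depth: AtomicU64,
}

impl ToyRecursiveMutex {
    pub fn try_lock(&self, my_tid: u64) -> bool {
        match self.owner_tid.compare_exchange(0, my_tid, SeqCst, SeqCst) {
            // Uncontended case: we become the owner.
            Ok(_) => { self.depth.store(1, SeqCst); true }
            // "Recursive" case: the owner field matches our TID, so the lock is
            // granted immediately. After a fork, a freshly spawned thread whose TID
            // happens to equal the old owner's TID passes this check even though the
            // real owner only ever existed in the parent process.
            Err(owner) if owner == my_tid => { self.depth.fetch_add(1, SeqCst); true }
            Err(_) => false,
        }
    }
}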
I think the second example at least is quite compelling.
That'd do it!
This is not marked as unsafe today and to do so would be backwards incompatible; however, a quick search on GitHub didn't seem to turn up any code that uses it.
The libs team discussed this a few weeks ago at the last triage, and the conclusion was that we'd prefer to see concrete evidence (e.g. a proof-of-concept) showing the memory unsafety here. At that point we'll consider remediations, but for now we're fine with just adding some documentation that "care should be taken".
In general, anything involving files may not behave as expected.
For example, a concrete memory safety issue would be a memory-mapped allocator that assumes the memory-mapped file can't be accessed by multiple processes (e.g., because the file was created with O_TMPFILE on Linux, or because exclusive access is enforced through some other means, such as by contract).
Other shenanigans include:
Not to denigrate theoretical concerns (which is to say, just because one hasn't yet demonstrated that this violates memory safety doesn't mean that we're going to reject this out-of-hand, and it doesn't mean that it won't eventually become a severe bug somehow), but I second Alex's desire to see a working proof-of-concept in code.
Some prior art
Feels similar to me.
I honestly don't have time to write a convincing PoC, but this breaks all wayland implementations (assuming memory corruption counts as a safety bug) because wayland clients and servers communicate via memory-mapped files. Any attempt (accidental or otherwise) to update a wayland buffer from within a before_exec callback might corrupt the shared buffer. Now, this can be mitigated by taking a global lock and then checking the PID before writing to memory-mapped buffers, but that's not something one wants to do every frame.
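Not a wayland PoC, but a rough sketch of the mechanism, using an anonymous MAP_SHARED mapping as a stand-in for the shared buffer (this assumes the libc crate and a toolchain where before_exec is still safe to call):

use std::process::Command;
use std::os::unix::process::CommandExt;

fn main() {
    // Parent and forked child see the *same* physical pages through this mapping.
    let buf = unsafe {
        libc::mmap(std::ptr::null_mut(), 4096,
                   libc::PROT_READ | libc::PROT_WRITE,
                   libc::MAP_SHARED | libc::MAP_ANONYMOUS, -1, 0)
    } as usize;
    Command::new("true")
        .before_exec(move || {
            // The child scribbles over the buffer the parent believes it owns exclusively.
            unsafe { *(buf as *mut u8) = 1; }
            Ok(())
        })
        .status().unwrap();
    // The parent's "exclusive" buffer has been modified behind its back.
    assert_eq!(unsafe { *(buf as *const u8) }, 1);
}

In the wayland case the unsafe mmap handling is encapsulated inside the library; the caller of before_exec never writes an unsafe block themselves, which is the point of the complaint.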
@steveklabnik
Those aren't quite the same as before_exec (a reason to make an exception). Unlike those cases, this affects a fundamental Rust concept: move/copy semantics. Without before_exec (the ability to fork without unsafe code), it's impossible to (safely) copy a value of a type that doesn't implement Copy. With before_exec, this guarantee is relaxed by allowing copies (in safe code) as long as those copies reside in different processes. This prevents programs from relying on any form of "one-constructor-call -> at-most-one-value" guarantee.
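A minimal sketch of that relaxation (Token and consume are made-up names for illustration, and this again assumes before_exec is callable from safe code): a value meant to be consumed at most once ends up being consumed in both processes.

use std::process::Command;
use std::os::unix::process::CommandExt;
use std::sync::{Arc, Mutex};

struct Token; // neither Copy nor Clone: "one constructor call -> at most one value"

impl Token {
    fn consume(self) {
        println!("token consumed");
    }
}

fn main() {
    let token = Arc::new(Mutex::new(Some(Token)));
    let token2 = Arc::clone(&token);
    Command::new("true")
        // The forked child takes the token out of the Mutex and consumes it...
        .before_exec(move || { token2.lock().unwrap().take().unwrap().consume(); Ok(()) })
        .status().unwrap();
    // ...while the parent still holds its own copy of the whole process state,
    // so the "same" token can be consumed a second time, all in safe code.
    token.lock().unwrap().take().unwrap().consume();
}

The pipe example later in this thread does the same thing to an OS-level resource rather than a plain value.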
As before_exec can be used to emulate fork, these threads also apply:
@alexcrichton I like to see all soundness issues with an associated priority tag, does your comment above imply that this is considered P-low? Alternatively, if you don't consider this a soundness issue at all, would you like to remove the I-unsound label?
Ah yes the libs team decided this was basically P-medium, so I will tag as such.
Even if there were no way to trigger memory unsafety today, at any time a change to libpthreads or libc or the operating system in the future could cause memory unsafety where there was previously none. Accordingly, I don't think blocking the change on a PoC makes sense, and further I think requiring somebody to make a PoC before the change is made would be a waste of those people's time. Any time we trigger undefined behavior, memory unsafety should just be assumed.
BTW, has anyone considered that on macOS, if an application is using libdispatch, it must not call any libdispatch code after fork? I don't have a PoC now, but I think it wouldn't be difficult to create one.
I concur with the arguments here that this is a different beast than remove_var and set_var. Thus far we have chosen to always interpret specified undefined behavior as memory unsafety, demonstrable or otherwise. It would be a unique exception to our policy of "all memory unsafety may be traced back to an unsafe block" to add "(...unless using std::os::unix::process::CommandExt::before_exec)". The longer we wait, the more painful this will be. I propose that we begin emitting a warning ASAP with a plan to mark this unsafe at a specified future date.
With some OS help one can always cause memory unsafety:
#![feature(getpid)]

fn hacky_ptr_write<T>(ptr: &T, value: u32) {
    std::process::Command::new("gdb").arg("-batch").arg("-quiet")
        .arg("-pid").arg(format!("{}", std::process::id())).arg("-ex")
        .arg(format!("set {{unsigned long}}{:p}={}", ptr, value))
        .output().unwrap();
}

fn main() {
    let q = &mut Box::new(55);
    hacky_ptr_write(&q, 0);
    *q = Box::new(44); // Segmentation fault
}
But this does not mean Command should be unsafe.
@vi
The problem here is that the type-based assumption that a non-Copy type can't be copied is broken (even though the copies end up in different processes). Any code relying on this assumption is potentially incorrect and any unsafe code relying on it is potentially memory unsafe. However, given the "different process" constraint, this bug necessarily involves the operating system.
Yes, you can usually shoot yourself in the foot using the OS†. However, you generally have to explicitly request this behavior. In this case, you can take two apparently safe and independent APIs (e.g., wayland and before_exec) and combine them to corrupt memory.
†It's actually harder than you might think as long as you aren't root. Most modern Linux distros, at least, don't allow non-root processes to debug (or access/modify the memory of) non-child processes.
I expect there to be some unsafe{} inside that Wayland code. Can one reproduce the unsafety without using non-std unsafe{} code?

Does the question boil down to "Should every unsafe{} block also expect forks, or should the fork itself just be unsafe"?
non-Copy type can't be copied
During fork() I expect it to stay on the same virtual address, so that it is as if not copied. !Copy from Rust perspective does not mean non-copyable from OS kernel perspective.
I expect there to be some unsafe{} inside that Wayland code.
Yes. It memory maps a shared buffer.
Can one reproduce the unsafety without using non-std unsafe{} code?
Not unless you were to move the shared mmap functionality from wayland into std. It's not really about where the code lives but what are valid assumptions and what are not (although any assumptions made by std are assumed to be valid).
Does the question boil down to "Should every unsafe{} block also expect forks or should the fork itself just be be unsafe"?
Pretty much, yes. If we do say "this API is fine", we should also provide a fork function (the same way we made mem::forget safe).
During fork() I expect it to stay on the same virtual address, so that it is as if not copied. !Copy from Rust perspective does not mean non-copyable from OS kernel perspective.
From an outside perspective, the value is copied and:
Will be dropped twice.
Will share any os-level objects (e.g. file descriptors).
Just what is typically needed. Consider a C program:
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main() {
    int fd = open("myfile.txt", O_RDONLY);
    void* mem = malloc(300);
    if (fork()) {
        close(fd); // first "drop" of fd
        free(mem); // first "drop" of mem
    } else {
        close(fd); // second "drop" of fd
        free(mem); // second "drop" of mem
    }
}
There are two frees, but it's not that "double free" error. Likewise, file descriptors should be closed twice after a fork.
Unsafety of fork may be like unsafety of FromRawFd - not directly messing with memory, just confusing other parts of the system with surprises, not causing unsafety from within Rust, but causing the operating system to behave unsafely for the Rust program. And it's workaroundable by std::fs::File::open("/proc/self/fd/0") or something like that.
The relationship between processes after fork() is IPC. Shared memory buffers are also a sort of IPC. Triggering a program crash by using IPC is unsafety of the program...

What exactly happens in Wayland if an evil peer (or even an evil Wayland server) deliberately corrupts our buffer? Shouldn't a Rust program distrust memory-mapped buffers that are writable from outside anyway? I expect storing pointers inside a memory-mapped file is unsafe anyways. And if no pointers get stored there, then it would be wrong, but safe, behaviour.
Not unless you were to move the shared mmap functionality from wayland into std.
What function? map? But it's unsafe anyways. Also, it gives you [u8], and those bytes can jump around unpredictably because it's IPC and somebody may be interfering. Storing safety-important things in that [u8] is your decision, and that's where the wrong assumption ("when I get a mapping, it should be safe to work with") kicks in.
There's also this comment from the implementation of Command:
// Currently we try hard to ensure that the call to `.exec()` doesn't
// actually allocate any memory. While many platforms try to ensure that
// memory allocation works after a fork in a multithreaded process, it's
// been observed to be buggy and somewhat unreliable, so we do our best to
// just not do it at all!
If this is true it seems pretty clear that this function should be considered unsafe. If it's not true then there's a whole lot of unnecessary and complex unsafe code in the standard library.
Here is an example of an API that successfully maintains an invariant (on an OS pipe), but that gets broken by before_exec:
// cargo-deps: os_pipe
extern crate os_pipe;

mod use_once_pipe {
    //! A rather strange module that creates a pipe and makes sure that
    //! we only ever send a single byte on that pipe.
    use std::io::{Read, Write};
    use os_pipe::{pipe, PipeReader, PipeWriter};

    pub struct Reader(PipeReader);
    impl Reader {
        /// Will block until `write` gets called!
        pub fn read(mut self) -> Option<u8> {
            let mut res = vec![];
            self.0.read_to_end(&mut res).unwrap();
            assert!(res.len() <= 1, "Our contract got violated! Read {:?}.", res);
            res.get(0).map(|b| *b)
        }
    }

    pub struct Writer(PipeWriter);
    impl Writer {
        pub fn write(mut self, byte: u8) {
            self.0.write(&[byte]).unwrap();
        }
    }

    pub fn create() -> (Reader, Writer) {
        let (reader, writer) = pipe().unwrap();
        (Reader(reader), Writer(writer))
    }
}

use std::process::Command;
use std::os::unix::process::CommandExt;
use std::sync::{Mutex, Arc};

fn main() {
    // Small demonstration of use_once_pipe.
    {
        let (r, w) = use_once_pipe::create();
        w.write(42);
        //w.write(23); // not possible because `write` consumes `self`
        assert_eq!(r.read(), Some(42));
    }
    // Now let's break it.
    {
        let (r, w) = use_once_pipe::create();
        let w = Arc::new(Mutex::new(Some(w)));
        let w2 = Arc::clone(&w);
        Command::new("true")
            .before_exec(move || Ok(w2.lock().unwrap().take().unwrap().write(42)))
            .status().unwrap();
        w.lock().unwrap().take().unwrap().write(23);
        r.read();
    }
}
When I run this, I get
thread 'main' panicked at 'Our contract got violated! Read [42, 23].', before_exec.rs:17:13
note: Run with `RUST_BACKTRACE=1` for a backtrace.
@alexcrichton If I replace that assert!(res.len() <= 1) by something that causes UB, you have an example of before_exec introducing memory unsafety.
@RalfJung that seems akin to saying File::open is unsafe because you can File::open("/proc/self/mem") and arbitrarily read/write memory. I don't think I'd personally agree that example shows any memory unsafety.
@alexcrichton I'm afraid I disagree. Strongly. I think these cases are miles apart. I am also somewhat disappointed that we are now playing these definitional games where I come up with counterexamples and you just define them away. :/
The reasons I see a huge difference between /proc/self/mem and this case are as follows:

- /proc/self/mem is a somewhat special platform-specific API. It works on Linux only. Admittedly other platforms likely have something similar, but this is still in strong contrast to my example, which relies entirely on standard POSIX interfaces.
- Defending against /proc/self/mem would essentially entail making File::open unsafe, which is a very commonly used API, and the vast majority of ways to use it are perfectly safe. That is hardly realistic. (If there was a good way to detect if the file we open is such a strange all-of-memory file, and we could make File::open reject opening such files, I'd propose that immediately.) In contrast, defending against this bug would just mean making before_exec unsafe, which is perfectly possible I would argue (and which nobody has argued against).
- Most importantly though, I think we should consider the trade-off here. We can make either before_exec safe, and declare everything unsafe that would be broken by it, or vice versa. There is no "not taking a decision": if we change nothing, we decide for the former. (After all, this is something we can realistically fix, unlike /proc/self/mem.) So on the one hand, we have a single rather obscure Unix-specific method doing something that is generally considered very dangerous when programming against POSIX (namely, executing arbitrary code between fork and exec), and on the other hand, we have every single abstraction relying on having unique control over a file descriptor. Whether this be some network socket that wants to make sure a certain signal is only sent to another host once, or a serialization/deserialization library used across a pipe because there might or might not be a process boundary while also maintaining uniqueness of some type instances, pretty much anything that uses Rust's ownership to enforce a discipline on external data could be broken by this.
I usually go around telling people that what makes Rust great is not just that it tracks ownership of memory for you, but that you can use the ownership facilities to track ownership of other, user-defined resources that Rust has no idea about, providing safe interfaces where that would not be possible with a less expressive type system and avoiding correctness errors that would otherwise be hard to detect statically. One of my favorite examples is a software I once started to write (unfortunately never got finished) that interacts with some external sensors, actuators and LEDs and where I used ownership to make sure that every LED has exactly one party in charge, so that we do not get strange flickering because two threads decide they will use the same LED to signal their status. Isn't it great how many places there are where ownership helps us provide better guarantees, or any guarantees at all?
It seems like in the future, every time I say that, I will have to add "oh, and you better make sure that your program does not call before_exec, because that can be used to break all of these guarantees". before_exec is essentially "fork in safe code", and fork duplicates the entire process state, putting it at inherent conflict with any reasoning about uniqueness. Making before_exec safe sends the message that yes, you can use Rust to track ownership within your process, but if you actually want to do something with the outside world, you had better not rely on it. Of course this won't stop anyone; people will still write these abstractions because nobody will think of before_exec, and then their code will have one entirely unnecessary subtle soundness hole. That is so sad :(
I am adding the lang team, because this concerns fundamental limits of safe abstractions and unsafe code.
FWIW, I find @RalfJung's arguments quite persuasive.
One of my favorite examples is a software I once started to write (unfortunately never got finished) that interacts with some external sensors, actuators and LEDs and where I used ownership to make sure that every LED has exactly one party in charge, so that we do not get strange flickering because two threads decide they will use the same LED to signal their status.
My understanding is that @japaric actually implemented this in his RTFM project. I also had the same idea before knowing about it. I find it strange to break this idea just to keep some obscure function safe.
@RalfJung you're talking specifically about file descriptors and inheritance, though, which is a system-level concern that we really can't provide many guarantees about.
Let's say that we did want to provide a guarantee that you can control file descriptors and guarantee unique access to file descriptors. That has consequences, like declaring more of these system-level operations unsafe. This can have a huge impact we don't at all have a handle on, as there are likely a good deal of abstractions in the wild that rely on system behavior one way or another, and they would likely need close analysis if we want to say "all Rust programs need to provide a new guarantee by default".

I also think it's disingenuous to paint this as a picture of "oh the bad person is trying to destroy Rust's guarantees by forcing this one obscure thing to be safe". The actual situation is that we have a function added to the standard library long ago as safe that has been stable for quite some time. We had many debates about whether to make it unsafe during stabilization, and concluded we could find no memory-unsafety aspect and hence left it safe. We're not trying to act maliciously and break the community; we're doing the best we can (as everyone else is) and want Rust to be just as safe as everyone else does.
I also think it's disingenous to paint this as a picture of "oh the bad person is trying to destroy Rust's guarantees by forcing this one obscure thing to be safe".
I overreacted. I am sorry.
we have a function added to the standard library long ago as safe and has been stable for quite some time. We had many debates about whether to make it unsafe during stabilization, and concluded we could find no memory unsafety aspect and hence left it safe
I was not aware of discussion happening before stabilization, it did not come up in this thread so far (or I missed it). Is it possible to read this somewhere?
there are likely a good deal of abstractions in the wild that rely on system behavior one way or another, and would likely need close analysis if we want to say "all Rust programs need to provide a new guarantee by default".
This goes both ways though: there are likely also a good deal of abstractions that rely on Rust's ownership and non-duplication guarantees across process boundaries, and those would need close analysis if we want to say "Rust's guarantees entirely end at the process boundary". That does seem like a surprising limitation.
You're talking specifically about file descriptors and inheritance, though, which is a system-level concern that we really can't provide many guarantees about.
I realized file descriptors are a red herring. The entire process state gets duplicated, affecting all external communication. Here is another example, this time no file descriptors get duplicated (just as a sketch this time):
Let's say we have a type Led that represents a single LED:

struct Led(usize); // the usize is some GPIO pin or I²C address or so

impl Led {
    fn on(&mut self) {
        // send GPIO/I²C messages to turn on LED
    }
    fn off(&mut self) {
        // send GPIO/I²C messages to turn off LED
    }
}
The purpose is to make sure that no two parties think they both exclusively control the LED, and confuse the user. (This might be an actuator controlling a servo if you want something where the failure mode is physical destruction. But that's a bit too dramatic even for my taste. ;)
But now I can again do:
fn foo(led: Led) {
    let m1 = Arc::new(Mutex::new(Some(led)));
    let m2 = Arc::clone(&m1);
    Command::new("true")
        .before_exec(move || Ok(m2.lock().unwrap().take().unwrap().on()))
        .status().unwrap();
    m1.lock().unwrap().take().unwrap().off();
}
The on and the off race here, subverting the guarantee that Led otherwise manages to uphold. No file descriptors get duplicated.
That's why I said this is really about resources external to the process: If we allow safely duplicating the entire process state, we cannot use ownership in Rust to control access to resources that live outside the current process.
And it doesn't have to be embedded either. Another situation I can imagine is a network library using some non-Copy type to make sure some message (a reply or so) is only sent once. With a safe fork, I can send the reply twice. Or a crypto library that has some token to let you use something only once (a pre-generated nonce for some operation, or so) could now have that token be used twice. Basically, the moment the thing to protect against is any form of communication with the outside world, a safe fork breaks any safe API you might otherwise come up with in Rust.
Every platform other than Linux is now "memory unsafe" by default because of non-atomic CLOEXEC. We do our best to atomically create file descriptors and set CLOEXEC, but that isn't always possible, so handles to file descriptors can leak into child processes. Once they're leaked, you no longer have a static guarantee that only your process has access to them.
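For anyone unfamiliar with the race being described, the non-atomic pattern looks roughly like this (a sketch using the libc crate directly; error handling omitted):

use std::ffi::CString;

fn open_cloexec(path: &str) -> libc::c_int {
    let p = CString::new(path).unwrap();
    unsafe {
        let fd = libc::open(p.as_ptr(), libc::O_RDONLY);
        // If another thread forks and execs right here, `fd` has already been
        // created without FD_CLOEXEC and silently leaks into the child process.
        libc::fcntl(fd, libc::F_SETFD, libc::FD_CLOEXEC);
        fd
    }
}

With O_CLOEXEC (or SOCK_CLOEXEC, pipe2, and friends) the flag is set atomically at creation time, which is exactly the part that is missing on some platforms.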
I had no idea non-Linux unixes do not have an atomic CLOEXEC. That is sad :(

However, since creating a File from a file descriptor is unsafe, the new process cannot actually do anything with those extra file descriptors that it got leaked, right? Also, CLOEXEC changes behavior on exec, not fork. So I do not see how it breaks my pipe example.
Would it be possible to acquire a lock across opening a file and setting CLOEXEC, and take the same lock across a fork? Then we could be sure that in the forked-off process, all file descriptors (well, at least the ones we have any say about) have CLOEXEC set. I do not know if such a lock would be prohibitive in terms of performance, though.
To me at least @RalfJung has made compelling arguments for why before_exec should be unsafe, so I think we should at the very least try a crater run with before_exec as unsafe fn and see what the extent of the breakage is, and then depending on the outcome change it outright or use a compatibility-warning period to ease crates into this.
What safety warnings should it carry then? Shall the nix crate's fork also be unsafe then?
fork should definitely be consistent with before_exec.
I can prepare a PR that marks this unsafe, so that we can crater it to get some more data.
@Centril Is this something that should be nominated for lang and libs team discussion? There'd be an FCP anyway if we decide to go with it.
Actually we're not even getting the compiler built... jobserver fails:
error[E0133]: call to unsafe function is unsafe and requires unsafe function or block
--> /home/r/.cargo/registry/src/github.com-1ecc6299db9ec823/jobserver-0.1.11/src/lib.rs:566:13
|
566 | / cmd.before_exec(move || {
567 | | set_cloexec(read, false)?;
568 | | set_cloexec(write, false)?;
569 | | Ok(())
570 | | });
| |______________^ call to unsafe function
|
= note: consult the function's documentation for information on how to avoid undefined behavior
@RalfJung It would be nicer to have some data re. breakage to go on before discussing actions to take in-meeting.
Seems like I have to submit a PR to jobserver then to add a currently-unnecessary unsafe block so that we can proceed? Or is there a better way?
Nominating for discussion on next T-lang meeting which is on the 29th.
If @RalfJung and @alexcrichton can participate that'd be good.
We can discuss this in a broader context, but in no way can we switch before_exec to being an unsafe function. The only way forward for this, if we decide to implement it, will be to deprecate the existing method and figure out a new name.
I haven't had time to respond to @RalfJung's previous comment, nor has practically anyone else from the libs team weighed in here.
I'm not personally thrilled about debating the minutiae here myself, and on skimming this thread awhile back, it's already explained that POSIX basically defines arbitrary code after fork as UB. Regardless of anything else, that's likely enough motivation to start deprecation/renaming.
@alexcrichton
If we don't make before_exec into an unsafe fn, then the soundness hole remains in the type system -- deprecation is not enough imo. I think it's entirely within our rights per RFC 1122 to make this change, especially given that the impact seems to be quite nano-sized in terms of root regressions, in a way that does not affect their public APIs (I'd like to run crater to be sure, but the sourcegraph query seems promising). It's even more nano-sized than I initially thought; the only crate that sourcegraph found a regression in was jobserver, because the mention in duct was in a comment.
Is fork(2) supposed to be usable in single-threaded with-std Rust programs by itself, not as part of a fork-exec combo?

Maybe there should be two forks: one that is safe but checks that we are currently single-threaded, and one that is unsafe and just forks?
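A rough sketch of what the "safe only if single-threaded" variant could look like (Linux-only, using /proc/self/task and the libc crate; the check is inherently best-effort, and the function name is made up):

use std::io;

pub fn fork_if_single_threaded() -> io::Result<libc::pid_t> {
    // On Linux, each thread of the current process shows up as an entry here.
    let threads = std::fs::read_dir("/proc/self/task")?.count();
    if threads != 1 {
        return Err(io::Error::new(io::ErrorKind::Other, "process is multi-threaded"));
    }
    let pid = unsafe { libc::fork() };
    if pid < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(pid)
    }
}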
@Centril no, we are not within our rights with that RFC to break the entire ecosystem in one go. We allow breaking changes that have minimal impact, and I'm pretty certain this does not have minimal impact.
Here's a small list of crates we're breaking with this sort of change that I've found locally:
nix - every historical version is now declared unsafe
jobserver
alacritty
All of those are relatively high profile candidates, meaning this is not a trivial "just flag it as unsafe and go ahead".
Crater can provide better visibility here, and using that does not require merging changes to jobserver to figure it out.
using that does not require merging changes to jobserver to figure it out.
How can I get a crater run if rustc doesn't build because it depends on jobserver?
We chose to stabilize this function on the grounds that we couldn't come up with any memory safety reasons this would cause problems, just deadlocks and such.
I think this approach should be re-evaluated going forwards. Whilst "being able to directly cause memory unsafety" should of course imply that a function should be unsafe, I don't think the inability to do that is reason that a function should necessarily be stabilised as a safe function.
Firstly, there are features which are independently safe, but cause unsafety when used together. With the above approach, this will end up being decided just by whichever feature gets there first, whereas it would be better to make such choices explicitly, i.e. we want X to be safe at the cost of Y being unsafe.
Secondly, it may be hard (as is the case here) to properly evaluate whether something is indeed safe or not, and stabilising it as an unsafe function would be more conservative. Obviously we want to avoid doing this too, but in certain cases where there is reasonable doubt (there was a lot of evidence at the time that there might be memory unsafety here), and particularly on platform-specific and infrequently used functionality, either delaying stabilisation or stabilising it as unsafe would be preferable.
@alexcrichton
@Centril no, we are not within our rights with that RFC to break the entire ecosystem in one go. We allow breaking changes that have minimal impact, which I know pretty certain this does not have minimal impact.
I think we should measure the impact on the ecosystem, but given the results from sourcegraph it strikes me as hyperbole to state that this would "break __the entire ecosystem__ in one go" (emphasis mine).
Twice I have read RFC 1122 now and I cannot find any paragraph nor sentence which states that breakage due to soundness fixes has to be minimal. The RFC mainly talks about mitigation strategies depending on impact, severity, exploitability, etc. If you think I'm wrong about this please show me why.
A migration strategy consisting of changing key crates beforehand, plus a compatibility warning over say 2 release cycles that tells users to wrap .before_exec(..) in unsafe { ... }, is both simple and a mechanism where affected users have a clear recourse.
Here's a small list of crates we're breaking with this sort of change that I've found locally
nix - every historical version is now declared unsafe
Well, historical versions yes; we can at most go and introduce new patch versions 0.8.2, 0.9.1, 0.10.1, and 0.11.1, but we cannot do anything more than that, and particularly we cannot deal with crates that have pinned versions of nix other than with warnings. Furthermore, I checked nix upstream and did not find any mentions of before_exec in the code at all.
jobserver

As I noted before, this has one case of before_exec.
alacritty

This has 2 mentions of before_exec: https://github.com/jwilm/alacritty/search?q=before_exec&unscoped_q=before_exec. The crate has zero reverse dependencies on crates.io though, so the fix is fairly local.
All of those are relatively high profile candidates, meaning this is not a trivial "just flag it as unsafe and go ahead".
OK; so we take a more gradual approach with a lint / warning stating that it should be wrapped in unsafe { .. }, and in 2-3 months we change it to unsafe fn. I don't have a problem with waiting a bit.
@RalfJung you can use [patch] to depend on a git repository for a try build.
@Centril I'm not really interested in hyper-analyzing all the use cases here. Crater is required for a change like this anyway, so I'm not going to deal with speculating what crater says until it's run.
What would cause a breakage in Nix? I can't find any uses of before_exec in any version of Nix going back to 0.1.2. As an aside, Nix _did_ have a very similar problem in its tests, caused not by using before_exec but by calling fork directly and running code before exec. As the OP notes, we had to fix it by only using async-signal-safe functions in those situations.
Furthermore, I checked nix upstream and did not find any mentions of before_exec in the code at all.
The nix crate just exposes fork (the source of before_exec's problems) itself as a safe function. The documentation does not mention any memory crash-worthy issues:
Safety
In a multithreaded program, only async-signal-safe functions like pause and _exit may be called by the child (the parent isn't restricted). Note that memory allocation may not be async-signal-safe and thus must be prevented.
Those functions are only a small subset of your operating system's API, so special care must be taken to only invoke code you can control and audit.
I'm not sure how "async-signal-safe" and Rust's "safe" are connected.
Whilst "being able to directly cause memory unsafety" should of course imply that a function should be unsafe, I don't think the inability to do that is reason that a function should necessarily be stabilised as a safe function.
As far as I remember, there is some notion against marking functions as unsafe just because they feel unsafe, or because they are bad for security, or to deter users from using them. Each unsafe function's documentation should ideally contain the proof of unsafety and concrete instructions on how to use it without causing UB.

Examples of such "unsafe-esque" safe functions are mem::forget and TlsConnectionBuilder::danger_accept_invalid_certs.
As far as I remember, there is some notion against marking functions unsafe "just because of it feel unsafe" or because of it is bad for security or to deter users from using it. Each unsafe function's documentation should ideally contain the proof of unsafety and concrete instructions of how to use it without causing UB.
This is the wrong way round: functions which call unsafe functions but are themselves safe should have a "proof" (not necessarily formal, but at least some reasoning) of why they are safe. Not just "we couldn't find any unsafety".
@vi
nix crate just exposes fork (the source of before_exec's problems) itself as a safe function. Documentation does not mention any memory crash-worthy issues:

Making before_exec an unsafe fn doesn't directly impact nix as a matter of semver, due to fork being safe in the crate; it does however indicate that nix might be doing something fishy. @RalfJung knows more about this subject than me, so I'll leave that to him.
There is definitely a policy against marking functions that are known to be sound but otherwise dangerous as unsafe. It's also true that functions exposed as safe should be proven (or, in the absence of a formalism for all of Rust in which such a proof can be carried out, convincingly argued) to be sound. Neither quite matches what @Diggsey is talking about: no proof that it's sound and no non-UB footguns motivating unsafe. I don't think there is an existing policy about that.
you can use [patch] to depend on a git repository for a try build.
I've never used [patch] before, will see what I can do.
there is some notion against marking functions as unsafe just because of it feels unsafe or because of it is bad for security or to deter users from using it.
True. However:
there are features which are independently safe, but cause unsafety when used together
Namely, my pipe example (helpfully folded away by GitHub) is safe in a universe without a safe fork, and fork is safe in a universe that does not allow my pipe example. Making fork (and hence before_exec) unsafe would not just be a "lint" because it is "bad for security"; it would actually come with a proper proof obligation for the caller: you must not use this to "use up" or even touch the same exclusive resource in both parent and child process. (Basically, shared references are okay, mutable references are not. My pipe example acquires a mutable reference from a shared reference through Mutex::lock; that is the unsound part.)

I do not think there can be a general policy about what to do in such cases; we must decide on a case-by-case basis.
Making before_exec an unsafe fn doesn't directly impact nix as a matter of semver due to fork being safe in the crate; it does however indicate that nix might be doing something fishy.
If we decide before_exec to be unsafe, that can only go together with deciding that we make some amount of guarantees for using ownership to track out-of-process resources -- promising that things like my pipe example are sound. Hence the justification for fork to be safe (which contains an unsafe block, so it must manually justify its safety) would no longer hold.

So, yes, this would immediately imply that a safe fork is a soundness hole in a library.
@RalfJung
If we decide before_exec to be unsafe, that can only go together with deciding that we make some amount of guarantees for using ownership to track out-of-process resources -- promising that things like my pipe example are sound. Hence the justification for fork to be safe (which contains an unsafe block, so it must manually justify its safety) would no longer hold. So, yes, this would immediately imply that a safe fork is a soundness hole in a library.
Oh sure, the crate would have to change eventually; but it does not imply that the code would break immediately in the sense that cargo check no longer works, which is what I was trying to communicate (poorly!). :)
pipe example is safe in a universe without a safe fork
The example looks like it trusts sockets to return some sane data. I'm not sure if OS sockets and other objects (apart from the memory manager) are in scope of Rust's safety rules. from_raw_fd being unsafe may set a precedent that file descriptors are in scope of the safety rules, although there are workarounds (like /proc/self/fd/... or /proc/self/mem), so maybe there is some consensus that tricky file paths are not in scope of the safety rules.

Another tricky point is shared memory. Is all shared memory inherently unpredictable (i.e. one can't store any references there or rely on any invariants about the content of shared memory), or are safe facades OK and tools that break those safe facades unsafe?

Would a hypothetical new OS function that adjusts the monotonic clock, allowing it to jump back in time (which is currently documented not to happen), be safe or unsafe according to Rust?

There should be some explicit policy of which operating system features are in scope of the safety rules and which are not. For each feature in scope, there should be a description of what invariants unsafe-using code shall preserve (e.g. "don't close or duplicate file descriptors owned by Rust code").
So crater found 32 root regressions. Looks like if we decide we want to take a stance for tracking ownership of external resources, we'd have to go the gradual-deprecation route.
Thanks for gathering the data @RalfJung! This is a nominated issue for the next T-libs meeting where we can try to discuss and discover a new name here. Do others have suggestions for what to call this method alternatively? Some ideas I might have are:
before_exec2
pre_exec
after_fork
Nominating for discussion on next T-lang meeting which is on the 29th.
If @RalfJung and @alexcrichton can participate that'd be good.
Ralf couldn't participate on the 29th.
My preference is to deprecate the method temporarily with:
#[deprecated =
"This method will become unsafe because $reason.\
Check whether what you are doing is safe and then wrap the call in `unsafe { ... }`."]
And then we can let, say, 3-6 months pass while we file fixing PRs; after that we make before_exec into unsafe fn.
I don't mind having an extra method, but I would like before_exec to eventually become unsafe fn.

As for naming, all of the mentioned candidates seem fine; another potential name could be before_exec_unchecked.
This was discussed briefly with the libs team during today's triage, and there was no objection to moving forward with a deprecation. The leading contender for a name was before_exec2, which clearly indicates it's "basically the same" as the prior method modulo a few small details (in this case the safety).
Moving this issue forward largely just needs someone to champion the PR and implementation to push it through!
Not my call to make but I really do not like the idea of just incrementing the count on methods. This made a lot of code in Python just very ugly to begin with.
If you want to find some better names, here are similar APIs in other languages: the function with the same behavior in Python is called preexec_fn for subprocesses; in Boost it's called on_exec_setup. In glib the closest equivalent is GSpawnChildSetupFunc (setup_func).
If editions could be used to solve this in the future, that would be a huge benefit. (I realise this is difficult given that libstd has to only be compiled once, but perhaps there is a way to attach some metadata to the function signature so that the compiler can do the mapping).
eg. an attribute like:
#[rename("before_exec", edition=2018)]
fn before_exec2(...)
@Diggsey ostensibly you could do that with "edition visibility", e.g. pub(>= 2018) and pub(< 2018) as an unstable mechanism. I'd like to use some mechanism like that to say e.g. pub(< 2021) fn uninitialized<T>() -> T.
The leading contender for a name was before_exec2 which clearly indicates it's "basically the same" as the prior method modulo a few small details (in this case the safety).
This makes sense for someone coming from today's Rust, but in a future where before_exec is not used at all any more, many people will be confused by what the 2 means. Basically, before_exec2 means we will always have to explain the name as "successor of a now-deprecated function".
We can have "soft unsafe" operations which only produce a (deny by default) lint that states that one should be wrapping it in unsafe
, but also mention that the only reason it's a lint is because of backwards compatibility.
Implementing such a scheme in the unsafe checker is quite simple. I am volunteering to write a PR if such a change is desired (or if we just want to see how such a change would look).
"soft unsafe"
Like what happens when taking a reference of a field of a repr(packed) struct?
I think there's definitely enough valid pushback to not call it before_exec2! I like pre_exec or maybe on_exec_setup myself perhaps? (I don't feel too strongly one way or another.)
@oli-obk I think I'd personally prefer to see a different function, because there's also the matter of creating a function pointer to this method, which today is safe and creates a safe function pointer, but which afterwards would ideally need to be safe and create an unsafe function pointer, and would in practice have to unsafely create a safe function pointer.
The language team agreed that before_exec should be unsafe, and leaves the details of a transition plan to the libs team.

Speaking only for myself: I like pre_exec.
I do feel like there is a meta question lurking here of unsafe composition and how we should manage it. For now, it seems good to err on the side of caution where possible, though, and avoid thorny questions -- but -- as has been amply demonstrated here -- it's sort of hard to tell what the limits ought to be on what unsafe code can and cannot do when it comes to e.g. external resources. I wonder if at some point we're going to want to try and allow unsafely implemented libraries to declare more precisely the kinds of things they rely on other libraries not to do.
Here is a simple example showing memory unsafety in the presence of fork:
#![feature(thread_id_value)]
use std::sync::atomic::{AtomicU64, AtomicPtr};
use std::sync::atomic::Ordering::SeqCst;
use std::{thread, ptr};

pub fn f() {
    static I: AtomicU64 = AtomicU64::new(0);
    static D: AtomicPtr<i32> = AtomicPtr::new(ptr::null_mut());
    let data = Box::leak(Box::new(0));
    let tid = thread::current().id().as_u64().get();
    match I.load(SeqCst) {
        0 => {
            I.store(tid, SeqCst);
            /*
             * Assumption for safety: No call to `f` can have the same `tid` while we're
             * in the critical section that spans this comment.
             */
            D.store(data, SeqCst);
        },
        n if n == tid => {
            /* D has been set to a valid pointer because of the assumption above */
            let _v = unsafe { *D.load(SeqCst) };
        },
        _ => { },
    }
}