Rust: On Unix-like OSes, std::process::Command::spawn() should not use assert! from the child process after fork()

Created on 30 Jun 2020 · 29Comments · Source: rust-lang/rust

On Unix-like OSes, std::process::Command::spawn() uses an assert! to check if the pipe I/O succeeded. If that assert! fails, it will call panic!, which is not signal-safe. It's better to use if ....is_err() { std::intrinsics::abort() } there.
libc::abort() (std::process::abort()) shouldn't be used as a replacement, as, at least in glibc, it's not signal-safe^[1][2], although all of the C standard, the C++ standard, the POSIX standard, and Linux man-pages say it should be.

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort.c;h=df98782dd7ea6c1476184a365bd9f3954f481a18;hb=refs/heads/master#l54
[2] https://www.gnu.org/software/libc/manual/html_node/Aborting-a-Program.html#Aborting-a-Program

C-bug T-libs libs-impl

Source

hyd-dev

Most helpful comment

@tmiasko I have a fix for glibc. Locks are basically required to avoid seeing the unmasked signal state in the child. However, you have to take that lock during fork to ensure that you own and can unlock the lock in the child (avoids the deadlock). The regression test was not straight forward to write and on average it only reproduces once every 100-600 seconds on an x86_64 VM with 4vCPU. That's good enough though and I run the test for 20s to probabilistically try to catch if the defect comes back. On average if I ran the regression test just once per dev-test cycle it might take something like 32 years to show the defect :-( Thankfully more people than just the dev-test cycle will be using the API.

codonell on 27 Sep 2020

🎉2 🚀1

All 29 comments

It seems that nobody is aware of this... I know many libraries (such as the glibc above) don't care about signal safety in such situation. If you're not aware of this, feel free to close this issue.

hyd-dev on 16 Jul 2020

Closing as there is no response.

hyd-dev on 17 Jul 2020

cc @joshtriplett

mati865 on 17 Jul 2020

This is a good catch. We certainly try to be signal-safe in such contexts, and failing to do so is a bug. Sorry to see that this didn't get a response. I'm reopening this, as it remains a bug.

joshtriplett on 20 Jul 2020

I discovered that if libc::abort() is not signal-safe (in glibc), then rtabort!() is not signal-safe, so the signal handlers in the standard library that call rtabort!() is actually not signal-safe:
https://github.com/rust-lang/rust/blob/891e6fee572009ff2be4d4057fb33483610c36a7/src/libstd/sys/unix/stack_overflow.rs#L92-L106
Maybe I should open a new issue or report the abort(3) bug to glibc...

hyd-dev on 20 Jul 2020

On Mon, Jul 20, 2020 at 01:32:22AM -0700, hyd-dev wrote:

I just realized if libc::abort() is not signal-safe (in glibc), then rtabort!() is not signal-safe, so the signal handlers in the standard library that call rtabort!() is actually not signal-safe:
https://github.com/rust-lang/rust/blob/891e6fee572009ff2be4d4057fb33483610c36a7/src/libstd/sys/unix/stack_overflow.rs#L92-L106
Maybe I should open a new issue or report the abort(3) bug to glibc.

signal-safety(7) lists abort(3) as async-signal-safe. If it isn't in
practice async-signal-safe in glibc, that's a bug and you should report
it.

joshtriplett on 21 Jul 2020

On Mon, Jul 20, 2020 at 01:32:22AM -0700, hyd-dev wrote:
I just realized if libc::abort() is not signal-safe (in glibc), then rtabort!() is not signal-safe, so the signal handlers in the standard library that call rtabort!() is actually not signal-safe: https://github.com/rust-lang/rust/blob/891e6fee572009ff2be4d4057fb33483610c36a7/src/libstd/sys/unix/stack_overflow.rs#L92-L106 Maybe I should open a new issue or report the abort(3) bug to glibc.

signal-safety(7) lists abort(3) as async-signal-safe. If it isn't in practice async-signal-safe in glibc, that's a bug and you should report it.

Reported as https://sourceware.org/bugzilla/show_bug.cgi?id=26275.

hyd-dev on 22 Jul 2020

So I think libc::abort() can be used, and if ... { something_like_rtabort!() } is suitable as a replacement. WDYT?

hyd-dev on 22 Jul 2020

So I think libc::abort() can be used, and if ... { something_like_rtabort!() } is suitable as a replacement. WDYT?

@joshtriplett Should this be fixed like that?

hyd-dev on 11 Aug 2020

Given the above research, I don't think libc::abort() is safe to use either.

joshtriplett on 11 Aug 2020

Using std::intrinsics::abort() seems like a potentially reasonable approach here. (std::process::abort() just calls libc::abort().)

joshtriplett on 11 Aug 2020

I would prefer just using rtabort! since that is precisely what it was designed for.

Amanieu on 12 Aug 2020

That call chain is rtabort! -> sys_common::util::abort() -> sys::abort_internal(), which on unix calls libc::abort().

https://github.com/rust-lang/rust/blob/e5e33ebd2ba12a78dbf6e2d5f154d5f71f28576c/library/std/src/sys/unix/mod.rs#L159-L168

cuviper on 12 Aug 2020

The note about flushing files isn't something we can rely on:

https://man7.org/linux/man-pages/man3/abort.3.html

   Up until glibc 2.26, if the abort() function caused process
   termination, all open streams were closed and flushed (as with
   fclose(3)).  However, in some cases this could result in deadlocks
   and data corruption.  Therefore, starting with glibc 2.27, abort()
   terminates the process without flushing streams.  POSIX.1 permits
   either possible behavior, saying that abort() "may include an attempt
   to effect fclose() on all open streams".

cuviper on 12 Aug 2020

In upstream glibc the goal is for abort() to be AS-safe as required by the standard and it should be today (after glibc 2.27). In fact doing anything superfluous during the abort() generally constitutes a security hazard because the process can be in an unknown state when corruption is detected and the abort() called. For the sake of reducing restart latency and avoiding executing any compromised parts of the process we shut down as quickly as possible. We strongly recommend out-of-process backtracing to avoid needing to run any of that code in the process with bad or unknown state.

codonell on 12 Aug 2020

❤1

In upstream glibc the goal is for abort() to be AS-safe as required by the standard and it should be today (after glibc 2.27).

It's not. It tries to hold a lock.

hyd-dev on 12 Aug 2020

Example demonstrating a deadlock. After a dozen of so runs leaves deadlocked children behind:

#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

void *lazy_abort(void *arg) {
  struct timespec req = {.tv_sec = 0, .tv_nsec = 1000000 };
  nanosleep(&req, NULL);
  abort();
  return NULL;
}

int main() {
  pthread_t thread;
  signal(SIGABRT, SIG_IGN);
  int r = pthread_create(&thread, NULL, lazy_abort, NULL);
  assert(r == 0);

  while (1) {
    pid_t pid = fork();
    assert(pid != -1);
    if (pid == 0) {
      abort();
    }
  }
}

At least the conditions for deadlock are quite unlikely, since both a child process and a thread in the parent process must abort.

The musl implementation looks correct (it also uses a lock, which by itself is only a weak indicator of a potential issue).

tmiasko on 13 Aug 2020

❤1 😄1

@tmiasko You may want to post that on the glibc bug report here: https://sourceware.org/bugzilla/show_bug.cgi?id=26275

Amanieu on 13 Aug 2020

codonell on 27 Sep 2020

🎉2 🚀1

@tmiasko https://sourceware.org/pipermail/libc-alpha/2020-September/117934.html

codonell on 27 Sep 2020

❤1

Unfortunately, avoiding abort() may still be required after the patch lands, to wait for it to be released and included in GNU/Linux distributions.

hyd-dev on 28 Sep 2020

I think we can make sys::abort_internal() call intrinsics::abort() when using glibc and then use rtabort! in spawn().

hyd-dev on 28 Sep 2020

Please don't use intrinsics::abort. If you really want then raise SIGABRT instead.

Amanieu on 28 Sep 2020

@Amanieu raise() is not signal-safe: https://man7.org/linux/man-pages/man3/raise.3.html#ATTRIBUTES

hyd-dev on 28 Sep 2020

It is: https://www.man7.org/linux/man-pages/man7/signal-safety.7.html

Although you should probably use pthread_kill, pthread_sigmask and sigaction to properly implement an abort replacement (all of which are also signal-safe).

Amanieu on 28 Sep 2020

It is: https://www.man7.org/linux/man-pages/man7/signal-safety.7.html

Although you should probably use pthread_kill, pthread_sigmask and sigaction to properly implement an abort replacement (all of which are also signal-safe).

Sorry. I confused. You're correct.

hyd-dev on 28 Sep 2020

Although you should probably use pthread_kill, pthread_sigmask and sigaction to properly implement an abort replacement (all of which are also signal-safe).

@Amanieu https://man7.org/linux/man-pages/man3/raise.3.html#DESCRIPTION says raise(sig) is equivalent to pthread_kill(pthread_self(), sig). I think pthread_sigmask(), sigaction() and just raise() can be used to implement the replacement.

I can submit a PR to do that.
@rustbot claim

hyd-dev on 28 Sep 2020

IMO we should just leave it as glibc's acknowledged bug, unless folks are actively hitting this problem. While we do hope to be technically "pure", the actual abort race seems like a very artificial situation that I doubt is a problem in practice.

cuviper on 28 Sep 2020

IMO we should just leave it as glibc's acknowledged bug, unless folks are actively hitting this problem.

@cuviper @Amanieu I've found that if I have to avoid intrinsics::abort() and use libc::raise(), I almost need to reimplement libc::abort() and port its tests to std. Maybe it's really better to keep using libc::abort() and just replace the assert! with rtassert!.

While we do hope to be technically "pure", the actual abort race seems like a very artificial situation that I doubt is a problem in practice.

Actually that's what my "don't care about signal safety in such situation" mean. But, again, feel free to ignore the race if it's really not a problem in practice. Also, thanks @codonell for fixing the race in glibc. The race will eventually get fixed (by updating glibc).

I can still submit a PR to just change the assert! to rtassert!. That assert! unsafety shouldn't be ignored.

hyd-dev on 28 Sep 2020

Was this page helpful?

0 / 5 - 0 ratings