Go: os/exec: data race between StdinPipe and Wait

Created on 13 Dec 2014  路  17Comments  路  Source: golang/go

I believe there is an unintentional race in os/exec between Command.Wait and file operations. I am using go version go1.3 linux/amd64, but the race also reproduces with 1.4

A test case follows.

package main

import (
  "fmt"
  "log"
  "os/exec"
)

func main() {
  cmd := exec.Command("/bin/echo")
  stdin, err := cmd.StdinPipe()
  if err != nil {
    log.Fatalf("StdinPipe: %v", err)
  }
  if err := cmd.Start(); err != nil {
    log.Fatalf("Start: %v", err)
  }
  wrote := make(chan bool)
  go func() {
    defer close(wrote)
    fmt.Fprint(stdin, "echo\n")
  }()
  if err := cmd.Wait(); err != nil {
    log.Fatalf("Wait: %v", err)
  }
  <-wrote
}

To observe the race, save this program to racedemo.go, and run:

go run -race racedemo.go

The result:

WARNING: DATA RACE
Read by goroutine 5:
  os.(*File).write()
      /usr/lib/google-golang/src/pkg/os/file_unix.go:211 +0xc5
  os.(*File).Write()
      /usr/lib/google-golang/src/pkg/os/file.go:139 +0xcb
  os/exec.(*closeOnce).Write()
      <autogenerated>:42 +0x81
  fmt.Fprint()
      /usr/lib/google-golang/src/pkg/fmt/print.go:223 +0xc3
  main.func路001()
      /usr/local/google/home/piatek/racedemo.go:21 +0x159

Previous write by main goroutine:
  os.(*file).close()
      /usr/lib/google-golang/src/pkg/os/file_unix.go:108 +0x19b
  os.(*File).Close()
      /usr/lib/google-golang/src/pkg/os/file_unix.go:97 +0x94
  os/exec.(*closeOnce).close()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:442 +0x45
  os/exec.*closeOnce.(os/exec.close)路fm()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:437 +0x33
  sync.(*Once).Do()
      /usr/lib/google-golang/src/pkg/sync/once.go:40 +0xb1
  os/exec.(*closeOnce).Close()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:437 +0x8e
  os/exec.(*Cmd).closeDescriptors()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:222 +0xa2
  os/exec.(*Cmd).Wait()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:367 +0x45c
  main.main()
      /usr/local/google/home/piatek/racedemo.go:23 +0x316

Goroutine 5 (running) created at:
  main.main()
      /usr/local/google/home/piatek/racedemo.go:22 +0x307

The race is on a reference to the file descriptor, which is assigned -1 upon close and read when writing. The documentation is not precise, but I suspect this race is unintentional.

The implementation of StdinPipe uses the internal closeOnce type to prevent concurrent calls to Close; see: http://golang.org/src/os/exec/exec.go#L436

Once wrapped, the file descriptor is added to the closeAfterWait slice, suggesting that concurrent calls to Close and Wait are to be expected. Yet, a concurrent Write will race as shown above. The partially concurrent API is odd.

I believe it is common for code to block on Wait in one goroutine while writing to StdinPipe from other goroutines. (Indeed, I found such a race in a library at Google, prompting my investigation.)

If this race is expected behavior, I suggest updating the package docs to mirror the warning for use of StdoutPipe, e.g., "It is incorrect to call Wait before all writes to the pipe have completed."

FrozenDueToAge NeedsFix

Most helpful comment

For what it is worth, if this bug is blocking you, for a hot fix you can get around it by setting your own stdin before cmd.Start(). I ran into this bug two days back and it got me for about 18 hours until I realized that the race condition was caused because in cmd.Wait(), cmd.closeDescriptors() is invoked for the registered(internal) io.Pipe()s. The pipeReader that is returned for stdin is registered internally whenever you invoke cmd.StdinPipe() and will be closed on cmd.Wait(). The work around this is to assign a readCloser that you control and then close it on your own whenever you are done reading. Please see the patch below

--- orig.go 2016-02-11 03:44:56.000000000 -0800
+++ main.go 2016-02-11 03:44:57.000000000 -0800
@@ -2,23 +2,24 @@

 import (
    "fmt"
+   "io"
    "log"
    "os/exec"
 )

 func main() {
    cmd := exec.Command("/bin/echo")
-   stdin, err := cmd.StdinPipe()
-   if err != nil {
-       log.Fatalf("StdinPipe: %v", err)
-   }
+   pr, pw := io.Pipe()
+
+   cmd.Stdin = pr
    if err := cmd.Start(); err != nil {
        log.Fatalf("Start: %v", err)
    }
    wrote := make(chan bool)
    go func() {
        defer close(wrote)
-       fmt.Fprint(stdin, "echo\n")
+       fmt.Fprint(pw, "echo\n")
+       _ = pw.Close()
    }()
    if err := cmd.Wait(); err != nil {
        log.Fatalf("Wait: %v", err)

All 17 comments

How could the race be prevented at the os/exec level?

Should we delay Wait(), possibly indefinitely, waiting for Write to
complete?
How do we even know that there won't be another Write() when we decide
to do the clean up?

Before speculating on how best to fix the problem, I would like to reach agreement on the intended semantics of the API. Specifically, are writes to StdinPipe intended to be concurrent with Wait?

It seems the only way to fix the data race is to override provide our own
implementation of io.ReadCloser. (Similar races exist for StdoutPipe and
StderrPipe, if we fix StdinPipe, we should also fix the other two.)

Either we lock the struct for Read/Close, or use a sync/atomic to load the
underlying fd.

I don't think we can continue to use os.(*File), because otherwise we will
have to take the lock for the entire duration of the method call to prevent
races on the fd. But that means you can't Close it while there is concurrent
Write.

so to fix this, we will need to implement the method ourselves using syscall
directly.

We need to weight the pros and cons of fixing this and the effort required
for the user to avoid such races.

Leave for others to decide.

Note that the race only occurs if the process exits before reading the data read on the input pipe.

This is related to issue #6817, in that if that issue were fixed this one would no longer be a problem, as os.Pipe would return an object that supports simultaneous Write/Close.

Yes, if fixing os.Pipe will fix the underlying problem. Unfortunately,

6817 is much harder

a problem and require large scale changes. I'm still not sure how to make a
pollable
stat(2). High performance web servers seem to all workaround this problem
by having
separate worker threads for stat(2).

Yes, #6817 is hard. But, short of that, I'm not sure how hard to work to make a WriteCloser that supports simultaneous Write/Close only for the case where a program exits before reading from the input pipe. It's easy to construct such a case, but it doesn't seem like something that people are likely to do in real life.

Still, I just thought of one approach that should solve the problem at least for StdinPipe. In closeOnce.Close, call Fd to get the file descriptor and call syscall.Close (or syscall.CloseHandler) on the descriptor. Add closeOnce.Write that locks a mutex before writing. Have closeOnce.Close lock the mutex _after_ calling syscall.Close. Trust that the kernel will unblock the write when the descriptor is closed.

On Sat, Dec 13, 2014 at 8:37 PM, Ian Lance Taylor [email protected]
wrote:

Still, I just thought of one approach that should solve the problem at
least for StdinPipe. In closeOnce.Close, call Fd to get the file descriptor
and call syscall.Close (or syscall.CloseHandler) on the descriptor. Add
closeOnce.Write that locks a mutex before writing. Have closeOnce.Close
lock the mutex _after_ calling syscall.Close. Trust that the kernel will
unblock the write when the descriptor is closed.

Right. I think to fix the problem within the os/exec package, we must
implement the Write and Close
methods ourselves. Same for StdoutPipe and StderrPipe. It all depends on
whether the problem is
worth the trouble. Or we can just update the document.

A somewhat related issue is #9173. We don't know if the running child
process not consuming
all the write before exiting successfully is an error or not. If it is,
this data race is also an error.

I think I'd argue this is a dup of #7970, although it's more troubling, because you can actually demonstrate it in a simple program.

Too late to fix for Go 1.6.

For what it is worth, if this bug is blocking you, for a hot fix you can get around it by setting your own stdin before cmd.Start(). I ran into this bug two days back and it got me for about 18 hours until I realized that the race condition was caused because in cmd.Wait(), cmd.closeDescriptors() is invoked for the registered(internal) io.Pipe()s. The pipeReader that is returned for stdin is registered internally whenever you invoke cmd.StdinPipe() and will be closed on cmd.Wait(). The work around this is to assign a readCloser that you control and then close it on your own whenever you are done reading. Please see the patch below

--- orig.go 2016-02-11 03:44:56.000000000 -0800
+++ main.go 2016-02-11 03:44:57.000000000 -0800
@@ -2,23 +2,24 @@

 import (
    "fmt"
+   "io"
    "log"
    "os/exec"
 )

 func main() {
    cmd := exec.Command("/bin/echo")
-   stdin, err := cmd.StdinPipe()
-   if err != nil {
-       log.Fatalf("StdinPipe: %v", err)
-   }
+   pr, pw := io.Pipe()
+
+   cmd.Stdin = pr
    if err := cmd.Start(); err != nil {
        log.Fatalf("Start: %v", err)
    }
    wrote := make(chan bool)
    go func() {
        defer close(wrote)
-       fmt.Fprint(stdin, "echo\n")
+       fmt.Fprint(pw, "echo\n")
+       _ = pw.Close()
    }()
    if err := cmd.Wait(); err != nil {
        log.Fatalf("Wait: %v", err)

Missed again.

We need to decide what to do here.

CL https://golang.org/cl/31148 mentions this issue.

os/exec already guards against "implicit Close during cmd.Wait racing against user calling w.Close". The CL I just sent expands that to guard against "implicit Close during cmd.Wait racing against user calling w.Write".

There is still a race between w.Close and w.Write, like there is on any os.File. I'm not solving #7970, just the os/exec case.

Change https://golang.org/cl/65490 mentions this issue: os/exec: remove protection against simultaneous Wait/Write

Was this page helpful?
0 / 5 - 0 ratings