I believe there is an unintentional race in os/exec between Command.Wait and file operations. I am using go version go1.3 linux/amd64, but the race also reproduces with 1.4
A test case follows.
package main
import (
"fmt"
"log"
"os/exec"
)
func main() {
cmd := exec.Command("/bin/echo")
stdin, err := cmd.StdinPipe()
if err != nil {
log.Fatalf("StdinPipe: %v", err)
}
if err := cmd.Start(); err != nil {
log.Fatalf("Start: %v", err)
}
wrote := make(chan bool)
go func() {
defer close(wrote)
fmt.Fprint(stdin, "echo\n")
}()
if err := cmd.Wait(); err != nil {
log.Fatalf("Wait: %v", err)
}
<-wrote
}
To observe the race, save this program to racedemo.go, and run:
go run -race racedemo.go
The result:
WARNING: DATA RACE
Read by goroutine 5:
os.(*File).write()
/usr/lib/google-golang/src/pkg/os/file_unix.go:211 +0xc5
os.(*File).Write()
/usr/lib/google-golang/src/pkg/os/file.go:139 +0xcb
os/exec.(*closeOnce).Write()
<autogenerated>:42 +0x81
fmt.Fprint()
/usr/lib/google-golang/src/pkg/fmt/print.go:223 +0xc3
main.func路001()
/usr/local/google/home/piatek/racedemo.go:21 +0x159
Previous write by main goroutine:
os.(*file).close()
/usr/lib/google-golang/src/pkg/os/file_unix.go:108 +0x19b
os.(*File).Close()
/usr/lib/google-golang/src/pkg/os/file_unix.go:97 +0x94
os/exec.(*closeOnce).close()
/usr/lib/google-golang/src/pkg/os/exec/exec.go:442 +0x45
os/exec.*closeOnce.(os/exec.close)路fm()
/usr/lib/google-golang/src/pkg/os/exec/exec.go:437 +0x33
sync.(*Once).Do()
/usr/lib/google-golang/src/pkg/sync/once.go:40 +0xb1
os/exec.(*closeOnce).Close()
/usr/lib/google-golang/src/pkg/os/exec/exec.go:437 +0x8e
os/exec.(*Cmd).closeDescriptors()
/usr/lib/google-golang/src/pkg/os/exec/exec.go:222 +0xa2
os/exec.(*Cmd).Wait()
/usr/lib/google-golang/src/pkg/os/exec/exec.go:367 +0x45c
main.main()
/usr/local/google/home/piatek/racedemo.go:23 +0x316
Goroutine 5 (running) created at:
main.main()
/usr/local/google/home/piatek/racedemo.go:22 +0x307
The race is on a reference to the file descriptor, which is assigned -1 upon close and read when writing. The documentation is not precise, but I suspect this race is unintentional.
The implementation of StdinPipe uses the internal closeOnce type to prevent concurrent calls to Close; see: http://golang.org/src/os/exec/exec.go#L436
Once wrapped, the file descriptor is added to the closeAfterWait slice, suggesting that concurrent calls to Close and Wait are to be expected. Yet, a concurrent Write will race as shown above. The partially concurrent API is odd.
I believe it is common for code to block on Wait in one goroutine while writing to StdinPipe from other goroutines. (Indeed, I found such a race in a library at Google, prompting my investigation.)
If this race is expected behavior, I suggest updating the package docs to mirror the warning for use of StdoutPipe, e.g., "It is incorrect to call Wait before all writes to the pipe have completed."
How could the race be prevented at the os/exec level?
Should we delay Wait(), possibly indefinitely, waiting for Write to
complete?
How do we even know that there won't be another Write() when we decide
to do the clean up?
Before speculating on how best to fix the problem, I would like to reach agreement on the intended semantics of the API. Specifically, are writes to StdinPipe intended to be concurrent with Wait?
It seems the only way to fix the data race is to override provide our own
implementation of io.ReadCloser. (Similar races exist for StdoutPipe and
StderrPipe, if we fix StdinPipe, we should also fix the other two.)
Either we lock the struct for Read/Close, or use a sync/atomic to load the
underlying fd.
I don't think we can continue to use os.(*File), because otherwise we will
have to take the lock for the entire duration of the method call to prevent
races on the fd. But that means you can't Close it while there is concurrent
Write.
so to fix this, we will need to implement the method ourselves using syscall
directly.
We need to weight the pros and cons of fixing this and the effort required
for the user to avoid such races.
Leave for others to decide.
Note that the race only occurs if the process exits before reading the data read on the input pipe.
This is related to issue #6817, in that if that issue were fixed this one would no longer be a problem, as os.Pipe would return an object that supports simultaneous Write/Close.
Yes, if fixing os.Pipe will fix the underlying problem. Unfortunately,
a problem and require large scale changes. I'm still not sure how to make a
pollable
stat(2). High performance web servers seem to all workaround this problem
by having
separate worker threads for stat(2).
Yes, #6817 is hard. But, short of that, I'm not sure how hard to work to make a WriteCloser that supports simultaneous Write/Close only for the case where a program exits before reading from the input pipe. It's easy to construct such a case, but it doesn't seem like something that people are likely to do in real life.
Still, I just thought of one approach that should solve the problem at least for StdinPipe. In closeOnce.Close, call Fd to get the file descriptor and call syscall.Close (or syscall.CloseHandler) on the descriptor. Add closeOnce.Write that locks a mutex before writing. Have closeOnce.Close lock the mutex _after_ calling syscall.Close. Trust that the kernel will unblock the write when the descriptor is closed.
On Sat, Dec 13, 2014 at 8:37 PM, Ian Lance Taylor [email protected]
wrote:
Still, I just thought of one approach that should solve the problem at
least for StdinPipe. In closeOnce.Close, call Fd to get the file descriptor
and call syscall.Close (or syscall.CloseHandler) on the descriptor. Add
closeOnce.Write that locks a mutex before writing. Have closeOnce.Close
lock the mutex _after_ calling syscall.Close. Trust that the kernel will
unblock the write when the descriptor is closed.Right. I think to fix the problem within the os/exec package, we must
implement the Write and Close
methods ourselves. Same for StdoutPipe and StderrPipe. It all depends on
whether the problem is
worth the trouble. Or we can just update the document.
A somewhat related issue is #9173. We don't know if the running child
process not consuming
all the write before exiting successfully is an error or not. If it is,
this data race is also an error.
I think I'd argue this is a dup of #7970, although it's more troubling, because you can actually demonstrate it in a simple program.
Too late to fix for Go 1.6.
For what it is worth, if this bug is blocking you, for a hot fix you can get around it by setting your own stdin before cmd.Start(). I ran into this bug two days back and it got me for about 18 hours until I realized that the race condition was caused because in cmd.Wait(), cmd.closeDescriptors() is invoked for the registered(internal) io.Pipe()s. The pipeReader that is returned for stdin is registered internally whenever you invoke cmd.StdinPipe() and will be closed on cmd.Wait(). The work around this is to assign a readCloser that you control and then close it on your own whenever you are done reading. Please see the patch below
--- orig.go 2016-02-11 03:44:56.000000000 -0800
+++ main.go 2016-02-11 03:44:57.000000000 -0800
@@ -2,23 +2,24 @@
import (
"fmt"
+ "io"
"log"
"os/exec"
)
func main() {
cmd := exec.Command("/bin/echo")
- stdin, err := cmd.StdinPipe()
- if err != nil {
- log.Fatalf("StdinPipe: %v", err)
- }
+ pr, pw := io.Pipe()
+
+ cmd.Stdin = pr
if err := cmd.Start(); err != nil {
log.Fatalf("Start: %v", err)
}
wrote := make(chan bool)
go func() {
defer close(wrote)
- fmt.Fprint(stdin, "echo\n")
+ fmt.Fprint(pw, "echo\n")
+ _ = pw.Close()
}()
if err := cmd.Wait(); err != nil {
log.Fatalf("Wait: %v", err)
Screencast here https://asciinema.org/a/f13lrd8pep3ohr9ha0exe0rwh
Missed again.
We need to decide what to do here.
CL https://golang.org/cl/31148 mentions this issue.
os/exec already guards against "implicit Close during cmd.Wait racing against user calling w.Close". The CL I just sent expands that to guard against "implicit Close during cmd.Wait racing against user calling w.Write".
There is still a race between w.Close and w.Write, like there is on any os.File. I'm not solving #7970, just the os/exec case.
Change https://golang.org/cl/65490 mentions this issue: os/exec: remove protection against simultaneous Wait/Write
Most helpful comment
For what it is worth, if this bug is blocking you, for a hot fix you can get around it by setting your own stdin before cmd.Start(). I ran into this bug two days back and it got me for about 18 hours until I realized that the race condition was caused because in
cmd.Wait(),cmd.closeDescriptors()is invoked for the registered(internal) io.Pipe()s. The pipeReader that is returned for stdin is registered internally whenever you invoke cmd.StdinPipe() and will be closed on cmd.Wait(). The work around this is to assign a readCloser that you control and then close it on your own whenever you are done reading. Please see the patch below