Go: runtime: memory corruption crashes with os/exec on Linux kernel 4.4

Created on 19 May 2017 · 78 Comments · Source: golang/go

What version of Go are you using (go version)?

go version go1.8.1 linux/amd64

Some crashes were also observed with go version go1.6.3 linux/amd64, prompting an upgrade to go1.8.1.

What operating system and processor architecture are you using (go env)?

The affected servers are linux/amd64. We've seen the crashes when running Linux kernel version 4.4, and have not seen any crashes with Linux kernel version 3.2.

The ./go command here was cross-compiled from darwin/amd64 and dropped onto an affected server:

$ GOROOT=/ ./go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/users/rhys/go"
GORACE=""
GOROOT="/"
GOTOOLDIR="/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I have a monitoring daemon that periodically opens/uses/closes some unix domain sockets, reads files from virtual file systems (e.g. under /sys/), uses os/exec to run short-lived child processes, sends UDP packets, makes HTTP requests, and provides some debug endpoints on an HTTP server.

It does not use its own cgo or unsafe (aside from the internals of github.com/golang/protobuf/proto), and the race detector has not detected any data races when run with a production workload. It is not specifically compiled with CGO_ENABLED=0 or with the netgo build tag, so the executables end up dynamically linked, probably due to the net package's use of cgo.

As a test of cgo's involvement we've built the program with CGO_ENABLED=0, resulting in a statically-linked executable. This version of the program still crashes occasionally on Linux kernel version 4.4.

The program is closed-source.

What did you expect to see?

Given the program's track record of not crashing on Linux kernel version 3.2, I expected that running an identical binary on Linux kernel version 4.4 would also not crash.

What did you see instead?

The program crashes with a variety of "fatal error" messages and an occasional "panic", each of which seems to point to memory corruption. Some examples follow; here's what I think they indicate:

One of the "fatal error" crashes complains about the GC finding a pointer to an unallocated span. Others complain about unexpected fault addresses of small-magnitude numbers, like 0x18 or 0xffffffffffffffff. Either something is corrupting the memory allocator's datastructures—such as data races which are invisible to the race detector, such as within the runtime package—or something is retaining memory in a way that's invisible to the GC, allowing that memory to be reallocated with a different pointer map which would allow small numbers to be written to word that are expected to be pointers.

I don't think it's related to the Go 1.8 changes to argument liveness / runtime.KeepAlive, since there have been some crashes with go1.6.3.

I can't say it's related to crypto/rand's use of getrandom(2), since one of the crashes below shows a goroutine in the code that reads from /dev/urandom.

All of the partial crash logs shown below are from go1.8.1 on Linux kernel version 4.4. We have a couple examples of crashes with go1.6.3 on 4.4. We have not seen this code crash on Linux kernel version 3.2 with either go1.6.3 or go1.8.1.

/cc @aclements

Crash log excerpts follow:

fatal error: exitsyscall: syscall frame is no longer valid
fatal error: unexpected signal during runtime execution
fatal error: exitsyscall: syscall frame is no longer valid
fatal error: runtime: sudog with non-nil elem
fatal error: systemstack called from unexpected goroutine
[repeated lines removed]
fatal error: systemstack called from unexpected goroutine
runtime: newstack called from g=0xc4206196c0
    m=0xc4204c8c00 m->curg=0xc420618ea0 m->g0=0xc420693a00 m->gsignal=0xc420693860
fatal error: unexpected signal during runtime execution
fatal error: systemstack called from unexpected goroutine
fatal error: systemstack called from unexpected goroutine
fatal error: systemstack called from unexpected goroutine
acquirep: p->m=0(0) p->status=3
fatal error: acquirep: invalid p state
fatal error: systemstack called from unexpected goroutine
[repeated lines removed]
fatal error: systemstack called from unexpected goroutine
fatal error: acquirep: already in go
fatal error: systemstack called from unexpected goroutine
fatal error: systemstack called from unexpected goroutine
runtime: newstack called from g=0xc4205edd40
    m=0xc42006ec00 m->curg=0x0 m->g0=0xc4206a16c0 m->gsignal=0xc4206a1520
strings.genSplit(0xc4207b9416, 0xce, 0x88b0da, 0x1, 0x0, 0x53, 0x1, 0x88b0d0, 0x1)
    /usr/local/go/src/strings/strings.go:245 +0x80
strings.Split(0xc4207b9416, 0xce, 0x88b0da, 0x1, 0x54, 0x54, 0x54)
    /usr/local/go/src/strings/strings.go:283 +0x5b
fatal error: systemstack called from unexpected goroutine
[repeated lines removed]
fatal error: systemstack called from unexpected goroutine
runtime: newstack called from g=0xc4205beb60
    m=0xc4202ee400 m->curg=0xc4200d5d40 m->g0=0xc42032c4e0 m->gsignal=0xc42032c340
runtime.SetFinalizer(0x87f880, 0xc42029a3f0, 0x7fa840, 0x8a8f90)
    /usr/local/go/src/runtime/mfinal.go:407 +0x281
net.(*netFD).setAddr(0xc42029a3f0, 0xa735e0, 0xc420502780, 0xa735e0, 0xc420502760)
    /usr/local/go/src/net/fd_unix.go:57 +0x93
fatal error: concurrent map writes
fatal error: runtime: releaseSudog with non-nil gp.param

goroutine 13 [running]:
runtime.throw(0x8a1c37, 0x2b)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc42013b068 sp=0xc42013b048
runtime.releaseSudog(0xc4202bd220)
    /usr/local/go/src/runtime/proc.go:346 +0x3be fp=0xc42013b0f0 sp=0xc42013b068
runtime.selectgoImpl(0xc42013b4e8, 0x0, 0x18)
    /usr/local/go/src/runtime/select.go:516 +0x157c fp=0xc42013b368 sp=0xc42013b0f0
unexpected fault address 0x0
fatal error: fault
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
runtime: newstack called from g=0xc4203ea340
    m=0xc4202ea400 m->curg=0xc4201264e0 m->g0=0xc4202bbd40 m->gsignal=0xc4202bbba0
time.startTimer(0xc420155108)
    /usr/local/go/src/runtime/time.go:68 +0x2b
time.AfterFunc(0x3b9ac0b0, 0xc420385210, 0xc4201a8b58)
    /usr/local/go/src/time/sleep.go:165 +0xa6
runtime: nonempty check fails b.log[0]= 0 b.log[1]= 0 b.log[2]= 0 b.log[3]= 0
fatal error: workbuf is empty
runtime: nonempty check fails b.log[0]= 0 b.log[1]= 0 b.log[2]= 0 b.log[3]= 0
fatal error: workbuf is empty
runtime: nonempty check fails b.log[0]= 0 b.log[1]= 0 b.log[2]= 0 b.log[3]= 0
fatal error: workbuf is empty
runtime: empty check fails b.log[0]= 0 b.log[1]= 0 b.log[2]= 0 b.log[3]= 0
fatal error: MHeap_AllocLocked - MSpan not free
fatal error: workbuf is not empty
runtime: pointer 0xc42097c000 to unallocated span idx=0x4be span.base()=0xc4209a2000 span.limit=0xc4209a4000 span.state=0
runtime: found in object at *(0xc420358300+0x10)
object=0xc420358300 k=0x62101ac s.base()=0xc420358000 s.limit=0xc420359fe0 s.sizeclass=4 s.elemsize=48 s.state=_MSpanInUse
 *(object+0) = 0x9
 *(object+8) = 0x29d53e9900000100
 *(object+16) = 0xc42097c000 <==
 *(object+24) = 0x0
 *(object+32) = 0x1
 *(object+40) = 0x0
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)
fatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0xffffffffffffffff pc=0x41466b]
fatal error: workbuf is not empty
fatal error: workbuf is empty
[repeated lines removed]
fatal error: workbuf is empty
fatal error: netpollUnblock: already closing
 panic: runtime error: invalid memory address or nil pointer dereference
 [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x45be9b]

 goroutine 20 [running]:
 panic(0x8143c0, 0xa9abd0)
    /usr/local/go/src/runtime/panic.go:531 +0x1cf fp=0xc4204afb50 sp=0xc4204afab8
 runtime.panicmem()
    /usr/local/go/src/runtime/panic.go:63 +0x5e fp=0xc4204afb70 sp=0xc4204afb50
 runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:290 +0x29f fp=0xc4204afbc0 sp=0xc4204afb70
 runtime.memmove(0x0, 0xc4207b9000, 0x8)
    /usr/local/go/src/runtime/memmove_amd64.s:169 +0x6ab fp=0xc4204afbc8 sp=0xc4204afbc0
 bufio.(*Reader).Read(0xc420896900, 0x0, 0x8, 0x8, 0x1000, 0x1000, 0xc4200474c0)
    /usr/local/go/src/bufio/bufio.go:224 +0x442 fp=0xc4204afc58 sp=0xc4204afbc8
 crypto/rand.(*devReader).Read(0xc420090510, 0x0, 0x8, 0x8, 0x0, 0x0, 0x0)
    /usr/local/go/src/crypto/rand/rand_unix.go:64 +0x11a fp=0xc4204afd28 sp=0xc4204afc58
 io.ReadAtLeast(0xa700e0, 0xc420090510, 0x0, 0x8, 0x8, 0x8, 0x7f8aa0, 0x1, 0x0)
    /usr/local/go/src/io/io.go:307 +0xa9 fp=0xc4204afd88 sp=0xc4204afd28
 io.ReadFull(0xa700e0, 0xc420090510, 0x0, 0x8, 0x8, 0x8a9738, 0x2, 0xc420276400)
    /usr/local/go/src/io/io.go:325 +0x58 fp=0xc4204afde0 sp=0xc4204afd88
 fatal error: unexpected signal during runtime execution
 [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x43245f]

 runtime stack:
 runtime.throw(0x8a1634, 0x2a)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0x7faf26ffcdd0 sp=0x7faf26ffcdb0
 runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:274 +0x2db fp=0x7faf26ffce20 sp=0x7faf26ffcdd0


Most helpful comment

TEXT runtime·walltime(SB),NOSPLIT,$16
    // Be careful. We're calling a function with gcc calling convention here.
    // We're guaranteed 128 bytes on entry, and we've taken 16, and the
    // call uses another 8.
    // That leaves 104 for the gettime code to use. Hope that's enough!
0000000000000030 <__vdso_clock_gettime>:
  30:   55                      push   %rbp
  31:   48 89 e5                mov    %rsp,%rbp
  34:   48 81 ec 20 10 00 00    sub    $0x1020,%rsp
  3b:   48 83 0c 24 00          orq    $0x0,(%rsp)
  40:   48 81 c4 20 10 00 00    add    $0x1020,%rsp

GCC is doing a 4 KiB stack probe here: the prologue reserves 0x1020 = 4128 bytes and then ORs 0 into the word at the new stack pointer, far beyond the 104 bytes the Go runtime budgeted. Two things could happen: a segfault if that page is unmapped, or nothing in the general case, since OR-ing with 0 changes no bits. But since it's not an atomic/locked op, if it races another thread's use of that memory, it can corrupt data.

So yeah, 104 bytes is not enough. Whether the kernel should be doing stack probes in vdso is another question entirely... but this explains why only Go with its tight split stacks breaks. And why this whole mess is so nondeterministic and the crashes random and hard to reproduce and require multiple threads.

Turns out my second time debugging an issue in the go runtime wound up being a bug in Go->C interop again. Whee.

Edit: just confirmed that hexediting those probes from 0x1020 to 0x20 bytes on one of the crash-inducing kernels cures the crashes.
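To make the corruption mechanism concrete, here is a small standalone Go sketch (not from the thread; it contains a deliberate data race for illustration only) of why a non-LOCKed read-modify-write of memory owned by another thread can lose that thread's updates even though OR-ing with zero changes no bits:

package main

import (
	"fmt"
	"sync"
)

func main() {
	var word uint64 // stands in for a stack word that another thread owns
	var wg sync.WaitGroup
	wg.Add(2)

	// Owner: keeps updating its own word, like ordinary code using its stack.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000000; i++ {
			word++
		}
	}()

	// Prober: models a non-atomic probe like `orq $0x0,(%rsp)` landing on the
	// same memory. The load and the store are separate steps, so a concurrent
	// update by the owner can be overwritten with a stale value.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000000; i++ {
			v := word // load
			v |= 0    // no-op OR: changes no bits
			word = v  // store back, possibly clobbering the owner's update
		}
	}()

	wg.Wait()
	// On a multi-core machine this usually prints a value well below 1000000,
	// showing lost updates; `go run -race` flags the race explicitly.
	fmt.Println("final value:", word)
}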

All 78 comments

I need to retract the paragraph on CGO_ENABLED=0 still resulting in crashes: The crashes observed after deploying the CGO_ENABLED=0 change were all on straggler processes that were still running the old version of the code. I have not yet seen an instance of the pure-Go program crashing on kernel 4.4.

My updated understanding of the problem is as follows: Go programs compiled with default settings on Ubuntu 12.04 may experience memory corruption when run on Ubuntu 16.04. ("Default settings" meaning that cgo has not been explicitly disabled.)

This could be due to differences in the C libraries present on those systems, so it may or may not be worth hunting down and understanding more fully. Setting CGO_ENABLED=0 may have resolved the issue, and updating the build environment to match the runtime environment might also resolve it. I'll be able to say more confidently next week.

Over the weekend we observed more crashes. All instances are now built with CGO_ENABLED=0, and the executable is bit-for-bit identical whether the build environment is based on Ubuntu 12.04 or 16.04.

This rules out any involvement from differing C libraries: our pure-Go program built with go1.8.1 is crashing on Linux kernel version 4.4, and is not crashing on 3.2.

It also means I'm running low on ideas for how to diagnose the problem. The race detector has not found anything, cgo is out of the picture, build vs. runtime environment differences have been accounted for, and use of "unsafe" is limited to github.com/golang/protobuf/proto/pointer_unsafe.go. There's no use of runtime.SetFinalizer outside of std packages, and the crashes on go1.6.3 mean that either Go 1.8's changes to argument liveness are uninvolved, or there's more than one bug here.

I have not yet run the program with GODEBUG=efence=1, which might help determine whether the GC is reclaiming memory prematurely. I'm hesitant to try that setting since I don't understand how it will affect performance, and the failure rate on each single host is relatively low: much less than once per day. (This program runs on a relatively large number of machines.)

Since this program isn't particularly latency-sensitive (or even very throughput-sensitive), it looks like it would be safe to try gcrescanstacks=1 and gccheckmark=1. I plan to try those next.

Do you see the crashes on multiple machines?

Is there a way that we could try to reproduce the problem ourselves?

Since the change seems related to changing kernel versions, perhaps it would help to run the program under strace -f and see if there is any consistency to the system calls when a crash occurs.

Does your program set the SysProcAttr field when using os/exec? I'm only asking to try to look for something that might be kernel related.
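(For readers unfamiliar with the field, a SysProcAttr override looks roughly like the hypothetical sketch below; as the reply notes, the program in this issue does not set it at all.)

package main

import (
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("true")
	// Hypothetical example only: ask fork/clone to put the child in its own
	// process group. The program discussed in this issue leaves SysProcAttr nil.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	_ = cmd.Run()
}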

The crashes happen on multiple machines.

I don't know how to reproduce the problem myself, so I don't know what to suggest for reproducing.

The average failure rate on a single machine is less than once per week, so we'd need to run strace on a large number of machines in order to see anything. We unfortunately don't have the infrastructure to collect strace output from a large number of hosts.

We don't set SysProcAttr when using os/exec.


I have core dumps for most of the crashes, but I'm not sure what to look for. Below is the stack trace for a crash in runtime.runfinq while calling Close on a network connection in a finalizer. The net.netFD struct looks like it's around 14 pointers long; the fault address here is the ASCII string "state", which is the tail end of a 29-byte (4 pointers) long output of path.Join. Not only was the memory reallocated, it was used for a different size class.

unexpected fault address 0x6574617473
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x6574617473 pc=0x427416]

goroutine 4 [running]:
runtime.throw(0x885256, 0x5)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc420049658 sp=0xc420049638
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:297 +0x28c fp=0xc4200496a8 sp=0xc420049658
net.runtime_pollUnblock(0x6574617473)
    /usr/local/go/src/runtime/netpoll.go:256 +0x26 fp=0xc4200496f0 sp=0xc4200496a8
net.(*pollDesc).evict(0xc42058a1b8)
    /usr/local/go/src/net/fd_poll_runtime.go:58 +0x3d fp=0xc420049708 sp=0xc4200496f0
net.(*netFD).Close(0xc42058a150, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:205 +0x4c fp=0xc420049728 sp=0xc420049708
runtime.call32(0x0, 0x8a1de0, 0xc4205120a0, 0x2000000020)
    /usr/local/go/src/runtime/asm_amd64.s:514 +0x48 fp=0xc420049758 sp=0xc420049728
runtime.runfinq()
    /usr/local/go/src/runtime/mfinal.go:208 +0x205 fp=0xc4200497e0 sp=0xc420049758
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc4200497e8 sp=0xc4200497e0
created by runtime.createfing
    /usr/local/go/src/runtime/mfinal.go:142 +0x62

Here are two more example crashes:

unexpected fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x40a752]

goroutine 1185827 [running]:
runtime.throw(0x885256, 0x5)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc4201f7bd0 sp=0xc4201f7bb0
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:297 +0x28c fp=0xc4201f7c20 sp=0xc4201f7bd0
runtime.evacuate(0x80a3a0, 0xc4203cd650, 0xcdca935a0cfa6e35)
    /usr/local/go/src/runtime/hashmap.go:965 +0x72 fp=0xc4201f7d10 sp=0xc4201f7c20
runtime.growWork(0x80a3a0, 0xc4203cd650, 0xcdca935a0cfa6e35)
    /usr/local/go/src/runtime/hashmap.go:953 +0x6b fp=0xc4201f7d38 sp=0xc4201f7d10
runtime.mapassign(0x80a3a0, 0xc4203cd650, 0xc4201f7ea8, 0x425ec25c44a7c000)
    /usr/local/go/src/runtime/hashmap.go:513 +0x5d5 fp=0xc4201f7dd8 sp=0xc4201f7d38
redacted(0xc4203d26c0, 0xa6c000, 0xc42000f9e0, 0xc420136c60, 0x0, 0x0, 0x0)
    /redacted.go:99 +0x65c fp=0xc4201f7f28 sp=0xc4201f7dd8
redacted(0xc4203d26c0, 0xa6c000, 0xc42000eb40, 0xc420136c60)
    /redacted.go:36 +0xbd fp=0xc4201f7fc0 sp=0xc4201f7f28
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc4201f7fc8 sp=0xc4201f7fc0
created by redacted
    /redacted.go:155 +0x32c
fatal error: sync: inconsistent mutex state

goroutine 69 [running]:
runtime.throw(0x894c12, 0x1e)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc4208297c0 sp=0xc4208297a0
sync.throw(0x894c12, 0x1e)
    /usr/local/go/src/runtime/panic.go:585 +0x35 fp=0xc4208297e0 sp=0xc4208297c0
sync.(*Mutex).Lock(0xc4207082ac)
    /usr/local/go/src/sync/mutex.go:79 +0xce fp=0xc420829818 sp=0xc4208297e0
net/http.(*body).Close(0xc420708280, 0x0, 0x0)
    /usr/local/go/src/net/http/transfer.go:890 +0x4f fp=0xc420829888 sp=0xc420829818
net/http.(*body).readLocked(0xc420708340, 0xc42042c000, 0x200, 0x200, 0x0, 0x10, 0x1)
    /usr/local/go/src/net/http/transfer.go:761 +0x61 fp=0xc4208298e8 sp=0xc420829888
net/http.(*body).Read(0xc420708340, 0xc42042c000, 0x200, 0x200, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/transfer.go:753 +0xfd fp=0xc420829938 sp=0xc4208298e8
net/http.(*bodyEOFSignal).Read(0xc420708400, 0xc42042c000, 0x200, 0x200, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:2035 +0xe9 fp=0xc4208299a0 sp=0xc420829938
redacted(0xc4203761e0, 0xc42042c000, 0x200, 0x200, 0x0, 0x0, 0x0)
    /redacted.go:360 +0x1e3 fp=0xc420829a10 sp=0xc4208299a0
[snip]
redacted(0xc420282270, 0xa6bf80, 0xc420043800, 0xc42024cd90)
    /redacted.go:120 +0x80 fp=0xc420829fc0 sp=0xc420829f98
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420829fc8 sp=0xc420829fc0
created by redacted
    /redacted.go:121 +0x600

It looks like the GC no longer scribbles over any part of reclaimed memory with 0xdeaddeaddeaddead, so double-allocated memory doesn't cause an obvious crash until the uses directly conflict. Which debug flags will help diagnose this, and what change in performance can we expect from enabling them?

Three shots in the dark:

Possibly signal handling? We had a nasty problem related to that, @bcmills knows more.
Recipe for disaster is that competing signal-handler-registerers (think Go vs Java runtimes, for example) fumble the catch-signal-on-alt-stack flag, and sooner or later you take a signal on a goroutine that doesn't have much unused stack. Hilarity ensues.

Another possibility: https://go-review.googlesource.com/c/43294/
This caused a rare problem in the GC that @aclements found.
I don't see how this would get you on 4.4 but not 3.2.

Are you using the GOEXPERIMENT=preemptibleloops experiment for tight-loop-vs-GC-latency issues? I think there might be a rare bug with that involving backedges and GC, which will be fixed in 1.9. Again, no reason I know of that this should be OS-specific, but maybe some "thing" is a different size under the two OSes and thus a loop trip count changes from bug-avoiding to bug-triggering (because the 1.8 version of the experiment is counter-based, that's a possibility).

Since the crashes occur even with CGO_ENABLED=0, the signal-handling issues I've been tracking probably don't apply.

> Another possibility: https://go-review.googlesource.com/c/43294/

If you wanted to patch in a simpler/safer fix for this to try to rule it out, you probably want https://go-review.googlesource.com/c/43311/ instead--it fixes this specific problem rather than the general compiler case.

Thank you all for the ideas. I'll upgrade to go1.8.3 soon after it's out—it looks like it's slated to include cherry-picks of both CL 43294 and CL 43311.

This program doesn't use GOEXPERIMENT=preemptibleloops, and I agree that we can likely rule out inter-runtime signal handler disputes since it's built with CGO_ENABLED=0.

The program uses crypto/rand.Read, which tries to use getrandom(2) when available (on Linux 3.17+). Should we be suspicious of this feature?
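(For reference, the usage in question is just the standard library call below; whether the bytes come from getrandom(2) or from /dev/urandom is decided inside crypto/rand.)

package main

import (
	"crypto/rand"
	"fmt"
	"log"
)

func main() {
	// crypto/rand.Read uses getrandom(2) on kernels that support it
	// (Linux 3.17+) and falls back to reading /dev/urandom otherwise.
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("random bytes: %x\n", buf)
}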

I got the following crash from an instance of the program that was running with GODEBUG=gcrescanstacks=1,gccheckmark=1 (and GOGC=1). The crash doesn't look like it came from the GC assertions enabled by those settings. Setting GOGC=1 on the test instance doesn't seem to have made the program crash more often.

unexpected fault address 0xd01ed0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0xd01ed0 pc=0xd01ed0]

goroutine 1153267 [running]:
runtime.throw(0xa56736, 0x5)
     /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc42059fb08 sp=0xc42059fae8
runtime.sigpanic()
     /usr/local/go/src/runtime/signal_unix.go:297 +0x28c fp=0xc42059fb58 sp=0xc42059fb08
created by redacted
     /redacted.go:155 +0x46a

goroutine 1 [chan receive, 126 minutes]:
runtime.gopark(0xa73c38, 0xc4200e8298, 0xa5bd35, 0xc, 0xc4202b8717, 0x3)
     /usr/local/go/src/runtime/proc.go:271 +0xfd fp=0xc420062c38 sp=0xc420062c08
runtime.goparkunlock(0xc4200e8298, 0xa5bd35, 0xc, 0x42fd17, 0x3)
     /usr/local/go/src/runtime/proc.go:277 +0x5e fp=0xc420062c78 sp=0xc420062c38
runtime.chanrecv(0x9b1f60, 0xc4200e8240, 0x0, 0xc42027c201, 0xc4200e8240)
     /usr/local/go/src/runtime/chan.go:513 +0x24b fp=0xc420062d10 sp=0xc420062c78
runtime.chanrecv1(0x9b1f60, 0xc4200e8240, 0x0)
     /usr/local/go/src/runtime/chan.go:395 +0x35 fp=0xc420062d48 sp=0xc420062d10
[snip]

This program uses Unix domain sockets much more actively than most of our others. Some of the crashes are encountered near the netpoller (like the one below), but I don't know if that's because of memory corruption coming from the netpoller, or if the program is just as likely to crash on any active goroutine and a lot of the active goroutines are working with UDS.

fatal error: netpollblock: corrupted state

goroutine 340603 [running]:
runtime.throw(0x894220, 0x1d)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc420a68ae0 sp=0xc420a68ac0
runtime.netpollblock(0x7f7c182d1e00, 0x72, 0xb)
    /usr/local/go/src/runtime/netpoll.go:350 +0x105 fp=0xc420a68b28 sp=0xc420a68ae0
net.runtime_pollWait(0x7f7c182d1e00, 0x72, 0x14)
    /usr/local/go/src/runtime/netpoll.go:164 +0x59 fp=0xc420a68b50 sp=0xc420a68b28
net.(*pollDesc).wait(0xc4201724c8, 0x72, 0xa68080, 0xa643c8)
    /usr/local/go/src/net/fd_poll_runtime.go:75 +0x38 fp=0xc420a68b78 sp=0xc420a68b50
net.(*pollDesc).waitRead(0xc4201724c8, 0xc420a08a00, 0x200)
    /usr/local/go/src/net/fd_poll_runtime.go:80 +0x34 fp=0xc420a68ba8 sp=0xc420a68b78
net.(*netFD).Read(0xc420172460, 0xc420a08a00, 0x200, 0x200, 0x0, 0xa68080, 0xa643c8)
    /usr/local/go/src/net/fd_unix.go:250 +0x1b7 fp=0xc420a68c10 sp=0xc420a68ba8
net.(*conn).Read(0xc4205cc0c8, 0xc420a08a00, 0x200, 0x200, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:181 +0x70 fp=0xc420a68c80 sp=0xc420a68c10
bytes.(*Buffer).ReadFrom(0xc420a68d20, 0x7f7c184f1108, 0xc4205cc0c8, 0xc420a08a00, 0x0, 0x200)
    /usr/local/go/src/bytes/buffer.go:179 +0x160 fp=0xc420a68cf0 sp=0xc420a68c80
io/ioutil.readAll(0x7f7c184f1108, 0xc4205cc0c8, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/io/ioutil/ioutil.go:33 +0x150 fp=0xc420a68d98 sp=0xc420a68cf0
io/ioutil.ReadAll(0x7f7c184f1108, 0xc4205cc0c8, 0xc4205cc0c8, 0x7f7c184f1108, 0xc4205cc0c8, 0x0, 0x0)
    /usr/local/go/src/io/ioutil/ioutil.go:42 +0x3e fp=0xc420a68de8 sp=0xc420a68d98
[snip]

Upgrading to go1.8.3 did not resolve the issue, and we've narrowed it down to being related to os/exec.

A coworker reminded me of the FreeBSD os/exec memory corruption bug—#15658—and indeed, the first reproducer there (from May 11, 2016) leads to crashes on a Linux 4.4 system.

$ uname -a
Linux redacted 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Here are the beginnings of the three crashes I've seen with the May 11, 2016 reproducer so far (from spending around 100–250 process-hours of execution):

Starting a bunch of goroutines...
unexpected fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x461df2]

goroutine 189069 [running]:
runtime.throw(0x4cd41c, 0x5)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc420067a18 sp=0xc4200679f8
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:297 +0x28c fp=0xc420067a68 sp=0xc420067a18
syscall.ByteSliceFromString(0x6962732f7273752f, 0x657572742f6e, 0xc4202ce130, 0xc4202ce070, 0x9, 0xc4202ce110, 0xa)
    /usr/local/go/src/syscall/syscall.go:51 +0x32 fp=0xc420067ac8 sp=0xc420067a68
syscall.BytePtrFromString(0x6962732f7273752f, 0x657572742f6e, 0x2, 0xc4202ce130, 0x2)
    /usr/local/go/src/syscall/syscall.go:71 +0x39 fp=0xc420067b10 sp=0xc420067ac8
syscall.SlicePtrFromStrings(0xc4202ce000, 0x1, 0x1, 0x0, 0x0, 0x7f2bde7d3113, 0x7f2bde7d3140, 0x1e)
    /usr/local/go/src/syscall/exec_unix.go:87 +0x97 fp=0xc420067b80 sp=0xc420067b10
syscall.forkExec(0xc4202ce070, 0x9, 0xc4202ce000, 0x1, 0x1, 0xc420067d90, 0x0, 0x0, 0x0)
    /usr/local/go/src/syscall/exec_unix.go:154 +0x10d fp=0xc420067ca8 sp=0xc420067b80
syscall.StartProcess(0xc4202ce070, 0x9, 0xc4202ce000, 0x1, 0x1, 0xc420067d90, 0x2, 0x4, 0x0, 0xc420067d60)
    /usr/local/go/src/syscall/exec_unix.go:240 +0x64 fp=0xc420067d00 sp=0xc420067ca8
os.startProcess(0xc4202ce070, 0x9, 0xc4202ce000, 0x1, 0x1, 0xc420067f38, 0xc4203f84e0, 0x1a, 0x1a)
    /usr/local/go/src/os/exec_posix.go:45 +0x1a3 fp=0xc420067de8 sp=0xc420067d00
os.StartProcess(0xc4202ce070, 0x9, 0xc4202ce000, 0x1, 0x1, 0xc420067f38, 0x0, 0x0, 0xc4202ce070)
    /usr/local/go/src/os/exec.go:94 +0x64 fp=0xc420067e40 sp=0xc420067de8
os/exec.(*Cmd).Start(0xc4203e0000, 0x4, 0x0)
    /usr/local/go/src/os/exec/exec.go:359 +0x3d2 fp=0xc420067f90 sp=0xc420067e40
main.run(0xc420092060)
    /tmp/15658.go:11 +0x68 fp=0xc420067fd8 sp=0xc420067f90
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420067fe0 sp=0xc420067fd8
created by main.main
    /tmp/15658.go:35 +0x141

goroutine 1 [runnable]:
main.main()
    /tmp/15658.go:34 +0x11f

[...]
Starting a bunch of goroutines...
fatal error: forEachP: P did not run fn
acquirep: p->m=842353238016(37) p->status=1
fatal error: acquirep: invalid p state
acquirep: p->m=0(0) p->status=3
fatal error: acquirep: invalid p state

runtime stack:
runtime.throw(0x4d09a1, 0x1a)
    /usr/local/go/src/runtime/panic.go:596 +0x95
runtime.forEachP(0x4d4460)
    /usr/local/go/src/runtime/proc.go:1276 +0x2df
runtime.gcMarkDone.func1()
    /usr/local/go/src/runtime/mgc.go:1153 +0x2d
runtime.systemstack(0xc420436d40)
    /usr/local/go/src/runtime/asm_amd64.s:327 +0x79
runtime.mstart()
    /usr/local/go/src/runtime/proc.go:1132

goroutine 400850 [running]:
runtime.systemstack_switch()
    /usr/local/go/src/runtime/asm_amd64.s:281 fp=0xc42012ef30 sp=0xc42012ef28
runtime.gcMarkDone()
    /usr/local/go/src/runtime/mgc.go:1154 +0xe9 fp=0xc42012ef50 sp=0xc42012ef30
runtime.gcBgMarkWorker(0xc42001b300)
    /usr/local/go/src/runtime/mgc.go:1581 +0x303 fp=0xc42012efd8 sp=0xc42012ef50
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42012efe0 sp=0xc42012efd8
created by runtime.gcBgMarkStartWorkers
    /usr/local/go/src/runtime/mgc.go:1412 +0x98

goroutine 1 [chan receive]:
main.main()
    /tmp/15658.go:34 +0x11f

[...]
Starting a bunch of goroutines...
fatal error: workbuf is not empty

runtime stack:
runtime.throw(0x4cfa21, 0x14)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc420195df0 sp=0xc420195dd0
runtime.(*workbuf).checkempty(0x7f9ead2609c0)
    /usr/local/go/src/runtime/mgcwork.go:320 +0x4e fp=0xc420195e10 sp=0xc420195df0
runtime.getempty(0xc420023828)
    /usr/local/go/src/runtime/mgcwork.go:332 +0x96 fp=0xc420195e48 sp=0xc420195e10
runtime.(*gcWork).init(0xc420022528)
    /usr/local/go/src/runtime/mgcwork.go:90 +0x22 fp=0xc420195e60 sp=0xc420195e48
runtime.(*gcWork).tryGet(0xc420022528, 0x27)
    /usr/local/go/src/runtime/mgcwork.go:156 +0xdb fp=0xc420195e88 sp=0xc420195e60
runtime.gcDrain(0xc420022528, 0x2)
    /usr/local/go/src/runtime/mgcmark.go:1075 +0x2eb fp=0xc420195ec0 sp=0xc420195e88
runtime.gchelper()
    /usr/local/go/src/runtime/mgc.go:1912 +0x110 fp=0xc420195ef0 sp=0xc420195ec0
runtime.stopm()
    /usr/local/go/src/runtime/proc.go:1653 +0xdd fp=0xc420195f18 sp=0xc420195ef0
runtime.gcstopm()
    /usr/local/go/src/runtime/proc.go:1854 +0xb7 fp=0xc420195f40 sp=0xc420195f18
runtime.schedule()
    /usr/local/go/src/runtime/proc.go:2186 +0x110 fp=0xc420195f80 sp=0xc420195f40
runtime.park_m(0xc4202ec000)
    /usr/local/go/src/runtime/proc.go:2285 +0xab fp=0xc420195fb8 sp=0xc420195f80
runtime.mcall(0x0)
    /usr/local/go/src/runtime/asm_amd64.s:269 +0x5b fp=0xc420195fc8 sp=0xc420195fb8

goroutine 1 [chan receive]:
runtime.gopark(0x4d45a8, 0xc4200940b8, 0x4ce3a3, 0xc, 0x17, 0x3)
    /usr/local/go/src/runtime/proc.go:271 +0x13a fp=0xc420064e00 sp=0xc420064dd0
runtime.goparkunlock(0xc4200940b8, 0x4ce3a3, 0xc, 0x44b417, 0x3)
    /usr/local/go/src/runtime/proc.go:277 +0x5e fp=0xc420064e40 sp=0xc420064e00
runtime.chanrecv(0x4ab280, 0xc420094060, 0x0, 0xc420064f01, 0x49a6f1)
    /usr/local/go/src/runtime/chan.go:513 +0x371 fp=0xc420064ee0 sp=0xc420064e40
runtime.chanrecv1(0x4ab280, 0xc420094060, 0x0)
    /usr/local/go/src/runtime/chan.go:395 +0x35 fp=0xc420064f18 sp=0xc420064ee0
main.main()
    /tmp/15658.go:34 +0x11f fp=0xc420064f88 sp=0xc420064f18
runtime.main()
    /usr/local/go/src/runtime/proc.go:185 +0x20a fp=0xc420064fe0 sp=0xc420064f88
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420064fe8 sp=0xc420064fe0

[...]

I tried the March 8, 2017 reproducer with seven variants, but none crashed within a few seconds. I haven't seen variant (1) crash after 60 process-hours of execution.


My very slow and not terribly reliable reproducer (version c7141caadf in a private repo) is below:

package main

import (
    "context"
    "flag"
    "log"
    "os/exec"
    "sync"
    "sync/atomic"
)

func main() {
    var (
        workers = flag.Int("workers", 10, "worker count")
        cmdIter = flag.Int("cmd_iter", 10000, "iterations per command worker")
        cmd     = flag.String("cmd", "/bin/true", "command to run")
    )
    flag.Parse()

    ctx := context.Background()

    var wg sync.WaitGroup
    ctx, cancel := context.WithCancel(ctx)
    errs := make(chan error, 1)
    for i := 0; i < *workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()

            for j := 0; j < *cmdIter; j++ {
                sink.Store(make([]*byte, 1<<20))

                err := doCmd(ctx, *cmd)

                if err != nil {
                    cancel()
                    select {
                    case errs <- err:
                    default:
                    }
                    return
                }
            }
        }()
    }

    wg.Wait()
    cancel()
    close(errs)
    err := <-errs
    if err != nil {
        log.Fatalf("%v", err)
    }
}

var sink atomic.Value

func doCmd(ctx context.Context, command string) error {
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    cmd := exec.Command(command)

    err := cmd.Start()
    if err != nil {
        return err
    }

    done := make(chan error, 1)
    go func() {
        err := cmd.Wait()
        done <- err
    }()

    select {
    case <-ctx.Done():
        err := cmd.Process.Kill()
        if err != nil {
            return err
        }
        return ctx.Err()
    case err = <-done:
        return err
    }
}

It crashed with the following error while running under perf trace record with the arguments -cmd=/bin/true -workers=10 -cmd_iter=300. That was one crash out of 4480 attempts.

The "stress" command here is golang.org/x/tools/cmd/stress

$ sudo ./stress -timeout=1h -p=10 -- bash -c 'trace_file="/tmp/perf-trace-$(date +%s)-$$" && echo "$trace_file" && perf trace record --quiet --output="$trace_file" -- ./20427.c7141caadf -cmd=/bin/true -workers=10 -cmd_iter=300 && (rm -- "$trace_file" || true)'

fatal error: unexpected signal during runtime execution
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x42d1bf]

runtime stack:
runtime.throw(0x4e755d, 0x2a)
    /usr/local/go/src/runtime/panic.go:596 +0x95
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:274 +0x2db
runtime.execute(0xc4238c4000, 0xc42001b300)
    /usr/local/go/src/runtime/proc.go:1874 +0x17f
runtime.schedule()
    /usr/local/go/src/runtime/proc.go:2239 +0x127
runtime.exitsyscall0(0xc420001a00)
    /usr/local/go/src/runtime/proc.go:2772 +0xec
runtime.mcall(0x0)
    /usr/local/go/src/runtime/asm_amd64.s:269 +0x5b

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc42001230c)
    /usr/local/go/src/runtime/sema.go:47 +0x34
sync.(*WaitGroup).Wait(0xc420012300)
    /usr/local/go/src/sync/waitgroup.go:131 +0x7a
main.main()
    main.go:47 +0x27e

[...]

I don't see an obvious smoking gun in the perf trace output. I've shared a portion of it at https://gist.github.com/rhysh/be58b5a5f3a41e218a05f1d8c3522dc0: Line 7 shows thread 16347 begin a call to clone, line 671 and 672 show that call resulting in the creation of process 86746. Lines 677 and 688 show the execve call for /bin/true, and 727 shows the exit_group call that ends the /bin/true process. Line 984 shows thread 16347 "returning" from rt_sigreturn (I don't understand how this works) with value 13, which happens to be the number of bytes in "fatal error: ", followed by a 42-byte write to fd 2 (probably "unexpected signal during runtime execution") and a 1-byte write to fd 2 (probably "\n").

@rhysh How hard would it be for you to test https://golang.org/cl/43713 to see if it changes anything for you?

Thanks @ianlancetaylor , I've cherry-picked CL 43713 PS 2 onto go1.8.3, and I'll try both my reproducer (version c7141caadf) and the 2016-05-11 FreeBSD reproducer with it.

Overnight with go1.8.3, I saw 4 failures out of 478 attempts on my reproducer (with 10 workers each executing /bin/true 3000 times), and 3 failures out of 182 attempts at running the 2016-05-11 reproducer (aborted and restarted after an hour).

I cherry-picked CL 43713 PS 2 onto go1.8.3 _and deleted ./VERSION_, and ran the reproducers with the resulting toolchain over the weekend.

The 2016-05-11 FreeBSD reproducer encountered 21 failures out of 690 attempts (timing out and retrying after 1 hour each), and my reproducer encountered only one out of 6870 attempts.


My reproducer (version c7141caadf) got this error:

2017/06/03 02:29:51 exec: not started


ERROR: exit status 1

which I hadn't seen before. It was running on ubuntu's kernel 4.4.0-66-generic:

$ uname -a
Linux redacted 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I don't see how the Wait call would return this error without Start returning a more descriptive error.
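(A tiny standalone demo, not from the issue, of where that error string normally comes from: Cmd.Wait returns "exec: not started" when cmd.Process is nil, i.e. when Start never succeeded, which is why seeing it after a nil error from Start looks like corruption of the Cmd struct itself.)

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	cmd := exec.Command("true")
	// Calling Wait without ever calling Start: Wait sees cmd.Process == nil
	// and returns the same "exec: not started" error reported above.
	err := cmd.Wait()
	fmt.Println(err) // exec: not started
}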


Below are some of the errors from the weekend's run of the 2016-05-11 FreeBSD reproducer. It was running on ubuntu's kernel 4.4.0-78-generic:

$ uname -a
Linux redacted 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The kernel bug that CL 43713 is working around (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672819) looks, based on comments 16 and 17 there and the filenames in 16's links, like it was fixed in 4.4.0-77-generic. I'd expect that 4.4.0-78-generic would include the fix as well, but I haven't checked the sources.

Starting a bunch of goroutines...
fatal error: workbuf is not empty
fatal error: workbuf is not empty

runtime stack:
runtime.throw(0x4cfa1a, 0x14)
        /usr/local/go/src/runtime/panic.go:596 +0x95
runtime.(*workbuf).checkempty(0x7f549c501000)
        /usr/local/go/src/runtime/mgcwork.go:320 +0x4e
runtime.getempty(0x7f549c472800)
[...]
Starting a bunch of goroutines...
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x40f47a]

goroutine 746308 [running]:
runtime.throw(0x4d32d9, 0x2a)
        /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc4204d35c0 sp=0xc4204d35a0
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:274 +0x2db fp=0xc4204d3610 sp=0xc4204d35c0
runtime.markBits.setMarked(0x0, 0x1, 0x0)
[...]
Starting a bunch of goroutines...
acquirep: p->m=842352011264(36) p->status=1
fatal error: acquirep: invalid p state
fatal error: releasep: invalid arg

runtime stack:
runtime.throw(0x4d068d, 0x19)
        /usr/local/go/src/runtime/panic.go:596 +0x95
runtime.acquirep1(0xc420020000)
        /usr/local/go/src/runtime/proc.go:3663 +0x16b
runtime.acquirep(0xc420020000)
[...]

CL 43713 PS 2 seems to serialize calls of runtime.newm and calls of syscall.Exec, but only runtime.newm would be relevant to my reproducer since syscall.forkAndExecInChild calls syscall.RawSyscall* directly rather than via syscall.Exec.

Calls from syscall.forkExec to syscall.forkAndExecInChild are already serialized via syscall.ForkLock, though they're not forced to have any particular relationship with calls to runtime.newm.
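(For readers who haven't seen it: syscall.ForkLock is an exported sync.RWMutex. forkExec holds the write side across the fork, and code that creates descriptors and needs to mark them close-on-exec without racing a fork can hold the read side, roughly as in the sketch below.)

package main

import "syscall"

func main() {
	// Sketch of the documented ForkLock pattern: hold the read lock so a
	// concurrent forkExec (which takes the write lock) cannot run between
	// creating the descriptors and marking them close-on-exec.
	syscall.ForkLock.RLock()
	var p [2]int
	if err := syscall.Pipe(p[:]); err == nil {
		syscall.CloseOnExec(p[0])
		syscall.CloseOnExec(p[1])
	}
	syscall.ForkLock.RUnlock()
}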

CL 43713 PS 2 doesn't fix the problem, and I don't understand enough about its approach to see how to modify it to this issue's needs. Locking between runtime.newm and code that might be preemptible seems risky. It might make sense to take the new execlock in runtime_BeforeFork (and later release it), but runtime.lock isn't marked nosplit, so I'm not sure it could be called in that section of syscall.forkAndExecInChild.


Does this seem like it's in fact a kernel bug, and does it seem like one that the Go runtime could work around?

What can I try next to either fix or to work around the problem?

> Does this seem like it's in fact a kernel bug, and does it seem like one that the Go runtime could work around?
>
> What can I try next to either fix or to work around the problem?

Those are good questions but I don't have any good answers. It certainly looks like some sort of memory corruption is occurring, but I have no idea where it is coming from. Since I haven't seen any reports of anything similar, I suppose the first place to look is your program itself. It's at least conceivable that the change in kernel version is changing the memory allocation pattern in such a way that errors that were previously hidden are now exposed. But frankly it doesn't seem especially likely to me. It's just that everything else I can think of seems even less likely.

I don't know what to make of it either, but the reproducer is pretty small.

I've looked for sources of corruption in my original program; the usual suspects of cgo, unsafe, and data races turned up nothing. It's now built with CGO_ENABLED=0, unsafe is only used in the stdlib and protobuf, and the program has run for hours on a real workload with the race detector finding nothing.

The crashiest reproducer I've found is a 38-line program from the FreeBSD bug (#15658), written 2016-05-11 (included below).


package main

/* stdlib includes */
import (
        "fmt"
        "os/exec"
)

func run(done chan struct{}) {
        cmd := exec.Command("true")
        if err := cmd.Start(); err != nil {
                goto finished
        }

        cmd.Wait()

finished:
        done <- struct{}{}
        return
}

func main() {
        fmt.Println("Starting a bunch of goroutines...")

        // 8 & 16 are arbitrary
        done := make(chan struct{}, 16)

        for i := 0; i < 8; i++ {
                go run(done)
        }

        for {
                select {
                case <-done:
                        go run(done)
                }
        }
}

@rhysh, I'm going to attempt to reproduce this in qemu using an Ubuntu 16.04.2 image. Apologies if you've already answered these questions and I missed it:

1) How many CPUs are you running with?

2) How exactly are you running the reproducer from https://github.com/golang/go/issues/20427#issuecomment-306346724? E.g., are you running multiple in parallel on the same host and/or on different hosts?

Thanks @aclements

  1. I'm running with 36 and 40 CPUs: 36 on the 4.4.0-66-generic machine (it's a c4.8xlarge on EC2) with my reproducer (version c7141caadf), and 40 on the 4.4.0-78-generic machine (bare metal) with the 2016-05-11 reproducer.

  2. I cross-compiled that code from darwin/amd64 for linux/amd64, and am running it with golang.org/x/tools/cmd/stress like so:

$ nohup env GOGC=5 ./stress.linux_amd64 -timeout=1h -p=10 -ignore='SIGABRT: abort' ./15658.go183.cl43713.ps2.linux_amd64 &

During last week, I often ran multiple copies of stress on those two machines to try other reproducers. The two machines are also currently doing other work—the 36-core EC2 machine is running an unrelated application, and the 40-core bare-metal machine is running a particularly crash-prone version of our affected application. I can control for these variables if you think they might have an effect.

Thanks. I'm now running this on an 8-core KVM-enabled qemu with 4.4.0-78-generic. If that turns up something, I'll try running it under software-only mode. And if that turns up something, my plan is to tweak qemu to record a memory trace and see if I can make something of that.

I've been running this both on KVM/qemu and on a real 48 core machine running the 4.4.0-78-generic Ubuntu kernel for the past week with no failures.

I'm not really sure where to go from here. If you can capture some core dumps from the 2016-05-11 reproducer, I might be able to find a pattern in them. Alternatively, I have a pin-based memory tracer I could dust off and we could try running under that, but it pretty seriously perturbs the execution.

Does look similar. I have some Linux boxes; I'll see if my repro case triggers on them.

OK, the simple for-loop test running Ubuntu's 4.4.0-79-generic kernel does repro, but it seems rare: once so far in 24 hours on an old L5520 with 24 cores.

fatal error: runtime·unlock: lock count.
fatal error: runtime·unlock: lock count.
fatal error: runtime·unlock: lock count.
...
fatal error: runtime·unlock: lock count.
fatal: morestack on g0
SIGTRAP: trace trap

@stevenh, thanks. Same question for you: can you capture some core dumps from the failures? (ulimit -c unlimited; export GOTRACEBACK=crash and make sure cores are going somewhere reasonable :)

@stevenh and @rhysh, is it possible to make the reproducer more deterministic? For example, does it need all of the goroutines or is it enough to run the exec in a loop (possibly from many separate processes)? I assume it does need the non-determinism, but if it were more deterministic, it would be much easier to compare failures (and successes).

I have some traces with GOTRACEBACK=crash but no core dumps. We usually use FreeBSD and didn't realise Linux had the dubious default of not creating cores, so I'm still waiting for it to crash again; I'll post as soon as I get them.

I'm not sure if it's possible to crash with just one goroutine; it likely is, but it could well take much longer to reproduce, and it already takes quite some time. I've kicked off some tests under stress and will report back if I get anything.

> We usually use FreeBSD and didn't realise Linux had the dubious default of not creating cores, so I'm still waiting for it to crash again; I'll post as soon as I get them.

Yeah. :( It seems to be an increasingly popular bandwagon to send core dumps to /dev/null even if the user asked for them (and it requires root privileges to change that, ugh). Thanks for testing.

@rhysh or @stevenh, while you're testing, another thing to try is to remove the CLONE_VFORK flag from syscall/exec_linux.go:forkAndExecInChild. This is purely a hunch, though.

OK, I got a crash under stress with only one goroutine (main).

Unfortunately I don't have the crash output: the log file for the shell script that stress was running was set up for a single instance, so it got overwritten :(

While I wait for it to crash again here's the info.

#0  runtime.raise () at /usr/local/go/src/runtime/sys_linux_amd64.s:113
#1  0x000000000043a50b in runtime.dieFromSignal (sig=6) at /usr/local/go/src/runtime/signal_unix.go:323
#2  0x000000000043a699 in runtime.crash () at /usr/local/go/src/runtime/signal_unix.go:405
#3  0x0000000000426bcc in runtime.dopanic_m (gp=0xc42017e000, pc=4351445, sp=842353294808) at /usr/local/go/src/runtime/panic.go:714
#4  0x000000000044cc6c in runtime.dopanic.func1 () at /usr/local/go/src/runtime/panic.go:578
#5  0x000000000044de4b in runtime.systemstack () at /usr/local/go/src/runtime/asm_amd64.s:343
#6  0x00000000004264e1 in runtime.dopanic (unused=0) at /usr/local/go/src/runtime/panic.go:579
#7  0x00000000004265d5 in runtime.throw (s="unexpected signal during runtime execution") at /usr/local/go/src/runtime/panic.go:596
#8  0x000000000043a4bb in runtime.sigpanic () at /usr/local/go/src/runtime/signal_unix.go:274
#9  0x000000c4202b3e70 in ?? ()
#10 0x000000000040c48b in runtime.notesleep (n=0xc42015b110) at /usr/local/go/src/runtime/lock_futex.go:145
#11 0x000000000042bacd in runtime.stopm () at /usr/local/go/src/runtime/proc.go:1650
#12 0x000000000042c8e4 in runtime.findrunnable (gp#10=0xc42001b300, inheritTime=false) at /usr/local/go/src/runtime/proc.go:2102
#13 0x000000000042d47c in runtime.schedule () at /usr/local/go/src/runtime/proc.go:2222
#14 0x000000000042d77b in runtime.park_m (gp=0xc4201fd1e0) at /usr/local/go/src/runtime/proc.go:2285
#15 0x000000000044dd7b in runtime.mcall () at /usr/local/go/src/runtime/asm_amd64.s:269
#16 0x000000c42017e040 in ?? ()
#17 0x0000000000000000 in ?? ()
package main

import (
        "log"
        "os/exec"
)

func run() {
        cmd := exec.Command("true")
        if err := cmd.Start(); err != nil {
                log.Fatal(err)
        }
        cmd.Wait()
}

func main() {
        for {
                run()
        }
}

This is with go 1.8.3. I'll wait for another concrete positive to capture the crash log before doing the CLONE_VFORK test.

Here are the results of the single-goroutine run: two crashes so far.

I went to do the CLONE_VFORK test, but I guess that's a tip-only feature, as there's no use of CLONE_VFORK in 1.8.3.

I have more cores if required.

Thank you for simplifying the reproducer, @stevenh . It crashes on my test machine too. The test machine is a c4.8xlarge EC2 instance, with the following kernel:

$ uname -a
Linux redacted 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

@stevenh can you share the cores you collected? The link you shared gives me a 404.

With the June 19th reproducer and go1.8.3, I saw 2 crashes out of 685 runs, run as follows (leading to 36 concurrent instances of the test program, running for at most one hour each):

$ nohup ./stress.linux_amd64 -timeout=1h -ignore='SIGABRT: abort' ./20427-20170619-go183.linux_amd64 &

I modified the reproducer to call runtime.LockOSThread in main.init and saw 0 crashes out of 864 one-hour runs. (This is probably not enough runs to conclude whether or not it will crash.)

I modified the reproducer to always use os/exec in a non-main locked OS thread, which has had only 36 runs so far (0 crashes).


$ cat ./20170619-lockmain/main.go 
package main

import (
        "log"
        "os/exec"
        "runtime"
)

func init() {
        runtime.LockOSThread()
}

func run() {
        cmd := exec.Command("true")
        if err := cmd.Start(); err != nil {
                log.Fatal(err)
        }
        cmd.Wait()
}

func main() {
        for {
                run()
        }
}
$ cat ./20170619-locknotmain/main.go 
package main

import (
        "log"
        "os/exec"
        "runtime"
)

func init() {
        runtime.LockOSThread()
}

func run() {
        cmd := exec.Command("true")
        if err := cmd.Start(); err != nil {
                log.Fatal(err)
        }
        cmd.Wait()
}

func main() {
        go func() {
                runtime.LockOSThread()
                for {
                        run()
                }
        }()

        select {}
}

@rhysh link to cores fixed; I typoed it ;-)

Just a hunch: could it be related to an Intel Skylake/Kaby Lake hyperthreading hardware bug (some details here)? If anyone who was able to reproduce the bug is running on these CPUs with hyperthreading enabled, and disabling HT helps, that would be a strong hint that this is indeed the issue.

Already asked; no, the hardware here is way too old.

I've done 5436 one-hour runs of 20170619-locknotmain built with go1.8.3 and have seen 0 failures.

I think the next diagnosis step is to use perf trace record to collect failures of the 20170619 reproducer (original non-thread-locked version) _with full call graphs_ (via -g). That should make it possible not only to see the order of syscalls leading up to the crash, but to get a sense of what exactly is responsible for those calls.

It might be interesting to try to reproduce this after editing syscall/exec_linux.go to not OR SIGCHLD into sys.Cloneflags; that is, just pass sys.Cloneflags by itself as the second argument to clone. That will change the code to not request a SIGCHLD signal when each child process exits, which means that we will not be invoking the Go signal handler. (It should not make a difference, but then of course this problem should not happen at all.)

I've run the single-goroutine reproducer (20170619) under perf trace for the last 17 days and have not yet seen it fail. It might be that perf trace changes a key detail which hides the problem, but it's also possible that my bash wrapper (which saves the trace when stress kills the program on timeout) is buggy. It's done 24710 runs of 10 minutes each, with 10 executions happening concurrently. (I had to dial it down so the perf trace logs wouldn't fill the instance's disk.)

I've run the single-goroutine reproducer (without perf trace) on some other free hardware for the last six days to re-establish a baseline for how often it crashes. The machines are slightly heterogeneous so I won't describe them here, but they see on average around 0.6 crashes per machine-day: 23 crashes over six days, with six machines running the reproducer in 10-minute intervals at default (n=numcpu) concurrency.

I'm now running the single-goroutine reproducer with @ianlancetaylor 's suggestion of removing SIGCHLD from the call to clone(2):

diff --git a/VERSION b/VERSION
deleted file mode 100644
index 3e144cd980..0000000000
--- a/VERSION
+++ /dev/null
@@ -1 +0,0 @@
-go1.8.3
\ No newline at end of file
diff --git a/src/syscall/exec_linux.go b/src/syscall/exec_linux.go
index 979b6a247a..771742adbf 100644
--- a/src/syscall/exec_linux.go
+++ b/src/syscall/exec_linux.go
@@ -98,7 +98,7 @@ func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr
        if runtime.GOARCH == "s390x" {
                r1, _, err1 = RawSyscall6(SYS_CLONE, 0, uintptr(SIGCHLD)|sys.Cloneflags, 0, 0, 0, 0)
        } else {
-               r1, _, err1 = RawSyscall6(SYS_CLONE, uintptr(SIGCHLD)|sys.Cloneflags, 0, 0, 0, 0, 0)
+               r1, _, err1 = RawSyscall6(SYS_CLONE, sys.Cloneflags, 0, 0, 0, 0, 0)
        }
        if err1 != 0 {
                runtime_AfterFork()
$ go tool objdump -s 'syscall\.forkAndExecInChild' /tmp/20427-20170619-go183.linux_amd64 | grep 'exec_linux.go:101' | grep OR | wc -l
       1
$ go tool objdump -s 'syscall\.forkAndExecInChild' /tmp/20427-20170619-go183-nosigchld.linux_amd64 | grep 'exec_linux.go:101' | grep OR | wc -l
       0

With the syscall package modified to not request SIGCHLD, the reproducer still crashes. Also, the OS seems to ignore the request to not send SIGCHLD.

On vanilla go1.8.3, the reproducer crashed 23 times out of 188727 attempts. With the above modification, it has crashed 2 times out of 61346 attempts.

I've verified with strace that the clone syscall is called with 0 flags rather than 17/SIGCHLD. However, strace also shows that the process is receiving SIGCHLD in spite of that.

The SIGCHLD delivery doesn't always show up at the same point within these syscalls; I've plucked two examples where it did:

$ uname -a
Linux redacted 4.4.0-75-generic #96-Ubuntu SMP Thu Apr 20 09:56:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ strace ./20427-20170619-go183-nosigchld.linux_amd64 
execve("./20427-20170619-go183-nosigchld.linux_amd64", ["./20427-20170619-go183-nosigchld"...], [/* 25 vars */]) = 0
[...]
stat("/usr/bin/true", 0xc4200f42a8)     = -1 ENOENT (No such file or directory)
stat("/sbin/true", 0xc4200f4378)        = -1 ENOENT (No such file or directory)
stat("/bin/true", {st_mode=S_IFREG|0755, st_size=27280, ...}) = 0
openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/dev/null", O_WRONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/dev/null", O_WRONLY|O_CLOEXEC) = 5
pipe2([6, 7], O_CLOEXEC)                = 0
getpid()                                = 8189
clone(child_stack=0, flags=)            = 11048
close(7)                                = 0
read(6, "", 8)                          = 0
close(6)                                = 0
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
waitid(P_PID, 11048, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=11048, si_uid=10018, si_status=0, si_utime=0, si_stime=0}, WEXITED|WNOWAIT, NULL) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=11048, si_uid=10018, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 0
wait4(11048, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 11048
[...]

$ strace ./20427-20170619-go183.linux_amd64
execve("./20427-20170619-go183.linux_amd64", ["./20427-20170619-go183.linux_amd"...], [/* 25 vars */]) = 0
[...]
stat("/usr/bin/true", 0xc4200f33b8)     = -1 ENOENT (No such file or directory)
stat("/sbin/true", 0xc4200f3488)        = -1 ENOENT (No such file or directory)
stat("/bin/true", {st_mode=S_IFREG|0755, st_size=27280, ...}) = 0
openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/dev/null", O_WRONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/dev/null", O_WRONLY|O_CLOEXEC) = 5
pipe2([6, 7], O_CLOEXEC)                = 0
getpid()                                = 32205
clone(child_stack=0, flags=SIGCHLD)     = 33655
close(7)                                = 0
read(6, "", 8)                          = 0
close(6)                                = 0
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
waitid(P_PID, 33655, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=33655, si_uid=10018, si_status=0, si_utime=0, si_stime=0}, WEXITED|WNOWAIT, NULL) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=33655, si_uid=10018, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 0
wait4(33655, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 33655
[...]

Here's one crash output:

gp 0xc4200001a0 goid 1 status 2 gcscandone false gcscanvalid false
fatal error: scan missed a g

goroutine 5 [running]:
runtime.throw(0x4cfd40, 0xf)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc42004dee0 sp=0xc42004dec0
runtime.gcMarkRootCheck()
    /usr/local/go/src/runtime/mgcmark.go:162 +0x22e fp=0xc42004df30 sp=0xc42004dee0
runtime.gcMarkDone()
    /usr/local/go/src/runtime/mgc.go:1164 +0xee fp=0xc42004df50 sp=0xc42004df30
runtime.gcBgMarkWorker(0xc420018000)
    /usr/local/go/src/runtime/mgc.go:1581 +0x303 fp=0xc42004dfd8 sp=0xc42004df50
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42004dfe0 sp=0xc42004dfd8
created by runtime.gcBgMarkStartWorkers
    /usr/local/go/src/runtime/mgc.go:1412 +0x98

goroutine 1 [syscall]:
syscall.Syscall(0x0, 0x6, 0xc4204a2b90, 0x8, 0x0, 0x0, 0x0)
    /usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.readlen(0x6, 0xc4204a2b90, 0x8, 0x2, 0xc42064e000, 0xd)
    /usr/local/go/src/syscall/zsyscall_linux_amd64.go:1085 +0x56
syscall.forkExec(0xc420632000, 0x9, 0xc42060e000, 0x1, 0x1, 0xc4204a2d20, 0x0, 0x0, 0x0)
    /usr/local/go/src/syscall/exec_unix.go:202 +0x3d3
syscall.StartProcess(0xc420632000, 0x9, 0xc42060e000, 0x1, 0x1, 0xc4204a2d20, 0x2, 0x4, 0xc420016080, 0xc4204a2cf0)
    /usr/local/go/src/syscall/exec_unix.go:240 +0x64
os.startProcess(0xc420632000, 0x9, 0xc42060e000, 0x1, 0x1, 0xc4204a2ec8, 0xc42064a000, 0xc, 0xc)
    /usr/local/go/src/os/exec_posix.go:45 +0x1a3
os.StartProcess(0xc420632000, 0x9, 0xc42060e000, 0x1, 0x1, 0xc4204a2ec8, 0x0, 0x0, 0xc420632000)
    /usr/local/go/src/os/exec.go:94 +0x64
os/exec.(*Cmd).Start(0xc420610000, 0x4, 0x0)
    /usr/local/go/src/os/exec/exec.go:359 +0x3d2
main.run()
    main.go:10 +0x68
main.main()
    main.go:18 +0x20


ERROR: exit status 2

And the other crash output:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0xc420047748 pc=0xc420047748]

goroutine 4 [running]:
runtime.throw(0x4d4567, 0x2a)
    /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc4200476c8 sp=0xc4200476a8
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:274 +0x2db fp=0xc420047718 sp=0xc4200476c8
runtime.goparkunlock(0x0, 0x4d54f8, 0xc420474020, 0x2000000020, 0x1)
    /usr/local/go/src/runtime/proc.go:278 +0x5e fp=0xc420047758 sp=0xc420047718
runtime.runfinq()
    /usr/local/go/src/runtime/mfinal.go:208 +0x205 fp=0xc4200477e0 sp=0xc420047758
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc4200477e8 sp=0xc4200477e0
created by runtime.createfing
    /usr/local/go/src/runtime/mfinal.go:142 +0x62

goroutine 1 [runnable]:
os.(*File).Close(0xc420158120, 0xc4204b9ec8, 0xc420012ff0)
    /usr/local/go/src/os/file_unix.go:128 +0x78
os/exec.(*Cmd).closeDescriptors(0xc4202b2dc0, 0xc4202b6500, 0x3, 0x4)
    /usr/local/go/src/os/exec/exec.go:263 +0x49
os/exec.(*Cmd).Start(0xc4202b2dc0, 0x4, 0x0)
    /usr/local/go/src/os/exec/exec.go:366 +0x44b
main.run()
    main.go:10 +0x68
main.main()
    main.go:18 +0x20


ERROR: exit status 2

Thanks for testing. I wonder why SIGCHLD was getting delivered? But it doesn't matter much, as the point of that test was to see whether this problem might have been fixed by the fix for #18600. At this point it's clear that we are not going to fix this problem for 1.9. So my immediate question now is whether it has already been fixed in 1.9, or whether we need to keep working on this for 1.10.

Thanks, I hadn't seen that patch—masking all signals during the fork looks promising, and I hadn't tested with tip since that change was merged. (Reproducing this is very slow, which discourages changing variables like Go version :-/ )

I'm running the reproducer now with go1.9rc1. I'll update here when I know more.

1.9rc1 doesn't fix it for FreeBSD; I still get crashes, e.g.:

acquirep: p->m=842351098880(48) p->status=1
fatal error: acquirep: invalid p state
acquirep: p->m=0(0) p->status=3
fatal error: acquirep: invalid p state

I haven't seen this crash since updating the affected applications (and reproducers) to Go 1.9. I haven't bisected to find the commit that fixed it, but I can see how https://golang.org/cl/45471 could have done it.

Thanks for following up.

Been running a test here for 5 days now; it usually fails every 1-3 days, so it's looking good here too.

If it's still running without error by the end of the week, I'd say we can close this issue.

Not been able to reproduce this and the test has been running for 2 weeks now, so I'd consider this fixed.

Thanks for testing it so thoroughly.

I've been tracking down these crashes on node_exporter and stumbled upon this Go bug. Running the reproducer from the above comment on my host (go version go1.9.1 linux/amd64, from a Gentoo build of dev-lang/go) reliably crashes in less than 10 seconds. I don't think this is fixed.

uname -a: Linux raider 4.13.9-gentoo #1 SMP PREEMPT Wed Nov 1 20:07:45 JST 2017 x86_64 Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz GenuineIntel GNU/Linux
I've been doing some stress memtesting and things are clean; the machine is fairly heavily used (it's my workstation) and otherwise stable.

Edit: sometimes the reproducer seems to hang instead of crashing. Here is a tarball with 28 crashes and 2 lockups (manually killed), with the reproducer source and binary.

A build made on an Arch Linux machine (also go1.9.1) crashes the same way on this host, so this isn't a problem with the Gentoo Go toolchain. However, the reproducer doesn't immediately crash the same way on the Arch box. I think there's something about my workstation that makes it really easy to reproduce this bug.

Edit 2: GOMAXPROCS=1 seems to work around the problem (so far, I'm running it in a loop). GOMAXPROCS=2 is sufficient to reproduce.

Edit 3: Just to eliminate the usual first suspect when CPU bugs are involved, I offlined the second thread of my four CPU cores (CPUs 4-7), effectively disabling hyperthreading. Still crashes.

I've reproduced the crashes as qemu VMs on three different Intel hosts and one AMD host, using three similar but not identical kernels (nearby versions, nearly the same config, same compiler). However, I can't easily reproduce it with other random kernels I have lying around, which suggests this is closely dependent on kernel config, version, or compiler. The host I run this on does also affect how quickly the crash shows up. I'll continue to bisect the issue to see if I can narrow it down to a given kernel config or other factor.

OK, reopening.

My current guess is that this must be related to the use of vfork. Somehow the child must be corrupting some memory in the parent process.

Turns out my repro is related to the compiler and compiler flags used to compile the kernel.

  • linux-4.13.9-gentoo + gcc (Gentoo Hardened 5.4.0-r3 p1.3, pie-0.6.5) 5.4.0 + CONFIG_OPTIMIZE_INLINING=n - no crash
  • linux-4.13.9-gentoo + gcc (Gentoo Hardened 6.4.0 p1.0) 6.4.0 + CONFIG_OPTIMIZE_INLINING=n - crash
  • linux-4.13.9-gentoo + gcc (Gentoo Hardened 6.4.0 p1.0) 6.4.0 + CONFIG_OPTIMIZE_INLINING=y - no crash

Will now try to narrow down which change introduced by flipping CONFIG_OPTIMIZE_INLINING causes the issue.

That is already an impressive job of fault isolation.

I narrowed it down to arch/x86/entry/vdso/vclock_gettime.o. This makes perfect sense, since vDSO code is userspace code - effectively this is GCC-compiled code that winds up interacting directly with Go whether Go wants it or not. That explains why the kernel's compiler mysteriously has a direct impact on userspace.

Still investigating, but here are objdumps of the object files: ok version, crash version. (Edit: had those flipped)

TEXT runtime·walltime(SB),NOSPLIT,$16
    // Be careful. We're calling a function with gcc calling convention here.
    // We're guaranteed 128 bytes on entry, and we've taken 16, and the
    // call uses another 8.
    // That leaves 104 for the gettime code to use. Hope that's enough!

0000000000000030 <__vdso_clock_gettime>:
  30:   55                      push   %rbp
  31:   48 89 e5                mov    %rsp,%rbp
  34:   48 81 ec 20 10 00 00    sub    $0x1020,%rsp
  3b:   48 83 0c 24 00          orq    $0x0,(%rsp)
  40:   48 81 c4 20 10 00 00    add    $0x1020,%rsp

GCC is doing a 4K stack probe here: it moves %rsp down by 0x1020 (4128) bytes and does an orq $0 at that address. Two things can happen: a segfault if that page is unmapped, or, in the general case, nothing, since it just ORs in 0. But because the orq is a non-atomic read-modify-write, if it races a store from another thread it can silently undo that store and corrupt data.

So yeah, 104 bytes is not enough: the probe touches memory 4128 bytes below the stack pointer, far outside the space Go reserves, so it lands on whatever the runtime happens to have placed there. Whether the kernel should be doing stack probes in the vDSO is another question entirely... but this explains why only Go, with its tight split stacks, breaks, and why the whole mess is so nondeterministic: the crashes are random, hard to reproduce, and require multiple threads.

Turns out my second time debugging an issue in the go runtime wound up being a bug in Go->C interop again. Whee.

Edit: just confirmed that hexediting those probes from 0x1020 to 0x20 bytes on one of the crash-inducing kernels cures the crashes.

Worth noting that both Documentation/ABI/stable/vdso and man 7 vdso make no mention of maximum stack usage guarantees; they just say standard C calling conventions (which obviously make no such guarantees).

Why is it doing such a big stack probe? (Is there actually some chain of calls that needs 4k of stack, or is gcc faulting pages for no good reason?)

104 bytes for the whole VDSO call does seem... excessively optimistic, though.

Stack probes are conservative: they guarantee enough space for a leaf function using up to that much stack (this is a stack-overflow exploit mitigation). In the bad kernel, GCC decided not to inline vread_tsc, making the vDSO functions non-leaf and thus requiring a stack probe. Sure, GCC could know that vread_tsc doesn't actually need that much stack (since it's in the same compilation unit), but that's just saying it missed a potential optimization. In the C world, having an extra 4K of free stack space at all times is entirely reasonable, which is why stack probes like this are a thing.

In the C world, having an extra 4K of free stack space at all times is entirely reasonable,

Only if you're assuming a very tiny number of threads. An extra page per thread in a program with a lot of threads adds up to a lot of wasted memory (and, in particular, a lot of wasted TLB entries).

(Or does writing a 0 to an unmapped page not actually fault it in?)

Feel free to take that argument up with the GCC developers, but that's what -fstack-check does, and Gentoo Hardened enables that by default in their GCC builds, thus that probe winding up in the vDSO. This is important to prevent stack clash attacks. Sure, it could be optimized further to elide the probes when the compiler can prove they're unnecessary, but it is what it is right now. Security vs performance is often a tradeoff.

@marcan Thanks very much for the detailed analysis.

That is an odd result, though, because as far as I can tell it doesn't have anything to do with forking. I think you've found a code path that could cause a random crash in approximately any amd64 GNU/Linux Go program at approximately any time, if running on a kernel compiled with CONFIG_OPTIMIZE_INLINING turned off (fortunately that is not the default). We clearly need to fix that, somehow. But now I'm confused as to why it happens with this particular reproducer.

Note that I came here through node_exporter crashing, and it doesn't use os/exec. I think that reproducer just so happens to be good at triggering this issue by coincidence, even though it has nothing to do with exec directly.
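
Purely as an illustration of that point (this is not node_exporter's code, and there's no guarantee it reproduces as quickly as the exec-based reproducer), a program that does nothing but hammer time.Now from a couple of goroutines already exercises the same __vdso_clock_gettime call on a small goroutine stack:

package main

import (
    "runtime"
    "sync"
    "time"
)

// On linux/amd64, time.Now normally goes through __vdso_clock_gettime on the
// calling goroutine's stack, so every iteration is another chance for the
// GCC-inserted stack probe to touch memory below that stack.
func main() {
    runtime.GOMAXPROCS(2) // GOMAXPROCS=2 was reported above as enough to reproduce

    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                _ = time.Now()
            }
        }()
    }
    wg.Wait()
}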

Also keep in mind that CONFIG_OPTIMIZE_INLINING is somewhat coincidental too; for some reason flipping that flag off (which is supposed to force-inline more functions) makes GCC not inline the vread_tsc function. That function isn't marked inline, so GCC is perfectly within its rights to do so. But there's no guarantee that it won't happen under some other combination of GCC version and flags, too. Basically, any GCC with -fstack-check enabled by default can trigger this bug if the vDSO function makes any function calls at all as compiled, which could happen for any reason since the source code does indeed make function calls. You're depending on everything getting inlined to not crash, even though the kernel source makes no attempt to guarantee that.

It would be technically possible to ask GCC to always inline vread_tsc to make the vDSO's behavior more deterministic; rdtsc_ordered, which vread_tsc uses, already does that.

Obviously, this would indirectly eliminate the stack probe that GCC inserts in this one case, but it doesn't look like the stack probe serves much purpose here anyway (something that's hard for GCC to judge by itself, so an attribute looks appropriate).

Of course the issue can be fixed in the kernel (and that might even make sense). But right now it's Go making the undocumented assumptions. So there are two options here (not mutually exclusive):

  1. Fix Go so it stops assuming vDSO stack usage limits
  2. Convince the kernel guys to change the vDSO spec to guarantee a maximum stack usage, which may include disabling stack probing and/or inlining everything

Doing 2 and not 1 of course means accepting that Go code will potentially break on certain kernels before the change.

I emailed the x86 vDSO maintainer (cc: x86, linux-kernel, kernel-hardening) to see what their take on all this is (but I still think Go should fix this, e.g. somehow allocate more stack space or disable vDSO usage on current kernels if the overhead of cgo/stack munging is greater than just using a syscall).

Edit: thread
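
For what it's worth, "just using a syscall" in the parenthetical above means something like the following sketch (illustrative only; the runtime already falls back to the raw syscall when no usable vDSO symbol is found): entering the kernel via SYS_CLOCK_GETTIME bypasses __vdso_clock_gettime and therefore any surprises about its stack usage, at the cost of a full syscall per time read.

package main

import (
    "fmt"
    "syscall"
    "unsafe"
)

// clockRealtime is CLOCK_REALTIME on Linux.
const clockRealtime = 0

// syscallNow reads the wall clock via the clock_gettime syscall instead of
// the vDSO fast path.
func syscallNow() (sec, nsec int64) {
    var ts syscall.Timespec
    _, _, errno := syscall.Syscall(syscall.SYS_CLOCK_GETTIME,
        clockRealtime, uintptr(unsafe.Pointer(&ts)), 0)
    if errno != 0 {
        panic(errno)
    }
    return ts.Sec, ts.Nsec
}

func main() {
    fmt.Println(syscallNow())
}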

Change https://golang.org/cl/76990 mentions this issue: runtime: call amd64 VDSO entry points on large stack

@marcan If you have time it might be interesting to see whether https://golang.org/cl/76990 fixes the problem. However, the benchmarks for that are not so great, so I'm not sure we want to use that approach. Thanks.

@ianlancetaylor Just built the toolchain with that patch. I'll let the reproducer run overnight just to be sure, but so far it looks good.

Yeah, no crash running overnight. The fix works.

45ns -> 55ns for time.Now seems OK to me.

@marcan: Awesome debugging. Thanks!

The 10ns slowdown is unfortunate, but if that's what we have to do, that's what we have to do. I made some suggestions on Ian's CL that may speed things up, but we'll see.
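
For anyone who wants to measure that on their own hardware, a quick benchmark along these lines (hypothetical file and package name) run against both toolchains gives the same kind of before/after numbers; run go test -bench TimeNow -count 5 and compare ns/op:

package vdso_test

import (
    "testing"
    "time"
)

// BenchmarkTimeNow times a single time.Now call, whose cost on linux/amd64 is
// dominated by the __vdso_clock_gettime call that the patch now performs on a
// larger stack.
func BenchmarkTimeNow(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = time.Now()
    }
}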

Just for reference, GCC 6.4.0 is now stable in Gentoo and some other infra I've just updated is hitting this too, even though I built its kernel with CONFIG_OPTIMIZE_INLINING=y. Neither value of that option is a guarantee that everything will be inlined and no stack probe will be emitted.

@marcan, just to clarify, you're hitting this with binaries built with 1.9.x? (Do you know if it's still a problem with binaries compiled from tip?)

-fstack-check is incompatible with small stacks. Its main intent is to support Ada and probe sufficiently below the stack pointer so that it is possible to raise an Ada exception from a signal handler. For other purposes, it is not really useful.

-fstack-clash-protection avoids these issues because it restricts stack probing to the actual stack frame. It should therefore be compatible with Go's minimal stacks, although it will not confer any protection there because of the lack of a guard page.

@aclements correct, Go 1.9.x. Cherry-picking a158382 (backported, which just means dropping the x86 half, which is new, AIUI) fixes the issue, which is what I'm doing with my Go packages now. I guess the question is: will it be backported into the 1.9 branch officially, or will this only be fixed with the next major release?

@fweimer I'm aware of that, but -fstack-clash-protection isn't in any GCC release yet. This is why Gentoo is using -fstack-check, which, although not intended for stack-clash protection (and thus not being optimized for it/having some downsides/making little sense on first impression), does effectively provide stack clash protection since it does wind up touching every page of the stack as long as all code is compiled with it.

Ultimately the issue here is that Go is (was) calling code that makes no promises about stack usage on a small stack. -fstack-check is just the way I happened to hit it, but nothing says vDSO might not decide to use more stack space for some reason in the future.

Marking as a candidate for 1.9.x. I'm not sure whether we want to back-port this or not, but it's worth considering. My understanding (and please correct me if I'm wrong) is that it requires a pretty uncommon configuration to trigger. OTOH, if it is triggered, the effects are quite bad, and telling people to reconfigure their kernel doesn't seem like a viable work-around to me.

@aclements You can't really fix it by reconfiguring the kernel (not reliably, as I just discovered today). Really the only way to avoid the issue would be to patch the kernel Makefile to use -fno-stack-check or to switch to a non-hardened toolchain altogether.

I'm not sure if any distros other than Gentoo Hardened use -fstack-check globally, but yeah, in this instance at least, there's no sensible workaround.

Personally, I'm looking forward to this trickling downstream because I've got one package (Grafana) which is a binary package on Gentoo, and has a Go server component. That will start crashing as soon as I update GCC/the kernel on my hosts that use it, and trying to get a self-compiled build in for that will be a lot more annoying than the source patch for source packages.

@fweimer Are you saying that GCC's -fstack-check option inserts a stack check that goes beyond the stack frame even in a leaf function? That seems pointless.

@ianlancetaylor I do not fully understand the heuristics for -fstack-check. If the frame becomes sufficiently large for a leaf function, it does something like this:

subl    $8972, %esp
orl     $0, 764(%esp)
orl     $0, (%esp)
addl    $4112, %esp

That is, it probes beyond what is actually needed by the leaf function. However, I think such probes are not inserted proactively if the function does not have a large frame in the first place, so it's inconsistent.

CL 88495 OK for Go 1.9.3.

Change https://golang.org/cl/88495 mentions this issue: [release-branch.go1.9] runtime: call amd64 VDSO entry points on large stack

go1.9.3 has been packaged and includes:

  • CL 88495 [release-branch.go1.9] runtime: call amd64 VDSO entry points on large stack

The release is posted at golang.org/dl.

— golang.org/x/build/cmd/releasebot, Jan 22 21:02:53 UTC

The kernel's main Makefile now sets -fno-stack-check, in mainline and at least some of the stable trees (see e.g. here for kernel 4.4), because -fstack-check broke parts of the kernel proper as well. So this shouldn't be much of an issue going forward, even without this change to golang.

Yes, -fstack-check was always a bad idea for code executing in kernel mode (but the change should apply to vDSO as well, effectively working around this bug).

Nonetheless, there are still no guarantees on vDSO stack usage, so Go is still doing the right thing by switching stacks before calling into it.
