go version)?go1.14.7, also observed the same problem with earlier versions: go1.13.7
Unknown
go env)?Linux 4.14.35-1818.3.3.el7uek.x86_64 #2 SMP Mon Sep 24 14:45:01 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
(Oracle Linux Server release 7.6)
Original issue reported here: https://github.com/hashicorp/consul/issues/8558
The Consul binary appears to be crashing with errors from the runtime, ex:
fatal error: stopg: invalid statusfatal error: unexpected signal during runtime execution[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]runtime: unexpected return pc for runtime.sigpanic called from 0x0The stack trace from the application shows the line as a select blocking on a channel from signal.Notify (source). I'm not sure if that is related to the underlying issue, or has to do with the signal from the runtime.
Full trace
runtime: gp: gp=0xc000450180, goid=0, gp->atomicstatus=0
runtime: g: g=0xc0005b5680, goid=0, g->atomicstatus=0
fatal error: stopg: invalid status
runtime: gp: gp=0xc000450300, goid=0, gp->atomicstatus=0
runtime: g: g=0xc000604180, goid=0, g->atomicstatus=0
fatal error: unexpected signal during runtime execution
fatal error: stopg: invalid status
runtime: gp: gp=0xc000450000, goid=0, gp->atomicstatus=0
runtime: g: g=0xc000566180, goid=0, g->atomicstatus=0
fatal error: stopg: invalid status
runtime stack:
runtime.throw(0x2fac299, 0x15)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.scang(0xc000450180, 0xc000060170)
/usr/local/go/src/runtime/proc.go:869 +0x379
runtime.markroot.func1()
/usr/local/go/src/runtime/mgcmark.go:221 +0x6d
runtime.markroot(0xc000060170, 0xc00000001b)
/usr/local/go/src/runtime/mgcmark.go:202 +0x2f3
runtime.gcDrain(0xc000060170, 0x7)
/usr/local/go/src/runtime/mgcmark.go:915 +0x110
runtime.gcBgMarkWorker.func2()
/usr/local/go/src/runtime/mgc.go:1936 +0x16b
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146
goroutine 34 [GC worker (idle)]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc00028ef60 sp=0xc00028ef58 pc=0x45d6d0
runtime.gcBgMarkWorker(0xc00005ef00)
/usr/local/go/src/runtime/mgc.go:1900 +0x1bd fp=0xc00028efd8 sp=0xc00028ef60 pc=0x41c38d
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc00028efe0 sp=0xc00028efd8 pc=0x45f621
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1794 +0x77
goroutine 1 [select, 149 minutes]:
github.com/hashicorp/consul/command/agent.(*cmd).run(0xc000651500, 0xc000170020, 0x8, 0x8, 0x0)
/home/circleci/project/consul/command/agent/agent.go:331 +0x13eb
github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc000651500, 0xc000170020, 0x8, 0x8, 0xc0001452a0)
/home/circleci/project/consul/command/agent/agent.go:78 +0x4d
github.com/mitchellh/cli.(*CLI).Run(0xc00031c000, 0xc00031c000, 0x80, 0xc000145ac0)
/go/pkg/mod/github.com/mitchellh/[email protected]/cli.go:260 +0x1da
main.realMain(0xc00014a058)
/home/circleci/project/consul/main.go:53 +0x393
main.main()
/home/circleci/project/consul/main.go:20 +0x22
goroutine 20 [syscall, 149 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:29 +0x41
goroutine 23 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x50186a0)
/go/pkg/mod/github.com/golang/[email protected]/glog.go:882 +0x8b
created by github.com/golang/glog.init.0
/go/pkg/mod/github.com/golang/[email protected]/glog.go:410 +0x26f
goroutine 0 [idle]:
fatal error: unexpected signal during runtime execution
panic during panic
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x452486]
runtime stack:
runtime.throw(0x300b2ed, 0x2a)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:378 +0x47c
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
/usr/local/go/src/runtime/traceback.go:159 +0x156
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000, 0x0)
/usr/local/go/src/runtime/traceback.go:722 +0xf0
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000)
/usr/local/go/src/runtime/traceback.go:676 +0x52
runtime.tracebackothers(0xc0005b5680)
/usr/local/go/src/runtime/traceback.go:929 +0x1a6
runtime.dopanic_m(0xc0005b5680, 0x42f602, 0xc000aabdf8, 0x1)
/usr/local/go/src/runtime/panic.go:974 +0x29e
runtime.fatalthrow.func1()
/usr/local/go/src/runtime/panic.go:829 +0x5f
runtime.fatalthrow()
/usr/local/go/src/runtime/panic.go:826 +0x57
runtime.throw(0x2fac299, 0x15)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.scang(0xc000450180, 0xc000060170)
/usr/local/go/src/runtime/proc.go:869 +0x379
runtime.markroot.func1()
/usr/local/go/src/runtime/mgcmark.go:221 +0x6d
runtime.markroot(0xc000060170, 0xc00000001b)
/usr/local/go/src/runtime/mgcmark.go:202 +0x2f3
runtime.gcDrain(0xc000060170, 0x7)
/usr/local/go/src/runtime/mgcmark.go:915 +0x110
runtime.gcBgMarkWorker.func2()
/usr/local/go/src/runtime/mgc.go:1936 +0x16b
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x0
stack: frame={sp:0xc000c61dd8, fp:0xc000c61e08} stack=[0xc000c60000,0xc000c62000)
000000c000c61cd8: 000000c0007bc180 01000000000003e8
000000c000c61ce8: 0000000000000004 000000000000001f
000000c000c61cf8: 0000000000000000 0000000000000000
000000c000c61d08: 0000000000000001 0000000002fdb0bd
000000c000c61d18: 000000c000c61d60 000000000045bfbf <runtime.fatalthrow.func1+95>
I looked around this issue tracker for an existing issue. I saw some related things, but nothing that sounded like exactly the same problem.
One issue mentioned C signal handlers. There should not be any such handlers in this binary.
Sadly, with the release of go 1.15, go 1.13 is no longer receiving active support.
Can you please try with the latest release of Go can confirm if the problem occurs.
Ah, that's right, thank you. I've asked they try with a more recent release, which uses go1.14.x.
More details from https://github.com/hashicorp/consul/issues/8558#issuecomment-684855990. They were able to reproduce with go1.14.7. The full log is attached. They also said it occurs only on hosts with Intel Optane connected in RAM mode.
The stack is a bit different, but the errors look similar
fatal error: unexpected signal during runtime execution
fatal error: runtimer: bad p
runtime: unexpected return pc for runtime.sigpanic called from 0x0
fatal error: cleantimers: bad p
cc @aclements @randall77 @prattmic
@gopherbot remove WaitingForInfo
This looks a lot like memory corruption.
Hello to all.
These fatal errors have appeared since version 1.12 and older.
And is repeated only on hosts with Intel optane as RAM.
All the more reason to determine if the binary contains data races or cgo related memory corruption.
@dnephin have you built the app with the race detector?
I've been working on running the tests with the race detector enabled. We run some packages with -race, but not all. We don't build binaries with it enabled. Would that help identify the problem?
A data race could cause memory corruption, so building the app with the race detector may help.
Most helpful comment
Ah, that's right, thank you. I've asked they try with a more recent release, which uses go1.14.x.