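What version of Go are you using (go version)? I'm using Bazel rules_go release v0.22.6, which uses Go 1.14.2.
Does this issue reproduce with the latest release? I'm not sure; this issue is not consistently reproducible.
What operating system and processor architecture are you using (go env)? linux_amd64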
go env output:
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/preston/.cache/go-build"
GOENV="/home/preston/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/preston/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/6123"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/6123/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build083098668=/tmp/go-build -gno-record-gcc-switches"
I was running LLVM libFuzzer tests and saw this test case appear. The binaries were compiled with gc_goopts=-d=libfuzzer and linked with AddressSanitizer.
I expected no panics from the core select logic.
Instead, I saw an index out of range here: https://github.com/golang/go/blob/074f2d800f2c7b741a080081cfcc5295b375b23d/src/runtime/select.go#L48
Full stack trace below:
fatal error: index out of range

goroutine 83 [running]:
runtime.throw(0x28d934, 0x12)
	GOROOT/src/runtime/panic.go:1116 +0x74 fp=0x10c00007b910 sp=0x10c00007b8e0 pc=0x883a74
runtime.panicCheck1(0x8951ed, 0x28d934, 0x12)
	GOROOT/src/runtime/panic.go:34 +0xdd fp=0x10c00007b940 sp=0x10c00007b910 pc=0x88125d
runtime.goPanicIndex(0x3d48, 0x7)
	GOROOT/src/runtime/panic.go:87 +0x46 fp=0x10c00007b988 sp=0x10c00007b940 pc=0x881326
runtime.sellock(0x10c00007be90, 0x7, 0x7, 0x7f60c6d2c876, 0x7, 0x7)
	GOROOT/src/runtime/select.go:48 +0xad fp=0x10c00007b9b8 sp=0x10c00007b988 pc=0x8951ed
runtime.selectgo(0x10c00007be90, 0x10c00007bb64, 0x7, 0x5, 0x8fb701)
	GOROOT/src/runtime/select.go:319 +0xcc2 fp=0x10c00007bae0 sp=0x10c00007b9b8 pc=0x896092
github.com/ethereum/go-ethereum/consensus/ethash.(*remoteSealer).loop(0x10c000012f00)
	external/com_github_ethereum_go_ethereum/consensus/ethash/sealer.go:278 +0x25a fp=0x10c00007bfd8 sp=0x10c00007bae0 pc=0x16098fa
runtime.goexit()
	src/runtime/asm_amd64.s:1373 +0x1 fp=0x10c00007bfe0 sp=0x10c00007bfd8 pc=0x8b4ff1
created by github.com/ethereum/go-ethereum/consensus/ethash.startRemoteSealer
	external/com_github_ethereum_go_ethereum/consensus/ethash/sealer.go:262 +0x2b0
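To make the failing line concrete, here is a simplified standalone model of the sellock loop, written under the assumption that (as in Go 1.14's select.go) it walks lockorder and uses each entry directly as an index into scases. The hchan and scase types, the locks counter (standing in for lock(&c.lock)), and trySellock are hypothetical. Any lockorder entry greater than or equal to len(scases) fails the bounds check. In the real runtime that failure is a fatal error (hence runtime.throw via panicCheck1 in the trace) rather than a recoverable panic; the model recovers only so it can report the error.

```go
package main

import "fmt"

// hchan is a hypothetical stand-in for the runtime channel type.
// locks counts how many times the model "locked" this channel.
type hchan struct {
	locks int
}

// scase is a hypothetical stand-in for one select case: just its channel.
type scase struct {
	c *hchan
}

// sellock models the runtime loop: each lockorder entry indexes scases.
// The real lockorder is sorted by channel address, so cases on the same
// channel are adjacent and that channel is locked only once.
func sellock(scases []scase, lockorder []uint16) {
	var c *hchan
	for _, o := range lockorder {
		c0 := scases[o].c // bounds check fails if o >= len(scases)
		if c0 != nil && c0 != c {
			c = c0
			c.locks++
		}
	}
}

// trySellock runs sellock and turns an index panic into an error.
func trySellock(scases []scase, lockorder []uint16) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("sellock: %v", r)
		}
	}()
	sellock(scases, lockorder)
	return nil
}

func main() {
	a, b := &hchan{}, &hchan{}
	scases := []scase{{c: a}, {c: b}, {c: a}}

	// Valid order: both cases on channel a are adjacent.
	err := trySellock(scases, []uint16{0, 2, 1})
	fmt.Println(err, a.locks, b.locks) // prints "<nil> 1 1"

	// A corrupted entry is used as an index and fails the bounds check.
	err = trySellock(scases, []uint16{0, 0x6170, 1})
	fmt.Println(err)
}
```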
Can you run the tests under the race detector?
CC @aclements @randall77
I tried recompiling these tests with race detection, but it failed to link due to duplicate symbols.
I'll fiddle around with it for a bit today.
Does race detection work with gc_goopts=-d=libfuzzer?
Running go build -gcflags=-d=libfuzzer -race should work, but I don't know about Bazel.
Can you use the fuzzer to find a problematic case and then run that case without -d=libfuzzer?
In theory this could also be caused by the libfuzzer instrumentation. cc @mdempsky just in case
libfuzzer instrumentation is turned off when compiling the runtime, so I'd be really surprised if it was responsible for this.
go-ethereum is using reflect.SliceHeader incorrectly. For example, here: https://github.com/ethereum/go-ethereum/blob/master/consensus/ethash/algorithm.go#L154
It's only safe to use *reflect.SliceHeader as a pointer to an actual slice-typed variable. You should never have a value or variable of type reflect.SliceHeader.
From spot checking, I don't immediately see how this could cause the fault, but I suggest fixing those uses and then also fuzzing with -d=checkptr, which enables runtime validation of unsafe.Pointer usage. Let us know if you still see the failure after that.
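As an illustration of the *reflect.SliceHeader rule above, here is a minimal sketch that reinterprets a []uint32 as a []byte without copying. The helper name uint32sAsBytes is hypothetical and this is not go-ethereum's code. The header pointer only ever refers to out, a real []byte variable, so the garbage collector keeps seeing its data field as a pointer; the doc comment shows the value-typed pattern to avoid.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// uint32sAsBytes is a hypothetical helper returning a []byte that aliases
// the backing array of words, without copying.
//
// Safe pattern: the *reflect.SliceHeader points at out, an actual []byte
// variable, so out's data field remains a GC-visible pointer.
//
// Pattern to avoid: a reflect.SliceHeader value, for example
//
//	hdr := *(*reflect.SliceHeader)(unsafe.Pointer(&words))
//	hdr.Len *= 4
//	return *(*[]byte)(unsafe.Pointer(&hdr))
//
// There hdr.Data is just a uintptr that the GC does not treat as a pointer.
func uint32sAsBytes(words []uint32) []byte {
	if len(words) == 0 {
		return nil
	}
	var out []byte
	hdr := (*reflect.SliceHeader)(unsafe.Pointer(&out))
	hdr.Data = uintptr(unsafe.Pointer(&words[0]))
	hdr.Len = len(words) * 4
	hdr.Cap = len(words) * 4
	return out
}

func main() {
	words := []uint32{0, 0}
	b := uint32sAsBytes(words)
	for i := range b {
		b[i] = 0xff // writes through the alias into words
	}
	fmt.Println(len(b), words[0] == 0xffffffff, words[1] == 0xffffffff) // prints "8 true true"
}
```

On Go 1.17 and later, unsafe.Slice((*byte)(unsafe.Pointer(&words[0])), len(words)*4) expresses the same conversion without reflect.SliceHeader.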
Why doesn't this code need to initialize pollorder[0] = 0?
If select is called in a loop and the compiler is able to reuse the pollorder array, isn't there a risk that pollorder[0] could be out of range? (Using select in a loop is hardly rare, so I feel like I must be missing something obvious. Otherwise, surely this would fail very often?)
Edit: Never mind, it's zero-initialized by the compiler.
Running the fuzzer with -d=checkptr now. I'll circle back in a few hours to see whether any new test cases appear.
In an older build (without checkptr), I just saw this issue again.
goroutine 7 [running]:
runtime.throw(0x248753, 0x12)
	GOROOT/src/runtime/panic.go:1116 +0x74 fp=0x10c000078cf8 sp=0x10c000078cc8 pc=0x48e174
runtime.panicCheck1(0x49f3ed, 0x248753, 0x12)
	GOROOT/src/runtime/panic.go:34 +0xdd fp=0x10c000078d28 sp=0x10c000078cf8 pc=0x48b9cd
runtime.goPanicIndex(0x6170, 0x3)
	GOROOT/src/runtime/panic.go:87 +0x46 fp=0x10c000078d70 sp=0x10c000078d28 pc=0x48ba96
runtime.sellock(0x10c000078f50, 0x3, 0x3, 0x10c000078f2a, 0x3, 0x3)
	GOROOT/src/runtime/select.go:48 +0xad fp=0x10c000078da0 sp=0x10c000078d70 pc=0x49f3ed
runtime.selectgo(0x10c000078f50, 0x10c000078f24, 0x3, 0x10c000078f30, 0x7ffc34fdfc62)
	GOROOT/src/runtime/select.go:319 +0xcc2 fp=0x10c000078ec8 sp=0x10c000078da0 pc=0x4a0242
github.com/dgraph-io/ristretto.(*Cache).processItems(0x10c000092720)
	external/com_github_dgraph_io_ristretto/cache.go:299 +0x248 fp=0x10c000078fd8 sp=0x10c000078ec8 pc=0xa1b498
runtime.goexit()
	src/runtime/asm_amd64.s:1373 +0x1 fp=0x10c000078fe0 sp=0x10c000078fd8 pc=0x4bd671
created by github.com/dgraph-io/ristretto.NewCache
	external/com_github_dgraph_io_ristretto/cache.go:162 +0x2d7
Notably, both of these failures happen when the channel locks are re-acquired after sleeping. So the first call to sellock(scases, lockorder) succeeded, and only the second one is failing. We can also see from the traceback that len(scases) is correct in both cases. So it seems like the contents of lockorder must be getting corrupted while the goroutine is parked.
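The traceback reading above can be made explicit. This sketch assumes the Go 1.14 traceback layout: each slice argument to sellock is printed as three words (pointer, length, capacity), selectgo's third argument is ncases, and goPanicIndex receives the bad index followed by the length. The failure type and helper names are hypothetical. It decodes the hex words copied from both reports and checks that the lengths agree with ncases while the lockorder entry is far out of range.

```go
package main

import (
	"fmt"
	"strconv"
)

// failure holds argument words copied from one traceback, as hex strings.
type failure struct {
	where     string
	ncases    string // 3rd argument of runtime.selectgo
	scasesLen string // 2nd word of runtime.sellock (len of scases)
	orderLen  string // 5th word of runtime.sellock (len of lockorder)
	index     string // 1st argument of runtime.goPanicIndex (the bad index)
}

var failures = []failure{
	{"ethash (*remoteSealer).loop", "0x7", "0x7", "0x7", "0x3d48"},
	{"ristretto (*Cache).processItems", "0x3", "0x3", "0x3", "0x6170"},
}

// hexWord parses a traceback word such as "0x3d48"; base 0 honors the 0x prefix.
func hexWord(s string) int {
	v, err := strconv.ParseUint(s, 0, 64)
	if err != nil {
		panic(err)
	}
	return int(v)
}

// lengthsConsistent reports whether both slices passed to sellock have ncases elements.
func lengthsConsistent(f failure) bool {
	n := hexWord(f.ncases)
	return hexWord(f.scasesLen) == n && hexWord(f.orderLen) == n
}

// badIndexOutOfRange reports whether the lockorder entry is >= len(scases).
func badIndexOutOfRange(f failure) bool {
	return hexWord(f.index) >= hexWord(f.scasesLen)
}

func main() {
	for _, f := range failures {
		fmt.Printf("%s: ncases=%d, lengths consistent=%t, bad index=%d, out of range=%t\n",
			f.where, hexWord(f.ncases), lengthsConsistent(f), hexWord(f.index), badIndexOutOfRange(f))
	}
}
```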