Go: runtime: select: fatal error: index out of range

Created on 24 Jul 2020  路  10Comments  路  Source: golang/go

What version of Go are you using (go version)?

I'm using bazel rules_go release v0.22.6 which is using go 1.14.2.

Does this issue reproduce with the latest release?

I'm not sure, this issue is not consistently reproducible.

What operating system and processor architecture are you using (go env)?

linux_amd64

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/preston/.cache/go-build"
GOENV="/home/preston/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/preston/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/6123"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/6123/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build083098668=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I was running llvm libfuzzer tests and saw this testcase appear. Binaries compiled with gc_goopts=-d=libfuzzer and linked with address sanitizer.

What did you expect to see?

No panics from the core select logic.

What did you see instead?

An index out of range here https://github.com/golang/go/blob/074f2d800f2c7b741a080081cfcc5295b375b23d/src/runtime/select.go#L48

Full stack error below

fatal error: index out of range
--
聽 | 聽
聽 | goroutine 83 [running]:
聽 | runtime.throw(0x28d934, 0x12)
聽 | GOROOT/src/runtime/panic.go:1116 +0x74 fp=0x10c00007b910 sp=0x10c00007b8e0 pc=0x883a74
聽 | runtime.panicCheck1(0x8951ed, 0x28d934, 0x12)
聽 | GOROOT/src/runtime/panic.go:34 +0xdd fp=0x10c00007b940 sp=0x10c00007b910 pc=0x88125d
聽 | runtime.goPanicIndex(0x3d48, 0x7)
聽 | GOROOT/src/runtime/panic.go:87 +0x46 fp=0x10c00007b988 sp=0x10c00007b940 pc=0x881326
聽 | runtime.sellock(0x10c00007be90, 0x7, 0x7, 0x7f60c6d2c876, 0x7, 0x7)
聽 | GOROOT/src/runtime/select.go:48 +0xad fp=0x10c00007b9b8 sp=0x10c00007b988 pc=0x8951ed
聽 | runtime.selectgo(0x10c00007be90, 0x10c00007bb64, 0x7, 0x5, 0x8fb701)
聽 | GOROOT/src/runtime/select.go:319 +0xcc2 fp=0x10c00007bae0 sp=0x10c00007b9b8 pc=0x896092
聽 | github.com/ethereum/go-ethereum/consensus/ethash.(*remoteSealer).loop(0x10c000012f00)
聽 | external/com_github_ethereum_go_ethereum/consensus/ethash/sealer.go:278 +0x25a fp=0x10c00007bfd8 sp=0x10c00007bae0 pc=0x16098fa
聽 | runtime.goexit()
聽 | src/runtime/asm_amd64.s:1373 +0x1 fp=0x10c00007bfe0 sp=0x10c00007bfd8 pc=0x8b4ff1
聽 | created by github.com/ethereum/go-ethereum/consensus/ethash.startRemoteSealer
聽 | external/com_github_ethereum_go_ethereum/consensus/ethash/sealer.go:262 +0x2b0

NeedsInvestigation

All 10 comments

Can you run the tests under the race detector?

CC @aclements @randall77

I tried recompiling these tests with race detection, but it failed to link due to duplicate symbols.
I'll fiddle around with it for a bit today.

Does race detection work with gc_goopts=-d=libfuzzer?

It should work to run go build -gcflags=-d=libfuzzer -race. But I don't know about Bazel.

Can you use the fuzzer to find a problematic case and then run that case without -d=libfuzzer?

In theory this could also be caused by the libfuzzer instrumentation. cc @mdempsky just in case

libfuzzer instrumentation is turned off when compiling the runtime, so I'd be really surprised if it was responsible for this.

go-ethereum is using reflect.SliceHeader incorrectly. For example, here: https://github.com/ethereum/go-ethereum/blob/master/consensus/ethash/algorithm.go#L154

It's only safe to use *reflect.SliceHeader as a pointer to an actual slice-typed variable. You should never have a value or variable of type reflect.SliceHeader.

Spot checking, I don't immediately see how this could be the fault, but I suggest fixing them, and then also using -d checkptr when fuzzing to enable runtime validation of unsafe.Pointer usage, and let us know if you still see the failure after that.

Why doesn't this code need to initialize pollorder[0] = 0?

https://github.com/golang/go/blob/074f2d800f2c7b741a080081cfcc5295b375b23d/src/runtime/select.go#L158-L163

If select is called in a loop and the compiler is able to reuse the pollorder array, isn't there a risk that pollorder[0] could be out of range? (Using select in a loop is hardly a rare usage, so I feel like I must be missing something obvious. Otherwise, surely this would fail super commonly?)

Edit: Nevermind, it's zero initialized by the compiler:

https://github.com/golang/go/blob/074f2d800f2c7b741a080081cfcc5295b375b23d/src/cmd/compile/internal/gc/select.go#L269-L270

Running fuzzer with -d checkptr now. I'll circle back here in a few hours to see if any new testcases appear.

In an older build (without checkptr), I just saw this issue again.

goroutine 7 [running]:
--
聽 | runtime.throw(0x248753, 0x12)
聽 | GOROOT/src/runtime/panic.go:1116 +0x74 fp=0x10c000078cf8 sp=0x10c000078cc8 pc=0x48e174
聽 | runtime.panicCheck1(0x49f3ed, 0x248753, 0x12)
聽 | GOROOT/src/runtime/panic.go:34 +0xdd fp=0x10c000078d28 sp=0x10c000078cf8 pc=0x48b9cd
聽 | runtime.goPanicIndex(0x6170, 0x3)
聽 | GOROOT/src/runtime/panic.go:87 +0x46 fp=0x10c000078d70 sp=0x10c000078d28 pc=0x48ba96
聽 | runtime.sellock(0x10c000078f50, 0x3, 0x3, 0x10c000078f2a, 0x3, 0x3)
聽 | GOROOT/src/runtime/select.go:48 +0xad fp=0x10c000078da0 sp=0x10c000078d70 pc=0x49f3ed
聽 | runtime.selectgo(0x10c000078f50, 0x10c000078f24, 0x3, 0x10c000078f30, 0x7ffc34fdfc62)
聽 | GOROOT/src/runtime/select.go:319 +0xcc2 fp=0x10c000078ec8 sp=0x10c000078da0 pc=0x4a0242
聽 | github.com/dgraph-io/ristretto.(*Cache).processItems(0x10c000092720)
聽 | external/com_github_dgraph_io_ristretto/cache.go:299 +0x248 fp=0x10c000078fd8 sp=0x10c000078ec8 pc=0xa1b498
聽 | runtime.goexit()
聽 | src/runtime/asm_amd64.s:1373 +0x1 fp=0x10c000078fe0 sp=0x10c000078fd8 pc=0x4bd671
聽 | created by github.com/dgraph-io/ristretto.NewCache
聽 | external/com_github_dgraph_io_ristretto/cache.go:162 +0x2d7

Notably, both of these failures happen when the channel locks are re-acquired after sleeping. So the first call to sellock(scases, lockorder) succeeded, and only the second one is failing. We can also see from the traceback that len(scases) is correct in both cases. So it seems like the contents of lockorder must be getting corrupted while the goroutine is parked.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dominikh picture dominikh  路  3Comments

enoodle picture enoodle  路  3Comments

bbodenmiller picture bbodenmiller  路  3Comments

OneOfOne picture OneOfOne  路  3Comments

michaelsafyan picture michaelsafyan  路  3Comments