Go: runtime: error message: P has cached GC work at end of mark termination

Created on 3 Oct 2018 · 59 comments · Source: golang/go

I just saw this in a trybot run on OpenBSD amd64: https://storage.googleapis.com/go-build-log/e4418337/openbsd-amd64-62_9bec130b.log

greplogs shows:

2018-10-03T02:09:38-06ff477/freebsd-386-10_4
2018-10-03T17:40:17-048de7b/freebsd-386-11_2
2018-10-03T19:54:23-9dac0a8/linux-amd64-nocgo

CC @aclements @RLH

GarbageCollector NeedsFix early-in-cycle

Most helpful comment

I think I've figured it out.

I've been running the influx test with a lot of extra runtime debug logging and have been getting some interesting output. Here's an annotated subset of a failure (go-stress-20190104T162318-083562928; I've posted just the last GC cycle since the log is huge).

The unmarked object is 0xc000575800, which is a _defer.

| log | notes |
| -- | -- |
| [1.915567099 P 1] setGCPhase 1 | Start last GC. 0xc000575800 is 2nd in some G's defer list. WB buffer is full size right now, and we only log on flush, so WB logging can be delayed. |
| ... | Lots of stuff on all Ps |
| [1.922270499 P 2] deferreturn 0xc000602740 link 0xc000575800 | Now head of list. gp._defer = d.link should have added to WB buf as new value. |
| [1.922273799 P 2] deferreturn 0xc000575800 link 0x0 | Running defer. gp._defer = d.link should have added to WB buf as old value. |
| [1.922274257 P 2] freedefer 0xc000575800 | Should have added to WB buf as new value when inserting into pool |
| [1.922275588 P 2] newdefer from pool 0xc000575800 | defer gotten from pool. Again should have added to WB buf |
| [1.922276023 P 2] newdefer link defer 0x0 | Now head of list. Should have added to WB buf |
| ... | No logs from P 2. Notably, no WB marks from P 2. |
| [1.922292123 P 1] gcMarkDone flush | Starting mark completion. WB buffering disabled. |
| [1.922293357 P 1] gcMarkDone flushing 1 | |
| [1.922293869 P 3] gcMarkDone flushing 3 | |
| [1.922294506 P 0] gcMarkDone flushing 0 | |
| ... | No logs from P 2, but plenty from other Ps |
| [1.922574537 P 3] deferreturn 0xc000575800 link 0x0 | Running defer. |
| [1.922575525 P 3] wb mark 0xc000575800 | Finally we see it actually getting marked. This mark spins and ultimately fails. |
| ... | A bunch of stuff on P 0 |
| [1.922863641 P 2] gcMarkDone flushing 2 | This flushes P 2's WB buffer, which presumably contains 0xc000575800, but P 3 already marked it above, so it doesn't get marked here, and gcMarkDone considers P 2 to be clean. |
| [1.922902570 P 1] gcMarkDone flush succeeded | But shouldn't have because P 3 has pending work or is dirty. |

The high-level sequence of events is:

  1. Object x is white. Suppose there are three Ps (and thus three write barrier buffers). All Ps are clean and have no pending GC work buffers. Write barrier buffer 0 contains x.
  2. P 1 initiates mark completion. Ps 1 and 2 pass the completion barrier successfully.
  3. P 2 performs a write barrier on object x and greys it. (As a result, P 2 now has pending GC work.)
  4. P 0 enters the completion barrier, which flushes its write barrier buffer. Since x was greyed in step 3, this does not grey x. Hence, P 0 passes the completion barrier successfully.

Now all of the Ps have passed the barrier, but in fact P 2 has pending work. If we enter mark termination at this point, there may even be white objects reachable from x.

I'm not sure why this affects defers so disproportionately. It may simply be that defers depend more heavily on write barrier marking than most objects because of their access patterns.

The workaround I committed a couple days ago will fix this problem. I'm still thinking about how to fix it "properly".

I also need to go back to the proof and figure out where reality diverged from the math. I think the core of the problem is that pointers in the write barrier buffer are only "sort of" grey: they haven't been marked yet, so other Ps can't observe their grey-ness. As a result, you wind up with a black-to-"sort-of-grey" pointer, and that isn't strong enough for the proof's assumptions.

All 59 comments

See https://github.com/golang/go/issues/27993 .

On Mon, Nov 5, 2018 at 12:39 PM Ian Lance Taylor notifications@github.com
wrote:

linux-amd64:
https://storage.googleapis.com/go-build-log/64b96186/linux-amd64_ef4d4c1f.log


Hey @griesemer in your comment https://github.com/golang/go/issues/27993#issuecomment-436027240

See #27993 .

you self-referenced this same issue (#27993); which issue did you mean to refer to?

@odeke-em This was just an e-mail reply - I missed that the mail already referred to this issue. Please ignore.

$ greplogs -e 'P has cached GC work' -dashboard -md -l
2018-11-16T23:30:19-6797b32/netbsd-amd64-8_0
2018-11-16T19:46:17-6157dda/linux-amd64-stretch
2018-11-16T17:49:55-53ed92d/nacl-386
2018-11-16T17:33:54-a1025ba/nacl-386
2018-11-14T21:47:50-0a40d45/freebsd-amd64-11_1
2018-11-14T20:32:15-f36e92d/android-arm-wiko-fever
2018-11-13T20:46:39-af07f77/windows-386-2008
2018-11-13T15:08:13-e51b19a/freebsd-amd64-10_3
2018-11-12T20:46:39-7f1dd3a/linux-386-387
2018-11-12T20:46:25-52b2220/windows-amd64-2008
2018-11-12T20:43:55-106db71/linux-386-387
2018-11-12T20:27:21-ec4ae29/linux-mipsle
2018-11-12T20:27:14-4f3604d/android-arm64-wiko-fever
2018-11-10T16:04:18-e4c1fee/solaris-amd64-oraclerel
...
2018-10-11T16:31:24-689321e/freebsd-386-11_2
2018-10-10T14:55:17-29907b1/freebsd-386-10_4
2018-10-10T04:29:55-8f9902d/freebsd-amd64-10_4
2018-10-09T18:20:23-416804f/windows-386-2008
2018-10-09T18:19:59-7d2f46d/freebsd-amd64-10_3
2018-10-08T17:47:49-26d2260/windows-arm
2018-10-06T19:18:34-2bb91e0/nacl-amd64p32
2018-10-06T15:40:03-f90e89e/linux-amd64-nocgo
2018-10-05T21:53:34-9d90716/freebsd-amd64-race
2018-10-03T22:50:25-1961d8d/freebsd-amd64-10_4
2018-10-03T19:54:23-9dac0a8/linux-amd64-nocgo
2018-10-03T17:40:17-048de7b/freebsd-386-11_2
2018-10-03T02:09:38-06ff477/freebsd-386-10_4

It clearly happens on all GOOSes:

      1 darwin
      1 js
      1 solaris
      2 android
      2 netbsd
      3 plan9
      4 openbsd
      5 nacl
      8 windows
     19 linux
     23 freebsd

And has shown up on plenty of GOARCHes:

      1 arm64
      1 wasm
      2 amd64p32
      3 mipsle
      5 arm
     24 386
     33 amd64

The failure is usually, though not always, in cmd/go, but that might just be sampling bias, and the exact subcommand that's running during the failure varies.

Change https://golang.org/cl/149968 mentions this issue: runtime: improve "P has cached GC work" debug info

Change https://golang.org/cl/149969 mentions this issue: runtime: debug code to catch bad gcWork.puts

Change https://golang.org/cl/150778 mentions this issue: Revert "runtime: debug code to catch bad gcWork.puts"

I got a little data over the weekend with the added debug code.

$ greplogs -dashboard -E "flushedWork|throwOnGCWork" -md
2018-11-26T14:13:53-9fe9853/darwin-amd64-10_12:

runtime: P 5 flushedWork false wbuf1.n=1 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

2018-11-23T21:21:35-c6d4939/linux-amd64-noopt:

fatal error: throwOnGCWork

runtime stack:
runtime.throw(0x6f386c, 0xd)
    /workdir/go/src/runtime/panic.go:608 +0x72 fp=0x7f034b7fdd48 sp=0x7f034b7fdd18 pc=0x42f4a2
runtime.(*gcWork).putBatch(0xc000021770, 0xc0000217a8, 0x13, 0x200)
    /workdir/go/src/runtime/mgcwork.go:182 +0x1e0 fp=0x7f034b7fdd90 sp=0x7f034b7fdd48 pc=0x424aa0
runtime.wbBufFlush1(0xc000020500)
    /workdir/go/src/runtime/mwbbuf.go:277 +0x1ba fp=0x7f034b7fdde8 sp=0x7f034b7fdd90 pc=0x42b87a
runtime.gcMark(0xa4a16143b5)
    /workdir/go/src/runtime/mgc.go:1932 +0x10b fp=0x7f034b7fde80 sp=0x7f034b7fdde8 pc=0x41e22b
runtime.gcMarkTermination.func1()
    /workdir/go/src/runtime/mgc.go:1501 +0x2a fp=0x7f034b7fde98 sp=0x7f034b7fde80 pc=0x45c57a
runtime.systemstack(0x1)
    /workdir/go/src/runtime/asm_amd64.s:351 +0x66 fp=0x7f034b7fdea0 sp=0x7f034b7fde98 pc=0x45f336
runtime.mstart()
    /workdir/go/src/runtime/proc.go:1153 fp=0x7f034b7fdea8 sp=0x7f034b7fdea0 pc=0x4339c0

goroutine 181 [garbage collection]:
runtime.systemstack_switch()
    /workdir/go/src/runtime/asm_amd64.s:311 fp=0xc000342d60 sp=0xc000342d58 pc=0x45f2c0
runtime.gcMarkTermination(0x3fe295012d50d6c8)
    /workdir/go/src/runtime/mgc.go:1500 +0x178 fp=0xc000342f20 sp=0xc000342d60 pc=0x41d218
runtime.gcMarkDone()
    /workdir/go/src/runtime/mgc.go:1475 +0x168 fp=0xc000342f60 sp=0xc000342f20 pc=0x41cff8
runtime.gcBgMarkWorker(0xc00001e000)
    /workdir/go/src/runtime/mgc.go:1858 +0x294 fp=0xc000342fd8 sp=0xc000342f60 pc=0x41df14
runtime.goexit()
    /workdir/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000342fe0 sp=0xc000342fd8 pc=0x461391
created by runtime.gcBgMarkStartWorkers
    /workdir/go/src/runtime/mgc.go:1679 +0x77

2018-11-21T16:28:17-2a7f904/linux-386-clang:

runtime: P 0 flushedWork false wbuf1.n=1 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

2018-11-21T16:28:17-2a7f904/linux-mipsle:

runtime: P 0 flushedWork false wbuf1.n=2 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

It's really interesting that in three of the four failures, it did not trip the check on adding work to the gcWork. There's a pretty narrow window when we're still running the termination detection algorithm where work can still be added without triggering this check. Perhaps for debugging we should do a second round after setting throwOnGCWork to see if anything was added during this window.

In the one throwOnGCWork failure, it was flushing a write barrier buffer that apparently had 19 (0x13) unmarked pointers in it. Perhaps for debugging we should disable the write barrier buffers, too, when we set throwOnGCWork.

FWIW, I was able to reproduce this once out of 1,334 runs of all.bash on my linux/amd64 box.

##### ../test
# go run run.go -- recover2.go
signal: aborted
runtime: P 5 flushedWork false wbuf1.n=1 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

@aclements how serious is this and how long do you think it will take to fix (worst question to ask I know)? We’re asking because we may delay the beta due to it so getting more insight into the scope of the issue will help us with that decision. Thanks.

Another repro in https://storage.googleapis.com/go-build-log/f64c1069/windows-386-2008_34d5c37b.log, if it helps:

            > go list -f '{{.ImportPath}}: {{.Match}}' all ... example.com/m/... ./... ./xyz...
            [stderr]
            runtime: P 0 flushedWork false wbuf1.n=1 wbuf2.n=0
            fatal error: P has cached GC work at end of mark termination

@aclements how serious is this and how long do you think it will take to fix (worst question to ask I know)?

In terms of seriousness, it can cause crashes, but not corruption. I was only able to reproduce it once out of 1,334 runs of all.bash, though it seems to be somewhat more common on the buildbots.

I still have no idea what's causing this bug, though, so it's hard to say what it will take to fix it. If I'm not able to figure it out, I do have a likely work-around in mind that is easy to implement (run a second round of the completion protocol before going into mark termination).

Today I'm planning to deploy more debugging code based on what (little) I gleaned from the debugging code I added last week.

A few more failures. All still tripping over the later check.

2018-11-29T18:58:21-aab0b70/linux-amd64-ssacheck:

runtime: P 1 flushedWork false wbuf1.n=2 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

2018-11-28T21:49:33-048580d/linux-arm:

runtime: P 0 flushedWork false wbuf1.n=1 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

2018-11-28T16:27:48-71f9f4b/windows-amd64-2008:

runtime: P 2 flushedWork false wbuf1.n=2 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

2018-11-27T15:21:22-cfac65b/linux-amd64-race:

runtime: P 2 flushedWork false wbuf1.n=2 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

2018-11-26T23:37:00-63ac89b/openbsd-amd64-62:

runtime: P 3 flushedWork false wbuf1.n=4 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

2018-11-26T19:24:54-9ab2ffe/nacl-386:

runtime: P 3 flushedWork false wbuf1.n=1 wbuf2.n=0
fatal error: P has cached GC work at end of mark termination

Change https://golang.org/cl/151797 mentions this issue: runtime: check more work flushing races

Got two failures with the new-and-improved debugging. I don't know what's going on yet, but these both happened in the exact same place: the write barrier in fmt.(*fmt).init. There are a few odd things going on here. Both the slot and the pointer being written point into an object that was just retrieved from a sync.Pool. Conveniently, the slot and the pointer are both to the same object, so we know that's the unmarked object. sync.Pool interacts with the GC in weird ways (which may violate the hybrid barrier assumptions?) and uses various atomic pointer writes, which also interact with the write barrier in slightly atypical ways.

2018-11-29T22:08:05-438b954/linux-amd64:

--- FAIL: TestGoTestRaceFailures (1.20s)
...
    go_test.go:3766: running testgo [test -race testrace -run XXX -bench .]
    go_test.go:3766: standard error:
    go_test.go:3766: fatal error: throwOnGCWork

        runtime stack:
        runtime.throw(0x9c785d, 0xd)
            /workdir/go/src/runtime/panic.go:608 +0x72
        runtime.(*gcWork).checkPut(0xc00002d770)
            /workdir/go/src/runtime/mgcwork.go:123 +0x5b
        runtime.(*gcWork).putBatch(0xc00002d770, 0xc00002d7a8, 0x1, 0x200)
            /workdir/go/src/runtime/mgcwork.go:191 +0x3d
        runtime.wbBufFlush1(0xc00002c500)
            /workdir/go/src/runtime/mwbbuf.go:277 +0x1ba
        runtime.wbBufFlush.func1()
            /workdir/go/src/runtime/mwbbuf.go:200 +0x3a
        runtime.systemstack(0x0)
            /workdir/go/src/runtime/asm_amd64.s:351 +0x66
        runtime.mstart()
            /workdir/go/src/runtime/proc.go:1153

        goroutine 8 [running]:
        runtime.systemstack_switch()
            /workdir/go/src/runtime/asm_amd64.s:311 fp=0xc0001e35c0 sp=0xc0001e35b8 pc=0x456850
        runtime.wbBufFlush(0xc0000841c0, 0xc000084180)
            /workdir/go/src/runtime/mwbbuf.go:199 +0x47 fp=0xc0001e35e0 sp=0xc0001e35c0 pc=0x429397
        runtime.gcWriteBarrier(0xdf6550, 0x9af080, 0xc000084180, 0xc0001e36d8, 0x4c9a76, 0xc000084180, 0xc0001c9b00, 0xc0001e36d8, 0x40b012, 0xc000298a60, ...)
            /workdir/go/src/runtime/asm_amd64.s:1424 +0xb4 fp=0xc0001e3668 sp=0xc0001e35e0 pc=0x4589e4
        fmt.(*fmt).init(...)
            /workdir/go/src/fmt/format.go:59
        fmt.newPrinter(0xc000084180)
            /workdir/go/src/fmt/print.go:137 +0x85 fp=0xc0001e3690 sp=0xc0001e3668 pc=0x4c9545
...

2018-11-30T15:17:34-42e8b9c/linux-s390x-ibm:

fatal error: throwOnGCWork

runtime stack:
runtime.throw(0x5f48b6, 0xd)
    /data/golang/workdir/go/src/runtime/panic.go:608 +0x72
runtime.(*gcWork).checkPut(0xc000038170)
    /data/golang/workdir/go/src/runtime/mgcwork.go:123 +0x72
runtime.(*gcWork).putBatch(0xc000038170, 0xc0000381a8, 0x2, 0x200)
    /data/golang/workdir/go/src/runtime/mgcwork.go:191 +0x38
runtime.wbBufFlush1(0xc000036f00)
    /data/golang/workdir/go/src/runtime/mwbbuf.go:277 +0xf0
runtime.wbBufFlush.func1()
    /data/golang/workdir/go/src/runtime/mwbbuf.go:200 +0x3c
runtime.systemstack(0x0)
    /data/golang/workdir/go/src/runtime/asm_s390x.s:293 +0x9c

goroutine 9 [running]:
runtime.systemstack_switch()
    /data/golang/workdir/go/src/runtime/asm_s390x.s:246 +0x10 fp=0xc000337110 sp=0xc000337108 pc=0x719b0
runtime.wbBufFlush(0xc000344040, 0xc000344000)
    /data/golang/workdir/go/src/runtime/mwbbuf.go:199 +0x60 fp=0xc000337128 sp=0xc000337110 pc=0x3f350
runtime.gcWriteBarrier(0xa2ffd0, 0x5db900, 0xc000344000, 0x435452, 0x30, 0x2e, 0x0, 0x0, 0x4353fe, 0x0, ...)
    /data/golang/workdir/go/src/runtime/asm_s390x.s:860 +0x88 fp=0xc000337198 sp=0xc000337128 pc=0x73d88
fmt.(*fmt).init(...)
    /data/golang/workdir/go/src/fmt/format.go:59
fmt.newPrinter(0x30)
    /data/golang/workdir/go/src/fmt/print.go:137 +0x8c fp=0xc0003371b8 sp=0xc000337198 pc=0xecd5c
...

Another on linux-amd64-race from a trybot:

https://storage.googleapis.com/go-build-log/a01231a0/linux-amd64-race_4cd43732.log

ok      math/big    6.924s
ok      math/bits   1.048s
ok      math/cmplx  1.047s
ok      math/rand   5.716s
ok      mime    1.092s

##### 
fatal error: throwOnGCWork

runtime stack:
runtime.throw(0x9cd25d, 0xd)
    /workdir/go/src/runtime/panic.go:608 +0x72
runtime.(*gcWork).checkPut(0xc000034170)
    /workdir/go/src/runtime/mgcwork.go:123 +0x5b
runtime.(*gcWork).putBatch(0xc000034170, 0xc0000341a8, 0x1, 0x200)
    /workdir/go/src/runtime/mgcwork.go:191 +0x3d
runtime.wbBufFlush1(0xc000032f00)
    /workdir/go/src/runtime/mwbbuf.go:277 +0x1ba
runtime.wbBufFlush.func1()
    /workdir/go/src/runtime/mwbbuf.go:200 +0x3a
runtime.systemstack(0x456729)
    /workdir/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
    /workdir/go/src/runtime/proc.go:1153

goroutine 19 [running]:
runtime.systemstack_switch()
    /workdir/go/src/runtime/asm_amd64.s:311 fp=0xc000457280 sp=0xc000457278 pc=0x456820
runtime.wbBufFlush(0xc000514018, 0x0)
    /workdir/go/src/runtime/mwbbuf.go:199 +0x47 fp=0xc0004572a0 sp=0xc000457280 pc=0x429367
runtime.gcWriteBarrier(0xc000457370, 0xc000514030, 0x18, 0xc00038b680, 0xc0003e14a1, 0xc000514000, 0xc00044bda0, 0xc000457400, 0x7ff11b, 0xc00010e1a0, ...)
    /workdir/go/src/runtime/asm_amd64.s:1424 +0xb4 fp=0xc000457328 sp=0xc0004572a0 pc=0x4589b4
runtime.deferreturn(0xc00010e1a0)
    /workdir/go/src/runtime/panic.go:362 +0xb1 fp=0xc000457370 sp=0xc000457328 pc=0x42c551
cmd/internal/buildid.ReadFile(0xc0000278f0, 0x27, 0xc00044bda1, 0x29, 0x0, 0x0)
    /workdir/go/src/cmd/internal/buildid/buildid.go:101 +0x76b fp=0xc000457410 sp=0xc000457370 pc=0x7ff11b
cmd/go/internal/work.(*Builder).useCache(0xc000357f40, 0xc000360a00, 0xc00024f680, 0xd15a37b3a41ae4cd, 0xfc4a722dfa0d6a44, 0xe9b2998ab6f18a35, 0x114487c878347ff4, 0xc0000278f0, 0x27, 0x2c656e6f64203a01)
    /workdir/go/src/cmd/go/internal/work/buildid.go:440 +0xdf8 fp=0xc000457630 sp=0xc000457410 pc=0x80d868
cmd/go/internal/work.(*Builder).build(0xc000357f40, 0xc000360a00, 0x0, 0x0)
    /workdir/go/src/cmd/go/internal/work/exec.go:373 +0x55ef fp=0xc000457ea0 sp=0xc000457630 pc=0x81704f
cmd/go/internal/work.(*Builder).Do.func1(0xc000360a00)
    /workdir/go/src/cmd/go/internal/work/exec.go:107 +0x6c fp=0xc000457f28 sp=0xc000457ea0 pc=0x844c7c
cmd/go/internal/work.(*Builder).Do.func2(0xc000524404, 0xc000357f40, 0xc0002dd4e0)
    /workdir/go/src/cmd/go/internal/work/exec.go:165 +0x87 fp=0xc000457fc8 sp=0xc000457f28 pc=0x844f47
runtime.goexit()
    /workdir/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000457fd0 sp=0xc000457fc8 pc=0x4588f1
created by cmd/go/internal/work.(*Builder).Do
    /workdir/go/src/cmd/go/internal/work/exec.go:152 +0x39b
....

Change https://golang.org/cl/152957 mentions this issue: runtime: print pointers being put in checkPut

Analysis of some recent failures.

2018-12-05T20:59:58-cd47e89/solaris-amd64-smartosbuildlet

  • sync.Pool internals. fmt.(*pp).free(0xc00028a000) -> sync.(*Pool).Put. Assigning l.private = x, where l.private was nil, &l.private is 0xc0001eb788, and x is 0xc00028a000. It's marking 2 pointers, which is weird: given that l.private is nil, at most it should be marking 0xc00028a000. What's the other pointer?

2018-12-05T02:56:47-9be01c2/freebsd-386-10_3

  • defer internals. freedefer writing d.link = nil. &d.link is 0x3997c8d8. This is marking 2 pointers, which is weird: we know the new value is nil, so it should at most be marking the old value. One of the marks is presumably a _defer object.

2018-12-04T23:01:00-be09bdf/solaris-amd64-oraclerel

  • This one happened when gcMarkTermination was flushing write barrier buffers. I don't know how we could possibly not have caught this earlier, given all of the debugging code that's in there now.

2018-12-04T00:19:55-c2412a7/linux-amd64-stretch

  • Another object from a sync.Pool.

2018-12-03T20:09:50-f98081e/linux-amd64-noopt

  • defer internals. Writing _defer.fn. &_defer.fn is 0xc00025d5d8 and the new value is a static pointer (funcval 0xbf6118, seen in both deferproc and wbBufFlush). The old value must have been nil because the defer was either just allocated or came from the pool, which contains only cleared _defer objects. Nevertheless, this led to flushing one pointer (seen in the third argument to putBatch).

2018-12-03T19:37:57-cef41e0/solaris-amd64-smartosbuildlet

  • defer internals. Writing _defer.fn = nil. &_defer.fn is 0xc000602298. This leads to one pointer being flushed, which must be the old funcval, but that's a static value because the closure was lifted.

2018-12-03T16:48:21-4a1a783/darwin-amd64-10_11

  • defer internals. Writing gp._defer = d. In this case, the object being marked must be a _defer.

2018-12-03T16:48:21-4a1a783/darwin-amd64-race

  • Another object from a sync.Pool.

2018-12-02T22:46:27-b3294d9/darwin-amd64-race

  • Writing a field in a freshly-allocated object. This is odd because it's writing a nil, but the old value must also be nil, yet it's marking 1 pointer.

2018-12-02T22:46:27-b3294d9/linux-386

  • defer internals. Writing _defer.link = nil, so one of the marks is a _defer object.

The top one here may be the most informative failure.

https://storage.googleapis.com/go-build-log/be3b820c/linux-amd64_6758157c.log

  • Assigning gp._defer = d in newdefer. Slot is 0xc0000cb0a8, new value is 0xc00023d700. Marking 0xc00023d700 (the new value). The new defer is siz 8, and links to 0xc000369040, which must have been the old value of gp._defer. There are two possibilities: 1. d was just allocated, in which case it should have been marked and since the allocation happened on the system stack there were no safe-points between the allocation and the assignment; 2. d came from a deferpool, in which case it should have been marked by the write barrier for pp.deferpool[sc][n-1] = nil.

https://storage.googleapis.com/go-build-log/be3b820c/linux-386_3eeb1417.log

  • This one is really weird. It's gp.m = _g_.m in execute, but the object being marked isn't an m. The slot is 0x8cbdc18, which is g.m, and the new value is 0x88f64e0. It's marking 0x8c56000, which is at most a 64-byte object.

https://storage.googleapis.com/go-build-log/fc658a42/linux-amd64_957a1707.log

  • Appending d to pp.deferpool[sc] in freedefer (must be the write of d and not the grow). Slot is 0xc000030be8, new value is 0xc000346000 (which is d, the _defer being freed). Marking 0xc000346000, which is the _defer being freed. The _defer is zeroed, plus room for two words of arguments. In this case, the defer is fd.readUnlock(), so it has one actual word of arguments. (*FD).Read pushed the defer and is returning after a Read syscall.

2018-12-06T22:06:55-4a801cd/darwin-arm-mg912baios

  • Assigning p.arg = nil, where p came from a sync.Pool. Slot is 0xc0002b32e0 (probably the second word of p.arg), new value is 0. Object being marked is 0xc0000b63c0 (zzzzzzPP). Since p.arg is an interface{}, it's hard to say what this object actually is.

2018-12-06T21:10:35-ec0077c/freebsd-386-10_3

  • Assigning pp.deferpool[sc][n-1] = nil in newdefer, which is a *_defer write. Slot is 0x39823988, new value is 0. The defer pool is for 4-byte argument defers. Object being marked is 0x39b44140 and is consistent with a zeroed _defer with 4 bytes of arguments (not zeroed).

Change https://golang.org/cl/154537 mentions this issue: runtime: poison the write barrier buffer during flushing

Change https://golang.org/cl/154598 mentions this issue: runtime: make traceback indicate whether _defer was just allocated

Change https://golang.org/cl/154597 mentions this issue: runtime: record extra information in throwOnGCWork crashes

Change https://golang.org/cl/154599 mentions this issue: runtime: capture pause stack for late gcWork put debugging

Interesting failure that happened to happen while I was adding more debug info:

https://storage.googleapis.com/go-build-log/5f05166b/linux-amd64_f7531d92.log

  • wbBufFlush shows the slot is 0xc0002ae078 and the new value is 0xc0004ee5a0. Shortly thereafter, wbBufFlush1Debug shows the old value is 0x0, but the first two pointers in the buffer are 0xc000524380, 0x0. These should be the only two pointers (I added more debugging to check that after), but that new value disagrees with what we saw in wbBufFlush. Finally, we mark 0xc0000f0000, which is yet another completely different pointer value. We would expect to see a object-rounded value when marking, but the object at this address is only 64 bytes, so we're much too far from the other pointers.

Change https://golang.org/cl/154638 mentions this issue: runtime: flush on every write barrier while debugging

The failures prior to commit 2d00007bdb996dc859ffb16b601e0f51bb72662c were probably misleading because we weren't successfully flushing the write barrier buffer on every write barrier. Hence, it could have been any of up to 512 pointers that triggered marking, making the stack trace and any deduction about the "old" slot value irrelevant. With 2d00007bdb996dc859ffb16b601e0f51bb72662c, hopefully we'll get some more useful failures.

@aclements since you introduced the change to help debug this bug, I'm getting a lot of timeouts on aix/ppc64. I have also found that some other builders are failing because of timeouts.

https://build.golang.org/log/b6fe792add74e2aadfba1023ba36788a26a66976
https://build.golang.org/log/4ed9817cb0d1afd91559e0ddc63143c327549907
https://build.golang.org/log/b9b5e202990f5c655147798bcd9024bc75760276
...
(You can find many others)

On aix/ppc64, these timeouts occur because of the infinite loop in your new runtime.(*gcWork).checkPut. I'm aware that there are some problems with P objects on aix and that forEachP isn't done correctly (cf. my comments on #29065), but it might be a more general issue.

Anyway, I don't know if it's related to this issue, but I can try to provide some traces if you think they would be useful.

Looking at the dashboard overview, it's not obvious to me that things are any less stable after commit 2d00007bdb996dc859ffb16b601e0f51bb72662c, and the three failures you pointed to aren't obviously related to my change. On the dashboard, the aix/ppc64 builder timed out for a whole batch of commits that included mine but started before it, and it looks like it's healthy again. Maybe the builder was just overloaded?

New failures with reliable write barrier tracebacks:

Unmarked _defer objects

2018-12-20T05:42:07-7e1ec1e/darwin-amd64-10_12

  • d.link = gp._defer in newdefer, where d came from the deferpool. Slot (&d.link) is 0xc000288348, old value is nil, new value (gp._defer) is 0xc00028b440. Marking new value.

2018-12-20T05:42:07-7e1ec1e/windows-386-2008

  • gp._defer = d.link in deferreturn. Slot (&gp._defer) is 0x12c255f4, old value (gp._defer) is 0x130620c0, new value (d.link) is 0x12c923e0. Marking new value.

2018-12-19T19:47:56-d0f8a75/windows-arm

  • gp._defer = d.link in deferreturn. Slot (&gp._defer) is 0x11c235f4, old value (gp._defer) is 0x11eb20c0, new value (d.link) is 0x11c40180. Marking old value.

2018-12-19T17:38:19-e2897e4/android-arm64-wiko-fever

  • gp._defer = d.link in deferreturn. Slot (&gp._defer) is 0xc0000013a8, old value (gp._defer) is 0xc0000e2500, new value (d.link) is 0xc0000e3f40. Marking old value.

2018-12-19T17:38:19-e2897e4/openbsd-386-64

  • gp._defer = d.link in deferreturn. Slot (&gp._defer) is 0x734adb34, old value (gp._defer) is 0x738be090, new value (d.link) is 0x73715590. Marking new value.

2018-12-19T15:47:36-d9e2ba4/linux-386

  • gp._defer = d.link in deferreturn. Slot (&gp._defer) is 0xa912c54, old value (gp._defer) is 0xa8ccd40, new value (d.link) is 0xaa11a20. Marking old value.

2018-12-18T22:40:56-9ded8b0/linux-386

  • d.link = gp._defer in newdefer, where d came from the deferpool. Slot (&d.link) is 0xab7c018, old value is nil, new value (gp._defer) is 0xa410160. Marking new value.

2018-12-18T21:27:51-9d0318f/darwin-amd64-10_10

  • gp._defer = d.link in deferreturn. Slot (&gp._defer) is 0xc0004790a8, old value (gp._defer) is 0xc0001162c0, new value (d.link) is 0xc0001f2ff0. Marking new value.

2018-12-18T21:27:51-9d0318f/freebsd-386-11_1

  • gp._defer = d.link in deferreturn. Marking new value.

2018-12-18T20:57:18-d689946/darwin-amd64-10_12

  • d.link = gp._defer in newdefer, where d came from the deferpool. Marking new value.

2018-12-18T20:55:57-77caea5/plan9-arm

  • gp._defer = d.link in deferreturn. Traceback corrupted.

2018-12-18T20:50:43-d063b7c/freebsd-386-11_1

  • gp._defer = d.link in deferreturn. Marking new value.

Unmarked sudog objects

2018-12-19T15:47:36-d9e2ba4/misc-vet-vetall

  • root.treap = nil in (*semaRoot).dequeue. Slot (&root.treap) is 0xe2de08, old value (root.treap) is 0xc000341bc0, new value is nil. Marking old value. semaRoot.treap is a *sudog, which came from acquireSudog, which, like newdefer, tries to return a sudog from a per-P cache (refilling from a global cache) or heap-allocates a new one.

2018-12-18T20:55:57-77caea5/linux-amd64

  • sgp.prev = x in (*waitq).enqueue, where x is the sudog in q.last. Old value is nil. Marking new value.

It seems like we may be failing to scan a P somehow. The _defers and sudogs are the only heap objects that could be reachable solely from a P. It certainly seems like we must be marking the Ps, though, since so much would break if we incorrectly recycled a P.

Some more observations:

  • The deferpool and sudogcache are actually self-pointers into the p struct.

  • I checked the type bitmap and the bitmap immediately after allocation of a p and they're correct.

  • Ps are only reachable from allp and the stack of gcBgMarkWorker.

  • If you inhibit the garbage collector from scanning allp, all.bash gets surprisingly far. This may be because the Ps are usually (though not always) reachable from the stack of the gcBgMarkWorkers. Running in GODEBUG=gcstoptheworld=1 makes it more likely to fail if you do this. However, the failures are far more wide-ranging than the failures we're seeing in this issue, suggesting that we are marking the p objects.

  • If you inhibit the garbage collector from scanning p objects (but allow them to be marked), it usually fails with "traceback has leftover defers".

More detailed analysis of some failure modes:

  • The gp._defer = d.link failures in deferreturn where the new value is already marked but the old value is unmarked are particularly interesting. This means that the head of gp's defer list is unmarked, but the second entry in the list is marked. If the head was allocated before the GC cycle, then it should have been marked by heap scanning (and if it hadn't been, the second defer certainly wouldn't be marked either). If the head was allocated (or pulled from the pool) after the GC cycle started, then it should have been marked by the write barriers in newdefer or by allocate-black.

  • In 2018-12-19T15:47:36-d9e2ba4/misc-vet-vetall, where root.treap = nil fails, the slot is a global. Most likely (unless there was lock barging) the last write to root.treap was from the end of (*semaRoot).queue doing *pt = s on goroutine 10. If that write happened before the GC cycle, global root scanning should have marked 0xc000341bc0. If that write happened after the GC cycle started, then the write barrier in queue should have marked it.

If you need a repro case: applying the patch over in https://github.com/golang/go/issues/24993#issuecomment-449495346, against 90dca98d33, and then running that build of go against https://github.com/mark-rushakoff/go-issue-24993 seems to semi-reliably hit the panic. Here's one such stack trace:

runtime: late gcWork put
ptrs=0xc0002c0050 s.base()=0xc0002c0000 s.limit=0xc0002c1fe0 s.spanclass=12 s.elemsize=80 s.state=mSpanInUse
 *(ptrs+0) = 0x18
 *(ptrs+8) = 0xc000065aa8
 *(ptrs+16) = 0x108cf3c
 *(ptrs+24) = 0x0
 *(ptrs+32) = 0x0
 *(ptrs+40) = 0xc000396230
 *(ptrs+48) = 0xc0001782a0
 *(ptrs+56) = 0xc00007a240
 *(ptrs+64) = 0xc0000b3b00
 *(ptrs+72) = 0x0
runtime: paused at
runtime.runSafePointFn(...)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/proc.go:1437 +0x5d
runtime.schedule(...)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/proc.go:2493 +0x384
runtime.exitsyscall0(...)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/proc.go:3129 +0x116
runtime.mcall(...)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/asm_amd64.s:299 +0x5b
fatal error: throwOnGCWork

runtime stack:
runtime.throw(0x178d87d, 0xd)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/panic.go:617 +0x72
runtime.(*gcWork).checkPut(0xc000047c70, 0x0, 0xc000047d28, 0x1, 0x200)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/mgcwork.go:152 +0x227
runtime.(*gcWork).putBatch(0xc000047c70, 0xc000047d28, 0x1, 0x200)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/mgcwork.go:220 +0x5f
runtime.wbBufFlush1(0xc000046a00)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/mwbbuf.go:308 +0x1be
runtime.wbBufFlush1Debug(0xc0002c0050, 0xc000396230, 0xc0002c0050, 0xc000047d28, 0xc000047d38)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/mwbbuf.go:230 +0x3a
runtime.wbBufFlush.func1()
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/mwbbuf.go:216 +0x7a
runtime.systemstack(0x1f00000)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/proc.go:1153

goroutine 3899 [running]:
runtime.systemstack_switch()
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/asm_amd64.s:311 fp=0xc0000659a8 sp=0xc0000659a0 pc=0x1056e70
runtime.wbBufFlush(0xc0000b3b28, 0xc000396230)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/mwbbuf.go:206 +0x66 fp=0xc0000659d8 sp=0xc0000659a8 pc=0x1029ce6
runtime.gcWriteBarrier(0xc000065aa8, 0xc0002c0080, 0x18, 0xc0000b3b00, 0x17ac690, 0xc0002c0050, 0xc000065ab0, 0xc000065ad8, 0x108cf8a, 0xc0001782a0, ...)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/asm_amd64.s:1424 +0xb4 fp=0xc000065a60 sp=0xc0000659d8 pc=0x1058fe4
runtime.deferreturn(0xc0001782a0)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/panic.go:372 +0xa4 fp=0xc000065aa8 sp=0xc000065a60 pc=0x102ced4
internal/poll.(*FD).Seek(0xc0001782a0, 0x3f, 0x0, 0x3f, 0x0, 0x0)
    /Users/mr/gotip/src/github.com/golang/go/src/internal/poll/fd_unix.go:404 +0xca fp=0xc000065ae8 sp=0xc000065aa8 pc=0x108cf8a
os.(*File).seek(...)
    /Users/mr/gotip/src/github.com/golang/go/src/os/file_unix.go:281
os.(*File).Seek(0xc0000100d0, 0x3f, 0x0, 0xc0000b3b00, 0x0, 0x6)
    /Users/mr/gotip/src/github.com/golang/go/src/os/file.go:196 +0x5e fp=0xc000065b48 sp=0xc000065ae8 pc=0x1090bee
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Tombstoner).readTombstoneV4(0xc0001e0be0, 0xc0000100d0, 0xc000065dc0, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/tombstone.go:597 +0x74 fp=0xc000065c98 sp=0xc000065b48 pc=0x160e074
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Tombstoner).Walk(0xc0001e0be0, 0xc000065dc0, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/tombstone.go:254 +0x2c0 fp=0xc000065d00 sp=0xc000065c98 pc=0x160bdf0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*TSMReader).applyTombstones(0xc0000e6d90, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/reader.go:262 +0x124 fp=0xc000065df8 sp=0xc000065d00 pc=0x15fee74
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*batchDelete).Commit(0xc0000b60b0, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/reader.go:622 +0xa7 fp=0xc000065e20 sp=0xc000065df8 pc=0x1600e27
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).deleteSeriesRange.func1(0x18743e0, 0xc0000e6d90, 0xc0000e6d90, 0x17accc8)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:1537 +0x5ba fp=0xc000065f70 sp=0xc000065e20 pc=0x1620a7a
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6d90)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:411 +0xc2 fp=0xc000065fb8 sp=0xc000065f70 pc=0x1623192
runtime.goexit()
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000065fc0 sp=0xc000065fb8 pc=0x1058f21
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 1 [chan receive]:
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply(0xc00017ad00, 0xc0001685a0, 0x2701, 0xc0003320e0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:417 +0x152
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).deleteSeriesRange(0xc00017c240, 0xc00068a000, 0x1, 0x2710, 0x8000000000000000, 0x7fffffffffffffff, 0xc000168570, 0x1)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:1501 +0x156
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteSeriesRangeWithPredicate(0xc00017c240, 0x185c040, 0xc0003320a0, 0xc0000ed810, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:1470 +0x7ae
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteSeriesRange(0xc00017c240, 0x185c040, 0xc0003320a0, 0x8000000000000000, 0x7fffffffffffffff, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:1362 +0x7e
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteMeasurement(0xc00017c240, 0xc000166010, 0x3, 0x3, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:1770 +0x226
main.main()
    /Users/mr/gomod/go-issue-24993/main.go:58 +0x858

goroutine 34 [select]:
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactCache(0xc00017c240)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:1920 +0x11c
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).enableSnapshotCompactions.func1(0xc0000c0c00, 0xc00017c240)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:499 +0x51
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).enableSnapshotCompactions
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:499 +0x128

goroutine 3903 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e22a0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3929 [syscall]:
syscall.syscall(0x1076310, 0x3f, 0x33, 0x0, 0x0, 0x0, 0x0)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/sys_darwin.go:63 +0x2e
syscall.fcntl(0x3f, 0x33, 0x0, 0x8, 0xc000069d60, 0x113ae9b)
    /Users/mr/gotip/src/github.com/golang/go/src/syscall/zsyscall_darwin_amd64.go:318 +0x58
internal/poll.(*FD).Fsync(0xc0002d69c0, 0x0, 0x0)
    /Users/mr/gotip/src/github.com/golang/go/src/internal/poll/fd_fsync_darwin.go:21 +0x97
os.(*File).Sync(0xc0000ba098, 0x0, 0x0)
    /Users/mr/gotip/src/github.com/golang/go/src/os/file_posix.go:133 +0x4a
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Tombstoner).commit(0xc0001e1ea0, 0x17acfe8, 0xc0001e1ea0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/tombstone.go:379 +0x88
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Tombstoner).Flush(0xc0001e1ea0, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/tombstone.go:156 +0x6d
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*batchDelete).Commit(0xc0000ba078, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/reader.go:618 +0x6d
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).deleteSeriesRange.func1(0x18743e0, 0xc0000e7730, 0xc0000e7730, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:1537 +0x5ba
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7730)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:411 +0xc2
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3921 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e73b0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3759 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6540)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3895 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6bd0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3923 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7490)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3927 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7650)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3893 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6af0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3907 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e2460)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3761 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6620)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3911 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e2620)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3912 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6f50)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3901 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e20e0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3758 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e64d0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3906 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e23f0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3917 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e71f0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3760 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e65b0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3755 [runnable]:
syscall.syscall(0x1076420, 0x3c, 0xc000718000, 0x10000, 0x0, 0x0, 0x0)
    /Users/mr/gotip/src/github.com/golang/go/src/runtime/sys_darwin.go:63 +0x2e
syscall.read(0x3c, 0xc000718000, 0x10000, 0x10000, 0x0, 0x0, 0x0)
    /Users/mr/gotip/src/github.com/golang/go/src/syscall/zsyscall_darwin_amd64.go:1169 +0x66
syscall.Read(...)
    /Users/mr/gotip/src/github.com/golang/go/src/syscall/syscall_unix.go:172
internal/poll.(*FD).Read(0xc0000bd4a0, 0xc000718000, 0x10000, 0x10000, 0x0, 0x0, 0x0)
    /Users/mr/gotip/src/github.com/golang/go/src/internal/poll/fd_unix.go:165 +0x12c
os.(*File).read(...)
    /Users/mr/gotip/src/github.com/golang/go/src/os/file_unix.go:246
os.(*File).Read(0xc0000b60b8, 0xc000718000, 0x10000, 0x10000, 0x0, 0x1853720, 0x100a6db)
    /Users/mr/gotip/src/github.com/golang/go/src/os/file.go:108 +0x6e
bufio.(*Reader).Read(0xc0000b95c0, 0xc0001d6080, 0xa, 0x200, 0xc000364660, 0x100b7d5, 0x16da560)
    /Users/mr/gotip/src/github.com/golang/go/src/bufio/bufio.go:220 +0x22c
io.ReadAtLeast(0x1853480, 0xc0000b95c0, 0xc0001d6080, 0xa, 0x200, 0xa, 0xc0001d6060, 0xc0000b95c0, 0x379)
    /Users/mr/gotip/src/github.com/golang/go/src/io/io.go:310 +0x88
io.ReadFull(...)
    /Users/mr/gotip/src/github.com/golang/go/src/io/io.go:329
compress/gzip.(*Reader).readHeader(0xc0001d6000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /Users/mr/gotip/src/github.com/golang/go/src/compress/gzip/gunzip.go:175 +0xd6
compress/gzip.(*Reader).Reset(0xc0001d6000, 0x1853480, 0xc0000b95c0, 0x58, 0x102cc01)
    /Users/mr/gotip/src/github.com/golang/go/src/compress/gzip/gunzip.go:113 +0x124
compress/gzip.NewReader(0x1853480, 0xc0000b95c0, 0xc000364c30, 0xc000718000, 0x0)
    /Users/mr/gotip/src/github.com/golang/go/src/compress/gzip/gunzip.go:94 +0x58
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Tombstoner).readTombstoneV4(0xc0001e01e0, 0xc0000b60b8, 0xc000364dc0, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/tombstone.go:611 +0x1c7
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Tombstoner).Walk(0xc0001e01e0, 0xc000364dc0, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/tombstone.go:254 +0x2c0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*TSMReader).applyTombstones(0xc0000e60e0, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/reader.go:262 +0x124
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*batchDelete).Commit(0xc0000b6098, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/reader.go:622 +0xa7
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).deleteSeriesRange.func1(0x18743e0, 0xc0000e60e0, 0xc0000e60e0, 0x17accc8)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:1537 +0x5ba
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e60e0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:411 +0xc2
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3892 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6a10)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3925 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7570)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3924 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7500)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3900 [runnable]:
compress/gzip.NewReader(0x1853480, 0xc0000bd5c0, 0xc00035fc30, 0xc000728000, 0x0)
    /Users/mr/gotip/src/github.com/golang/go/src/compress/gzip/gunzip.go:92 +0xa5
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Tombstoner).readTombstoneV4(0xc0001e0c80, 0xc0000b60c0, 0xc00035fdc0, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/tombstone.go:611 +0x1c7
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Tombstoner).Walk(0xc0001e0c80, 0xc00035fdc0, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/tombstone.go:254 +0x2c0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*TSMReader).applyTombstones(0xc0000e6e00, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/reader.go:262 +0x124
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*batchDelete).Commit(0xc0000100c8, 0x0, 0x0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/reader.go:622 +0xa7
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).deleteSeriesRange.func1(0x18743e0, 0xc0000e6e00, 0xc0000e6e00, 0x17accc8)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/engine.go:1537 +0x5ba
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6e00)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:411 +0xc2
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3920 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7340)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3922 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7420)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3914 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e70a0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3891 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6770)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3909 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e2540)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3918 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7260)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3926 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e75e0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3919 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e72d0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3928 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e76c0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3908 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e24d0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3894 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6b60)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3905 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e2380)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3916 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7180)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3757 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e63f0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3902 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e2230)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3904 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e2310)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3915 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e7110)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3910 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0001e25b0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113

goroutine 3913 [chan send]:
github.com/influxdata/influxdb/pkg/limiter.Fixed.Take(...)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/pkg/limiter/fixed.go:40
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply.func1(0xc000174120, 0xc0002d68a0, 0xc0001685a0, 0x18743e0, 0xc0000e6fc0)
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:406 +0x39
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Apply
    /Users/mr/go/pkg/mod/github.com/influxdata/[email protected]/tsdb/engine/tsm1/file_store.go:405 +0x113
exit status 2

The first time I hit the error, I ran the reproducer in a loop 25-30 times. Then I left it to run again, and it succeeded 1252 times (running for about an hour on my machine) before it triggered the panic again. I'm happy to keep running it on my machine if further stack traces are helpful; let me know if you want me to apply any GODEBUG settings or anything.

Scratch that. After c5414457c62fc11f299946a46f6c868c4f0bf2ab, which is a slightly different fix from the one in the comment I linked previously, I've run the repro script in a loop for a couple hours and haven't been able to trigger the panic. I'm intending to leave it running overnight just in case, but at this point I am doubting that it will trigger the panic again.


Update: it did trigger a few times overnight. The laptop got shut without my knowledge, so I don't know how long it was actually executing, but here are another 5 stack traces.

1545459522-5112.txt
1545459016-14874.txt
1545454376-18389.txt
1545453172-14941.txt
1545451721-3021.txt

Change https://golang.org/cl/156017 mentions this issue: runtime: don't spin in checkPut if non-preemptible

@aclements We're assuming we should wait on this for RC1, let us know if you feel otherwise.

CL 156017 didn't prevent the late gcWork put error (but I think that was expected).

Using go version devel +64096dbb69 Wed Jan 2 21:21:53 2019 +0000 darwin/amd64, and repeatedly running `go test -race -run=Task github.com/influxdata/platform/http` from https://github.com/influxdata/platform/commit/278f67925cc197f57a32244546fc660df2712e6e (because that was a case where I happened to see a late gcWork put in the past), here are another 20+ stack traces from running that in a loop for about 16 hours.

1546522489-22255.txt
1546521051-27381.txt
1546515641-1038.txt
1546515404-17480.txt
1546510336-4510.txt
1546506868-9166.txt
1546506362-31361.txt
1546504512-19560.txt
1546502232-32045.txt
1546501261-30542.txt
1546499267-31881.txt
1546495909-24024.txt
1546493582-31537.txt
1546493514-32457.txt
1546491250-1139.txt
1546491175-20329.txt
1546490394-25780.txt
1546490041-8027.txt
1546488386-29902.txt
1546484304-6055.txt
1546483677-8850.txt
1546481108-15098.txt
1546478442-22548.txt
1546471442-16184.txt

Change https://golang.org/cl/156140 mentions this issue: runtime: work around "P has cached GC work" failures

@mark-rushakoff, thanks for the pointer to the influx tests. I'll see if I can reproduce it as easily using those.

For reference, I had to install bzr and fetch that package in modules-mode since the HEAD of some of its dependencies is broken:

```
mkdir /tmp/z
cd /tmp/z
echo "module m" > go.mod
go get -d github.com/influxdata/platform/http@278f679
go test -c -race github.com/influxdata/platform/http
```

CL 156140 has been submitted, which works around this issue, so I'm going to remove release-blocker. We still don't understand the root cause, however, so the issue will remain open.

Change https://golang.org/cl/156318 mentions this issue: doc/go1.12: remove known issue note

I think I've figured it out.

I've been running the influx test with a lot of extra runtime debug logging and have been getting some interesting output. Here's an annotated subset of a failure (go-stress-20190104T162318-083562928; I've posted just the last GC cycle since the log is huge).

The unmarked object is 0xc000575800, which is a _defer.

| log | notes |
| -- | -- |
| [1.915567099 P 1] setGCPhase 1 | Start last GC. 0xc000575800 is 2nd in some G's defer list. WB buffer is full size right now, and we only log on flush, so WB logging can be delayed. |
| ... | Lots of stuff on all Ps |
| [1.922270499 P 2] deferreturn 0xc000602740 link 0xc000575800 | Now head of list. gp._defer = d.link should have added to WB buf as new value. |
| [1.922273799 P 2] deferreturn 0xc000575800 link 0x0 | Running defer. gp._defer = d.link should have added to WB buf as old value. |
| [1.922274257 P 2] freedefer 0xc000575800 | Should have added to WB buf as new value when inserting into pool |
| [1.922275588 P 2] newdefer from pool 0xc000575800 | defer gotten from pool. Again should have added to WB buf |
| [1.922276023 P 2] newdefer link defer 0x0 | Now head of list. Should have added to WB buf |
| ... | No logs from P 2. Notably, no WB marks from P 2. |
| [1.922292123 P 1] gcMarkDone flush | Starting mark completion. WB buffering disabled. |
| [1.922293357 P 1] gcMarkDone flushing 1 | |
| [1.922293869 P 3] gcMarkDone flushing 3 | |
| [1.922294506 P 0] gcMarkDone flushing 0 | |
| ... | No logs from P 2, but plenty from other Ps |
| [1.922574537 P 3] deferreturn 0xc000575800 link 0x0 | Running defer. |
| [1.922575525 P 3] wb mark 0xc000575800 | Finally we see it actually getting marked. This mark spins and ultimately fails. |
| ... | A bunch of stuff on P 0 |
| [1.922863641 P 2] gcMarkDone flushing 2 | This flushes P 2's WB buffer, which presumably contains 0xc000575800, but P 3 already marked it above, so it doesn't get marked here, and gcMarkDone considers P 2 to be clean. |
| [1.922902570 P 1] gcMarkDone flush succeeded | But shouldn't have because P 3 has pending work or is dirty. |

The high-level sequence of events is:

  1. Object x is white. Suppose there are three Ps (and thus three write barrier buffers). All Ps are clean and have no pending GC work buffers. Write barrier buffer 0 contains x.
  2. P 1 initiates mark completion. Ps 1 and 2 pass the completion barrier successfully.
  3. P 2 performs a write barrier on object x and greys it. (As a result, P 2 now has pending GC work.)
  4. P 0 enters the completion barrier, which flushes its write barrier buffer. Since x was greyed in step 3, this does not grey x. Hence, P 0 passes the completion barrier successfully.

Now all of the Ps have passed the barrier, but in fact P 2 has pending work. If we enter mark termination at this point, there may even be white objects reachable from x.

I'm not sure why this affects defers so disproportionately. It may simply be that defers depend more heavily on write barrier marking than most objects because of their access patterns.

The workaround I committed a couple days ago will fix this problem. I'm still thinking about how to fix it "properly".

I also need to go back to the proof and figure out where reality diverged from the math. I think the core of the problem is that pointers in the write barrier buffer are only "sort of" grey: they haven't been marked yet, so other Ps can't observe their greyness. As a result, you wind up with a black-to-"sort-of-grey" pointer, and that isn't strong enough for the proof's assumptions.

The proof missed a case. It assumed that new GC work could only be created by consuming GC work from a local work queue. But, in fact, the write barrier allows new work to be created even if the local work queue is empty, as long as some queue is non-empty. In particular, if some queue is non-empty, that means some object is grey. Since that's just in the heap, any P can walk from that object to a white object reachable from it and mark that object using a write barrier, causing work to be added to its local queue (which may have been empty). This implicit communication of work isn't tracked by the "flushed" bit in the algorithm/proof, which is what causes the algorithm to fail.

I'm still thinking about how to fix the algorithm. A stronger algorithm could ensure that all work queues were empty at the beginning of the successful completion round. I think a two-round algorithm could do this. However, there's some cost to this since the GC would be stuck in limbo for the whole second round with no work to do, but unable to enter mark termination.

I think a weaker algorithm is still possible where the write barrier sets a global flag if it creates work during termination detection, and we check that flag after the ragged barrier. I haven't convinced myself that this is right yet.

Since we have a workaround and I'm not going to fix the root cause for 1.12, I'm bumping this to 1.13.

Change https://golang.org/cl/262350 mentions this issue: runtime: remove debugCachedWork
