This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.
Sentry link: https://sentry.io/organizations/cockroach-labs/issues/1891013445/?referrer=webhooks_plugin
Panic message:
*errors.errorString
*safedetails.withSafeDetails: format: "pebble/table: invalid table %s (checksum mismatch at %d/%d)" (1)
reader.go:1466: *withstack.withStack (top exception)
panic.go:679: *withstack.withStack (2)
(check the extra data payloads)
Stacktrace:
/usr/local/go/src/runtime/panic.go#L678-L680 in runtime.gopanic
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/storage/pebble_iterator.go#L449-L451 in pkg/storage.(*pebbleIterator).destroy
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/storage/pebble_iterator.go#L173-L175 in pkg/storage.(*pebbleIterator).Close
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/storage/mvcc.go#L3482-L3484 in pkg/storage.MVCCFindSplitKey
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/kv/kvserver/replica_command.go#L336-L338 in pkg/kv/kvserver.(*Replica).adminSplitWithDescriptor
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/kv/kvserver/split_queue.go#L207-L209 in pkg/kv/kvserver.(*splitQueue).processAttempt
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/kv/kvserver/split_queue.go#L163-L165 in pkg/kv/kvserver.(*splitQueue).process
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/kv/kvserver/queue.go#L957-L959 in pkg/kv/kvserver.(*baseQueue).processReplica.func1
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/util/contextutil/context.go#L134-L136 in pkg/util/contextutil.RunWithTimeout
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/kv/kvserver/queue.go#L916-L918 in pkg/kv/kvserver.(*baseQueue).processReplica
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/kv/kvserver/queue.go#L844-L846 in pkg/kv/kvserver.(*baseQueue).processLoop.func1.2
https://github.com/cockroachdb/cockroach/blob/2c6646c4c39fedb23f724c98a8d83970b35cee37/pkg/util/stop/stopper.go#L346-L348 in pkg/util/stop.(*Stopper).RunAsyncTask.func1
/usr/local/go/src/runtime/asm_amd64.s#L1356-L1358 in runtime.goexit
/usr/local/go/src/runtime/panic.go in runtime.gopanic at line 679
pkg/storage/pebble_iterator.go in pkg/storage.(*pebbleIterator).destroy at line 450
pkg/storage/pebble_iterator.go in pkg/storage.(*pebbleIterator).Close at line 174
pkg/storage/mvcc.go in pkg/storage.MVCCFindSplitKey at line 3483
pkg/kv/kvserver/replica_command.go in pkg/kv/kvserver.(*Replica).adminSplitWithDescriptor at line 337
pkg/kv/kvserver/split_queue.go in pkg/kv/kvserver.(*splitQueue).processAttempt at line 208
pkg/kv/kvserver/split_queue.go in pkg/kv/kvserver.(*splitQueue).process at line 164
pkg/kv/kvserver/queue.go in pkg/kv/kvserver.(*baseQueue).processReplica.func1 at line 958
pkg/util/contextutil/context.go in pkg/util/contextutil.RunWithTimeout at line 135
pkg/kv/kvserver/queue.go in pkg/kv/kvserver.(*baseQueue).processReplica at line 917
pkg/kv/kvserver/queue.go in pkg/kv/kvserver.(*baseQueue).processLoop.func1.2 at line 845
pkg/util/stop/stopper.go in pkg/util/stop.(*Stopper).RunAsyncTask.func1 at line 347
/usr/local/go/src/runtime/asm_amd64.s in runtime.goexit at line 1357
| Tag | Value |
|---|---|
| Cockroach Release | v20.2.0-alpha.3 |
| Cockroach SHA | 2c6646c4c39fedb23f724c98a8d83970b35cee37 |
| Platform | linux amd64 |
| Distribution | CCL |
| Environment | v20.2.0-alpha.3 |
| Command | start-single-node |
| Go Version | |
| # of CPUs ||
| # of Goroutines ||
I believe this one is mine; I was just about to report it :smile:
I have the whole cockroach-data directory backed up, in case you need it.
Seems like a dup of #54197, and we have a user willing to share more details. cc @jbowens
Hey @kocoten1992, thanks for following up. Can you send me the file 021790.sst from the cockroach-data directory? Also, if you still have the log files, it would be helpful to have any of the files prefixed with cockroach-pebble. My email is jackson at cockroachlabs.com.
Also, what was happening on the cluster when it first crashed with the corruption? Was an IMPORT in progress?
Definitely not an IMPORT.
This cluster's workload is:
delete * from table_name where 1=1 which is faster). I can't recall which step caused this (it's all automated).
I've sent you email, thanks!
I got another one today; this time, restarting the cockroachdb systemd service made it go away:
Sep 26 22:57:04 stormtrooper cockroach[643765]: *
Sep 26 22:57:04 stormtrooper cockroach[643765]: * ERROR: [n1,s1,r4/1:/System/tsd{-/cr.nod…}] a panic has occurred!
Sep 26 22:57:04 stormtrooper cockroach[643765]: * pebble/table: invalid table 008243 (checksum mismatch at 2833211/17453)
Sep 26 22:57:04 stormtrooper cockroach[643765]: * (1) attached stack trace
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | runtime.gopanic
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /usr/local/go/src/runtime/panic.go:679
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleIterator).destroy
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_iterator.go:450
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleIterator).Close
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_iterator.go:174
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | [...repeated from below...]
Sep 26 22:57:04 stormtrooper cockroach[643765]: * Wraps: (2) attached stack trace
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble/sstable.(*Reader).readBlock
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/sstable/reader.go:1466
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).loadBlock
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/sstable/reader.go:212
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).skipForward
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/sstable/reader.go:422
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).Next
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/sstable/reader.go:398
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble.(*levelIter).Next
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/level_iter.go:441
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble.(*mergingIter).nextEntry
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/merging_iter.go:495
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble.(*mergingIter).Next
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/merging_iter.go:981
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble.(*Iterator).mergeNext
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/iterator.go:271
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble.(*Iterator).findNextEntry
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/iterator.go:112
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/pebble.(*Iterator).Next
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/iterator.go:447
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleIterator).Next
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_iterator.go:201
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/cockroach/pkg/storage.ComputeStatsGo
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:3544
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sha512
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_consistency.go:628
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func1.1
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:242
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func1
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:248
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | runtime.goexit
Sep 26 22:57:04 stormtrooper cockroach[643765]: * | /usr/local/go/src/runtime/asm_amd64.s:1357
Sep 26 22:57:04 stormtrooper cockroach[643765]: * Wraps: (3) 4 safe details enclosed
Sep 26 22:57:04 stormtrooper cockroach[643765]: * Wraps: (4) pebble/table: invalid table 008243 (checksum mismatch at 2833211/17453)
Sep 26 22:57:04 stormtrooper cockroach[643765]: * Error types: (1) *withstack.withStack (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString
Sep 26 22:57:04 stormtrooper cockroach[643765]: *
Sep 26 22:57:04 stormtrooper cockroach[643765]: *
Sep 26 22:57:04 stormtrooper cockroach[643765]: * ERROR: [n1,s1,r4/1:/System/tsd{-/cr.nod…}] Queued as error 8fe612e3e6b4491ca9ba699e20e2f78a
Sep 26 22:57:04 stormtrooper cockroach[643765]: *
I've backed it up for you.
@jbowens It's worth taking a look at this to see whether it looks like another single bit-flip corruption.
@kocoten1992 do you mind emailing me 008243.sst?
I've sent it
Thanks @kocoten1992! I first checked the checksums themselves, and they were not off by a single bit.
ede5b46d (11101101111001011011010001101101), 4072822f (01000000011100101000001000101111)
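If only the stored checksum had been corrupted by one flipped bit, the two values would be at a Hamming distance of 1. XORing them and counting the set bits is a quick way to check. `hammingDistance` is a hypothetical helper built on the standard `math/bits` package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance counts the bit positions in which a and b differ.
func hammingDistance(a, b uint32) int {
	return bits.OnesCount32(a ^ b)
}

func main() {
	// The two checksums quoted above.
	fmt.Println(hammingDistance(0xede5b46d, 0x4072822f)) // 16
}
```

The two quoted checksums differ in 16 bit positions, which is consistent with their not being off by a single bit.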
I next tried to decode the block while ignoring the checksum failure, and snappy reported the input as corrupt. Next, I tried flipping each bit in the block in turn, then decoding the block and recalculating the checksum. Flipping the 96019th bit of the block allowed the block to decode successfully, and the recalculated checksum matched the checksum in the block trailer.
@kocoten1992, it does seem like some part of your hardware is experiencing intermittent bit flips. I suppose it could be your RAM _or_ your hard drive.
Thanks for letting me know. I guess I'll close this. From now on I'll only look at AMD CPUs, since they support ECC RAM at the consumer grade.
Thanks again!