Cockroach: sql: v20.2.0: unexpected leftover bytes in index backfiller

Created on 26 Oct 2020 · 9 Comments · Source: cockroachdb/cockroach

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://sentry.io/organizations/cockroach-labs/issues/1981679526/?referrer=webhooks_plugin

Panic message:

>crash_reporting.go:338: index-backfill-mon: unexpected 20480 leftover bytes

*errutil.leafError: index-backfill-mon: unexpected 20480 leftover bytes (1)
crash_reporting.go:338: *withstack.withStack (top exception)
(check the extra data payloads)


Stacktrace (expand for inline code snippets):

https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/util/log/crash_reporting.go#L337-L339 in pkg/util/log.ReportOrPanic
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/util/mon/bytes_usage.go#L403-L405 in pkg/util/mon.(*BytesMonitor).doStop
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/util/mon/bytes_usage.go#L390-L392 in pkg/util/mon.(*BytesMonitor).Stop
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/backfill/backfill.go#L516-L518 in pkg/sql/backfill.(*IndexBackfiller).Close
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/rowexec/indexbackfiller.go#L115-L117 in pkg/sql/rowexec.(*indexBackfiller).close
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/rowexec/backfiller.go#L237-L239 in pkg/sql/rowexec.(*backfiller).mainLoop
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/rowexec/backfiller.go#L138-L140 in pkg/sql/rowexec.(*backfiller).doRun
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/rowexec/backfiller.go#L122-L124 in pkg/sql/rowexec.(*backfiller).Run
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/flowinfra/flow.go#L391-L393 in pkg/sql/flowinfra.(*FlowBase).Run
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/distsql_running.go#L421-L423 in pkg/sql.(*DistSQLPlanner).Run
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/backfill.go#L1018-L1020 in pkg/sql.(*SchemaChanger).distBackfill.func4
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/kv/db.go#L706-L708 in pkg/kv.(*DB).Txn.func1
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/kv/txn.go#L810-L812 in pkg/kv.(*Txn).exec
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/kv/db.go#L705-L707 in pkg/kv.(*DB).Txn
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/backfill.go#L957-L959 in pkg/sql.(*SchemaChanger).distBackfill
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/backfill.go#L1499-L1501 in pkg/sql.(*SchemaChanger).backfillIndexes
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/backfill.go#L299-L301 in pkg/sql.(*SchemaChanger).runBackfill
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/schema_changer.go#L1406-L1408 in pkg/sql.(*SchemaChanger).runStateMachineAndBackfill
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/schema_changer.go#L669-L671 in pkg/sql.(*SchemaChanger).exec
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/schema_changer.go#L2042-L2044 in pkg/sql.schemaChangeResumer.Resume.func1
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/sql/schema_changer.go#L2154-L2156 in pkg/sql.schemaChangeResumer.Resume
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/jobs/registry.go#L1075-L1077 in pkg/jobs.(*Registry).stepThroughStateMachine
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/jobs/adopt.go#L243-L245 in pkg/jobs.(*Registry).runJob
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/jobs/adopt.go#L180-L182 in pkg/jobs.(*Registry).resumeJob.func1
https://github.com/cockroachdb/cockroach/blob/9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2/pkg/util/stop/stopper.go#L346-L348 in pkg/util/stop.(*Stopper).RunAsyncTask.func1
/usr/local/go/src/runtime/asm_amd64.s#L1356-L1358 in runtime.goexit

pkg/util/log/crash_reporting.go in pkg/util/log.ReportOrPanic at line 338
pkg/util/mon/bytes_usage.go in pkg/util/mon.(*BytesMonitor).doStop at line 404
pkg/util/mon/bytes_usage.go in pkg/util/mon.(*BytesMonitor).Stop at line 391
pkg/sql/backfill/backfill.go in pkg/sql/backfill.(*IndexBackfiller).Close at line 517
pkg/sql/rowexec/indexbackfiller.go in pkg/sql/rowexec.(*indexBackfiller).close at line 116
pkg/sql/rowexec/backfiller.go in pkg/sql/rowexec.(*backfiller).mainLoop at line 238
pkg/sql/rowexec/backfiller.go in pkg/sql/rowexec.(*backfiller).doRun at line 139
pkg/sql/rowexec/backfiller.go in pkg/sql/rowexec.(*backfiller).Run at line 123
pkg/sql/flowinfra/flow.go in pkg/sql/flowinfra.(*FlowBase).Run at line 392
pkg/sql/distsql_running.go in pkg/sql.(*DistSQLPlanner).Run at line 422
pkg/sql/backfill.go in pkg/sql.(*SchemaChanger).distBackfill.func4 at line 1019
pkg/kv/db.go in pkg/kv.(*DB).Txn.func1 at line 707
pkg/kv/txn.go in pkg/kv.(*Txn).exec at line 811
pkg/kv/db.go in pkg/kv.(*DB).Txn at line 706
pkg/sql/backfill.go in pkg/sql.(*SchemaChanger).distBackfill at line 958
pkg/sql/backfill.go in pkg/sql.(*SchemaChanger).backfillIndexes at line 1500
pkg/sql/backfill.go in pkg/sql.(*SchemaChanger).runBackfill at line 300
pkg/sql/schema_changer.go in pkg/sql.(*SchemaChanger).runStateMachineAndBackfill at line 1407
pkg/sql/schema_changer.go in pkg/sql.(*SchemaChanger).exec at line 670
pkg/sql/schema_changer.go in pkg/sql.schemaChangeResumer.Resume.func1 at line 2043
pkg/sql/schema_changer.go in pkg/sql.schemaChangeResumer.Resume at line 2155
pkg/jobs/registry.go in pkg/jobs.(*Registry).stepThroughStateMachine at line 1076
pkg/jobs/adopt.go in pkg/jobs.(*Registry).runJob at line 244
pkg/jobs/adopt.go in pkg/jobs.(*Registry).resumeJob.func1 at line 181
pkg/util/stop/stopper.go in pkg/util/stop.(*Stopper).RunAsyncTask.func1 at line 347
/usr/local/go/src/runtime/asm_amd64.s in runtime.goexit at line 1357

| Tag | Value |
|---|---|
| Cockroach Release | v20.2.0-rc.2 |
| Cockroach SHA: | 9fe9b6d95e858d9f7d65d9fd661311a88d9daaf2 |
| Platform | linux amd64 |
| Distribution | CCL |
| Environment | v20.2.0-rc.2 |
| Command | start-single-node |
| Go Version | |
| # of CPUs | |
| # of Goroutines | |

C-bug O-sentry T-bulkio

All 9 comments

This one concerns me. @yuzefovich is there any chance that recent changes in memory monitoring and flow lifecycle code would address this?

The most likely explanation for this report is that we didn't close all memory accounts before closing the memory monitor. Another possibility, if there is some concurrency, is a race, but I hope that's not likely. I'll take a quick look at the code.
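To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It does not use the real `pkg/util/mon` API: the `monitor` and `account` types and their methods are hypothetical stand-ins, assumed only to illustrate how stopping a monitor while a bound account still holds bytes could produce a "leftover bytes" report.

```go
package main

import "fmt"

// monitor is a toy stand-in (hypothetical, not mon.BytesMonitor): it tracks
// how many bytes are currently reserved by the accounts bound to it.
type monitor struct {
	name      string
	allocated int64
}

// account is a toy stand-in for a memory account bound to a monitor.
type account struct {
	mon  *monitor
	used int64
}

// newAccount binds a fresh, empty account to the monitor.
func (m *monitor) newAccount() *account { return &account{mon: m} }

// grow reserves n more bytes from the monitor on behalf of the account.
func (a *account) grow(n int64) {
	a.used += n
	a.mon.allocated += n
}

// close releases everything the account still holds.
func (a *account) close() {
	a.mon.allocated -= a.used
	a.used = 0
}

// stop returns an error if any bytes are still reserved when the monitor
// is shut down.
func (m *monitor) stop() error {
	if m.allocated != 0 {
		return fmt.Errorf("%s: unexpected %d leftover bytes", m.name, m.allocated)
	}
	return nil
}

func main() {
	m := &monitor{name: "index-backfill-mon"}
	acc := m.newAccount()
	acc.grow(20480)

	// Wrong order: the monitor is stopped while the account still holds bytes.
	fmt.Println(m.stop())

	// Right order: close the account first, then stop the monitor.
	acc.close()
	fmt.Println(m.stop())
}
```

In this toy model the fix is purely about ordering: every account has to be closed (or fully shrunk) before `stop` runs. Whether that is what happened in the report above is not established by the sketch.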

The problem we had with memory monitoring in the index backfiller was that we simply never shrank the account, which led us to significantly over-account. I'm not aware of any other changes in that area.
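A rough sketch of that over-accounting pattern, using a hypothetical `account` type and `backfill` function rather than the actual backfiller code: if the account grows for every batch but is never shrunk after the batch is flushed, the accounted total keeps climbing even though only one batch is held at a time. The batch sizes are arbitrary.

```go
package main

import "fmt"

// account is a toy byte counter (hypothetical, not the real bound account).
type account struct{ used int64 }

func (a *account) grow(n int64)   { a.used += n }
func (a *account) shrink(n int64) { a.used -= n }

// backfill simulates processing batches one at a time and returns the peak
// number of bytes the account believed were in use. Each batch is assumed
// to be released from memory as soon as it has been flushed.
func backfill(batchSizes []int64, shrinkAfterFlush bool) (peak int64) {
	var acc account
	for _, size := range batchSizes {
		acc.grow(size) // reserve memory for the batch
		if acc.used > peak {
			peak = acc.used
		}
		// ... the batch is flushed here and its buffer released ...
		if shrinkAfterFlush {
			acc.shrink(size) // give the reservation back
		}
	}
	return peak
}

func main() {
	batches := []int64{1024, 2048, 512}
	fmt.Println("never shrinking:", backfill(batches, false))
	fmt.Println("shrinking after flush:", backfill(batches, true))
}
```

Without shrinking, the peak equals the sum of all batch sizes; with shrinking, it equals the largest single batch.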

Hm, I don't see anything wrong with the lifecycle of the memory monitoring infrastructure there.

I did find something suspicious: we don't seem to be using the memory account created in ColumnBackfiller.init. cc @adityamaru

The column backfiller does create a bound account it doesn't use, but it also seems to close it correctly. Further, this error is in the index backfiller code path, so I'm not sure that is the root cause here. Still investigating.

Yeah, I didn't mean that the unused memory account is the root cause of this issue. I just pointed out that we should either remove the unused account or use it.

Another instance of this: #56566.

Just wanted to note that we've started to see an uptick of these.

They're not panics, since in release builds we only report such things to Sentry and do not crash, but I think we should look into this.

👍🏼 Going to give tracing this down another shot. I'll also bring it up with the team in case they see something I'm missing.

Another instance of this: #57366.
