Cockroach: cdc,sql: core changefeeds don't work as expected with `vectorized=on`

Created on 15 Oct 2020  路  4Comments  路  Source: cockroachdb/cockroach

Describe the problem

When a core changefeed runs, it pushes rows out one a time, when they are available. This seems at odds with the goals of the column materialized used in vectorized execution. The problem seems to be that the execution engine waits for a batch of rows and just hangs. This results in core changefeeds effectively not working with the vectorized execution engine.

To Reproduce

Create a table with a small number of rows.

SET CLUSTER SETTING kv.rangefeed.enabled=true;
CREATE TABLE foo(i INT PRIMARY KEY);
INSERT INTO foo VALUES (1), (2);
SET vectorize=on;
\set display_format=csv
EXPERIMENTAL CHANGEFEED FOR foo WITH resolved='5s'; -- this will just hang.

Expected behavior

The above should work. We need to either set a batch size of 1 or provide some other mechanism to sync the changefeed.

Environment:

  • CockroachDB version 20.1.6

Additional context

It seems that this bufferring in the materializer also leads to issues during shutdown and is the reason we see https://github.com/cockroachdb/cockroach/issues/55408.

A-cdc A-sql-execution C-bug

All 4 comments

cc @asubiotto to triage for @cockroachdb/sql-execution. I have not verified whether this remains an issue in 20.2.

Seems that this works in 20.2 and master.

In order to repro trivially on release-20.1:

diff --git a/pkg/ccl/changefeedccl/helpers_test.go b/pkg/ccl/changefeedccl/helpers_test.go
index d1385ba17d..079b24bcf3 100644
--- a/pkg/ccl/changefeedccl/helpers_test.go
+++ b/pkg/ccl/changefeedccl/helpers_test.go
@@ -226,6 +226,7 @@ func sinklessTest(testFn func(*testing.T, *gosql.DB, cdctest.TestFeedFactory)) f
                defer s.Stopper().Stop(ctx)
                sqlDB := sqlutils.MakeSQLRunner(db)
                sqlDB.Exec(t, `SET CLUSTER SETTING kv.rangefeed.enabled = true`)
+               sqlDB.Exec(t, `SET CLUSTER SETTING sql.defaults.vectorize = on`)
                // TODO(dan): We currently have to set this to an extremely conservative
                // value because otherwise schema changes become flaky (they don't commit
                // their txn in time, get pushed by closed timestamps, and retry forever).

make test PKG=./pkg/ccl/changefeedccl/ TESTS=TestChangefeedBasics/sinkless TESTFLAGS='-v --show-logs' // will time out

Summarizing the discussion from slack.

The root of the problem is that Columnarizer has buffering behavior - in 20.1 it will be hanging until coldata.BatchSize() (1024 by default) rows are emitted by the changefeed. We'll need to fix it somehow (probably adding a knob to the columnarizer whether buffering behavior is prohibited). On 20.2 due to dynamic batch size behavior it will still be hanging but in a slightly different manner.

This is not a problem on 20.2 and the current master because the vectorized engine will not be used for the changefeed DistSQL flow since the vectorized row count threshold is never met for it (the estimated row count for the plan is 0, so unless a user does SET vectorize_row_count_threshold=0; or SET vectorize=experimental_always;, we will always use row-by-row engine). In 20.1 the meaning of vectorize=on was different - we never looked at the threshold and used the vectorized engine if it was supported.

This shows that the changefeeds are incompatible with vectorize=on setting on 20.1 branch. I think this deserves a callout in the docs as a known limitation. cc @ericharmeling

Thank you for raising this! CC/ @glerchundi

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bdarnell picture bdarnell  路  4Comments

richardanaya picture richardanaya  路  3Comments

otan picture otan  路  4Comments

xudongzheng picture xudongzheng  路  3Comments

nvanbenschoten picture nvanbenschoten  路  3Comments