Cockroach: sql: users surprised by the non-atomicity of TRUNCATE

Created on 26 Jul 2018  路  13Comments  路  Source: cockroachdb/cockroach

User observes this "strange" behavior:

43268153-e0f0418c-90bd-11e8-8560-e1bd1d271204-edit

There are two problems at hand here:

  • it takes an abnormally long time for the truncate to propagate (1 minute according to the user). We'd expect the other nodes to start using the new descriptor nearly as soon as truncate completes (subject to the table lease expiration delay, which is a few seconds).

  • the truncate is not atomic. We set expectations that all statements in CRDB are transactions, so this is an example we should clean up, or make it very clear in docs where the exceptions lie.

A-docs A-schema-changes C-investigation S-3-ux-surprise docs-done docs-known-limitation

All 13 comments

Discussed with @vivekmenezes:

We're unlikely to change the atomicity of TRUNCATE, but at least the documentation should be updated to clarify this.

Also the very long observed delay is abnormal and still needs to be investigated.

cc @rmloveland this will need to be documented as known limitation

@vivekmenezes also asks:

we need to ask the user how he timed it, because he is saying T0 + 10s , but the previous select took 17s

@knz t0 is when the first query finished the first execution, not when it started. It should have been: t0 + 17s + 10s.

@vivekmenezes the user has clarified the delay. Do we have enough information to analyze this from our side?

@vivekmenezes Can I get a quick blurb describing this known limitation w/r/t the impact to user experience? Ideally, we need it by Friday 10/26 for the 2.1 Known Limitations page. Posting it on this issue and/or pinging me would be great.

@sploiselle I see the limitations for schema changes discussed at https://www.cockroachlabs.com/docs/v2.1/online-schema-changes.html
in the limitations section.

I think what need to be do is make it explicit on the TRUNCATE page
https://www.cockroachlabs.com/docs/stable/truncate.html
that TRUNCATE is a schema change and thus suffers from all the limitations of schema changes.
I'd also suggest maintaining a separate link to the limitations section of online schema changes.

It seems reasonable for the TRUNCATE statement/txn to have a post-commit that waits for the old table descriptor to have no active leases. That would make it effectively "transactional".

Related: #42061

This seems to have been fixed some time ago. I'm not quite sure when. At least in 19.2 and the current release we wait for the leases on the old table to be released. I assumed we didn't when I saw this issue.

I tried to reproduce this in v2.1 and earlier and failed. We do seem to wait for the old table's leases to drain. Perhaps there's some deeper bug here but it's not clear what it is. If anybody can produce a repro of this behavior, I'll happily work to understand it.

@ajwerner, since this issue is closed, does that mean we can resolve/remove this known limitation from the 19.2 and/or 20.1 docs? https://www.cockroachlabs.com/docs/dev/known-limitations.html#truncate-does-not-behave-like-delete

That comment is vague enough that it might still apply. In particular, in a 3 node cluster, it is possible for 3 connections to observe the following anomaly which is not possible with deletes:

n1@t1: TRUNACE TABLE foo -- blocks until t4
n2@t2: SELECT * FROM foo; INSERT INTO bar VALUES ... -- observes truncate, writes rows
n3@t3: SELECT * FROM foo; SELECT * FROM bar -- does not observe truncate, does observe n2's writes.
n1@t4: Truncate concludes

This is weird and violates our consistency model in ways that relate to online schema changes and thus make that note reasonable. What this issue seemed to be about was a failure to observe the truncate after the truncate statement returned (after t4 in the above example).

OK. Thanks, @ajwerner. I'll leave the limitation as-is for now then.

Was this page helpful?
0 / 5 - 0 ratings