It is not currently possible to change a table's primary key, e.g. via
ALTER TABLE foo DROP CONSTRAINT old_pkey;
ALTER TABLE foo ADD PRIMARY KEY (new, pkey, cols);
This becomes more important in 1.2 because the upcoming partitioning feature (#18683) will only allow partitioning by prefixes of the primary key. For example, suppose you want to partition this table by country:
CREATE TABLE users (
id INT PRIMARY KEY,
name STRING,
country STRING,
...
)
The table needs to have a composite primary key that lists country first:
CREATE TABLE users (
id INT,
name STRING,
country STRING,
...,
PRIMARY KEY (country, id)
) PARTITION BY LIST (country)...
This essentially requires that the schema be designed with partitioning in mind from the get-go until we can alter primary keys.
/cc @awoods187 for triage
@benesch, do you mean we should support SQL syntax like:
ALTER TABLE foo ADD PRIMARY KEY (new, pkey, cols)
Or just change the primary key of table users from id to (id, country) when we try to add a partition for the table but the partitioning column is not in the primary key?
When we change the primary key of the specified table we have to reorganize its data, so this operation's overhead is quite expensive: each key changes from /TableID/IndexID/ColumnID/id to /TableID/IndexID/ColumnID/(id, country), each value changes in the same way, and the original key-value pairs are cleared.
@a6802739 I mean we should support arbitrary primary key changes like
ALTER TABLE foo ADD PRIMARY KEY (new, pkey, cols)
Since changing a table's primary key is such an expensive operation, we definitely don't want to do it implicitly if the partitioning scheme specifies columns that are not in the primary key. Much rather error in that case, and let the user alter the pkey only if they so choose.
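To make the cost of the key rewrite described above concrete, here is a rough Python sketch. The key layout, table/index IDs, and the dict-as-KV-store are all simplified illustrations, not CockroachDB's actual key encoding.

```python
# Hypothetical sketch: rewriting every row from a primary key on (id) to one
# on (country, id). IDs and the key format are invented for illustration.

TABLE_ID = 51
OLD_INDEX_ID = 1  # old primary index, keyed on (id)
NEW_INDEX_ID = 2  # new primary index, keyed on (country, id)

def old_key(row):
    return f"/{TABLE_ID}/{OLD_INDEX_ID}/{row['id']}"

def new_key(row):
    return f"/{TABLE_ID}/{NEW_INDEX_ID}/{row['country']}/{row['id']}"

def rewrite(kv_store, rows):
    """Write every row under its new key and clear the original key-value."""
    for row in rows:
        kv_store[new_key(row)] = row      # write under the new primary key
        kv_store.pop(old_key(row), None)  # delete the old key-value pair
    return kv_store

rows = [{"id": 1, "country": "US"}, {"id": 2, "country": "DE"}]
store = {old_key(r): r for r in rows}
store = rewrite(store, rows)
```

The point of the sketch is that every row in the table is touched once, which is why the operation is O(table size) and shouldn't happen implicitly.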
Note that traditionally changing the pkey on a table with an existing pkey has required two steps:
ALTER TABLE foo DROP CONSTRAINT foo_pkey;
ALTER TABLE foo ADD PRIMARY KEY (new, pkey, cols);
That's going to be needlessly expensive in Cockroach. Dropping the primary key requires creating a hidden rowid column to use as the implicit pkey (this already happens when tables without a primary key are created from scratch) and rewriting all the data in terms of rowid, just to rewrite it all again in terms of (new, pkey, cols) when the new primary key is added. We should do one or more of the following:
Option one:
ALTER TABLE foo ALTER PRIMARY KEY TO (new, pkey, cols)
Option two:
ALTER TABLE foo DROP CONSTRAINT foo_pkey, ADD PRIMARY KEY (new, pkey, cols);
Option three: coalesce the operations when they appear in ALTER TABLEs within the same txn with no intervening writes:
BEGIN;
ALTER TABLE foo DROP CONSTRAINT foo_pkey;
ALTER TABLE foo ADD PRIMARY KEY (new, pkey, cols);
COMMIT;
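The coalescing in option three can be sketched as a peephole pass over the schema-change operations buffered in a transaction. This is an illustration only; the operation names and the list-of-tuples representation are invented, not CockroachDB internals.

```python
# Hypothetical sketch of option three: collapse a DROP of the pkey that is
# immediately followed by an ADD PRIMARY KEY (with no intervening writes)
# into a single pkey alteration, skipping the rowid rewrite entirely.

def coalesce(ops):
    """ops is a list of (kind, payload) tuples buffered in one txn."""
    out = []
    for op in ops:
        if out and out[-1][0] == "DROP_PK" and op[0] == "ADD_PK":
            # Adjacent drop/add with nothing in between: merge them.
            out[-1] = ("ALTER_PK", op[1])
        else:
            out.append(op)
    return out

txn = [("DROP_PK", None), ("ADD_PK", ("new", "pkey", "cols"))]
merged = coalesce(txn)  # single ALTER_PK operation

# An intervening write blocks the merge, so both operations survive:
txn2 = [("DROP_PK", None), ("WRITE", "row"), ("ADD_PK", ("b",))]
unmerged = coalesce(txn2)
```

The "no intervening writes" condition matters precisely because of the edge case discussed below: a write between the drop and the add can observe (and exploit) the pkey-less state.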
My preference would be to eventually support all three, but start by supporting just option one. Options two and three are better for compatibility, but changing a table's primary key probably isn't something you want to lean on your ORM for.
Option three has some pathological edge cases FWIW:
CREATE TABLE foo (a INT PRIMARY KEY, b STRING);
INSERT INTO foo (a, b) VALUES (1, 'a');
BEGIN;
ALTER TABLE foo DROP CONSTRAINT foo_pkey;
INSERT INTO foo (a, b) VALUES (1, 'b');
ALTER TABLE foo ADD PRIMARY KEY (b);
COMMIT;
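What makes this pathological: once foo_pkey is dropped mid-transaction, a second row with a = 1 can legally appear, so ADD PRIMARY KEY must validate uniqueness over data that changed inside the same txn. A minimal sketch of that validation (illustrative only, not CockroachDB's implementation):

```python
# Sketch of the uniqueness check that ADD PRIMARY KEY has to perform.

def add_primary_key(rows, col):
    """Verify col is unique across rows before promoting it to the pkey."""
    seen = set()
    for row in rows:
        key = row[col]
        if key in seen:
            raise ValueError(f"duplicate value {key!r} in column {col}")
        seen.add(key)
    return col

# State after the two inserts above: both rows share a = 1, which was
# only possible because the pkey on a had been dropped.
rows = [{"a": 1, "b": "a"}, {"a": 1, "b": "b"}]

add_primary_key(rows, "b")        # succeeds: 'a' and 'b' are distinct
try:
    add_primary_key(rows, "a")    # fails: a = 1 appears twice
except ValueError:
    failed_on_a = True
```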
@jordanlewis @vivekmenezes @danhhz @knz @cockroachdb/sql-async does this sound reasonable?
Yes it's reasonable, but @a6802739 please do not underestimate the complexity of the task! As much as possible, please split the work into small steps that can be easily reviewed. It might be beneficial for you to study the problem and write down your implementation plan before you start (and perhaps share it with us).
I meant to write "reasonable" in my previous comment.
@knz, I will first try to understand what I should do for this task, and then write down how to implement it. I may run into some confusion along the way; I hope I can get some help then.
Thank you very much.
@a6802739 Thank you, as always, for your help. I've discussed this project with the team, and because other high-priority projects with upcoming deadlines rely on this feature, we'll need to develop this feature internally.
assigned to @danhhz to make sure this finds a good home
Has there been any thought on how this would be implemented? Changing the primary key is difficult because it is stored in indexes. One thought is that we make a copy of all of the secondary indexes and backfill both them and the new primary index and afterwards delete the old secondary indexes and primary index. Seems like that would conceptually work, though it would be slow.
@mjibson, @danhhz, @vivekmenezes Thoughts?
It depends on whether there's a requirement for this to happen online.
If it needs to be online, which is what @benesch's first suggestion implies (ALTER TABLE ALTER PRIMARY KEY), then I agree with your suggestion. The phases in that case would be the same as adding an index, more or less:
With n secondary indexes, the write path would then end up writing 2 * (n + 1) rows per update, guaranteeing that all writes from then on are reflected in the new tables; a backfill would then copy over any rows written before the write path began writing to the new copies.
If it doesn't have to be online, then perhaps that first step of "dropping the primary key" could effectively just put the table into read-only mode until a new primary key is added, at which point we could perform the backfills above without having to do the table descriptor double write path dance.
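The double-write-plus-backfill idea can be sketched as follows. Everything here is an illustrative toy model (dicts standing in for index data, invented names), not CockroachDB code.

```python
# Toy model of the online approach: while new index copies are being built,
# the write path writes each update to both the old and the new indexes,
# so with n secondary indexes an update touches 2 * (n + 1) entries.

class Index:
    def __init__(self, key_cols):
        self.key_cols = key_cols
        self.data = {}

    def key(self, row):
        return tuple(row[c] for c in self.key_cols)

    def put(self, row):
        self.data[self.key(row)] = row

old_primary = Index(("id",))
old_secondary = Index(("name",))            # n = 1 secondary index
new_primary = Index(("country", "id"))      # new pkey prefixed by country
new_secondary = Index(("name", "country"))  # secondary rewritten for new pkey

old = [old_primary, old_secondary]
new = [new_primary, new_secondary]

def apply_write(row):
    # Double write: 2 * (n + 1) index entries per update.
    for idx in old + new:
        idx.put(row)

def backfill():
    # Copy rows that predate the double write into the new copies;
    # setdefault avoids clobbering newer double-written entries.
    for row in old_primary.data.values():
        for idx in new:
            idx.data.setdefault(idx.key(row), row)

# A row written before the schema change starts exists only in old indexes:
old_primary.put({"id": 1, "name": "ann", "country": "US"})
old_secondary.put({"id": 1, "name": "ann", "country": "US"})
# Schema change begins; subsequent writes hit all four indexes:
apply_write({"id": 2, "name": "bo", "country": "DE"})
backfill()
```

After the backfill completes, both the pre-existing row and the double-written row are present in the new indexes, at which point the descriptor can switch over and the old indexes can be dropped.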
It depends on whether there's a requirement for this to happen online.
I'd be sad if changing a table's primary key took the table offline. That said, it would still be a big improvement over our current "just dump it to CSV and reimport it!" recommendation.
I've been assuming that any schema change needs to be asynchronous. Let's not lower this bar yet.
We talked about this yesterday briefly, but just one data point from a customer who wanted to change column types generally (not just primary keys): they were okay with this locking up a table while the change is going on. I don't fully understand the term backfilling, but are we also using this term to describe the faster process we've been talking about regarding doing something with SSTables (vague because I only get 20% of it)? If not, I wonder if someone would want a fast but offline primary key swap versus a slow but online primary key swap.
okay with this locking up a table while the change is going on
I'm not sure I believe this. Well, I believe the customer has stated that locking the table is ok, but we've also heard many times that async schema changes massively reduce operational burden.
I think you're right - it's definitely 100x better to have async schema changes. That being said, if one takes way way longer than the other, perhaps it changes our decision? Either way, this is obviously an engineering decision, but just wanted to provide that single relevant data point here.
I think this is a low impact high effort issue. This change is valuable to a customer who is
I think number 4 in particular is rare.
I'll also bring up that when we do this we need to first refactor the codebase to accommodate this change (it should be clear that the code can be modifying two tables). It's very easy to make this change and end up with what we had with the sql executor, but this time on the SQL DML code paths.
I'm just saying that we had better think about the weight of this change on the stability of the system, and about its priority for our customers for 2.1.
I agree that it's high effort, but I think you're understating the impact. We have at least two features where the primary key is crucial (partitioning and interleaving), and even without that people sometimes want to change their primary keys (existing databases all have this feature). It's unrealistic to expect people to be able to choose the correct PK when first creating the table.
okay with this locking up a table while the change is going on
I'm not sure I believe this. Well, I believe the customer has stated that locking the table is ok, but
we've also heard many times that async schema changes massively reduce operational burden.
If taking the table offline is acceptable, then an export/edit/import process would work. I think if we're going to do this we should do the work to make it async/online.
We talked about this yesterday briefly, but just one data point from a customer who wanted to change column types generally (not just primary keys)
Changing types of non-PK columns is a separate matter. It's much easier, and I think it's higher priority than changing PKs.
I suppose we can implement this within the same table ID space by allowing a primary index to have a different index ID than the default primary index ID. Then the process might be
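Setting the details of the process aside, the core of the index-ID idea can be sketched with a toy table descriptor: build the new primary index under a fresh index ID within the same table, then flip which index ID the descriptor treats as primary. All field names here are invented for illustration; CockroachDB's actual descriptors differ.

```python
# Toy descriptor: same table ID throughout, but the primary index is just a
# pointer to one of the table's index IDs, so it can be repointed.

table = {
    "id": 53,
    "primary_index_id": 1,
    "indexes": {1: ("id",)},  # index ID -> key columns
}

def swap_primary(table, new_index_id, key_cols):
    """Register the new (already backfilled) index and repoint the primary.

    The old primary index becomes unreferenced and can be deleted
    asynchronously afterwards.
    """
    table["indexes"][new_index_id] = key_cols
    old_id = table["primary_index_id"]
    table["primary_index_id"] = new_index_id
    del table["indexes"][old_id]
    return table

swap_primary(table, 2, ("country", "id"))
```

Because the table ID never changes, existing references to the table (FKs, interleaves, leases) don't need rewriting, which is the complexity reduction mentioned below.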
I'd add that I'd originally thought that this change was going to make the sql codebase a lot more complex because I thought we had to add the notion of writing to two tables. With my comment above I feel confident that we can implement this using the same table ID which is a significant reduction in complexity of this change IMHO.
One unknown is how this might affect our interleaved tables implementation, because that's an index with two or more table ids in it and cannot be modeled as a covering index. Some more thought is needed there.
This item only shows examples where the old primary key is removed first. As outlined in #40771 there's also a scenario possible where a table was defined without a primary key in the first place, and as such nothing was deleted.
@jordanlewis I'm perfectly fine with closing the issue I mentioned above. Should this issue also be given the label for postgres compatibility, or would that be counter-effective?
In CockroachDB a table always has a primary key behind the scenes, so this issue does cover your case as well. Thanks for noticing the missing compat label - I've added it.
Can this issue be closed once https://www.cockroachlabs.com/docs/v20.1/alter-primary-key.html is stable?
Yes! In fact, I'm going to go ahead and close this since it will be addressed in 20.1.