Cockroach: opt: Support partitioned uniqueness checks

Created on 11 Oct 2019 · 18 comments · Source: cockroachdb/cockroach

This is an idea advanced by @danhhz and @sumeerbhola that is relevant to our efforts to make multi-region applications easier. I've added some thoughts on how we might support this in the optimizer.

Consider this schema:

CREATE TABLE users (
    region STRING NOT NULL,
    email STRING NOT NULL,
    PRIMARY KEY (region, email)
) PARTITION BY LIST (region) (
    PARTITION us VALUES IN ('us-east-2'),
    PARTITION europe VALUES IN ('eu-west-3'),
    PARTITION asia VALUES IN ('ap-northeast-2')
);
-- omitting all the zone configuration stuff

Assume there is a requirement that email addresses are domiciled in the corresponding country of origin (because they are PII). However, there is also a requirement that email addresses are globally unique.

One possible approach is to create a global secondary index that enforces uniqueness:

CREATE UNIQUE INDEX unique_email_index_us ON users (email);
ALTER INDEX users@unique_email_index_us CONFIGURE ZONE USING lease_preferences='[[+region=us-east-2]]';
-- omitting the declarations for Europe and Asia

However, this would put copies of the index in each region, which would violate the domiciling requirement. In addition, inserting a new user record (or updating the email) would require writing to every region, which adds I/O and multiplies storage requirements.

Note that any attempt to create a partitioned index on email that domiciles properly fails to ensure uniqueness:

CREATE UNIQUE INDEX unique_email_index ON users (region, email)
PARTITION BY LIST (region) ...

Here, email uniqueness is only enforced _per region_, not globally.

So what is the solution? Enforcing email uniqueness using either a global or partitioned index does not work. Instead, we need to do a "virtual" uniqueness check:

CREATE TABLE users (
    region STRING NOT NULL,
    email STRING NOT NULL UNIQUE VIRTUAL,
    PRIMARY KEY (region, email)
) PARTITION BY LIST (region) ...

Similar to virtual computed columns, a virtual unique constraint does not trigger the creation of an underlying storage artifact (i.e. column or index). Instead, before the email column is inserted or updated, a check query would be run to ensure that the new email value does not yet exist in _any region_. This would typically be planned as a fan-out query by the optimizer, resulting in a point lookup on each region using the (region, email) index. Note that if a good index does not exist, the optimizer would plan a table scan, so it's important that an appropriate partitioned index exists when using UNIQUE VIRTUAL constraints (we could consider automatically creating one).
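For the schema above, the per-mutation check might take roughly this shape (illustrative only; the optimizer would constrain region to the closed set of partition values and plan a point lookup per region, and an UPDATE would additionally need to exclude the row being modified):

```sql
-- Hypothetical check before inserting/updating email = 'new@example.com':
SELECT 1
FROM users
WHERE region IN ('us-east-2', 'eu-west-3', 'ap-northeast-2')
  AND email = 'new@example.com'
LIMIT 1;
-- Any row returned means the new value would violate uniqueness.
```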

Summing up the behavior of a UNIQUE VIRTUAL constraint:

  1. When constraint columns (e.g. the email column in this example) are inserted/updated, a fan-out query must be run against all regions. However, unlike with a global index, no writes need to be made to other regions.
  2. Updates that do not involve the constraint column do not trigger the uniqueness check query. Latency should be local, as no cross-region reads or writes are required.
  3. Reading local rows (i.e. those domiciled locally) is fast, requiring no cross-region reads.

The optimizer would likely implement UNIQUE VIRTUAL constraints much as it implements foreign key checks today: it attaches a list of check queries to the mutation operator, and those are run in parallel. If any fails, the resulting error aborts the transaction. The uniqueness check would just be another check of that sort.
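The shape of that mechanism can be sketched outside the optimizer (a toy model, not CockroachDB code; the region contents and check construction are invented for illustration):

```python
# Toy model (not CockroachDB code): a mutation carries a list of check
# queries that run in parallel; any failure aborts the "transaction".
from concurrent.futures import ThreadPoolExecutor

class UniquenessViolation(Exception):
    pass

def run_mutation_checks(checks):
    """Run all checks in parallel; the first failure propagates."""
    with ThreadPoolExecutor() as pool:
        # Iterating the results re-raises the first exception raised
        # by any check, which models aborting the transaction.
        for _ in pool.map(lambda check: check(), checks):
            pass

# Hypothetical per-region email sets, standing in for point lookups
# against each region's (region, email) index.
regions = {
    "us-east-2": {"a@example.com"},
    "eu-west-3": {"b@example.com"},
    "ap-northeast-2": set(),
}

def make_check(region, email):
    def check():
        if email in regions[region]:
            raise UniquenessViolation(f"{email!r} already exists in {region}")
    return check

# A new email passes; an existing one would raise and abort the mutation.
run_mutation_checks([make_check(r, "c@example.com") for r in regions])
```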

Adding support for UNIQUE VIRTUAL columns would be a building block to a future where multi-region applications are easy(ier):

CREATE TABLE users (
    email STRING PRIMARY KEY
) PARTITION BY LOCALITY (region)

Syntax similar to this would:

  1. Automatically add a hidden region column that is filled by default with the gateway node's region on INSERT.
  2. Partition all indexes (primary + secondaries) by the region partitioning key column.
  3. Pin partitions to localities using zone configs.
  4. Create a UNIQUE VIRTUAL constraint check on any unique columns (or sets of columns).
  5. Create a partitioned index for each unique constraint for lookup performance reasons.
Labels: A-multiregion, A-partitioning, C-enhancement


All 18 comments

CC: @sumeerbhola, @danhhz, @rytaft, @justinj, @awoods187, @jordanlewis, @andreimatei

I really like this idea.

How does one address this if they don't get it right the first time? Or if they are already in production? I assume it's equivalent to needing to change the primary key and will need to be considered as part of that work @solongordon

How does one address this if they don't get it right the first time? Or if they are already in production?

That's an easy one:

ALTER TABLE t
   ALTER CONSTRAINT x_unique UNIQUE VIRTUAL;
DROP INDEX t@t_x_unique;

So I'm all for the feature, but I think the keyword "VIRTUAL" is misguided. See at the bottom for the alternative proposal.

The way I think about this is to remind myself that uniqueness is treated by both pg and the SQL standard as a constraint, like CHECK and foreign key relationships. Constraints are really stored in a table's metadata separately from the columns.

Constraints have two things of interest in pg's world, that crdb is part of:

  • they may or may not have a supporting index. For example, foreign key constraints do not necessarily have a supporting index, when the app knows that writes to the referenced table are infrequent (this is now supported by crdb).

  • they have a name. The name shows up e.g. in pg_constraints. The name is conceptually independent from that of the supporting index (in particular, the constraint always has a name even without a supporting index). The name also enables operating on the constraint using ALTER TABLE ALTER CONSTRAINT.

In fact, pg also considers UNIQUE in CREATE to implicitly declare a constraint. It's not a property of the column; it's certainly not just an implicit unique index declaration: it's an entry in the constraint list, which happens to also be supported by an index.

I'll remind you that this syntax is canonical:

CREATE TABLE t(
   x INT,
   CONSTRAINT x_unique UNIQUE (x)
);

and that x INT UNIQUE is merely a shorthand. (Ditto CONSTRAINT ... FOREIGN KEY.)

With this in mind, the feature requested at the top is not "virtual uniqueness". There is absolutely nothing virtual about this.

It's simply a constraint declaration without a supporting index ("indexless uniqueness check").

The "new" thing here is that indexless-ness in pg, when supported, is opt-out (the app must add the index for constraints without one, if it wants them), so there is no syntax to force it. However, we already have two perfect keywords for "I don't want an index": WITHOUT and INDEX!

Hence:

CREATE TABLE t(
   x INT,
   CONSTRAINT x_unique UNIQUE WITHOUT INDEX (x)
);

(possibly aliased to x INT UNIQUE WITHOUT INDEX if you love shorthands).

PG already has VIRTUAL computed columns, where the word VIRTUAL is used in opposition to STORED. The column really does exist in the schema, it just isn't backed by physical column storage. Isn't this an analogous case? The unique constraint really does exist in the schema, it just isn't backed by a physical index storage.

the word VIRTUAL is used in opposition to STORED

Here you are: although it seems analogous, there's a key distinction: in the virtual/stored distinction, the _data_ either exists or does not exist on disk, in the physical world. This appeals to the fundamental meaning of the word "virtual" in English: something with no physical reality.

With uniqueness checking, in either case the uniqueness is not a "thing" that exists. It is not "data"—it is a property of other data.

Think about it: if you create a table with a unique constraint, then drop the constraint and the supporting index without updating (or adding) rows, did the column group become magically non-unique?

Arguably, given that uniqueness is an emergent property of a dataset, we could say that it is _always_ virtual regardless of whether there is a supporting index or even a constraint declaration.

To come back to the proposal, let's remember that constraint declarations are _primarily_ a specification of which checks to perform during SQL mutations (the exploitation of FK declarations for planning read-only queries was, historically, an unforeseen late addition). They are _specifications for mutation mechanisms_ just as much as—and, in my opinion, even more than—descriptions of the properties of relationships between rows/tables.

In that light, "WITHOUT INDEX" is more informative of what is going on.

So to make the syntax proposal even more concrete:

| UNIQUE WITHOUT INDEX '(' name_list ')'

Where STORING, INTERLEAVE, PARTITION BY, etc. are not allowed, because they only make sense with a real backing index. Also, it'd just be a simple name list, since ASC/DESC don't make sense.

Something like that, yes.
Implementation-wise, the common way to do this is to simplify the parsing code by parsing the more general form, then restricting what's allowable during planning.

There are two cases to consider: the col_qualification_elem (shorthand) and constraint_elem (full).

For the first case it's rather straightforward and I'd do this as follows:

1) add a NoIndex bool field to tree.UniqueConstraint
2) add an opt_without_index non-terminal with type bool and empty / WITHOUT INDEX rules to set the boolean
3) change col_qualification_elem to parse UNIQUE opt_without_index and feed the bool from $2 into tree.UniqueConstraint
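The new non-terminal in step 2 might look roughly like this (a sketch in yacc-style notation, not actual sql.y contents):

```
opt_without_index:
  WITHOUT INDEX
  {
    $$.val = true
  }
| /* EMPTY */
  {
    $$.val = false
  }
```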

For the second case there's a trade-off. You can either add a NoIndex bool to tree.UniqueConstraintTableDef and, later during planning, check that the unwanted fields are unset when the bool is true. Then change the one parsing clause of constraint_elem to use opt_without_index.

Or you can split UniqueConstraintTableDef into two separate AST nodes UniqueConstraintWithIndexTableDef (with a full IndexTableDef within) and UniqueConstraintWithoutIndexTableDef (with just a column name list). Then parse using separate rules.

| | 1 AST node with bool | 2 AST nodes |
|------|----------------------|-------------|
| Pros | Less code to implement, easier to read/maintain on the parsing side | Easier to read/maintain on the planning side |
| Cons | Cannot exclude ASC and DESC specifiers from the column list at parse time | More code overall, introduces cognitive overhead |

My opinion would be to go with the 1 AST node (and tell users in doc "ASC and DESC are meaningless and will be ignored") until we have enough experimental evidence that the other solution is preferable.

I missed the initial discussion of this, but am in strong support of the idea. It seems like the natural mechanism to enforce uniqueness across a geopartitioned table without violating the constraint that all of a row's datums are "homed" within a single region. This is an important invariant that will help us provide much stronger latency guarantees than we would otherwise be able to. For instance, it allows us to guarantee that DELETEing a local row will always be a local operation (mod fk checks) and that an INSERT of a row will always _write_ to the local region, even if it needs to read from other regions. It also keeps the door open for data domiciling and datum rehoming in the future.

However, I'll echo @knz's thoughts from https://github.com/cockroachdb/cockroach/issues/41535#issuecomment-569262709. I'm not convinced that we need to introduce new syntax for this functionality, or even that it needs to be exposed to users at all. Instead, I think this is how we'll want to implement all UNIQUE constraints in REGIONAL tables (including the primary key constraint [1]). This fits with the model to prefix all of a REGIONAL table's indexes with the region column and keep them all partitioned.

@rytaft, @RaduBerinde, and I just met to discuss this and came to an interesting realization. It turns out that the idea to "look for a row in the local region and fall back to a fanout query to all other regions" (do we have a name for this? "locality-optimized search"?) depends directly on this functionality. Without a UNIQUE constraint on the logical primary key column(s), the optimizer can't conclude that a "hit" in the local region is sufficient to skip the remote fanout. Such behavior relies on uniqueness being enforced for the logical primary key column itself.

[1] though we will still explore a data type that can optimize away the uniqueness query.

On the syntax, I just want to point out that the way to create a unique index is with CREATE UNIQUE INDEX so it will be confusing to users if that doesn't actually create an index. Perhaps it should automatically create the partitioned version of the index which would be suitable for use by the check queries.

I agree. We should absolutely create an index for each unique constraint like we currently do – this index should just be partitioned on the region column and enforced using a fan-out query across the closed set of possible regions. (It's a little tough for me to follow what's still being considered here; maybe we're all already in agreement on this)

Random thought - have we ever considered optimizing the solution to this problem through the use of bloom filters? The thinking is that in addition to the index we'd have a set of bloom filters on the uniqueness column. While the actual data has sovereignty to a given region, it _may_ be acceptable to replicate the bloom filters (which never give precise information as to the underlying data). Then, with the bloom filters in place, and replicated using non-voting replicas, we could perform uniqueness checking in the vast majority of cases without going cross-region (on the assumption that in most cases, uniqueness will not be violated and the bloom filter is large enough to reliably tell us that).

There are still questions around whether or not the bloom filter replication will be acceptable, and whether or not its writing and replication would be justified to save the cost of the cross-region uniqueness check, but if those work out, I could see this significantly improving transaction times.
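To make the idea concrete, here is a minimal bloom filter sketch (invented parameters; not a proposal for an actual format). The useful property is that a negative answer is definitive, so the common case of a genuinely new value could skip the cross-region read:

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, size=1024, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.bits = 0  # bitset packed into a Python int

    def _positions(self, item):
        # Derive num_hashes independent bit positions from SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means "check for real".
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# A replicated filter standing in for a remote region's email column:
remote_emails = BloomFilter()
remote_emails.add("existing@example.com")
assert remote_emails.might_contain("existing@example.com")
```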

This is interesting. I don't think I fully understand what we get from the bloom filter if it's being pushed asynchronously to non-voting replicas. Even if the bloom filter avoided all false positives, it seems like it would still have false negatives as an artifact of it being accessed inconsistently. If we're ok with uniqueness violations due to such inconsistencies then we could probably just perform the full partitioned uniqueness check on local non-voting replicas, since we expect to have a full copy of the data in every region. But I don't think we want to get in the business of allowing uniqueness violations. It's possible that I'm missing something here.

After talking this over with @nvanbenschoten, I'm not sure bloom filters would help here, as we'd get a potential savings on the uniqueness check read path, at the expense of additional latency on the write path (to replicate the bloom filters and ensure consistency). Probably not the best trade-off.

I pulled the locality optimized search portion of this into a separate issue: https://github.com/cockroachdb/cockroach/issues/55185.

So this issue can remain focused on the creation and use of partitioned unique indexes.

This would typically be planned as a fan-out query by the optimizer, resulting in a point lookup on each region using the (region, email) index. Note that if a good index does not exist, the optimizer would plan a table scan, so it's important that an appropriate partitioned index exists when using UNIQUE VIRTUAL constraints (we could consider automatically creating one).

Can we prevent creating a UNIQUE WITHOUT INDEX constraint if an appropriate index does not exist? It seems nefarious to give users the power to turn all their INSERTs into full table scans with the relatively innocent looking ALTER TABLE t ADD CONSTRAINT x_unique UNIQUE WITHOUT INDEX (x).

One option is to print a NOTICE warning users that any mutations affecting the unique column(s) will require a full table scan. As I mentioned in https://github.com/cockroachdb/cockroach/pull/55700#issuecomment-712435519, I think UNIQUE WITHOUT INDEX is going to be important for implementing and testing uniqueness checks, and there may be some power users that will want to use the feature.

If a user adds a UNIQUE WITHOUT INDEX constraint, how do we validate that the constraint holds for all existing data? Does that involve a SELECT DISTINCT query in general?

Edit: This question is addressed by #56201.
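One plausible shape for that validation (illustrative only; table and column names are placeholders, and this is not necessarily what #56201 implements):

```sql
-- Find any duplicated value; one row returned means validation fails.
SELECT x FROM t GROUP BY x HAVING count(*) > 1 LIMIT 1;
```

This avoids materializing a full DISTINCT result, though it still requires a scan over the table (or over a suitable index, if one exists).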
