For tables where new rows are appended in chronological order, support for a SERIAL PK would be very desirable. At first blush it would seem that synchronizing atomic updates of a sequence-number generator adds excessive overhead, but note that since each appended row is inserted into the table's last range, the generator could be stored under a RocksDB key composed so that it always sorts after the table's highest PK value. In other words, it lives in the same range where the appended row will be inserted in the key store. So the transaction and consensus overhead of managing the sequence number should be minimal for the "chronological append" access pattern, while keeping the values convenient and readable (1..n).
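To make the placement concrete, here is a minimal Go sketch with an entirely hypothetical key encoding (not CockroachDB's actual one), showing a counter key composed so that it sorts after every possible row key of the table:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// rowKey builds a hypothetical "/table-prefix/<pk>" key, encoding the PK
// big-endian so that byte order matches numeric order.
func rowKey(tablePrefix []byte, pk uint64) []byte {
	k := append([]byte(nil), tablePrefix...)
	var pkBytes [8]byte
	binary.BigEndian.PutUint64(pkBytes[:], pk)
	return append(k, pkBytes[:]...)
}

// counterKey suffixes the table prefix with nine 0xff bytes, one more than
// any 8-byte encoded PK, so it sorts after every row key and therefore
// lives in the table's last range.
func counterKey(tablePrefix []byte) []byte {
	k := append([]byte(nil), tablePrefix...)
	return append(k, bytes.Repeat([]byte{0xff}, 9)...)
}

func main() {
	prefix := []byte("/t/")
	maxRow := rowKey(prefix, ^uint64(0)) // highest possible PK
	fmt.Println(bytes.Compare(counterKey(prefix), maxRow) > 0) // prints: true
}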
This is an interesting idea, though there are implementation complexities that may make it infeasible. Before getting into them, I'd like to understand the use case better and question whether a SERIAL PK actually solves it. Consider the following:
CREATE TABLE t (k SERIAL PRIMARY KEY, v INT);
Now txn1 does:
BEGIN; INSERT INTO t (v) VALUES (1);
Notice the lack of commit. And txn2 does:
BEGIN; INSERT INTO t (v) VALUES (2); COMMIT;
At this point, only txn2 has committed and the table contains:
# SELECT * FROM t;
k | v
---+---
2 | 2
Now if we commit txn1 the table contains:
# SELECT * FROM t;
k | v
---+---
1 | 1
2 | 2
So the SERIAL PK didn't actually enforce that the rows were appended to the table at commit time. The SERIAL value refers to the time of the INSERT statement, which might differ from when the effects of the statement become visible. The above was tested using Postgres, though I'm pretty sure the same thing happens in MySQL with auto-increment keys.
So what is the requirement behind the chronological ordering?
I was under the impression that the first transaction would be forced to restart to enforce serialization, and I was assuming that the full statement would be restarted. It sounds like you are saying that the raw mutations would be restarted but the evaluation of default values would not. Good point to consider, though!
I'm not sure how Postgres would handle these statements in serializable isolation, but Cockroach could certainly be made to restart the transaction on conflicts. This would guarantee the row is appended to the table at transaction commit time, but it would also kill performance when multiple transactions are inserting into the table concurrently.
I went through that recent SQLAlchemy log; it has lots of SERIAL PKs. It typically relies on the values being small 1..n integers, assumes the first is 1, etc.
I have some notes on different access patterns for append: low volume/velocity, high volume/velocity, etc. The high volume/velocity case would certainly be sub-optimal with implicit counter updates and frequent transaction restarts, but that is not the only append case. I do think that supporting the simple SERIAL case both aids migration and works well for some fraction of cases.
At worst, support for the SERIAL append access pattern would ensure that transaction restart gets a healthy dose of performance optimization, which would help many cases unrelated to SERIAL or append.
In any case, I wouldn't list this as a high priority for the initial beta, but I would urge consideration in a later beta release, again to ease PG migration.
The automatic primary key feature is already spelled differently in every database (SERIAL in Postgres, INTEGER AUTO_INCREMENT in MySQL, etc.), so this is something that can be swapped out in SQLAlchemy or any other ORM (https://github.com/bdarnell/cockroach-python/commit/e1fad2f57eb25d37d613ba5a1fa3bf2971cf2418).
Both MySQL and Postgres increment their counters outside of the transaction, so they can leave gaps in the sequence and allow records to be committed out of order. I think that if you want to guarantee in-order commits, you have to do an explicit SELECT MAX() to force the transactions to conflict. Given that, it's unclear whether there's an actual need for something closer to other databases' auto-increment behavior than our current experimental_unique_int(). (Storing the counter at the end of the table is a clever idea, though, if we do want to provide something like this.)
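For illustration, here is a hedged Go sketch of that SELECT MAX() pattern, assuming a table t with a plain INT k column, the lib/pq driver, and a local node (all assumptions, not anything this thread prescribes):

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // driver choice is an assumption
)

// appendRow assigns k = max(k)+1 inside the transaction. Concurrent
// appenders then conflict on the max row, so one of them is forced to
// restart, guaranteeing commit-time ordering at a throughput cost.
func appendRow(db *sql.DB, v int) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	var max sql.NullInt64 // NULL when the table is empty; Int64 is then 0
	if err := tx.QueryRow(`SELECT max(k) FROM t`).Scan(&max); err != nil {
		return err
	}
	if _, err := tx.Exec(`INSERT INTO t (k, v) VALUES ($1, $2)`, max.Int64+1, v); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := appendRow(db, 42); err != nil {
		log.Fatal(err)
	}
}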
I think we should just support SERIAL as an alias for INTEGER DEFAULT unique_rowid() and document that our ID generation is different from other databases'. I think that even if we supported postgres-style SEQUENCES, the guarantees provided by SERIAL are so weak that we should not map this feature onto the more expensive distributed sequence implementation.
The one thing we'd be missing is that I believe Postgres does provide strictly increasing values for SERIAL when those values are inserted outside of a transaction. We can't provide that even with an explicit SEQUENCE, unless we update the sequence as part of the transaction.
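For instance, a hypothetical client-side helper (extending the Go sketch above) that reads the generated key back instead of assuming keys run 1..n:

// insertAndGetKey reads the generated key back with RETURNING rather than
// assuming it is max+1: unique_rowid() produces large, non-contiguous
// int64 values, not a 1..n sequence.
func insertAndGetKey(db *sql.DB, v int) (int64, error) {
	var k int64
	err := db.QueryRow(`INSERT INTO t (v) VALUES ($1) RETURNING k`, v).Scan(&k)
	return k, err
}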
I think we should just support SERIAL as an alias for INTEGER DEFAULT unique_rowid() and document that our ID generation is different from other databases'.
I think that would be great. Any chance this could be scheduled/assigned soon?
[Since when does GitHub have reactions to comments?]
@tlvenn We're planning to address this in some form or another for Q2. I've updated the milestone to reflect that.
Thanks for the update @petermattis. I think GitHub introduced reactions a few weeks ago.