Use case
When inserting data, you need to uniquely identify each row with an integer.
For example, AUTO_INCREMENT in MySQL:
```sql
CREATE TABLE `test_auto_id` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```
Describe the solution you'd like
FYI:
https://dev.mysql.com/doc/refman/5.5/en/example-auto-increment.html
This would be good for pagination too: LIMIT/OFFSET has very poor performance when you have millions of rows. It would be good if we could use an auto-incremented, indexed field for pagination.
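To illustrate the pagination point, here is a minimal sketch of keyset (seek) pagination over an auto-incremented `id` column. The table is simulated as a Python list of `(id, name)` rows; in SQL this corresponds to `SELECT ... WHERE id > :last_id ORDER BY id LIMIT :page_size`, which uses the index instead of scanning and discarding OFFSET rows. The function name is invented for the example.

```python
def keyset_page(rows, last_id, page_size):
    """Return the next page: rows with id greater than last_id.

    `rows` is assumed sorted by id, mimicking an index scan;
    an index lookup jumps straight to id > last_id instead of
    reading and discarding OFFSET rows as LIMIT/OFFSET does.
    """
    return [r for r in rows if r[0] > last_id][:page_size]

rows = [(i, f"name{i}") for i in range(1, 11)]
page1 = keyset_page(rows, last_id=0, page_size=3)              # ids 1..3
page2 = keyset_page(rows, last_id=page1[-1][0], page_size=3)   # ids 4..6
```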
For distributed scenarios it's hard to guarantee 'fair' auto-increment between shards/replicas (though with some limitations and ZooKeeper coordination it is probably doable).
@filimonov K-sortable identifiers would need to touch ZK only after node reboots, to ensure the clock didn't go backwards, IIRC.
Auto-increments are useless. In a multi-shard setup they could be implemented only with a cache.
So each shard increments the id in ZK by, say, 100 million (or billion) and serves ids from this cache. But the cache makes ids uncoordinated with insert time across all shards.
@filimonov K-sortable identifiers
Yep. Using time-based identifiers is one of the possible workarounds/'limitations' (though it's not equal to 'autoincrement').
would need to touch ZK only after node reboots, to ensure the clock didn't go backwards, IIRC.
ntpd sounds much better (a one-time sync is always bad due to clock drift, etc.).
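For reference, a minimal sketch of what a K-sortable (snowflake-style) identifier might look like. The bit layout here (41 bits of milliseconds since a custom epoch, 10-bit node id, 12-bit per-millisecond sequence) is an assumption for illustration, not an existing ClickHouse feature; a real implementation must also handle sequence overflow and clock regression.

```python
import time

class KSortableId:
    """Sketch of a time-ordered (k-sortable) id generator.

    Layout (assumed): 41-bit ms timestamp | 10-bit node id | 12-bit sequence.
    Ids sort roughly by insertion time without any cross-node coordination.
    """
    EPOCH_MS = 1_500_000_000_000  # arbitrary custom epoch

    def __init__(self, node_id):
        self.node_id = node_id & 0x3FF
        self.last_ms = -1
        self.seq = 0

    def next_id(self):
        ms = int(time.time() * 1000) - self.EPOCH_MS
        if ms == self.last_ms:
            # Same millisecond: bump the sequence (a real impl would
            # wait for the next ms on overflow instead of wrapping).
            self.seq = (self.seq + 1) & 0xFFF
        else:
            self.last_ms, self.seq = ms, 0
        return (ms << 22) | (self.node_id << 12) | self.seq

gen = KSortableId(node_id=1)
a, b = gen.next_id(), gen.next_id()  # b sorts after a
```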
In multi-shard they could be implemented only with cache.
Theoretically it's doable with ZooKeeper, though a bit tricky. I.e., each insert reads the current value of the counter (N), does an atomic increment by the number of records (M), so the counter gets the value N+M, and writes the column with values [N..N+M-1]. If something goes wrong during the insert, a 'hole' in the numbering will appear.
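The N..N+M-1 scheme above can be sketched like this, with a lock-protected in-process counter standing in for the ZooKeeper-backed one (the class and function names are invented for the example):

```python
import threading

class AtomicCounter:
    """Stand-in for a ZooKeeper-backed counter: fetch_add(m) atomically
    reserves m values and returns the start of the reserved range."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, m):
        with self._lock:
            start = self._value
            self._value += m
            return start

def insert_block(counter, rows):
    """Reserve len(rows) ids [N .. N+M-1] for one insert block.
    If the insert fails after the reservation, the range becomes a 'hole'."""
    n = counter.fetch_add(len(rows))
    return list(zip(range(n, n + len(rows)), rows))

c = AtomicCounter()
block1 = insert_block(c, ["a", "b", "c"])  # ids 0..2
block2 = insert_block(c, ["d", "e"])       # ids 3..4
```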
Currently inserts to Replicated tables also require ZooKeeper, so it would require ZooKeeper anyway.
In general, I agree that auto-increments sound like an unnecessary complication.
It's possible to get some magic here if we generalize Summing and Replacing merge trees. Then some columns would be summing and others would be replacing. So, if we always insert 1 into some summing column, it would be incremented by merges. Of course, you need SELECTs with FINAL there.
So, if we always insert 1 into some summing column, it would be incremented by merges.
But that would require reducing the number of rows, right? With auto-increment the idea is to preserve all rows, giving each a unique number.
BTW (while talking about workarounds): did anybody try something like `xxx UInt64 DEFAULT now()*1000000 + rowNumberInBlock()`? :D
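A small sketch of what that `now()*1000000 + rowNumberInBlock()` workaround produces: ids are unique within an insert block and roughly time-ordered across blocks, but two blocks inserted in the same second can collide. The helper name is invented; a fixed microsecond timestamp is passed in so the output is deterministic.

```python
import time

def block_default_ids(block_size, now_us=None):
    """Mimic `DEFAULT now()*1000000 + rowNumberInBlock()`:
    base = insert time in microseconds, plus the row's offset in the block.
    Unique within one block; blocks landing in the same second collide."""
    base = now_us if now_us is not None else int(time.time()) * 1_000_000
    return [base + i for i in range(block_size)]

ids = block_default_ids(3, now_us=1_700_000_000 * 1_000_000)
```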
But that will require reducing number of rows, right?
Yup, it's auto-increment + unique PK. For real auto-increment with non-unique data we need a timestamp + a more complex aggregating function. But it's still possible if we have the same sorting order each time (do not allow inserts in the past).
For real autoincrement with not uniq data we need timestamp + more complex aggregating function.
I don't think that's a good way. The simplest solution is @nvartolomei's proposal: create some function like ksuid() (see) or similar, and use it as a DEFAULT.
My point is that it's possible to implement auto-increment as selected information, not as an index. And I agree that it's not what users generally want. The feature request is for an auto-incremented column in the index, and that will probably never be implemented. But if someone needs auto-increment outside of the index, there are some solutions.
And it probably will not be implemented ever.
Actually, @andyyzh is a contributor. I'm not sure whether he plans to add some variant of auto-increment? :)
Just a thought: it looks like it could be possible to implement auto-increment in the same manner as MySQL does:
- Support autoincrement_increment and autoincrement_offset settings in order to avoid intersections between shards
- For each table, keep an atomic counter for new data. When the server restarts, try to find the maximum value in the existing data
That way, if inserts are not balanced, increments on one shard/replica can fall far behind the others.
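The increment/offset scheme above can be sketched as follows (modeled on MySQL's `auto_increment_increment`/`auto_increment_offset`; the function name is invented): shard k uses offset k with increment equal to the shard count, so the id streams interleave without colliding, but a slow shard's ids lag behind.

```python
def shard_ids(offset, increment, count):
    """Emit `count` ids for one shard: offset, offset+increment, ...
    With increment = number of shards and distinct offsets, the
    shards' id streams are disjoint by construction."""
    return [offset + i * increment for i in range(count)]

shard1 = shard_ids(offset=1, increment=2, count=4)  # 1, 3, 5, 7
shard2 = shard_ids(offset=2, increment=2, count=4)  # 2, 4, 6, 8
```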
Just thoughts, looks like it can be possible to implement auto increment with the same manner as MySQL does
Yes, it's totally possible.
The only drawback is that even on a single replica, INSERTs may go out of order (first the data is written to the filesystem, then it is committed, and multiple INSERTs can be processed in parallel), and an INSERT may be rejected (its interval of numbers will be skipped and left unused).
If you are ok with this proposal, it's easy to do.
Distributed auto-increment using ZK.
`create distributed sequence x_id cache 1billion`
In ZK, x_id = 0.
A host books a billion values (x_id += 1b) and uses this reservation until it is exhausted, then books the next block. Holes of up to the cache size are possible, but that does not matter.
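A minimal sketch of that cached-sequence idea, with a plain dict standing in for the ZK counter and a tiny cache size so the second booking is visible (the `create distributed sequence ... cache N` syntax above is hypothetical; so are the class and key names here):

```python
class CachedSequence:
    """Each host reserves a block of `cache` ids from the coordinator
    with a single atomic increment, then serves ids locally from that
    block. Unused ids in an abandoned block become holes, which is fine."""
    def __init__(self, coordinator, cache=1_000_000_000):
        self.coordinator = coordinator  # dict standing in for ZK
        self.cache = cache
        self.next = self.limit = 0      # empty local block

    def next_id(self):
        if self.next >= self.limit:     # block exhausted: book the next one
            start = self.coordinator.get("x_id", 0)
            self.coordinator["x_id"] = start + self.cache
            self.next, self.limit = start, start + self.cache
        v = self.next
        self.next += 1
        return v

zk = {}
host = CachedSequence(zk, cache=5)
ids = [host.next_id() for _ in range(7)]  # 0..6; second block booked at 5
```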
nice!
Hey team. I vote for @alex-krash's suggestion!