When table is created, the table is sorted by primary key. so, primary key index can easily find data.
for example,
CREATE TABLE IF NOT EXISTS table1 on cluster1
(
CounterID UInt32,
Date Date,
Category String
) ENGINE = MergeTree()
ORDER BY (CounterID)
The data will be sorted by CounterID. so, when the CH access granules(.idx), It can easily find the data(.bin)
but, If I add index like ALTER TABLE table1 ADD INDEX idx1 (Category) TYPE set(0) GRANULARITY 1;, I do not know How to make index. Because the data already is sorted by CounterID.
Does the Clickhouse re-sort the data to make granule`s info? or uses the data already sorted by primary key?
related: #329
In ClickHouse, the data is only stored in order of the order by clause and primary key info is generated based on the primary key clause(default same to order by clause).
When executing query, the primary key index always works on the granules level (filtering or retaining a granule).
For the skipping index, it also works on the granules level(secondary filtering after the primary key has been filtered). It stores information that helps filter granule
without changing the order of the data store. For example, the minmax index always records the minimum and maximum value for each granule.
but, If I add index like ALTER TABLE table1 ADD INDEX idx1 (Category) TYPE set(0) GRANULARITY 1;, I do not know How to make index. Because the data already is sorted by CounterID.
Index information already exists for the newly written data.
For previously written data, you need to wait for the merge task(which is asynchronous) updates index information,
but you can also use the following statement to update(synchronously) the index information for all data.
sql
OPTIMIZE TABLE xxx FINAL;
Does the Clickhouse re-sort the data to make granule`s info? or uses the data already sorted by primary key?
As far as I know, modifying the order by clause is not currently supported. I remember there was an issue about it. but I couldn't find it. @alexey-milovidov
Thanks for information. but, if you find the answer of below question,
Does the Clickhouse re-sort the data to make granule`s info? or uses the data already sorted by primary key?
Please comment on this issue.
Does the Clickhouse re-sort the data to make granule`s info? or uses the data already sorted by primary key?
ClickHouse only sorts data by order key (primary key). And the granules of secondary indices cover existing ranges of primary key.
As far as I know, modifying the order by clause is not currently supported. I remember there was an issue about it.
There was support for extending ORDER BY key and adding a new column simultaneously. As far as I remember, it was implemented only for non-replicated MergeTree tables by feature request of one internal user and was removed.
Most helpful comment
related: #329
In ClickHouse, the data is only stored in order of the
order by clauseand primary key info is generated based on theprimary key clause(default same toorder by clause).When executing query, the
primary key indexalways works on the granules level (filtering or retaining a granule).For the
skipping index, it also works on the granules level(secondary filtering after the primary key has been filtered). It stores information that helps filter granulewithout changing the order of the data store. For example, the
minmax indexalways records the minimum and maximum value for each granule.Index information already exists for the newly written data.
For previously written data, you need to wait for the
merge task(which is asynchronous) updates index information,but you can also use the following statement to update(synchronously) the index information for all data.
sql OPTIMIZE TABLE xxx FINAL;As far as I know, modifying the
order by clauseis not currently supported. I remember there was an issue about it. but I couldn't find it. @alexey-milovidov