Clickhouse: If I add index, How does the Clickhouse make granules?

Created on 29 Oct 2019  路  4Comments  路  Source: ClickHouse/ClickHouse

When table is created, the table is sorted by primary key. so, primary key index can easily find data.

for example,

CREATE TABLE IF NOT EXISTS table1 on cluster1
(
    CounterID UInt32,
    Date Date,
    Category String
) ENGINE = MergeTree()
ORDER BY (CounterID)

The data will be sorted by CounterID. so, when the CH access granules(.idx), It can easily find the data(.bin)

but, If I add index like ALTER TABLE table1 ADD INDEX idx1 (Category) TYPE set(0) GRANULARITY 1;, I do not know How to make index. Because the data already is sorted by CounterID.

Does the Clickhouse re-sort the data to make granule`s info? or uses the data already sorted by primary key?

question question-answered

Most helpful comment

related: #329

In ClickHouse, the data is only stored in order of the order by clause and primary key info is generated based on the primary key clause(default same to order by clause).
When executing query, the primary key index always works on the granules level (filtering or retaining a granule).

For the skipping index, it also works on the granules level(secondary filtering after the primary key has been filtered). It stores information that helps filter granule
without changing the order of the data store. For example, the minmax index always records the minimum and maximum value for each granule.

but, If I add index like ALTER TABLE table1 ADD INDEX idx1 (Category) TYPE set(0) GRANULARITY 1;, I do not know How to make index. Because the data already is sorted by CounterID.

Index information already exists for the newly written data.
For previously written data, you need to wait for the merge task(which is asynchronous) updates index information,
but you can also use the following statement to update(synchronously) the index information for all data.

sql OPTIMIZE TABLE xxx FINAL;

Does the Clickhouse re-sort the data to make granule`s info? or uses the data already sorted by primary key?

As far as I know, modifying the order by clause is not currently supported. I remember there was an issue about it. but I couldn't find it. @alexey-milovidov

All 4 comments

related: #329

In ClickHouse, the data is only stored in order of the order by clause and primary key info is generated based on the primary key clause(default same to order by clause).
When executing query, the primary key index always works on the granules level (filtering or retaining a granule).

For the skipping index, it also works on the granules level(secondary filtering after the primary key has been filtered). It stores information that helps filter granule
without changing the order of the data store. For example, the minmax index always records the minimum and maximum value for each granule.

but, If I add index like ALTER TABLE table1 ADD INDEX idx1 (Category) TYPE set(0) GRANULARITY 1;, I do not know How to make index. Because the data already is sorted by CounterID.

Index information already exists for the newly written data.
For previously written data, you need to wait for the merge task(which is asynchronous) updates index information,
but you can also use the following statement to update(synchronously) the index information for all data.

sql OPTIMIZE TABLE xxx FINAL;

Does the Clickhouse re-sort the data to make granule`s info? or uses the data already sorted by primary key?

As far as I know, modifying the order by clause is not currently supported. I remember there was an issue about it. but I couldn't find it. @alexey-milovidov

Thanks for information. but, if you find the answer of below question,

Does the Clickhouse re-sort the data to make granule`s info? or uses the data already sorted by primary key?

Please comment on this issue.

Does the Clickhouse re-sort the data to make granule`s info? or uses the data already sorted by primary key?

ClickHouse only sorts data by order key (primary key). And the granules of secondary indices cover existing ranges of primary key.

As far as I know, modifying the order by clause is not currently supported. I remember there was an issue about it.

There was support for extending ORDER BY key and adding a new column simultaneously. As far as I remember, it was implemented only for non-replicated MergeTree tables by feature request of one internal user and was removed.

Was this page helpful?
0 / 5 - 0 ratings