Clickhouse: How the index file works

Created on 24 Oct 2019  路  1Comment  路  Source: ClickHouse/ClickHouse

I want to know How index work except for primary key.

When the tables is created, data is sorted by primary key. and primary.idx file is created which has information for granules.

so, when I send a select query with using primary key, the clickhouse find granules in the primary.idx and use (column).mrk2 to find data in the (column).bin. Do I understand right?

Then......

When I add an index using an alter table query, skp_idx_(index_name).idx and skp_idx_(index_name).mrk2 files are created. what roles of these files??

And I have to use 'optimize final' for using index right away.

However, These index files(.idx and .mrks) is created a little later. but I can use a new index by the way.

How these works??

question

Most helpful comment

Imagine you've created a table
create table T(A Int, B Int) order by A
insert into T values(1,1) --- first part no skip index
alter table add index min_max(B);
insert into T values(1,2) --- second part with skip index file with B_min_max=(2,2)

And you run a query select * from T where B>10000000
Ch unable to use primary index to filter rows because there is no conditions on column A.
Then CH will scan 2 parts. The first part does not have skip index file so CH will read column B and checks the condition B>10000000. In the contrary in the second part CH will use B_min_max=(2,2) index file and will skip a reading of column B because obviously this part does not have rows with B>10000000.
Later a merge process will merge these 2 parts to a new one with skip index B_min_max=(1,2).

>All comments

Imagine you've created a table
create table T(A Int, B Int) order by A
insert into T values(1,1) --- first part no skip index
alter table add index min_max(B);
insert into T values(1,2) --- second part with skip index file with B_min_max=(2,2)

And you run a query select * from T where B>10000000
Ch unable to use primary index to filter rows because there is no conditions on column A.
Then CH will scan 2 parts. The first part does not have skip index file so CH will read column B and checks the condition B>10000000. In the contrary in the second part CH will use B_min_max=(2,2) index file and will skip a reading of column B because obviously this part does not have rows with B>10000000.
Later a merge process will merge these 2 parts to a new one with skip index B_min_max=(1,2).

Was this page helpful?
0 / 5 - 0 ratings