Clickhouse: Clarify docs about key manipulation

Created on 14 Feb 2019 · 7 Comments · Source: ClickHouse/ClickHouse

Can you please clarify documentation of sorting key? https://clickhouse.yandex/docs/en/operations/table_engines/mergetree/#choosing-the-primary-key-that-differs-from-the-sorting-key

ALTER of the sorting key is a lightweight operation because when a new column is simultaneously added to the table and to the sorting key data parts need not be changed (they remain sorted by the new sorting key expression).

I'm not sure what it means because the English there is a bit confusing:

  1. when a new column is simultaneously added to the table
  2. and to the sorting key
  3. data parts need not be changed
  4. (they remain sorted by the new sorting key expression).

Ad 3.: Shouldn't it be "data parts don't need to be changed"?

Ad 4.: When I add a new column to the sorting key, I suppose the result is a new sorting key.
So how is it that data parts are not changed, and how can they remain sorted by the new sorting key? Shouldn't it be the old sorting key? I guess they cannot remain sorted by the new sorting key, as they were sorted by the old sorting key before the operation.


A second question: Do I understand correctly that for SummingMergeTree the sorting key defines how values are summed, and that I can then use the partitioning key and the primary key for more efficient filtering? Am I right?

Thanks a lot!

question

All 7 comments

Check this note in docs

And have a closer look at this test:
https://github.com/yandex/ClickHouse/blob/cec49357da502c511e006ea8b5d8fbffeb521478/dbms/tests/queries/0_stateless/00754_alter_modify_order_by.sql#L11-L25

Especially pay attention to:
https://github.com/yandex/ClickHouse/blob/cec49357da502c511e006ea8b5d8fbffeb521478/dbms/tests/queries/0_stateless/00754_alter_modify_order_by.sql#L25

All of this says that the sorting key must be extended at the same time as the new column is added.

Re-sorting existing rows is unnecessary because existing rows can't contain any values from the new column.

Good to know! This should be IMO added to the docs in some form.

But still, this sentence is wrong I think, it makes no sense:

data parts need not be changed (they remain sorted by the new sorting key expression).

Shouldn't it be like this?

- data parts need not be changed (they remain sorted by the new sorting key expression).
+ existing data parts do not need to be changed (they remain sorted by the old sorting key expression).

@simPod Since the old sorting key is a prefix of the new sorting key, and there is no data in the just-added column, the data at the moment of table modification is sorted by both the old and the new sorting key and hence won't need any modification.

I also needed a moment to understand the documentation in this regard, but it makes sense.
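The prefix argument above can be checked with a small sketch. This is plain Python, not ClickHouse code: the rows, the column names `x`, `y`, `z`, and the default value `0` are all made up for illustration, and a list of tuples stands in for a data part.

```python
# Hypothetical rows of one data part, already sorted by the old sorting key (x, y).
rows = [(1, 1), (1, 2), (1, 2), (2, 1), (2, 3)]
assert rows == sorted(rows)

# Adding column z: every existing row gets the same default value.
# (0 is an assumed default, chosen only for this illustration.)
DEFAULT_Z = 0
extended = [(x, y, DEFAULT_Z) for x, y in rows]

# Sorting by the new key (x, y, z) leaves every row where it was,
# because z is identical in all existing rows and cannot break any tie differently.
already_sorted = extended == sorted(extended)
print(already_sorted)
```

Because the new column is constant across all existing rows, comparing by `(x, y, z)` gives the same order as comparing by `(x, y)`, so nothing has to be rewritten.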

I think the word "remain" contributes to the difficulty of understanding this section. I would rephrase it to say something along the lines of "they already are sorted by the new sorting key expression".

@arctica yes, that makes sense! Something like

- data parts need not be changed (they remain sorted by the new sorting key expression).
+ existing data parts do not need to be changed. The new sorting key is an extension of the old sorting key, and therefore the data is already sorted by the new sorting key expression.

@simPod, do you have any further questions?

@blinkov Hi, yes. WDYT of my last comment?

@simPod I see that the documentation was edited according to your suggestions.
