Clickhouse: Slow starting up speed on servers with almost the same configuration and less data

Created on 7 Apr 2020 · 7 comments · Source: ClickHouse/ClickHouse

I've upgraded my ClickHouse to version 20.3.5.
There are two different clusters, A and B.
ClickHouse on all nodes shares the same config.xml and users.xml.
Nodes in these two clusters have the same amount of memory.
Nodes in cluster A have 2 physical CPUs and 48 logical cores.
Nodes in cluster B have 2 physical CPUs and 20 logical cores.

Data on cluster A's nodes is about 4 times that on cluster B's nodes, and it reaches the TB level.
Now when starting ClickHouse, a node in A takes only about 1 minute, while a node in B takes about 15-20 minutes before we can connect to it using clickhouse-client.

In general, I could understand a node in A starting faster than one in B, but the gap shouldn't be this large.

So, what could be the reason? Is there some config that could improve startup performance?
Thanks~

performance

Most helpful comment

Yes. Probably it's because of the number of parts (44000) combined with only 20 logical cores.
CH iterates through 44000 parts using 20 cores.

Try setting the MergeTree setting max_part_loading_threads (auto = 20 cores) to 48.

config.xml

    <merge_tree>
        <max_part_loading_threads>48</max_part_loading_threads>
    </merge_tree>

All 7 comments

The question is the number of parts in both systems. Check the count in system.parts.
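
As an illustration (my own sketch, not part of the original comment), the counts can also be broken down per table to see which tables contribute the most parts; system.parts and its active flag are standard ClickHouse system objects:

    -- total number of parts on this node (as used in the thread)
    SELECT count() FROM system.parts;

    -- per-table breakdown of active parts, largest contributors first
    SELECT database, table, count() AS parts
    FROM system.parts
    WHERE active
    GROUP BY database, table
    ORDER BY parts DESC
    LIMIT 20;

Running the same queries on a node from cluster A and a node from cluster B makes the comparison easy.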

@filimonov I've checked the count of parts in these two clusters with select count() from system.parts.

Nodes in cluster A have about 16000 parts, and nodes in B have about 44000 parts.
Most tables in A are partitioned by month, and in B by day. I think that's the reason A has fewer parts.

But is the count of parts really the core cause of the slowness? I mean, B's parts are only about 3 times A's.

The number of columns also matters. Check select count() from system.parts_columns for A and B.
Do you use AWS EBS? It can be that IOPS are limited to 500 (EBS gp2).
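
As an aside (an illustrative sketch of mine, not from the comment above), system.parts_columns can be grouped the same way to see which tables account for most part-column entries:

    -- part*column entries per table; system.parts_columns has one row
    -- per column of each data part
    SELECT database, table, count() AS part_columns
    FROM system.parts_columns
    WHERE active
    GROUP BY database, table
    ORDER BY part_columns DESC
    LIMIT 20;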

The number of columns also matters. Check select count() from system.parts_columns for A and B

@den-crane I've checked: nodes in A have about 410,000 and nodes in B have about 1,000,000.

So the slowness is the result of a combination of factors?

Do you use AWS EBS? It can be that IOPS are limited to 500 (EBS gp2).

No, AWS EBS is not used.

Yes. Probably it's because of the number of parts (44000) combined with only 20 logical cores.
CH iterates through 44000 parts using 20 cores.

Try setting the MergeTree setting max_part_loading_threads (auto = 20 cores) to 48.

config.xml

    <merge_tree>
        <max_part_loading_threads>48</max_part_loading_threads>
    </merge_tree>
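
As a side note (not mentioned in the thread), after a restart the effective value can be verified through the system.merge_tree_settings table:

    -- shows the currently effective MergeTree setting
    SELECT name, value, changed
    FROM system.merge_tree_settings
    WHERE name = 'max_part_loading_threads';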

Try setting the MergeTree setting max_part_loading_threads (auto = 20 cores) to 48.

@den-crane Thanks~, I've tried this and it does improve the startup time to 7-8 minutes.
So, what's the proper value for this setting? Does it depend on both the number of logical cores and the number of parts?

I also notice that while ClickHouse is starting, CPU utilization is not very high, only about 10% of all cores according to top. Does that mean ClickHouse doesn't use the CPU cores well, or is this just the normal ClickHouse load?

The default value max_part_loading_threads = auto (== logical cores) is set assuming that a system has a small number of large parts (<1000) and slow HDD disks. You can tune it for your particular system and set it as you need, e.g. ~ 8 / 64 / 96 / 333.
