ClickHouse: How to configure multiple layers for a single cluster?

Created on 18 Feb 2019 · 1 Comment · Source: ClickHouse/ClickHouse

I've read the documentation about layers.

doc: https://clickhouse.yandex/docs/en/operations/table_engines/distributed/

A large number of servers is used (hundreds or more) with a large number of small queries (queries of individual clients - websites, advertisers, or partners). In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, as we've done in Yandex.Metrica, you can set up bi-level sharding: divide the entire cluster into "layers", where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. Distributed tables are created for each layer, and a single shared distributed table is created for global queries.

If I want to configure multiple layers, do I set that up in config.xml?

How to configure multiple layers for a single cluster?

What are the arguments for the Distributed table engines for the specific layer?


Most helpful comment

Say you have 400 servers in your cluster; divide them into 20 sub-clusters of 20 servers each.
Servers 1..20 will be members of sub_cluster_l1 and, at the same time, members of the super-cluster (which contains all 400 servers).
Put the data of clients with names A.. into sub_cluster_l1 and clients with names Z.. into sub_cluster_l20.
Create 21 Distributed tables: one for each of the 20 sub-clusters and one for the super-cluster.
If you need to select data for client A.., point your query to the Distributed table for sub_cluster_l1.
If you need to select data across all clients, point your query to the Distributed table for the super-cluster.
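
To make the layout above concrete, here is a rough Python sketch that generates a `<remote_servers>` section (the part of the server configuration where clusters are declared) with 20 sub-clusters plus one cluster containing every server. The host names (`ch-server-001` and so on), the cluster name `super_cluster`, port 9000, and one replica per shard are all assumptions for illustration, not something stated in this thread.

```python
import xml.etree.ElementTree as ET

SERVERS_TOTAL = 400
LAYERS = 20
PER_LAYER = SERVERS_TOTAL // LAYERS  # 20 servers in each sub-cluster


def host_name(n):
    # Hypothetical naming scheme; substitute your real host names.
    return f"ch-server-{n:03d}"


def add_cluster(parent, name, hosts):
    # One shard per host, one replica per shard (an assumption of this sketch).
    cluster = ET.SubElement(parent, name)
    for host in hosts:
        shard = ET.SubElement(cluster, "shard")
        replica = ET.SubElement(shard, "replica")
        ET.SubElement(replica, "host").text = host
        ET.SubElement(replica, "port").text = "9000"  # assumed native-protocol port
    return cluster


def build_remote_servers():
    root = ET.Element("remote_servers")
    all_hosts = [host_name(n) for n in range(1, SERVERS_TOTAL + 1)]
    # sub_cluster_l1 gets servers 1..20, sub_cluster_l2 gets 21..40, and so on.
    for layer in range(1, LAYERS + 1):
        start = (layer - 1) * PER_LAYER
        add_cluster(root, f"sub_cluster_l{layer}", all_hosts[start:start + PER_LAYER])
    # Every server is also listed in the cluster that spans all layers.
    add_cluster(root, "super_cluster", all_hosts)
    return root


if __name__ == "__main__":
    root = build_remote_servers()
    print(len(root), "clusters defined")
```

The same host appears twice in the generated XML: once in its sub-cluster and once in `super_cluster`. Serialize the result with `ET.tostring(root, encoding="unicode")` and place it inside the root element of your server configuration.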

What are the arguments for the Distributed table engines for the specific layer?

Just create several Distributed tables, each with a different cluster specified.
A ClickHouse server (and therefore the tables on it) can be a member of several clusters.
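
As a sketch of what "several Distributed tables with different clusters" might look like, the Python snippet below builds the 21 `CREATE TABLE` statements. The engine arguments follow the documented form `Distributed(cluster, database, table[, sharding_key])`. The table names (`hits_local`, `hits_l1`, `hits_all`), the `default` database, and the cluster names are hypothetical. The per-layer tables use `rand()` as the sharding key to match the "randomly distributed" wording in the docs; the global table omits the key on the assumption that it is only used for reads.

```python
LAYERS = 20


def distributed_ddl(table, cluster, database="default",
                    local_table="hits_local", sharding_key=None):
    # Engine arguments: cluster name, database, local table, optional sharding key.
    args = [cluster, database, local_table]
    if sharding_key is not None:
        args.append(sharding_key)
    return (
        f"CREATE TABLE {database}.{table} AS {database}.{local_table} "
        f"ENGINE = Distributed({', '.join(args)})"
    )


def all_ddl():
    # One Distributed table per layer, spreading rows randomly over that layer's shards.
    stmts = [
        distributed_ddl(f"hits_l{n}", f"sub_cluster_l{n}", sharding_key="rand()")
        for n in range(1, LAYERS + 1)
    ]
    # One shared table over the whole cluster, intended here for global SELECTs.
    stmts.append(distributed_ddl("hits_all", "super_cluster"))
    return stmts


if __name__ == "__main__":
    for stmt in all_ddl():
        print(stmt + ";")
```

Queries for a single client then go to that client's layer table (for example `hits_l1`), and queries across all clients go to `hits_all`.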
