Clickhouse: ON CLUSTER doesn't work with hosts in multiple shards

Created on 25 Sep 2017 · 11 Comments · Source: ClickHouse/ClickHouse

Queries across the whole cluster don't work when a host belongs to multiple shards:

DROP TABLE test ON CLUSTER 'cluster1'

click1  9000    371 An error occured before execution: Code: 371, e.displayText() = DB::Exception: There are two exactly the same ClickHouse instances click1:9000 in cluster cluster1, e.what() = DB::Exception    32  0

The host indeed has three replicas of different shards:

SELECT *
  FROM system.clusters
 WHERE host_name = 'click1'


┌─cluster──┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┐
│ cluster1 │        28 │            1 │           1 │ click1    │ 10.0.10.100  │ 9000 │        0 │ default │ r0               │
│ cluster1 │        29 │            1 │           2 │ click1    │ 10.0.10.100  │ 9000 │        0 │ default │ r1               │
│ cluster1 │        30 │            1 │           3 │ click1    │ 10.0.10.100  │ 9000 │        0 │ default │ r2               │
└──────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┘

There's really no reason for this check to exist. If you want to make sure it's not a configuration error, perhaps the uniqueness check should use host_name:port/default_database instead?
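
As a rough sketch of that suggestion (not the actual check in the code), a duplicate-detection query keyed on host_name, port and default_database would report no conflict for the layout above, since the three entries differ in default_database:

SELECT host_name, port, default_database, count() AS entries
  FROM system.clusters
 WHERE cluster = 'cluster1'
 GROUP BY host_name, port, default_database
HAVING entries > 1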

cc @ludv1x

comp-dddl enhancement

All 11 comments

Hi!
Indeed, to keep the first distributed DDL version simple, I intentionally skipped the case where a server hosts several shards in different databases. I thought it was not a widespread case.

It is not complicated to add support for this case.

Ah I see, supporting this use case would be wonderful 👌

So my understanding is that only ALTER queries for replicated tables are problematic, since some of them must be executed only on the leader replica, and some must be executed on any single replica.

For CREATE | ATTACH | DROP | DETACH, the query should be executed on distinct server instances regardless of how many shards they have. For example, in my case I have 3 shards in databases r{0,1,2}.
If I write:

DROP TABLE r0.test ON CLUSTER 'cluster1'

I expect r0.test to be dropped on all hosts, and I don't expect other replicas or shards to be affected.
If the database isn't specified, I'd expect it to be the current database, instead of expanding to default_database on each host.

If this is your planned behavior, then I'd be happy to submit a PR. Or did you plan to execute queries without an explicitly stated database against each host_name:port/default_database instead (just like a Distributed table executes them)?
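
For reference, a minimal sketch of the layout described in this comment (the table definition, ZooKeeper path and engine parameters are illustrative, not taken from the issue): each database r0, r1, r2 holds this node's replica of one shard, and the expectation is that the ON CLUSTER drop only touches the named database on every host:

-- one database per shard replica hosted on this node
CREATE TABLE r0.test
(
    d Date,
    x UInt64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/28/test', 'replica_1', d, (x), 8192)

-- should drop only the r0 copy on every host of the cluster
DROP TABLE r0.test ON CLUSTER 'cluster1'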

We also have multiple shards and host a replica of each shard on every node, which basically makes ON CLUSTER requests unusable...

Some related discussion from the Telegram channel:

Alessandro Rizzo:
OK. When creating ReplicatedMergeTree tables, I need to set the ZooKeeper path and replica name.
But with 3 shards across 3 nodes and 2 replicas of each shard, I will have a structure like this:

node1: shard1replica1, shard3replica2
node2: shard2replica1, shard1replica2
node3: shard3replica1, shard2replica2

Does the replica name in the macros "break" with that?
Because I configured the macros on each node like this:
shard 01
replica 01

shard 02
replica 03

shard 03
replica 03

Does it make sense?

Ivan Blinkov:
There was some workaround to do it like this, but unfortunately I can't reproduce it off the top of my head (probably someone else in the chat can).

It's easier to just use 6 nodes to avoid this colocation.

Alex Zatelepin:
That's called "cross replication". We have a test with sample configuration for this case: https://github.com/yandex/ClickHouse/blob/master/dbms/tests/integration/test_cross_replication/test.py

Alexander Zaitsev:
You may look at this example as well: https://www.altinity.com/blog/2018/5/10/circular-replication-cluster-topology-in-clickhouse

Mikhail Filimonov:
Currently it's a bit too tricky; maybe it would be better to make it a bit more straightforward before making those recommendations 'official'? For example, it would be much clearer if some macro values could be extracted from the database name and hostname, i.e. database = test_myshard1 and macros like {shard_name} = {DATABASE:5:8} and {replica} = {HOSTNAME}; afterwards the macros would be used in the table definition, and ON CLUSTER would work. It should also be quite easy to implement, and it would then be easy to configure ClickHouse for usage like this.
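
For context, a minimal sketch of the usual macro-based table definition being discussed (names and paths are illustrative): with node-level macros there is a single {shard} value per server, which is exactly what breaks down when one server hosts replicas of several shards, hence the per-database workarounds above:

-- show the macro values configured on this node
SELECT * FROM system.macros

-- a typical replicated table relying on node-level macros
CREATE TABLE test_local
(
    d Date,
    x UInt64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test_local', '{replica}', d, (x), 8192)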

Can we expect this feature in the near future?

We are also facing this issue. Does anyone know of any plans for a fix?

Same issue. We have to truncate some tables periodically, and it would be much more convenient if our apps didn't have to know anything about the cluster's shards and their tables - only the cluster name.
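
For example (database and table names are hypothetical), the kind of statement that should work here without the application knowing anything about individual shards is:

TRUNCATE TABLE db.events ON CLUSTER 'cluster1'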

I have finished a patch to support distributed DDL (currently, only delete/update/drop partition operations have been tested) on cross-replication clusters: distributed-alter-query. For distributed DDL on cross-replication clusters, the syntax should be similar to the following "drop" statement:
"alter table db1.tab1_local on cluster cluster1 drop partition '2018-01-01';"

On other clusters, the queries remain the same as before.

Fixed in #11508 & #11703
