Hello.
I have a small question about how the config setting "in_order" and the replicated table state "is_leader" correlate.
Example config for one shard:
<replica>
<host>ch1</host>
<port>9000</port>
</replica>
<replica>
<host>ch2</host>
<port>9000</port>
</replica>
As I understand it, the "in_order" setting lets us use the first server (ch1) as the main one for "SELECT" queries.
When I run the query "SELECT * FROM system.replicas WHERE table = 'table1'" on a table in this shard, I see that "is_leader" is set on the ch2 server.
So the question is: how do "in_order" and "is_leader" correlate? Is this normal, or is "in_order" perhaps counted from the end?
in_order defines a strategy for choosing which replica to query. I.e. when you send a SELECT to a Distributed table, it has to choose one replica from each shard to ask for data. in_order says that it should always prefer the first one listed in the config.
So it defines how the Distributed table works.
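For illustration, here is a minimal sketch of where this strategy is usually selected. It assumes the setting is placed in the "default" profile of users.xml; the profile name and file location are assumptions about your setup, not something stated above. The setting itself is load_balancing, and in_order is one of its values.

```xml
<!-- users.xml (assumed location): choose the replica selection strategy
     used by Distributed tables for queries run under this profile -->
<profiles>
    <default>
        <!-- prefer replicas in the order they are listed in the cluster config -->
        <load_balancing>in_order</load_balancing>
    </default>
</profiles>
```

The same setting can also be changed per session with SET load_balancing = 'in_order', which is handy for checking which replica actually serves a query.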
The leader in ClickHouse is related to how Replicated*MergeTree tables work. It is one of the replicas, chosen with the help of ZooKeeper, that decides which parts should be merged. You can forbid a certain replica from becoming the leader with the replicated_can_become_leader setting. The only thing the leader is responsible for is choosing which parts should be merged.
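As a sketch of the setting mentioned above, assuming you want it applied server-wide on one replica (for example ch1), it could go into the merge_tree section of that server's config.xml. Which host you apply it to is purely an example.

```xml
<!-- config.xml on the replica that should never become leader
     (assumed to be applied on one specific host only) -->
<merge_tree>
    <!-- 0 = this replica will not take part in leader election -->
    <replicated_can_become_leader>0</replicated_can_become_leader>
</merge_tree>
```

After a restart, the can_become_leader column in system.replicas should show the effect on that replica. Note that this only influences which replica schedules merges; it does not change which replica the Distributed table queries.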
So once again:
1) Replication can work without a Distributed table.
2) If you have multiple replicas of your table, one replica should be the leader, but you don't need to care about that: the leader doesn't do anything CPU-intensive and has no advantages for the end user. The leader only chooses which parts should be merged.
3) You can select from / insert into any replica. You don't need to care who the leader is.
4) If the leader replica goes away, another replica becomes the leader automatically.
5) A Distributed table can work as a load balancer between multiple replicas and can decide which replica to ask. There are different strategies for that; one of them is 'in_order'.
Great.
Thanks for the detailed explanation.