Hello,
I've set up ClickHouse with 3 replicas for each table and wrote an API for test purposes that queries CH. CH is supposed to distribute the queries across the replicas, but it does not work as expected: it only queries the CH node the API is connected to. I checked users.xml and changed load_balancing from 'random' to 'in_order', but it still did not work. By the way, the data is being replicated correctly, there is no problem with that.
I can see the replicas when I query 'select * from system.clusters'; that looks okay as well.
Here are my CREATE TABLE queries:
CREATE TABLE t1 on cluster 'c1' (cust UInt32, tt UInt32) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/t1', '{replica}') ORDER BY cust
CREATE TABLE t1_dist on cluster 'c1' as t1 ENGINE = Distributed(c1, default, t1, rand());
Here is my select query:
select cust, count() as cnt
from t1_dist
where tt = 100
group by cust
Do I have to make any specific configuration for CH to distribute the queries?
did you try set prefer_localhost_replica = 0 ?
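For example, a minimal sketch on the clickhouse-client CLI, session-level and query-level (the query just reuses the columns from above):

-- applies to all queries in the current client session
SET prefer_localhost_replica = 0;

-- applies to this statement only
SELECT cust, count() AS cnt
FROM t1_dist
WHERE tt = 100
GROUP BY cust
SETTINGS prefer_localhost_replica = 0;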
I set prefer_localhost_replica to 0 as you suggested, but still no luck. I also tried setting 'load_balancing' in users.xml back to 'random', but that did not work either.
Okay, so I've set prefer_localhost_replica to 0 on the CLI, but it does not seem to save the setting. I closed the CLI and re-opened it to double check, and saw that 'prefer_localhost_replica' had reverted to its default.
I have solved the issue by adding the <prefer_localhost_replica>0</prefer_localhost_replica> line to users.xml. I will open a new issue about SET prefer_localhost_replica = 0 only applying to the current session, if that is not the expected behaviour.
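For anyone hitting the same thing, a minimal sketch of where the setting can live in users.xml (assuming the default profile; your profile names and file layout may differ):

<profiles>
    <default>
        <!-- don't shortcut to the local replica; pick one according to load_balancing -->
        <prefer_localhost_replica>0</prefer_localhost_replica>
        <load_balancing>random</load_balancing>
    </default>
</profiles>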
It's a bad idea to send all queries through one server. You should use some load balancer in front of CH and query all nodes (round-robin or random). Try ha-proxy or ch-proxy (https://github.com/Vertamedia/chproxy).
It's bad for high availability.
It's bad for performance because your entrance node re-sends the query to another node, downloads the query result, and re-sends that result -- double network overhead.
It's bad for performance because your entrance node -- the initiator -- does all the final aggregation of the shards' results, so the initiator is overloaded in CPU/RAM compared with all the other nodes.
(Though sometimes it makes sense, if one node has more RAM than the others.)
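For reference, a minimal ha-proxy sketch of that idea (hostnames and the HTTP port 8123 are assumptions, adjust to your cluster):

# round-robin ClickHouse's HTTP interface across all three nodes
listen clickhouse_http
    bind *:8123
    mode tcp
    balance roundrobin
    server ch1 ch1.example.com:8123 check
    server ch2 ch2.example.com:8123 check
    server ch3 ch3.example.com:8123 check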
Yes, we are thinking about using ha-proxy, but I don't want to get to that stage without making sure of the cluster & replica setup. Thanks for the warning though. By the way, round-robin is the 'in_order' value of the load_balancing setting, right?
No. https://clickhouse.yandex/docs/en/operations/settings/settings/#load_balancing-in_order
Replicas are accessed in the same order as they are specified in the configuration:
<replica>
<host>first</host><port>9000</port>
</replica>
<replica>
<host>second</host><port>9000</port>
</replica>
So you mean I should not let CH distribute the queries, but instead query the nodes round-robin using ha-proxy or ch-proxy, so I will be querying the local tables instead of the distributed ones. Did I get you right?
By the way, at the moment I am having another issue: when I restart a CH replica under load, it does not resume receiving queries and acts like a cold replica. It only resumes receiving queries if I restart CH on the server my API is connected to. Weird.
I think this is being fixed with #5317 now. It is not a bug; it just takes some time for the replica to get back online again because of the error count.
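You can watch the counters behind this, e.g. (a sketch; the exact column set depends on your CH version):

SELECT host_name, errors_count, estimated_recovery_time
FROM system.clusters
WHERE cluster = 'c1'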
So you mean I should not let CH distribute the queries, but instead query the nodes round-robin using ha-proxy or ch-proxy, so I will be querying the local tables instead of the distributed ones. Did I get you right?
Right.
By the way, at the moment I am having another issue: when I restart a CH replica under load, it does not resume receiving queries and acts like a cold replica. It only resumes receiving queries if I restart CH on the server my API is connected to. Weird.
Yes, this is expected behavior. You can execute 'touch cluster_config.xml' to reset the error counters until #5317 is implemented.
I see, so when I restart the ClickHouse server on the leader node it resets the error counters as well, because it gets fixed that way too. Thanks a lot for your time man, I appreciate it!
So you mean I should not let CH distribute the queries, but instead query the nodes round-robin using ha-proxy or ch-proxy, so I will be querying the local tables instead of the distributed ones. Did I get you right?
Right.
By the way, about this one: I won't have only replicas, I will have shards too. So if I don't let CH distribute the queries and query the local tables manually, I will have to query all the shards and merge the data myself whenever I have more than one shard. So it looks like this is not a good way to go if you have shards along with the replicas, right?
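To illustrate the difference (a sketch, assuming the t1 / t1_dist tables above and more than one shard):

-- only sees the data on the shard local to the node you are connected to
SELECT cust, count() AS cnt FROM t1 WHERE tt = 100 GROUP BY cust

-- fans out to every shard of 'c1' and merges the partial results for you
SELECT cust, count() AS cnt FROM t1_dist WHERE tt = 100 GROUP BY cust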
How to set load_balancing depends on your cluster design.
I use load_balancing = nearest_hostname because I have a geo-cluster: the replicas are far (in distance) from each other, with high latency and low bandwidth.
Shards have similar host names inside a DC,
and nearest_hostname lets queries go (preferably) to the local (current-DC) shard nodes.
If a local shard/replica fails, the distributed table requests a replica from another DC.
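For completeness, a sketch of pinning that in users.xml (again assuming the default profile):

<profiles>
    <default>
        <load_balancing>nearest_hostname</load_balancing>
    </default>
</profiles>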