Hello,
I've set up ClickHouse with 3 replicas for each table and wrote an API for test purposes that queries CH. CH is supposed to distribute the queries across the replicas, but it does not work as expected: it only queries the CH node the API is connected to. I checked users.xml and changed load_balancing from 'random' to 'in_order', but it still did not work. By the way, the data is being replicated correctly, there is no problem with that.
I can see the replicas when I query 'select * from system.clusters'; that looks okay as well.
Here are my CREATE TABLE queries:
CREATE TABLE t1 on cluster 'c1' (cust UInt32, tt UInt32) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/t1', '{replica}') ORDER BY cust
CREATE TABLE t1_dist on cluster 'c1' as t1 ENGINE = Distributed(c1, default, t1, rand());
Here is my select query:
select cust, count() as cnt
from t1_dist
where tt = 100
group by cust
Do I have to make any specific configuration for CH to distribute the queries?
did you try set prefer_localhost_replica = 0 ?
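For example, a minimal sketch on the clickhouse-client CLI, session-level and query-level (the query just reuses the columns from above):

-- applies to all queries in the current client session
SET prefer_localhost_replica = 0;

-- applies to this statement only
SELECT cust, count() AS cnt
FROM t1_dist
WHERE tt = 100
GROUP BY cust
SETTINGS prefer_localhost_replica = 0;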
I set prefer_localhost_replica to 0 as you suggested, but still no luck. I also tried setting 'load_balancing' in users.xml back to 'random', but that did not work either.
Okay, so I've set prefer_localhost_replica to 0 on the CLI, but it does not seem to save the setting. I closed the CLI and re-opened it to double check, and saw that 'prefer_localhost_replica' had reverted to its default.
I have solved the issue by adding the <prefer_localhost_replica>0</prefer_localhost_replica> line to users.xml. I will open a new issue about SET prefer_localhost_replica = 0 only applying to the current session, if that is not the expected behaviour.
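For anyone hitting the same thing, a minimal sketch of where the setting can live in users.xml (assuming the default profile; your profile names and file layout may differ):

<profiles>
    <default>
        <!-- don't shortcut to the local replica; pick one according to load_balancing -->
        <prefer_localhost_replica>0</prefer_localhost_replica>
        <load_balancing>random</load_balancing>
    </default>
</profiles>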
It's a bad idea to send all queries through one server. You should use some load balancer in front of CH and query all nodes (round-robin or random). Try ha-proxy or ch-proxy (https://github.com/Vertamedia/chproxy).
It's bad for high availability.
It's bad for performance because your entrance node re-sends the query to another node, downloads the query result, and re-sends that result -- double network overhead.
It's bad for performance because your entrance node -- the initiator -- does all the final aggregation of the shards' results, so the initiator is overloaded in CPU/RAM compared with all the other nodes.
(Though sometimes it makes sense, if one node has more RAM than the others.)
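For reference, a minimal ha-proxy sketch of that idea (hostnames and the HTTP port 8123 are assumptions, adjust to your cluster):

# round-robin ClickHouse's HTTP interface across all three nodes
listen clickhouse_http
    bind *:8123
    mode tcp
    balance roundrobin
    server ch1 ch1.example.com:8123 check
    server ch2 ch2.example.com:8123 check
    server ch3 ch3.example.com:8123 check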
Yes, we are thinking about using ha-proxy, but I don't want to get to that stage without making sure of the cluster & replica setup. Thanks for the warning though. By the way, round-robin is the 'in_order' value of the load_balancing setting, right?
No. https://clickhouse.yandex/docs/en/operations/settings/settings/#load_balancing-in_order
Replicas are accessed in the same order as they are specified in the configuration:
<replica>
<host>first</host><port>9000</port>
</replica>
<replica>
<host>second</host><port>9000</port>
</replica>
So you mean I should not let CH distribute the queries, but instead query the nodes round-robin using ha-proxy or ch-proxy, so I will be querying the local tables instead of the distributed ones. Did I get you right?
By the way, at the moment I am having another issue: when I restart a CH replica under load, it does not resume receiving queries and acts like a cold replica. It only resumes receiving queries if I restart CH on the server my API is connected to. Weird.
I think this is being fixed with #5317 now. It is not a bug; it just takes some time for the replica to get back online again because of the error count.
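You can watch the counters behind this, e.g. (a sketch; the exact column set depends on your CH version):

SELECT host_name, errors_count, estimated_recovery_time
FROM system.clusters
WHERE cluster = 'c1'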
So you mean I should not let CH distribute the queries, but instead query the nodes round-robin using ha-proxy or ch-proxy, so I will be querying the local tables instead of the distributed ones. Did I get you right?
Right.
By the way, at the moment I am having another issue: when I restart a CH replica under load, it does not resume receiving queries and acts like a cold replica. It only resumes receiving queries if I restart CH on the server my API is connected to. Weird.
Yes, this is expected behavior. You can execute 'touch cluster_config.xml' to reset the error counters until #5317 is implemented.
I see, so when I restart the ClickHouse server on the leader node it resets the error counters as well, because it gets fixed that way too. Thanks a lot for your time man, I appreciate it!
So you mean I should not let CH distribute the queries, but instead query the nodes round-robin using ha-proxy or ch-proxy, so I will be querying the local tables instead of the distributed ones. Did I get you right?
Right.
By the way, about this one: I won't have only replicas, I will have shards too. So if I don't let CH distribute the queries and query the local tables manually, I will have to query all the shards and merge the data myself whenever I have more than one shard. So it looks like this is not a good way to go if you have shards along with the replicas, right?
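To illustrate the difference (a sketch, assuming the t1 / t1_dist tables above and more than one shard):

-- only sees the data on the shard local to the node you are connected to
SELECT cust, count() AS cnt FROM t1 WHERE tt = 100 GROUP BY cust

-- fans out to every shard of 'c1' and merges the partial results for you
SELECT cust, count() AS cnt FROM t1_dist WHERE tt = 100 GROUP BY cust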
How to set load_balancing depends on your cluster design.
I use load_balancing = nearest_hostname because I have a geo-cluster: the replicas are far (in distance) from each other, with high latency and low bandwidth.
Shards have similar host names inside a DC,
and nearest_hostname lets queries go (preferably) to the local (current-DC) shard nodes.
If a local shard/replica fails, the distributed table requests a replica from another DC.
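For completeness, a sketch of pinning that in users.xml (again assuming the default profile):

<profiles>
    <default>
        <load_balancing>nearest_hostname</load_balancing>
    </default>
</profiles>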