Hello! I use the max_rows_to_read query limit to decide whether to read data from a table or reject the query. The table is distributed, so queries are sent to multiple shards. The behaviour I observe is that max_rows_to_read is checked on all the shard nodes, and then checked once again on the root (initiator) node while merging the responses. This second check can also result in a max_rows_to_read exception. I find this counterproductive, since all the "hard" work of reading data from disk has already been done by the shard nodes, only for the result to be discarded by the root node.
My question is whether it is possible to enforce query complexity checks on leaf nodes only. Is this the kind of feature the project could be interested in?
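For illustration, here is a minimal sketch of the scenario (the table name and limit value are made up):

```sql
-- distributed_hits is assumed to be a Distributed table over several shards.
-- Each shard checks max_rows_to_read while reading its local data, and the
-- initiator checks the same limit again against the cumulative counter while
-- merging the shard responses, so the query can still fail with a
-- TOO_MANY_ROWS exception after the shards have already done the disk reads.
SELECT count()
FROM distributed_hits
WHERE EventDate = today()
SETTINGS max_rows_to_read = 1000000;
```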
This feature would be nice to have.
The current behaviour (a limit based on a cumulative counter across all nodes) is correct; it is natural and works as expected.
We should not change the current behaviour, but we can introduce another setting.
Variants:
What if we add a special prefix, leaf_, to all such settings, which will be renamed by the root executor before sending them to the leaves and ignored by the root itself? With this approach we would be able to set limits for both the root and the leaves.
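As a rough sketch of what this prefix variant could look like from the query side (leaf_max_rows_to_read is only a hypothetical name here, nothing with these semantics exists yet):

```sql
-- Hypothetical prefix variant: the root executor would rename
-- leaf_max_rows_to_read to max_rows_to_read before forwarding it to the
-- shards, and would ignore the leaf_-prefixed setting locally.
SELECT count()
FROM distributed_hits
SETTINGS
    max_rows_to_read = 50000000,      -- checked on the root while merging
    leaf_max_rows_to_read = 10000000; -- forwarded to each shard as max_rows_to_read
```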
What if we add a special prefix, leaf_, to all such settings
Let's make it a suffix instead: max_rows_to_read_leaf :)
which will be renamed by the root executor before sending them to the leaves and ignored by the root itself
Let's not rename anything. Just send the settings as usual, apply them along with max_rows_to_read, but don't apply them to the progress coming from the Remote source.
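Assuming it is implemented this way, usage could look roughly like this (the setting name max_rows_to_read_leaf and the values are only illustrative):

```sql
-- max_rows_to_read_leaf would limit only what each leaf (shard) reads from its
-- local tables; max_rows_to_read would keep its current meaning, i.e. a limit
-- on the cumulative counter, so the two can be set independently.
SELECT count()
FROM distributed_hits
SETTINGS
    max_rows_to_read      = 50000000,  -- cumulative limit across all nodes
    max_rows_to_read_leaf = 10000000;  -- per-leaf limit, not applied to Remote progress
```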