ClickHouse: Disable query complexity checks for distributed queries on root (initial) node

Created on 30 Jul 2020 · 3 Comments · Source: ClickHouse/ClickHouse

Hello! I use the max_rows_to_read query limit to decide whether to read data from a table or reject the query. The table is distributed, so it sends queries to multiple shards. The behaviour I observe is that max_rows_to_read is checked on every shard node and then checked once more on the root (the caller) node while it merges the responses. This second check can also throw a max_rows_to_read exception. I find this counterproductive, since all the "hard" work of reading data from disk has already been done by the shard nodes, only to be discarded by the root node.
My question is whether it is possible to enforce query complexity checks on leaf nodes only. Is this the kind of feature the project would be interested in?
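
To make the described behaviour concrete, here is a toy Python model of it. This is only a sketch under assumptions, not ClickHouse code: the names `TooManyRows`, `check_limit`, and `run_distributed` are made up for illustration, and the shards are simulated by a plain list of row counts.

```python
class TooManyRows(Exception):
    """Stand-in for the 'limit exceeded' error; not a real ClickHouse class."""


def check_limit(rows_read, max_rows_to_read, where):
    # A limit of 0 is treated as "unrestricted".
    if max_rows_to_read and rows_read > max_rows_to_read:
        raise TooManyRows(
            f"{where}: read {rows_read} rows, limit is {max_rows_to_read}"
        )


def run_distributed(rows_per_shard, max_rows_to_read):
    """Simulate one distributed query; return the total rows read."""
    total = 0
    for i, rows in enumerate(rows_per_shard):
        # Each shard checks the limit against its own reads.
        check_limit(rows, max_rows_to_read, f"shard {i}")
        # The root adds the shard's progress to a cumulative counter
        # and checks the same limit a second time.
        total += rows
        check_limit(total, max_rows_to_read, "root")
    return total
```

In this model, three shards that each read 400 rows all pass a limit of 1000 locally, yet the root raises once its cumulative counter reaches 1200, after the shards have already done the reading.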

feature question

All 3 comments

This feature would be nice to have.

The current behaviour (a limit based on a cumulative counter across all nodes) is correct: it is natural and it works as expected.
We should not change the current behaviour, but we can introduce another setting.

Variants:

  • another group of settings similar to max_rows_to_read, max_bytes_to_read but applied only on leaf nodes (more flexible);
  • a setting that will alter the meaning of max_rows_to_read, max_bytes_to_read.

What if we add a special prefix, leaf_, to all the settings? The root executor would rename them before sending them to the leaf nodes and would ignore them itself. With this approach we would be able to send limits for both root and leaf nodes.
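
A rough Python sketch of this prefix idea might look like the following. It is an illustration only: `split_settings` and `LEAF_PREFIX` are hypothetical names, and the sketch assumes that unprefixed settings are still forwarded to the leaf nodes as usual, with a leaf_-prefixed value overriding the unprefixed one there. The comment above does not spell out that detail.

```python
LEAF_PREFIX = "leaf_"  # hypothetical prefix from the suggestion above


def split_settings(settings):
    """Return (settings the root applies, settings sent to leaf nodes)."""
    # The root ignores every prefixed setting.
    root_settings = {
        name: value
        for name, value in settings.items()
        if not name.startswith(LEAF_PREFIX)
    }
    # Assumption: leaf nodes receive the ordinary settings as before...
    leaf_settings = dict(root_settings)
    # ...and a prefixed setting is renamed (prefix stripped) and overrides them.
    for name, value in settings.items():
        if name.startswith(LEAF_PREFIX):
            leaf_settings[name[len(LEAF_PREFIX):]] = value
    return root_settings, leaf_settings
```

For example, passing `{"max_rows_to_read": 1000000, "leaf_max_rows_to_read": 1000}` would leave the root with a limit of 1000000 while each leaf node sees `max_rows_to_read` set to 1000.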

What if we add a special prefix to all the settings leaf_

Let's make it a suffix: max_rows_to_read_leaf :)

which will be renamed by the root executor before sending them to leafs and it will ignore itself

Let's not rename anything. Just send the settings as usual and apply the new limit along with max_rows_to_read, but don't apply it to the progress coming from the Remote source.
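
The last suggestion can be sketched as a toy Python check that every node runs with the same settings. This is a simplified model under assumptions, not the actual implementation: `check_progress` and `TooManyRows` are hypothetical names, and "progress from the Remote source" is modelled as a separate `remote_rows` counter.

```python
class TooManyRows(Exception):
    """Stand-in for the 'limit exceeded' error; not a real ClickHouse class."""


def check_progress(local_rows, remote_rows, settings):
    """Check both limits on one node.

    local_rows:  rows this node read from its own tables
    remote_rows: rows reported as progress by remote sources (shards)
    """
    limit = settings.get("max_rows_to_read", 0)            # 0 = unrestricted
    leaf_limit = settings.get("max_rows_to_read_leaf", 0)  # 0 = unrestricted

    # Existing limit: cumulative counter, remote progress included.
    if limit and local_rows + remote_rows > limit:
        raise TooManyRows(f"max_rows_to_read exceeded: {local_rows + remote_rows}")

    # Proposed limit: counts only what this node read itself,
    # so progress from remote sources never triggers it.
    if leaf_limit and local_rows > leaf_limit:
        raise TooManyRows(f"max_rows_to_read_leaf exceeded: {local_rows}")
```

With only `max_rows_to_read_leaf` set to 1000, a root node that reads nothing locally but receives 1200 rows of remote progress passes the check, while a shard that reads 1500 rows itself is rejected. No renaming is needed because the same settings dictionary is used on every node.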
