Hello! I use the max_rows_to_read query limit to decide whether to read data from a table or reject the query. The table is distributed, so queries are sent to multiple shards. The behaviour I observe is that max_rows_to_read is checked on all the shard nodes, and then checked once again on the root (initiator) node while merging the responses. This second check can also result in a max_rows_to_read exception. I find this counterproductive, since all the "hard" work of reading data from disk has already been done by the shard nodes, only for the result to be discarded by the root node.
My question is whether it is possible to enforce query complexity checks on leaf nodes only. Is this the kind of feature the project could be interested in?
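For illustration, here is a minimal sketch of the scenario (the table name and limit value are made up):

```sql
-- distributed_hits is assumed to be a Distributed table over several shards.
-- Each shard checks max_rows_to_read while reading its local data, and the
-- initiator checks the same limit again against the cumulative counter while
-- merging the shard responses, so the query can still fail with a
-- TOO_MANY_ROWS exception after the shards have already done the disk reads.
SELECT count()
FROM distributed_hits
WHERE EventDate = today()
SETTINGS max_rows_to_read = 1000000;
```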
This feature would be nice to have.
The current behaviour (a limit based on a cumulative counter across all nodes) is correct; it is natural and works as expected.
We should not change the current behaviour, but we can introduce another setting.
Variants:
What if we add a special prefix, leaf_, to all such settings, which will be renamed by the root executor before sending them to the leaves and ignored by the root itself? With this approach we would be able to set limits for both the root and the leaves.
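As a rough sketch of what this prefix variant could look like from the query side (leaf_max_rows_to_read is only a hypothetical name here, nothing with these semantics exists yet):

```sql
-- Hypothetical prefix variant: the root executor would rename
-- leaf_max_rows_to_read to max_rows_to_read before forwarding it to the
-- shards, and would ignore the leaf_-prefixed setting locally.
SELECT count()
FROM distributed_hits
SETTINGS
    max_rows_to_read = 50000000,      -- checked on the root while merging
    leaf_max_rows_to_read = 10000000; -- forwarded to each shard as max_rows_to_read
```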
What if we add a special prefix, leaf_, to all such settings
Let's make it a suffix instead: max_rows_to_read_leaf :)
which will be renamed by the root executor before sending them to the leaves and ignored by the root itself
Let's not rename anything. Just send the settings as usual, apply them along with max_rows_to_read, but don't apply them to the progress coming from the Remote source.
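Assuming it is implemented this way, usage could look roughly like this (the setting name max_rows_to_read_leaf and the values are only illustrative):

```sql
-- max_rows_to_read_leaf would limit only what each leaf (shard) reads from its
-- local tables; max_rows_to_read would keep its current meaning, i.e. a limit
-- on the cumulative counter, so the two can be set independently.
SELECT count()
FROM distributed_hits
SETTINGS
    max_rows_to_read      = 50000000,  -- cumulative limit across all nodes
    max_rows_to_read_leaf = 10000000;  -- per-leaf limit, not applied to Remote progress
```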