Clickhouse: Introduce a setting throw_on_neighbor_several_blocks

Created on 28 Oct 2020  Β·  7Comments  Β·  Source: ClickHouse/ClickHouse

neighbor / runningDifference / runningAccumulate should fail when setting is enabled and more that one block was returned.

Motivation: report the error early. Prevent function misuse.

P.S. Better name for that?

feature

Most helpful comment

We can simply add a setting that will disable these functions by default.

I agree. They are almost unusable especially with a new sorting behavior.

create table A (A Int64) engine=MergeTree order by A;

insert into A values (1);
insert into A values (2);
insert into A values (3);

select A, neighbor(A,1) from (select A from A order by A)
β”Œβ”€A─┬─neighbor(A, 1)─┐
β”‚ 1 β”‚              0 β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€A─┬─neighbor(A, 1)─┐
β”‚ 2 β”‚              0 β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€A─┬─neighbor(A, 1)─┐
β”‚ 3 β”‚              0 β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

optimize table A final;

select A, neighbor(A,1) from (select A from A order by A)
β”Œβ”€A─┬─neighbor(A, 1)─┐
β”‚ 1 β”‚              2 β”‚
β”‚ 2 β”‚              3 β”‚
β”‚ 3 β”‚              0 β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

All 7 comments

We can simply add a setting that will disable these functions by default.

We can simply add a setting that will disable these functions by default.

I agree. They are almost unusable especially with a new sorting behavior.

create table A (A Int64) engine=MergeTree order by A;

insert into A values (1);
insert into A values (2);
insert into A values (3);

select A, neighbor(A,1) from (select A from A order by A)
β”Œβ”€A─┬─neighbor(A, 1)─┐
β”‚ 1 β”‚              0 β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€A─┬─neighbor(A, 1)─┐
β”‚ 2 β”‚              0 β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€A─┬─neighbor(A, 1)─┐
β”‚ 3 β”‚              0 β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

optimize table A final;

select A, neighbor(A,1) from (select A from A order by A)
β”Œβ”€A─┬─neighbor(A, 1)─┐
β”‚ 1 β”‚              2 β”‚
β”‚ 2 β”‚              3 β”‚
β”‚ 3 β”‚              0 β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

enable_block_dependent_functions = false

We can simply add a setting that will disable these functions by default.

Users will just enable it back and will continue complaining.

The bad thing about them is that users can do tests on small datasets when data fit's in a single block, and then (on bigger datasets) it will silently give him an incorrect result.

It's better to 'blow up' and 'make a lot of noise' instead of an incorrect results.

May be smth like block_dependent_functions = forbidden / throw / passthrough

Can be forbidden by default.

Users will just enable it back and will continue complaining.

That's ok, at least they will pay attention to the name of the setting.

These functions are also deficient for the following reason: #2479

They are almost unusable especially with a new sorting behavior.

1) that 'new sorting behavior' which @den-crane referenced was introduced in 19.14 (see #6054 #6629) - it happens when sorting key and ORDER BY in query matches. Can be disabled with optimize_read_in_order=0 if needed.
2) the way how clickhouse split the data into blocks can differ depending on the part sizes (so on background merges) and was never guaranteed
3) and those block dependant functions were provided with 'limited guarantees' for some very specific use cases and were always relying on blocks (no matter if there is internal order by or not):

SELECT version()

β”Œβ”€version()─┐
β”‚ 18.14.19 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

SELECT *
FROM 
(
    SELECT 
        number, 
        runningDifference(number) AS x
    FROM 
    (
        SELECT *
        FROM numbers(100000) 
        ORDER BY number ASC
    ) 
) 
WHERE x != 1

β”Œβ”€number─┬─x─┐
β”‚      0 β”‚ 0 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”˜
β”Œβ”€number─┬─x─┐
β”‚  65536 β”‚ 0 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”˜

2 rows in set. Elapsed: 0.019 sec. Processed 131.07 thousand rows, 1.05 MB (6.73 million rows/s., 53.81 MB/s.) 

So no regressions / no surprizes here.

Was this page helpful?
0 / 5 - 0 ratings