neighbor / runningDifference / runningAccumulate should fail when setting is enabled and more that one block was returned.
Motivation: report the error early. Prevent function misuse.
P.S. Better name for that?
We can simply add a setting that will disable these functions by default.
We can simply add a setting that will disable these functions by default.
I agree. They are almost unusable especially with a new sorting behavior.
create table A (A Int64) engine=MergeTree order by A;
insert into A values (1);
insert into A values (2);
insert into A values (3);
select A, neighbor(A,1) from (select A from A order by A)
ββAββ¬βneighbor(A, 1)ββ
β 1 β 0 β
βββββ΄βββββββββββββββββ
ββAββ¬βneighbor(A, 1)ββ
β 2 β 0 β
βββββ΄βββββββββββββββββ
ββAββ¬βneighbor(A, 1)ββ
β 3 β 0 β
βββββ΄βββββββββββββββββ
optimize table A final;
select A, neighbor(A,1) from (select A from A order by A)
ββAββ¬βneighbor(A, 1)ββ
β 1 β 2 β
β 2 β 3 β
β 3 β 0 β
βββββ΄βββββββββββββββββ
enable_block_dependent_functions = false
We can simply add a setting that will disable these functions by default.
Users will just enable it back and will continue complaining.
The bad thing about them is that users can do tests on small datasets when data fit's in a single block, and then (on bigger datasets) it will silently give him an incorrect result.
It's better to 'blow up' and 'make a lot of noise' instead of an incorrect results.
May be smth like block_dependent_functions = forbidden / throw / passthrough
Can be forbidden by default.
Users will just enable it back and will continue complaining.
That's ok, at least they will pay attention to the name of the setting.
These functions are also deficient for the following reason: #2479
They are almost unusable especially with a new sorting behavior.
1) that 'new sorting behavior' which @den-crane referenced was introduced in 19.14 (see #6054 #6629) - it happens when sorting key and ORDER BY in query matches. Can be disabled with optimize_read_in_order=0 if needed.
2) the way how clickhouse split the data into blocks can differ depending on the part sizes (so on background merges) and was never guaranteed
3) and those block dependant functions were provided with 'limited guarantees' for some very specific use cases and were always relying on blocks (no matter if there is internal order by or not):
SELECT version()
ββversion()ββ
β 18.14.19 β
βββββββββββββ
SELECT *
FROM
(
SELECT
number,
runningDifference(number) AS x
FROM
(
SELECT *
FROM numbers(100000)
ORDER BY number ASC
)
)
WHERE x != 1
ββnumberββ¬βxββ
β 0 β 0 β
ββββββββββ΄ββββ
ββnumberββ¬βxββ
β 65536 β 0 β
ββββββββββ΄ββββ
2 rows in set. Elapsed: 0.019 sec. Processed 131.07 thousand rows, 1.05 MB (6.73 million rows/s., 53.81 MB/s.)
So no regressions / no surprizes here.
Most helpful comment
I agree. They are almost unusable especially with a new sorting behavior.