How to reproduce
DROP TABLE IF EXISTS test_partition_filtering;
CREATE TABLE test_partition_filtering (
`timestamp` DateTime,
zoneId UInt64
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (zoneId, timestamp);
INSERT INTO test_partition_filtering
SELECT
toUInt64(now())-1000*number/1000 as timestamp,
number/1000 as zone
FROM numbers(1000000);
OPTIMIZE TABLE test_partition_filtering final;
SELECT count() FROM test_partition_filtering WHERE toDate(toStartOfDay(timestamp)) = today() and zoneId=42;
SELECT count() FROM test_partition_filtering WHERE toDate(timestamp) = today() AND zoneId = 42;
Expected behavior
Expect both SELECTS to prune partitions and to read just a single part.
Actual behavior
First query reads one part. Second query reads 12 parts.
Logs
executeQuery: (from 127.0.0.1:48812) SELECT count() FROM test_partition_filtering WHERE (toDate(toStartOfDay(timestamp)) = today()) AND (zoneId = 42)
InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "toDate(toStartOfDay(timestamp)) = today()" moved to PREWHERE
default.test_partition_filtering (SelectExecutor): Key condition: (column 0 in [42, 42]), (toDate(toStartOfDay(column 1)) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): MinMax index condition: unknown, (toDate(toStartOfDay(column 0)) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1 marks to read from 1 ranges
executeQuery: (from 127.0.0.1:48812) SELECT count() FROM test_partition_filtering WHERE (toDate(timestamp) = today()) AND (zoneId = 42)
InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "toDate(timestamp) = today()" moved to PREWHERE
default.test_partition_filtering (SelectExecutor): Key condition: (column 0 in [42, 42]), (toDate(column 1) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): MinMax index condition: unknown, (toDate(column 0) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): Selected 12 parts by date, 1 parts by key, 1 marks to read from 1 ranges
Is it reproducible on a stable version as well?
@victor-perov 19.17.2.4-testing is a stable version even though system.build_options reports it as testing.
each part has min_max_timestamp.idx file.
This file stores max & min values of timestamp column over this part.
pruning works with where timestamp > = <
pruning does not work with function(timestamp) > = < because function could be one-way_function
imho in these queries pruning does not work at all, only PK.
Though maybe I am wrong.
CREATE TABLE test_partition_filtering (
timestamp DateTime) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY tuple();
INSERT INTO test_partition_filtering SELECT '2019-01-01 10:00:00' FROM numbers(1000000);
INSERT INTO test_partition_filtering SELECT '2019-01-02 10:00:00' FROM numbers(1000000);
INSERT INTO test_partition_filtering SELECT toDateTime(today())+3600 FROM numbers(1000000);
select count() from test_partition_filtering
where toDate(toStartOfDay(timestamp)) = today();
Processed 1.00 million rows
select count() from test_partition_filtering
where toDate((timestamp)) = today();
Processed 1.00 million rows
It would be nice to get an update on the issue.
I'm hitting the same issue. Is there any updates?
related: https://github.com/ClickHouse/ClickHouse/issues/15255
20.10.1.4704.
SELECT count() FROM test_partition_filtering WHERE toDate(toStartOfDay(timestamp)) = today() and zoneId=42;
Selected 1 parts by date, 1 parts by key, 1 marks by primary key, 1 marks to read from 1 ranges
Processed 8.19 thousand rows
SELECT count() FROM test_partition_filtering WHERE toDate(timestamp) = today() AND zoneId = 42;
Selected 1 parts by date, 1 parts by key, 1 marks by primary key, 1 marks to read from 1 ranges
Processed 8.19 thousand rows
seems fixed starting with 20.8
It's fixed in https://github.com/ClickHouse/ClickHouse/pull/13497 . And toDate(timestamp_ms / 1000) will also work because of https://github.com/ClickHouse/ClickHouse/pull/14513 .
Most helpful comment
It's fixed in https://github.com/ClickHouse/ClickHouse/pull/13497 . And
toDate(timestamp_ms / 1000)will also work because of https://github.com/ClickHouse/ClickHouse/pull/14513 .