Clickhouse: Partition pruning not working as expected

Created on 27 Nov 2019  路  8Comments  路  Source: ClickHouse/ClickHouse

How to reproduce

  • Which ClickHouse server version to use: v19.17.2.4-testing
DROP TABLE IF EXISTS test_partition_filtering;

CREATE TABLE test_partition_filtering (
    `timestamp` DateTime,
    zoneId UInt64
) ENGINE = MergeTree() 
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (zoneId, timestamp);

INSERT INTO test_partition_filtering
SELECT
  toUInt64(now())-1000*number/1000 as timestamp,
  number/1000 as zone
FROM numbers(1000000);

OPTIMIZE TABLE test_partition_filtering final;

SELECT count() FROM test_partition_filtering WHERE toDate(toStartOfDay(timestamp)) = today() and zoneId=42;
SELECT count() FROM test_partition_filtering WHERE toDate(timestamp) = today() AND zoneId = 42;

Expected behavior
Expect both SELECTS to prune partitions and to read just a single part.

Actual behavior
First query reads one part. Second query reads 12 parts.

Logs

executeQuery: (from 127.0.0.1:48812) SELECT count() FROM test_partition_filtering WHERE (toDate(toStartOfDay(timestamp)) = today()) AND (zoneId = 42)
InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "toDate(toStartOfDay(timestamp)) = today()" moved to PREWHERE
default.test_partition_filtering (SelectExecutor): Key condition: (column 0 in [42, 42]), (toDate(toStartOfDay(column 1)) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): MinMax index condition: unknown, (toDate(toStartOfDay(column 0)) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1 marks to read from 1 ranges



executeQuery: (from 127.0.0.1:48812) SELECT count() FROM test_partition_filtering WHERE (toDate(timestamp) = today()) AND (zoneId = 42)
InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "toDate(timestamp) = today()" moved to PREWHERE
default.test_partition_filtering (SelectExecutor): Key condition: (column 0 in [42, 42]), (toDate(column 1) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): MinMax index condition: unknown, (toDate(column 0) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): Selected 12 parts by date, 1 parts by key, 1 marks to read from 1 ranges
bug performance

Most helpful comment

It's fixed in https://github.com/ClickHouse/ClickHouse/pull/13497 . And toDate(timestamp_ms / 1000) will also work because of https://github.com/ClickHouse/ClickHouse/pull/14513 .

All 8 comments

Is it reproducible on a stable version as well?

@victor-perov 19.17.2.4-testing is a stable version even though system.build_options reports it as testing.

each part has min_max_timestamp.idx file.
This file stores max & min values of timestamp column over this part.

pruning works with where timestamp > = <
pruning does not work with function(timestamp) > = < because function could be one-way_function

imho in these queries pruning does not work at all, only PK.

Though maybe I am wrong.

CREATE TABLE test_partition_filtering (
    timestamp DateTime) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY tuple();

INSERT INTO test_partition_filtering SELECT '2019-01-01 10:00:00' FROM numbers(1000000);
INSERT INTO test_partition_filtering SELECT '2019-01-02 10:00:00' FROM numbers(1000000);
INSERT INTO test_partition_filtering SELECT toDateTime(today())+3600 FROM numbers(1000000);

select count() from test_partition_filtering 
where toDate(toStartOfDay(timestamp)) = today();
Processed 1.00 million rows

select count() from test_partition_filtering 
where toDate((timestamp)) = today();
Processed 1.00 million rows

It would be nice to get an update on the issue.

I'm hitting the same issue. Is there any updates?

related: https://github.com/ClickHouse/ClickHouse/issues/15255

20.10.1.4704.
SELECT count() FROM test_partition_filtering WHERE toDate(toStartOfDay(timestamp)) = today() and zoneId=42;
Selected 1 parts by date, 1 parts by key, 1 marks by primary key, 1 marks to read from 1 ranges
Processed 8.19 thousand rows

SELECT count() FROM test_partition_filtering WHERE toDate(timestamp) = today() AND zoneId = 42;
Selected 1 parts by date, 1 parts by key, 1 marks by primary key, 1 marks to read from 1 ranges
Processed 8.19 thousand rows

seems fixed starting with 20.8

It's fixed in https://github.com/ClickHouse/ClickHouse/pull/13497 . And toDate(timestamp_ms / 1000) will also work because of https://github.com/ClickHouse/ClickHouse/pull/14513 .

Was this page helpful?
0 / 5 - 0 ratings