We observed a significant performance degradation when upgraded from 1.1.54310 to 1.1.54385.
Our testing shows that performance problems happen with queries containing an IN operator with a subquery inside that returns many rows (>10M) and it appeared somewhere between 1.1.54343 and 1.1.54378.
Example that is close to our query and reproduces the issue:
CREATE TABLE test_1
(
date Date,
number UInt64
)
ENGINE = MergeTree(date, number, 8192);
INSERT INTO test_1 SELECT
number / 10000,
rand()
FROM system.numbers
LIMIT 100000000;
CREATE TABLE test_2
(
date Date,
number_1 UInt64,
number_2 UInt64
)
ENGINE = MergeTree(date, (number_1, number_2), 8192);
INSERT INTO test_2 SELECT
number / 100000,
rand(),
rand()
FROM system.numbers
LIMIT 1000000000;
SELECT number
FROM test_1
WHERE number IN
(
SELECT number_1
FROM test_2
PREWHERE number_2 < 100000000
)
LIMIT 10;
The last select in the example on 1.1.54343 gives:
10 rows in set. Elapsed: 2.590 sec. Processed 1.00 billion rows, 8.22 GB (386.63 million rows/s., 3.18 GB/s.)
and on 1.1.54378:
10 rows in set. Elapsed: 23.436 sec. Processed 16.38 thousand rows, 131.07 KB (699.10 rows/s., 5.59 KB/s.)
Replacing the IN with a JOIN solved the problem in our case. But obviously IN after 1.1.54343 works in a different way and sometimes it is not optimal.
I will be happy to provide any additional details!
This is because of the features:
ClickHouse release 1.1.54343
Now the table index is used for conditions like expr IN (subquery).
(ClickHouse release 1.1.54356)
An index of MergeTree tables is used when IN is applied to a tuple of expressions from the
columns of the primary key. Example: WHERE (UserID, EventDate) IN ((123, '2000-01-01'), ...)
(Anastasiya Tsarkova).
It seems that you should to use another set/order of columns in PK if you need to filter by NOT number column.
Dirty workaround to trick "optimizer":
concat(toString(number),'') IN
(
SELECT toString(number_1)
It's not normal. Will fix.
You can mitigate it with SET use_index_for_in_with_subqueries = 0
Fixed in master with #2584 and #2677.
Most helpful comment
It's not normal. Will fix.