Clickhouse: uniqExact(distinct count) is slow

Created on 26 Feb 2019 · 3 comments · Source: ClickHouse/ClickHouse

I'm comparing ClickHouse with a commercial columnar database. It's amazing that in some cases ClickHouse beats the commercial product even with fewer resources (4 cores vs. 20 cores in my case). However, I noticed that uniqExact (or distinct count) is unreasonably slow.

Here are my findings (using dockerized ClickHouse 19.3.4); hopefully they can be of use:

  • assume we have a table with a few hundred million rows (mine: ~600,000,000) like the one below
create table test_distinct_count(a String, b String, c Int32, d String, ...)
Engine=MergeTree() order by (a, b, c, d) settings index_granularity = 8192;
  • we'll likely get the following test results on a VM with 4 cores (max_threads=4)
select a from test_distinct_count group by a -- 1,100+ rows returned in 5.252s
select distinct a from test_distinct_count -- 1,100+ rows returned in 5.912s
select uniq(a) from test_distinct_count -- 1 row returned in 7.483s
select uniqExact(a) from test_distinct_count -- 1 row returned in 12.308s
select count(1) from (select distinct a from test_distinct_count) x -- 1 row returned in 5.821s
select count(1) from (select a from test_distinct_count group by a) x -- 1 row returned in 4.937s

It looks like using group by and avoiding distinct brings better performance, but it's really inconvenient to replace every distinct count with this trick...
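The workaround is semantically safe because an exact distinct count equals the number of GROUP BY groups. A minimal Python illustration of this equivalence (synthetic data, not ClickHouse internals):

```python
from collections import defaultdict

rows = ["a1", "a2", "a1", "a3", "a2", "a1"]

# uniqExact-style: build one exact set of all values.
exact_count = len(set(rows))

# GROUP BY-style: bucket rows by value, then count the buckets.
groups = defaultdict(int)
for value in rows:
    groups[value] += 1
group_count = len(groups)

assert exact_count == group_count == 3
```

Both paths see every value, so the counts always agree; the performance gap comes from how the engine executes each plan, not from the result.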

Update: same issue on 19.3.5

performance


All 3 comments

Hi,
One accidental run on a complex installation means nothing. Could you run a reliable test, please?

  1. Run each query 100-1000 times and write the times into a text file (one column of times in milliseconds)
  2. Use the ministat tool to compare the distributions.
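A hedged sketch of that workflow as a command fragment (it needs a running ClickHouse server and ministat installed, so it is not runnable standalone; the table name and file names are from the examples above, and note clickhouse-client's --time prints seconds, not milliseconds):

```shell
# Run each query 100 times, collecting elapsed time (printed to
# stderr by --time) into one column per file.
for i in $(seq 1 100); do
  clickhouse-client --time --query \
    "select uniqExact(a) from test_distinct_count" \
    >/dev/null 2>>uniq_exact_times.txt
done

for i in $(seq 1 100); do
  clickhouse-client --time --query \
    "select count(1) from (select a from test_distinct_count group by a) x" \
    >/dev/null 2>>group_by_times.txt
done

# Compare the two timing distributions.
ministat uniq_exact_times.txt group_by_times.txt
```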

uniqExact is always the worst.

Connected to ClickHouse server version 19.3.4

case1. create table u_perf(a Int64, v String) Engine=MergeTree order by a;
case2. create table u_perf(a Int64, v String) Engine=MergeTree order by v;
case3. create table u_perf(a Int64, v LowCardinality(String)) Engine=MergeTree order by a;
case4. create table u_perf(a Int64, v LowCardinality(String)) Engine=MergeTree order by v;

insert into u_perf select number, toString(number % 21277) from numbers(100000000);
sleep(1);
optimize table u_perf final;

query | case1 | case2 (sorted) | case3 (LC) | case4 (LC, sorted)
-- | -- | -- | -- | --
select count() from (select v from u_perf group by v); | 1.092 sec. 1.096 sec. | 0.362 sec. 0.354 sec. | 1.694 sec. 1.640 sec. | 0.090 sec. 0.105 sec.
select count() from (select distinct v from u_perf); | 0.651 sec. 0.761 sec. | 0.400 sec. 0.356 sec. | 2.250 sec. 2.164 sec. | 0.879 sec. 0.746 sec.
select uniq(v) from u_perf; | 0.541 sec. 0.595 sec. | 0.473 sec. 0.552 sec. | 1.851 sec. 1.747 sec. | 0.451 sec. 0.392 sec.
select uniqExact(v) from u_perf; | 1.325 sec. 1.304 sec. | 0.759 sec. 0.867 sec. | 2.341 sec. 2.661 sec. | 0.678 sec. 0.817 sec.
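One plausible reason the sorted cases (case2 and case4) win: over a column sorted by v, distinct values can be counted by scanning for run boundaries, with no hash table at all. A sketch of that idea in Python (my illustration, not ClickHouse's actual code):

```python
def distinct_count_sorted(values):
    """Count distinct values in a sorted sequence by counting
    positions where the value changes (run boundaries)."""
    count = 0
    prev = object()  # sentinel unequal to every real value
    for v in values:
        if v != prev:
            count += 1
            prev = v
    return count

# Mirror the benchmark's value distribution: number % 21277 as strings.
data = sorted(str(n % 21277) for n in range(100_000))
assert distinct_count_sorted(data) == len(set(data)) == 21277
```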

When you calculate a single value of an aggregate function, the merging phase is not parallelized. (It is parallelized across different aggregate function states.)

E.g. if you calculate multiple small-medium uniqExacts in a single query (with GROUP BY), it will be fast, but when you calculate just a single uniqExact, it is slow.
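The difference can be pictured like this: with GROUP BY, each thread's partial result is a map of many small states and merging can proceed key by key, but a single uniqExact has exactly one final state, so every thread's partial set must be folded into it sequentially. A toy model in Python (the thread count and data are made up):

```python
# Toy model of two-phase aggregation over 4 "threads".
chunks = [list(range(i, 100_000, 4)) for i in range(4)]

# Phase 1 (parallelizable): each thread builds a partial exact set
# over its own chunk of the data.
partials = [set(chunk) for chunk in chunks]

# Phase 2: a single uniqExact has one final state, so the partial
# sets are merged into it one after another -- this last fold is
# one sequential job and cannot be split across threads.
final = set()
for partial in partials:
    final |= partial

assert len(final) == 100_000
```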

This issue is yet to be addressed.
