Clickhouse: uniqExact(distinct count) is slow

Created on 26 Feb 2019 · 3 comments · Source: ClickHouse/ClickHouse

I'm comparing ClickHouse with a commercial columnar database. It's amazing that in some cases ClickHouse beats the commercial product even with fewer resources (4 cores vs. 20 cores in my case). However, I noticed that uniqExact (or distinct count) is unreasonably slow.

Here are my findings (using dockerized ClickHouse 19.3.4); hopefully they can be of use:

  • assume we have a table with a few hundred million rows (mine: ~600,000,000) like the one below
create table test_distinct_count(a String, b String, c Int32, d String, ...)
Engine=MergeTree() order by (a, b, c, d) settings index_granularity = 8192;
  • we'll likely get the following test results on a VM with 4 cores (max_threads=4)
select a from test_distinct_count group by a -- 1,100+ rows returned in 5.252s
select distinct a from test_distinct_count -- 1,100+ rows returned in 5.912s
select uniq(a) from test_distinct_count -- 1 row returned in 7.483s
select uniqExact(a) from test_distinct_count -- 1 row returned in 12.308s
select count(1) from (select distinct a from test_distinct_count) x -- 1 row returned in 5.821s
select count(1) from (select a from test_distinct_count group by a) x -- 1 row returned in 4.937s

It looks like using group by and avoiding distinct brings better performance, but it's really inconvenient to replace every distinct count with this trick...
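The workaround is semantically safe because an exact distinct count equals the number of GROUP BY groups. A minimal Python illustration of this equivalence (synthetic data, not ClickHouse internals):

```python
from collections import defaultdict

rows = ["a1", "a2", "a1", "a3", "a2", "a1"]

# uniqExact-style: build one exact set of all values.
exact_count = len(set(rows))

# GROUP BY-style: bucket rows by value, then count the buckets.
groups = defaultdict(int)
for value in rows:
    groups[value] += 1
group_count = len(groups)

assert exact_count == group_count == 3
```

Both paths see every value, so the counts always agree; the performance gap comes from how the engine executes each plan, not from the result.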

Update: same issue on 19.3.5

performance


All 3 comments

Hi,
One accidental run on a complex installation means nothing. Could you run a reliable test, please?

  1. Run each query 100-1000 times and write the times into a text file (one column of times in milliseconds)
  2. Use the ministat tool to compare the distributions.
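A hedged sketch of that workflow as a command fragment (it needs a running ClickHouse server and ministat installed, so it is not runnable standalone; the table name and file names are from the examples above, and note clickhouse-client's --time prints seconds, not milliseconds):

```shell
# Run each query 100 times, collecting elapsed time (printed to
# stderr by --time) into one column per file.
for i in $(seq 1 100); do
  clickhouse-client --time --query \
    "select uniqExact(a) from test_distinct_count" \
    >/dev/null 2>>uniq_exact_times.txt
done

for i in $(seq 1 100); do
  clickhouse-client --time --query \
    "select count(1) from (select a from test_distinct_count group by a) x" \
    >/dev/null 2>>group_by_times.txt
done

# Compare the two timing distributions.
ministat uniq_exact_times.txt group_by_times.txt
```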

uniqExact is always the worst.

Connected to ClickHouse server version 19.3.4

case1. create table u_perf(a Int64, v String) Engine=MergeTree order by a;
case2. create table u_perf(a Int64, v String) Engine=MergeTree order by v;
case3. create table u_perf(a Int64, v LowCardinality(String)) Engine=MergeTree order by a;
case4. create table u_perf(a Int64, v LowCardinality(String)) Engine=MergeTree order by v;

insert into u_perf select number, toString(number % 21277) from numbers(100000000);
sleep(1);
optimize table u_perf final;

query | case1 | case2 (sorted) | case3 (LC) | case4 (LC, sorted)
-- | -- | -- | -- | --
select count() from (select v from u_perf group by v); | 1.092 sec. 1.096 sec. | 0.362 sec. 0.354 sec. | 1.694 sec. 1.640 sec. | 0.090 sec. 0.105 sec.
select count() from (select distinct v from u_perf); | 0.651 sec. 0.761 sec. | 0.400 sec. 0.356 sec. | 2.250 sec. 2.164 sec. | 0.879 sec. 0.746 sec.
select uniq(v) from u_perf; | 0.541 sec. 0.595 sec. | 0.473 sec. 0.552 sec. | 1.851 sec. 1.747 sec. | 0.451 sec. 0.392 sec.
select uniqExact(v) from u_perf; | 1.325 sec. 1.304 sec. | 0.759 sec. 0.867 sec. | 2.341 sec. 2.661 sec. | 0.678 sec. 0.817 sec.
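One plausible reason the sorted cases (case2 and case4) win: over a column sorted by v, distinct values can be counted by scanning for run boundaries, with no hash table at all. A sketch of that idea in Python (my illustration, not ClickHouse's actual code):

```python
def distinct_count_sorted(values):
    """Count distinct values in a sorted sequence by counting
    positions where the value changes (run boundaries)."""
    count = 0
    prev = object()  # sentinel unequal to every real value
    for v in values:
        if v != prev:
            count += 1
            prev = v
    return count

# Mirror the benchmark's value distribution: number % 21277 as strings.
data = sorted(str(n % 21277) for n in range(100_000))
assert distinct_count_sorted(data) == len(set(data)) == 21277
```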

When you calculate a single value of an aggregate function, the merging phase is not parallelized. (It is parallelized across different aggregate function states.)

E.g. if you calculate multiple small-medium uniqExacts in a single query (with GROUP BY), it will be fast, but when you calculate just a single uniqExact, it is slow.
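The difference can be pictured like this: with GROUP BY, each thread's partial result is a map of many small states and merging can proceed key by key, but a single uniqExact has exactly one final state, so every thread's partial set must be folded into it sequentially. A toy model in Python (the thread count and data are made up):

```python
# Toy model of two-phase aggregation over 4 "threads".
chunks = [list(range(i, 100_000, 4)) for i in range(4)]

# Phase 1 (parallelizable): each thread builds a partial exact set
# over its own chunk of the data.
partials = [set(chunk) for chunk in chunks]

# Phase 2: a single uniqExact has one final state, so the partial
# sets are merged into it one after another -- this last fold is
# one sequential job and cannot be split across threads.
final = set()
for partial in partials:
    final |= partial

assert len(final) == 100_000
```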

This issue is yet to be addressed.
