Clickhouse: Version 20.3 has problem with simple select * queries

Created on 5 May 2020 · 9Comments · Source: ClickHouse/ClickHouse

What exactly works slower than expected?
We have ReplicatedMergeTree tables with 100 million - 1 billion records and select * from [table] limit 1 takes 10-20 seconds to respond.

I have tested the same table in 19 , 20.1 and 20.3 and it seems 20.3 has problem with select * queries

Which ClickHouse server version to use
20.3.8.53

comp-processors performance st-need-info v20.3-affected

Source

hoseiney

Most helpful comment

approach to STR:

412.32 MiB VS 64.02 MiB

20.3.8.53

CREATE TABLE xxxt (
 id UInt32,
 col_name1 Int32,
 col_name2 Int32,
 col_name3 Float64,
 col_name4 Float64,
 col_name5 Int32,
 col_name6 String,
 col_name7 String,
 col_name8 Float64,
 col_name9 Float64,
 col_name10 Int32,
 col_name11 String,
 col_name12 String,
 col_name13 Int8,
 col_name14 Int8,
 col_name15 Int8,
 col_name16 Int8,
 col_name17 Int32,
 col_name18 Int32,
 col_name19 Int32,
 col_name20 Int32,
 col_name21 Int32,
 col_name22 Int8,
 col_name23 Nullable(Int8),
 col_name24 String,
 col_name25 Int8,
 col_name26 Int32,
 col_name27 Int32,
 col_name28 Float64,
 col_name29 Float64,
 col_name30 Int8,
 col_name31 Int8,
 col_name32 Int8,
 col_name33 Int8,
 col_name34 String,
 col_name35 Float64,
 col_name36 Float64,
 col_name37 Int32,
 col_name38 String,
 col_name39 Int32,
 col_name40 Int32,
 col_name41 Int32,
 col_name42 Int8,
 created_at DateTime,
 created_date Date,
 col_name43 Date,
 updated_at DateTime,
 col_name44 DateTime,
  INDEX min_max created_date TYPE minmax GRANULARITY 8192
) ENGINE = ReplacingMergeTree( updated_at) 
PARTITION BY toDate(created_at) 
PRIMARY KEY id ORDER BY id 
SETTINGS index_granularity = 8192;

set max_insert_parts=100000;

insert into xxxt(id, created_at, created_date, updated_at, col_name23, col_name34, col_name35, col_name36)
select number x,  x/1000*3600*24, rand(), intDiv(x, 100),null, toString(cityHash64(x)), 
cityHash64(x)/43, cityHash64(x)/33 from numbers(1000000);

select * from xxxt limit 1

MemoryTracker: Peak memory usage (for query): 412.32 MiB.
1 rows in set. Elapsed: 0.873 sec. Processed 1.00 thousand rows, 239.41 KB (1.15 thousand rows/s., 274.31 KB/s.)

set experimental_use_processors=0

select * from xxxt limit 1

MemoryTracker: Peak memory usage (for query): 64.02 MiB.
1 rows in set. Elapsed: 0.122 sec.

20.4.1.3163

set experimental_use_processors=0
select * from xxxt limit 1
Peak memory usage (for query): 60.02 MiB.
1 rows in set. Elapsed: 0.044 sec.


set experimental_use_processors=1
select * from xxxt limit 1
Peak memory usage (for query): 60.03 MiB.
1 rows in set. Elapsed: 0.066 sec.

Seems it is solved in master

den-crane on 6 May 2020

👍3

All 9 comments

1) Please try set experimental_use_processors=0; and check if performance is proper again.
2) please collect the logs of query execution (with experimental_use_processors 0 and 1), by running your query in clickhouse-client with set send_logs_level='trace' before it.
3) can you share the schema of the table you are selecting from (you can obfuscate columns names).

filimonov on 5 May 2020

it is solved with set experimental_use_processors=0
currently I have downgraded it into v20.1
Is it ok to upgrade to v20.3 with experimental_use_processors = 0 config ?

logs :
query : select * from table limit 1;

1- with experimental_use_processors=0 :

Key condition: unknown
MinMax index condition: unknown
Selected 1541 parts by date, 1541 parts by key, 79739 marks to read from 1541 ranges
Reading 1 ranges from part 20161006_0_0_0, approx. 8192 rows starting from 0
...
Reading 1 ranges from part 20200506_12_17_1, approx. 16384 rows starting from 0

======>>>>(result)<<<==============

<Trace> UnionBlockInputStream: Waiting for threads to finish
<Trace> UnionBlockInputStream: Waited for threads to finish
<Information> executeQuery: Read 3 rows, 1.12 KiB in 0.832 sec., 3 rows/sec., 1.35 KiB/sec.
<Debug> MemoryTracker: Peak memory usage (for query): 104.07 MiB.

1 rows in set. Elapsed: 0.850 sec.

2- with experimental_use_processors=1:

Key condition: unknown
MinMax index condition: unknown
Selected 1539 parts by date, 1539 parts by key, 79737 marks to read from 1539 ranges
Reading 1 ranges from part 20161006_0_0_0, approx. 8192 rows starting from 0
...
Reading 1 ranges from part 20200506_12_17_1, approx. 16384 rows starting from 0

======>>>>(result)<<<==============

Progress: 1.54 thousand rows, 698.87 KB (7.05 rows/s., 3.20 KB/s.)                                                                       0%[srv-dpn-1-asia.snapp.dc] 2020.05.06 05:05:36.818088 [ 3895 ] {07795a82-c89c-4eae-9e56-73cd57b4c924} <Debug> MemoryTracker: Current memory usage: 1.00 GiB.

<Debug> MemoryTracker: Current memory usage: 2.00 GiB.
<Debug> MemoryTracker: Current memory usage: 3.00 GiB.
<Debug> MemoryTracker: Current memory usage: 4.00 GiB.
<Debug> MemoryTracker: Current memory usage: 5.00 GiB.
<Debug> MemoryTracker: Current memory usage: 6.00 GiB.
<Debug> MemoryTracker: Current memory usage: 7.00 GiB.
<Debug> MemoryTracker: Current memory usage: 8.00 GiB.
<Debug> MemoryTracker: Current memory usage: 9.00 GiB.
<Debug> MemoryTracker: Current memory usage: 10.00 GiB.
<Debug> MemoryTracker: Current memory usage: 11.00 GiB.
<Debug> MemoryTracker: Current memory usage: 12.00 GiB.
<Debug> MemoryTracker: Current memory usage: 13.00 GiB.
<Debug> MemoryTracker: Current memory usage: 14.00 GiB.
<Debug> MemoryTracker: Current memory usage: 15.00 GiB.
<Debug> MemoryTracker: Current memory usage: 16.00 GiB.
<Debug> MemoryTracker: Current memory usage: 17.00 GiB.
<Debug> MemoryTracker: Current memory usage: 18.00 GiB.
<Debug> MemoryTracker: Current memory usage: 19.00 GiB.
<Debug> MemoryTracker: Current memory usage: 20.00 GiB.
<Debug> MemoryTracker: Current memory usage: 21.00 GiB.
<Debug> MemoryTracker: Current memory usage: 22.00 GiB.
<Debug> MemoryTracker: Current memory usage: 23.00 GiB.
<Debug> MemoryTracker: Current memory usage: 24.00 GiB.
<Information> executeQuery: Read 1539 rows, 682.49 KiB in 218.214 sec., 7 rows/sec., 3.13 KiB/sec.
[<Debug> MemoryTracker: Peak memory usage (for query): 24.79 GiB.

1 rows in set. Elapsed: 218.216 sec. Processed 1.54 thousand rows, 698.87 KB (7.05 rows/s., 3.20 KB/s.)

schema:

CREATE TABLE table (
 col_name UInt32,
 col_name Int32,
 col_name Int32,
 col_name Float64,
 col_name Float64,
 col_name Int32,
 col_name String,
 col_name String,
 col_name Float64,
 col_name Float64,
 col_name Int32,
 col_name String,
 col_name String,
 col_name Int8,
 col_name Int8,
 col_name Int8,
 col_name Int8,
 col_name Int32,
 col_name Int32,
 col_name Int32,
 col_name Int32,
 col_name Int32,
 col_name Int8,
 col_name Nullable(Int8),
 col_name String,
 col_name Int8,
 col_name Int32,
 col_name Int32,
 col_name Float64,
 col_name Float64,
 col_name Int8,
 col_name Int8,
 col_name Int8,
 col_name Int8,
 col_name String,
 col_name Float64,
 col_name Float64,
 col_name Int32,
 col_name String,
 col_name Int32,
 col_name Int32,
 col_name Int32,
 col_name Int8,
 col_name DateTime,
 col_name Date,
 col_name Date,
 col_name DateTime,
 col_name DateTime,
  INDEX min_max created_date TYPE minmax GRANULARITY 8192
) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/database/table', '{replica}', updated_at) 
PARTITION BY toDate(created_at) 
PRIMARY KEY id ORDER BY id 
SETTINGS index_granularity = 8192;

hoseiney on 6 May 2020

What are the types of created_date / updated_at/ created_at / id ?

Is it ok to upgrade to v20.3 with experimental_use_processors = 0 config ?

yes

INDEX min_max created_date TYPE minmax GRANULARITY 8192

SKIP INDEX GRANULARITY 8192 -- has no sense. Try GRANULARITY 1..5.

den-crane on 6 May 2020

id UInt32
created_at , updated_at DateTime
created_date Date
as you know 8192 is default GRANULARITY, you are saying the default value is not appropriate ?

hoseiney on 6 May 2020

id UInt32
created_at , updated_at DateTime
created_date Date

Thank you.

as you know 8192 is default GRANULARITY, you are saying the default value is not appropriate ?

8192 is default GRANULARITY for a table SPARSE INDEX and it means rows.
Unfortunately SKIP indexes GRANULARITY has the same name but it means table's granules.
So your index able to skip only >= 8192*8192 = 67 108 864 rows only.
So trust me recommended skip index GRANULARITY is 1 to 5.

den-crane on 6 May 2020

👍1

@hoseiney in users' profile

cat /etc/clickhouse-server/conf.d/experimental_use_processors.xml
<?xml version="1.0" ?>
<yandex>
    <profiles>
        <default>
            <experimental_use_processors>0</experimental_use_processors>
        </default>
    </profiles>
</yandex>

den-crane on 6 May 2020

❤1 👍1

approach to STR:

412.32 MiB VS 64.02 MiB

20.3.8.53

CREATE TABLE xxxt (
 id UInt32,
 col_name1 Int32,
 col_name2 Int32,
 col_name3 Float64,
 col_name4 Float64,
 col_name5 Int32,
 col_name6 String,
 col_name7 String,
 col_name8 Float64,
 col_name9 Float64,
 col_name10 Int32,
 col_name11 String,
 col_name12 String,
 col_name13 Int8,
 col_name14 Int8,
 col_name15 Int8,
 col_name16 Int8,
 col_name17 Int32,
 col_name18 Int32,
 col_name19 Int32,
 col_name20 Int32,
 col_name21 Int32,
 col_name22 Int8,
 col_name23 Nullable(Int8),
 col_name24 String,
 col_name25 Int8,
 col_name26 Int32,
 col_name27 Int32,
 col_name28 Float64,
 col_name29 Float64,
 col_name30 Int8,
 col_name31 Int8,
 col_name32 Int8,
 col_name33 Int8,
 col_name34 String,
 col_name35 Float64,
 col_name36 Float64,
 col_name37 Int32,
 col_name38 String,
 col_name39 Int32,
 col_name40 Int32,
 col_name41 Int32,
 col_name42 Int8,
 created_at DateTime,
 created_date Date,
 col_name43 Date,
 updated_at DateTime,
 col_name44 DateTime,
  INDEX min_max created_date TYPE minmax GRANULARITY 8192
) ENGINE = ReplacingMergeTree( updated_at) 
PARTITION BY toDate(created_at) 
PRIMARY KEY id ORDER BY id 
SETTINGS index_granularity = 8192;

set max_insert_parts=100000;

insert into xxxt(id, created_at, created_date, updated_at, col_name23, col_name34, col_name35, col_name36)
select number x,  x/1000*3600*24, rand(), intDiv(x, 100),null, toString(cityHash64(x)), 
cityHash64(x)/43, cityHash64(x)/33 from numbers(1000000);

select * from xxxt limit 1

MemoryTracker: Peak memory usage (for query): 412.32 MiB.
1 rows in set. Elapsed: 0.873 sec. Processed 1.00 thousand rows, 239.41 KB (1.15 thousand rows/s., 274.31 KB/s.)

set experimental_use_processors=0

select * from xxxt limit 1

MemoryTracker: Peak memory usage (for query): 64.02 MiB.
1 rows in set. Elapsed: 0.122 sec.

20.4.1.3163

set experimental_use_processors=0
select * from xxxt limit 1
Peak memory usage (for query): 60.02 MiB.
1 rows in set. Elapsed: 0.044 sec.


set experimental_use_processors=1
select * from xxxt limit 1
Peak memory usage (for query): 60.03 MiB.
1 rows in set. Elapsed: 0.066 sec.

Seems it is solved in master

den-crane on 6 May 2020

👍3

Was fixed in 4837257bca175f52ad9c2b5530229b095a44c05c