ClickHouse: GROUP BY with LIMIT on an HDFS Parquet table results in a Memory limit exception

Created on 6 Nov 2019  ·  6 Comments  ·  Source: ClickHouse/ClickHouse

I'm testing ClickHouse to see if it's a fit for our data.

I found that a GROUP BY with LIMIT query results in "Memory limit (for query) exceeded", even though max_bytes_before_external_group_by and max_bytes_before_external_sort are set.

  • ClickHouse server version: 19.16.2
  • Non-default settings:
set send_logs_level='debug'
set max_memory_usage=20000000000
set max_bytes_before_external_group_by=10000000000
set max_bytes_before_external_sort=10000000000
  • Table created using the HDFS engine:
CREATE TABLE hdfs_table (... some columns.. , high_cardinality_key_1 Nullable(Int64), high_cardinality_key_2 Nullable(Int64)) ENGINE=HDFS('hdfs://host:9000/some_dir/*', 'Parquet')
  • The data directory contains about 3 billion rows, 100 GB.
  • Queries that lead to the unexpected result:
1. select high_cardinality_key_1, count(high_cardinality_key_2) from hdfs_table group by high_cardinality_key_1 limit 1
2. select count(distinct hour) from hdfs_table

I expected the query to spill to disk and return results even though memory is limited.
In the second query, hour is a very low-cardinality column (only 24 distinct values). Is all of the data loaded into memory even though only some columns are needed?
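For reference, here is the first reproduction in one runnable piece, a minimal sketch assuming clickhouse-client runs on the server (the settings, table name, and query are exactly the ones listed above; --multiquery keeps the SET statements and the query in a single session):

# consolidated reproduction sketch of the settings and query above
clickhouse-client --multiquery <<'SQL'
SET send_logs_level = 'debug';
SET max_memory_usage = 20000000000;
SET max_bytes_before_external_group_by = 10000000000;
SET max_bytes_before_external_sort = 10000000000;

SELECT high_cardinality_key_1, count(high_cardinality_key_2)
FROM hdfs_table
GROUP BY high_cardinality_key_1
LIMIT 1;
SQL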

The debug logs for the first query:

<Debug> MemoryTracker: Current memory usage: 6.03 GiB.
<Debug> MemoryTracker: Current memory usage: 10.00 GiB.
<Debug> Aggregator: Writing part of aggregation data into temporary file /var/lib/clickhouse/tmp/tmp68165zfaaaa.
...
<Debug> MemoryTracker: Current memory usage: 16.01 GiB.
...
<Debug> Aggregator: Writing part of aggregation data into temporary file /var/lib/clickhouse/tmp/tmp68165yhaaaa.
...
Progress: 733.14 million rows, 64.52 GB (19.19 million rows/s., 1.69 GB/s.) <Error> executeQuery: Code: 241, e.displayText() = DB::Exception: Memory limit (for query) exceeded: would use 18.63 GiB (attempt to allocate chunk of 33554448 bytes), maximum: 18.63 GiB (version 19.16.2.2 (official build))
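To cross-check what the server recorded for the failed attempt, the peak memory and the exception can also be pulled from system.query_log. A sketch, assuming query logging (log_queries) is enabled; the exact column set may vary by version:

# recent queries that failed while running, newest first
clickhouse-client --query "
    SELECT event_time, formatReadableSize(memory_usage) AS peak_mem, exception
    FROM system.query_log
    WHERE type = 'ExceptionWhileProcessing'
    ORDER BY event_time DESC
    LIMIT 5
    FORMAT Vertical"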

Any ideas?
Thanks

bug st-accepted

All 6 comments

I just tried to query some Parquet files from the local FS, and also a small file (about 150 MB, 29M rows) from a local HDFS installation.

SELECT countDistinct(BannerID)
FROM file('huge_event_log*.parq', 'Parquet', 'EventTime DateTime, BannerID UInt64, Cost UInt64, CounterType UInt8')
┌─uniqExact(BannerID)─┐
│             2788144 │
└─────────────────────┘

1 rows in set. Elapsed: 269.551 sec. Processed 5.06 billion rows, 106.26 GB (18.77 million rows/s., 394.20 MB/s.)

and in server logs:

2019.11.07 19:23:45.078137 [ 133 ] {01ab166d-77c9-465a-968c-1d6882e28217} <Debug> MemoryTracker: Peak memory usage (total): 1.43 GiB.

Memory consumption seems to be OK, but I cannot reproduce your case precisely. Maybe your files have some peculiar properties? How many Parquet files do you have? Which compression format do you use?
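If it helps, something like this should answer both questions (a sketch; the path is the placeholder from the issue, the file name is just an example, and parquet-tools is assumed to be available for the metadata part):

# how many files does the glob in the table definition actually match?
hdfs dfs -count /some_dir        # prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME

# which compression codec is used? copy one file locally and inspect its metadata
hdfs dfs -get /some_dir/part-00000.parquet /tmp/
parquet-tools meta /tmp/part-00000.parquet    # the codec is shown per column chunk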

In our code, we read data from HDFS in small batches: https://github.com/clickhouse/ClickHouse/blob/master/dbms/src/IO/ReadBufferFromHDFS.cpp#L61.

I think the main problem is the number of files matched by hdfs://host:9000/some_dir/*.

Yes, I just created 200 files and got an OOM:

$ ls event_log* | wc -l
200
$ clickhouse-client
 :) select countDistinct(BannerID) from file('event_log*.parq', 'Parquet', 'EventTime DateTime, BannerID UInt64, Cost UInt64, CounterType UInt8');
[7 19:52:13 2019]  oom_kill_process+0x254/0x280
[7 19:52:13 2019] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[7 19:52:13 2019] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice,task=clickhouse,pid=1767,uid=40154
[7 19:52:14 2019] oom_reaper: reaped process 1767 (clickhouse), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

@stavrolia Look at narrowBlockInputStreams.

Thank you for looking into it.

Indeed, we have about 200 Parquet files in the directory. The compression used is Snappy.


Hi, I created a similar table that reads HDFS files as ORC, and the same problem occurs. What do I need to set to avoid this situation? My configuration is the same as above; the machine has 8 cores and 32 GB of memory. Please give me a hand, thanks.
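Until a fix lands, these are the generic memory-related settings you could experiment with on an 8-core / 32 GB machine. This is only a sketch: the values mirror the ones from the issue, the table and column names are placeholders, and lowering max_threads is an assumption rather than a confirmed workaround for the one-stream-per-file problem:

clickhouse-client --multiquery <<'SQL'
-- cap per-query memory and spill aggregation/sort to disk earlier
SET max_memory_usage = 20000000000;
SET max_bytes_before_external_group_by = 10000000000;
SET max_bytes_before_external_sort = 10000000000;
-- assumption: fewer threads may mean fewer files read at the same time
SET max_threads = 4;

SELECT count(DISTINCT some_key) FROM hdfs_orc_table;  -- placeholder table/column
SQL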

That's a trivial task, @stavrolia will fix.

@stavrolia - do you want to fix it before lunch or in the evening today?
