I am running out of memory when trying to execute a text-search query.
clickhouse_driver.errors.ServerException: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 524288 bytes), maximum: 9.31 GiB.
The column I'm querying contains HTML snippets, so it's quite expensive to search through it. In the future, I plan to split it into several other columns, but I would like to know if it's possible to split the load between RAM and disk (e.g. switch to disk when out of memory or use another engine - currently using MergeTree).
The query looks like the following:
SELECT COUNT(*)
FROM ObserverNodeOccurrence as occ
LEFT JOIN
ObserverNodeOccurrence_NodeElements as occ_ne
ON occ._id = occ_ne.occurrenceId
WHERE
occ_ne.snippet LIKE '%<img>%'
Further details in this stackoverflow question.
ClickHouse always uses batch/vector processing. I.e. even if you need one row, ClickHouse can't read fewer than index_granularity rows (8192 by default); if rows are big, you can hit the memory limit when trying to put 8K rows in memory.
So index_granularity is the minimum read unit (the index can only address blocks of that size), but in practice ClickHouse usually processes data in blocks of max_block_size rows (65536 by default), in multithreaded mode. That means you need at least column_size * max_block_size * max_threads bytes of memory.
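As a rough illustration of that formula (the 10 KiB average row size is an assumed figure, not from this thread):

```sql
-- Assumed figures: ~10 KiB per snippet row, defaults max_block_size = 65536
-- and max_threads = 8. Then a single column alone needs roughly:
--   10 KiB * 65536 * 8 ≈ 5 GiB
SELECT formatReadableSize(10 * 1024 * 65536 * 8) AS estimated_memory;
```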
So, please consider:
1) decreasing max_block_size for your SELECT
2) decreasing index_granularity: CREATE TABLE ... SETTINGS index_granularity=1024
3) recreating the table on ClickHouse >= v19.11.4 (in that case adaptive index granularity will be used)
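For reference, the first two suggestions could look like this (sketches; the concrete values are illustrative):

```sql
-- 1) lower max_block_size just for this query
SELECT count(*)
FROM ObserverNodeOccurrence AS occ
LEFT JOIN ObserverNodeOccurrence_NodeElements AS occ_ne
    ON occ._id = occ_ne.occurrenceId
WHERE occ_ne.snippet LIKE '%<img>%'
SETTINGS max_block_size = 8192;

-- 2) recreate the table with a smaller fixed index granularity
CREATE TABLE ... SETTINGS index_granularity = 1024;
```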
Thank you very much for the input. I updated to v19.11.4 but the error pops up anyway (I did recreate all tables); is there any setting I should change to enable adaptive index granularity?
ClickHouse client version 19.11.4.24 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.11.4 revision 54423.
Why do you think the memory is used by the search?
Your query has a JOIN. To execute a JOIN, ClickHouse puts the right table into an in-memory hash table.
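If the JOIN itself is needed, the hash table built from the right table can at least be capped (a sketch; with join_overflow_mode = 'break' the query returns a partial result instead of failing):

```sql
SET max_bytes_in_join = 5000000000;  -- ~5 GB cap on the join hash table
SET join_overflow_mode = 'break';    -- stop filling the table instead of throwing
```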
How many rows does this return:
select count()
from ObserverNodeOccurrence_NodeElements as occ_ne
Did you try
SELECT COUNT(*)
FROM ObserverNodeOccurrence
where _id in (select occurrenceId
from ObserverNodeOccurrence_NodeElements where snippet LIKE '%<img>%')
And CH does not have an optimization to push that filter down to the right table of a JOIN.
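Since the filter isn't pushed down automatically, it can be applied by hand in a subquery on the right-hand side (a sketch; it uses INNER JOIN because the original WHERE on occ_ne.snippet discards non-matching rows anyway):

```sql
SELECT count(*)
FROM ObserverNodeOccurrence AS occ
INNER JOIN
(
    SELECT occurrenceId
    FROM ObserverNodeOccurrence_NodeElements
    WHERE snippet LIKE '%<img>%'
) AS occ_ne ON occ._id = occ_ne.occurrenceId;
```

This way only the matching occurrenceId values, not the full snippets, end up in the join hash table.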
@den-crane JOIN's right table has around 53 million records. And each snippet can be quite big. I tried using a subquery as you mentioned and it works perfectly.
Although I would like to understand whether adaptive index granularity is being used in my tables. Is there a way I can look into this? @filimonov
Your issue is unrelated to @filimonov's suggestions; he probably missed that you have a JOIN.
53 million WIDE rows can easily eat 500GB of memory with the straightforward JOIN you used.
Alright, thanks for the input.