I am running out of memory when trying to execute a text-search query.
clickhouse_driver.errors.ServerException: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 524288 bytes), maximum: 9.31 GiB.
The column I'm querying contains HTML snippets, so it's quite expensive to search through it. In the future, I plan to split it into several other columns, but I would like to know if it's possible to split the load between RAM and disk (e.g. switch to disk when out of memory or use another engine - currently using MergeTree).
The query looks like the following:
SELECT COUNT(*)
FROM ObserverNodeOccurrence as occ
LEFT JOIN
ObserverNodeOccurrence_NodeElements as occ_ne
ON occ._id = occ_ne.occurrenceId
WHERE
occ_ne.snippet LIKE '%<img>%'
Further details in this stackoverflow question.
ClickHouse always uses batch/vector processing. I.e. even if you need one row, ClickHouse can't read fewer than index_granularity rows (8192 by default); if rows are big, you can hit the memory limit when trying to put 8K rows in memory.
So index_granularity is the minimum read unit (the index can only address blocks of that size), but in practice ClickHouse usually processes data in blocks of max_block_size rows (65536 by default), in multithreaded mode. That means you need at least column_size * max_block_size * max_threads bytes of memory.
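As a rough illustration of that formula (the 10 KiB average row size is an assumed figure, not from this thread):

```sql
-- Assumed figures: ~10 KiB per snippet row, defaults max_block_size = 65536
-- and max_threads = 8. Then a single column alone needs roughly:
--   10 KiB * 65536 * 8 ≈ 5 GiB
SELECT formatReadableSize(10 * 1024 * 65536 * 8) AS estimated_memory;
```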
So, please consider:
1) decreasing max_block_size for your SELECT
2) decreasing index_granularity: CREATE TABLE ... SETTINGS index_granularity=1024
3) recreating the table on ClickHouse >= v19.11.4 (in that case adaptive index granularity will be used)
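For reference, the first two suggestions could look like this (sketches; the concrete values are illustrative):

```sql
-- 1) lower max_block_size just for this query
SELECT count(*)
FROM ObserverNodeOccurrence AS occ
LEFT JOIN ObserverNodeOccurrence_NodeElements AS occ_ne
    ON occ._id = occ_ne.occurrenceId
WHERE occ_ne.snippet LIKE '%<img>%'
SETTINGS max_block_size = 8192;

-- 2) recreate the table with a smaller fixed index granularity
CREATE TABLE ... SETTINGS index_granularity = 1024;
```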
Thank you very much for the input. I updated to v19.11.4 but the error pops up anyway (I did recreate all tables); is there any setting I should change to enable adaptive index granularity?
ClickHouse client version 19.11.4.24 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.11.4 revision 54423.
Why do you think the memory is used by the search?
Your query has a JOIN. To execute a JOIN, ClickHouse puts the right table into an in-memory hash table.
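If the JOIN itself is needed, the hash table built from the right table can at least be capped (a sketch; with join_overflow_mode = 'break' the query returns a partial result instead of failing):

```sql
SET max_bytes_in_join = 5000000000;  -- ~5 GB cap on the join hash table
SET join_overflow_mode = 'break';    -- stop filling the table instead of throwing
```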
How many rows does this return:
select count()
from ObserverNodeOccurrence_NodeElements as occ_ne
Did you try
SELECT COUNT(*)
FROM ObserverNodeOccurrence
where _id in (select occurrenceId
from ObserverNodeOccurrence_NodeElements where snippet LIKE '%<img>%')
And CH does not have an optimization to push that filter down to the right table of a JOIN.
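Since the filter isn't pushed down automatically, it can be applied by hand in a subquery on the right-hand side (a sketch; it uses INNER JOIN because the original WHERE on occ_ne.snippet discards non-matching rows anyway):

```sql
SELECT count(*)
FROM ObserverNodeOccurrence AS occ
INNER JOIN
(
    SELECT occurrenceId
    FROM ObserverNodeOccurrence_NodeElements
    WHERE snippet LIKE '%<img>%'
) AS occ_ne ON occ._id = occ_ne.occurrenceId;
```

This way only the matching occurrenceId values, not the full snippets, end up in the join hash table.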
@den-crane JOIN's right table has around 53 million records. And each snippet can be quite big. I tried using a subquery as you mentioned and it works perfectly.
Although I would like to understand whether adaptive index granularity is being used in my tables. Is there a way I can look into this? @filimonov
Your issue is unrelated to @filimonov's suggestions; he probably missed that you have a JOIN.
53 million WIDE rows can easily eat 500GB of memory with the straightforward JOIN you used.
Alright, thanks for the input.