ClickHouse: Memory limit (total) exceeded issues with wait_end_of_query=1

Created on 23 May 2020 · 20 comments · Source: ClickHouse/ClickHouse

Using version a2ed8267add1cfc28fa2ca17f7227a816921e7d7: if I run a query like "SELECT 1" in a loop using wait_end_of_query=1, after about 7000 iterations all queries start failing with:

DynamicQueryHandler: Code: 241, e.displayText() = DB::Exception: Memory limit (total) exceeded: would use 14.40 GiB (attempt to allocate chunk of 4198133 bytes), maximum: 14.40 GiB, Stack trace (when copying this message, always include the lines below):

If wait_end_of_query is not specified the problem doesn't occur.
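
For reference, a minimal Python sketch of the reproduction loop, assuming a local server on the default HTTP port 8123 and the requests library (not the exact client used here; adjust the URL as needed):

import requests

url = 'http://localhost:8123/'  # assumed default HTTP port
for i in range(10000):
    r = requests.post(url, params={'wait_end_of_query': 1}, data='SELECT 1')
    if r.status_code != 200:
        # on affected builds, after ~7000 iterations every request starts
        # returning Code: 241 (Memory limit (total) exceeded)
        print(i, r.text)
        break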

bug comp-http memory v20.4-affected

All 20 comments

Cannot reproduce (even on that particular commit): 10k requests, ~1k in parallel, everything is fine (RSS/VIRT does not grow):

$ time yes '127.1:8123/?wait_end_of_query=1' | head -n10000 | xargs -P10000 curl -s -d 'select 1' | grep -x -c 1
10000

real    0m3.427s

Hello @azat,

I can replicate the problem with your query. Please look into the logs (they are spammed with DynamicQueryHandler: Code: 241, e.displayText() = DB::Exception: Memory limit (total) exceeded:). If you look at the memory usage, it is not increasing. There is no memory leak; the problem is in the MemoryTracker.

Silviu

Reproduced (forgot to set memory limit for server)

Using version a2ed826

Hm, and this did not happen before?

@azat I have no idea :) I only now wanted to use wait_end_of_query. But everything seems broken once the problem occurs. Even config auto-reload is no longer working and throws the same error.

@azat I found out that if I do 10K small inserts one after the other, without any wait_end_of_query, I receive the same issue. Is there any way I can disable this memory limit check?

I found out that if I do 10K small inserts one after the other, without any wait_end_of_query, I receive the same issue.

Yes, this is kind of expected behaviour

Is there any way I can disable this memory limit check?

You can add the following into config.xml:

<max_server_memory_usage_to_ram_ratio>100</max_server_memory_usage_to_ram_ratio>

To verify, take a look at the logs:

Setting max_server_memory_usage ...
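
As a rough sketch of where the 14.40 GiB in the original error comes from (assuming a 16 GiB machine and the default max_server_memory_usage_to_ram_ratio of 0.9):

# sketch: how the server-wide limit is derived when max_server_memory_usage
# is left at 0 (auto); the 16 GiB RAM and the 0.9 default ratio are assumptions
ram_gib = 16.0
ratio = 0.9  # default max_server_memory_usage_to_ram_ratio
print(f"{ram_gib * ratio:.2f} GiB")  # -> 14.40 GiB, as in the original error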

Hm, and this did not happen before?

And yes, it pops up only now, since max_server_memory_usage was added only in 20.4 (#10421, #10362)

P.S. this memory tracking accounting is reset every 90 seconds (AFAIR), but because these queries are too fast (since they are no-ops) this does not work

@azat I kept the server running for a few hours without sending any queries, and it seems the counter didn't reset. All queries are still failing.

I kept the server running for a few hours without sending any queries, and it seems the counter didn't reset. All queries are still failing.

Have you set the server setting max_server_memory_usage_to_ram_ratio from the comment above?
And what is the RSS of the clickhouse-server process?

I added max_server_memory_usage_to_ram_ratio and this fixed all the problems. But without it, once the problem appears the server becomes unusable, even though the clickhouse-server process doesn't have more than 300 MB of memory on a 16 GB server... Even if you wait 1 hour it will not become usable until you restart.

clickhouse-server process doesn't have more than 300 MB of memory

of RSS?

Can you post output of the following?

select metric, formatReadableSize(value) from system.metrics where metric = 'MemoryTracking'

This query should work since it uses less than max_untracked_memory (4 MB by default).
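
A minimal sketch of running that check over the HTTP interface (assuming localhost:8123 and the requests library):

import requests

query = "select metric, formatReadableSize(value) from system.metrics where metric = 'MemoryTracking'"
r = requests.post('http://localhost:8123/', data=query)
print(r.text)  # shows how much memory the tracker currently accounts for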

Also, maybe you have some background queries that you forgot to mention?

same problem with 20.4.3.16 and 20.4.2.9

Hi,
I got the same problem, and setting max_server_memory_usage_to_ram_ratio does not fix it. Does anyone have a solution yet?
This is my error:

ServerException: Code: 241.
DB::Exception: Memory limit (for query) exceeded: would use 18.63 GiB (attempt to allocate chunk of 4210950 bytes), maximum: 18.63 GiB: While executing MergeTreeThread. Stack trace:

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x103fa390 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x900b8ad in /usr/bin/clickhouse
2. ? @ 0x900af73 in /usr/bin/clickhouse
3. MemoryTracker::alloc(long) @ 0x90095e7 in /usr/bin/clickhouse
4. void DB::filterArraysImpl<char8_t>(DB::PODArray<char8_t, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, DB::PODArray<char8_t, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::PODArray<char8_t, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, long) @ 0xd60c57c in /usr/bin/clickhouse
5. DB::ColumnString::filter(DB::PODArray<char8_t, 4096ul, Allocator<false, false>, 15ul, 16ul> const&, long) const @ 0xd5a9385 in /usr/bin/clickhouse
6. ? @ 0xd9b2735 in /usr/bin/clickhouse
7. DB::MergeTreeRangeReader::read(unsigned long, std::__1::deque<DB::MarkRange, std::__1::allocator<DB::MarkRange> >&) @ 0xd9b64fa in /usr/bin/clickhouse
8. DB::MergeTreeBaseSelectProcessor::readFromPartImpl() @ 0xd9af83d in /usr/bin/clickhouse
9. DB::MergeTreeBaseSelectProcessor::generate() @ 0xd9b0417 in /usr/bin/clickhouse
10. DB::ISource::work() @ 0xdad2fab in /usr/bin/clickhouse
11. DB::SourceWithProgress::work() @ 0xdd18887 in /usr/bin/clickhouse
12. ? @ 0xdafe041 in /usr/bin/clickhouse
13. DB::PipelineExecutor::executeSingleThread(unsigned long, unsigned long) @ 0xdb01ffd in /usr/bin/clickhouse
14. ? @ 0xdb029c4 in /usr/bin/clickhouse
15. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x9031267 in /usr/bin/clickhouse
16. ? @ 0x902f753 in /usr/bin/clickhouse
17. start_thread @ 0x7e65 in /usr/lib64/libpthread-2.17.so
18. clone @ 0xfe88d in /usr/lib64/libc-2.17.so

I got the same problem, and setting max_server_memory_usage_to_ram_ratio does not fix it. Does anyone have a solution yet?

@HooliBox which version do you have? And please provide startup logs (at least the Setting max_server_memory_usage ... messages)

I am on version 20.4.5.36 (official build).
I ran grep max_server_memory_usage /var/log/clickhouse-server/clickhouse-server.log:

2020.06.21 11:20:40.539962 [ 506 ] {} <Information> Application: Setting max_server_memory_usage was set to 2.86 TiB
2020.06.21 11:21:59.810640 [ 679 ] {} <Information> Application: Setting max_server_memory_usage was set to 2.57 TiB
2020.06.21 11:26:18.303317 [ 846 ] {} <Information> Application: Setting max_server_memory_usage was set to 2.86 TiB
2020.06.21 12:22:44.987683 [ 314 ] {} <Information> Application: Setting max_server_memory_usage was set to 2.86 TiB

@HooliBox but in your case the per-query limit was exceeded, not the total one (which is what the issue subject is about):

DB::Exception: Memory limit (for query) exceeded: would use 18.63 GiB (attempt to allocate chunk of 4210950 bytes), maximum: 18.63 GiB: While executing MergeTreeThread. Stack trace:

So take a look at the max_memory_usage* settings.
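
For example, a quick way to inspect those per-query settings over HTTP (a sketch assuming localhost:8123 and the requests library; the exact setting names vary between versions):

import requests

query = "SELECT name, value FROM system.settings WHERE name LIKE 'max_memory_usage%'"
print(requests.post('http://localhost:8123/', data=query).text)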

Here is what I did: I ran the same query 10 times. The first couple of times, the query worked fine. Then it stopped working, complaining about the memory.

from sqlalchemy import create_engine, Column, MetaData, literal
from clickhouse_sqlalchemy import Table, make_session, get_declarative_base, types, engines
import pandas as pd
import numpy as np

uri = 'clickhouse://[email protected]/'
engine = create_engine(uri)
metadata = MetaData()

sql = """
SELECT toYYYYMMDD(trade_datetime) AS trade_dt, s_info_windcode, SUM(volume) AS volume FROM minute_bar
WHERE toInt64(toYYYYMMDDhhmmss(trade_datetime)) - toInt64(toYYYYMMDD(trade_datetime)) * 1000000 > 94500
AND toInt64(toYYYYMMDDhhmmss(trade_datetime)) - toInt64(toYYYYMMDD(trade_datetime)) * 1000000 < 141500
GROUP BY toYYYYMMDD(trade_datetime), s_info_windcode"""

for i in range(10):
    res = engine.execute(sql)
    data = res.fetchall()
    print(i, data[0])

and here is the output:

0 ('20191025', '603915.SH', '2157700')
1 ('20130208', '002059.SZ', '1159964')
2 ('20191025', '603915.SH', '2157700')
3 ('20130208', '002059.SZ', '1159964')
4 ('20130208', '002059.SZ', '1159964')
5 ('20191025', '603915.SH', '2157700')
---------------------------------------------------------------------------
DatabaseException                         Traceback (most recent call last)
<ipython-input-2-d1cd821b006b> in <module>
      7 
      8 for i in range(10):
----> 9     res = engine.execute(sql)
     10     data = res.fetchall()
     11     print(i, data[0])

/opt/conda/lib/python3.7/site-packages/sqlalchemy/engine/base.py in execute(self, statement, *multiparams, **params)
   2242 
   2243         connection = self._contextual_connect(close_with_result=True)
-> 2244         return connection.execute(statement, *multiparams, **params)
   2245 
   2246     def scalar(self, statement, *multiparams, **params):

/opt/conda/lib/python3.7/site-packages/sqlalchemy/engine/base.py in execute(self, object_, *multiparams, **params)
   1010         """
   1011         if isinstance(object_, util.string_types[0]):
-> 1012             return self._execute_text(object_, multiparams, params)
   1013         try:
   1014             meth = object_._execute_on_connection

/opt/conda/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_text(self, statement, multiparams, params)
   1185             parameters,
   1186             statement,
-> 1187             parameters,
   1188         )
   1189         if self._has_events or self.engine._has_events:

/opt/conda/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1322         except BaseException as e:
   1323             self._handle_dbapi_exception(
-> 1324                 e, statement, parameters, cursor, context
   1325             )
   1326 

/opt/conda/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1519                 )
   1520             else:
-> 1521                 util.raise_(exc_info[1], with_traceback=exc_info[2])
   1522 
   1523         finally:

/opt/conda/lib/python3.7/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    176 
    177         try:
--> 178             raise exception
    179         finally:
    180             # credit to

/opt/conda/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1282                 if not evt_handled:
   1283                     self.dialect.do_execute(
-> 1284                         cursor, statement, parameters, context
   1285                     )
   1286 

/opt/conda/lib/python3.7/site-packages/clickhouse_sqlalchemy/drivers/base.py in do_execute(self, cursor, statement, parameters, context)
    874 
    875     def do_execute(self, cursor, statement, parameters, context=None):
--> 876         cursor.execute(statement, parameters, context=context)
    877 
    878     def _check_unicode_returns(self, connection, additional_tests=None):

/opt/conda/lib/python3.7/site-packages/clickhouse_sqlalchemy/drivers/http/connector.py in execute(self, operation, parameters, context)
    115         response_gen = transport.execute(raw_sql, params=params)
    116 
--> 117         self._process_response(response_gen)
    118         self._end_query()
    119 

/opt/conda/lib/python3.7/site-packages/clickhouse_sqlalchemy/drivers/http/connector.py in _process_response(self, response)
    214         response = iter(response)
    215 
--> 216         self._columns = next(response, None)
    217         self._types = next(response, None)
    218         self._response = response

/opt/conda/lib/python3.7/site-packages/clickhouse_sqlalchemy/drivers/http/transport.py in execute(self, query, params)
     89         there is nothing to return.
     90         """
---> 91         r = self._send(query, params=params, stream=True)
     92         lines = r.iter_lines()
     93         try:

/opt/conda/lib/python3.7/site-packages/clickhouse_sqlalchemy/drivers/http/transport.py in _send(self, data, params, stream)
    135             orig = HTTPException(r.text)
    136             orig.code = r.status_code
--> 137             raise DatabaseException(orig)
    138         return r

DatabaseException: Orig exception: Code: 241, e.displayText() = DB::Exception: Memory limit (total) exceeded: would use 52.74 GiB (attempt to allocate chunk of 4718592 bytes), maximum: 52.73 GiB: While executing AggregatingTransform (version 20.4.5.36 (official build))

This seems like the same issue as the one above, where the memory was not cleared.

OK, here are some questions for you:

  • can you post to gist.github.com the log for one query when it does not fail
  • can you try using the HTTP interface instead of TCP? (there were issues in the TCP interface that have been fixed in #11022; HTTP does not have them). AFAICS clickhouse_sqlalchemy supports the HTTP interface too (see the sketch below)?
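
A sketch of switching between the two interfaces with clickhouse_sqlalchemy (the URI schemes, host, and ports are assumptions based on its docs):

from sqlalchemy import create_engine

# HTTP interface (default port 8123)
http_engine = create_engine('clickhouse+http://default@localhost:8123/default')

# native TCP interface (default port 9000), the one affected by #11022
native_engine = create_engine('clickhouse+native://default@localhost:9000/default')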

And also some notes regarding 20.4:

  • #11022 wasn't backported to 20.4 (@alexey-milovidov should it?)
  • #11084 also wasn't backported to 20.4

P.S. it looks like you have a different issue

I pasted the log file here.
I tried both TCP and HTTP; they both failed with the same error. The only difference is that TCP takes a couple more iterations to fail; I guess it is using less memory? Not sure.

From your logs:

2020.06.21 14:00:53.762580 [ 859 ] {} MemoryTracker: Peak memory usage (for query): 6.15 GiB.

And the error when the query fails:

DB::Exception: Memory limit (for query) exceeded: would use 18.63 GiB (attempt to allocate chunk of 4210950 bytes), maximum: 18.63 GiB: While executing MergeTreeThread. Stack trace:

Possible issues:

  • these are for different queries
  • more data arrived
  • memory accounting got lost (i.e. the memory tracker changed after new but before free, so the memory was incorrectly accounted)

Anyway, this looks like a separate issue to me (and actually this issue already contains 3 separate issues), so could you report it separately? (Include all the details you provided; you can also add the full server log from the beginning until the query starts failing, since it looks like that is not a problem for you.)

I used the same query and no new data was inserted.
I could create a new issue.
