Sanic: Increasing memory usage in `update_current_time`

Created on 28 Feb 2019  ·  18 Comments  ·  Source: sanic-org/sanic

Describe the bug
I see constantly increasing memory usage in my running Sanic instance.
After using tracemalloc to identify the cause, I found the top 5 memory allocations:

/usr/local/lib/python3.5/dist-packages/sanic/server.py:547: size=16.5 MiB, count=113860, average=152 B
<frozen importlib._bootstrap_external>:484: size=1075 KiB, count=17138, average=64 B
/usr/lib/python3.5/copy.py:240: size=882 KiB, count=14116, average=64 B
/usr/local/lib/python3.5/dist-packages/sanic/server.py:328: size=540 KiB, count=4940, average=112 B
/usr/local/lib/python3.5/dist-packages/sanic/server.py:157: size=434 KiB, count=3967, average=112 B

Looking at /usr/local/lib/python3.5/dist-packages/sanic/server.py:547, which has size=16.5 MiB, it leads to
https://github.com/huge-success/sanic/blob/52deebaf65ab48d5fbfa92607f22d96ee1bdb7a7/sanic/server.py#L607
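For anyone who wants to reproduce this kind of report, a minimal tracemalloc session looks roughly like this (a sketch; the growing list is just a stand-in for the real server workload):

```python
import tracemalloc

tracemalloc.start()

# Stand-in for a leaking workload: many small allocations of ~152 B each,
# similar to the average size in the report above.
leaky = [b"x" * 152 for _ in range(10_000)]

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

# Print the top 5 allocation sites, grouped by file and line number.
for stat in top_stats[:5]:
    print(stat)
```

Each printed line has the same `file.py:NNN: size=..., count=..., average=...` shape as the report above.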

Memory consumption graph (gaps are OOM kills due to memory limits in Kubernetes)

Could this part of the code be leaking memory?

Environment (please complete the following information):

  • Debian stretch running in Kubernetes (Docker)
  • Version sanic-18.12.0-py3

Additional context
Sorry, I'm not a Python expert; maybe I'm using tracemalloc the wrong way, or the issue is somewhere else. In that case I apologize, and I would be glad for any ideas on where to look or how to profile it.

Thanks in advance

Most helpful comment

I can confirm that removing the update_current_time logic keeps the pod's memory in check. I think the leak is happening only when there are requests? I'm not totally sure, but I agree with @vltr that the benefit of caching time() is probably really small, so I'm not gonna dig deeper for now.

Removing the current_time variable also removes the only global variable in the codebase :)

I can make a PR if this sounds like the proper way to go forward?

All 18 comments

I was about to open an issue about this. I've also just identified where the possible leak comes from.
I checked the memory usage using mem_top.

After a few requests, the output is:

refs:
1210    <class 'dict'> {140519667199136: <weakref at 0x7fcd3b70d638; to 'type' at 0x7fcd48d8eca0 (type)>, 140519667196960: 
1122    <class 'list'> ['import logging\n', 'import logging.config\n', 'import os\n', 'import re\n', 'import warnings\n', '
910     <class 'dict'> {'application/octet-stream': ['.a', '.bin', '.dll', '.exe', '.o', '.obj', '.so', '.lha', '.lzh', '.c
444     <class 'dict'> {'builtins': <module 'builtins' (built-in)>, 'sys': <module 'sys' (built-in)>, '_frozen_importlib': 
366     <class 'dict'> {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions
339     <class 'dict'> {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThi
337     <class 'dict'> {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m
337     <class 'dict'> {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m
336     <class 'list'> ['fromfd', 'getfqdn', 'create_connection', 'AddressFamily', 'SocketKind', 'AF_ALG', 'AF_APPLETALK', 
310     <class 'list'> ['altsep', 'curdir', 'pardir', 'sep', 'pathsep', 'linesep', 'defpath', 'name', 'path', 'devnull', 'S

bytes:
256544   <uvloop.Loop running=True closed=False debug=False>
36968    {'application/octet-stream': ['.a', '.bin', '.dll', '.exe', '.o', '.obj', '.so', '.lha', '.lzh', '.c
18528    {140519667199136: <weakref at 0x7fcd3b70d638; to 'type' at 0x7fcd48d8eca0 (type)>, 140519667196960: 
18528    {'builtins': <module 'builtins' (built-in)>, 'sys': <module 'sys' (built-in)>, '_frozen_importlib': 
18528    {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThi
18528    {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions
9656     ['import logging\n', 'import logging.config\n', 'import os\n', 'import re\n', 'import warnings\n', '
9320     {'__name__': 'posix', '__doc__': 'This module provides access to operating system functionality that
9320     {'__name__': 'posix', '__doc__': 'This module provides access to operating system functionality that
9320     {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m

types:
8104     <class 'function'>
4781     <class 'tuple'>
4560     <class 'dict'>
2488     <class 'weakref'>
2018     <class 'list'>
1416     <class 'method_descriptor'>
1361     <class 'wrapper_descriptor'>
1241     <class 'getset_descriptor'>
1090     <class 'type'>
1045     <class 'set'>

After many requests over a few minutes, it turns into:

refs:
1210    <class 'dict'> {139867876654240: <weakref at 0x7f3579b26638; to 'type' at 0x7f35871a7ca0 (type)>, 139867876652064: 
1122    <class 'list'> ['import logging\n', 'import logging.config\n', 'import os\n', 'import re\n', 'import warnings\n', '
1079    <class 'set'> {<TimerHandle functools.partial(<function update_current_time at 0x7f3578b510d0>, <uvloop.Loop runni
910     <class 'dict'> {'application/octet-stream': ['.a', '.bin', '.dll', '.exe', '.o', '.obj', '.so', '.lha', '.lzh', '.c
444     <class 'dict'> {'builtins': <module 'builtins' (built-in)>, 'sys': <module 'sys' (built-in)>, '_frozen_importlib': 
366     <class 'dict'> {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions
339     <class 'dict'> {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThi
337     <class 'dict'> {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m
337     <class 'dict'> {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m
336     <class 'list'> ['fromfd', 'getfqdn', 'create_connection', 'AddressFamily', 'SocketKind', 'AF_ALG', 'AF_APPLETALK', 

bytes:
256544   <uvloop.Loop running=True closed=False debug=False>
36968    {'application/octet-stream': ['.a', '.bin', '.dll', '.exe', '.o', '.obj', '.so', '.lha', '.lzh', '.c
32992    {<TimerHandle functools.partial(<function update_current_time at 0x7f3578b510d0>, <uvloop.Loop runni
18528    {139867876654240: <weakref at 0x7f3579b26638; to 'type' at 0x7f35871a7ca0 (type)>, 139867876652064: 
18528    {'builtins': <module 'builtins' (built-in)>, 'sys': <module 'sys' (built-in)>, '_frozen_importlib': 
18528    {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThi
18528    {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions
9656     ['import logging\n', 'import logging.config\n', 'import os\n', 'import re\n', 'import warnings\n', '
9320     {'__name__': 'posix', '__doc__': 'This module provides access to operating system functionality that
9320     {'__name__': 'posix', '__doc__': 'This module provides access to operating system functionality that

types:
8104     <class 'function'>
5849     <class 'tuple'>
4556     <class 'dict'>
2486     <class 'weakref'>
2018     <class 'list'>
1416     <class 'method_descriptor'>
1361     <class 'wrapper_descriptor'>
1241     <class 'getset_descriptor'>
1090     <class 'type'>
1084     <class 'functools.partial'>

The following image shows the memory usage evolution of the same application running in the Kubernetes cluster:


During the time the memory usage grew, the application basically received only health check requests: a simple GET that returns a small JSON payload saying the application is alive.

Environment (please complete the following information):

  • Alpine 3.8 running in Kubernetes (Docker)
  • Version sanic-18.12.0
  • Python version 3.6.6

I would take a wild guess that Python's GC is not collecting the hanging futures inside each function call to itself - but I might be wrong.

IMO, we don't need to update the clock every second; we just need to call time() where current_time is required.

Might be. According to the comment, it was supposed to mitigate the possible overhead, but I'm not sure how much it helps.

Cache the current time, since it is needed at the end of every
keep-alive request to update the request timeout time
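For context, the pattern in question (based on the linked server.py and the TimerHandle/functools.partial entries in the mem_top output above) is a callback that re-arms itself every second. A minimal sketch:

```python
import asyncio
import time
from functools import partial

current_time = time.time()


def update_current_time(loop):
    # Refresh the cached time, then schedule this same function to run
    # again in one second. Each call creates a fresh functools.partial
    # and TimerHandle -- the object types that show up in mem_top above.
    global current_time
    current_time = time.time()
    loop.call_later(1, partial(update_current_time, loop))


async def main():
    update_current_time(asyncio.get_running_loop())
    await asyncio.sleep(2.5)  # let the timer fire a couple of times


asyncio.run(main())
```

If those handles are retained somewhere instead of being released after firing, the set of TimerHandles grows with uptime, which would match the `<class 'set'>` entry appearing in the second mem_top snapshot.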

Yep, but I think this is unnecessary, since calling time() has almost negligible overhead:

```python
import timeit

setup_stmt = """
from time import time


def call_time():
    assert time() is not None

"""
timed_stmt = {
    "call_time": 'call_time()',
}
times = 1_000_000

for name, stmt in timed_stmt.items():
    time_to_run = timeit.timeit(stmt, setup=setup_stmt, number=times)
    time_each = time_to_run / times
    print("%s: %d results in %f s (%f us per result, %f per sec)" %
          (name, times, time_to_run, time_each * 1e6, 1. / time_each))
```

```
call_time: 1000000 results in 0.110726 s (0.110726 us per result, 9031334.549457 per sec)
```
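For comparison, a similar quick check (a sketch, not from the thread) putting the direct time() call next to reading a cached module-level value, which is what the current_time global amounts to:

```python
import timeit

n = 1_000_000

# Calling time.time() on every use...
direct = timeit.timeit("time()", setup="from time import time", number=n)

# ...versus reading a cached module-level value, as sanic's current_time does.
cached = timeit.timeit(
    "current_time",
    setup="from time import time; current_time = time()",
    number=n,
)

print(f"time() call: {direct / n * 1e9:.1f} ns per call")
print(f"cached read: {cached / n * 1e9:.1f} ns per read")
```

The absolute difference is tens of nanoseconds per request at most, which supports the point that caching buys very little here.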

I can confirm that removing the update_current_time logic keeps the pod's memory in check. I think the leak is happening only when there are requests? I'm not totally sure, but I agree with @vltr that the benefit of caching time() is probably really small, so I'm not gonna dig deeper for now.

Removing the current_time variable also removes the only global variable in the codebase :)

I can make a PR if this sounds like the proper way to go forward?

Great, thanks for resolving it so quickly!
IIUC the fix will be in 18.13 and possibly 18.12.1, is there any estimation on the release date?

@FUSAKLA Look for two releases: 18.12.1 (a few fixes for the LTS) and 19.3 (the next release in the schedule), sometime in the next couple of weeks.

FYI, memory usage after the 19.3.1 upgrade. Thanks, it works great!


Thanks for sharing

I had the same memory leak issue with 18.12.1, and then I updated to 19.3, but memory is still increasing, albeit very slowly.

Memory usage over two days:

I use multiple workers; the machine has 4 cores and 6 GB of memory:

```python
import multiprocessing

workers = multiprocessing.cpu_count()

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, workers=workers)
```

The result when I run ps -aux | grep python:


The other question is why there are so many processes; I think there should be 4. I use pm2 in fork mode; is this related to pm2?

@rifewang I would look to pm2 first. I am not 100% familiar with it and how it handles python applications. Does it act as a proxy server? How and when does it decide to start a process? To continue this conversation, please start a new issue and link to this here.

I also experience the memory leak issue when using more than one worker on Sanic 19.6.0 with Python 3.7.3.

@ladler0320 19.6.0 is not out yet, do you mean 19.3?

@sjsadowski I'm not quite sure, but it's probably 19.6.0.
I installed it using
pip install git+https://github.com/huge-success/sanic.git@a57c14c70bca3514d502db3ce53fd3262d1af149
because of the #1549 fix, and it reported the Sanic version as 19.6.0.

Got it, you installed from git master. Can you share your code and process to determine memory leaks?

@sjsadowski I do apologize, but I'm not sure I'm allowed to share the code. However, I can confirm that the memory leak occurs when I'm using more than one worker, and only when the service receives requests.
Sure, I'll try to investigate the issue and let you know if there are any updates.

@ladler0320 if you can do pseudocode, that would help too. It's hard to know what to fix if we can't identify what and where it's broken.

@sjsadowski Sure.

```python
# Imports reconstructed for completeness; LOGGING_CONFIG, APP_CONFIG,
# function, logger and the WORKERS/THREADS env vars come from elsewhere
# in the original application.
import argparse
import asyncio
import concurrent.futures
import json as js
import os

import async_timeout
from sanic import Sanic, exceptions, response


def init_app():
    app_local = Sanic(name="App", log_config=LOGGING_CONFIG)
    app_local.config.update(APP_CONFIG)  # "RESPONSE_TIMEOUT": 0.5

    app_local.error_handler.add(exceptions.ServiceUnavailable,
                                timeout_error_handler)
    app_local.add_route(handler, "/models/<model>", methods=["POST"])

    return app_local


async def handler(request, model):
    loop = asyncio.get_event_loop()
    run = loop.run_in_executor

    try:
        with async_timeout.timeout(0.5, loop=loop):
            data = request.json
            resp = await run(pool, function, model, data)
            status = 200
            resp = js.dumps(resp, ensure_ascii=False)

    except (concurrent.futures.CancelledError,
            concurrent.futures.TimeoutError,
            asyncio.TimeoutError):
        resp = ""
        status = 504

    except Exception as e:
        logger.info(e)
        resp = ""
        status = 500

    return response.text(resp, status=status)


def timeout_error_handler(request, exception):
    return response.text("", 504)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", default=os.getenv("HOST", "0.0.0.0"))
    parser.add_argument("--port", type=int, default=os.getenv("PORT", 1234))

    args = parser.parse_args()

    host = args.host
    port: int = args.port
    workers = int(os.getenv("WORKERS"))
    threads = int(os.getenv("THREADS"))

    app = init_app()
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=threads)
    app.run(host=host, port=port, workers=workers, debug=False)
```
