馃悶 Describe the bug
When downloading content from a website, memory usage increases roughly proportionally to the size of the content downloaded, with some overhead for the library and other objects in memory.
When the scope where that content is downloaded is closed and inaccessible, I expect the garbage collector to de-allocate the objects which were downloaded, but memory usage appears to remain near the same level as peak memory usage right after downloads finish.
馃挕 To Reproduce
馃挕 Expected behavior
Once the downloaded objects are no longer in scope, they should be de-allocated by the garbage collector and memory usage should fall back to a similar level as when the script initialized.
馃搵 Logs/tracebacks
I have no specific logs/tracebacks (no exception occurred), but I have profiled the relevant functions in two scripts with memory_profiler to demonstrate memory usage.
The first script is created with aiohttp, demonstrating the bug:
Filename: ./aiohttp_example.py
Line # Mem usage Increment Occurences Line Contents
============================================================
22 26.8 MiB 26.8 MiB 1 @profile
23 async def quicktest():
24 26.8 MiB 0.0 MiB 1 url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
25 26.8 MiB 0.0 MiB 1 session = aiohttp.ClientSession()
26 26.8 MiB 0.0 MiB 8 jobs = [grab_data(url, session) for _ in range(5)]
27 260.7 MiB 233.9 MiB 2 jobs_result = await asyncio.gather(*jobs)
28 260.7 MiB 0.0 MiB 1 if not all(jobs_result): # sanity check
29 _log.info("Something didn't give back data")
30 260.7 MiB 0.0 MiB 1 await session.close()
Filename: ./aiohttp_example.py
Line # Mem usage Increment Occurences Line Contents
============================================================
33 26.7 MiB 26.7 MiB 1 @profile
34 def main():
35 26.7 MiB 0.0 MiB 1 logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
36 26.8 MiB 0.1 MiB 1 loop = asyncio.get_event_loop()
37 26.8 MiB 0.0 MiB 1 try:
38 201.4 MiB 174.6 MiB 1 loop.run_until_complete(quicktest())
39 # memory usage should be back to normal
40 201.4 MiB 0.0 MiB 1 _log.info("Completed example")
41 except KeyboardInterrupt:
42 _log.info("shutting down...")
43 loop.stop()
The bug is observed here: On line 38, I expect memory usage to reach its peak, and the next line should see memory usage drop to init levels. However, on line 40, memory usage is still at the same level, even though the scope where the memory was allocated is gone and inaccessible.
The second is created using requests in a nearly identical way, but demonstrates the behavior I expect in aiohttp:
Filename: ./requests_example.py
Line # Mem usage Increment Occurences Line Contents
============================================================
19 23.9 MiB 23.9 MiB 1 @profile
20 def quicktest():
21 23.9 MiB 0.0 MiB 1 url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
22 23.9 MiB 0.0 MiB 1 session = requests.Session()
23 126.2 MiB 102.2 MiB 8 jobs_result = [grab_data(url, session) for _ in range(5)]
24 126.2 MiB 0.0 MiB 1 if not all(jobs_result): # sanity check
25 _log.info("Something didn't give back data")
26 126.2 MiB 0.0 MiB 1 session.close()
Filename: ./requests_example.py
Line # Mem usage Increment Occurences Line Contents
============================================================
29 23.9 MiB 23.9 MiB 1 @profile
30 def main():
31 23.9 MiB 0.0 MiB 1 logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
32 23.9 MiB 0.0 MiB 1 try:
33 27.4 MiB 3.5 MiB 1 quicktest()
34 # memory usage should be back to normal
35 27.4 MiB 0.0 MiB 1 _log.info("Completed example")
36 except KeyboardInterrupt:
37 _log.info("shutting down...")
Here the memory usage shown by memory_profiler is considerably different, because the memory usage drops as the function quicktest() on line 33 ends, so the scope of main() never sees the same level of memory usage.
馃搵 Your version of the Python
python --version
Python 3.8.3
馃搵 Your version of the aiohttp/yarl/multidict distributions
python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: [email protected]
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires: attrs, chardet, multidict, async-timeout, yarl
Required-by:
```console
$ python -m pip show multidict
(memory_leak_aiohttp) gregdan3@Theros ~/g/c/memory_leak_aiohttp (master)>
python -m pip show multidict
Name: multidict
Version: 4.7.6
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.[email protected]
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
```console
python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires: multidict, idna
Required-by: aiohttp
馃搵 Additional context
The relevant scripts and my Pipfile* are here:
scripts.zip
#!/usr/bin/env python
import asyncio
import logging
from concurrent.futures import ThreadPoolExecutor
import aiohttp
from memory_profiler import profile
_log = logging.getLogger(__name__)
LOG_FORMAT = (
"[%(asctime)s] [%(filename)22s:%(lineno)-4s] [%(levelname)8s] %(message)s"
)
async def grab_data(url, session):
async with session.get(url) as response:
data = await response.read()
return data
@profile
async def quicktest():
url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
session = aiohttp.ClientSession()
jobs = [grab_data(url, session) for _ in range(5)]
jobs_result = await asyncio.gather(*jobs)
if not all(jobs_result): # sanity check
_log.info("Something didn't give back data")
await session.close()
@profile
def main():
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
loop = asyncio.get_event_loop()
try:
loop.run_until_complete(quicktest())
# memory usage should be back to normal
_log.info("Completed example")
except KeyboardInterrupt:
_log.info("shutting down...")
loop.stop()
if __name__ == "__main__":
main()
#!/usr/bin/env python
import logging
import requests
from memory_profiler import profile
_log = logging.getLogger(__name__)
LOG_FORMAT = (
"[%(asctime)s] [%(filename)22s:%(lineno)-4s] [%(levelname)8s] %(message)s"
)
def grab_data(url, session):
with session.get(url) as response:
data = response.content
return data
@profile
def quicktest():
url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
session = requests.Session()
jobs_result = [grab_data(url, session) for _ in range(5)]
if not all(jobs_result): # sanity check
_log.info("Something didn't give back data")
session.close()
@profile
def main():
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
try:
quicktest()
# memory usage should be back to normal
_log.info("Completed example")
except KeyboardInterrupt:
_log.info("shutting down...")
if __name__ == "__main__":
main()
And my Pipfile:
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
[packages]
memory_profiler = {git = "ssh://[email protected]/pythonprofilers/memory_profiler.git"}
aiohttp = "*"
requests = "*"
I have not embedded my Pipfile.lock due to its length, but it is in the scripts.zip above and demonstrates the same versions for my dependencies as I showed before.