Pytest: Memory usage in 3.3.0

Created on 28 Nov 2017  ·  22 Comments  ·  Source: pytest-dev/pytest

Bug Description

TL;DR: memory usage seems to have increased severalfold in 3.3.0. With the default log capturing enabled, memory usage has the potential to increase to ludicrous levels.

I was troubleshooting issues with tests hanging on our CI server and massive swap utilization, and I found it to be due to an upgrade to pytest 3.3.0. Running our test suite with 3.3.0 quickly consumed all of the memory and began swapping.

Looking at the changelog, the new capturing of log output seemed to be the most likely cause. Our application uses SQLAlchemy, which has verbose logging, and produces rather verbose logging of its own, so I attempted to disable log capturing via both of the options specified here. While the first seemed to decrease the massive virtual memory consumption seen with the default options, neither reduced memory utilization enough for our test suite to run.

I have collected the memory utilization of various configurations below. For all of the 3.3.0 runs, the CI server began swapping and the test suite hung, although some configurations resulted in more virtual memory consumption than others.

In all cases, regardless of attempting to disable log capturing, virtual memory usage jumped significantly when hitting tests that produced a large amount of log output. However, even prior to hitting those tests, virtual memory utilization was around twice that of its peak in version 3.2.5.

The tests were running in a python:3.5.2 Docker container on an Ubuntu 16.04 host. The system has 2 GB of RAM and 2 GB of swap space. In all cases, pytest was invoked via tox with the command: coverage run --source <project directory> -m py.test {posargs}

pytest version | additional options | peak virtual memory consumed | peak resident memory consumed | test suite completed
-----------|----------|----------|----------|------
3.2.5 | none | 2685 MB | 243 MB | yes
3.3.0 | none | 11.9 GB !!! | 418 MB | no
3.3.0 | addopts=-p no:logging | 9105 MB | 366 MB | no
3.3.0 | --no-print-logs | 12.2 GB !!! | 431 MB | no
3.3.0 | addopts=-p no:logging and --no-print-logs | 12.2 GB !!! | 415 MB | no

pytest is not included in the pip output below, since its version varied by test. All other packages remained consistent between tests. Some private distributions have been redacted.

We have had to pin our version of pytest to <3.3.0 for the moment. Is there anything else I can try to reduce the base memory footprint of version 3.3.0?

I'm happy to gather more data as needed.

Thanks!

Pip List

aiohttp (2.2.5)
alembic (0.9.5)
amqp (2.2.2)
aniso8601 (1.3.0)
apipkg (1.4)
apispec (0.25.4)
appnope (0.1.0)
asn1crypto (0.23.0)
astroid (1.5.3)
async-timeout (2.0.0)
attrs (17.3.0)
autopep8 (1.3.3)
Babel (2.5.1)
billiard (3.5.0.3)
bleach (2.1.1)
celery (4.1.0)
certifi (2017.7.27.1)
cffi (1.11.2)
chardet (3.0.4)
click (6.7)
coverage (4.4.2)
cryptography (2.1.1)
debtcollector (1.18.0)
decorator (4.1.2)
dill (0.2.7.1)
docutils (0.14)
entrypoints (0.2.3)
execnet (1.5.0)
flake8 (3.5.0)
flasgger (0.8.0)
Flask (0.12.2)
Flask-RESTful (0.3.6)
Flask-Script (2.0.6)
frozendict (1.2)
html5lib (1.0b10)
httplib2 (0.10.3)
idna (2.6)
ipdb (0.10.3)
ipykernel (4.6.1)
ipython (6.2.1)
ipython-genutils (0.2.0)
ipywidgets (7.0.5)
iso8601 (0.1.12)
isort (4.2.15)
itsdangerous (0.24)
jedi (0.11.0)
Jinja2 (2.9.6)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.1.0)
jupyter-console (5.2.0)
jupyter-core (4.4.0)
keystoneauth1 (3.2.0)
kombu (4.1.0)
lazy-object-proxy (1.3.1)
loggly-python-handler (1.0.0)
Mako (1.0.7)
MarkupSafe (1.0)
marshmallow (2.13.6)
marshmallow-sqlalchemy (0.13.1)
mccabe (0.6.1)
mistune (0.7.4)
monotonic (1.3)
msgpack-python (0.4.8)
multidict (3.2.0)
mypy (0.550)
mysqlclient (1.3.12)
nbconvert (5.3.1)
nbformat (4.4.0)
netaddr (0.7.19)
netifaces (0.10.6)
notebook (5.2.1)
oslo.config (4.13.1)
oslo.i18n (3.18.0)
oslo.policy (1.28.1)
oslo.serialization (2.21.1)
oslo.utils (3.30.0)
pandocfilters (1.4.2)
parso (0.1.0)
passlib (1.7.1)
pbr (3.1.1)
pexpect (4.2.1)
pickleshare (0.7.4)
pika (0.11.0)
pip (9.0.1)
pluggy (0.6.0)
positional (1.2.1)
prompt-toolkit (1.0.15)
psutil (5.4.1)
psycopg2 (2.7.3.1)
ptyprocess (0.5.2)
py (1.5.2)
pycodestyle (2.3.1)
pycparser (2.18)
pydecor (1.1.3)
pydocstyle (2.1.1)
pyflakes (1.6.0)
Pygments (2.2.0)
PyJWT (1.5.3)
pylint (1.7.4)
pymongo (3.5.1)
pyparsing (2.2.0)
pyrabbit (1.1.0)
pytest-aiohttp (0.1.3)
pytest-cov (2.5.1)
pytest-forked (0.2)
pytest-xdist (1.20.1)
python-dateutil (2.6.1)
python-editor (1.0.3)
python-keystoneclient (3.13.0)
python-memcached (1.58)
pytz (2017.2)
PyYAML (3.12)
pyzmq (16.0.3)
qtconsole (4.3.1)
redis (2.10.6)
requests (2.18.4)
requests-futures (0.9.7)
restructuredtext-lint (1.1.2)
rfc3986 (1.1.0)
setuptools (20.10.1)
simplegeneric (0.8.1)
simplejson (3.11.1)
six (1.11.0)
snowballstemmer (1.2.1)
SQLAlchemy (1.1.14)
SQLAlchemy-Utils (0.32.18)
stevedore (1.27.1)
stripe (1.67.0)
terminado (0.8.1)
testpath (0.3.1)
tornado (4.5.2)
tox (2.9.1)
traitlets (4.3.2)
typed-ast (1.1.0)
typing (3.6.2)
urllib3 (1.22)
vine (1.1.4)
virtualenv (15.1.0)
wcwidth (0.1.7)
webargs (1.8.1)
webencodings (0.5.1)
Werkzeug (0.12.2)
widgetsnbextension (3.0.8)
wrapt (1.10.11)
yarl (0.13.0)

Submission Checklist

Thanks for submitting an issue!

Here's a quick checklist of what to include:

  • [x] Include a detailed description of the bug or suggestion
  • [x] pip list of the virtual environment you are using
  • [x] pytest and operating system versions
  • [x] Minimal example if possible
Labels: logging, critical, regression

All 22 comments

I was just trying to make a repeatable example of this issue, and I have found it to be relatively easy. The file here uses a ton of RAM during testing, without all the docker/tox/coverage complications of our real environment:
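
The attached file isn't reproduced in this archive; as a rough stand-in, a log-heavy test module in the same spirit might look something like this (sizes and names are illustrative, using plain stdlib logging):

import logging

import pytest  # noqa: F401  (collected by pytest; no fixtures needed)

logger = logging.getLogger(__name__)


def _make_test(index):
    def test(index=index):
        # Each test emits a pile of log records, so anything that retains
        # captured output per test item accumulates quickly over the session.
        for i in range(1000):
            logger.warning("test %d, message %d: %s", index, i, "x" * 100)

    return test


# Generate a few hundred log-heavy tests at import time.
for _n in range(500):
    globals()["test_logging_%d" % _n] = _make_test(_n)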

OS: MacOS 10.13.2
Python: 3.5

pytest version | options | virtual memory | resident memory | test time
-----|-----|----|----|----
3.2.5 | none | 132 MB | 32 MB | 78.76 seconds
3.3.0 | none | 890 MB | 731 MB | 140.96 seconds
3.3.0 | -p no:logging | 125 MB | 32 MB | 74.27 seconds
3.3.0 | --no-print-logs | 819 MB | 711 MB | 140.96 seconds

This indicates that the additional memory overhead seen in our full test environment when -p no:logging is set is due to something other than the log capturing. Has the fixture setup/teardown lifecycle changed significantly?

Edit: it's also possible that this test doesn't really replicate our actual test environment well enough. We've got ~2500 tests, hundreds of fixtures, etc.

@mplanchard thanks a ton for the reproducible example, we really appreciate it.

I can reproduce your memory usage on my Windows computer. Indeed I suspected the logging module when I first read your post a few minutes ago, because it appends the captured log to the test items, which are kept alive for the entire test session. Setting --no-print-logs should be enough as well, since that prevents the logging module from appending the captured output to the items:

https://github.com/pytest-dev/pytest/blob/4d2f05e4b967a8df6becb939fe236f5f0b3f11cb/_pytest/logging.py#L307-L310

But this does not explain why even with -p no:logging you still see an increase in memory in your full test environment of course. There has not been any significant change in fixture setup/teardown that I remember.

Are you passing -p no:logging on the command line or in the pytest.ini file? As #2964 shows, there might be a bug related to that.

In the full environment, I was passing it via the pytest.ini file, although in my examples from my local machine I was just appending it to the command line (which seemed to work).

I can enable debugging, config tracing, etc. and run any particular combination that you think might be helpful. The tests take quite a while, so just let me know and I can run them in the background during the workday.

It's odd that --no-print-logs shows virtually the same memory consumption as running with no options, in both my local environment and the full test environment. Maybe it's the record list being kept by the LogCaptureHandler? It seems like that might get appended to even if self.print_logs is False:

https://github.com/pytest-dev/pytest/blob/4d2f05e4b967a8df6becb939fe236f5f0b3f11cb/_pytest/logging.py#L117-L128

Maybe it's the record list being kept by the LogCaptureHandler?

Good point and I thought about that, but the handler is deleted from the item at the end of each phase:

https://github.com/pytest-dev/pytest/blob/4d2f05e4b967a8df6becb939fe236f5f0b3f11cb/_pytest/logging.py#L296-L310

I'm really puzzled that even with logging disabled with -p no:logging you are still seeing the abnormal memory increase...

The memory usage seems to be higher from the start, which is what made me ask about differences in fixture setup. I'm doing a run now with -p no:logging, --debug, and --trace-config. I'll need to review the output for sensitive information before sending it. Would it be helpful to also have a comparison of a run with the same options on 3.2.5?

Thanks for the patience in debugging this!

Would it be helpful to also have a comparison of a run with the same options on 3.2.5?

Hopefully yes, so it would be great if you could do that as well. 😁

No problem! We love pytest, and use it for all our projects. It's the least we can do.

Also, just collecting statistics: our test suite outputs about 54000 log messages, which is about 10 MB, so even if every log message were held in active memory simultaneously, it would not explain the sudden jump of several gigabytes in virtual memory.

I'm having some issues with collecting tests in --debug mode, because several of our fixtures rely on setting/transmitting a Flask application context, and having --debug on seems to interfere with that. There's a relatively complex fixture hierarchy relying on those fixtures, and of course a lot of tests. Any ideas as to how to get around that?

___________________________________ ERROR collecting invision_api/tests/integration/v3/test_log.py ____________________________________
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:617: in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:222: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:179: in __call__
    self.before(hook.name, hook_impls, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:441: in before
    hooktrace(hook_name, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:160: in __call__
    self.root.processmessage(self.tags, args)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:136: in processmessage
    lines = self.format_message(tags, args)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:131: in format_message
    lines.append("%s    %s: %s\n" % (indent, name, value))
.tox/py35/lib/python3.5/site-packages/werkzeug/local.py:366: in <lambda>
    __str__ = lambda x: str(x._get_current_object())
.tox/py35/lib/python3.5/site-packages/werkzeug/local.py:306: in _get_current_object
    return self.__local()
.tox/py35/lib/python3.5/site-packages/flask/globals.py:51: in _find_app
    raise RuntimeError(_app_ctx_err_msg)
E   RuntimeError: Working outside of application context.
E
E   This typically means that you attempted to use functionality that needed
E   to interface with the current application object in a way.  To solve
E   this set up an application context with app.app_context().  See the
E   documentation for more information.

I think I may have found something interesting. Looking at htop, it felt like more threads were in use under 3.3.0, so I counted the threads every two seconds. With version 3.3.0, the thread count is increased by a factor of ~10 for the majority of the test run, and then climbs to a factor of ~25 when it hits the latter tests (which produce many more log messages, although I'm not sure that's causal).

Version 3.3.0 with -p no:logging
2 -> 4 -> 8 -> 28 -> 70 -> 136 -> 180 -> 226 -> 372 (majority of tests) -> 816 (beginning of latter tests till freeze)

Version 3.2.5
2 -> 4 -> 8 -> 30 -> 114 -> 148 -> 32 -> 34 (majority of tests) -> 166 (beginning of latter tests) -> 92 -> 150 -> 222 -> 90 -> 216 -> 82 -> 186 -> 220 -> 222 -> 230 -> 57 -> 38 -> 32 (until tests finished)

So, it looks like pytest may be holding onto threads that it was letting go previously, or spinning up more threads than it was before.

Edit: thread count collected with while true; do ps -eLf | grep -c "py.test"; sleep 2; done
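
For reference, psutil (already in the pip list above) can do the same sampling from Python; a minimal sketch, assuming you pass it the py.test process's PID:

import sys
import time

import psutil


def watch_threads(pid, interval=2):
    """Print the thread count of the given process every `interval` seconds."""
    proc = psutil.Process(pid)
    while proc.is_running():
        print(proc.num_threads())
        time.sleep(interval)


if __name__ == "__main__":
    watch_threads(int(sys.argv[1]))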

And the thread count gets even higher with the logging plugin enabled.

Version 3.3.0 with default options
2 -> 4 -> 8 -> 4 -> 28 -> 30 -> 76 -> 130 -> 162 -> 202 -> 290 -> 348 -> 372 (majority of tests) -> 554 (beginning of latter tests) -> 712 -> 922 -> 1178 (freezes)

Hi @mplanchard, thanks a lot for the detailed statistics. The part about the high number of threads is interesting: pytest itself does not spawn any threads (nor do py and pluggy), AFAIK. To me this also suggests it might be a side effect of something not being torn down at the appropriate time (fixtures might be one case, as you suggest). I would love to help debug this further, but it is hard to speculate without looking at the source code.

I would try to reduce the tests and code involved step by step until I got to a minimal reproducer. For example, running half of the test suite, looking for fixtures which might spawn threads and seeing if the same number of threads keep spawning when tests which don't use those fixtures are executed, and so on. Really sorry I can't help more than that, but I will be sure to come back here if more suggestions pop into my head.

No worries. I didn't find any reference to the threading module in the pytest code, so I'm inclined to think you're right. I wonder if it might be due to the changes referenced in #2959? Several of our fixtures return mutable objects, and some of them, like DB connection instances, function via thread pools. I'm not sure how attr implements the whole frozen thing, so I'm not sure whether that's a reasonable conjecture.

I'll try to get a minimal example by this evening that demonstrates a difference in thread consumption at the lowest possible level.

Several of our fixtures return mutable objects, and some of them, like DB connection instances, function via thread pools. I'm not sure how attr implements the whole frozen thing, so I'm not sure whether that's a reasonable conjecture.

Actually the change in #2959 refers to the fact that the params argument of pytest.fixture could be mutated after the declaration and it would affect the parametrization with the mutated list; returning mutable objects from fixtures is common practice and should work.
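
For reference, a sketch of the pattern that change addresses, going by the description above (names are illustrative):

import pytest

backends = ["sqlite", "postgres"]


@pytest.fixture(params=backends)
def backend(request):
    return request.param


# Mutating the list *after* the fixture declaration; before the #2959 change
# this could leak into the parametrization. Returning mutable objects *from*
# fixtures is a separate, supported pattern.
backends.append("mysql")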

Another thing to look for: do all your fixtures implement an explicit cleanup (where it makes sense)? For example, explicitly closing a db connection or destroying a thread pool? I can see some bug sneaking in that keeps the fixture frame alive for some reason, so a fixture which implicitly relied on the locals of that frame being destroyed for cleanup might start acting up (#2798 came to mind, but that was reported in 3.2).
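
To illustrate the kind of explicit cleanup being suggested, a generic sketch (not the reporter's actual fixtures) that shuts a thread pool down instead of relying on garbage collection:

import concurrent.futures

import pytest


@pytest.fixture(scope='session')
def worker_pool():
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
    yield pool
    # Explicit teardown: don't rely on the fixture's locals being garbage
    # collected to stop the pool's worker threads.
    pool.shutdown(wait=True)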

You were absolutely right about explicit cleanup. We had a couple of test fixtures that were relying on an implicit closing of a MongoClient connection upon garbage collection (sneakily, because they were instantiating objects that instantiated clients). Updating these to close() the clients explicitly fixed the memory leak so that the -p no:logging utilization more closely mirrors that of 3.2.5.

I should have time this evening both to check the memory utilization with logging enabled and to hopefully generate a test based on our fixtures that is able to demonstrate the difference between 3.2.5 and 3.3.0.

I haven't really had time to follow this discussion, but FWIW I've had good experiences tracking down memory leaks using tracemalloc and/or pympler.muppy.

@mplanchard that's both great news (that you found a solution to your problem) and a bit of sad news (it seems we introduced a regression about leaking frames/fixtures in 3.3). 😅

We would really appreciate if you can come up with a reproducible example that demonstrates the differences between 3.2 and 3.3 regarding fixture cleanup. 👍

Rá managed to find a reproducer in #2981! 👍

That is great news! I can verify that our case is also a session fixture being passed to a regular fixture.

Thanks so much for all your help and your quick responses. We’ll be pushing fixes for our test teardown tomorrow, which should allow us to get back to the most recent pytest. We’d be happy to help verify any fixes as they come!

Thanks for the patience as well!

our test suite outputs about 54000 log messages, which is about 10 MB

With the fixed tear down code, could you please check the memory consumption of your test suite with the logging plugin active?

@nicoddemus It seems that after integrating our teardown fixes, thread utilization and memory consumption are pretty much back to normal, even with the logging plugin enabled. In a run with pytest 3.3.0 and the logging plugin enabled, both metrics look essentially identical to a 3.2.5 run.

The changes required were remarkably minimal, but of course the fixtures are used in hundreds and hundreds of tests. I've included all of the required fixture changes for our test suite below:

-@pytest.yield_fixture()
-def legacy_app():
+@pytest.fixture(scope='session')
+def legacy_app_session():
+    """Yield a legacy application for the session."""
+    instance = api.create_custom_app()
+    yield instance
+    instance.mongo.db.client.close()
+
+
+@pytest.fixture()
+def legacy_app(legacy_app_session):
     """Yield an isolated flask test app instance"""
-    app = api.create_custom_app()
-    context = app.app_context()
+    context = legacy_app_session.app_context()
     context.push()
-    yield app
+    yield legacy_app_session
     context.pop()

 @pytest.fixture(scope='session')
 def session_app(sql_session):
     """Return an app for the session, with no context"""
     instance = api.create_app()
     instance.config['TESTING'] = True
     scoped_session = instance.sql.session
     instance.sql.session = instance.sql.session()
     yield instance
+    instance.mongo.db.client.close()
     instance.sql.session.expunge_all()
     instance.sql.session.rollback()
     instance.sql.session.close()

 @pytest.fixture()
-def db(mongo_util):
+def db(mongo_util, db_client):
     """Return a pymongo Database instance"""
-    db = mongo_util.get_db(db_name=mongo_util.db_info.db)
+    db = db_client.get_database(mongo_util.db_info.db)
     return db


 @pytest.fixture(scope='session')
-def db_session(mongo_util):
+def db_session(mongo_util, db_client_session):
     """Return a pymongo Database instance"""
-    db = mongo_util.get_db(db_name=mongo_util.db_info.db)
+    db = db_client_session.get_database(mongo_util.db_info.db)
     return db

The mongo_util.get_db() method used previously created a new client each time it was called, and that client never got closed. Also, the db_client fixtures, which were not previously being used in the fixtures where they've now been added, already had explicit teardowns.
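
For illustration, a db_client fixture pair with the kind of explicit teardown described here might look roughly like this (a hypothetical sketch; connection details and names are assumed, not taken from the project):

import pytest
from pymongo import MongoClient


@pytest.fixture(scope='session')
def db_client_session():
    client = MongoClient('mongodb://localhost:27017')  # connection URI assumed
    yield client
    client.close()


@pytest.fixture()
def db_client(db_client_session):
    return db_client_session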

That's great news, thanks @mplanchard for the followup! I'm really glad we got to the bottom of the issue and the solution was simple.

I'm closing this for now because the root of the issue is detailed in #2981 already. Thanks again for all the patience and understanding while we tracked down the problem! 👍
