Pytest: Tests collection is slow on high number of tests

Created on 28 Jun 2019 · 33Comments · Source: pytest-dev/pytest

Hi,

We have over 15k tests in the repository, and recently switched from nosetests to pytest. Apparently, tests collection in pytest is way slower than in nosetests - 130s vs 30s. Also, I need to mention that norecursedirs and testpaths parameters are both set, just to exclude the case of walking in .git or other unrelated directories.
It's also interesting, that the number of collected tests is asymptotic to time:

I tried to profile it using pyflame and cProfile, it seems like the slowest part is calling node.py:FixtureManager._matchfactories method call:

Here is a callgraph:

When I tried to log, what's is happening, it seems like, the more tests, the more method is called with polynomial dependency.

Going deeper into _matchfactories method, I was able to reduce collection time from 130s to 85s just by caching the results of _splitnode method:

diff --git a/src/_pytest/nodes.py b/src/_pytest/nodes.py
index 491cf9d2c..62eb66086 100644
--- a/src/_pytest/nodes.py
+++ b/src/_pytest/nodes.py
@@ -13,6 +13,9 @@ SEP = "/"
 tracebackcutdir = py.path.local(_pytest.__file__).dirpath()


+_node_cache = {}
+
+
 def _splitnode(nodeid):
     """Split a nodeid into constituent 'parts'.

@@ -31,9 +34,13 @@ def _splitnode(nodeid):
     if nodeid == "":
         # If there is no root node at all, return an empty list so the caller's logic can remain sane
         return []
-    parts = nodeid.split(SEP)
-    # Replace single last element 'test_foo.py::Bar' with multiple elements 'test_foo.py', 'Bar'
-    parts[-1:] = parts[-1].split("::")
+
+    parts = _node_cache.get(nodeid)
+    if not parts:
+        parts = nodeid.split(SEP)
+        # Replace single last element 'test_foo.py::Bar' with multiple elements 'test_foo.py', 'Bar'
+        parts[-1:] = parts[-1].split("::")
+        _node_cache[nodeid] = parts
     return parts

It doesn't solve the asymptotic problem, but just making _splitnode and thus _matchfactories method faster:

Please let me know if you think there could be a better way to fix the problem.

Thanks!
Alex

collection performance

Source

Nepherhotep

Most helpful comment

@boxed pytest==5.2.2 collects 15k tests for 70 seconds. That's the result of tests collection, when splitnode optimization PR is merged.
However, collection time still seem to asymptotically depend on the number of tests, as the first 5k are collected just for 15 seconds.

Nepherhotep on 5 Nov 2019

👍2

All 33 comments

Thanks @Nepherhotep for the report.

I think a cache there wouldn't hurt (might even use @lru_cache(maxsize=None) instead), but I think it would be worth to investigate this a little further to see if we can have more gains.

Would it be possible to share the output of pytest --collect-only so we can reproduce your test tree with dummy tests? I imagine this would facilitate debugging and improving performance there.

nicoddemus on 29 Jun 2019

Thanks @nicoddemus
It will take me a while to obfuscate data, but I'll prepare the output soon.

Thanks!

Nepherhotep on 1 Jul 2019

👍2

collected_tests.txt

Hi @nicoddemus,

Please see the file attached.
As for cache, if I want it to be in 4.6-maintenance branch as well, do I need to submit two different PRs - lru_cache for "master" and dict-based cache to "4.6-maintenance" (as lru_cache is not supported by 2.7 yet)?

Nepherhotep on 2 Jul 2019

Thanks, I will take a look

As for cache, if I want it to be in 4.6-maintenance branch as well, do I need to submit two different PRs - lru_cache for "master" and dict-based cache to "4.6-maintenance" (as lru_cache is not supported by 2.7 yet)?

Actually I don't think this qualifies to go to 4.6-maintenance, as this is an improvement and 4.6-maintenance is about critical bug-fixes only for the 4.6 series.

nicoddemus on 2 Jul 2019

Gotcha, that makes sense.

Nepherhotep on 7 Jul 2019

I tried to reproduce the issue using collected_tests.txt but there is no difference -- just generating the test directory using the following script doesn't call _splitnodes. I think fixtures need to be used somewhere. Can you amend the script such that it reproduces the before/after results you got?

Also, since _splitnodes is only called from one function, ischildnode, it might be more worthwhile to optimize that.

import os

package, module = None, None
for line in open('collected_tests.txt'):
    if not line.strip():
        continue
    scope, name = line.strip()[1:-1].split(' ')
    if scope == 'Package':
        package = './' + name
        os.makedirs(package)
    elif scope == 'Module':
        module = open(package + '/' + name, 'w')
    elif scope == 'UnitTestCase':
        print(f'class {name}:\n    pass', file=module)
    elif scope == 'TestCaseFunction':
        print(f'    def {name}(self):\n        pass', file=module)

for root, dirs, files in os.walk('dir_0'):
    open(root + '/__init__.py', 'w')

bluetech on 1 Aug 2019

@bluetech, I'd say, it's required to refactor getfixturedefs, which calls _matchfactories function polynomial amount of times, which causes a slowdown in performing quite simple calculations like splitting nodes.

Nepherhotep on 1 Aug 2019

Sounds like it. If you can arrange a reproduction for this, I think it will be helpful.

Looking narrowly at how _matchfactories can be made faster, I am unsure about something. Is the following expected behavior? Are / and :: compatible in this way?

>>> _pytest.nodes.ischildnode(baseid='a/b/c/d', nodeid='a/b::c::d::e')
True

bluetech on 1 Aug 2019

FTR: Some work has been done in #5681 already thanks to @Nepherhotep.

nicoddemus on 2 Aug 2019

@bluetech I'll try to create a fake repo, which is slow to collect.

@nicoddemus great thanks for guidance in that PR!

Nepherhotep on 2 Aug 2019

👍1

I'm looking at some performance issues with pytest at the moment and found this ticket. @Nepherhotep how did the changes that went into pytest already affect your use case?

boxed on 4 Nov 2019

Hi @boxed,

Thanks for getting back. Let me check the performance with the most recent pytest version.

Nepherhotep on 4 Nov 2019

👍1

Nepherhotep on 5 Nov 2019

👍2

That curve is classic accidentally quadratic performance, which might help to diagnose it 😄

Zac-HD on 26 Nov 2019

@Nepherhotep
Have you considered what @bluetech suggested? (instead of / additional to https://github.com/pytest-dev/pytest/pull/5681 - but likely makes not much sense given different inputs there; i.e. #5681 makes sense to me as a first step measure, but it being called so often appears to be the real/bigger problem)

Also, since _splitnodes is only called from one function, ischildnode, it might be more worthwhile to optimize _that_.

@bluetech

Looking narrowly at how _matchfactories can be made faster, I am unsure about something. Is the following expected behavior? Are / and :: compatible in this way?
>>> _pytest.nodes.ischildnode(baseid='a/b/c/d', nodeid='a/b::c::d::e')
True

That looks wrong to me, but cannot really say for sure. AFAIK "/" should be used inside of parts (split by "::") for paths. They should not be mixed.

blueyed on 26 Nov 2019

Has this been fixed?

For me, test collection went down from more than a minute to about 6 seconds.

rndusr on 31 Mar 2020

@rndusr
There have been certainly fixes in this regard. It would be interesting if you could git-bisect it.

blueyed on 31 Mar 2020

Last I looked it hadn't been fixed at least. Maybe 6 seconds is a lot better than it was but I'd bet it's about 5 seconds too much :)

boxed on 31 Mar 2020

I'd like to bisect, but I'm having trouble installing pytest.

"cd path/to/pytest ; pip install --editable ." fails:

/path/to/venv/bin/python3: can't find '__main__' module in '/path/to/venv/share/python-wheels/pep517-0.7.0-py2.py3-none-any.whl/pep517/_in_process.py'

No idea what's going on here.

"cd path/to/pytest ; python3 ./setup.py develop -e -b ./build" works, but then I
can't reproduce the performance increase. I can checkout master, 5.4., 5.3.
etc, I always get the same bad performance.

When I uninstall pytest and install from pypi, fast test collection is back.

The "Contributing" section in the docs doesn't mention anything about installing
pytest for development.

rndusr on 31 Mar 2020

@rndusr

try:

python -m venv .venv
.venv/bin/pip install -e ".[testing]"
.venv/bin/python -m pytest …

Basically step 7 from https://docs.pytest.org/en/latest/contributing.html#long-version

blueyed on 31 Mar 2020

@rndusr I tested the most recent version in pypi, and it's still around 2 minutes for 25k tests. What specific version do I need to try?

Nepherhotep on 31 Mar 2020

python -m venv .venv
.venv/bin/pip install ".[testing]"

That gives me the same "can't find '__main__' module" error.
Doesn't matter if I use Python 3.7 or 3.8.

I've noticed the faster test collection with pytest 5.4.1. It takes between 3
and 7 seconds to collect 13k tests. (But they are mostly parametrized. I don't
know if that matters.)

I've been trying to get the slow test collection back, but I can't!

I've tried old pytest versions with Python 3.7 and 3.8. I know some
combination of these had me waiting considerably longer, but now tests are
consistently collected in less than ten seconds.

rndusr on 1 Apr 2020

@rndusr you might want to check the discussion in https://github.com/pytest-dev/pytest/issues/2206 I believe that pytest is a bit more than 10x slower than it strictly needs to be.

boxed on 1 Apr 2020

@rndusr JFI: https://github.com/pytest-dev/pytest/commit/ce1cc3dddb384d7f73d8409280931342e9c539e2 seems to have quite an impact, but is in 3.10 already (so likely not relevant) - what version have you've been using before this got fixed for you?

blueyed on 4 Apr 2020

From looking at the release dates, the fix (for me) was probably introduced in 5.4.0.

rndusr on 4 Apr 2020

7130 fixes at least one instance of quadratic behavior in relation to the number of tests, so might help here.

bluetech on 29 Apr 2020

Thanks a lot! I'll compare the performance with the fix and without.

On Wed, Apr 29, 2020 at 10:30 AM Ran Benita notifications@github.com
wrote:

7130 https://github.com/pytest-dev/pytest/pull/7130 fixes at least one

instance of quadratic behavior in relation to the number of tests, so might
help here.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/pytest-dev/pytest/issues/5516#issuecomment-621248903,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAQGG6XRPUNAL73G3POSXODRPA2ZBANCNFSM4H4FTLGQ
.

Nepherhotep on 29 Apr 2020

Tested our case, without the fix - it's 1m 33 seconds, with the fix - 1m 29
seconds (20k tests).

On Wed, Apr 29, 2020 at 11:11 AM Alexey Zankevich zancudero@gmail.com
wrote:

Thanks a lot! I'll compare the performance with the fix and without.

On Wed, Apr 29, 2020 at 10:30 AM Ran Benita notifications@github.com
wrote:

7130 https://github.com/pytest-dev/pytest/pull/7130 fixes at least

one instance of quadratic behavior in relation to the number of tests, so
might help here.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/pytest-dev/pytest/issues/5516#issuecomment-621248903,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAQGG6XRPUNAL73G3POSXODRPA2ZBANCNFSM4H4FTLGQ
.

Nepherhotep on 29 Apr 2020

👀1

@Nepherhotep To get a better understanding, it will help to get a profile of the collection phase on your test suite. You can do it like this:

python -m cProfile --sort=cumtime -m pytest --collect-only your-tests-dirs-here

and paste the output here. Notes:

The command works on Python 3.8, if you use an earlier version, it will need some wrapper script or such.
The output might contain some references to your source code, you might want to redact it if it's sensitive.

bluetech on 5 May 2020

Sure, let me generate and obfuscate the report.

On Tue, May 5, 2020 at 4:45 PM Ran Benita notifications@github.com wrote:

@Nepherhotep https://github.com/Nepherhotep To get a better
understanding, it will help to get a profile of the collection phase on
your test suite. You can do it like this:

python -m cProfile --sort=cumtime -m pytest --collect-only your-tests-dirs-here

and paste the output here. Notes:

The command works on Python 3.8, if you use an earlier version, it
will need some wrapper script or such.

The output might contain some references to your source code, you
might want to redact it if it's sensitive.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/pytest-dev/pytest/issues/5516#issuecomment-624296347,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAQGG6RT3BKOEOLP6ANP5HLRQB3G7ANCNFSM4H4FTLGQ
.

Nepherhotep on 6 May 2020

👍1

I'll close this issue for now since it's not actionable for us.

@Nepherhotep if you get around to producing a profile (on the latest available pytest version), please feel free to reopen.

bluetech on 4 Sep 2020

Hey @bluetech, what's the command/wrapper that I'd need to run for python-3.6? Thanks for your effort btw!

@Nepherhotep To get a better understanding, it will help to get a profile of the collection phase on your test suite. You can do it like this:
python -m cProfile --sort=cumtime -m pytest --collect-only your-tests-dirs-here
and paste the output here. Notes:
* The command works on Python 3.8, if you use an earlier version, it will need some wrapper script or such.

* The output might contain some references to your source code, you might want to redact it if it's sensitive.

atzorvas on 1 Oct 2020

@atzorvas Instead of using -m pytest, you can provide the full path to the pytest script. For example, on my machine (Linux), in a project which is using a virtualenv called venv, pytest is found venv/bin/pytest, so the command is:

python -m cProfile --sort=cumtime venv/bin/pytest --collect-only your-tests-dirs-here

If you are using a unix-like OS, you can find where your pytest by running which pytest.

bluetech on 1 Oct 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

`sys.stdout.reconfigure` raises AttributeError: '_io.FileIO' object has no attribute 'reconfigure'

jtmoon79 · 3Comments

Send parameters in parametrize as dict to `ids=` callable

wstrm · 3Comments

Session fixture keeps frames of other fixtures alive

nicoddemus · 3Comments

Spurious unicode diffs with -v

jacquerie · 3Comments

getfixturevalue does not support parametrized fixtures

dmtucker · 3Comments