This is the second follow-up to #401.
It's about time we took a good look at what eats time when executing pytest; the last deep look was quite a while ago.
Can I work on it?
Feel free to! Please send a mail to the mailing list, and join the IRC channels if IRC works out for you.
We have a test suite at TriOptima where if you run pytest 3.4.0 with "-k this_will_never_match_anything", it takes 3 seconds. But if I upgrade to 3.9.2 that's suddenly 13(!) seconds. Some profiling reveals:
In [7]: Stats('pytest3.4').sort_stats('tottime').print_stats(40)
Wed Oct 24 13:40:31 2018 pytest3.4
7104316 function calls (6876020 primitive calls) in 7.618 seconds
Ordered by: internal time
List reduced from 15512 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
74504 0.593 0.000 0.593 0.000 {built-in method posix.stat}
2305 0.327 0.000 0.327 0.000 {built-in method marshal.loads}
8109/7564 0.170 0.000 1.824 0.000 {built-in method builtins.__build_class__}
329269/329064 0.169 0.000 0.172 0.000 {built-in method builtins.getattr}
24228/51 0.126 0.000 7.495 0.147 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/pluggy/callers.py:151(_multicall)
444973/444109 0.124 0.000 0.177 0.000 {built-in method builtins.isinstance}
37865 0.123 0.000 0.466 0.000 /Users/andersh/triresolve/env3/lib/python3.6/copy.py:66(copy)
43680 0.113 0.000 0.118 0.000 {built-in method __new__ of type object at 0x10a7dd840}
193459/193417 0.106 0.000 0.122 0.000 {built-in method builtins.hasattr}
1913 0.090 0.000 0.090 0.000 {built-in method posix.listdir}
2308 0.089 0.000 0.120 0.000 <frozen importlib._bootstrap_external>:830(get_data)
251 0.089 0.000 0.089 0.000 {built-in method marshal.load}
40333 0.088 0.000 0.147 0.000 /Users/andersh/triresolve/env3/bin/../lib/python3.6/posixpath.py:75(join)
79/63 0.083 0.001 0.144 0.002 {built-in method _imp.create_dynamic}
745 0.080 0.000 0.086 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/django/forms/fields.py:63(__init__)
23975/2 0.079 0.000 7.493 3.747 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/pluggy/__init__.py:603(__call__)
236513 0.075 0.000 0.088 0.000 {built-in method posix.fspath}
13799 0.072 0.000 0.113 0.000 /Users/andersh/triresolve/env3/bin/../lib/python3.6/posixpath.py:331(normpath)
24349/20443 0.067 0.000 0.786 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/_pytest/config.py:325(_getconftestmodules)
128641 0.066 0.000 0.066 0.000 {method 'split' of 'str' objects}
39388 0.062 0.000 0.543 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/py/_path/local.py:371(check)
486 0.062 0.000 0.439 0.001 /Users/andersh/triresolve/env3/lib/python3.6/imp.py:255(find_module)
36513 0.060 0.000 0.221 0.000 /Users/andersh/triresolve/env3/lib/python3.6/copy.py:268(_reconstruct)
35999/4576 0.059 0.000 0.689 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/tri/declarative/__init__.py:70(__init__)
31606 0.056 0.000 0.075 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/tri/token/__init__.py:28(__init__)
44967 0.056 0.000 0.370 0.000 /Users/andersh/triresolve/env3/bin/../lib/python3.6/genericpath.py:27(isfile)
201158 0.055 0.000 0.055 0.000 {method 'startswith' of 'str' objects}
4013 0.052 0.000 0.116 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/tri/token/__init__.py:46(__init__)
2865/1 0.049 0.000 7.622 7.622 {built-in method builtins.exec}
705 0.049 0.000 0.060 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/django/db/models/fields/related.py:67(lazy_related_operation)
25397 0.048 0.000 0.078 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/py/_path/local.py:282(_getbyspec)
4087 0.045 0.000 0.245 0.000 <frozen importlib._bootstrap_external>:1233(find_spec)
222375 0.042 0.000 0.048 0.000 {method 'get' of 'dict' objects}
335322 0.040 0.000 0.040 0.000 {method 'append' of 'list' objects}
16747 0.039 0.000 0.072 0.000 /Users/andersh/triresolve/env3/bin/../lib/python3.6/posixpath.py:154(dirname)
1113 0.037 0.000 0.038 0.000 {built-in method io.open}
2942 0.037 0.000 0.060 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/django/db/models/fields/__init__.py:145(__init__)
28539 0.037 0.000 0.037 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
2920 0.037 0.000 1.175 0.000 <frozen importlib._bootstrap>:870(_find_spec)
6081 0.036 0.000 0.063 0.000 {built-in method builtins.dir}
Out[7]: <pstats.Stats at 0x103da2898>
In [8]: Stats('pytest3.9').sort_stats('tottime').print_stats(40)
Wed Oct 24 13:41:08 2018 pytest3.9
44300096 function calls (44049151 primitive calls) in 24.070 seconds
Ordered by: internal time
List reduced from 15698 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
1801661 2.927 0.000 5.233 0.000 /Users/andersh/triresolve/env3/bin/../lib/python3.6/posixpath.py:154(dirname)
9066854 2.767 0.000 3.623 0.000 {built-in method posix.fspath}
208445 1.860 0.000 11.618 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/py/_path/common.py:315(parts)
1789894 1.548 0.000 7.161 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/py/_path/local.py:312(dirpath)
3543756 1.458 0.000 4.706 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/py/_path/local.py:163(__eq__)
110852 0.948 0.000 0.948 0.000 {built-in method posix.stat}
7115283 0.856 0.000 0.856 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/py/_path/common.py:394(__fspath__)
1859892 0.571 0.000 0.923 0.000 /Users/andersh/triresolve/env3/bin/../lib/python3.6/posixpath.py:41(_get_sep)
150272 0.568 0.000 10.339 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/_pytest/main.py:535(<genexpr>)
1810946 0.529 0.000 0.529 0.000 {method 'rfind' of 'str' objects}
1836546 0.479 0.000 0.484 0.000 {built-in method __new__ of type object at 0x103507840}
2297823/2296959 0.467 0.000 0.523 0.000 {built-in method builtins.isinstance}
1431322 0.388 0.000 0.388 0.000 {method 'rstrip' of 'str' objects}
2036776 0.261 0.000 0.261 0.000 {method 'append' of 'list' objects}
2323 0.235 0.000 0.235 0.000 {built-in method marshal.loads}
2132768/2131097 0.234 0.000 0.235 0.000 {built-in method builtins.len}
62679 0.231 0.000 4.120 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/_pytest/main.py:530(<genexpr>)
8155/7610 0.173 0.000 1.733 0.000 {built-in method builtins.__build_class__}
347520/347310 0.171 0.000 0.173 0.000 {built-in method builtins.getattr}
26125/51 0.157 0.000 23.911 0.469 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/pluggy/callers.py:157(_multicall)
89562 0.152 0.000 0.257 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/py/_path/local.py:282(_getbyspec)
315483 0.148 0.000 0.148 0.000 {method 'split' of 'str' objects}
202362/202320 0.118 0.000 0.133 0.000 {built-in method builtins.hasattr}
52153 0.115 0.000 0.192 0.000 /Users/andersh/triresolve/env3/bin/../lib/python3.6/posixpath.py:75(join)
25869/2 0.110 0.000 23.909 11.954 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/pluggy/hooks.py:270(__call__)
64362 0.108 0.000 0.906 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/py/_path/local.py:371(check)
4162 0.104 0.000 0.111 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/_pytest/nodes.py:77(__init__)
2040 0.100 0.000 0.100 0.000 {built-in method posix.listdir}
74565 0.098 0.000 0.714 0.000 /Users/andersh/triresolve/env3/bin/../lib/python3.6/genericpath.py:27(isfile)
37867 0.096 0.000 0.423 0.000 /Users/andersh/triresolve/env3/lib/python3.6/copy.py:66(copy)
251 0.087 0.000 0.087 0.000 {built-in method marshal.load}
22396 0.087 0.000 1.131 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/_pytest/config/__init__.py:379(_getconftestmodules)
2326 0.086 0.000 0.117 0.000 <frozen importlib._bootstrap_external>:830(get_data)
312214 0.085 0.000 0.085 0.000 {method 'startswith' of 'str' objects}
519 0.084 0.000 0.630 0.001 /Users/andersh/triresolve/env3/lib/python3.6/imp.py:255(find_module)
745 0.082 0.000 0.088 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/django/forms/fields.py:63(__init__)
15976 0.080 0.000 0.126 0.000 /Users/andersh/triresolve/env3/bin/../lib/python3.6/posixpath.py:331(normpath)
3629 0.078 0.000 0.085 0.000 /usr/local/Cellar/[email protected]/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/inspect.py:485(unwrap)
79/63 0.077 0.001 0.124 0.002 {built-in method _imp.create_dynamic}
55882 0.071 0.000 0.303 0.000 /Users/andersh/triresolve/env3/lib/python3.6/site-packages/_pytest/fixtures.py:1367(getfixturedefs)
To summarize: ~3 seconds just in dirname(), ~3 in fspath(), and the stuff in py/_path maybe 7-ish in total.
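For reference, profiles like these can be captured with cProfile and inspected with pstats; a minimal sketch (the dump filename and the -k expression are arbitrary):

import cProfile
import pstats

import pytest

# Run the suite under the profiler and dump the stats to a file.
cProfile.run("pytest.main(['-k', 'this_will_never_match_anything'])", 'pytest.prof')

# Load and inspect, as in the IPython session above.
pstats.Stats('pytest.prof').sort_stats('tottime').print_stats(40)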
@boxed
Can you try https://github.com/pytest-dev/pytest/pull/4224, please?
I think this might be partly caused by 52ff1eaf.
14.14 -> 13.27. So some improvement, but not really significant compared to the big problem outlined in the profiler output above. It would be a big improvement if the problem with path handling was fixed though :P
Ok.
Then it appears to be something else that is using py.path a lot more than before.
It would be useful if you could look at the actual callers for this.
Also, of course, bisecting would help, but that might be tedious (from 3.4 to 3.9).
It would be a big improvement if the problem with path handling was fixed though :P
What do you mean?
What do you mean?
I mean that you've made a significant performance improvement, it's just overshadowed by one much larger. So please merge your change even if it doesn't fix my problem!
I see. It was easy, since I'd thought about it before already, and introduced it myself after all. ;)
As for the other issue it would be good to create a new issue with more details, i.e. with information about the actual callers.
I'll look into creating an example project with the performance issues tomorrow at work. It's probably just a matter of creating lots of files and directories that are irrelevant, I think.
What's the deal with the "marked as off topic" stuff btw? This issue was about performance issues generally and I gave a pretty clear example of a huge performance issue. I don't see how I could be more on topic than that ^_-
Cool, looking forward to this.
Re off-topic: this issue is about investigating performance altogether / in general, so your report is not really off-topic, but it should be a separate issue - my intention was to keep this issue here easier to read / skim in the future.
I've done some more investigation on the collect phase speed issues:
3.4.0 6s
3.5.0 6s
3.5.1 6s
3.6.0 3s (!!!!)
3.6.1 6s
3.6.2 6s
3.6.4 6s
3.7.0 13s (!)
3.7.4 13s
3.8.2 13s
3.9.3 13s
So as you can see I've only tried the released versions, but it's narrowed down a lot. I was very impressed by the speed of 3.6.0, so much I had to check and recheck that it did indeed find all my tests.
I have written a little test script that demonstrates some (but not all!) of the behavior we see:
from os import makedirs
from os.path import expanduser, join
from shutil import rmtree

base_dir = expanduser('~/Downloads/test_pytest_collect_speed/tests/')
rmtree(base_dir, ignore_errors=True)  # start fresh; don't crash on the first run

for i in range(100):
    d = join(base_dir, f'foo_{i}')
    makedirs(d)
    # Make the directory a package.
    with open(join(d, '__init__.py'), 'w') as f:
        pass
    # Ten trivial test files per directory.
    for j in range(10):
        with open(join(d, f'test_bar_{j}.py'), 'w') as f:
            f.write('def test_foo(): pass')
Running this script and then pytest -k asdasdasdasdasdas produces these times on my machine:
3.4.0 3.59s
3.6.0 3.80s
3.6.1 3.78s
3.6.2 3.76s
3.6.4 4.61s
3.7.0 11.59s
3.7.4 12.14s
3.8.0 12.07s
3.9.1 12.97s
3.9.2 12.89s
As you can see 3.6 isn't much faster with this test, unlike our real code base, but the slowdown in 3.7 is clearly visible.
Thanks @boxed for the analysis.
3.7.0 saw the introduction of package-scoped fixtures, that's very likely to be the cause of the slowdown since it changed the collection code significantly.
That sounded plausible, but looking at the cProfile output I don't think that's the case:
3.6.4:
================================================ 1000 deselected in 6.31 seconds =================================================
6535056 function calls (6309736 primitive calls) in 7.488 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
188165 1.639 0.000 1.639 0.000 {built-in method posix.stat}
191421 0.398 0.000 0.684 0.000 posixpath.py:75(join)
2256 0.324 0.000 2.659 0.001 imp.py:255(find_module)
45037/51 0.289 0.000 7.341 0.144 callers.py:151(_multicall)
169685 0.202 0.000 1.655 0.000 genericpath.py:27(isfile)
44823/2 0.162 0.000 7.338 3.669 hooks.py:244(__call__)
383193 0.137 0.000 0.156 0.000 {built-in method posix.fspath}
394795/394794 0.119 0.000 0.124 0.000 {built-in method builtins.isinstance}
300547 0.107 0.000 0.107 0.000 {method 'startswith' of 'str' objects}
153883/153877 0.089 0.000 0.090 0.000 {built-in method builtins.getattr}
46213/44012 0.079 0.000 0.299 0.000 __init__.py:368(_getconftestmodules)
215437 0.071 0.000 0.140 0.000 posixpath.py:41(_get_sep)
76933/76913 0.071 0.000 0.075 0.000 {built-in method builtins.hasattr}
34000 0.068 0.000 1.688 0.000 python.py:405(_makeitem)
12775 0.066 0.000 0.105 0.000 posixpath.py:331(normpath)
113286 0.060 0.000 0.060 0.000 {method 'split' of 'str' objects}
45037/51 0.059 0.000 7.341 0.144 manager.py:58(<lambda>)
68000 0.058 0.000 0.803 0.000 python.py:216(pytest_pycollect_makeitem)
572 0.057 0.000 0.057 0.000 {built-in method marshal.loads}
132290 0.056 0.000 0.101 0.000 terminalwriter.py:74(<genexpr>)
23322 0.053 0.000 0.078 0.000 imp.py:105(get_suffixes)
18514 0.052 0.000 0.081 0.000 local.py:282(_getbyspec)
41607 0.051 0.000 0.354 0.000 main.py:409(gethookproxy)
2004 0.051 0.000 0.051 0.000 {method 'truncate' of '_io.FileIO' objects}
1200 0.049 0.000 0.049 0.000 {built-in method io.open}
933 0.045 0.000 0.045 0.000 {built-in method posix.listdir}
34000 0.044 0.000 0.082 0.000 unittest.py:14(pytest_pycollect_makeitem)
1000 0.044 0.000 1.817 0.002 python.py:376(collect)
34000 0.043 0.000 0.067 0.000 python.py:359(_matches_prefix_or_glob_option)
236510 0.043 0.000 0.045 0.000 {method 'get' of 'dict' objects}
2000 0.042 0.000 0.258 0.000 fixtures.py:1071(getfixtureclosure)
32000 0.041 0.000 0.165 0.000 fixtures.py:1209(getfixturedefs)
1047 0.040 0.000 0.040 0.000 {built-in method marshal.load}
16533 0.040 0.000 0.071 0.000 posixpath.py:154(dirname)
1649/1628 0.039 0.000 0.108 0.000 {built-in method builtins.__build_class__}
35828/35214 0.038 0.000 0.042 0.000 {built-in method builtins.issubclass}
187622 0.038 0.000 0.038 0.000 {method 'endswith' of 'str' objects}
20886 0.035 0.000 0.306 0.000 local.py:371(check)
53345 0.035 0.000 0.109 0.000 local.py:163(__eq__)
3.7.0:
================================================ 1000 deselected in 17.79 seconds ================================================
34834354 function calls (34601846 primitive calls) in 18.905 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
7339091 2.219 0.000 2.915 0.000 {built-in method posix.fspath}
1407287 2.094 0.000 3.809 0.000 posixpath.py:154(dirname)
192750 1.667 0.000 1.667 0.000 {built-in method posix.stat}
167757 1.439 0.000 8.886 0.000 common.py:315(parts)
1406098 1.215 0.000 5.318 0.000 local.py:312(dirpath)
2826149 1.199 0.000 3.794 0.000 local.py:163(__eq__)
5680042 0.696 0.000 0.696 0.000 common.py:394(__fspath__)
1606319 0.464 0.000 0.788 0.000 posixpath.py:41(_get_sep)
113301 0.428 0.000 7.562 0.000 main.py:523(<genexpr>)
191150 0.383 0.000 0.655 0.000 posixpath.py:75(join)
1420042 0.382 0.000 0.382 0.000 {method 'rfind' of 'str' objects}
1796928/1796927 0.373 0.000 0.379 0.000 {built-in method builtins.isinstance}
2252 0.325 0.000 2.597 0.001 imp.py:255(find_module)
1427462 0.308 0.000 0.308 0.000 {built-in method __new__ of type object at 0x10a57b840}
45837/51 0.291 0.000 18.739 0.367 callers.py:157(_multicall)
1088730 0.283 0.000 0.283 0.000 {method 'rstrip' of 'str' objects}
56550 0.205 0.000 3.575 0.000 main.py:518(<genexpr>)
169453 0.196 0.000 1.637 0.000 genericpath.py:27(isfile)
1513672 0.190 0.000 0.190 0.000 {method 'append' of 'list' objects}
45623/2 0.173 0.000 18.737 9.368 hooks.py:270(__call__)
1561811/1561136 0.170 0.000 0.171 0.000 {built-in method builtins.len}
307054 0.104 0.000 0.104 0.000 {method 'startswith' of 'str' objects}
51612 0.096 0.000 0.158 0.000 local.py:282(_getbyspec)
160583/160577 0.090 0.000 0.091 0.000 {built-in method builtins.getattr}
182348 0.085 0.000 0.085 0.000 {method 'split' of 'str' objects}
15674 0.078 0.000 0.124 0.000 posixpath.py:331(normpath)
47213/45012 0.076 0.000 0.296 0.000 __init__.py:368(_getconftestmodules)
I don't see how package-scoped fixtures can be an explanation of how the number of calls to posix.fspath went from 383193 to 7339091. That's 19x more.
We're spending the vast amount of time in this piece of code in _pytest/main.py:
if not any(x in path.parts() for x in paths):
I'm having trouble wrapping my head around what it's doing. It was introduced with package-scoped fixtures as you suspected.
Here is some data from running @boxed's test case under pyflame.
The worst offenders in terms of performance regressions are:
Both were indeed introduced with the package-scoped fixtures feature.
https://github.com/martenbr/flamecharts/blob/master/3.6.4.svg (8.63s)
https://github.com/martenbr/flamecharts/blob/master/3.7.0.svg (32.83s)
(Download and open in a browser to get interactive charts)
Well.. we found it. I'll send a PR.
we really ought to take a look at using strings and a set there ^^
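A minimal sketch of that idea - precompute the candidate paths as plain strings, so the membership test becomes set operations instead of repeated py.path comparisons (names here are illustrative, not pytest's actual internals):

import os

def ancestors(path: str) -> set:
    # All ancestors of `path`, including `path` itself, as plain strings.
    result = set()
    while True:
        result.add(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the root ('' for relative paths)
            return result
        path = parent

# Instead of `any(x in path.parts() for x in paths)` over py.path objects:
def needs_collection(path: str, collection_paths: set) -> bool:
    return not ancestors(path).isdisjoint(collection_paths)

With the collection paths normalized to absolute strings up front, the hot loop avoids constructing and comparing py.path objects entirely.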
It seems to me like a purge of py's path handling is the way to go. It looks like a high-level path-juggling lib, but it hits the filesystem a crazy amount.
I also noticed that between 3.7.0 (which I tested my performance fix on) and master, more performance regressions have landed, resulting in my test going from 7 seconds (in 3.7 with my patch) to 9.3 seconds.
@boxed personally I'd be delighted if we could remove all of py.path.*; unfortunately we'll have to keep backward-compatibility shims
Here's a PR https://github.com/pytest-dev/pytest/pull/4237
There's a lot of performance left on the table still, so I'll keep looking. This is still ~50% slower than 3.9.1 on my machine.
Let's commit this to the examples folder - those ~1000 files share content, so it's inexpensive to keep in git - and we can have a direct folder in the checkout to test with.
I made 10,000 files - pre-collect time is insane.
There should be a benchmarks suite that runs in CI really. Similar to speed.pypy.org
Let's commit this to the examples folder
Nah.. we can have the script to generate it instead, but I'd rather not see many files unnecessarily.
As for a benchmark suite: we could start with measuring this example, and maybe store results in the Travis cache - but then again, changes are often not really about performance, and it might not make sense to always run this.
Having something in examples in this regard would indeed be useful, though.
Travis is not suitable as an origin for that kind of stuff.
As for the file generator - I wonder if there is a sensible way to get it and use it directly. The plus side of adding a few thousand identical files/folders is that it's just 3-4 objects in git - and in a nice separate folder.
I agree on Travis. It would be nice with a separate benchmark server. Maybe we should talk with the people who did the one for pypy (and now cpython?).
After all the investigation we did before, it seems that the big performance issue in the collect phase is due to the assertion rewriter checking for the existence of files that we should know up front do not, in fact, exist.
Maybe this can be partially solved by using the new import hook API in Python 3? That leaves Python 2 slow, but that's a pretty OK compromise considering Python 2 will go away at some point.
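A minimal sketch of what such an import hook could look like on Python 3, assuming we only claim modules already known to be test modules (names are hypothetical, and a real rewriter would also wrap the loader):

from importlib.abc import MetaPathFinder
from importlib.machinery import PathFinder

class RewriteFinder(MetaPathFinder):
    # Claim only modules already collected as tests; defer everything else.
    def __init__(self, test_module_names):
        self._names = set(test_module_names)

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._names:
            # Let the regular import machinery handle it - no extra stat() calls.
            return None
        spec = PathFinder.find_spec(fullname, path, target)
        # A real implementation would swap spec.loader for a rewriting loader here.
        return spec

# Installed in front of the default finders, e.g.:
#   import sys
#   sys.meta_path.insert(0, RewriteFinder({'test_bar_0', 'test_bar_1'}))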
I'm entirely fine with that.
Since this discussion I have anecdotally observed that pytest is getting even slower over time. I'm not sure this can be solved given the current code base, the priorities and composition of the team, and so on.
So to try to actually be helpful I have instead written a new test runner (https://github.com/boxed/hammett) that is partly compatible with pytest (but not class based tests and not unittest style tests). I think this tool can be used as a comparison to see where pytest performance actually could be.
You can run the entire test suite of iommi (https://github.com/TriOptima/iommi) with this tool with just a very small config saying that it uses the pytest-django plugin. The performance difference on the entire test suite on my machine is ~7.8s for pytest and ~6.1s for hammett. About 1s of that is the general pytest overhead. 0.2s or so seems to be assertion rewriting. The rest I don't know.
The overhead is actually my biggest problem with pytest right now, as it is the overhead that really punishes you when running mutation testing.
Hi @boxed,
Since this discussion I have anecdotally observed that pytest is getting even slower over time.
That's probably true; as we add more features, that's natural, but indeed we rarely try to measure the performance impact as we add new things.
So to try to actually be helpful I have instead written a new test runner (boxed/hammett) that is partly compatible with pytest (but not class based tests and not unittest style tests). I think this tool can be used as a comparison to see where pytest performance actually could be.
Very interesting!
The performance difference on the entire test suite on my machine is ~7.8s for pytest and ~6.1s for hammett. About 1s of that is the general pytest overhead. 0.2s or so seems to be assertion rewriting. The rest I don't know.
Hmm... that's actually a bummer to me, because if it were a greater difference I would expect that we had low-hanging fruit we could tackle. In longer test suites, do you have a constant overhead, or does the overhead increase with the size of the test suite, always staying around 20%?
In longer test suites, do you have a constant overhead, or does the overhead increase with the size of the test suite, always staying around 20%?
The 1s is a constant afaict. iommi is the largest test suite I've tried to run hammett on. It's a super big hack right now so probably only works for very small and simple projects. The overhead goes down for longer test suites since that 1s gets amortized.
For my use case, doing mutation testing, I pretty much only care about that 1s constant up front overhead. The rest is nice to get rid of, for sure, but in iommi I can aggressively cut the size of the test suite so actually the process overhead of python and the import of the project itself starts to dominate! I am working on running hammett in-process in mutmut to deal with this. Seems to work ok. I tried this with pytest back in the day and that did NOT work. At all.
I think the next step for me performance wise is to be able to know more tightly what tests to run. Right now I select 1 module -> n test files. I would like to be able to get to 1 function -> some tests (maybe a test file).
I guess I've been ranting a bit at this point... I'll stop now :P
The cost of path vs pathlib vs strings may be a factor; I also suspect a number of cases in the fixture internals.
As another little test I redid the test with my script that generates lots of directories and test files:
pytest 4.6.2: collect only: 4.5s (second run, first run was 15s for some reason)
pytest 4.6.2: full run: ~9s
pytest -q 4.6.2: full run: ~8s
pytest -q --assert=plain 4.6.2: full run: ~6s
hammett full run: 0.5s (!)
There is a huge performance difference between the first run and subsequent runs for both pytest and hammett, I'm guessing because of disk caches or something.
Still, the difference between 6s and 0.5s is pretty stark.
This makes me wonder how much of that is pytest hooks/pluggy, and whether the idea of cythonizing hooks would make a real difference.
Here is some analysis, not of collection but mostly for execution.
I use the idea from #401, without the sleep:
import pytest

@pytest.mark.parametrize("x", range(5000))
def test_foo(x):
    pass
I analyzed how much overhead each non-essential plugin adds relative to the default with all plugins enabled: run with -p no:<plugin> 5 times and take the minimum to compute the overhead.
Specs: Linux, Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, HDD, Python 3.8, current master (7d5f5a878598aaa7f3e100ef6a97bcd13c1b408d), performance governor, turbo off
Hacky script
#!/usr/bin/env python
from typing import List
import subprocess
import time

plugins = (
    "assertion",
    "cacheprovider",
    "capture",
    "debugging",
    "doctest",
    "faulthandler",
    "freeze_support",
    "junitxml",
    "logging",
    "monkeypatch",
    "nose",
    "pastebin",
    "recwarn",
    "reports",
    "resultlog",
    "setuponly",
    "setupplan",
    "skipping",
    "stepwise",
    "unittest",
    "warnings",
)


def check(plugin: str, iterations: int = 5) -> List[float]:
    timings = []
    for i in range(iterations):
        start = time.perf_counter()
        subprocess.run(
            ["python", "-m", "pytest", "xx.py", "-p", f"no:{plugin}"],
            stdout=subprocess.DEVNULL,
            check=True,
            env={"PYTHONPATH": "src/"},
        )
        timings.append(time.perf_counter() - start)
    return timings


print("PLUGIN", "T1", "T2", "T3", "T4", "T5", "TMIN", "DIFF", "DIFF%")
timings = check("$DEFAULT")
baseline = min(timings)
print(
    "{:15} {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:+.4f}s {:+.3f}%".format(
        "BASELINE",
        timings[0],
        timings[1],
        timings[2],
        timings[3],
        timings[4],
        min(timings),
        baseline - min(timings),
        ((baseline - min(timings)) / baseline) * 100,
    )
)
for plugin in plugins:
    timings = check(plugin)
    print(
        "{:15} {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:+.4f}s {:+.3f}%".format(
            plugin,
            timings[0],
            timings[1],
            timings[2],
            timings[3],
            timings[4],
            min(timings),
            baseline - min(timings),
            ((baseline - min(timings)) / baseline) * 100,
        )
    )
Detailed results
PLUGIN | T1 |T2 |T3 |T4 |T5 |TMIN |DIFF |DIFF%
--------------|--------|-------|-------|-------|-------|-------|--------|-------
BASELINE | 8.6669s|8.6314s|8.5093s|8.7840s|8.7474s|8.5093s|+0.0000s|+0.000%
assertion | 8.6088s|8.5090s|8.4971s|8.5183s|8.5142s|8.4971s|+0.0121s|+0.142%
cacheprovider | 8.1316s|8.1153s|8.0618s|8.1250s|8.1907s|8.0618s|+0.4475s|+5.259%
capture | 7.5484s|7.6386s|7.6759s|7.5143s|7.6171s|7.5143s|+0.9949s|+11.692%
debugging | 8.5375s|8.6287s|8.5948s|8.7474s|8.5854s|8.5375s|-0.0282s|-0.332%
doctest | 8.7486s|8.6381s|8.5814s|8.5956s|8.6282s|8.5814s|-0.0721s|-0.848%
faulthandler | 8.5920s|8.6088s|8.5411s|8.6846s|8.5611s|8.5411s|-0.0319s|-0.374%
freeze_support| 8.6831s|8.6670s|8.6971s|8.6463s|8.5876s|8.5876s|-0.0784s|-0.921%
junitxml | 8.5813s|8.5482s|8.5990s|8.6958s|8.6254s|8.5482s|-0.0389s|-0.457%
logging | 6.9895s|6.8886s|6.8846s|6.9546s|6.9742s|6.8846s|+1.6246s|+19.092%
monkeypatch | 8.6100s|8.5525s|8.6031s|8.5367s|8.6799s|8.5367s|-0.0274s|-0.322%
nose | 8.5155s|8.5339s|8.4914s|8.5055s|8.5123s|8.4914s|+0.0179s|+0.210%
pastebin | 8.7020s|8.6695s|8.6396s|8.6313s|8.5839s|8.5839s|-0.0746s|-0.877%
recwarn | 8.6273s|8.6493s|8.5004s|8.4378s|8.6495s|8.4378s|+0.0714s|+0.839%
reports | 8.5065s|8.7384s|8.6581s|8.7497s|8.6464s|8.5065s|+0.0028s|+0.033%
resultlog | 8.6606s|8.5702s|8.5818s|8.5576s|8.5925s|8.5576s|-0.0483s|-0.568%
setuponly | 8.7867s|8.5259s|8.5698s|8.5807s|8.6715s|8.5259s|-0.0166s|-0.195%
setupplan | 8.6306s|8.6868s|8.6589s|8.6690s|8.6238s|8.6238s|-0.1145s|-1.346%
skipping | 8.0935s|8.0553s|7.9831s|7.9610s|7.9105s|7.9105s|+0.5987s|+7.036%
stepwise | 8.5489s|8.6844s|8.5964s|8.6576s|8.7008s|8.5489s|-0.0397s|-0.466%
unittest | 8.6372s|8.5504s|8.4745s|8.5486s|8.5737s|8.4745s|+0.0347s|+0.408%
warnings | 7.7409s|7.7247s|7.6440s|7.6950s|7.7402s|7.6440s|+0.8652s|+10.168%
ALL | 3.6315s|3.6125s|3.6365s|3.6323s|3.6572s|3.6125s|+4.8968s|+57.546%
So overall, for this simple case, the plugins take 57.5% of the time, almost all of it from these 5:
logging (19%), capture (11%), warnings (10%), skipping (7%), cacheprovider (5%).
Hey @bluetech,
I've tried to reach you at "ran at unusedvar" - is that your primary e-mail?
Sorry as this is off-topic, but I couldn't think of a better way to reach you.
@nicoddemus Yes. Sorry, I am still going through mail. I received it and will reply soon!
@bluetech
I ran this test with pytest (no config): 11.6s.
python -m pytest -p no:logging -p no:warnings -p no:skipping -p no:cacheprovider --assert=plain: 9.2s
With hammett: 0.9s.
Not sure if those parameters are correct: passing any random string there doesn't give an error message, so I'm not sure pytest isn't just swallowing an invalid configuration silently...
Another interesting test is to remove the plain-asserts param and change the test to assert False. Pytest: 19.9s, hammett: 2.2s.
(I would argue that hammett actually provides a slightly better user experience for failures right now.)
Hi @boxed, thanks for checking! It's nice to have a lower bound with hammett.
Interesting that it's only 11.6 -> 9.2 for you. Can you describe your system (operating system and hardware)?
Not sure if those parameters are correct: passing any random string there doesn't give an error message, so I'm not sure pytest isn't just swallowing an invalid configuration silently...
You can verify with --trace-config.
Another interesting test is to remove the plain-asserts param and change the test to assert False. Pytest: 19.9s, hammett: 2.2s.
Hmm. I haven't looked at assertion rewriting. IIRC assertion rewriting is cached/compiled to .pyc though - is it still 19.9s on a second run?
@RonnyPfannschmidt
This makes me wonder how much of that is pytest hooks/pluggy, and whether the idea of cythonizing hooks would make a real difference.
A large chunk of the remaining time (after turning off the plugins) is taken by _multicall from pluggy. I looked at it and there is not much to optimize there, it doesn't do any fluffy stuff. I'm not sure even Cython will be able to speed it up much. It is mostly just the overhead of the plugin virtual dispatch. We might look into ways to reach it less however.
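To put a rough number on that dispatch overhead in isolation, here is a sketch timing a direct method call against the same call routed through a pluggy hook (absolute numbers will vary by machine and pluggy version):

import timeit

import pluggy

hookspec = pluggy.HookspecMarker('demo')
hookimpl = pluggy.HookimplMarker('demo')

class Spec:
    @hookspec
    def myhook(self, x):
        pass

class Impl:
    @hookimpl
    def myhook(self, x):
        return x

pm = pluggy.PluginManager('demo')
pm.add_hookspecs(Spec)
impl = Impl()
pm.register(impl)

direct = timeit.timeit(lambda: impl.myhook(x=1), number=100_000)
hooked = timeit.timeit(lambda: pm.hook.myhook(x=1), number=100_000)
print(f'direct: {direct:.3f}s  via pluggy: {hooked:.3f}s')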
I realized I had run this test in a rather dirty venv that had the django plugin. Made a new fresh venv and got Python 3.7.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1.
6 seconds with tests passing. Big improvement from 11s! Still bad compared to hammett's 2 seconds, but at least the same order of magnitude.
39.2 seconds when all tests fail! That's a massive performance degradation compared to 4.6.2.
Assertion rewriting on or off has a small effect. It's very obvious that the degradation in performance is due to the _printing_ at the end.
Hammett also prints local variables so that's not it.
Assertion rewriting on or off has a small effect. It's very obvious that the degradation in performance is due to the printing at the end.
All tests failing is probably not a very common case, but 39.2 does sound excessive. If you have time to do a cProfile of this scenario, I'd be interested. Also be interested to know how long it takes with >& /dev/null (which should remove terminal overhead, and also internal pytest overhead when outputting to a tty).
I tried > /dev/null and it made no difference.
As for a cProfile: it's simple enough to reproduce; it's not specific to my machine.
I wrote up a little benchmark suite: https://github.com/boxed/test-benchmarks
Neither nose2 nor hammett collects a list of tests before running; they just collect a list of files containing tests (for nose2, based on filename heuristics).
If you want a comparison of test runtime I suggest you pass a list of tests and view test collection as a separate problem.
From some experience writing our own collector to work around pytest collector slowness, I doubt they are capable of finding all types of tests that pytest supports.
What do you mean, "don't collect tests"? Hammett finds all pytest tests, fixtures and parametrized tests and runs them all. What do you mean by "collect", if it's more than that?
Also note that the benchmarks contain an empty directory. What is pytest "collecting" that justifies it taking so much more time to do nothing?
I can make a benchmark for explicitly sending the full list of tests. Shall we say 100? Or just one? Hammett will run circles around pytest on that too, I will gladly put down a $100 bet on this before having written the benchmark.
Let's be serious though: the benchmarks are not super strange and show a real problem in pytest. There is no point in trying to hide simple facts.
Just as an observation point: while working on a pytest plugin (pytest-terraform) and using xdist on a many-core machine, I've noticed that pytest does some fairly invasive/expensive introspection on any Python module - basically a getattr on everything it can find in a module, trying to find plugins and fixtures. Looking at some of the perf traces, this appears to cause significant runtime execution counts and time accumulation; with xdist this gets multiplied, and pytest collection appears to dominate (60-70%) the overall test execution time. As a first step, it might be nice just to have some instrumentation within pytest to accurately capture the cost of plugin scanning (it's unclear whether that can be done out of tree via a plugin, or needs to be in core). Another step would be reworking the pluggy hook and fixture mechanism to populate a module global, so that lookups are explicit instead of implicit. A speed-infrastructure setup would also be useful to prevent significant regressions over time.
@kapilt thanks for the comment. The timing of your comment is uncanny, because @bluetech has opened a few days ago a PR in the direction of reducing the amount of introspection done at collection time: https://github.com/pytest-dev/pytest/pull/7671.
I started playing around with PYTEST_DEBUG (I didn't realize it existed when writing the previous comment) and with py-spy on test collection time (py-spy record -r 1000 --format speedscope -o test -- pytest tests/*py -k nothing_here). While using PYTEST_DEBUG, the other significant portion of time seems to be spent on directory traversal and examination of non-Python files. For context, I'm running this on the tests for https://github.com/cloud-custodian/cloud-custodian; we have a significant amount of JSON in subdirectories of our test structure that represents recorded outputs, and simply manipulating the parameters results in a 2/3 decrease in collection times.
I.e. the delta between running pytest tests and pytest tests/*py:
❯ time pytest tests/test*py -k nothing_here
Test session starts (platform: linux, Python 3.8.5, pytest 5.3.5, pytest-sugar 0.9.4)
rootdir: /home/kapilt/projects/cloud-custodian, inifile: tox.ini
plugins: xdist-1.34.0, sugar-0.9.4, forked-1.3.0, cov-2.10.0
collecting ...
Results (0.99s):
1625 deselected
real 0m1.786s
user 0m1.705s
sys 0m0.084s
❯ time pytest tests -k nothing_here
Test session starts (platform: linux, Python 3.8.5, pytest 5.3.5, pytest-sugar 0.9.4)
rootdir: /home/kapilt/projects/cloud-custodian, inifile: tox.ini
plugins: xdist-1.34.0, sugar-0.9.4, forked-1.3.0, cov-2.10.0
collecting ...
Results (4.63s):
1625 deselected
real 0m5.431s
user 0m5.097s
sys 0m0.332s
Actually running tests (instead of just collection) with xdist -n auto on an AMD Ryzen 4000 laptop (8 cores/16 hardware threads) results in a real-world test time reduction from 38s to 25s.
The PYTEST_DEBUG output showing the plugin processing of other data files in the same directory:
pytest_collect_file [hook]
path: /home/kapilt/projects/cloud-custodian/tests/data/iam/s3-policies.json
parent: <Session cloud-custodian exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=0>
finish pytest_collect_file --> [] [hook]
For plugins that want to load files, it might be better to have some explicit file-extension registration on the session rather than dispatching on every file found.
@kapilt pytest "has" to dispatch the hook on these files, because there might be a plugin which turns them into a test, check out https://docs.pytest.org/en/stable/example/nonpython.html for example.
In cloud-custodian it seems all of the data files are nicely encapsulated in one directory, so you might want to tell pytest to skip it. There are amusingly many ways to do it: --ignore, --ignore-glob, the norecursedirs ini, collect_ignore global in a conftest, and the pytest_ignore_collect hook in a conftest -- take your pick :)
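For illustration, the conftest-based variants might look like this (a sketch assuming the tests/data layout above; hook signature as of pytest 5.x):

# conftest.py at the rootdir

# Option 1: the collect_ignore global (paths relative to this conftest).
collect_ignore = ['tests/data']

# Option 2: the pytest_ignore_collect hook.
def pytest_ignore_collect(path, config):
    # Returning True skips collection of this path entirely.
    return 'tests/data' in str(path)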
That non-Python-files feature seems like a perfect example of something that should be opt-in, so that not everyone pays the performance hit.
@bluetech thanks for the pointers re config options; the plethora of overlapping options around this, however, is indicative that many people have had this issue. While I love the configurability/pluggability of pytest, ideally features should not impose significant costs on those not using them, i.e. zero-cost abstractions. In this context, I would hope that having plugins register their interest in file extensions in an early hook would allow fast, non-plugin-based filtering at the session level when collecting files, with a default of just .py files - i.e. zen of Python, explicit is better than implicit.
I kinda agree, but that would be a breaking change at this point, so my preference is to just make it fast instead :)
From the snippet above I see you are using pytest 5.3.5. There have been several performance improvements since then. It would be great if you can test it with the latest release, or even pytest master, or even even with #7671:
pip install git+https://github.com/pytest-dev/pytest.git@refs/pull/7671/head
Thanks, I appreciate the pitfalls of compatibility here. We had that version pinned due to issues with pytest-sugar (output writer compat), along with pytest-xdist and pytest-terraform (both of which needed the tmp factory root location).
Trying #7671 vs trunk to compare, via time pytest tests -k foobar with norecursedirs set, shows a minor improvement (maybe 2%; it's within test-run delta margins), albeit it significantly cleans up the call tree when visualizing, as well as the debug output, so it's a net win regardless of perf IMHO.

I think pytest is overly concerned with backwards compatibility. The performance problems are much worse than breaking backwards compatibility, especially if the change is that users just need to add some config in setup.cfg (or pyproject.toml or whatever the flavor of the month is... the need to check multiple files seems like a problem too, but a smaller one :P)
@kapilt
Thanks for the performance benchmarks!
the delta between running pytest tests and pytest tests/*py
Indeed it is a bit surprising, but at a cursory glance I see that your test files are all in the root tests directory, and you have many many other support files inside tests/data, including .py files.
I don't think having pytest call hooks for just some file extensions would really help much in this case; hook calls are supposed to be cheap, and we still need to traverse the directories and list the files anyway.
For this specific case I think the best bet is to use norecursedirs to stop pytest from even recursing into tests/data at all:
[pytest]
norecursedirs=data
This simple configuration change should speed up your specific case significantly.
@boxed
I think pytest is overly concerned with backwards compatibility.
Not sure, I think it is the responsible thing to do given that pytest is part of the infrastructure of so many projects and users.
Having said that, I agree: if we can speed things up by introducing a backward-incompatible change that can be reverted in a simple manner (say, a config option, or an extra hook for plugins to implement), we just need to make sure the change brings real gains to justify the breakage, and do it in a way that is as minimally disruptive as possible.
@nicoddemus
Not sure, I think it is the responsible thing to do given that pytest is part of the infrastructure of so many projects and users.
Depends on your perspective. Yes, it might introduce some breakage which will take some time to fix, but breakage+fixing is O(1), while overhead for invocations is O(n), sometimes even O(n*m). It's not at all obvious which one is bigger.
we just need to make sure the change brings real gains to justify the breakage, and do it in a way that is as minimally disruptive as possible.
Agreed.
Depends on your perspective. Yes, it might introduce some breakage which will take some time to fix, but breakage+fixing is O(1), while overhead for invocations is O(n), sometimes even O(n*m). It's not at all obvious which one is bigger.
Indeed there's a balance.