Mypy: Switch to pytest from our homegrown test runner

Created on 7 Jun 2016 · 23 comments · Source: python/mypy

Ok, so when I saw the PR a while back that "parallelized" the test runner, I was super hyped. But the hype wore off when I realized that the unit tests were still run sequentially.

In particular, it continually bugs me that, when running the full unit test suite, there's no progress indicator of any kind. To make things worse, runtests.py is largely undocumented, leaving me to wonder why ./runtests.py unit-test -a '*something*' works but ./runtests.py -a '*something*' doesn't.

In addition, -a is, to an extent, slightly...useless: the naming convention for tests is so inconsistent that it's rarely helpful.

Also, test fixtures are a cruel mess. It took me 30 minutes of debugging to figure out why the tests were throwing obscure KeyErrors that I couldn't reproduce in my own code. It turns out the fixture I was using didn't define list, but defining list caused other tests to fail brutally, so I just created a new fixture. IMO this whole thing should either be reworked or (better yet) completely gutted and thrown into a fire pit. A very _hot_ fire pit. With sulfur.

So I see two options:

  • Move to a different test runner. Some have suggested pytest, which I personally _really_ like! Pytest has the xdist plugin, which supports running multiple tests in parallel.
  • Revamp the current test runner. Ideally, this would include fixing fixtures (oh, the irony...), improving runtests.py, and making the actual unit tests run in parallel, preferably also with a progress bar of some sort.

Thoughts? I could try and help out here a bit if you guys come to a consensus of some sort.

Labels: feature, priority-0-high, topic-tests


All 23 comments

Yep -- I think we all agree that the current test runner is pretty frustrating. Last week at the PyCon sprints where we had a bunch of new contributors getting involved, I found myself apologizing every time I explained how to filter tests (with the positional argument and with -a).

We were actually just discussing this also on #1668, and so far everyone agrees that the right answer will be to move to something like pytest. If you want to try putting together a PR, that would be very useful!

Pytest is (mostly for good reasons) kind of a complicated beast with a lot of choices in how to set it up, so I recommend discussing early and often, either on this thread or on a work-in-progress PR.

I also hate hate hate the builtins test fixtures. +1 to killing them with fire. They only exist for perf reasons, and I'm not at all sure they're worth it.

A big problem here seems to be that (for reasons having to do with mypy's
early history as a separate language, probably) the non-data-driven tests
use a set of classes that is custom to mypy's "myunit" test runner (which
is wrapped by runtests.py). I don't know enough about pytest to understand
whether we'd have to redo all that infrastructure. I also don't know how
easy it is to get pytest to understand our data-driven tests. It would be
pretty frustrating if pytest treated each file of test data as a single
test.

Yeah, one thing we'll definitely want is for pytest to see the individual
test cases inside the data files. I think it should be possible to do this
pretty nicely with appropriate use of pytest.mark.parametrize:
https://pytest.org/latest/parametrize.html but I am not a pytest expert,
and this is one of the things I had in mind when I said that pytest is
something of a complicated beast with a lot of choices in how to set it up.
So that will be for Ryan or whoever takes this on to figure out. :-)


I'll try to see if I can tackle this over the weekend!

Cool, switching to a standard test runner would have many potential benefits. (Note that this has been discussed before, see #380. That issue also explains why things are as they are.)

The last time the test runner was rewritten we had a bunch of issues and regressions; let's try to avoid those this time. Here are some things to consider:

  • Running a single test case shouldn't be (significantly) slower than it is now, as this is critical to productivity.
  • Generally, try to support current workflows and not make them less convenient or slower. Otherwise productivity could even be impaired rather than improved by the switch, at least in the short term. I don't have a full list, but here are some:

    • Running a single unit test or a bunch of related unit tests (based on keywords). The py.test -k option seems to do the trick (illustrated after this list).

    • Running all "unit tests" (the idea is to make it easy to run only fast test cases locally).

    • Running all tests in parallel locally.

    • Running all tests in parallel on Travis CI.

    • Reporting test failures and file locations clearly. This includes locations of test cases in .test data files and generally whatever (useful) output the current runner generates.

  • Try to only change one thing at a time. For example, don't try to deal with test fixtures at the same time as switching to py.test. Also, try to avoid touching the .test data files when switching the test runner.
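To illustrate the single-test workflow with -k, here is roughly what it could look like (the test name and keyword expression are made up for the example; mypy/test/testcheck.py is the testcheck driver discussed below):

$ py.test -k testCallableSubtyping        # run test cases whose name matches a keyword
$ py.test -k "generics and not python2"   # -k also accepts boolean expressions
$ py.test mypy/test/testcheck.py          # run a single driver's tests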

Test fixtures are mostly a separate issue. I'll file a new issue.

FWIW the fast way to run a single test case is currently to use the myunit incantation given in README.md, ignoring runtests.py. It would be much nicer if pytest could handle all of these.

Yeah, "change one thing at a time" is definitely good advice for this kind of migration.

I'd recommend taking that even farther and trying to find a way to migrate one section of things at a time. For example, one good strategy could be to take just one of the larger "tasks" in runtests.py, like the testcheck driver, and rewrite that driver using pytest, so that

  • the tests can be run directly with a nice py.test command, and also
  • runtests.py continues to run it, now through that py.test command.

You'd want to preserve the ability to run just one or a few related unit tests at a time (perhaps by having runtests.py pass any -a argument it gets straight through to py.test -k, if it's running the task you migrate). And I think if you reuse DataDrivenTestCase and parse_test_cases from mypy.test.data, perhaps with some adaptations, it should be possible to keep parsing the *.data files unmodified and pass the results to a mechanism like pytest.mark.parametrize so that pytest knows about all the separate test cases.

A PR that does that for one driver without making a ton of changes in total would be a really solid step forward. There'll probably be discussion on the details (made possible by having the code up in a PR), and then once it's merged it should be pretty smooth sailing to do the same for most of the other drivers. Once they're all converted, runtests.py can be dropped.
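To make the parametrize idea concrete, a migrated driver might look roughly like the sketch below. The parse_test_cases call is simplified (the real function takes more arguments), and run_case is a stand-in stub, not mypy's actual API:

import pytest

from mypy.test.data import parse_test_cases

# Parse the data file once at collection time so that each
# DataDrivenTestCase becomes its own pytest test with a readable id.
cases = parse_test_cases('test-data/unit/check-basic.test')  # args simplified

def run_case(case) -> None:
    # Stand-in for the real driver logic: type check the case's program
    # and compare the output against the expected errors.
    ...

@pytest.mark.parametrize('case', cases, ids=lambda case: case.name)
def test_check(case) -> None:
    run_case(case)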

With #1944 merged, the migration has begun!

Some next steps, not necessarily in a particular order:

  • Migrate testcmdline, which is the other DataDrivenTestCase-using driver. This should be a straightforward re-use of what's in #1944.
  • Migrate the other parse_test_cases-using drivers that don't use DataDrivenTestCase -- testparse, testpythoneval, testsemanal, teststubgen, testtransform, testtypegen. This should generally also be a pretty straightforward re-use of what's in #1944, with a little more massaging of the test code.

    • testpythoneval is especially appealing because it's the long pole, and in combination with xdist (below) it will make a full test run faster.

  • Migrate the non-parse_test_cases-using drivers: testargs, testdocstring, testgraph, testinfer, testlex, testmoduleinfo, testsolve, testsubtypes, testtypes. These probably become plain old bog-standard pytest tests. This will probably require loosening up python_functions and/or python_classes in pytest.ini (a sketch follows this list), which in turn may require some tweaking elsewhere to avoid spuriously picking up functions back in parse_test_cases-land that are meant as helpers rather than test cases.
  • Add the xdist plugin, do some testing that it's working correctly, and set up pytest.ini so that when someone just types py.test they get the benefit of it by default. Then adjust runtests.py to fit with that. Initially this could just be runtests.py passing -n 1 to knock out xdist and carrying on with its own concurrency model... but the real prize is when it doesn't do that. Say, run pytest alone first, so it can have all the cores to itself, then everything else with the runtests/waiter concurrency mechanism.
  • Cut out myunit from the code and documentation, simplifying things.
  • Migrate everything else in runtests.py: flake8, type-checking, and importing.
  • Cut out runtests.py and mypy.waiter from code and documentation, simplifying things again.
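To make the pytest.ini points above concrete, here is one possible sketch; the patterns and options are illustrative guesses, not mypy's actual config:

[pytest]
# Collect only the test modules.
python_files = test*.py
# Widened patterns so migrated myunit-style suites and plain test
# functions both get collected (illustrative values only).
python_classes = *Suite
python_functions = test_*
# With pytest-xdist installed, parallelize across all cores by default.
addopts = -n auto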

The big prizes here are the simplification from cutting things out at the end; and the speedups from getting effective parallelism, especially on testpythoneval and other individual slow drivers.

Oh, also:

  • Incorporate the test data file (like check-classes.test) into what py.test -k can match against. It's already in the reporting, which is something I've wanted for a while and I know at least some other people have too.

Uhhh...I think I kind of screwed this one up... So sorry! Glad @gnprice was able to actually, y'know, _finish_ it!

@kirbyfan64 Nothing to apologize for! Thanks for your work in #1723 -- I drew on that for #1944 and it's now landed. Want to take on some of the further steps of the migration? ;-)

In particular I think you mentioned in a comment on #1723 that you'd already done some testing of xdist, and gotten it to work after a bugfix upstream. Setting things up so a plain py.test command runs things in parallel by default would be awfully nice...

The biggest issue with test speed is no longer the runner but excessive subprocessing and disk thrashing. If we switch to py.test but still have to fork a subprocess for every myunit invocation and for every mypy test or flake8 run, it's not going to win us much, especially with so many [cases] generating temporary throw-away *.py files.

As far as I can tell, the myunit *.test format is very appealing because of its minimal boilerplate. There are no plans to move away from that syntax, right?

Are there plans to make myunit use the API to not have to fire a fresh mypy subprocess for every check? mypy --help takes 0.5s on my laptop. Since myunit is already run in a subprocess from runtests.py and doesn't itself do any parallelization, there's little point in invoking mypy via the command line. We could start simplifying myunit so that it just becomes a set of helpers for pytest to handle *.test files as parameters, like Greg is suggesting above. Then making those tests use the API instead of command-line mypy would give us the biggest efficiency win.
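For reference, the API route could look roughly like this sketch. It assumes the mypy.api module, whose run() takes a list of mypy command-line arguments and returns a (normal_report, error_report, exit_status) tuple; check_program is a hypothetical test helper:

from typing import Tuple

from mypy import api

def check_program(path: str) -> Tuple[str, int]:
    # Run mypy in-process: no fork and no interpreter startup cost
    # for each test case.
    normal_report, error_report, exit_status = api.run([path])
    return normal_report, exit_status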

As far as I can tell, the myunit *.test format is very appealing because of its minimal boilerplate. There are no plans to move away from that syntax, right?

Correct.

Are there plans to make myunit use the API to not have to fire a fresh mypy subprocess for every check?

That's a great idea! The API is very new so we hadn't thought of this yet.

Honestly I'm not happy that myunit reimplements a bunch of classes that look similar to unittest -- I believe that's a relic from the days when mypy was a language dialect that had to be transpiled to be used. But I don't believe that "fixing" that would help much (it would just make the code slightly more conventional) so it's really off-topic.

The way I see it, moving the rest of the tests (i.e. the non-data-driven tests) to pytest requires using @pytest.fixture. See the POC implementation of the fixture file, with functions like (imports added here for completeness):

import pytest
from mypy.nodes import CONTRAVARIANT
from mypy.test.typefixture import TypeFixture

@pytest.fixture
def fx_contra() -> TypeFixture:
    return TypeFixture(CONTRAVARIANT)

And a test, with test cases such as

# assert_true is mypy's own test helper; is_proper_subtype comes from mypy.subtypes.
def test_is_proper_subtype_contravariance(fx_contra) -> None:
    assert_true(is_proper_subtype(fx_contra.gsab, fx_contra.gb))

Type checking the consistency of pytest's fixtures would require at least a nontrivial mypy plugin. Is it OK to leave it unchecked and open a separate issue for that?

Pytest can also run tests defined using the stdlib unittest module. unittest is closer to myunit, and using it would make the migration easier for the non-data-driven tests. In particular, the fixtures should be easy to migrate. Type checking the test cases would also be simpler.
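For comparison, a sketch of that unittest route, reusing the names from the POC above (the class name is made up; TypeFixture, CONTRAVARIANT, and is_proper_subtype are mypy's):

import unittest

from mypy.nodes import CONTRAVARIANT
from mypy.subtypes import is_proper_subtype
from mypy.test.typefixture import TypeFixture

class ProperSubtypeSuite(unittest.TestCase):
    def setUp(self) -> None:
        # setUp plays the role of the pytest fixture above.
        self.fx_contra = TypeFixture(CONTRAVARIANT)

    def test_is_proper_subtype_contravariance(self) -> None:
        # pytest collects unittest.TestCase subclasses natively.
        self.assertTrue(is_proper_subtype(self.fx_contra.gsab, self.fx_contra.gb))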

#3880 completed the migration of the data-driven tests to pytest.

Great! Thanks for working through the various test modules.

What should be the criteria for closing this issue? The retirement of runtests.py? Why would we still need that script if we can run all tests through pytest?

runtests.py also type checks various things, verifies that modules can be imported, and it runs lint. These would need to be migrated to pytest as well, in addition to the remaining non-data-driven myunit tests.

A reasonable next milestone would be the removal of myunit.

After https://github.com/python/mypy/pull/4369, I think we should replace some of the simpler assert* functions, as pytest works better without them, and using bare assert statements leads to nicer output on errors.

This would entail e.g. replacing assert_false calls with plain assert not ..., while leaving assert_join and other non-straightforward assert functions alone.
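For example, reusing the names from the fixture snippet above:

# Before: the helper hides the expression, so failures report little context.
assert_true(is_proper_subtype(fx_contra.gsab, fx_contra.gb))
assert_false(is_proper_subtype(fx_contra.gb, fx_contra.gsab))

# After: bare asserts let pytest's assertion rewriting show the operands.
assert is_proper_subtype(fx_contra.gsab, fx_contra.gb)
assert not is_proper_subtype(fx_contra.gb, fx_contra.gsab)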

Myunit is finally gone. Thanks @elazarg!

We're still not quite there; the current status is:

$ ./runtests.py -l
0:lint
1:pytest
2:check file runtests.py
3:check file waiter.py
4:check package mypy
5:check stubs
6:check stdlibsamples (3.2)
7:check file test-data/samples/bottles.py
8:check file test-data/samples/class.py
9:check file test-data/samples/cmdline.py
10:check file test-data/samples/crawl.py
11:check file test-data/samples/crawl2.py
12:check file test-data/samples/dict.py
13:check file test-data/samples/fib.py
14:check file test-data/samples/files.py
15:check file test-data/samples/for.py
16:check file test-data/samples/generators.py
17:check file test-data/samples/greet.py
18:check file test-data/samples/guess.py
19:check file test-data/samples/hello.py
20:check file test-data/samples/input.py
21:check file test-data/samples/itertool.py
22:check file test-data/samples/regexp.py

@elazarg As I understand it, you're working on this, right?

This is now finally fixed with https://github.com/python/mypy/pull/5274 🎉
