I have a test which I would like to perform automatically after each test suite/file. In Python unittest I can do something like this:
```python
# base.py file
import unittest

class Base(unittest.TestCase):
    def test_999(self):
        assert check_crash()


# my_test.py file
from base import Base

class TestClass(Base):
    def test_01(self):
        assert do_something()

    def test_02(self):
        assert other_thing()
```
And when writing a new test suite, the only thing to remember is that it needs to inherit Base, and then test_999 is automatically executed.
But is there a way to do the same thing in pytest? So that this:
```python
def test_01():
    assert do_something()

def test_02():
    assert other_thing()
```
Would look like this at execution time, in some automatic way?
```python
def test_01():
    assert do_something()

def test_02():
    assert other_thing()

def test_999():
    assert check_crash()
```
I know that I can work around the problem by writing a plugin and doing this check after all the tests have been executed. But then it would not be a test, it would be some post action that fails, and it would not be clear what failed and why (for example, the xunit output would show that all tests passed).
In pytest that would typically be done in the teardown of a module-scoped fixture, for example.
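Roughly, the shape of it would be something like this (a sketch only, assuming your check_crash() helper and putting the fixture in a conftest.py so it applies to every module automatically):

```python
# conftest.py -- rough sketch; check_crash() is the helper from the example above
import pytest

@pytest.fixture(scope="module", autouse=True)
def crash_check():
    yield  # run all tests in the module first
    # teardown: runs once after the last test of the module
    assert check_crash(), "crash detected after this module's tests"
```

A failure raised here shows up as a teardown error rather than as a separate test, which is the trade-off being discussed here.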
Darn, misclicked, sorry. Those buttons are too close.
Doing it in a fixture or module teardown is also possible. But then it is not a test, and that is (somewhat) problematic. Because it is a test: it tests that there are no crashes in the application, and I would not like to do something that is a test outside of the test scope. I would like it to be reported as a passed or failed test in various places (xunit output, plugins and other internal tools), and if it is no longer a test, it is hard to report its status correctly in those places.
I did not find any other way than writing it directly in the test file:
```python
def test_999():
    assert check_crash()
```
If you do not see another way, then we can close the issue.
@aaltat in my experience, this is usually an issue that doesn't need to be tackled directly, as it comes down to changing the core of how the tests are written.
If you have a catchall test that always applies and checks for something like the system crashing, then the other tests (or at least some of them) likely weren't looking for evidence that suggests the resulting state of the SUT aligns with what you expect.
As a quick example, let's say you want to test creating a user. If you call the function that should create the user, and it returns "success", that return value is just one behavior triggered by the act of calling that function. You can (and should) assert that this was done, but it doesn't actually tell you that the user was created. There's more that needs to be asserted. You could make a follow-up query to the SUT in some way to assert that the user is actually in the system. If the system crashed, that test would fail, and you wouldn't need to check for the system failing directly (although you could parse the result of that query to provide more intelligent error messages in failure reports).
Things get tricky when you want to run multiple asserts, because if you put them all in one test function, and the first one fails, you don't know if the other ones would fail. But, in this example, I'd want to know if the only problem is that the function didn't return "success" but the user was still created, or if it did, but the user wasn't created. Luckily, with pytest, you can use larger scopes like classes and target them with your fixtures instead of individual test functions, and then run as many test methods against a common state as you want. This also gets much easier to do if the arrange and act code is broken out into separate fixtures, leaving the test methods to just make non-state-changing queries to assert against (non-state-changing so that they don't step on each other's toes).
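Roughly, with made-up names just to show the shape (api_client and create_user here are assumptions, not anything from your suite):

```python
import pytest


class TestCreateUser:
    @pytest.fixture(scope="class")
    def create_result(self, api_client):
        # the "act" step: performed once for the whole class
        return api_client.create_user(name="alice")

    def test_reports_success(self, create_result):
        # the return value is one behavior triggered by the act...
        assert create_result == "success"

    def test_user_exists_in_system(self, api_client, create_result):
        # ...and the follow-up query checks the resulting state of the SUT
        assert api_client.get_user("alice") is not None
```

If the first assert fails, the second test still tells you whether the user was actually created.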
I'm making a lot of assumptions here, so just let me know if I'm off base, and we can go into more detail for your situation to find a better solution.
I understand your point: such a test should fail if there is an error in the system, not in some later test. I fully agree with this, and looking at the pytest API, which is really nice, it is not a big thing to move the dump detection into a fixture or a plugin. But in reality this is not always possible. Detecting dumps is actually a pretty heavy operation in the VM environment which I am using. In my environment it would increase test time by, let's say, about 10 seconds per test (which is just an estimation, I did not try it out, but looking at current execution times it is a pretty solid estimation). But increasing test time is not a good way to go for us, because my team runs, on a normal day, 250 000 - 300 000 tests. If each test's lifetime increases by 10 seconds, our TA starts taking 690 - 830 hours longer in one day. And that is a pretty big increase: it would delay feedback on a single change, it would increase the cost of the TA, and it would have other implications too.
Also, dumps are pretty self-contained: a dump tells in which component the crash happened and where in the code it happened, and it is usually pretty easy to find a solution for the dump (although not always). And because there is a trace from the test all the way to the commit which triggered the build, finding the reason for the crash is relatively easy.
So this is the middle ground that I have currently settled on. It is not ideal, but it is the best one available at the moment.
Sorry, I'll clarify. I'm not saying to check the dumps early. I'm saying if there's an impact from something going wrong (like a dump), your tests should already be seeing it without checking for any dumps. I'm saying you can, in a way, do _less_, which would reduce test execution time.
You can also have pytest check for a failure in all your tests right after each one ends, and then, if it failed, have it check for the dump only then to include it in the failure report (or just export it to a file or something, wherever is most convenient). That way, the expensive check only happens after a test fails, and you can use the dump information not as the _cause_ of a test failing, but as extra info to help debug the cause.
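In pytest terms, that could look something like this in a conftest.py (a sketch only; check_crash_dumps() stands in for whatever your expensive dump check is):

```python
# conftest.py -- sketch only; check_crash_dumps() is a hypothetical helper
import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    # only pay for the expensive dump check when the test body ("call" phase) failed
    if report.when == "call" and report.failed:
        dump_info = check_crash_dumps()
        if dump_info:
            # attach the dump details to the failure report as an extra section
            report.sections.append(("crash dumps", dump_info))
```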
I'm assuming that by "dump", you mean a memory dump. Could you elaborate on your process for checking for dumps?
1) Impact in the test
Yes, if the application component which is being tested crashes, it is usually detected by the test that is running against it. But there are about a hundred different components in the application (it is a Windows antivirus client), and different components have different kinds of relations to each other.
For example, the application sends information upstream. All information sending is done by a single sender component, and other components talk to the sender component to send their information. Say I have a test which is testing some other component, and that other component wants to send information upstream. Now that information sending causes the sender component to crash, but the crash is not detected/seen by the test, because the sender component crash does not affect the functionality of the component under test. One can imagine it as a microservice architecture, where a single service crash may not be visible in other services.
The application is pretty resilient, and a single component crash does not crash the whole application. In a more realistic scenario: if a virus is found by a component, information is sent upstream. But the virus might affect some resources in the system and hinder the information sending; we do not want virus removal to be affected by the lack of resources in other components.
Also, there are numerous different kinds of relations between components, and not all of them can easily be verified in a single test. It would also increase test time, because not all components work in real time; some have batch-like processing. Therefore there is a last test, run at the end of a test suite, which detects whether there were crash dumps somewhere in the known locations.
2) Dump processing
It is a Windows memory dump of a single crashed process. The last test checks dump files from known locations and, if there is a dump, it collects log files from the application and the dump file into a zip file. That last test failure is visible on the dashboard, and if the crash is a new one, an analysis build is run. That analysis build prints out the stack trace of the error, and with the log files, figuring out the problem is easier.
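Simplified, the idea of that last test is something like this (the paths and the collect_application_logs() helper here are just placeholders, not the real implementation):

```python
# rough sketch of the idea; paths and collect_application_logs() are placeholders
import glob
import os
import zipfile


def test_999():
    dumps = glob.glob(r"C:\CrashDumps\*.dmp")  # known dump location (placeholder)
    if dumps:
        with zipfile.ZipFile("crash_evidence.zip", "w") as archive:
            for path in dumps + collect_application_logs():
                archive.write(path, arcname=os.path.basename(path))
    assert not dumps, f"crash dumps found: {dumps}"
```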
I'll rephrase part of what you said just to make sure I understand. I'll also give names to things just to keep them straight.
In the example you gave, you describe 3 components. One component is the Sender Component that other components use to send information upstream. The two other components, Component A and Component B, are almost completely unrelated to each other, but both use the sender component to send information upstream.
You then describe 2 tests that are running at the same time as each other: test_a, which is testing Component A, and test_b, which is testing Component B. They might not always be running at the same time as each other, but they are for this example.
The issue is that if test_b happens to cause the Sender Component to crash, neither test_b nor test_a will be impacted, because they are checking that their respective components each sent something to the Sender Component, and _not_ checking that the Sender Component sent their information upstream. So as far as those tests are concerned, as long as they sent their information to the Sender Component, and that information was correct, they should pass.
However, you still want to know if either of those tests caused the Sender Component to crash, so after those two tests are done, you run a third test which checks the Sender Component to see if it crashed, and if it did, you have it print out the stack trace and log files for the Sender Component to the build output so you can easily debug it.
Is that all correct?
Yes, that sums it up pretty clearly. Although I made a mistake in my explanation: there is only a single test running in a single VM/antivirus client. So if we simplify the example to test_b and Component B, it hits the scenario correctly.
There are tests which involve multiple components, like testing Component B and making sure that the Sender Component actually sends the information and it is successfully received by the upstream. But because those tests take longer to run (for example, ten times longer), they don't cover all the things that can be covered by the tests run within a single component.
So if you test component B, and it takes, for example, 1 second from the start to the point that it sends something to the Sender Component, you're saying it could take 10 seconds for that information to actually make it to the upstream component?
Unfortunately these tests are not that fast; the usual single test duration is somewhere from 20 - 90 seconds, and the longer-running tests can take 5 to 7 minutes (for a single test). These tests are not unit tests (there are unit tests, but those are run when the component is built); at this stage the full application is installed in a real operating system. But there are different kinds of APIs built in which allow interacting with a single component, so that it's easier, faster and more reliable to test.
We do run test suites in parallel, but instead of running multiple suites in the same VM, we claim multiple VMs from a cloud service. We also run different operating systems, and the same test suite might be launched multiple times, but on a different operating system.
If you wanted to test an individual component, why not just fire up only that component, and mock any external services? That would be much faster and more reliable.
Isolation and mocking are done at the unit level. But a component may interact with other components or with the operating system. For example, we might configure the firewall, scan a file from the disk, or display a notification for the user who is active.
Ok, I'm gonna back up a bit and go back to basics, because I'm seeing some conflicting stuff, and I can only assume it's a breakdown in communication (likely from me haha). I also think I see a solution for you.
If a component is crashing, there must be some way to observe this without checking to see if there's a dump. If there isn't, then it doesn't matter, because it can't possibly have any impact on the user. I don't think you're disagreeing with that statement. I _think_ what you're saying is that you can't assert _one_ thing and know that every behavior that stemmed from a single action in a single component is working as expected. I think you're also saying that, no matter what, you want something to check for a dump after _all_ the tests have run, so it can raise some sort of red flag.
Is this correct?
I agree that having a discussion over an issue tracker is difficult and many things are lost in translation. And definitely on this end too.
Yes, there are other ways to check, and many of the longer-running tests do it. For example, when messages are sent upstream, those messages should arrive at, let's say, the backend system. It's not difficult to poll whether a message has arrived at the backend, but that takes longer than reading files from a disk. And the example is simplified: a component does something and sends information somewhere. In real life there are more components involved, and checking all of the components is more expensive time-wise than spinning the disks for ten seconds at the end of the test suite.
The test automation only checks the things it knows to check. If component A is changed so that it sends something new to component X (which it has never done before), a check of component X needs to be added somewhere in the test automation. But the best information is received when something unexpected happens. That is why the last test is so good: checking dump files also catches things which were unexpected.
The problem is that in the current system I don't have the resources to spin the disks for ten seconds after every test. At the scale that we have, it's just consuming too much time and resources. I can do the dump file check as a last test in a suite. Even though it's just two lines of code, it can easily be forgotten by people (also by people reviewing the PR), and it would be nice to have it added automatically.
I like having this discussion and it makes me think about this from a different angle.
Ok, so in that case, I have a solution, although it isn't the one you were hoping for. I also have a technique/structure that might be of use to you in general.
Unfortunately, pytest doesn't have any notion of test order. It isn't meant to have a deterministic order of tests, and trying to get it to do so is just gonna bring you headaches and more problems. The order you have now is _technically_ deterministic, in that if you rerun your test suite, they _should_ happen in the same order. But it's not deterministic in the sense that you aren't in control of the order they run in. That's entirely the result of whatever implementation pytest happens to have at that moment (and could even potentially change if you so much as update your version of python, albeit unlikely). So this is definitely not something you should rely on.
So one solution would be to bake the check for the dumps into the build process itself as a step that takes place after the test suite runs. If that step detects a dump was created, it can fail the build.
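As a rough sketch of that idea (the dump directories here are placeholders, not your actual known locations), the build could run a small script after pytest finishes and fail the step on a non-zero exit code:

```python
# check_dumps.py -- hypothetical post-suite build step
import sys
from pathlib import Path

# placeholder locations; the real known dump directories would go here
DUMP_DIRS = [Path(r"C:\CrashDumps"), Path(r"C:\Windows\Temp")]

dumps = [p for d in DUMP_DIRS if d.exists() for p in d.glob("*.dmp")]
if dumps:
    print("Crash dumps found after the test run:")
    for dump in dumps:
        print(f"  {dump}")
    sys.exit(1)  # non-zero exit code fails the build step
print("No crash dumps found.")
```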
Regarding the structure I mentioned, you can assume that your test suite will always take at least as long as the longest test. So if you have a series of tests that all build up the same state, you can just build up the state first in class scoped fixtures, and then run several queries against that state, in a sort of postmortem analysis, each query being one test. As a quick example:
```python
import pytest


class TestThing:
    @pytest.fixture(scope="class")
    def thing(self):
        return make_thing()

    @pytest.fixture(scope="class")
    def sender(self):
        return get_sender()

    @pytest.fixture(scope="class")
    def sender_dumps(self, sender, fixture_a, fixture_b):
        # needs to request fixture_a and fixture_b to ensure it happens last, if it happens at all
        return sender.get_dumps()

    @pytest.fixture(scope="class")
    def thing_that_a_needs_but_does_not_impact_b(self, thing):
        thing.do_thing_for_a()

    @pytest.fixture(scope="class")
    def thing_that_b_needs_but_does_not_impact_a(self, thing):
        thing.do_thing_for_b()

    @pytest.fixture(scope="class", autouse=True)
    def fixture_a(self, thing, thing_that_a_needs_but_does_not_impact_b):
        thing.do_another_thing_for_a()

    @pytest.fixture(scope="class", autouse=True)
    def fixture_b(self, thing, thing_that_b_needs_but_does_not_impact_a):
        thing.do_another_thing_for_b()

    def test_a_talked_to_sender(self, thing, sender):
        assert thing.a_message in sender.outbound_queue

    def test_b_talked_to_sender(self, thing, sender):
        assert thing.b_message in sender.outbound_queue

    @pytest.mark.dump_check
    def test_sender_did_not_crash(self, sender_dumps):
        assert sender_dumps is None

    # more tests
```
The general goal is to just head straight to the final state, and get everything out of the way. Then, once the dust has settled, go back and look at what happened without changing it any further.
In this structure, sender_dumps doesn't run until test_sender_did_not_crash, but all the other fixtures are forced to happen because of how the autouse fixtures are requesting the others. So while you can't guarantee it will happen after all the other tests, you can guarantee that it will happen after the final state has been established (because it requests fixture_a and fixture_b), so it's effectively the same. Also, since sender_dumps is class-scoped, once it's executed, it will never execute for that class again as the value is cached, so it won't run more than necessary in case you have other tests poking around with it.
It's also marked with dump_check, so you can easily filter it out from test runs using -m 'not dump_check' IIRC, and if you filter it out, the fixture that gets the dumps will never run. You can also exclusively run that test (or all tests marked that way) with -m 'dump_check' (IIRC).
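You'd also want to register the marker so pytest knows about it (just mirroring the dump_check name from the example above):

```python
# conftest.py -- register the custom marker used in the example above
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "dump_check: expensive crash-dump verification"
    )
```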
I always thought pytest order is deterministic, although I am the first to admit that I have never looked closely into the matter.
I think I understand your example by reading it, but I need to play with some real (but dummy) code to be sure. At least it revealed to me a new way to use fixtures; I didn't know that fixtures can be used to ensure order.
Exactly. That's what makes pytest's fixture system so incredibly powerful.
Let me know if you hit any snags or have any questions about how they work, or if you hit any problems with how to structure them, and I'll be more than happy to help out.
Closing this as an inactive question - hopefully fixtures are working for you :slightly_smiling_face: