For about a month now (maybe less) our team has been experiencing issues with the --last-failed mode. Since we regularly update our development dependencies, there is a chance that it is due to an issue in pytest. Unfortunately I don't have an example to reproduce the issue.
I will try downgrading to pytest 4.4 and see if this fixes the issue. If I don't forget, I will update this issue with the outcome.
When using "last-failed" mode we expect the following behaviour:
What we observe instead:
Workaround:
pytest once without --last-failed solves the issue (but only once).Package Version Location
----------------------------- -------- ------------------------------
alabaster 0.7.12
apipkg 1.5
asn1crypto 0.24.0
astroid 2.2.5
atomicwrites 1.3.0
attr 0.3.1
attrs 19.1.0
autopep8 1.4.4
Babel 2.6.0
bboss 1.10.1
bcrypt 3.1.6
certifi 2019.3.9
cffi 1.12.3
chardet 3.0.4
colorama 0.4.1
commonmark 0.9.0
config-resolver 4.3.4
coverage 4.5.3
cryptography 2.6.1
distribute 0.7.3
docutils 0.14
entrypoints 0.3
execnet 1.6.0
flake8 3.7.7
flake8-polyfill 1.0.2
future 0.17.1
gouge 1.3.0
idna 2.8
imagesize 1.1.0
isort 4.3.20
jargon 4.14.0
Jinja2 2.10.1
lazy-object-proxy 1.4.1
mando 0.6.4
MarkupSafe 1.1.1
mccabe 0.6.1
more-itertools 7.0.0
msgpack-python 0.5.6
mypy 0.701
mypy-extensions 0.4.1
packaging 19.0
paramiko 2.4.2
pathlib2 2.3.3
pexpect 3.3
pip 19.1.1
pkg-resources 0.0.0
pluggy 0.11.0
puresnmp 1.5.1
py 1.8.0
pyasn1 0.4.5
pycodestyle 2.5.0
pycparser 2.19
pyflakes 2.1.1
Pygments 2.4.0
pylint 2.3.1
PyNaCl 1.3.0
pynet 3.11.0 /home/users/malbert/work/pynet
pyparsing 2.4.0
pytest 4.5.0
pytest-cov 2.7.1
pytest-cover 3.0.0
pytest-coverage 0.0
pytest-forked 1.0.2
pytest-xdist 1.28.0
python-dateutil 2.8.0
pytz 2019.1
PyYAML 3.13
radon 3.0.3
recommonmark 0.5.0
requests 2.22.0
schema 5.2.3
setuptools 41.0.1
six 1.12.0
snowballstemmer 1.2.1
Sphinx 2.0.1
sphinx-rtd-theme 0.4.3
sphinxcontrib-applehelp 1.0.1
sphinxcontrib-devhelp 1.0.1
sphinxcontrib-htmlhelp 1.0.2
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.2
sphinxcontrib-serializinghtml 1.1.3
typed-ast 1.3.5
typing 3.6.6
urllib3 1.25.2
verlib 0.1
wcwidth 0.1.7
wheel 0.33.4
wrapt 1.11.1
Linux BBS-nexus.ipsw.dt.ept.lu 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
Python 3.5.2
Maybe due to 1f5a61e / #5034.
- _only some tests are re-run_. Only a fraction of tests are executed.
More details on those might be interesting if you see it again.
btw: I've found putting the pytest cache dir under Git control useful for seeing how the cache changes over time / after a run. It also lets you stash the last-failed state for later use / git-bisect.
I have downgraded to 4.4 and it seems better now, so maybe it is indeed a regression in 4.5.
I'm still keeping an eye on this, although my test-runs and round-trips are not the fastest in the project I am currently working on, which has around 1600 tests (and I will be off from this afternoon until Monday).
... forgot to ask: You asked for more details on the fraction of tests. Is there anything specific I should look out for?
I can now confirm that it definitely only happens in 4.5.0
Are you seeing some output then, e.g. "run-last-failure: 1 known failures not in selected tests"?
Just noticed this myself, and bisected it to https://github.com/pytest-dev/pytest/commit/08734bdd18ec4b11aeea0cf7e46fcbf4e68ee9ad.
My issue is fixed with https://github.com/pytest-dev/pytest/pull/5333 - would be good if you could try it, and/or provide more information to reproduce your issue.
Thanks @blueyed for the fix.
I'm currently out of office. I will check back on Monday and let you know if the fix worked.
Sorry for the late reply. I only now managed to upgrade; I am now on pytest 4.6.2 and it is still not re-running all tests. I just did a successful test-run by running ./env/bin/pytest without any arguments (that is, without last-failed mode).
After running that command, I ran pytest with last-fail activated and I get this:
================================================= test session starts ==================================================
platform linux -- Python 3.5.2, pytest-4.6.2, py-1.8.0, pluggy-0.12.0
rootdir: /home/users/malbert/work/ipbase/core, inifile: pytest.ini, testpaths: tests
plugins: cov-2.7.1
collected 448 items
run-last-failure: 99 known failures not in selected tests (skipped 87 files)
Note that it says: "skipped 87 files". But because the previous test-run was successful it should have run everything. It also states that there are 99 known failures even though the test-run was successful.
Until now, we blacklisted only 4.5 but unfortunately I think we need to stick with pytest < 4.5 for now.
It is not super critical for us as the failures get picked up by our CI pipelines, but it is still quite annoying to see everything go green locally only to have the pipeline fail 10 minutes later.
Unfortunately it is sometimes hard to spot that not every test was run. Especially if only one or two got skipped.
If you need help debugging this, I could make myself available for a skype/hangout/... session as I am aware that this is hard to reproduce.
It also states that there are 99 known failures even though the test-run was successful.
It says "99 known failures not in selected tests", i.e. there are known failures outside of the collected tests. This might be due to only running selecting specific tests, but also because files have been removed or renamed.
You do not specify how you run pytest, e.g. what args are being used.
To debug this you can inspect .pytest_cache/v/cache/lastfailed.
I suggest copying .pytest_cache for a state where this is failing for you, and then git-bisect pytest to see where that behavior changed.
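A minimal sketch of such an inspection (the script name and output format here are made up, not part of pytest): it loads the lastfailed cache and flags recorded node ids whose files no longer exist on disk.

# check_lastfailed.py - hypothetical helper, not part of pytest
import json
from pathlib import Path

cache_file = Path(".pytest_cache/v/cache/lastfailed")
lastfailed = json.loads(cache_file.read_text()) if cache_file.is_file() else {}

for nodeid in sorted(lastfailed):
    # a node id looks like "tests/test_x.py::test_name[param]";
    # everything before the first "::" is the file path
    test_file = Path(nodeid.split("::", 1)[0])
    status = "ok" if test_file.is_file() else "MISSING"
    print(status.ljust(8), nodeid)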
I am having trouble reproducing this in a controlled manner. It still happens, but keeping a backup of .pytest_cache did not help. Here's what I did:
- made a backup of the failing .pytest_cache using: cp -rv .pytest_cache failing-cache
- upgraded to 4.6.3
- restored the backup with cp -rv failing-cache .pytest_cache
- ran pytest --lf again (intended as the bisect "bad" baseline). But lo and behold, _all_ tests are run.

I am still trying to find a way to reproduce the issue. But seemingly, copying the .pytest_cache folder is not enough.
And yes indeed, I forgot to mention how I run pytest. The command-line is:
./env/bin/pytest --lf
and pytest.ini contains:
[pytest]
norecursedirs = .git env log
testpaths = tests
log_cli_level = CRITICAL
I am currently unable to reproduce the error. It's unfortunate that the .pytest_cache backup did not help, even though I ran into the issue again this morning.
I will... for now... pin pytest to the last known failing version 4.6.2 in all my projects and keep a close eye on it. If I find more I will send an update.
I need to drop this for now as I need to advance on other projects.
@exhuma it is really a bummer that we can't reproduce this. Please do come back when you give this another go.
I'm aware that this is another comment which won't help much identifying it, but - for what it's worth - I just started adding unit-tests to a new project, so everything is brand-new: New pytest install (on 4.6.3) and brand-new unit-tests. Even on a completely different machine (private laptop as compared to shared office server).
I ran into the same issue about 20min into development. At first everything seemed fine; I was not even thinking about this issue, and suddenly it happened again. I still don't know what triggers it. But from one moment to the next it decided to ignore most tests again.
I'm merely posting this update because this is a _completely different_ setup from the one where I was observing the behaviour earlier. So it's not isolated to our office box. I was still not sure whether it might be something local to that environment; I can safely discard that now. It is definitely something affecting pytest itself.
I will go ahead and create a dummy project now and keep track of my steps... Maybe I will find something
... for the life of me I can't reproduce it in a controlled manner. Whenever I try to, it works as it should. With the dummy project I created just now it works again. But on the same machine I have another window open with a project where it doesn't :frowning_face:
Okay.... I may have something which demonstrates the error. Unfortunately it does not show the steps of how to _get_ there. I took the project I am working on and zipped it. Unzipping it, cleaning it out and running the tests with --lf demonstrates the error.
As I can't make uploads directly on GitHub do you have a preference or an idea how I can share it?
Great.
You could start by investigating if e.g. there are files in the cache that do not exist anymore etc. - basically all the ideas I gave already.
Then also the code for this is rather compact, and you should be able to print-debug it.
Also git-bisecting it would still be useful.
Otherwise you could push the project as a temporary git project somewhere, but I'd suggest starting to debug it yourself already.
This is getting me somewhere. Bisecting gave me the commit 08734bdd1 as the culprit.
The commit contains a check to see if a file should be skipped.
The check does not make sense to me. I read it as "If a file has not failed in the last run, skip it". This may explain why I was unable to reproduce it. I was always using "dummy" projects with just one test-file. This seems to happen if you have multiple files.
@exhuma
Cool. Note that https://github.com/pytest-dev/pytest/pull/5333 was already a fix in this area, so yours might look similar.
The idea here is to skip whole files when they are not in the list of known failures, i.e. would be skipped then anyway.
Note also that the code looks a bit different already: https://github.com/pytest-dev/pytest/blob/b713460cc75a7a54e15f47c63ecb3f93caa9d1d8/src/_pytest/cacheprovider.py#L169-L180
The idea here is to skip whole files when they are not in the list of known failures, i.e. would be skipped then anyway.
Exactly. This makes for a very snappy collection. I wonder if there's a bug in last_failed_paths, especially in the nodeid splitting logic.
Also x.exists() should really be x.is_file() for consistency.
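For readers without the source open, the gist of that optimization is roughly the following (a paraphrased sketch, not the actual pytest code; names and signatures are simplified):

# paraphrased sketch of the --lf file-skipping shortcut (not the real pytest source)
from pathlib import Path

def last_failed_paths(rootdir, lastfailed):
    # a nodeid looks like "tests/test_x.py::test_name[param]";
    # the part before the first "::" is the file that contained the failure
    return {Path(rootdir, nodeid.split("::", 1)[0]) for nodeid in lastfailed}

def should_skip_file(path, rootdir, lastfailed):
    # skip collecting a test file entirely if it held no failures last time
    failed = last_failed_paths(rootdir, lastfailed)
    return bool(failed) and Path(path) not in failed

Which matches the reading above: any file without a recorded failure is skipped during collection.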
I still did not have the time to dig into this and it currently looks unlikely that I will get time soon. I will try though...
For the time being I have pinned pytest to < 4.5 in my projects, as I really like using the --last-failed mode and this bug continues to make me miss errors in my applications. Today I worked on an application which I have not touched in a while; getting a fresh clone and setting up a new virtual env pulled in the latest release of pytest without me noticing it. I spun up an entr process with pytest --lf and started coding. All seemed well at first, except that some files got ignored again, and I noticed it only after finalising my feature branch and pushing it to the CI server.
I've had to pin that project to pytest <4.5 as well now.
It's strange that our team seems to be the only one running into this bug... So I wonder if it has to do with our environment/workflow. We all run on Linux (Ubuntu 16.04.1) and run the tests using entr like this:
find . -name "*.py" | entr -c sh -c "./env/bin/pytest --lf"
Although it does seem to be unrelated to entr: I have the same issue on my personal laptop, which runs Arch Linux.
Are you creating / renaming files often, or something else?
I am using --lf only on demand / not very often.
Are you creating / renaming files often, or something else?
I am not renaming files often (if ever). Nothing else is out of the ordinary.
A colleague of mine has the impression that deleting the cache helped him. I've tried the same and it worked for one execution and then failed again.
I am using --lf only on demand / not very often.
The "--lf" flag was a game-changer for me. In combination with entr this makes for a very efficient workflow. entr will automatically launch pytest as soon as a file changes. And with --lf this gives very quick feedback on code changes. I could use --sw instead, but I find --lf much more informative.
I've tried the same and it worked for one execution an then failed again.
Please cp -a it before and after then (or use the git-based approach I've mentioned already).
This might give you a clue then.
The "--lf" flag was a game-changer for me
I agree that it can be useful..
btw: you might want to try testmon also: it selects only affected tests via coverage.. :) (but also has subtle issues still likely (but I heard you like those ;)), I am using a local mix of PRs that are still open)
So together with entr it will run only those tests that are either changed directly, or that touch code that was changed (works better with good coverage, of course).
@exhuma we don't actually need access to your full suite, but you can provide the following to help us diagnose this issue:
- Run pytest --collect-only: save the output somewhere.
- Run pytest a few times, saving the .pytest_cache directory each time.

When you get the strange behavior, please send us all the above so we can reproduce and hopefully understand and fix this issue.
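One way the "save .pytest_cache each time" step could be automated (a sketch only; the script name and snapshot directory are made up):

# run_and_snapshot.py - hypothetical wrapper, not part of pytest
import os
import shutil
import subprocess
import sys
import time

returncode = subprocess.call(["pytest"] + sys.argv[1:])
if os.path.isdir(".pytest_cache"):
    # keep a copy of the cache exactly as it was right after this run
    stamp = time.strftime("%Y%m%d-%H%M%S")
    shutil.copytree(".pytest_cache", os.path.join("cache-snapshots", stamp))
sys.exit(returncode)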
I'm back actively developing on a bigger project and this still keeps happening. I've tried the approach noted above, but every time I try to reproduce it, I am unable to do so. I still have not reliably found the trigger. The only difference from yesterday to today is that I was switching branches between test-runs, which could mean that some files appeared/disappeared. Apart from that nothing is really special about my workflow. I simply run pytest --lf after each source-code change.
I am trying my best to find it. I spent all morning on it yesterday without being able to reproduce it. Today I had the same issue again, and it caused me to accidentally push something broken into revision control again. I noticed my error from the failing CI pipeline.
I wish I could give more productive feedback, but I have to find a compromise between hunting for this pytest bug and actually investing time in the project where I should be investing it.
I'm also doing regular pytest upgrades just in case it gets accidentally fixed.
Frankly though, I would not mind slower test-collection if that means that the tests run reliably.
I will keep an eye on this though. I really want to help you guys find it.
I'm going nuts... I've just run into the same issue in the 4th project. All projects are fairly different.
I've taken some time to isolate the issue with a small sample project. But for the life of me I can't reliably reproduce it. This issue tries to hide from me... every time I look closely it disappears.
Note that this is whenever I try the proposed solutions with bisecting and using the "collect-only" command. I'm still trying... but this is getting seriously frustrating...
One thing that comes to mind is that it always happens in repositories with existing environments. And every time I want to reproduce it with a minimal example I create a new environment. But I don't think it's related.
I will create a new env in the project I just experienced the issue with and will see how it goes...
Thanks for pursuing this @exhuma, we appreciate it!
It happened again with the new environment. So that wasn't it either. I had a feeling it went away.
A wild guess: I've made a couple of git stash and git checkout operations. It is possible that I've made one such operation while the tests were running. Maybe that has something to do with it? This is a very wild and desperate guess though...
I'll continue to keep an eye on this.
Nope. That wasn't it either. Just happened again without using any git operation. Just a normal "edit → save → run-tests" loop.
@exhuma
Keep in mind that it uses the previous state here, i.e. it would only be normal if you do "run-tests → edit → save → run-tests".
And it also depends on what "run-tests" does of course.
I recommend keeping an eye on the status message that gets reported ("run-last-failure: …") also.
Ok. Thanks for the pointer. For clarification, I am using an inotify based file watcher (entr) to automatically re-run pytest. It does nothing more than run pytest with --lf. My exact command-line is:
git ls-files | entr -c sh -c "./env/bin/pytest --lf"
The usage of entr does not seem to have an impact as I also run into the issue when running the tests manually.
So, when I start working and run this command, pytest is executed at least once with --lf. Then every time a file changes on disk which is also tracked by git, pytest is re-run with --lf.
Is it possible that the initial run with --lf causes the problem if there are failures in the tests?... let me check...
That wasn't it either :(
You are not switching py2/py3 by chance? (https://github.com/pytest-dev/pytest/issues/5702)
@blueyed No. All is running on Python 3
And I just ran into it again. I took some screenshots and annotated them, showing the issue and how it manifests itself. In _this_ instance, I wanted to implement a bugfix in an internal library which I have not touched in a few days, so the first pytest execution is a "fresh" run if you will. The last time pytest was run in that folder is probably about a week ago. No changes have been made in the venv; I just cd'd into the folder and ran pytest. The numbers in the screenshot are the different steps I took:
- The first pytest execution (there was already a .pytest_cache before this run).
- My usual reflex would have been to delete .pytest_cache, but for once I held back on the muscle-memory and made a backup. Rerunning the tests after cleanup made all tests run again.

Having a look at both .pytest_cache folders shows that the one with the unexpected behaviour has a file called v/cache/lastfailed while the other does not have one. The content of that file is:
{
"tests/qprim/test_mii.py::test_floordic[left0-right0-False]": true,
"tests/qprim/test_mii.py::test_floordic[left1-right1-True]": true,
"tests/qprim/test_mii.py::test_floordic[left12-right12--1]": true,
"tests/qprim/test_mii.py::test_floordic[left2-right2-False]": true,
"tests/qprim/test_mii.py::test_floordic[left3-right3-False]": true,
"tests/qprim/test_mii.py::test_floordic[left4-right4-True]": true,
"tests/qprim/test_mii.py::test_floordic[left5-right5-False]": true,
"tests/qprim/test_mii.py::test_floordic[left6-right6-True]": true
}
When removing that file, all tests are run as expected. If I recreate the file with only one row in it, it will again only run that one file (repeatedly).
A wild guess is that the file is not removed after a successful test-run. I don't feel confident enough to say that this failure to remove the file happens every time. I have the feeling that it only happens in some situations, which makes it hard to reproduce the error.
@exhuma
v/cache/lastfailed
That's where the last failures are stored.
Do the tests therein still exist?
What does pytest --collect-only tests/qprim/test_mii.py look like (only "test_floordic" is relevant)?
The message there then also says "8 known failures not in selected tests (skipped 176 files)" (not happy to have to type that from a screenshot, btw).
I think "selected tests" here is triggered since you have "testpaths = tests" in the config. But it could be smart about that the failed ones are therein.
I think https://github.com/pytest-dev/pytest/pull/5625 (https://github.com/pytest-dev/pytest/pull/5625/files) might help here already - probably not really, but the code is related.
I think "selected tests" here is triggered since you have "testpaths = tests" in the config.
I thought I'd try nuking the pytest.ini to make sure that it's not interfering with "last-failed" mode. Until now I was under the impression that this helped, and I was about to post a message saying that this was the cause of the issue. Alas, the issue just manifested itself again.
I've actually become used to just removing the .pytest_cache folder prior to each test-run now. I wish I could spend more time on debugging this. I promise that I will keep an eye on it and try to find out as much as I can. But I'm having trouble with this, as it keeps running successfully whenever I try to debug it. I feel helpless debugging this. I'm really sorry that I still wasn't able to produce anything helpful :(
I wish I were able to reliably reproduce this.
I have not updated this as I have already become accustomed to removing the pytest_cache folder regularly. I'm aware that other people in my team have the same issue. I've prompted them to participate in this thread in order to share experiences and hopefully find a way to reproduce it. But unfortunately this is really at the bottom of our priority list as the workaround is fairly simple. It is merely annoying to push something to CI only to realise that pytest has again skipped some tests on the dev-box, and to then run into a failure on CI.
Is there no possibility to just add a flag to pytest to disable this optimisation? I'd prefer having a slower startup time than missing tests altogether.
@exhuma
It does not really help us if you repeat that it's annoying - I think we got that.. ;)
I still recommend using e.g. version control to see what is going on (you can use https://github.com/blueyed/dotfiles/blob/master/usr/bin/p-cache-git). This was also what showed me e.g. https://github.com/pytest-dev/pytest/issues/5206.
As for disabling the optimisation: not that I know of - but you could (monkey-)patch it out, I guess; after all you would need a config setting for it anyway, so you could probably also do something in a pytest plugin. But then you also need to know what's really causing it in the first place, I guess.
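To make the plugin idea concrete, a rough sketch of what such a local workaround could look like (an untested assumption, not an official pytest option; it relies on the "cache/lastfailed" key used by the cacheprovider plugin, and hook ordering may still interfere):

# conftest.py - hypothetical workaround sketch, not an official pytest feature
import pytest

@pytest.hookimpl(trylast=True)
def pytest_sessionfinish(session, exitstatus):
    # after a fully green run, drop the recorded failures so that the next
    # --lf run collects the whole suite again instead of skipping files
    cache = getattr(session.config, "cache", None)
    if exitstatus == 0 and cache is not None:
        cache.set("cache/lastfailed", {})

This trades the collection speed-up for predictable --lf behaviour.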
Have you checked if the git-bisect result (https://github.com/pytest-dev/pytest/issues/5301#issuecomment-504299026) still applies?
Maybe https://github.com/pytest-dev/pytest/pull/5625 helps, which takes given args better into account.
I still recommend using e.g. version control to see what is going on
You could have e.g. a p script that wraps git to commit before and after each run. If it then happens again you have information to debug this. I recommend only using "last-failed" from the cache then (but not e.g. "nodeids" at least, which is buggy (#6004)).
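A minimal sketch of such a wrapper (the file name and commit messages are made up; it assumes .pytest_cache/ has been turned into its own git repository beforehand, e.g. via git init .pytest_cache):

# p.py - hypothetical wrapper recording the cache in git around each run
import subprocess
import sys

def commit_cache(message):
    subprocess.call(["git", "-C", ".pytest_cache", "add", "-A"])
    subprocess.call(["git", "-C", ".pytest_cache", "commit", "-m", message, "--allow-empty"])

commit_cache("before pytest run")
returncode = subprocess.call(["pytest"] + sys.argv[1:])
commit_cache("after pytest run (exit %d)" % returncode)
sys.exit(returncode)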
i'm hitting this bug as well. i'm able to reproduce it by deleting or renaming&fixing the failing test. if i have two files, test_a.py and test_b.py, with a passing and a failing test in test_a and a single failing test in test_b, then if i delete the failing test from test_a.py, pytest --lf will never run the tests in test_b.py. i also see this behavior if i fix and rename the test before running pytest --lf again.
i'm not sure if this is exactly the same situation i'm hitting in production but i hope this info helps a bit.
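To make the description above concrete, a minimal sketch of the layout (test names and asserts are made up; only the pass/fail pattern matters):

# test_a.py
def test_passes():
    assert True

def test_fails():
    assert False  # delete this test before the second "pytest --lf" run


# test_b.py
def test_also_fails():
    assert False  # --lf should re-run this, but after the deletion it never does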
ps - for anyone else who googles this issue, my hacky workaround is to remove the lastfailed cache file when pytest exits with success (since i'm calling pytest from a wrapper script this is easy for me to do):
pytest --lf ... && rm -f .pytest_cache/v/cache/lastfailed
(using pytest 5.3.2 and python 3.7.5 in a fresh venv)
Thanks @segv for the reproducible problem, we appreciate it!
What happens is:
First run:
$ pytest
collected 3 items
test_a.py::test_1 PASS
test_a.py::test_2 FAIL
test_b.py::test_3 PASS
So --lf records test_a.py::test_2 as failed.
Now we comment out the function test_2:
$ pytest --lf
collected 1 item
run-last-failure: 1 known failures not in selected tests (skipped 1 file)
test_a.py::test_1 PASS
The second run should now run all tests, because it could not detect any failures this time around.
But after some debugging, now things are clearer: the optimization that prevents us from collecting files in case they didn't contain any failing tests from the previous run effectively breaks the case 're-running all tests due to no previous failures', because the files with all tests passing are not even collected.
That's what's happening here: because no test from test_b.py failed, it is not collected, but because on the 2nd run there's no failing tests anymore, it should run all tests, but it only runs tests from test_a.py because test_b.py was never collected.
Commenting out pytest_ignore_collect (responsible for the optimization) now produces the correct behavior:
collected 2 items
run-last-failure: 1 known failures not in selected tests
test_a.py::test_1 PASS
test_b.py::test_3 PASS
But even then it seems some adjustments to the run-last-failure message are needed.
If my analysis is correct, I don't think there's any way to implement the existing optimization consistently, so I propose we remove it. Better to have a slower collection than wrong/unexpected behavior.
What do you think @blueyed?
What do you think @blueyed?
This could be improved/fixed by storing the mtime of files, i.e. if a file is newer than what we know about its failures it could be ignored for skipping others. (Note that this is something good to have in general - in https://github.com/blueyed/pytest/pull/102 I've added this via a generic "cache/fspaths" cache key, where it is then used for a faster -k that can skip collection)
Another approach/idea might be to only ignore files once at least one failure has been found. For this it might be best to sort modules/files with failures to the beginning (via pytest_make_collect_report). I've tried to use pytest_make_collect_report instead of pytest_ignore_collect, but the problem there is that it cannot be used both as a hookwrapper and to return an empty collect report for skipped files (it can still override the result, but collection is not skipped then, i.e. there is no performance improvement anymore). For this, two separate plugins could be used (a hookwrapper to sort failures first, and a firstresult one to skip files).
Somehow related also: https://github.com/pytest-dev/pytest/pull/5625
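Roughly, the mtime idea could look like this (pure illustration of the heuristic, not a patch; where the recorded mtimes would be cached is left open):

# sketch of the mtime heuristic: only trust recorded failures for files that
# have not changed since the failures were stored
import os

def trustworthy_failures(lastfailed, recorded_mtimes):
    # recorded_mtimes maps "tests/test_x.py" -> mtime at the time the
    # failures were written to the cache (hypothetical cache entry)
    result = {}
    for nodeid in lastfailed:
        path = nodeid.split("::", 1)[0]
        recorded = recorded_mtimes.get(path)
        if recorded is None or not os.path.exists(path):
            continue  # unknown or vanished file: do not use it to skip others
        if os.path.getmtime(path) <= recorded:
            result[nodeid] = lastfailed[nodeid]
    return result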
You might want to try https://github.com/pytest-dev/pytest/pull/6448.
(Sorry for the delay)
This could be improved/fixed by storing the mtime of files, i.e. if a file is newer than what we know about its failures it could be ignored for skipping others.
Not sure I follow, can you elaborate? How would it help with the case in https://github.com/pytest-dev/pytest/issues/5301#issuecomment-572574330, given that the modified file is the one with the failure?
@nicoddemus I cannot really recall after 2 weeks. What I likely meant is keeping track of the mtime of files and the execution times of tests, so that heuristics could be applied to detect "this test is not in there anymore" etc., but it is always a question of "have we collected everything from the file?" (i.e. no -k etc.).
Therefore #6448 is better in this regard, because it hooks into the collection process already.