Pytest: Pytest interferes with git in an unexpected way

Created on 5 Mar 2019 · 16Comments · Source: pytest-dev/pytest

Since https://github.com/pytest-dev/pytest/pull/3982, pytest automatically creates a .gitignore file in the .pytest_cache directory that it creates. The purpose of this is to relieve git users of having to manually ignore the .pytest_cache directory.

While the above seems to be a nice idea, it can interfere with established workflows in surprising ways.
Here is one example: when a Debian package is prepared using git, the package may be built by issuing the command gbp buildpackage. This command will fail if the repository contains files that are not present in the upstream tarball (outside of the debian subdirectory). The usual way to clean a repository using git clean -id will not work with pytest because of the ignored .pytest_cache directory.

Personally, I wasted half an hour of time this morning before I understood the problem. It was feeling surreal, because gbp kept telling me that there were polluting files, while git status was telling me that there were none, and I was sure that I had no .gitignore file.

There must exist more ways in which pytest's silent creation of a .gitignore file can create subtle problems that are difficult to resolve.

I think that pytest should be agnostic about version control systems, just like other comparable software is. Observe that the Python interpreter does not create .gitignore files in tje __pycache__ directories it creates, nor does setuptools for the .eggs directory, nor distutils or Sphinx for their build directories. GCC does not add .o files to gitignore either!

cache enhancement

Source

grothesque

All 16 comments

Why not git clean -fxfd instead?

asottile on 5 Mar 2019

That would indeed solve my immediate problem, but not the general problem that pytest should not interfere with git because, put bluntly, it is none of its business.

Or would you argue that all programming tools that happen to create directories automatically should deposit .gitignore files in them?

grothesque on 5 Mar 2019

No I wouldn't, but I would argue that pytest can put whatever it pleases inside its own directories. If it were modifying a user's .gitignore then fine that's a bad behaviour, but it's not.

I also think this is fine, especially given how convenient it is for the vast majority of users who are using pytest inside a git repository

asottile on 5 Mar 2019

👍1

I think it is a good argument that other tools also do not create .gitignore files, e.g. Python.
OTHO pytest only has a single dir here, while Python would have many.
Related issue: #3286

I am also more a fan of having a global gitignore (~/.config/git/ignore).

blueyed on 5 Mar 2019

from my pov its an absolutely acceptable tradeoff to prevent a lot of developer pain by inflicting a extra step on package maintainers

RonnyPfannschmidt on 5 Mar 2019

@RonnyPfannschmidt
But where's your argument about not supporting $OTHERVCS in the same way?

While I agree that it is helpful from pytest in some way, #3286 appears to have been created only because .cache was renamed to .pytest-cache, and users had ignored the previous but now have seen the new one.

pytest is messing with VCS-controlled dirs in the first place already by adding .pytest-cache - the real fix would be https://github.com/pytest-dev/pytest/issues/1089 - use a cache outside of the working tree.

blueyed on 5 Mar 2019

@blueyed looks like Ronny is open to other VCS additions as well, but nobody has provided patches for them: https://github.com/pytest-dev/pytest/issues/3286#issuecomment-421383395

asottile on 5 Mar 2019

@blueyed looks like Ronny is open to other VCS additions as well, but nobody has provided patches for them: #3286 (comment)

Thanks for the info - missed that.

Still I think we should rather use XDG and have caches outside of user's worktrees.
But even untill then, it is better/cleaner to just have user's configure ignores - after all this also affects e.g. rg, where you might get similar surprising effects like with a maintainer's build system.

blueyed on 5 Mar 2019

@blueyed the ignore got added because users kept getting this wrong

an xdg based cache would be really nice to have - but managing the lifetimes of those cache folders would need some work on use-cases and tooling

RonnyPfannschmidt on 6 Mar 2019

surprising effects like with a maintainer's build system.

I also use gbp-buildpackage and pytest and don't have problems, I think OP just confused their git clean commands, notably missing -x and this is an overreaction to a misunderstanding.

-x

Don’t use the standard ignore rules read from .gitignore (per directory) and $GIT_DIR/info/exclude, but do still use the ignore rules given with -e options. This allows removing all untracked files, including build products. This can be used (possibly in conjunction with git reset) to create a pristine working directory to test a clean build.

(emphasis mine)

Here's a trivial example where git clean -id misses things without involving pytest at all:

rm -rf t
git init t
cd t
mkdir a
touch a/b.py
echo __pycache__ > .gitignore
git add a .gitignore
git status
python3 -c 'import a.b'
git status
ls a/
git clean -id
git clean -idx

$ rm -rf t
$ git init t
Initialized empty Git repository in /tmp/t/.git/
$ cd t
$ mkdir a
$ touch a/b.py
$ echo __pycache__ > .gitignore
$ git add a .gitignore
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   .gitignore
    new file:   a/b.py

$ python3 -c 'import a.b'
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   .gitignore
    new file:   a/b.py

$ ls a/
b.py  __pycache__
$ git clean -id
$ git clean -idx
Would remove the following item:
  a/__pycache__/
*** Commands ***
    1: clean                2: filter by pattern    3: select by numbers
    4: ask each             5: quit                 6: help
What now> 1
Removing a/__pycache__/

asottile on 6 Mar 2019

@asottile, your example demonstrates the following:
(1) The user explicitly tells git to ignore __pycache__ directories.
(2) CPython creates a __pycache__ directory.
(3) git clean -id does not clean it, because it only cleans things that are not ignored.

This is exactly how things are supposed to work, but it entirely misses the problem with pytest.

The gitignore mechanism allows the users of a git repository to assign untracked files to two classes: "ignored" and "cruft". This distinction is useful: one repository owner might decide to ignore pyc and treat everything else as cruft. Git then provides ways to inspect and delete the cruft (and optionally also the ignored files). It is very common that the user will want to delete only what he or she considers as cruft and leave the ignored files in place.

The way that git works is that all untracked files are classified as cruft by default. This is a good choice and it is how everybody has learned to expect git to work. But this assumption is broken if some overbearing software decides by its own to put the temporary files it creates in the privileged "ignored" class. Now git status no longer shows any new untracked files that have appeared in the repository (and gives the user the chance to decide to ignore some of them).

Should the example of pytest catch on, it would render git status useless for the detection of new untracked files. One would have to always run git status --ignored and manually filter out the files that one actually wants to be ignored.

Pytest is a Python testing framework that should be fully orthogonal to version control systems. Not interfering with them is a question of good design! If you are indeed confident that pytest's behavior should be emulated, then please consider opening issues in Jupyter (to gitignore .ipynb_checkpoints directories), in CPython (to gitignore __pycache__ directories), and in the many other software packages that create temporary or cache directories. I bet that you will not be successful, but then why should pytest's behavior deviate from the consensus?

grothesque on 7 Mar 2019

👍1

we would be more than happy to have a better way (like xdg)

but lets be realistic here - the added .gitignore protects beginner uses from a very common mistake, that's why its there

its a practical solution to a practical problem and has a interference component
i'm more than happy to replace the underlying system, but i'm strictly opposed to going back to throwing sticks between the legs of beginners

we didn't add that .gitignore because ¨we" where facing issues, we where adding that .gitignore because users just getting started where facing that issue and it was way easier to do at the time than designing a reasonably complete out of band cache location and its lifecycle management

RonnyPfannschmidt on 7 Mar 2019

👍1

I'm not sure whether a central cache would be a good idea, for the same reason that git (and similar software) does not use a central database. The values stored in a pytest cache are tied to a specific project that lives in a specific directory. If that directory is moved, then the cache should move along with it. That would be difficult to realize with a central cache. In addition, one would face the problem of deleting caches that are no longer in use.

That's why I believe that keeping the cache in a subdirectory is just the right solution. It is common for software to produce cache files and dump them into the current subidrectory. (To name an example from a different domain, just think of the various files that running LaTeX produces.)

In my opinion, the best behavior is the old one, which is also the one that is used by everybody else. It does indeed have the problem that people may mistakenly add the subdirectory to git (random examples: https://github.com/pytest-dev/pytest-xdist/issues/161, https://github.com/jupyter/jupyter_core/issues/120, https://github.com/scienceetonnante/GameOfLife/tree/master/__pycache__). I think that the right way to mitigate this problem is to clearly document the existence of the cache directory and recommend VC users to ignore it.

grothesque on 7 Mar 2019

👍1

Just a quick related thought: I would actually like the cache to be in the temporary directory ("/tmp/pytest-of-user/cache/$PATH_TO_REAL_PYTEST_ROOT"). This was it would be on a tmpfs for me (RAM), and would automatically get cleaned. All data in there is supposed to be volatile AFAIK (e.g. last failures). (I am using a shell wrapper to accomplish this for .tox already, by creating the tox dir in /tmp, and symlink it via .tox)

blueyed on 8 Mar 2019

Sorry for coming late to the thread, I agree with the reasons why we added .gitignore to .pytest_cache, although I understand the merits brought by @grothesque.

I also think we should move to a central cache location: the cache contents are supposed to be volatile anyway, so a user moving a repository probably doesn't care too much to move also the .pytest_cache directory. It is often "ignored" anyway, as is virtual envs and such.

The discussion about the central cache was brought up here: https://github.com/pytest-dev/pytest/issues/1089, and one idea is for it to be configurable anyway in pytest.ini, including using a local directory as it is now.

nicoddemus on 8 Mar 2019

I agree that if the pytest cache data is volatile my argumentation in
favor of a local subdirectory is invalid, and a central (and possibly
auto-erased) location is preferable.

grothesque on 8 Mar 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Session fixture keeps frames of other fixtures alive

nicoddemus · 3Comments

[Question] : Is there a way to find all markers associated with a particular test function?

thakkardharmik · 3Comments

xdist race condition on node restart

RonnyPfannschmidt · 3Comments

`sys.stdout.reconfigure` raises AttributeError: '_io.FileIO' object has no attribute 'reconfigure'

jtmoon79 · 3Comments

stdout capture (capsys) does not seem to work with real-time cli logging

Sup3rGeo · 3Comments