Pip: Adding tests to ensure pip install is reproducible

Created on 1 Mar 2020 · 14 Comments · Source: pypa/pip

Environment

Nixpkgs at 9a4d723e436d8c1b94782dbff6d2d8dd4bf5dce5

pip version: 20.0.2
Python version: CPython 3.7
OS: Any Linux OS using Nix builds

Description
With pip 19.3.1, building Python packages was reproducible; however, since we upgraded to 20.0.2 that is no longer the case (https://github.com/NixOS/nixpkgs/issues/81441). The issue is that the bytecode now refers to an unpacked-wheel folder (see e.g. https://r13y.com/diff/b03e875784ba16e60ec59d1d5e720b13fc7ae1311edf409b819b0b264acd5393-aba2cac73c5234af8fbf694100bc631800ca397ad7afeed335d38895322a7c71.html).

The function that creates this unpacked-wheel folder was added in https://github.com/pypa/pip/pull/7483; I have not bisected this any further, though. cc @chrahunt @raboof

Expected behavior
Reproducible build.

How to Reproduce

tests awaiting PR

Source

FRidh

Most helpful comment

It's not us, but the standard library.

Ah, OK. In which case, I'd say that pip's test should simply be that we pass None to the stdlib function (and hence that we follow stdlib behaviour). We shouldn't test that the stdlib does what it says it will do.

If, however, we want to document for pip what guarantees we give around reproducibility, then yes we should have a test that ensures we deliver that behaviour. But I'd say document first, then test - we don't need any more tests that check undocumented behaviour 🙁

I don't have a stake in this

Nor do I, beyond not wanting to be expected to support a behaviour that I don't understand and personally have no need for. So I'll say no more for now. As you say, someone who has an interest in this can define what they want and push the discussion on that basis.

pfmoore on 5 Jul 2020

All 14 comments

This makes sense if during the build you were specifying --build-dir. This should be resolved by #6030, since we will byte-compile the .py files in-place.

chrahunt on 2 Mar 2020

We build wheels using

python setup.py bdist_wheel

and install them using

python -m pip install ./*.whl --no-index --prefix="$out" --no-cache $pipInstallFlags --build tmpbuild

FRidh on 2 Mar 2020

Thanks. --build would have forced pip to use the same directory for the initial wheel extraction, assuming that directory was always the same during installation. Then since we do the byte-compilation on the extraction directory (instead of the install directory), it would have had the same embedded directory path each time.

A straightforward way to fix this without requiring too much change is to recurse over the unpacked wheel directory explicitly and use py_compile.compile with a dfile parameter that matches the expected install directory.
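A minimal sketch of that suggestion, assuming the unpacked wheel is a plain directory tree and that install_dir is where its files will finally live. The helpers compile_unpacked_wheel and embedded_filename, and the /opt/site-packages path, are hypothetical illustrations, not pip's code. The dfile argument of py_compile.compile only changes the filename recorded in the bytecode (co_filename); the .pyc itself is still written under __pycache__ next to the extracted source. embedded_filename assumes the 16-byte .pyc header used since Python 3.7.

```python
import marshal
import py_compile
import tempfile
from pathlib import Path


def compile_unpacked_wheel(unpacked_dir, install_dir):
    """Byte-compile every .py file under unpacked_dir so that the path
    recorded in each .pyc points at install_dir instead of the temporary
    extraction directory. Returns the list of written .pyc paths."""
    unpacked_dir = Path(unpacked_dir)
    install_dir = Path(install_dir)
    written = []
    for source in sorted(unpacked_dir.rglob("*.py")):
        # Where this file will live once installed (hypothetical mapping).
        target = install_dir / source.relative_to(unpacked_dir)
        # dfile is stored as co_filename; cfile defaults to __pycache__/.
        pyc = py_compile.compile(str(source), dfile=str(target), doraise=True)
        written.append(pyc)
    return written


def embedded_filename(pyc_path):
    """Return the co_filename stored in a .pyc (header is 16 bytes)."""
    data = Path(pyc_path).read_bytes()
    return marshal.loads(data[16:]).co_filename


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp, "pkg")
        pkg.mkdir()
        (pkg / "mod.py").write_bytes(b"X = 1\n")
        [pyc] = compile_unpacked_wheel(tmp, "/opt/site-packages")
        print(embedded_filename(pyc))  # /opt/site-packages/pkg/mod.py on POSIX
```

Because the recorded path no longer depends on the random temporary directory, the same wheel produces the same .pyc contents regardless of where pip happened to unpack it.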

chrahunt on 3 Mar 2020

This should be fixed by #8541; however, we will need tests before calling this done. Tests would look like:

create a wheel in-memory that contains some .py files (with tests.lib.wheel.make_wheel)
pip install that wheel with SOURCE_DATE_EPOCH set
hash all the files
uninstall that package, verify all files are gone
set tmpdir that pip would use to a different directory
pip install that wheel again
hash all the files and compare against previous digests

and would probably go in tests/functional/test_install_wheel.py (which can be referenced for some of the other steps).
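A rough, self-contained approximation of those steps, assuming the property under test is the byte-compilation output rather than pip's whole install path. Instead of calling pip and tests.lib.wheel.make_wheel, it writes the same hypothetical sources into two different scratch directories (standing in for pip's temporary unpack directory), byte-compiles them with dfile pointing at one fixed install location and SOURCE_DATE_EPOCH set, then hashes every file and compares the digests. The names SOURCES, source_date_epoch, digest_tree, install_once, and check_reproducible, and the /opt/site-packages path, are illustrative only; a real test in tests/functional/test_install_wheel.py would drive pip itself.

```python
import hashlib
import os
import py_compile
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Hypothetical package contents standing in for the .py files in the wheel.
SOURCES = {
    "demo_pkg/__init__.py": b"from .core import greet\n",
    "demo_pkg/core.py": b"def greet(name):\n    return 'hello ' + name\n",
}


@contextmanager
def source_date_epoch(value):
    """Temporarily set SOURCE_DATE_EPOCH, restoring the previous value after."""
    saved = os.environ.get("SOURCE_DATE_EPOCH")
    os.environ["SOURCE_DATE_EPOCH"] = str(value)
    try:
        yield
    finally:
        if saved is None:
            os.environ.pop("SOURCE_DATE_EPOCH", None)
        else:
            os.environ["SOURCE_DATE_EPOCH"] = saved


def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def install_once(scratch_dir, install_dir):
    """Write the sources into scratch_dir, byte-compile them so the recorded
    filename points at install_dir, and return digests of everything written."""
    scratch = Path(scratch_dir)
    for rel, data in SOURCES.items():
        path = scratch / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        py_compile.compile(str(path), dfile=str(Path(install_dir, rel)), doraise=True)
    return digest_tree(scratch)


def check_reproducible(install_dir="/opt/site-packages"):
    """True if two 'installs' from different scratch dirs match byte for byte."""
    with source_date_epoch(1):
        with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
            return install_once(a, install_dir) == install_once(b, install_dir)


print(check_reproducible())
```

Comparing whole digest maps, rather than single files, also catches extra or missing files between the two runs.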

chrahunt on 5 Jul 2020

pip install that wheel with SOURCE_DATE_EPOCH set

Am I missing something here? There's no mention of SOURCE_DATE_EPOCH in the pip codebase, tests, or documentation. So I don't think we should be making any commitments (much less mandating them via tests) about how pip will behave if that's set.

pfmoore on 5 Jul 2020

It's not us, but the standard library. If we're using either of compileall.compile_file or py_compile.compile it should be handled for us automatically.
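A small sketch of what "handled for us automatically" means, assuming Python 3.7 or later: when invalidation_mode is left at its default, py_compile chooses a checked-hash .pyc if SOURCE_DATE_EPOCH is set and a timestamp-based one otherwise. The helper pyc_flags is hypothetical; it simply reads the flags word from the .pyc header defined by PEP 552 (0 = timestamp, 3 = checked hash).

```python
import os
import py_compile
import tempfile
from pathlib import Path


def pyc_flags(source, source_date_epoch=None):
    """Compile `source` (bytes) with py_compile's default invalidation_mode
    and return the 32-bit flags word from the resulting .pyc header."""
    saved = os.environ.pop("SOURCE_DATE_EPOCH", None)
    if source_date_epoch is not None:
        os.environ["SOURCE_DATE_EPOCH"] = str(source_date_epoch)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp, "mod.py")
            src.write_bytes(source)
            # invalidation_mode is not passed, so the standard library
            # picks the mode based on SOURCE_DATE_EPOCH.
            pyc = py_compile.compile(str(src), doraise=True)
            header = Path(pyc).read_bytes()[:16]
            return int.from_bytes(header[4:8], "little")
    finally:
        # Restore the caller's environment exactly as it was.
        os.environ.pop("SOURCE_DATE_EPOCH", None)
        if saved is not None:
            os.environ["SOURCE_DATE_EPOCH"] = saved


print(pyc_flags(b"X = 1\n"))                       # 0: timestamp-based .pyc
print(pyc_flags(b"X = 1\n", source_date_epoch=1))  # 3: checked-hash .pyc
```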

I don't have a stake in this, so I won't drive the discussion more than that. Anyone can feel free to pick this up and put time into it. :)

chrahunt on 5 Jul 2020

@pfmoore, if a PR were made that passes None as the invalidation_mode argument of those compile functions, so that the standard library convention applies instead, would it be accepted? This is where that convention is documented: https://docs.python.org/3/whatsnew/3.7.html#py-compile

nanonyme on 3 Nov 2020

The end result is essentially a checked-hash .pyc: the bytecode is still validated against the source for changes, but by comparing a hash of the source rather than its mtime.
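To make the checked-hash behaviour concrete, the sketch below (helper name checked_hash_header is hypothetical) compiles a file with PycInvalidationMode.CHECKED_HASH, changes only the file's mtime, compiles again, and compares the .pyc headers. Per PEP 552, bytes 8 to 16 of a hash-based header hold a hash of the source bytes, the same value importlib.util.source_hash returns, instead of an mtime and size.

```python
import importlib.util
import os
import py_compile
import tempfile
from pathlib import Path


def checked_hash_header(src):
    """Compile src as a checked-hash .pyc and return (flags, stored_hash)."""
    pyc = py_compile.compile(
        str(src),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    header = Path(pyc).read_bytes()[:16]
    flags = int.from_bytes(header[4:8], "little")
    return flags, header[8:16]


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "mod.py")
    src.write_bytes(b"X = 1\n")
    first = checked_hash_header(src)
    os.utime(src, (0, 0))  # change only the modification time
    second = checked_hash_header(src)
    print(first == second)  # True: the mtime is not recorded
    print(first[1] == importlib.util.source_hash(b"X = 1\n"))  # True
```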

nanonyme on 3 Nov 2020

Looks like https://github.com/pypa/pip/blob/master/src/pip/_vendor/distlib/util.py#L595 is fine as is. It will use whatever the standard library's default is.

nanonyme on 3 Nov 2020

Similarly, https://github.com/pypa/pip/blob/master/src/pip/_internal/operations/install/wheel.py#L715 looks fine as is. The resulting default is a checked hash when SOURCE_DATE_EPOCH is set, assuming Python works as intended.

nanonyme on 3 Nov 2020

Any chance the title could be changed to reflect that this issue is not in fact about fixing pip but about adding tests? :)

nanonyme on 3 Nov 2020

I can confirm the issue is indeed resolved with 20.2. Thank you. It would be great to have tests to prevent this issue from occurring again.

FRidh on 5 Nov 2020

PRs adding tests are welcome!

pradyunsg on 5 Nov 2020
