Environment
Description
Using pip at 19.3.1 building of Python packages was reproducible, however, since we upgraded to 20.0.2 that is no longer the case (https://github.com/NixOS/nixpkgs/issues/81441). The issue is that bytecode now refers to a unpacked-wheel folder (see e.g. https://r13y.com/diff/b03e875784ba16e60ec59d1d5e720b13fc7ae1311edf409b819b0b264acd5393-aba2cac73c5234af8fbf694100bc631800ca397ad7afeed335d38895322a7c71.html).
The function for this unpacked-wheel was added in https://github.com/pypa/pip/pull/7483, I have not bisected this any further though cc @chrahunt @raboof
Expected behavior
Reproducible build.
How to Reproduce
This makes sense if during the build you were specifying --build-dir
. This should be resolved by #6030, since we will byte-compile the .py files in-place.
We build wheels using
python setup.py bdist_wheel
and install them using
python -m pip install ./*.whl --no-index --prefix="$out" --no-cache $pipInstallFlags --build tmpbuild
Thanks. --build
would have forced pip to use the same directory for the initial wheel extraction, assuming that directory was always the same during installation. Then since we do the byte-compilation on the extraction directory (instead of the install directory), it would have had the same embedded directory path each time.
A straightforward way to fix this without requiring too much change is to recurse over the unpacked wheel directory explicitly and use py_compile.compile
with a dfile
parameter that matches the expected install directory.
This should be fixed by #8541, however we will need tests before calling this done. Tests would look like:
tests.lib.wheel.make_wheel
)and would probably go in tests/functional/test_install_wheel.py
(which can be referenced for some of the other steps).
pip install that wheel with SOURCE_DATE_EPOCH set
Am I missing something here? There's no mention of SOURCE_DATE_EPOCH
in the pip codebase, tests, or documentation. So I don't think we should be making any commitments (much less mandating via tests) of how pip will behave if that's set.
It's not us, but the standard library. If we're using either of compileall.compile_file
or py_compile.compile
it should be handled for us automatically.
I don't have a stake in this, so I won't drive the discussion more than that. Anyone can feel free to pick this up and put time into it. :)
It's not us, but the standard library.
Ah, OK. In which case, I'd say that pip's test should simply be that we pass None to the stdlib function (and hence that we follow stdlib behaviour). We shouldn't test that the stdlib does what it says it will do.
If, however, we want to document for pip what guarantees we give around reproducibility, then yes we should have a test that ensures we deliver that behaviour. But I'd say document first, then test - we don't need any more tests that check undocumented behaviour 馃檨
I don't have a stake in this
Nor do I, beyond not wanting to be expected to support a behaviour that I don't understand and personally have no need for. So I'll say no more for now. As you say, someone who has an interest in this can define what they want and push the discussion on that basis.
@pfmoore if PR was made that would make argument of said compile functions None to use standard library convention instead, would it be accepted? This is where it's documented https://docs.python.org/3/whatsnew/3.7.html#py-compile
The end result is essentially checked hash so bytecode is still validated against source for changes but not based on mtime but instead through hash.
Looks like https://github.com/pypa/pip/blob/master/src/pip/_vendor/distlib/util.py#L595 is fine as is. It will result in whatever standard library uses.
Similarly https://github.com/pypa/pip/blob/master/src/pip/_internal/operations/install/wheel.py#L715 looks fine as is. The default is checked hash assuming Python works as intended.
Any chance the title could be reflected that this issue is not in fact about fixing pip but adding tests? :)
I can confirm the issue is indeed resolved with 20.2. Thank you. It would be great to have tests to avoid this issue from occurring again.
PRs adding tests are welcome!
Most helpful comment
Ah, OK. In which case, I'd say that pip's test should simply be that we pass None to the stdlib function (and hence that we follow stdlib behaviour). We shouldn't test that the stdlib does what it says it will do.
If, however, we want to document for pip what guarantees we give around reproducibility, then yes we should have a test that ensures we deliver that behaviour. But I'd say document first, then test - we don't need any more tests that check undocumented behaviour 馃檨
Nor do I, beyond not wanting to be expected to support a behaviour that I don't understand and personally have no need for. So I'll say no more for now. As you say, someone who has an interest in this can define what they want and push the discussion on that basis.