$ python3 -m venv env
$ env/bin/pip --version
pip 9.0.1 from .../env/lib/python3.5/site-packages (python 3.5)
$ mkdir mydist
$ cat > mydist/setup.py
import os
import sys
from setuptools import setup
assert "egg_info" in sys.argv or "my-build-dir" in os.getcwd()
setup(name="example")
$ tar czf mydist.tar.gz mydist/*
$ env/bin/pip install -v mydist.tar.gz
Processing ./mydist.tar.gz
Running setup.py (path:/tmp/pip-ynhyamwl-build/setup.py) egg_info for package from file:///home/benjamin/dev/repos/repo/mydist.tar.gz
Running command python setup.py egg_info
running egg_info
creating pip-egg-info/example.egg-info
writing top-level names to pip-egg-info/example.egg-info/top_level.txt
writing dependency_links to pip-egg-info/example.egg-info/dependency_links.txt
writing pip-egg-info/example.egg-info/PKG-INFO
writing manifest file 'pip-egg-info/example.egg-info/SOURCES.txt'
warning: manifest_maker: standard file '-c' not found
reading manifest file 'pip-egg-info/example.egg-info/SOURCES.txt'
writing manifest file 'pip-egg-info/example.egg-info/SOURCES.txt'
Source in /tmp/pip-ynhyamwl-build has version 0.0.0, which satisfies requirement example==0.0.0 from file:///home/benjamin/dev/repos/repo/mydist.tar.gz
Installing collected packages: example
Running setup.py install for example ... Running command /home/benjamin/dev/repos/repo/env/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-ynhyamwl-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-mpcj19ev-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/benjamin/dev/repos/repo/env/include/site/python3.5/example
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-ynhyamwl-build/setup.py", line 4, in <module>
assert "egg_info" in sys.argv or "my-build-dir" in os.getcwd()
AssertionError
error
Cleaning up...
Removing source in /tmp/pip-ynhyamwl-build
Command "/home/benjamin/dev/repos/repo/env/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-ynhyamwl-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-mpcj19ev-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/benjamin/dev/repos/repo/env/include/site/python3.5/example" failed with error code 1 in /tmp/pip-ynhyamwl-build/
Exception information:
Traceback (most recent call last):
File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/req/req_set.py", line 784, in install
**kwargs
File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/req/req_install.py", line 878, in install
spinner=spinner,
File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/utils/__init__.py", line 707, in call_subprocess
% (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command "/home/benjamin/dev/repos/repo/env/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-ynhyamwl-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install
This bug only happens when a "req" is not given on the command line. (Indeed, it's possible to work around the bug by passing a file:// URL with a #egg=mydist fragment.) When a "req" is not given, pip determines the distribution name by running setup.py egg_info and parsing the name from the resulting metadata. egg_info is always run in a temporary directory selected by pip (see InstallRequirement.build_location). The bug is that this temporary directory is then also used for the actual build. The method InstallRequirement._correct_build_location is apparently supposed to address this very case by setting the build directory back to the user-specified one. However, it is always a no-op because its early-exit condition, self.source_dir is not None, is always true. (One can almost prove this: the sole caller of _correct_build_location, run_egg_info, asserts that self.source_dir is not None.)
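The no-op described above can be modeled with a small sketch. This is a simplified stand-in, not pip's actual code: the class Requirement, its attributes, and its method bodies are hypothetical and only mirror the control flow as described (egg_info sets source_dir to a temporary directory, and the correction step exits early whenever source_dir is already set).

```python
import os
import tempfile


class Requirement:
    """Hypothetical, simplified model of the control flow described above."""

    def __init__(self, build_dir):
        self.build_dir = build_dir  # user-specified build directory
        self.source_dir = None      # where setup.py is actually run
        self.name = None            # unknown until egg_info has run

    def build_location(self):
        # Before the name is known, a temporary location is chosen.
        # (Illustrative path only; nothing is created on disk.)
        return os.path.join(tempfile.gettempdir(), "pip-random-build")

    def run_egg_info(self):
        self.source_dir = self.build_location()
        # The sole caller guarantees source_dir is set ...
        assert self.source_dir is not None
        self.name = "example"  # would be parsed from the egg_info metadata
        self.correct_build_location()

    def correct_build_location(self):
        # ... so this early exit is always taken and nothing is corrected.
        if self.source_dir is not None:
            return
        self.source_dir = os.path.join(self.build_dir, self.name)


req = Requirement(build_dir="/path/to/my-build-dir")
req.run_egg_info()
print(req.source_dir)  # still the temporary location, not under my-build-dir
```

In this model the last line of correct_build_location is unreachable from run_egg_info, which matches the observation that the real build ends up in the temporary directory.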
Would it be possible to use the user-specified --build-dir to run egg_info in and clean out the directory before running the real build? That would fix this bug, make the behavior more consistent, and likely simplify InstallRequirement's code.
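A rough sketch of the "clean out the directory" step in that proposal, assuming a hypothetical helper clean_directory and a throwaway directory standing in for the user-specified build directory. None of these names come from pip, and the metadata file is only a stand-in for what egg_info would leave behind.

```python
import os
import shutil
import tempfile


def clean_directory(path):
    """Remove everything inside path but keep the directory itself."""
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.unlink(entry.path)


with tempfile.TemporaryDirectory() as build_dir:
    # Stand-in for the metadata that an egg_info run would leave behind.
    egg_info_dir = os.path.join(build_dir, "pip-egg-info", "example.egg-info")
    os.makedirs(egg_info_dir)
    with open(os.path.join(egg_info_dir, "PKG-INFO"), "w") as f:
        f.write("Name: example\n")

    clean_directory(build_dir)
    leftover = os.listdir(build_dir)

print(leftover)  # the build directory is empty again before the real build
```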
What is your use case for using --build? In https://github.com/pypa/pip/issues/4371 I raised the idea of just deprecating it, because every use case I could find mentioned was better handled by our move to randomized build directories or by setting TMPDIR.
Unfortunately, I use --build-dir explicitly to avoid the default randomized build directories, for the sake of reproducibility. It turns out many packages leak their build directories into the output wheel.
Huh, interesting. Are any of them public? I'm curious what they're doing that leaks that into the output.
A classic example is this numpy code generator that writes the generator's __file__ (and thus the build directory) into its output artifacts. C extensions also generally leak their build directory into the debugging info.
I decided it was easier to systematically band-aid the problem by using a constant --build-dir rather than play whack-a-mole fixing issues in individual packages.
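To make the leak concrete, here is a minimal sketch of a generator that embeds its own location in its output. The function names, the file layout, and the second random directory name are made up for illustration, and this is not the numpy generator itself. The digests show why a random build directory defeats byte-for-byte comparison while a constant one does not.

```python
import hashlib
import os


def generate_header(generator_path):
    """Emit generated source that records where the generator lives."""
    return (
        "/* Generated by {} */\n"
        "#define ANSWER 42\n"
    ).format(generator_path).encode("utf-8")


def artifact_digest(build_dir):
    # In a real generator this path would come from __file__.
    generator_path = os.path.join(build_dir, "tools", "generate.py")
    return hashlib.sha256(generate_header(generator_path)).hexdigest()


random_a = artifact_digest("/tmp/pip-ynhyamwl-build")
random_b = artifact_digest("/tmp/pip-abcdefgh-build")
fixed_a = artifact_digest("/tmp/my-build-dir")
fixed_b = artifact_digest("/tmp/my-build-dir")

print(random_a == random_b)  # False: the directory name changes the bytes
print(fixed_a == fixed_b)    # True: a constant directory gives identical bytes
```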
Am I correct in thinking that the leaked build directory name does not affect the functionality of the resulting packages, and only makes the builds show as non-identical when rerun?
I have never seen the build directory actually affect package behavior, no. However, I aim for byte-for-byte reproducibility of pip's output artifacts. (This is critical for, e.g., avoiding unnecessary invalidation in content-addressable caching.) If --build-dir went away, I would probably either have to fork pip or resort to more devious schemes like replacing /dev/urandom with something deterministic.
I'd like to note that another use case for --build, in addition to reproducible builds, is the use of compiler build caches, ccache in particular. In practice (see https://github.com/scipy/scipy/pull/7647, https://github.com/spacetelescope/asv/issues/201, https://github.com/spacetelescope/asv/pull/203) the fact that pip builds projects in a directory with a random name often prevents ccache from working properly.
One particular use case here is asv, which needs to rebuild and reinstall Python projects very many times, and significantly benefits from ccache. (For instance, with proper caching a Scipy build goes from 10min to <1min.) However, we have to jump through some extra hoops in the build automation because --build doesn't work as advertised.
I wasn't aware that --build actually works with the (pretty obscure) file://...#egg=foo syntax, but that looks like a somewhat unreliable workaround...
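For reference, the workaround could be scripted roughly as follows. This is a sketch under assumptions: the helper pip_install_args is hypothetical, the --build flag is the one discussed in this thread and may not exist in other pip versions, and the egg name is taken from the #egg=mydist example in the report.

```python
import os
import sys
from pathlib import Path


def pip_install_args(sdist, name, build_dir):
    """Build a pip command line that passes the requirement name explicitly."""
    # file:// URL for the archive, plus the #egg= fragment that names the req.
    url = Path(os.path.abspath(sdist)).as_uri() + "#egg=" + name
    return [sys.executable, "-m", "pip", "install",
            "--build", str(build_dir), url]


args = pip_install_args("mydist.tar.gz", "mydist", "my-build-dir")
print(args)
# To actually run it: subprocess.check_call(args)
```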
Another use case for ccache is in continuous integration (e.g. travis-CI). Here, one can sometimes avoid use of pip, but it's not so easy to find out the reason why ccache doesn't work properly when using pip.
Confirmed the original behavior still occurs in 19.2.1. The fact that we don't respect --build when given a file path definitely seems like a bug to me.
Would it be possible to use the user-specified --build-dir to run egg_info in and clean out the directory before running the real build? That would fix this bug, make the behavior more consistent, and likely simplify InstallRequirement's code.
That may be better answered during the PR itself, but offhand it looks like that approach would run into issues: in _correct_build_location, we're trying to namespace the build of the package using a directory under the build dir with the same name as the package itself, which is only known after egg_info has run.