Pip: --build-dir is not respected when installing from an sdist without a req

Created on 20 Jan 2017 · 8 Comments · Source: pypa/pip

Reproduction steps

$ python3 -m venv env
$ env/bin/pip --version
pip 9.0.1 from .../env/lib/python3.5/site-packages (python 3.5)
$ mkdir mydist
$ cat > mydist/setup.py
import os
import sys
from setuptools import setup
assert "egg_info" in sys.argv or "my-build-dir" in os.getcwd()
setup(name="example")
$ tar czf mydist.tar.gz mydist/*
$ env/bin/pip install -v mydist.tar.gz 
Processing ./mydist.tar.gz
  Running setup.py (path:/tmp/pip-ynhyamwl-build/setup.py) egg_info for package from file:///home/benjamin/dev/repos/repo/mydist.tar.gz
    Running command python setup.py egg_info
    running egg_info
    creating pip-egg-info/example.egg-info
    writing top-level names to pip-egg-info/example.egg-info/top_level.txt
    writing dependency_links to pip-egg-info/example.egg-info/dependency_links.txt
    writing pip-egg-info/example.egg-info/PKG-INFO
    writing manifest file 'pip-egg-info/example.egg-info/SOURCES.txt'
    warning: manifest_maker: standard file '-c' not found

    reading manifest file 'pip-egg-info/example.egg-info/SOURCES.txt'
    writing manifest file 'pip-egg-info/example.egg-info/SOURCES.txt'
  Source in /tmp/pip-ynhyamwl-build has version 0.0.0, which satisfies requirement example==0.0.0 from file:///home/benjamin/dev/repos/repo/mydist.tar.gz
Installing collected packages: example
  Running setup.py install for example ...     Running command /home/benjamin/dev/repos/repo/env/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-ynhyamwl-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-mpcj19ev-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/benjamin/dev/repos/repo/env/include/site/python3.5/example
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-ynhyamwl-build/setup.py", line 4, in <module>
        assert "egg_info" in sys.argv or "my-build-dir" in os.getcwd()
    AssertionError
error
Cleaning up...
  Removing source in /tmp/pip-ynhyamwl-build
Command "/home/benjamin/dev/repos/repo/env/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-ynhyamwl-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-mpcj19ev-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/benjamin/dev/repos/repo/env/include/site/python3.5/example" failed with error code 1 in /tmp/pip-ynhyamwl-build/
Exception information:
Traceback (most recent call last):
  File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/req/req_install.py", line 878, in install
    spinner=spinner,
  File "/home/benjamin/dev/repos/repo/env/lib/python3.5/site-packages/pip/utils/__init__.py", line 707, in call_subprocess
    % (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command "/home/benjamin/dev/repos/repo/env/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-ynhyamwl-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install

Analysis

This bug only happens when a "req" is not given on the command line. (Indeed, it's possible to work around the bug by passing a file:// URL with a #egg=mydist fragment.) When a "req" is not given, pip determines the distribution name by running setup.py egg_info and parsing the name from the resulting metadata. egg_info is always run in a temporary directory selected by pip (see InstallRequirement.build_location). The bug is that this directory is then also used for the actual build. The method InstallRequirement._correct_build_location is apparently supposed to address this very case by setting the build directory back to the user-specified one. However, it is always a no-op because its early-exit condition, self.source_dir is not None, is always true. (One can almost prove this, because the sole caller of _correct_build_location, run_egg_info, asserts that self.source_dir is not None.)
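The control flow described above can be modelled in a few lines. This is a simplified sketch, not pip's actual code: ToyRequirement, its method bodies, and the /tmp/pip-xxxx-build path are invented for illustration, and only the early-exit condition and the caller's assertion are taken from the analysis.

```python
class ToyRequirement:
    """Simplified stand-in for the behaviour described above; not pip's real class."""

    def __init__(self, temp_build_dir):
        # Before the name is known, the sdist lives in a temporary directory.
        self.name = None
        self.source_dir = temp_build_dir

    def run_egg_info(self, discovered_name, user_build_dir):
        # The sole caller asserts that source_dir is already set ...
        assert self.source_dir is not None
        self.name = discovered_name
        self.correct_build_location(user_build_dir)

    def correct_build_location(self, user_build_dir):
        # ... so this early exit is taken on every call.
        if self.source_dir is not None:
            return
        self.source_dir = f"{user_build_dir}/{self.name}"  # never reached


req = ToyRequirement("/tmp/pip-xxxx-build")  # made-up temporary path
req.run_egg_info("example", "my-build-dir")
print(req.source_dir)  # still the temporary directory
```

In this model the user-specified directory never reaches source_dir, which matches the failing assertion in the reproduction.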

Would it be possible to use the user-specified --build-dir to run egg_info in and clean out the directory before running the real build? That would fix this bug, make the behavior more consistent, and likely simplify InstallRequirement's code.

Labels: deferred till PR · help wanted · bug

All 8 comments

What is your use case for --build? In https://github.com/pypa/pip/issues/4371 I raised the idea of simply deprecating it, because all of the use cases I could find mentioned were better handled by our move to randomized build directories or by setting TMPDIR.

Unfortunately, I use --build-dir to explicitly avoid the default randomized build directories for the purposes of reproducibility. It turns out many packages leak their build directories into the output wheel.

Huh, interesting. Are any of them public? I'm curious what they're doing that leaks that into the output.

A classic example is this numpy code generator that writes the generator's __file__ (and thus the build directory) into its output artifacts. C extensions also generally leak their build directory into the debugging info.
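A minimal sketch of this kind of leak, assuming a generator that stamps its own path into a header comment. The generate_artifact function, the generate.py name, and the /tmp/my-build-dir path are hypothetical and are not taken from numpy.

```python
import hashlib
import tempfile
from pathlib import Path


def generate_artifact(build_dir: Path) -> bytes:
    """Mimic a code generator that stamps its own location into its output."""
    generator_path = build_dir / "generate.py"  # hypothetical generator script
    header = f"/* Generated by {generator_path} */\n"
    body = "int answer(void) { return 42; }\n"
    return (header + body).encode("utf-8")


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Two randomly named build directories, similar to pip's default behaviour.
with tempfile.TemporaryDirectory(prefix="pip-") as a, \
        tempfile.TemporaryDirectory(prefix="pip-") as b:
    random_digests = {
        digest(generate_artifact(Path(a))),
        digest(generate_artifact(Path(b))),
    }

# One fixed build directory, which a constant --build-dir is meant to provide.
fixed = Path("/tmp/my-build-dir")
fixed_digests = {digest(generate_artifact(fixed)), digest(generate_artifact(fixed))}

print(len(random_digests), len(fixed_digests))
```

The generated code is identical in every run. Only the embedded path changes the bytes, and therefore the digest that a content-addressable cache would key on.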

I decided it was easier to systematically bandaid the problem by using a constant --build-dir rather than whack-a-mole fixing issues in individual packages.

Am I correct in thinking that the leaked build directory name is not affecting the functionality of the resulting packages, it's just making the builds show as non-identical when rerun?

I have never seen the build directory actually affect package behavior, no. However, I aim for byte-for-byte reproducibility of pip's output artifacts. (This is critical for e.g., avoiding unnecessary invalidation in content-addressable caching.) If --build-dir went away, I would probably either have to fork pip or resort to more devious schemes like replacing /dev/urandom with something deterministic.

I'd like to note that another use case for --build, in addition to reproducible builds, is the use of compiler build caches, ccache in particular. In practice (see https://github.com/scipy/scipy/pull/7647, https://github.com/spacetelescope/asv/issues/201, https://github.com/spacetelescope/asv/pull/203) the fact that pip builds projects in a directory with a random name often prevents ccache from working properly.

One particular use case here is asv, which needs to rebuild and reinstall Python projects very many times, and significantly benefits from ccache. (For instance, with proper caching a SciPy build goes from 10 min to <1 min.) However, we have to jump through some extra hoops in the build automation because --build doesn't work as advertised.
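To illustrate why a random directory name can defeat a compiler cache, here is a toy model. It is not ccache's actual hashing: it simply assumes the cache key covers the compiler command line, and that the command line carries absolute paths under the build directory. The function name and all paths are made up.

```python
import hashlib


def toy_cache_key(build_dir: str, source_text: str) -> str:
    """Toy key: hash of the command line plus the source text.

    Assumption: the command line carries absolute paths under build_dir.
    """
    command = ["cc", "-c", f"-I{build_dir}/include", f"{build_dir}/src/module.c"]
    h = hashlib.sha256()
    h.update("\0".join(command).encode())
    h.update(source_text.encode())
    return h.hexdigest()


source = "int f(void) { return 1; }\n"

# Random build directories: same source, different keys, so every build misses.
miss = toy_cache_key("/tmp/pip-aaaa-build", source) != toy_cache_key("/tmp/pip-bbbb-build", source)

# Fixed build directory: same key on every rebuild, so a later build can hit.
hit = toy_cache_key("/tmp/my-build-dir", source) == toy_cache_key("/tmp/my-build-dir", source)

print(miss, hit)
```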

I wasn't aware that --build actually works with the (pretty obscure) file://...#egg=foo syntax, but that looks like a somewhat unreliable workaround...
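For reference, a small helper that builds such a URL for a local sdist might look like the sketch below. The function name is hypothetical, and the egg name is whatever name the caller wants pip to assume for the requirement.

```python
from pathlib import Path


def sdist_requirement_url(sdist_path: str, egg_name: str) -> str:
    """Build a file:// URL with an #egg= fragment for a local sdist."""
    # resolve() makes the path absolute, which as_uri() requires.
    return f"{Path(sdist_path).resolve().as_uri()}#egg={egg_name}"


url = sdist_requirement_url("mydist.tar.gz", "mydist")
print(url)
```

The printed URL would then be passed to pip install in place of the bare file path, together with --build.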

Another use case for ccache is in continuous integration (e.g. Travis CI). Here, one can sometimes avoid use of pip, but it's not so easy to find out the reason why ccache doesn't work properly when using pip.

Confirmed the original behavior still occurs in 19.2.1. The fact that we don't respect --build when given a file path definitely seems like a bug to me.

Would it be possible to use the user-specified --build-dir to run egg_info in and clean out the directory before running the real build? That would fix this bug, make the behavior more consistent, and likely simplify InstallRequirement's code.

That may be better answered during the PR itself, but offhand it looks like that approach would run into issues:

  1. As noted in _correct_build_location, we're trying to namespace the build of the package using a directory under the build dir with the same name as the package itself.
  2. As also mentioned there, we may not know the name of the package until after running egg_info.
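
One conceivable way to reconcile the two points is to stage the unpacked source under the user-specified build directory and rename it once the name is known. The sketch below is purely illustrative and is not how pip works: stage_then_namespace and the discover_name callback (standing in for running egg_info) are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_then_namespace(build_dir: Path, discover_name) -> Path:
    """Unpack into a staging dir under build_dir, then rename once the name is known.

    discover_name is a hypothetical callback standing in for running egg_info.
    """
    build_dir.mkdir(parents=True, exist_ok=True)
    staging = Path(tempfile.mkdtemp(prefix="staging-", dir=build_dir))
    (staging / "setup.py").write_text("# unpacked sdist contents would go here\n")

    name = discover_name(staging)  # the name is only known after this step
    final = build_dir / name       # per-package namespace under the build dir
    if final.exists():
        shutil.rmtree(final)       # clean out leftovers from an earlier build
    staging.rename(final)
    return final


with tempfile.TemporaryDirectory() as root:
    location = stage_then_namespace(Path(root) / "my-build-dir", lambda src: "example")
    has_setup = (location / "setup.py").exists()
    print(location.parent.name, location.name, has_setup)
```

In this sketch the metadata step still runs in a randomly named staging directory, so anything that step writes could still embed a non-deterministic path. Only the real build sees the stable location.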