Pip: Solving issues related to out-of-tree builds

Created on 4 Jan 2020 · 68 comments · Source: pypa/pip

I'm opening this issue as an attempt to consolidate discussion about out-of-tree builds, the related issues and possible solutions.

What's the problem this feature will solve?

When building projects from local directories, pip first copies them to a temporary location.

This approach has raised a number of issues over time:

  • When setup.py / pyproject.toml is not at the root of the project and when the build depends on resources outside of the setup.py / pyproject.toml subtree (#3500, #7549, #6276), e.g.:

    • build needs resources that are symlinks to files/directories outside of the subtree

    • build needs the git repository (e.g. when using setuptools_scm), and .git/ is in a parent directory not copied to the temp dir by pip

    • build relies on the subdirectory name (somewhat exotic maybe, yet I have such a case where I want to create a custom build backend and part of the metadata depends on the subdirectory name)

  • Performance issues when the project directory is large (#2195).

Why does pip copy to a temporary directory before building? Caveat: this is unclear to me - here is what I collected so far:

  • To avoid relying on something out of source (https://github.com/pypa/pip/issues/2195#issuecomment-524606986) - although the definition of "out of source" is the cause of some issues above
  • To avoid polluting the source directory with build artifacts or residues (?)
  • Something else?

Possible Solutions

  1. Build an sdist in place, unpack the sdist in a temporary location, then build from that.
  2. Add a pip option to build in place.
  3. Update PEP 517 with some sort of mechanism to let back-ends communicate to front-ends if they are "safe" for in-place builds.
  4. Change pip to always build in place.
  5. Change pip to build in place by default with an option to build out-of-tree.

Additional context

More discussion about building via sdist on discuss.python.org.


Most helpful comment

Coming from https://github.com/pypa/pip/issues/2195#issuecomment-664728481 I can say that I'm more than happy to re-do #7882 behind --use-feature=in-tree-build.


Looking at this from the back-end's perspective, the idea of an "out of tree build" is actually meaningless. The back end is given a "source tree" in the form of the current directory of the process, and is asked to execute a build. It's got no way of knowing whether that directory was extracted from a sdist, checked out of a VCS, or copied from somewhere else. All it can do is build, and if it doesn't have what it needs to build, report failure.

So that pretty much kills possible solution (3) IMO - the backend doesn't have a concept of what an in-place build is, so it can't say whether such a build is safe[1].

As regards (2), I'm generally opposed to additional options like this. If we know what the best thing to do is, we should do it, and if we don't, then passing the problem onto the user is not a particularly friendly option. The issue here is subtle enough that I would expect few users to know what the correct choice is, so we'd likely see people just trying the 2 options and blindly using "whatever works". Also, the support implications are significant - offering an option clearly implies that we expect the build results to be different in at least some cases, so how do we test that the differences are as we'd expect? Are we responsible for educating users on when the in-place build flag would be needed or not (either via our documentation, or as a result of issues being raised by users who don't know which to use)?

Having said that, I'm not against pip simply building in place. I believe the reason we don't is because we had cases where artifacts left over from a previous build were used in a subsequent build, but were built with different options (for example, object files built in debug mode being linked into a release build done from the same tree). It is, however, reasonable to say that backends should ensure that issues like this don't happen, and if they do, it's a backend bug and not up to pip to try to defend against it. However, I'm not sure if taking such a stance is particularly user friendly - this came up during the PEP 517 discussions under the topic of incremental builds, and was controversial enough that it was deferred at the time.

My preference has been to build a sdist, and then build the wheel from that, for a long time now (your option 1). I'm OK with pip building in place (if we can agree that doing so is safe, given that as I noted above, we can't expect the back-end to let us know), but I think that would need a community discussion to thrash out the wider implications (on backends, pip/frontends and end users).

[1] It would of course be possible to update PEP 517 to define what constitutes an "in-place" source tree as opposed to an "out of tree" build, but I suspect that would be a very difficult concept to pin down.

I added solutions 4. (Change pip to always build in place) and 5. (Change pip to build in place by default with an option to build out-of-tree).

I've mixed feelings regarding building via sdist (solution 1.) for the following reasons:

  1. some projects may have a broken sdist-to-wheel path; while I see the value of validating that building from sdists works, doing it by default now will certainly break a lot of end-user builds
  2. it still has performance implications for projects with large sdists, and due to additional subprocess calls
  3. I personally think we should not put too much emphasis on sdists, as it is quite common for projects to publish built artifacts and refer to their favorite code hosting platform for their source releases (for instance, backends that rely on the presence of a VCS checkout for building need to jump through hoops to produce a working sdist). And PEP 517 specifically allows backends to raise UnsupportedOperation for build_sdist.

This post on discuss also summarizes similar arguments.

I agree the path for solution 3. is far from obvious.

I also agree we should avoid additional options if we can.

I also note there were some voices in favor of in-place builds in the linked discuss thread, but indeed a focused community discussion on that specific subject is necessary if we want to explore that approach.

  • Build an sdist in place, unpack the sdist in a temporary location, then build from that.

I think this is a good approach.

IMO, chances are this would be run for a single package in most pip install runs involving local directories (most commonly, I imagine, pip install . from the project directory). This would likely be done as part of the package development workflow.

Here's what I think this behavior should be:

  • If the backend is unable to create an sdist, do local-dir -> wheel (in-place)

    • I think the onus is very much on the backend to make sure that local-dir -> wheel is an idempotent operation in this case.

  • If the backend is capable of creating an sdist, do local-dir -> sdist -> unpacked-sdist -> wheel.
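The local-dir -> sdist -> unpacked-sdist -> wheel flow described above, including its in-place fallback, could look roughly like the following. This is a minimal sketch under stated assumptions, not pip's implementation: backend is assumed to be an already-imported PEP 517 backend object called in-process (real frontends run hooks in a subprocess), and the helper names working_directory and build_wheel_via_sdist are hypothetical. It relies only on the build_sdist/build_wheel hook signatures and the UnsupportedOperation attribute that PEP 517 defines.

```python
import os
import tarfile
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """PEP 517 hooks treat the current directory as the source tree."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def build_wheel_via_sdist(backend, source_dir, wheel_dir):
    """Hypothetical sketch: local-dir -> sdist -> unpacked-sdist -> wheel.

    Falls back to an in-place wheel build when the backend raises its
    UnsupportedOperation exception from build_sdist. Returns the wheel's
    basename, as the build_wheel hook does.
    """
    unsupported = getattr(backend, "UnsupportedOperation", None)
    # An empty tuple in an except clause catches nothing.
    fallback_errors = (unsupported,) if unsupported is not None else ()
    wheel_dir = os.path.abspath(wheel_dir)
    with tempfile.TemporaryDirectory() as scratch:
        sdist_dir = os.path.join(scratch, "sdist")
        unpack_dir = os.path.join(scratch, "unpacked")
        os.makedirs(sdist_dir)
        try:
            with working_directory(source_dir):
                sdist_name = backend.build_sdist(sdist_dir)
        except fallback_errors:
            # No sdist possible: build the wheel directly from the local dir.
            with working_directory(source_dir):
                return backend.build_wheel(wheel_dir)
        with tarfile.open(os.path.join(sdist_dir, sdist_name)) as tar:
            if hasattr(tarfile, "data_filter"):
                # Extraction filters exist in Python 3.12+ and recent backports.
                tar.extractall(unpack_dir, filter="data")
            else:
                tar.extractall(unpack_dir)
        # PEP 517 sdists contain a single top-level {name}-{version}/ directory.
        (top_level,) = os.listdir(unpack_dir)
        with working_directory(os.path.join(unpack_dir, top_level)):
            return backend.build_wheel(wheel_dir)
```

The design point is that the wheel build only ever sees what the sdist contains, so anything missing from the sdist fails at this step rather than at someone else's install.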

Doing local-dir -> sdist -> wheel, we do have an additional set of calls. However, I do think it's reasonable to validate that generated sdists are sane, especially during development. tox already does this as part of its workflow, and check-manifest exists to cover for setuptools' not-so-friendly interfaces here.

Overall, I do think the costs of building an sdist when given a local directory are worth it to prevent such errors in projects, especially since folks installing from local directories are likely the developers of the project themselves.

Regarding the rollout, I think we'd want to wait and watch how #6536 turns out. We'd definitely learn a few things that can be carried over.

I prefer building/installing in-place without sdist (so setup.py install or setup.py bdist_wheel or calling build_wheel on the PEP 517 backend as applicable) vs building an sdist, unpacking, and installing from that. My specific reasons are:

  1. Supports the largest number of users out of the box.

    1. Assume pip does local-dir -> installed (via wheel build in-place or setup.py install). Users that want to go from local-dir -> installed can execute pip install local-dir. Users wanting to go from local-dir -> sdist -> installed are free to create an sdist and then execute pip install ./path/to/sdist. There are plenty of alternate tools that can build an sdist, and this is something that users will be more likely to have already unless they are hand-crafting the distributions they upload to PyPI.

    2. Now assume pip does local-dir -> sdist -> installed. Users that want to go from local-dir -> installed have no options that involve pip. Users wanting to go from local-dir -> sdist -> installed can execute pip install local-dir. The users with no options will ask pip for options to control the behavior or will have to find another tool that they wouldn't've otherwise needed.

  2. If we implement local-dir -> sdist -> installed, presumably we would also do it for VCS-based requirements? If so that's more work. If not, that's additional paths through the code and deviances in the way installation is handled that have to be remembered by users or when providing support.
  3. Least amount of work to implement. There are three places to change to implement local-dir -> installed (here, here, and here). For local-dir -> sdist -> installed I wouldn't even want to touch implementing this until #6607 is done, otherwise I think it would end up in a lot of places in the code base similar to artifact downloading/hash checking.
  4. Least amount of work to test since, IMO, existing tests are sufficient to cover the local-dir -> installed code path. For the local-dir -> sdist -> installed path we would want to verify that we're actually building an sdist, and that the fallbacks work for building a wheel directly.
  5. Least amount of work (computationally). As mentioned elsewhere, local-dir -> sdist -> installed is an additional sub-process call (and that sub-process is doing work). It also means pip has to unpack the sdist (hello virus scanners and otherwise slow disks) before doing the wheel build.

Regardless of approach, the only issue I see with doing things in-place is that for setuptools builds (I think this applies for legacy and PEP 517) we would end up with .egg-info in the project directory which will get mistaken as an "installed package" when pip is invoked with python -m pip in that directory. This would be fixed by #4575 which would presumably NOT include the current directory in the query for installed packages for any scheme.
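To illustrate why a stray .egg-info looks like an installed package: python -m pip puts the current directory on sys.path, and the standard library's metadata discovery picks up any *.egg-info directory found on the path. The sketch below uses importlib.metadata.distributions(path=...) to show this for a simulated leftover; the directory and project names are made up for the example.

```python
import importlib.metadata
import os
import tempfile


def distributions_on_path(directory):
    """Return the project names importlib.metadata finds in `directory`.

    When pip runs as `python -m pip`, the current directory is on sys.path,
    so anything found here is reported as if it were installed.
    """
    return sorted(
        dist.metadata["Name"]
        for dist in importlib.metadata.distributions(path=[directory])
    )


with tempfile.TemporaryDirectory() as project_dir:
    # Simulate the leftover of an in-place setuptools build.
    egg_info = os.path.join(project_dir, "demo_pkg.egg-info")
    os.makedirs(egg_info)
    with open(os.path.join(egg_info, "PKG-INFO"), "w") as f:
        f.write("Metadata-Version: 2.1\nName: demo-pkg\nVersion: 1.0\n")
    print(distributions_on_path(project_dir))  # ['demo-pkg']
```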

Noting that I've grown to agree that the idea of skipping the sdist build and directly doing an in-tree build is a better approach for pip to take by default, and not trying to do local-dir -> sdist -> wheel.

In Fedora, when we build Python RPM packages, we are dinosaurs and the standard way to do things is to use python setup.py build. With PEP 517 we added a "provisional" way of using pip wheel instead. However, with extension modules we have a problem with the "move sources to tmp, build from there" approach that pip uses to build them.

Our build machinery injects some compiler flags so that the build artifacts (.so extension modules in this case) contain metadata about their sources. Later, a shell script traverses the build artifacts, extracts this information and copies the sources to /usr/src/debug to be installed via a special *-debugsource RPM. The machinery expects everything to be built within the working tree, and it doesn't work well when things are built outside it. Here are the things that can be done (together) to mitigate the problem on our side:

  1. set the $TMPDIR environment variable to have it within the place where the RPM script expects it (i.e. export TMPDIR=%{_builddir}/.tmp (and create it))
  2. use pip wheel with the --no-clean option to keep the copied sources in $TMPDIR
  3. run some shell kung fu to rewrite the "what is my source" information to the correct location: find %{buildroot} -iname '*.so' -print0 | xargs --no-run-if-empty -0 -n1 /usr/lib/rpm/debugedit -b "%{_builddir}/.tmp/pip-req-build-"* -d "$PWD"
  4. (Optional: clean $TMPDIR manually.)
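For readers less familiar with the find | xargs pipeline in step 3, here is roughly what it does, expressed as a Python helper that only builds the command lines. This is an illustrative sketch, not Fedora's tooling: the debugedit path and its -b/-d flags are copied from the command above, debugedit_commands is a hypothetical name, and it assumes exactly one pip-req-build-* directory exists, so it depends on the same pip implementation details listed below.

```python
import glob
import os


def debugedit_commands(buildroot, tmp_dir, source_dir,
                       debugedit="/usr/lib/rpm/debugedit"):
    """Build one debugedit invocation per extension module under buildroot.

    Mirrors the find | xargs pipeline: every *.so file (matched
    case-insensitively, like find -iname) gets its recorded source paths
    rewritten from pip's temporary copy (tmp_dir/pip-req-build-*) to
    source_dir. Nothing is executed here.
    """
    copies = glob.glob(os.path.join(tmp_dir, "pip-req-build-*"))
    if len(copies) != 1:
        # With several matches the shell glob would expand to extra
        # arguments; refuse instead of guessing.
        raise RuntimeError(f"expected one pip-req-build-* dir, found {copies}")
    commands = []
    for root, _dirs, files in os.walk(buildroot):
        for name in sorted(files):
            if name.lower().endswith(".so"):
                commands.append([debugedit, "-b", copies[0], "-d", source_dir,
                                 os.path.join(root, name)])
    return commands
```

Each returned list could then be run with subprocess.run(cmd, check=True) inside the RPM build environment.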

We don't particularly like the third step because it relies on too many implementation details:

  • /usr/lib/rpm/debugedit API and location (and existence)
  • the pip-req-build name
  • the mechanism with which pip builds the sources

If pip always built in place, or if there were a command-line switch for this, the issue would go away.

Downstream report: https://bugzilla.redhat.com/show_bug.cgi?id=1806625

Noting that I've grown to agree that the idea of skipping the sdist build and directly doing an in-tree build is a better approach for pip to take by default, and not trying to do local-dir -> sdist -> wheel.

I'm also becoming more inclined to accept the idea of doing an in-place wheel build. My remaining reservations are:

  1. We'd be relying on the backend behaving "correctly" - e.g., not giving different results based on left-over data from previous builds, or whatever. I'm fine with making that assumption, but I'm concerned about the support cost if we start getting people saying "pip built my wheel wrongly" and we have to debug only to find that it's a backend issue.
  2. I think that whatever approach we take, we should very clearly and explicitly document it, and make a strong effort to move completely over to it. We really don't want to add "yet another approach" to pip while leaving all of the old code around for compatibility. If we can't commit to the new approach, we're just introducing yet more maintenance headaches.
  3. As a corollary to the above, we should make sure there are no projects that we know of which rely on our current "copy and build" approach, as if there are we'll be breaking them with this change.

... and of course someone will need to write a PR implementing this change (with tests, docs, etc - the usual stuff) otherwise all we're doing is talking 🙂

We'd be relying on the backend behaving "correctly" - e.g., not giving different results based on left-over data from previous builds, or whatever. I'm fine with making that assumption, but I'm concerned about the support cost if we start getting people saying "pip built my wheel wrongly" and we have to debug only to find that it's a backend issue.

Would it make sense to extend the PEP 517 interface to include a “clean” hook? We’d probably want it as some point anyway to enable other endeavours (e.g. implement editable install, building a package development tool that builds any PEP 517 project). pip can call it here to make sure there are no leftover junk before doing in-tree build.
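A frontend could use such a hook the same way it treats the optional hooks PEP 517 already defines: call it only if the backend provides it. The sketch below assumes a hypothetical clean(config_settings=None) hook; no such hook exists in PEP 517, and both function names are made up for illustration.

```python
def clean_if_supported(backend, config_settings=None):
    """Call a hypothetical optional 'clean' hook if the backend defines one.

    PEP 517 has no such hook; this mirrors how frontends handle the optional
    hooks PEP 517 does define (call them only when the backend provides them).
    Returns True if the hook was called.
    """
    clean = getattr(backend, "clean", None)
    if clean is None:
        return False
    clean(config_settings=config_settings)
    return True


def build_in_tree(backend, wheel_dir, config_settings=None):
    """Sketch of an in-tree build that first removes leftovers, if it can.

    Assumes the current directory is already the source tree, as PEP 517
    hooks expect.
    """
    clean_if_supported(backend, config_settings)
    return backend.build_wheel(wheel_dir, config_settings=config_settings)
```

Note that calling clean unconditionally like this is exactly what users wanting incremental builds would object to, which is the concern raised in the next reply.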

Would it make sense to extend the PEP 517 interface to include a “clean” hook?

Maybe? But if pip automatically calls clean, there's bound to be someone who wants to not do so, to do incremental builds or something. And then we end up with another option.

My inclination is to stick with my position "we need to be able to assume that it's the responsibility of the backend to ensure that in-place builds work correctly". Even if that turns out to be untenable, getting concrete examples of why it doesn't work will help us better understand what to do about the issue, rather than just guessing.

  1. We'd be relying on the backend behaving "correctly" - e.g., not giving different results based on left-over data from previous builds, or whatever.

I'd be tempted to extend PEP-517 to make this an explicit requirement.

I'd be tempted to extend PEP-517 to make this an explicit requirement.

It already says this:

The backend may store intermediate artifacts in cache locations or temporary directories. The presence or absence of any caches should not make a material difference to the final result of the build.

It's not so much that backends are likely to violate this requirement deliberately, as that users will naturally report the issue as a pip issue, and get redirected to the backend project, which is a bit of extra overhead.
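One way to check the PEP 517 requirement above in practice is to build the same project twice, once from a clean tree and once with leftovers from a previous build, and compare the resulting wheels by content. The sketch below does that comparison; the function names are hypothetical, and deciding which differences are "material" (for example, embedded timestamps in compiled files) still needs human judgment.

```python
import hashlib
import zipfile


def wheel_fingerprint(path):
    """Map each archive member to the SHA-256 of its contents.

    Zip timestamps and member order are ignored, so two builds that produced
    the same files compare equal.
    """
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def material_differences(wheel_a, wheel_b):
    """Return the sorted member names whose presence or contents differ."""
    a, b = wheel_fingerprint(wheel_a), wheel_fingerprint(wheel_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

In a real wheel, any differing file also changes RECORD (which stores per-file hashes), so RECORD showing up in the result is expected whenever something else does.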

When trying to figure out a workaround for Fedora, we've been hit by https://github.com/pypa/pip/issues/7872

Since we have resolved the ".egg-info in cwd" issue with #7731 and friends, that is one less thing to worry about when building in place.

So option 4 (always build in place) was implemented in #7882 .

We have now (per #7951) published a beta release of pip, pip 20.1b1. This release includes #7882, which implemented a solution for this issue.

I hope participants in this issue will help us by testing the beta and checking for new bugs. We'd like to identify and iron out any potential issues before the main 20.1 release on Tuesday.

I also welcome positive feedback along the lines of "yay, it works better now!" as well, since the issue tracker is usually full of "issues". :)

We totally plan to test it out in Fedora (we already planned that before your comment), however the Tuesday deadline is probably not realistic.

@hroncok Any idea by when Fedora might be able to test out these changes?

I'll try my best to do it on Monday, but I cannot make any promises.

Indeed, 20.1b1 makes our problems go away.

More general 20.1b1 feedback in https://mail.python.org/archives/list/[email protected]/message/5EAUIYYIRKXEHTAG5GQ7EJHSXGZIW2F7/

Hurray! Many thanks for trying out the beta @hroncok! Much appreciated! ^>^

One result of building in place: I had been building wheels for multiple versions of Python in parallel (inside a manylinux docker container). With in-place builds, parallel builds don't work because the builds for the different versions conflict. With out-of-tree builds, each version got a separate tree and there was no problem.

@manthey that discussion is under #8168

So it's been more than 10 days now. There were a few issues raised about the change (all expected I'd say - #8165, #8168, #8196). There were also people explicitly mentioning the change is helping them.

  • Besides performance issues, the previous behavior (copy to temp dir) had correctness issues (linked above) that are impossible to fix by pip without context knowledge that only the caller has (and, as a side note, that copytree code was already full of band-aids to cope with weird situations - tmpdir in tree, sockets, etc).
  • An option to activate the previous behavior would still have the correctness and performance issues.
  • A correct solution will involve build back-end support to control the build directory which does not fully exist today (for instance, setuptools bdist_wheel has --bdist-dir, but still writes .egg-info in place, see also https://github.com/pypa/setuptools/issues/1816, https://github.com/pypa/setuptools/issues/1825). So now that pip behaves correctly maybe the discussion can shift to see if, e.g., setuptools can evolve an option to do a build without touching the source directory and then look if a PEP 517 change is needed or not to control that option.
  • In the meantime the issues reported are probably relatively easy to work around by callers (e.g. by copying themselves to a temp directory, or making a temporary tarball, which they can do correctly with full knowledge of the context).
  • Lastly, it's hard to tell for sure with the data we have, but my intuition is that this change helps more people than it hurts.
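The caller-side workaround mentioned above (copying to a temporary directory with full knowledge of the context) could be as small as the sketch below. The function name and the exclusion list are hypothetical choices for illustration; a project that uses setuptools_scm would need to keep .git, for example.

```python
import os
import shutil
import tempfile

# Hypothetical exclusion list: large directories that builds rarely need.
# Remove ".git" if the build needs VCS metadata (e.g. setuptools_scm).
DEFAULT_IGNORED = (".git", ".tox", ".nox", ".venv", "venv", "build", "dist", "*.egg-info")


def copy_for_isolated_build(source_dir, ignored=DEFAULT_IGNORED):
    """Copy source_dir to a fresh temporary location and return the copy's path.

    The copy keeps the original directory name (some builds depend on it).
    shutil.copytree follows symlinks by default, so linked files that point
    outside the tree are copied as regular files. Each call gets its own
    directory, so parallel builds (e.g. one per Python version) don't collide.
    The caller is responsible for deleting the copy.
    """
    scratch = tempfile.mkdtemp(prefix="isolated-build-")
    dest = os.path.join(scratch, os.path.basename(os.path.abspath(source_dir)))
    shutil.copytree(source_dir, dest, ignore=shutil.ignore_patterns(*ignored))
    return dest
```

The returned path can then be handed to pip wheel (for instance via subprocess), once per interpreter in a parallel matrix build.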

So I hate breaking changes, but this one was not done lightly, and for these reasons, I'm personally inclined to keep it.

I don't agree that the new behavior is pip behaving "correctly"; it behaves differently for sure, and apparently in some cases is differently broken, and I think framing it as correct is misleading. It trades one set of broken users for a different set, and in both cases there were workarounds that could be applied.

I wouldn't have merged this change; I missed it happening, or I would have argued against it (and I think that, now that it is merged, it makes using pip as a forcing function to help prevent certain types of broken packages significantly harder). That being said, I don't really know if reverting is the right thing here. It can get even more confusing for users if behavior flip-flops a lot. If we're going to revert we should do it quickly; if not, then the current behavior should probably stand for better or worse.

I used the term "correctly" because before, pip wheel <localdir> and pip install <localdir> generated a different wheel than cd <localdir> ; setup.py bdist_wheel in some cases: with different or missing files in the presence of symlinks (#3500), or a different version with setuptools_scm (#7549), or https://github.com/pypa/pip/issues/7555#issuecomment-595180864, or #6276, or plain errors. I don't think pip 20.1 generates such bad wheels/installs, so in that sense I believe it is indeed more correct.

Of course we knew the change would break some workflows and the tradeoff has to be re-evaluated now that we have feedback, and be reverted or confirmed for good.

And it still can produce different wheels than setup.py sdist && pip install dist/*.tar.gz.

My suggestion would be to revert the PR and implement the fix by first shuffling through a sdist and then building a wheel from the resulting sdist.

This should solve all of the correctness issues except in cases that the project is incapable of correctly building a sdist and that is, IMO, not an important use case to solve for.

The trade off there is that it will be slower. However, once we have that implemented we can then further refine the PEP 517 interface to add optional APIs that enable speed ups. That will likely still never be quite as fast as this change does, but we can certainly get closer to it.

This change as is makes it effectively impossible to further push the needle on correctness without introducing performance regressions that users are unlikely to be happy about. However if we make it correct and then enhance with performance we can come to a happy middle ground that satisfies both parties.

As the old adage says, first make it correct, then make it fast. I fear with this PR we have made it fast and locked out our ability to make it correct.

I still agree validating sdists is desirable, but not at pip install time. Maybe that's a feature for a future sdist builder tool or pip build command.

Also, setup.py sdist creates .egg-info in local directory, so the reported issues with read-only source dir or concurrent builds would stay.

If it doesn’t happen at pip install time it functionally doesn’t happen until someone else’s pip install time. Skipping it just means we have multiple paths that a project can go on to go from VCS to installed package and every path is another chance for differences. This isn’t a new thing, basically every option we have that changes the install path ends up with a different set of bytes on disk, even for the most rigorous projects. Subtle differences always exist and you just hope those differences are not meaningful— or you can do what you can to remove those differences by structurally making it impossible to have them from the start.

Some performance issues could indeed reappear if/when building via sdist but they would probably be an order of magnitude lower than what we had in pip < 20.1. Indeed most of it often came out of copying .git, or a venv, or other unrelated voluminous stuff that would not be in the sdist.

Regardless of what pip eventually ends up with, could we make the other approach an option, since it's unlikely either is able to satisfy everyone? I imagine that if the current approach is kept (I don't really have an opinion on which should be the default), we should be able to provide a last-resort fallback where a user can choose to create an sdist and install the package from there.

Also, setup.py sdist creates .egg-info in local directory, so the reported issues with read-only source dir or concurrent builds would stay.

I think (at least a quick test agrees with me) that only setuptools (not distutils) does it, and this behavior is configurable to create the directory elsewhere. The same goes for other backends: we should be able to recommend that they generate sdists cleanly.

FWIW, I don't think we'd need to dump the sdist-generation --egg-info directory into the working directory, if we go down the generate-sdist-unpack-it-build-wheel approach, since we can dump that into a temporary directory, like we do for generate_metadata.

@pradyunsg does that not require a change in setuptools? Last time I checked, the sdist command did not have an option to specify the .egg-info base location, unlike egg_info, which has an --egg-base option that we leveraged in #7978.

Indeed! I was looking at the wrong file in setuptools. 🙈 I stand corrected.

Why is everything so complex in this space? :(

$  ls -la
total 8
drwxr-xr-x  3 dstufft  staff   96 May  6 14:26 .
drwxr-xr-x  9 dstufft  staff  288 Apr 28 15:46 ..
-rw-r--r--  1 dstufft  staff   85 Apr 23 16:23 setup.py

$  py setup.py egg_info --egg-base /tmp/foo sdist
/Users/dstufft/.pyenv/versions/3.8.2/lib/python3.8/site-packages/setuptools/dist.py:471: UserWarning: Normalizing '2020.04.23.3' to '2020.4.23.3'
  warnings.warn(
running egg_info
creating /tmp/foo/dstufft.testpkg.egg-info
writing /tmp/foo/dstufft.testpkg.egg-info/PKG-INFO
writing dependency_links to /tmp/foo/dstufft.testpkg.egg-info/dependency_links.txt
writing top-level names to /tmp/foo/dstufft.testpkg.egg-info/top_level.txt
writing manifest file '/tmp/foo/dstufft.testpkg.egg-info/SOURCES.txt'
reading manifest file '/tmp/foo/dstufft.testpkg.egg-info/SOURCES.txt'
writing manifest file '/tmp/foo/dstufft.testpkg.egg-info/SOURCES.txt'
running sdist
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md

running check
warning: Check: missing required meta-data: url

warning: Check: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) must be supplied

creating dstufft.testpkg-2020.4.23.3
copying files to dstufft.testpkg-2020.4.23.3...
copying setup.py -> dstufft.testpkg-2020.4.23.3
Writing dstufft.testpkg-2020.4.23.3/setup.cfg
creating dist
Creating tar archive
removing 'dstufft.testpkg-2020.4.23.3' (and everything under it)

$ ls -la                                        
total 8
drwxr-xr-x  4 dstufft  staff  128 May  6 14:28 .
drwxr-xr-x  9 dstufft  staff  288 Apr 28 15:46 ..
drwxr-xr-x  3 dstufft  staff   96 May  6 14:28 dist
-rw-r--r--  1 dstufft  staff   85 Apr 23 16:23 setup.py

https://github.com/pypa/pip/issues/8165#issuecomment-624669107 this feels like a pretty show-stopping bug; arguably not our bug, but it is one of the types of bugs that I argued would happen when building in place by default came up during the PEP 517 discussion.

bdist_wheel has been asked to clean its build directory automatically in the past. That feature should go in. Do the other distutils builds clean?

If it was SCons it would remember the files it cared about and would omit extra files in the build/ directory from the wheel even if they were present in the filesystem.

I believe the above issue doesn't just affect manylinux. It should happen any time the build directory is not specific enough to capture the ABI (in setuptools' case, it appears that platform and Python version are all that the build directory name captures). I think this extends beyond the ABI of the current interpreter too: if something links against NumPy, for instance, I think it has an ABI that will work with newer but not older NumPy, and unless that is encoded in the build directory name, this will affect uses like that too.

Automatically cleaning the build directory doesn't solve the problem, it just makes it less likely (for instance running two pip wheel invocations in parallel could still trigger the problem), on top of that one of the supposed reasons for implementing in this way (at least during the PEP 517 discussion) was that this would provide more performance by allowing caching between invocations for incremental builds. IOW the current behavior is what some subset wanted, reusing build artifacts between runs, it just so happens that by far the most common build backend gets it very wrong (and arguably in some cases doesn't have enough information to get it right without per package customization).

Of course with enough flags to the underlying setuptools command you can remedy this (something like py setup.py egg_info --egg-base /tmp/foo build --build-base /tmp/foo/build-base bdist_wheel --bdist-dir /tmp/foo/bdist would do it).
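If a caller wants to use that combination today, the command line can be generated programmatically. This sketch uses only the flags quoted in the command above; the function name is hypothetical, scratch_dir is assumed to already exist (e.g. created with tempfile.mkdtemp()), and the finished wheel still goes to bdist_wheel's default dist/ directory inside the source tree.

```python
import os
import sys


def out_of_tree_bdist_wheel_argv(scratch_dir, python=sys.executable):
    """Return a setup.py command line that keeps setuptools' intermediates
    (the .egg-info, build/ and the bdist staging directory) under scratch_dir.
    """
    return [
        python, "setup.py",
        "egg_info", "--egg-base", scratch_dir,
        "build", "--build-base", os.path.join(scratch_dir, "build-base"),
        "bdist_wheel", "--bdist-dir", os.path.join(scratch_dir, "bdist"),
    ]
```

It would be run from the project directory, e.g. subprocess.run(argv, cwd=source_dir, check=True).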

I'd reiterate, though, that the problem isn't extra files; it's that the ABI the wheel was expected to be compatible with changed, and the .so didn't get rebuilt. If SCons is smart enough to know that Python built with pymalloc needs one build directory and Python built without it needs another (including things like the NumPy versions the .so might link to), then it is not affected. If it would reuse a previously built artifact with a different ABI, then it is affected.
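A build directory that is "specific enough to capture the ABI" could be derived from a hash of everything the compiled artifact depends on. The sketch below is a hypothetical illustration, not what setuptools or SCons do: the chosen inputs (platform, interpreter cache tag, ABI flags such as the "m" that pymalloc builds carried before Python 3.8, the extension suffix) are an assumption, and anything else, like the NumPy version an extension compiles against, has to be supplied by the caller.

```python
import hashlib
import os
import sys
import sysconfig


def abi_build_dir(base, extra_abi_inputs=()):
    """Return a build directory whose name changes whenever the ABI inputs change.

    extra_abi_inputs are caller-supplied strings, e.g. "numpy==1.18".
    """
    inputs = [
        sysconfig.get_platform(),
        sys.implementation.cache_tag or "",  # may be None if caching is disabled
        getattr(sys, "abiflags", ""),  # not defined on Windows
        sysconfig.get_config_var("EXT_SUFFIX") or "",
        *extra_abi_inputs,
    ]
    digest = hashlib.sha256("\0".join(inputs).encode()).hexdigest()[:16]
    return os.path.join(base, f"build-{digest}")
```

Even with such a key, two concurrent builds with identical inputs would still share a directory, so this narrows the problem rather than removing it.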

I attempted to test enscons but I couldn't get rsalette to build at all without error.

I attempted to test scikit-build to see how it handled incremental builds, and no matter what I did it just puked on itself on the 2nd build, and I had to manually delete the _skbuild directory each time to get it to run without error.

Cool. Sorry unfortunately enscons has updated and rsalette hasn't.


Sorry unfortunately enscons has updated and rsalette hasn't.

Is there a good C ext that uses enscons that is updated? I just picked rsalette because it was first in the list and didn't feel like debugging it, happy to try with something else.

The only problem with rsalette is that it shouldn't pass ROOT_IS_PURELIB to Environment in SConstruct. It doesn't have a C extension. cryptacular looks okay.

#8165 (comment)

With this, I think I'm in agreement that we should revert this change.

I think the new problems are much bigger than anticipated. Perhaps a quick 20.1.1 to revert and then we can have a longer discussion on how to solve both problems of in-tree and out-of-tree builds?

I also vote for revert and pursuing https://discuss.python.org/t/proposal-adding-a-persistent-cache-directory-to-pep-517-hooks/2303/15 as a solution for this (that would allow backends to build out of place, so they would not expose such issues). Chime in on that thread too if you agree with the suggestion there.

That seems sensible to me as well. I think that an in-tree (or build-from-sdist) approach has some extremely significant benefits (I'm pretty sure we'll get howls of complaint from part of the user base when we revert 🙂) but the downsides are also significant.

I'm not clear what the UI should be here (default to which approach? what sort of options should we have?) but I do think we should take a bit more time to decide that, rather than making decisions while firefighting the current problems.

Alrighty! I think the general consensus is to revert and reassess. I'll file a PR for this. :)

Chime in on that thread too if you agree with the suggestion there.

Please do - I've been commenting, but I've got to the point where I don't know enough to offer meaningful suggestions, so input from people with more experience would be valuable.

Just dropped a couple of big-blobs-of-text in https://github.com/pypa/pip/issues/8165#issuecomment-625401463. I'm gonna step away for today now... I ended up feeling a bit frustrated while writing the personal notes at the end. Ending up on #5599 and reading negative user comments certainly didn't help.

Hi folks, I gave some more thought to this, here is my current point of view on this matter.

  1. Build an sdist in place, unpack the sdist in a temporary location, then build from that.

I still believe pip install / pip wheel is not the right place to try and catch bad sdists. Shouldn't it be the back-end's responsibility not to create bad sdists in the first place? Moreover, I think an unconditional build via sdist is probably as disruptive as building in place.

  2. Add a pip option to build in place.

This is the one I like best in the short term, since solution 4 did not make it. Is it premature to add it in pip 20.1.1?

  3. Update PEP 517 with some sort of mechanism to let back-ends communicate to front-ends if they are "safe" for in-place builds.

With this, pip would still need to fall back to its broken and unfixable copytree, so I'm not in favor of this one.

  4. Change pip to always build in place.

So this one is deemed too disruptive and we'll revert in 20.1.1.

  5. Change pip to build in place by default with an option to build out-of-tree.

That could be the long-term target, with the option to build out-of-tree blending into the cache directory concept?
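The copytree problem behind option 3 (and the setuptools_scm case from the issue description) can be reproduced with a minimal Python sketch. It assumes a repository whose .git directory sits above the project subdirectory; find_vcs_root and probe_after_copy are hypothetical helpers, and real tools such as setuptools_scm use their own, more elaborate discovery logic.

```python
import shutil
import tempfile
from pathlib import Path


def find_vcs_root(start):
    """Walk up from `start` looking for a .git entry, roughly the way
    VCS-based versioning plugins locate the repository (simplified)."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / ".git").exists():
            return candidate
    return None


def probe_after_copy(project_dir):
    """Mimic an out-of-tree build: copy only the project subtree to a
    temporary directory, then search for the repository from the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp, "build")
        shutil.copytree(project_dir, copy)
        return find_vcs_root(copy)
```

With a layout such as repo/.git plus repo/python/pyproject.toml, find_vcs_root on repo/python finds repo, while probe_after_copy on the same directory typically returns None: the copied subtree no longer has a parent that contains .git, which no amount of copying just the subtree can fix.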

I really don't like CLI options, particularly ones like this. What if I list two packages that are on my local FS and I need to build one in place and one not? If we provide an option to do one or the other there are going to end up existing packages that can only be built with one or the other.

It also smells to me like the kind of option that exists entirely because a project was unable to come to a decision, and decided to just push that decision off to the end user.

Building via sdist isn't exactly about catching bad sdists. In large part what it is about is reducing the possible variations of "path" that a project can go through before ending up installed. Adding a flag in some ways makes that problem worse not better.

To illustrate what I mean, we have a few "paths" that installs can go through:

  1. VCS -> Sdist -> Wheel -> Installed
  2. VCS -> Wheel -> Installed
  3. VCS -> Installed (Legacy)

There are some additional paths that are sort of being phased out, but in general those are our three (and ideally #3 gets phased out too). There are also editable installs, but they're not going away anytime soon.

We can consider an sdist or a wheel that is uploaded to PyPI and installed from there to still be part of the same "path"; you're just pausing it and finishing it on another computer.

The problem with having multiple "paths" like this is that it introduces inconsistencies in the final installed result. They don't always happen, but it is easy to observe that they happen frequently. Often these inconsistencies are no big deal, but sometimes they are.

If we're doing in place builds like this, then we are effectively saying we're never going to be able to collapse down to a single path, and we're just always going to have to deal with this weird edge case where sometimes people will get different results based on how the install was done.

As an added benefit, this can also act as a forcing function to help ensure that the happy path stays happy.

Mostly I agree with @dstufft, and in particular I agree that the build-from-sdist approach should be viewed not as "trying to validate sdists" but as "everything follows the source tree -> sdist -> wheel -> install route" (just some things skip some initial steps).
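To make the "everything goes through an sdist" route concrete, here is a minimal Python sketch of both routes using the PEP 517 hook names build_sdist and build_wheel. It is a sketch under stated assumptions: hooks are called in-process for brevity (real front ends run them in a subprocess), and ToyBackend is a hypothetical backend whose sdist only ships files listed in a MANIFEST file, which shows how two routes from the same tree can yield different wheels.

```python
import os
import tarfile
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def chdir(path):
    """PEP 517 hooks run with the source tree as the working directory."""
    old = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)


def build_in_place(source_dir, backend, wheel_dir):
    """Route: source tree -> wheel."""
    with chdir(source_dir):
        return backend.build_wheel(str(wheel_dir))


def build_via_sdist(source_dir, backend, wheel_dir):
    """Route: source tree -> sdist -> unpacked sdist -> wheel."""
    with tempfile.TemporaryDirectory() as tmp:
        sdist_dir = Path(tmp, "sdist")
        unpack_dir = Path(tmp, "unpacked")
        sdist_dir.mkdir()
        unpack_dir.mkdir()
        with chdir(source_dir):
            sdist_name = backend.build_sdist(str(sdist_dir))
        with tarfile.open(sdist_dir / sdist_name) as tf:
            tf.extractall(unpack_dir)
        # An sdist unpacks to a single top-level "{name}-{version}" directory.
        (tree,) = [p for p in unpack_dir.iterdir() if p.is_dir()]
        with chdir(tree):
            return backend.build_wheel(str(wheel_dir))


class ToyBackend:
    """Hypothetical backend: the sdist ships only the files listed in
    MANIFEST, but build_wheel packs every *.py file in the cwd."""

    base = "demo-1.0"

    def build_sdist(self, sdist_directory, config_settings=None):
        listed = Path("MANIFEST").read_text().split()
        name = self.base + ".tar.gz"
        with tarfile.open(Path(sdist_directory, name), "w:gz") as tf:
            for f in ["MANIFEST", *listed]:
                tf.add(f, arcname=f"{self.base}/{f}")
        return name

    def build_wheel(self, wheel_directory, config_settings=None,
                    metadata_directory=None):
        name = f"{self.base}-py3-none-any.whl"
        with zipfile.ZipFile(Path(wheel_directory, name), "w") as zf:
            for f in sorted(Path(".").glob("*.py")):
                zf.write(f, f.name)
        return name
```

If the source tree holds mod.py (listed in MANIFEST) and a stray scratch.py (not listed), build_in_place produces a wheel containing both files while build_via_sdist produces one containing only mod.py. That divergence is the kind of path-dependent inconsistency described above; always going through the sdist collapses the two results into one.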

However, I do want to pick up on one point:

What if I list two packages that are on my local FS and I need to build one in place and one not?

Just install the two packages in two separate runs of pip with different options?!? I know it's possible that one is a dependency of the other, and your point in general holds, but there does seem to be a general inclination for people to assume that every install scenario has to be collapsed into a single run of pip, and I don't think that's reasonable (we've had perfectly good workarounds for problems rejected by users because "it means I'd have to split my list of requirements in two").

Note that when (if) we revert, we'll need to reopen issues like #6276 which was closed as a result of implementing in-tree builds.

Part of the problem is pip doesn't consider what is already installed when resolving dependencies (I'm not sure if the new resolver work changes that?) so you need to have everything contained within a single pip invocation if you want it to resolve dependencies "correctly" (to the extent our current resolver does anything correctly).

If the new resolver takes what is already installed into account then pip install foo bar and pip install foo && pip install bar would be roughly equal and not matter at all, but if it doesn't (and the same is roughly true now) if both projects depended on "spam" but foo required < 2 and bar required > 1 then we'd get an invalid install.
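A toy model of this scenario (not pip's actual resolver) makes the difference concrete. It assumes a hypothetical index where spam has only versions 1.0 and 2.0, with foo requiring spam<2 and bar requiring spam>1. Each separate run checks only its own requirement and keeps an already-installed spam if it still fits; a single run sees both constraints at once.

```python
# Hypothetical index: the only published versions of "spam".
SPAM_VERSIONS = [(1, 0), (2, 0)]


def foo_needs(v):  # foo requires spam < 2
    return v < (2, 0)


def bar_needs(v):  # bar requires spam > 1
    return v > (1, 0)


def resolve(constraints, installed=None):
    """Keep the installed spam if it satisfies every constraint, else pick
    the newest version that does; return None when nothing fits."""
    if installed is not None and all(c(installed) for c in constraints):
        return installed
    ok = [v for v in SPAM_VERSIONS if all(c(v) for c in constraints)]
    return max(ok) if ok else None


# `pip install foo && pip install bar`: the second run only checks bar.
after_foo = resolve([foo_needs])
after_bar = resolve([bar_needs], installed=after_foo)

# `pip install foo bar`: one run sees both constraints.
joint = resolve([foo_needs, bar_needs])
```

Here after_foo is (1, 0), after_bar upgrades spam to (2, 0) and silently breaks foo, while joint is None, i.e. the single run can detect and report the conflict instead of leaving an invalid install.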

That's a tangent though :)

(I'm not sure if the new resolver work changes that?)

Inputs welcome on #7744. :)

  4. Change pip to always build in place.

So this one is deemed too disruptive and we'll revert in 20.1.1.

To be clear, it's also that we "rolled it out too fast" and the rollout approach we took is definitely part of why this ended up being too disruptive.

  2. Add a pip option to build in place.

@dstufft @pfmoore I see that kind of option as an opt-in mechanism so we can progressively nudge users towards in-place builds with a goal of making it the default at some point. In the spirit of this comment: https://github.com/pypa/pip/issues/8165#issuecomment-625501216

I'll file a PR for this. :)

#8221 it is.

20.1.1 has been released, containing the reverted changes.

In Fedora, when we build Python RPM packages, we are dinosaurs and the standard way to do things is to use python setup.py build. With PEP 517 we added a "provisional" way of using pip wheel instead. However, with extension modules we have a problem with the "move sources to tmp, build from there" approach that pip uses to build them.

Our build machinery injects some compiler flags so that the build artifacts (.so extension modules in this case) contain metadata about their sources. Later, a shell script traverses the build artifacts, extracts this information and copies the sources to /usr/src/debug to be installed via a special *-debugsource RPM. The machinery expects everything to be built within the working tree and doesn't work well when the build happens outside of it. Here are the things that can be done (together) to mitigate the problem on our side:

1. set the `$TMPDIR` environment variable to have it within the place where the RPM script expects it (i.e. `export TMPDIR=%{_builddir}/.tmp` (and create it))

2. use `pip wheel` with the `--no-clean` option to keep the copied sources in `$TMPDIR`

3. run some shell kung fu to rewrite the "what is my source" information to the correct location: `find %{buildroot} -iname '*.so' -print0 | xargs --no-run-if-empty -0 -n1 /usr/lib/rpm/debugedit -b "%{_builddir}/.tmp/pip-req-build-"* -d "$PWD"`

4. (Optional: clean `$TMPDIR` manually.)
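Steps 1-3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: pip_wheel_invocation and debugedit_invocations are hypothetical helper names, it only builds the command lines (running them, e.g. with subprocess.run(cmd, env=env, check=True), is left to the caller), and it assumes exactly one pip-req-build-* copy exists, whereas the shell glob in step 3 would expand to every match. The debugedit path and the pip-req-build- prefix come from the steps above.

```python
import os
import sys
from pathlib import Path


def pip_wheel_invocation(builddir):
    """Steps 1-2: put TMPDIR inside the RPM build dir and keep pip's copy."""
    tmpdir = Path(builddir, ".tmp")
    tmpdir.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ, TMPDIR=str(tmpdir))
    # --no-clean keeps the copied sources (pip-req-build-*) under $TMPDIR
    cmd = [sys.executable, "-m", "pip", "wheel", "--no-clean", "."]
    return cmd, env


def debugedit_invocations(builddir, buildroot, source_dir,
                          debugedit="/usr/lib/rpm/debugedit"):
    """Step 3: one debugedit call per *.so (like xargs -n1), rewriting the
    pip-req-build-* prefix in the debug info to the real source dir."""
    copies = sorted(Path(builddir, ".tmp").glob("pip-req-build-*"))
    if len(copies) != 1:
        raise RuntimeError(f"expected one pip-req-build-* dir, got {len(copies)}")
    # find -iname '*.so' matches case-insensitively
    shared_objects = sorted(
        p for p in Path(buildroot).rglob("*")
        if p.is_file() and p.name.lower().endswith(".so")
    )
    return [
        [debugedit, "-b", str(copies[0]), "-d", str(source_dir), str(so)]
        for so in shared_objects
    ]
```

Setting TMPDIR works here because pip creates its temporary build directories through Python's tempfile module, which honors that variable.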

This is effectively the path I started down when looking into pip integration, #6505, etc.

Iterative builds with pip are effectively broken today, which is a major loss for groups that have a large amount of python code in C extension form, hence I resorted to invoking the build with setup.py.

pip needs a build command, and the end result of that build command should be passable to other subcommands, like wheel, install, etc.

Right now pip effectively treats install as build and install, which is _not_ what some folks want who are building and caching binary artifacts, installing binaries over read-only mounts, etc.

I really wish there was a way to use setup.py to build the binaries, then pip install them without resorting to creating a bdist, but that doesn't seem to be possible today, since pip and distutils/setuptools don't agree on where to find the binary artifacts.


I'm not sure I follow - you're saying you want a way to use binaries but don't want to use the binary distribution formats that exist already. Why is that?


The bdist formats are extremely limiting. My group has to resort to a dumb format, like tar, then unpack it verbatim (none of the BSDs are supported, Debian isn't supported, etc).

What I discovered last night is that using a dumb bdist isn't installable via pip. The dumb binaries lack the metadata required to be installed via pip, AFAICT, which is where I guess pip wheels come into play.

I tried egg and zip, as well, but they lack the metadata needed to install using just a file:// URI.
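A rough way to see why pip accepts a wheel but not a "dumb" archive: the wheel format requires a {name}-{version}.dist-info directory containing METADATA, WHEEL and RECORD. The simplified check below (has_wheel_metadata is a hypothetical helper) only looks for those files; pip's own validation is stricter.

```python
import zipfile

REQUIRED = ("METADATA", "WHEEL", "RECORD")


def has_wheel_metadata(archive_path):
    """Return True if the zip has a top-level *.dist-info directory holding
    the METADATA, WHEEL and RECORD files the wheel format requires."""
    with zipfile.ZipFile(archive_path) as zf:
        names = set(zf.namelist())
    dist_info_dirs = {
        n.split("/", 1)[0] for n in names
        if "/" in n and n.split("/", 1)[0].endswith(".dist-info")
    }
    return any(
        all(f"{d}/{f}" in names for f in REQUIRED) for d in dist_info_dirs
    )
```

A dumb archive that only contains paths like usr/lib/python3/site-packages/pkg/__init__.py fails this check, while a zip that also carries pkg-1.0.dist-info/METADATA, WHEEL and RECORD passes it.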

I've been screwing around trying to shoehorn building via distutils/setuptools into a larger build system using make, so I can't say whether I've done "all the right things" to make things work the way a standard bdist call would have.

Coming from https://github.com/pypa/pip/issues/2195#issuecomment-664728481 I can say that I'm more than happy to re-do #7882 behind --use-feature=in-tree-build.

Hurrah! Sounds like a plan!

Let's also update --build's docstring this time. ;)

Coming from #2195 (comment) I can say that I'm more than happy to re-do #7882 behind --use-feature=in-tree-build.

I'm curious whether, in addition to the command line, it would be reasonable to have an in-tree-build option set in pyproject.toml. This would be pretty nice for resolving #6276 without needing to make a bash script or makefile to wrap pip. (Not that that's a particularly big issue.)

have an in-tree-build option set in pyproject.toml

@davidhewitt this is more or less option 3 in the original description of this issue. My understanding is that the current consensus is that it is better to avoid an additional option if we can. Hence the idea of enabling in-tree builds with --use-feature during a transition period, with the longer-term goal of making it the default and only mechanism.

BTW, I will not be able to implement this in time for 20.3, but I still intend to do it, hopefully in 20.4.

@sbidoul I wrote a patch to help get this feature in - see #9091.
