Pipenv: Investigate pip-faster & venv-update

Created on 3 Jan 2018 · 8 comments · Source: pypa/pipenv

we could potentially depend on them, or steal from them

Cool to see this considered. I (and/or the people I've tagged below) may be able to assist with integration and/or working out the edges here.

CC @bukzor @chriskuehl

@asottile Have you had much opportunity to look at what could be integrated back into pip itself? Or into the libraries that pip depends on (packaging, distlib, etc.)?

(I know that pushing performance improvements upstream can be a pain, since you need to account for use cases that an opt-in alternative tool can ignore, but the recent move to hide all of pip's internal APIs should grant a lot more freedom to refactor things, and the approval of the pyproject.toml PEPs means alternative implementations are either going to need to inherit pip's support for those, or else implement their own version of it)

@ncoghlan Yes and no: some of the parts I believe are upstreamable, others we've tried and gotten a firm no on. Here are the bits that make it work:

  1. Avoid network calls when a requirement is `==`-pinned and a cached copy is available on disk. This is the one upstream pip has said no to, and in my mind they have a fair point. pip-faster skips the round trip to PyPI to "verify" that this version is in fact available (it knows it is, because it is on disk). The pushback from upstream is that this violates the expectation that a version removed from PyPI (for security or other reasons) should no longer be installable. pip-faster decided to trade that off for huge benefits in low-bandwidth situations (and for working _at all_ with zero bandwidth). Understandably, this piece will probably not be upstreamable without a change of heart or some hardening from a security perspective (maybe some sort of clever hashing scheme?). This was, I believe, the last attempt at upstreaming this optimization: https://github.com/pypa/pip/pull/2114
  2. `--prune`: pip-faster removes unneeded packages and syncs the installed state to a specific list of requirements. I think this is the best deliverable from the pip-faster and venv-update suite. It does some clever tracing of the install requirements and uninstalls whatever is not needed. If anything, I think this is the component that would have the most value to the community if upstreamed as a `pip install --prune -r ...` (or `--pipfile ...`, or whatever) option. The current implementation has one _major_ flaw outlined here. I wouldn't say it is unfixable (iirc I've outlined in a comment on that issue the exact place in pip where the smart stuff would need to happen; it just needs someone with slightly more free time than I have and a motive to do it!).
  3. Dependency resolution (well, kind of). pip-faster traverses the dependencies, determines whether conflicts exist, and raises them as errors at install time. This is another piece that could be upstreamed, but again it's difficult. The current approach leans heavily on pip and `pkg_resources` internals to accomplish this traversal, though it exposes the result in a more meaningful way than I've seen in other tools (pip-tools, `pkg_resources` alone), and at least exposes this information at all. This is in the direction of https://github.com/pypa/pip/issues/988, though venv-update's version is very primitive (it mostly verifies a resolution rather than _resolving_ one).
  4. Virtualenv invalidation. Actually, this is all that venv-update really does now that we've pushed most of the responsibilities into pip-faster. It detects when the interpreter used to build the virtualenv has changed and recreates the environment. For instance: a virtualenv is created with Python 3.6.3, then I install an update from deadsnakes that ships 3.6.4. The virtualenv _may_ still work, but it may not, due to breaking changes in imports from builtin modules in the stdlib (traces like this). venv-update saves some metadata and uses it to invalidate the environment when a full virtualenv rebuild is necessary.
  5. Bootstrapping a virtualenv from ~nothing. This is another part that I think is useful, but I don't know where it would have a home. I've made a mini split-out version of this and proposed something similar to PyPA, but unfortunately heard nothing back. `create_bootstrap_script` in virtualenv itself hasn't worked for a while and no longer seems to be maintained.
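
At its core, the `--prune` idea in item 2 is a set computation: walk the dependency graph from the requested top-level requirements, then uninstall whatever is installed but unreachable. Below is a minimal sketch, assuming the installed distributions and their dependencies have already been collected into a plain dict. The `packages_to_prune` name and the `graph` input are hypothetical, not pip-faster's API (which, per the comment, goes through pip internals), and the sketch assumes names are already normalized while ignoring extras and environment markers.

```python
from collections import deque


def packages_to_prune(graph, requested, keep=("pip", "setuptools", "wheel")):
    """Return installed packages that are not reachable from `requested`.

    graph: hypothetical mapping of installed package name -> dependency names
           (assumed already normalized; extras and markers ignored).
    requested: top-level requirement names the environment should contain.
    keep: tooling packages that are never pruned.
    """
    reachable = set()
    queue = deque(name for name in requested if name in graph)
    while queue:
        name = queue.popleft()
        if name in reachable:
            continue  # already visited (also guards against cycles)
        reachable.add(name)
        # Only follow dependencies that are actually installed.
        queue.extend(dep for dep in graph[name] if dep in graph)
    return sorted(set(graph) - reachable - set(keep))


installed = {
    "requests": ["urllib3", "idna", "certifi", "charset-normalizer"],
    "urllib3": [],
    "idna": [],
    "certifi": [],
    "charset-normalizer": [],
    "six": [],
    "pip": [],
}
print(packages_to_prune(installed, ["requests"]))  # ['six']
```

In a real tool the graph would be built from installed metadata (for example `importlib.metadata.requires()` on Python 3.8+), with extras and environment markers evaluated; this sketch deliberately skips that part.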

pip-faster itself is a thin wrapper (well, a monkeypatch, ok; and yes, the internal module reshuffling makes this more difficult to maintain in the long term) around vanilla pip. In theory, whatever pip supports, pip-faster also inherits (whether that's Pipfile, pyproject.toml, etc.).
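
Because pip-faster works by monkeypatching pip, the underlying pattern is a context manager that swaps attributes on a module and restores them afterwards. Here is a minimal sketch of that pattern; the `patched` helper and the `fake_pip` module are hypothetical stand-ins, not pip-faster's actual code. It reads and writes with `getattr`/`setattr`, so it also works on module objects, which do not support `obj[attr]` subscripting (the `TypeError` in the first traceback in this thread is that kind of failure).

```python
import contextlib
import types

_MISSING = object()  # sentinel for "attribute did not exist before patching"


@contextlib.contextmanager
def patched(target, **updates):
    """Temporarily replace attributes on `target` (a module, class or instance).

    Hypothetical helper: uses getattr/setattr instead of subscripting,
    so it works on module objects.
    """
    originals = {name: getattr(target, name, _MISSING) for name in updates}
    applied = []
    try:
        for name, value in updates.items():
            setattr(target, name, value)
            applied.append(name)
        yield target
    finally:
        # Undo in reverse order; drop attributes that did not exist before.
        for name in reversed(applied):
            if originals[name] is _MISSING:
                delattr(target, name)
            else:
                setattr(target, name, originals[name])


# Demo on a throwaway module standing in for a pip submodule.
fake_pip = types.ModuleType("fake_pip")
fake_pip.install = lambda: "vanilla install"

with patched(fake_pip, install=lambda: "faster install"):
    print(fake_pip.install())  # faster install
print(fake_pip.install())  # vanilla install
```

Restoring in a `finally` block means vanilla pip behaviour comes back even if the patched code raises, which matters when the wrapper is used inside a longer-running process.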

I'd love for pip-faster / venv-update to just *poof* eventually (by being made obsolete by upstream contributions). If pip / pipenv is the answer to where these features should end up, I'd be happy to help make that happen.

Happy to go into more detail; sorry if this was rambly (late-night brain dump) :) Also, we'd love to get some pointers on which pieces are desirable and how we can go about upstreaming them. pip hasn't seen a release in quite some time, and there are a lot of things we'd love to contribute.

Thanks @asottile. Re-using your numbers:

  1. This sounds like something pipenv could adopt for the `--keep-outdated` option when updating the lock file, and for the new `pipenv sync` subcommand (see https://github.com/pypa/pipenv/issues/1255 for discussion of both of those proposals). For pip itself, it might be worth asking about a `--no-removal-check` option (similar to the existing `--disable-pip-version-check` option).
  2. This sounds a lot like what I'm proposing for the pipenv sync subcommand as part of #1255 (I hadn't really thought about an implementation strategy at all yet, I was just handwaving based on the assumption that if pip-sync can do it, so can pipenv).
  3. For pipenv, I'm hoping we can move our conflict detection to `pipenv lock`, and then have the installation commands always start by updating the lock file first (pondering that problem is how I came to write the initial design proposal in #1255 in the first place).
  4. This could potentially have an upstream home in `pipenv sync` (we'd just be syncing the virtualenv with the host Python in addition to syncing it with the lock file).
  5. PEP 405 (`python -m venv` being available by default) + PEP 453 (`python -m pip` being available by default) took quite a bit of pressure off the bootstrapping experience in general (at least for new Python-3-only users), but there are definitely still non-trivial problems with it. It's especially noticeable in the pipenv tutorial, where we ended up relying on `pip install --user pipenv` as our cross-platform bootstrap mechanism, since we decided the alternatives were even worse from a UX perspective: https://packaging.python.org/tutorials/managing-dependencies/#installing-pipenv

    That said, there is a curl-based option for systems that ship with curl: https://docs.pipenv.org/install/#crude-installation-of-pipenv
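
Virtualenv invalidation (item 4 in both lists) reduces to storing a fingerprint of the interpreter when the environment is created, then comparing it on later runs. Here is a minimal sketch, meant to be run by the same host interpreter that would (re)create the environment. The marker file name and helper names are hypothetical, and venv-update's real metadata format may differ. Because the fingerprint includes the full `sys.version` string and `sys.hexversion`, an upgrade such as 3.6.3 to 3.6.4 changes it.

```python
import json
import sys
import tempfile
from pathlib import Path

MARKER = "interpreter-fingerprint.json"  # hypothetical marker file name


def interpreter_fingerprint():
    """Describe the running interpreter precisely enough to notice upgrades."""
    return {
        "version": sys.version,  # full version string, incl. patch level and build
        "hexversion": sys.hexversion,
        "executable": str(Path(sys.executable).resolve()),
    }


def record_fingerprint(venv_dir):
    """Store the fingerprint; call right after creating the virtualenv."""
    marker = Path(venv_dir) / MARKER
    marker.write_text(json.dumps(interpreter_fingerprint(), sort_keys=True))


def needs_rebuild(venv_dir):
    """True if the marker is missing/unreadable or the interpreter has changed."""
    marker = Path(venv_dir) / MARKER
    try:
        recorded = json.loads(marker.read_text())
    except (OSError, ValueError):  # missing file or corrupt JSON
        return True  # no trustworthy metadata: rebuilding is the safe choice
    return recorded != interpreter_fingerprint()


# Demo with a temporary directory standing in for a virtualenv.
with tempfile.TemporaryDirectory() as env_dir:
    print(needs_rebuild(env_dir))  # True: nothing recorded yet
    record_fingerprint(env_dir)
    print(needs_rebuild(env_dir))  # False: same interpreter
```

Treating missing or corrupt metadata as "rebuild" errs on the side of a slower but correct environment, which matches the trade-off described for venv-update.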

Do note that, at this point in time, our vendored version of pip (i.e. v9) will remain in use for quite some time after v10 has been released and installed by end users. pip-tools relies heavily on functionality provided by the pip 9 API that has been removed or refactored out of pip 10.

I tried experimenting with pip-faster, but I keep getting this error:

```
Traceback (most recent call last):
  File "t.py", line 4, in <module>
    pip_faster.main()
  File "/Volumes/KR/Library/Mobile Documents/com~apple~CloudDocs/repos/pypa/pipenv/pipenv/vendor/pip_faster.py", line 511, in main
    with pipfaster_install_prune_option():
  File "/usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/Volumes/KR/Library/Mobile Documents/com~apple~CloudDocs/repos/pypa/pipenv/pipenv/vendor/pip_faster.py", line 415, in patched
    orig = patch(attrs, updates.items())
  File "/Volumes/KR/Library/Mobile Documents/com~apple~CloudDocs/repos/pypa/pipenv/pipenv/vendor/pip_faster.py", line 407, in patch
    orig[attr] = attrs[attr]
TypeError: 'module' object is not subscriptable
```

And when installing it directly into the virtualenv:

```
Traceback (most recent call last):
  File "/Volumes/KR/.local/share/virtualenvs/pipenv-Uf7eyyXP/bin/pip-faster", line 7, in <module>
    from pip_faster import main
  File "/Volumes/KR/.local/share/virtualenvs/pipenv-Uf7eyyXP/lib/python3.6/site-packages/pip_faster.py", line 27, in <module>
    import pip as pipmodule
  File "/Volumes/KR/.local/share/virtualenvs/pipenv-Uf7eyyXP/lib/python3.6/site-packages/pip/__init__.py", line 9, in <module>
    from pip.log import logger
  File "/Volumes/KR/.local/share/virtualenvs/pipenv-Uf7eyyXP/lib/python3.6/site-packages/pip/log.py", line 9, in <module>
    from pip._vendor import colorama, pkg_resources
  File "/Volumes/KR/.local/share/virtualenvs/pipenv-Uf7eyyXP/lib/python3.6/site-packages/pip/_vendor/pkg_resources.py", line 1479, in <module>
    register_loader_type(importlib_bootstrap.SourceFileLoader, DefaultProvider)
AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'
```
