Pipenv: shouldn't pipenv rather depend on the packages in vendor than provide them directly?

Created on 30 Mar 2018 · 14 comments · Source: pypa/pipenv

Hello,

I'm currently packaging pipenv for Debian, and I wonder what the purpose of the vendor directory and its contents is. I understand the purpose of patched, but shouldn't pipenv rather depend on those packages in vendor (e.g. via install_requires in setup.py) than provide them itself?
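For illustration, the install_requires route would look something like this hypothetical fragment (the package names and version ranges here are just examples, not pipenv's actual dependency list):

# setup.py -- hypothetical sketch, not pipenv's real dependency list
from setuptools import setup, find_packages

setup(
    name="pipenv",
    packages=find_packages(),
    install_requires=[
        "requests>=2.18,<3",   # instead of pipenv/vendor/requests
        "click>=6.7,<7",       # instead of pipenv/vendor/click
        "colorama>=0.3,<0.4",  # instead of pipenv/vendor/colorama
    ],
)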

From a packaging point of view this is also very hard as I'll probably have to un-bundle the vendor packages one by one and make pipenv depend on the corresponding Debian packages.

Assuming the packages in vendor are unpatched, would you accept pull requests that try to un-bundle those packages in vendor?

Discussion

All 14 comments

Pipenv vendors because it is intended to be installed globally, and depending on that many packages would make the Python installation susceptible to dependency hell. I don't know why not all of the dependencies are vendored, but from past discussions it seems Kenneth is quite happy with how things currently are.

As a side note, how does the pip package on Debian (python-pip) handle this? pip also vendors quite a lot of packages.

@venthur We primarily vendor things to avoid system-wide installation / dependency resolution of specific versions (e.g. pip 9), and to ship as much of what pipenv needs to run by default with pipenv itself.

Happy to discuss further if you have specific needs as you move forward; if you need to incorporate something like what pip does, that might be an option as well.

Thanks for the replies so far. I'm a bit concerned about this strategy.

While I can understand the pragmatic benefits (and the necessity for patched), it is a bad idea for many reasons. First of all, if other packages followed this example we'd soon have tons of duplicated packages inside a single package. It is already starting with pipenv, which vendors pip, which in turn vendors packages that pipenv has already vendored:

$ tree -d -L 3 pipenv/vendor/
pipenv/vendor/
├── backports
│   ├── shutil_get_terminal_size
│   └── weakref
├── blindspin
├── click
├── click_didyoumean
├── colorama                 # <<<
├── iso8601
├── jinja2
├── Levenshtein
├── markupsafe
├── pexpect
├── pip9
│   ├── commands
│   ├── compat
│   ├── models
│   ├── operations
│   ├── req
│   ├── utils
│   ├── vcs
│   └── _vendor
│       ├── cachecontrol
│       ├── colorama         # <<<
│       ├── distlib
│       ├── html5lib
│       ├── lockfile
│       ├── packaging
│       ├── pkg_resources
│       ├── progress
│       ├── requests         # <<<
│       └── webencodings
├── pipreqs
├── ptyprocess
├── pytoml
├── requests                 # <<<
│   └── packages
│       ├── chardet
│       └── urllib3
├── requirements
├── shutilwhich
└── yarg

pipenv therefore ships the requests and colorama packages twice. Additionally, almost all packages from pipenv/vendor/pip9/_vendor are also present in pipenv/patched/notpip/_vendor.

The problems will start once you need to update one of these dependencies (e.g. for a security issue in the requests package) -- normally you'd just update the library and every package depending on it benefits automatically, but with this setup that is impossible, as you have to keep track of all your vendored copies (and your vendors' vendored copies!).

Space is another issue: several versions of the same library installed by the same package is a waste of space.

And last but not least: this is the Python Packaging Authority, right? A lot of people look at this code and will take it as an example of best practice -- as things stand, we might not be doing the PyPA the best service.

I do respect the author's decision to vendor, but if you'd be willing to replace the unpatched vendored packages with normal (i.e. install_requires) dependencies, I'll gladly help and provide patches.

@venthur In many cases (e.g. pip) we simply can't, because that would require pinning users to pip 9. We have a hard dependency on pip 9 until we can rework some of the resolver logic, for example.

The security argument may be valid; I might need to tag in @ncoghlan for more input. In the worst case we could provide a clearer path to upgrading our vendored dependencies.

From the discussion above, I think the PyPA should recommend a tool that installs these global packages into their own isolated environments, like homebrew and pipsi do. Then we could unvendor everything and wouldn't need to worry about dependency hell.

Just a rough suggestion.

@frostming We are not in a position to speak for the packaging authority as a whole, but we are certainly not about to suggest that everyone must install every dependency of pipenv in its own isolated environment -- not sure if that's what you mean. IMO the real concern here is the security one, but if it is a concern here, surely it would be a concern with pip etc. too, right? How does Debian manage that process? I assume the packages are debundled by the process outlined in the link I posted above, so that whoever maintains the Debian pip package can just make it depend on other packages without vendoring them.

For the record, pip does this by automating the vendoring process. It maintains an equivalent of requirements.txt (pip/_vendor/vendor.txt), tooling to automatically pull in packages at the versions pinned there, and debundling support in pip/_vendor/__init__.py. It also manages its vendored (and patched) libraries very carefully to keep them as up to date as possible.
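As a rough illustration of that debundling support -- a simplified sketch of the pattern, not pip's exact code -- the idea is that when a distro strips the vendored copies, imports fall back to the system-installed packages:

# Simplified sketch of the pip/_vendor/__init__.py debundling idea
import sys

DEBUNDLED = False  # a distro flips this after stripping the vendored copies

def vendored(modulename):
    vendored_name = "pip._vendor." + modulename
    try:
        __import__(vendored_name)
    except ImportError:
        # No vendored copy: import the system-wide package and alias it,
        # so "from pip._vendor import requests" keeps working.
        __import__(modulename)
        sys.modules[vendored_name] = sys.modules[modulename]

if DEBUNDLED:
    vendored("requests")
    vendored("packaging")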

Regarding a command installer recommendation, we actually have an open issue against PyPUG for that: https://github.com/pypa/python-packaging-user-guide/issues/406

However, these tools collectively have a problem with their reliance on PATH being configured correctly, and that isn't a universally reliable assumption: https://github.com/pypa/python-packaging-user-guide/issues/396

There also isn't a way for pipenv to indicate that it depends on pew's CLI, not just its Python API, so pipsi install pipenv doesn't actually work properly - you have to do pipsi install pew as well (the project received enough bug reports about this that Kenneth eventually just removed the pipsi based installation section).

So vendoring is still a good pragmatic choice at pipenv's level, simply because the alternatives are fragile for anyone that doesn't already have a robust Python development environment set up, and that's an audience pipenv specifically aims to be suitable for.

That means that if we want to set a good example, the way I'd suggest we approach it would be:

  • adopt (as far as is practical) pip's unbundling-friendly approach to vendoring dependencies
  • look at setting up pyup.io in a way that allows it to check the vendored dependencies for security issues (using PyUp to check pipenv itself seems especially appropriate, given that pipenv check depends on their API)

The second depends on the first, though, since https://pyup.io/docs/bot/config/ needs to be pointed at a pinned requirements file to keep up to date.
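Concretely, the pinned file could mirror pip's _vendor/vendor.txt format -- a hypothetical example, with illustrative pins only:

# pipenv/vendor/vendor.txt -- hypothetical; the pins below are illustrative
click==6.7
colorama==0.3.9
requests==2.18.4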

The pipenv/patched directory is harder to deal with when it comes to actually doing updates, but it is at least amenable to being checked for security vulnerabilities by maintaining a requirements.txt-style file stating the versions that were forked to apply the downstream patches (such a file is also a good place to track the upstream issue reports that need to be resolved before each private fork can be dropped).
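Such a tracking file might look like this (hypothetical file name, versions, and comments):

# pipenv/patched/patched.txt -- hypothetical tracking file
# package==version that was forked   # reason, and which upstream fix unblocks removal
pip==9.0.1    # forked as notpip; resolver patches not yet merged upstream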

I understand the argument for the packages in patched, but I still don't follow the argument for the vendor directory in general.

My main concern is still security: if your dependencies also handle their dependencies by vendoring them (and theirs), you end up with multiple versions of the same package inside your vendor directory (e.g. the requests package). If one of those packages needs an update, you'll have a hard time fixing all of them. For downstream maintainers (Debian, SUSE, etc.) it is even worse, as we usually just update the problematic library and assume the bug is fixed for every package depending on it. Now we actively have to search for copies of that library everywhere.
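To illustrate the burden: even just finding all the embedded copies takes a script. A rough sketch, assuming the source layout shown earlier in this thread:

# find_vendored_copies.py -- rough sketch: report packages embedded more
# than once under any vendor/ or _vendor/ directory in the source tree
import collections
import os

copies = collections.defaultdict(list)
for root, dirs, _files in os.walk("pipenv"):
    if "vendor" not in root:  # matches both vendor/ and _vendor/ paths
        continue
    for d in dirs:
        copies[d].append(os.path.join(root, d))

for name, paths in sorted(copies.items()):
    if len(paths) > 1:
        print(name, "->", ", ".join(paths))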

On a side note, it is also a bit ironic that tools like pip and pipenv, which help solve the problem of dependency resolution in Python, don't trust the system themselves and ship their dependencies directly.

On the other hand, I'll probably not change the author's mind on this decision, so let's find a solution that makes pipenv digestible for distributions.

  • We could try to minimize the number of vendored packages and depend on most of them via install_requires -- the remaining ones could stay vendored until the required patches have been merged upstream. That would be the cleanest solution, but probably also the most unlikely.
  • The next option is to follow the pip route and make the vendoring optional. This will make things a bit harder for all of us, though, as we (PyPA and Debian, Ubuntu, etc.) will receive a lot of bug reports that are hard to track down because the versions of the installed dependencies have diverged.

I've already prepared and uploaded a Debian package for pipenv, but I'm fairly certain the Debian ftp-masters will not accept it as-is because of the vendored packages. So I will have to start packaging the dependencies for Debian as well, and then work on some un-bundling process.

There is a chicken-and-egg problem: pipenv needs packages to run, but those packages can evolve and break pipenv. I feel there is no easy solution other than bundling. PBR has the same problem: upon install it is actually run by the system Python, so it has no control over its own dependencies.
I would however recommend that pipenv be stripped of all its vendored packages only when it is packaged and distributed by a Linux distribution such as Debian. It can run its own non-regression tests and provide a pipenv that works with the right system Python packages, under the control of the Debian maintainers.
One should never install a Python package with pip install without --user for this reason. But if one does pip install --user pipenv, the official pipenv with all vendored packages is installed from PyPI.
But this requires a lot of motivation and work from the Debian maintainer! Maybe pipenv can provide a process/script to automate this.

@gsemet I get it, but everyone else has this problem too. If one of your dependencies updates in an incompatible way, you either pin the dependency to a specific version or update your package to accommodate the change. The whole software industry has to deal with this. To make it easier we have semver, so we can express things like: depend on package foo >= 3 AND < 4 to avoid pulling in incompatible changes. And we can always try to minimize the number of dependencies.

I understand that it might sometimes be necessary to vendor a patched version, but vendoring all dependencies is certainly not the solution, for the reasons I already gave above. Just imagine a world where everyone did the same: we'd soon have code duplication everywhere and would have created ourselves a nice little security nightmare.

I agree with your point. But I also feel pip, pipenv, pbr, and maybe a bunch of others fall into the "special case" we all want to avoid. They basically "bootstrap", and so have to deal with the environment as it is in order to provide a "cleaned"/"abstracted" environment.
They cannot be considered "example" Python projects.
Ultimately, it is up to the app to handle all the uncertainty of the various environments as best it can, so I understand their vendoring (even if it could be reworked to avoid duplication).
And I also understand distributions that want to avoid that and handle the dependencies themselves, because in that case the environment (all Python packages installed by apt) is under their control.

As @gsemet notes, the key problem for tools like pip/pipenv/setuptools/etc is that they're part of bootstrapping the dependency management system, and we can't necessarily assume that that is already working in an end users' environment, just because they have a Python install.

Debian & Ubuntu, for example, ship without a working ensurepip and venv by default - users need admin privileges to get things working (see https://github.com/pypa/python-packaging-user-guide/issues/399 ).

Fedora ships a working ensurepip and venv, but they do so by relying on pip's dependency bundling at the virtual environment level and then keeping the pip package itself up to date.

The Mac OS X system Python is infamously broken by default (in more ways than just not supporting pip), as Apple decided to head down the Swift path instead.

pip and setuptools have the extra problem of actually ending up in the run and/or build environments of the projects being developed, giving them an extra incentive to hide their dependencies, whereas pipenv is mainly avoiding polluting the user level site-packages directory. However, having a random upgrade in the user site-packages directory break pipenv would also be a major problem when it came to keeping pipenv managed environments up to date, so there's a significant resilience concern there as well.

So the answer to @venthur's original question that started the issue is "No, it shouldn't, as doing so would make bootstrapping more difficult, and introduce a greater chance of an end user accidentally breaking their ability to update their pipenv managed environments in a way they can't readily recover from".

I'm going to close this particular issue on that basis - if folks want to open separate issues for switching to pip-style vendoring (with clear dependency declarations, and automated updates for vendored dependencies), and then configuring pyup.io to monitor those dependencies for new releases, that would be a good thing.

More generally, I'd ask that folks avoid making the assumption that we aren't already well aware of the risks and trade-offs between fully integrated systems (fewer security updates due to the use of shared components, but either requiring more coordinated integration testing before an update is released, or else increasing the risks of failure on end user systems due to untested combinations), and isolated application silos (multiple app updates needed for the same security update, but each app can be developed and tested independently without a complex distro-style coordination layer).

The key design assumption that pipenv makes that a lot of other developer tools don't is that we expect a large proportion of our user base to be students, educators, and research scientists that are looking to write and run their first Python applications. We default them to running the latest version of everything, and require them to explicitly opt out and pin old versions if that's what they want to do instead, but we mostly assume that they're writing software for themselves.

Even for more experienced application developers, we mostly assume that they're either working on deployment to hosted PaaS environments (so dependency updates are just a git push away), app store application development (so they have to bundle anything beyond the base platform APIs regardless), or else they're using pipenv to manage the dependencies for their test suite and development environment, rather than for the deployed component.

This is going to be irritating for folks that prefer to treat their systems as carefully crafted interlocked webs of dependencies, and don't want to delegate responsibility for any part of that system to anyone else. However, even the traditional Linux operating system developers have conceded that the "giant ball of mud" approach to operating system development breaks down once you reach the level of thousands or tens of thousands of intertwined components, and are switching to technologies like Kubernetes, Flatpak, and Snappy to help break that scaling barrier, while mitigating the harm that can be caused by outdated components in application bundles that haven't updated appropriately.

Well, then let's open another issue requesting the pip-style vendoring, shall we?
