Hello,

I'm currently packaging `pipenv` for Debian, and I wonder what the purpose of the `vendor` directory and its contents is. I understand the purpose of `patched`, but shouldn't `pipenv` rather depend on the packages in `vendor` (e.g. via `install_requires` in setup.py) than providing them itself?
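For concreteness, here is a minimal sketch of the kind of declaration I mean (the names and version pins below are illustrative, not pipenv's actual requirement set):

```python
# setup.py -- illustrative sketch only; the names and version pins are
# examples, NOT pipenv's actual requirement set.
from setuptools import setup, find_packages

setup(
    name="pipenv",
    packages=find_packages(),
    install_requires=[
        "requests>=2.18,<3",  # instead of bundling pipenv/vendor/requests
        "click>=6.7,<7",      # instead of bundling pipenv/vendor/click
        "colorama",           # instead of bundling pipenv/vendor/colorama
    ],
)
```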
From a packaging point of view this is also very hard, as I'll probably have to un-bundle the vendored packages one by one and make `pipenv` depend on the corresponding Debian packages.

Assuming the packages in `vendor` are unpatched, would you accept pull requests that try to un-bundle them?
Pipenv vendors because it is intended to be installed globally, and depending on too many packages makes the Python installation susceptible to dependency hell. I don't know why not all dependencies are vendored, but from past discussions it seems Kenneth is quite happy with how things currently are.
As a side note, how does the pip package on Debian (python-pip) handle this? pip also vendors quite a lot of packages.
@venthur We primarily vendor things to avoid system-wide installation / dependency resolution of specific versions (e.g. pip 9), and to ship as much of what pipenv requires with pipenv itself by default.

Happy to discuss further if you have specific needs as you move forward; if you need to incorporate something like what pip does, that might be an option as well.
Thanks for the replies so far. I'm a bit concerned about this strategy.
While I can understand the pragmatic benefits (and the necessity of `patched`), it is a bad idea for many reasons. First of all, if other packages followed this example we'd soon have tons of duplicated packages inside a single package. It is starting already with pipenv, which vendors pip, which in turn vendors packages that have already been vendored by pipenv:
```
$ tree -d -L 3 pipenv/vendor/
pipenv/vendor/
├── backports
│   ├── shutil_get_terminal_size
│   └── weakref
├── blindspin
├── click
├── click_didyoumean
├── colorama              # <<<
├── iso8601
├── jinja2
├── Levenshtein
├── markupsafe
├── pexpect
├── pip9
│   ├── commands
│   ├── compat
│   ├── models
│   ├── operations
│   ├── req
│   ├── utils
│   ├── vcs
│   └── _vendor
│       ├── cachecontrol
│       ├── colorama      # <<<
│       ├── distlib
│       ├── html5lib
│       ├── lockfile
│       ├── packaging
│       ├── pkg_resources
│       ├── progress
│       ├── requests      # <<<
│       └── webencodings
├── pipreqs
├── ptyprocess
├── pytoml
├── requests              # <<<
│   └── packages
│       ├── chardet
│       └── urllib3
├── requirements
├── shutilwhich
└── yarg
```
pipenv provides the requests and colorama packages twice. Additionally, almost all packages from `pipenv/vendor/pip9/_vendor` are also in `pipenv/patched/notpip/_vendor`.
The problems will start once you need to update one of your dependencies (e.g. for a security issue in the requests package) -- in a normal situation you'd just update the library and all depending packages would benefit automatically, but with this setup that is impossible, as you have to keep track of all your vendored packages (and your vendors' vendors!).

Space is another issue: several versions of the same lib installed by the same package is a waste of space.
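To make the duplication easy to spot, here is a small illustrative script (not part of pipenv; it just applies a rough directory-name heuristic to a tree like the one above):

```python
#!/usr/bin/env python3
"""Rough illustration: list directory names that occur more than once
below a vendor tree (e.g. pipenv/vendor/). Repeated names like
"requests" or "colorama" hint at duplicated vendored copies; generic
names like "compat" are false positives to filter out by hand."""
import os
import sys
from collections import defaultdict


def find_duplicates(root):
    seen = defaultdict(list)  # directory name -> list of paths
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            seen[name].append(os.path.join(dirpath, name))
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "pipenv/vendor"
    for name, paths in sorted(find_duplicates(root).items()):
        print("{} appears {} times:".format(name, len(paths)))
        for path in paths:
            print("  " + path)
```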
And last but not least: this is the Python Packaging Authority, right? A lot of people look at this code and will take it as an example of best practice -- at the moment we might not be doing the PyPA the best service.
I do respect the author's decision to vendor, but if you'd be willing to replace the unpatched vendored packages with normal (i.e. `install_requires`) dependencies, I'll gladly help and provide patches.
@venthur In many cases (e.g. pip) we simply can't, because that would require pinning users to pip 9. We have a hard dependency on pip 9 until we can rework some resolver logic, for example.

The security argument may be valid; I might need to tag in @ncoghlan for more input. In the worst case we could provide a clearer path to upgrading our vendored dependencies.
From the discussion above I think the PyPA should recommend a tool that installs these global packages into their own isolated environments, like `homebrew` and `pipsi` do. Then we could unvendor all the things and wouldn't need to worry about dependency hell.

Just a rough suggestion.
@frostming We are not in a position to speak for the packaging authority as a whole, but we are certainly not about to suggest that everyone must install every dependency of pipenv in its own isolated environment -- not sure if that's what you mean. IMO the real concern here is the security one, but if that is a concern here it would surely be a concern with pip etc., right? How is Debian managing that process? I assume the packages are being debundled by the process outlined in the link I posted above; that way whoever maintains the Debian pip package can just make it depend on packages without vendoring them.
For the record, pip does this by automating the vendoring process. It contains an equivalent of requirements.txt (`pip/_vendor/vendor.txt`) and a script to automatically pull in packages at the specified versions (`pip/_vendor/__init__.py`). It also manages vendored (and patched) libraries very well, to make sure they are as up to date as possible.
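The debundling half of that machinery boils down to an import alias. A simplified sketch, modelled loosely on `pip/_vendor/__init__.py` (the real file handles many more edge cases):

```python
# Simplified sketch of pip-style debundling; modelled loosely on
# pip/_vendor/__init__.py, NOT a verbatim copy.
import sys

DEBUNDLED = False  # downstream redistributors flip this to True


def vendored(modulename):
    # Alias e.g. "pip._vendor.requests" to the system-wide "requests",
    # so the code base can keep importing the vendored name unchanged.
    vendored_name = "{0}.{1}".format(__name__, modulename)
    __import__(modulename, globals(), locals(), level=0)
    sys.modules[vendored_name] = sys.modules[modulename]


if DEBUNDLED:
    for name in ("requests", "packaging", "distlib"):
        vendored(name)
```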
Regarding a command installer recommendation, we actually have an open issue against PyPUG for that: https://github.com/pypa/python-packaging-user-guide/issues/406
However, these tools collectively have a problem with their reliance on `PATH` being configured correctly, and that isn't a universally reliable assumption: https://github.com/pypa/python-packaging-user-guide/issues/396
There also isn't a way for `pipenv` to indicate that it depends on `pew`'s CLI, not just its Python API, so `pipsi install pipenv` doesn't actually work properly - you have to do `pipsi install pew` as well (the project received enough bug reports about this that Kenneth eventually just removed the `pipsi`-based installation section).
So vendoring is still a good pragmatic choice at `pipenv`'s level, simply because the alternatives are fragile for anyone that doesn't already have a robust Python development environment set up, and that's an audience `pipenv` specifically aims to be suitable for.
That means that if we want to set a good example, the way I'd suggest we approach it would be:
1. Adopting `pip`'s unbundling-friendly approach to vendoring dependencies
2. Configuring pyup.io to monitor those dependencies for new releases (`pipenv check` depends on their API)

The second does depend on the former though, since https://pyup.io/docs/bot/config/ needs to be pointed at a pinned requirements file to keep up to date.
The `pipenv/patched` directory is harder to deal with when it comes to actually doing updates, but it is at least amenable to being checked for security vulnerabilities by maintaining a requirements.txt-style file stating the versions that were forked to apply the downstream patches (such a file can also be a good place to track the upstream issue reports that need to be resolved before the private fork can be dropped).
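As an illustration, such a tracking file might look something like this (file name, entries, and versions are hypothetical, not an actual pipenv file):

```
# patched.txt -- hypothetical example, not an actual pipenv file
# <package>==<version that was forked>  # upstream issue blocking removal of the fork
pip==9.0.1           # resolver rework needed before the fork can be dropped
somepackage==1.2.3   # waiting on an upstream fix (entry illustrative only)
```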
I understand the argument for the packages in `patched`, but I still don't follow the argument for `vendor` in general.
My main concern is still security: if your dependencies also handle their dependencies by vendoring them (and theirs), you'll end up with multiple versions of the same package inside your `vendor` directory (e.g. the `requests` package). If one of those packages needs an update, you'll have a hard time fixing all of them. For downstream maintainers (Debian, Suse, etc.) it is even worse, as we usually just update the problematic library and assume the bug to be fixed for all packages depending on it. Now we actively have to search for copies of that library everywhere.
On a side note, it is also a bit funny that tools like `pip` and `pipenv`, which help solve the problem of dependency resolution in Python, don't trust the system themselves and ship their dependencies directly.
On the other hand, I'll probably not change the author's mind regarding this decision, so let's find a solution that makes `pipenv` digestible for distributions. Two options come to mind:

1. Replace the unpatched vendored packages with normal (i.e. `install_requires`) dependencies -- the remaining ones could be vendored until the required patches have been merged "upstream". That would be the cleanest solution, but probably also the most unlikely.
2. Go the `pip` route and make the vendoring optional. This will make things a bit hard for all of us though, as we (PyPA and Debian, Ubuntu, etc.) will receive a lot of bug reports that will be hard to track down, as the versions of our installed dependencies will have diverged.

I've already prepared and uploaded a Debian package for `pipenv`, but I'm fairly certain the Debian ftp-masters will not accept it as-is because of the vendored packages. So I will have to start packaging the dependencies for Debian as well, and then work on some un-bundling process.
There is a chicken-and-egg problem: pipenv needs packages to run, but these packages can evolve and break pipenv. I feel there is no easy solution but to package them. PBR has the same problem: upon install it is actually run by the system Python, so it has no control over its own dependencies.

I however recommend that pipenv be stripped of all its vendored packages only when it is packaged and distributed by a Linux distribution such as Debian. It can run its own non-regression tests and provide a pipenv that runs with the right system Python packages, under the control of the Debian maintainers.

One should never install a Python package with pip install without `--user`, for this reason. But if one does `pip install --user pipenv`, the official pipenv with all vendored packages is installed from PyPI.

But this requires a lot of motivation and work from the Debian maintainer! Maybe pipenv can prepare a process/script to automate this.
@gsemet I get it, but everyone else has this problem too. If one of your dependencies updates in an incompatible way, you'll either pin the dependency to a specific version or update your package to accommodate it. The whole software industry has to deal with this. To make it easier we have semver, so we can express things like "depend on package foo >= 3 and < 4" to avoid pulling in incompatible changes. And we can always try to minimize the number of dependencies.
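In Python packaging terms that constraint is a PEP 440 version specifier; a tiny sketch using the `packaging` library (the package name `foo` is of course a placeholder):

```python
# Checking a PEP 440 specifier with the "packaging" library
# (pip install packaging).
from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=3,<4")  # accept any 3.x release, reject 4.0 and later
print("3.2.1" in spec)  # True
print("4.0.0" in spec)  # False
```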
I understand that it might sometimes be necessary to vendor a patched version, but vendoring all dependencies is certainly not the solution, for the reasons I already gave above. Just imagine a world where everyone did the same: we'd soon have code duplication everywhere and would have created ourselves a nice little security nightmare.
I agree with your point. But I also feel pip, pipenv, pbr, and maybe a bunch of others fall into the "special case" we all want to avoid. They basically "bootstrap", and so have to deal with the environment in order to provide a "cleaned"/"abstracted" environment.

They cannot be considered "example" Python projects.

Ultimately, it is up to the app to handle, the best way possible, all the uncertainty of the various environments, so I understand their vendoring (even if it might be reworked to avoid duplication).

And I also understand distributions that want to avoid that and handle the dependencies themselves, because in that case the environment (all Python packages installed by apt) is under their control.
As @gsemet notes, the key problem for tools like `pip`/`pipenv`/`setuptools`/etc. is that they're part of bootstrapping the dependency management system, and we can't necessarily assume that that system is already working in an end user's environment just because they have a Python install.
Debian & Ubuntu, for example, ship without a working ensurepip
and venv
by default - users need admin privileges to get things working (see https://github.com/pypa/python-packaging-user-guide/issues/399 ).
Fedora ships a working `ensurepip` and `venv`, but they do so by relying on `pip`'s dependency bundling at the virtual environment level and then keeping the pip package itself up to date.
The Mac OS X system Python is infamously broken by default (in more ways than just not supporting `pip`), as Apple decided to head down the Swift path instead.
`pip` and `setuptools` have the extra problem of actually ending up in the run and/or build environments of the projects being developed, giving them an extra incentive to hide their dependencies, whereas `pipenv` is mainly avoiding polluting the user-level site-packages directory. However, having a random upgrade in the user site-packages directory break `pipenv` would also be a major problem when it came to keeping `pipenv`-managed environments up to date, so there's a significant resilience concern there as well.
So the answer to @venthur's original question that started the issue is "No, it shouldn't, as doing so would make bootstrapping more difficult, and introduce a greater chance of an end user accidentally breaking their ability to update their `pipenv`-managed environments in a way they can't readily recover from".
I'm going to close this particular issue on that basis - if folks wanted to open separate issues for switching to `pip`-style vendoring (with clear dependency declarations, and automated updates for vendored dependencies), and then configuring pyup.io to monitor those dependencies for new releases, that would be a good thing.
More generally, I'd ask that folks avoid making the assumption that we aren't already well aware of the risks and trade-offs between fully integrated systems (fewer security updates due to the use of shared components, but either requiring more coordinated integration testing before an update is released, or else increasing the risks of failure on end user systems due to untested combinations), and isolated application silos (multiple app updates needed for the same security update, but each app can be developed and tested independently without a complex distro-style coordination layer).
The key design assumption that `pipenv` makes that a lot of other developer tools don't is that we expect a large proportion of our user base to be students, educators, and research scientists that are looking to write and run their first Python applications. We default them to running the latest version of everything, and require them to explicitly opt out and pin old versions if that's what they want to do instead, but we mostly assume that they're writing software for themselves.
Even for more experienced application developers, we mostly assume that they're either working on deployment to hosted PaaS environments (so dependency updates are just a git push away), app store application development (so they have to bundle anything beyond the base platform APIs regardless), or else they're using `pipenv` to manage the dependencies for their test suite and development environment, rather than for the deployed component.
This is going to be irritating for folks that prefer to treat their systems as carefully crafted interlocked webs of dependencies, and don't want to delegate responsibility for any part of that system to anyone else. However, even the traditional Linux operating system developers have conceded that the "giant ball of mud" approach to operating system development breaks down once you reach the level of thousands or tens of thousands of intertwined components, and are switching to technologies like Kubernetes, Flatpak, and Snappy to help break that scaling barrier, while mitigating the harm that can be caused by outdated components in application bundles that haven't updated appropriately.
Well, then let's open another issue requesting the `pip`-style vendoring, shall we?