Based on the discussion over at #59, there's interest in making the install command upgrade an installed package by default. This behaviour would make pip consistent with various other package managers with regard to the behaviour of its `install` command.
This issue is meant to be the location for that discussion, since this deserves its own issue.
Okay, here's a proposal:
End state:
- `pip install foo`: upgrades `foo` to the latest version; also does the minimum set of installs/upgrades required to satisfy the new version's dependencies.
- `pip install -U foo` / `pip install --upgrade foo`: identical to `pip install foo` (except maybe they should eventually issue some warning?); kept for back-compat.
- `pip require foo`: same as the current `pip install foo`; has the same effect as installing a package that has `Requires-Dist: foo`. This is a weird low-level operation that should not be emphasized in the docs, but we keep it for now to provide a less bumpy transition; plus, it exposes a meaningful operation we need to support anyway (the `Requires-Dist` handling), so it's likely useful for some scripting use case.
- `pip install --upgrade-recursive foo`: same as the current `pip install --upgrade foo`: ensures that `foo` is the latest version _and_ ensures that all transitive dependencies are the latest version. This is a weird marginal option that should not be emphasized in the docs, but we keep it for now to provide a less bumpy transition.
- `pip install --upgrade-non-recursive foo`: same as the future `pip install foo`, but explicit, to provide a less bumpy transition.

Transition option A:
- Phase 0 (current): `pip install foo` doesn't upgrade, `pip install --upgrade foo` does a recursive upgrade, and `pip require foo` & `pip install --upgrade-recursive foo` are errors.
- Phase 1: add `pip require foo`, `pip install --upgrade-recursive foo` and `pip install --upgrade-non-recursive foo`. `pip install foo` and `pip install --upgrade foo` continue to act like they do now, but are modified to check what they _would_ have done if `--upgrade-non-recursive` were set, and issue a deprecation warning whenever what they actually do is different from what they will do in the future.
- Users who want the new behaviour early can configure `--upgrade-non-recursive` as their default (e.g. adding `[install] upgrade-non-recursive = yes` to `pip.conf`).
- Phase 2: `pip install foo` and `pip install --upgrade foo` switch to the new behavior.

Transition option B:
- KISS: skip phase 1 and go directly from phase 0 to phase 2. Rationale: it's not clear that this will actually break anything, people are going to be somewhat confused and annoyed in either case, it's entirely possible they'll be more confused and annoyed by the phased transition than by the actual change, we have limited resources, and we're eager to get to the shiny new future.
In this version we can also probably skip adding `--upgrade-non-recursive`, since it's redundant as soon as it's introduced.
I'm sorta expecting that everyone will push back and insist on transition option A instead of transition option B. But I'd actually be happy with either one, so instead of pre-emptively compromising I'm going to leave it to someone else to make that argument (if they want to) :-).
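A toy model may make the difference between the proposed default (upgrade `foo`, plus only the minimum set of dependency changes) and today's recursive `--upgrade` more concrete. This is a sketch under stated assumptions: the `INDEX`, the integer versions, the "minimum version" requirements and the `plan_upgrade` helper are all hypothetical, and nothing here reflects how pip actually resolves dependencies.

```python
"""Toy comparison of non-recursive vs recursive upgrades (hypothetical model, not pip)."""

# Hypothetical index: package -> {version: {dependency: minimum required version}}
INDEX = {
    "foo": {1: {"bar": 1}, 2: {"bar": 2, "baz": 1}},
    "bar": {1: {}, 2: {}, 3: {}},
    "baz": {1: {}, 2: {}},
}


def latest(name):
    """Newest version of `name` in the toy index."""
    return max(INDEX[name])


def plan_upgrade(name, installed, recursive):
    """Return {package: new version} for everything that installing `name` would change.

    recursive=False: upgrade `name`, then touch a dependency only if it is missing
    or older than the new version requires (the proposed default).
    recursive=True: move every transitive dependency to its latest version
    (the current `--upgrade`).
    """
    plan = {}
    visited = set()
    stack = [(name, None)]  # None marks the package named on the command line
    while stack:
        pkg, minimum = stack.pop()
        have = plan.get(pkg, installed.get(pkg))
        explicit = minimum is None
        satisfied = have is not None and (explicit or have >= minimum)
        if not explicit and satisfied and not recursive:
            continue  # non-recursive: leave a dependency alone if it already fits
        if pkg in visited:
            continue
        visited.add(pkg)
        new = latest(pkg)
        if new != have:
            plan[pkg] = new
        stack.extend(INDEX[pkg][new].items())
    return plan


if __name__ == "__main__":
    installed = {"foo": 1, "bar": 2}  # baz is not installed yet
    print(plan_upgrade("foo", installed, recursive=False))  # {'foo': 2, 'baz': 2}
    print(plan_upgrade("foo", installed, recursive=True))   # {'foo': 2, 'baz': 2, 'bar': 3}
```

The non-recursive plan leaves `bar` at 2 because that already satisfies `foo` 2; the recursive plan bumps it to 3 anyway, which is the extra upgrade the proposal wants to stop doing by default.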
Hmm, I don't see the added value of your `pip require`. Isn't it a duplicate of `pip install`? Or maybe of a `pip install --no-upgrade`?
I'm happy enough with option B.
But I don't follow your description. You say `pip require foo`: same as the current `pip install foo`. So it'll error if foo is installed? And `pip install --upgrade-recursive foo`: same as the current `pip install --upgrade foo`. I thought there were problems with the existing `install --upgrade` behaviour (beyond it not being the default); there's a whole load of discussion somewhere about needing a SAT solver. Is your proposal that we don't do anything about those issues? Or am I misremembering, and there's not actually a problem with the current `--upgrade` behaviour?
I'm happy with option B.
I don't like the idea of a `pip require` command for the same reasons I didn't like the split `pip install` and `pip upgrade` commands. Two commands that do sort of the same thing, but not quite, force people to decide up front which one to use, whereas flags don't. I also think it's good practice for boolean flags (ones that toggle something on/off) to have an inverse wherever it makes sense, so that people can compose commands better.
So with all that in mind, here's what I would do:
- `pip install --upgrade ...` now does a "minimal" upgrade by default, upgrading anything named on the command line/requirements file to the latest version, but only updating dependencies if required.
- `pip install --no-upgrade ...` behaves as `pip install` does now, similarly to your `pip require` command, and just ensures that a version, any version, of the named requirements is installed.
- `pip install ...` has its default switched from an implicit `--no-upgrade` to an implicit `--upgrade`.

You might notice that there's nothing like the current behavior listed so far, an "upgrade everything in the dependency path to the latest version" sort of flag. I'm on the fence about whether we really want something like that (and if we want it, do we want to keep it forever, or would it just be a temporary shim to ease the transition?). Another thing to keep in mind when deciding this is how the theoretical `pip upgrade` command affects this decision. In other words, if we have a command to upgrade all the installed items, do we foresee people ever wanting to upgrade X and all of its dependencies?
If we do want something like the current `--upgrade` behavior, then I think I see two options:
- `--recursive` / `--no-recursive` to turn on the old or new behavior (but what would these do if `--no-upgrade` was selected? Silent no-op? Error?).
- `--upgrade-strategy=(minimal / recursive)` to switch between two different strategies: a bit wordier than `--[no-]recursive`, but it also makes it easier to add additional strategies if we ever find ourselves in need of them.

In terms of the dependency resolver, I don't think these two issues are really intertwined that much. Our resolving is currently a problem in both the `pip install` and the `pip install --upgrade` case, and I believe it will continue to be a problem with the proposed changes. It's something that needs to be fixed, but I don't think it has any bearing on what we do here (although it likely does have some bearing on the hypothetical `pip upgrade` command).
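The `--upgrade-strategy` spelling, and the open question of how a strategy should interact with `--no-upgrade`, can be sketched with Python's standard `argparse`. This is a hypothetical parser, not pip's real option handling: the option names and the `minimal`/`recursive` values come from this thread, and rejecting "strategy plus `--no-upgrade`" as an error (rather than a silent no-op) is just one of the two answers floated.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="pip install")
    parser.add_argument("packages", nargs="+")
    # --upgrade / --no-upgrade write to the same destination; upgrading is the default
    parser.add_argument("--upgrade", dest="upgrade", action="store_true", default=True)
    parser.add_argument("--no-upgrade", dest="upgrade", action="store_false")
    # default=None lets us tell "not given" apart from an explicit choice
    parser.add_argument("--upgrade-strategy", choices=["minimal", "recursive"], default=None)
    return parser


def parse_install_args(argv):
    """Parse hypothetical `pip install` arguments and apply the combination rules."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.upgrade and args.upgrade_strategy is not None:
        # Chosen here: an error rather than a silent no-op (exits with status 2)
        parser.error("--upgrade-strategy has no effect together with --no-upgrade")
    if args.upgrade and args.upgrade_strategy is None:
        args.upgrade_strategy = "minimal"
    return args
```

For example, `parse_install_args(["--upgrade-strategy", "recursive", "foo"])` selects the recursive strategy, while adding `--no-upgrade` to that command line makes the parser exit with a usage error.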
I'm not aware of a strong requirement for the current behaviour (by "strong" I mean "anything other than backward compatibility"). But if people did need it, they can get it by simply listing all of the dependencies on the command line.
It's pretty easy to write a script to show all (recursive) dependencies of a package:
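```python
# reqs.py: print a package and all of its (recursive) installed dependencies
import sys

from pkg_resources import get_distribution  # pkg_resources ships with setuptools


def reqs(req):
    """Return the installed distribution for `req` plus all its recursive dependencies."""
    results = []
    seen = set()
    queue = [get_distribution(req)]
    while queue:
        current = queue.pop()
        if current.key in seen:  # skip duplicates and dependency cycles
            continue
        seen.add(current.key)
        results.append(current)
        for r in current.requires():
            queue.append(get_distribution(r))
    return results


if __name__ == '__main__':
    print('\n'.join(sorted(d.project_name for d in reqs(sys.argv[1]))))
```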
Then you just do `pip install $(python reqs.py foo)` to get an "eager install" of `foo` and its dependencies. I'm sure there are shortcomings with this approach, but is the problem common enough to warrant a more complex solution?
@pfmoore Well, that script only works if no dependencies have changed between the currently installed versions and the to-be-upgraded-to versions (and, of course, it assumes everything is already installed).
That being said, the only real use case I can come up with is installing a project into an environment that already has stuff installed into it, but wanting to have the latest version of dependencies. IOW, a framework like Pyramid might prefer that new users install its dependencies using the recursive upgrade. HOWEVER, even in this scenario (which is the only one I can think of), if the hypothetical Pyramid's version specifiers are all correct, then the end user should expect it to work regardless (and it's similar in nature to what folks would already get with the current `pip install` behavior when something is already installed).
If someone does want "Pyramid, and all of its dependencies up to date", it's somewhat nicer than the proposed way of doing that (combining the two proposals), which would be `pip install Pyramid && pip upgrade` (which isn't exactly the same, since `pip upgrade` would do more than just Pyramid).
So that's my hesitation: I struggle to come up with a scenario where it's clearly the right thing to do, but it could make some edge cases moderately nicer. We could always leave it out, and if we come across people asking for it, add it in at that point.
I dislike both A and B. I don't like the idea of introducing a new command, nor do I want to switch to the new behavior without some "deprecation" style period for the current behavior. Hence I put forth my own proposal below.
> I'm not aware of a strong requirement for the current behaviour

Me neither. Yet, I don't want to break someone's working code without telling them. I would find it rude. 'Don't do unto others what you don't want others to do unto you.' This is also why I don't want to switch with no warning, as in @njsmith's option B.
> If someone does want "Pyramid, and all of its dependencies up to date", it's somewhat nicer than the proposed way of doing that (combining the two proposals), which would be `pip install Pyramid && pip upgrade` (which isn't exactly the same, since `pip upgrade` would do more than just Pyramid).

As I understand it, if someone wants "Pyramid, and all of its dependencies up to date" after the switch to the new behavior, it's `pip install --upgrade-strategy=eager Pyramid`. That would eagerly upgrade Pyramid and its dependencies to the latest version, regardless of whether an upgrade is necessary.
I thought it was clear that we wanted to provide both the current "recursive-latest" and the new default "only-if-needed" upgrades. This just emphasizes that I need to post the commonly accepted ideas.
1. Make a major version release that deprecates the current behavior and provides a warning on use of these commands, with opt-in flags and configuration for the new behavior.
`--upgrade`.
- `pip install --upgrade` warns that this flag will become a no-op in the next release.
- `pip install` warns that the behavior is changing in the next release and that the current behavior won't be available in the next release.
- Possibly, both warnings provide a link to documentation that suggests to the user what they should do.
- A `--no-upgrade` flag may be added. But I don't want to see that unless someone _really_ needs it.

Bikeshed: options and flags in 1. I prefer to add `--upgrade-strategy=(eager / non-eager / default)` as the flag in 1 and switch the default strategy to non-eager in 2.
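One way to read the `default` value in this bikeshed is as a sentinel: during the deprecation release it keeps today's behaviour but warns, while any explicit value opts in and stays quiet. A minimal sketch using Python's `warnings` module; `resolve_strategy`, the `"no-upgrade"` label and the message text are hypothetical.

```python
import warnings

STRATEGIES = ("eager", "non-eager", "default")


def resolve_strategy(requested="default", upgrade_flag=False):
    """Map a hypothetical --upgrade-strategy value to what the deprecation release does.

    "default" keeps today's behaviour (no upgrade, or an eager upgrade with --upgrade)
    but warns; an explicit "eager" / "non-eager" opts in and stays silent.
    """
    if requested not in STRATEGIES:
        raise ValueError(f"unknown upgrade strategy: {requested!r}")
    if requested != "default":
        return requested
    warnings.warn(
        "the upgrade behaviour of 'pip install' changes in the next major release; "
        "pass --upgrade-strategy=eager or --upgrade-strategy=non-eager to choose now",
        FutureWarning,
        stacklevel=2,
    )
    return "eager" if upgrade_flag else "no-upgrade"
```

Using a warning category that users see by default (`FutureWarning`) rather than `DeprecationWarning` is a deliberate choice here, since the audience is people running pip rather than developers importing it.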
Also worth pointing out explicitly: there is no need for a dependency resolver in pip for this. While with the new behavior it's still possible to break some edge in the entire dependency graph, it becomes less likely if you upgrade less often.
> how dependencies are handled

Uniformly, independent of depth. The user can choose between eager and non-eager upgrades. They are as I defined them in my earlier write-up.
> what happens with constraints (and when they conflict)

I would say whatever happens today.
> binary vs source

To be handled in #3785. Until then, keep as is.
> I think it was clear that we wanted to provide both the current "recursive-latest" and the new default "only-if-needed" upgrades.

Nope, I don't think so. The "only if needed" behaviour is, as far as I know, agreed by everyone as what we would like to have available. But I understood the current behaviour to be generally considered as having issues. Whether those issues all revolve around the "pip needs a proper dependency resolver" problem, and we're OK with keeping the current behaviour until that is fixed, I don't really know.
The main problem(s) with the current behavior (that isn't actually a result of the lack of a real resolver) is that its "greedy-ness" causes things to be upgraded that might not otherwise be upgraded. On the tin that doesn't seem like a big problem; however, it has some subtle (and some not so subtle) interactions:
- `sudo pip` will inadvertently break someone's OS, because it makes it more likely that we'll recurse into a dependency provided by the OS (even if the user invoking pip had no idea that it would be affected).
- Some libraries are very expensive to install/build, particularly ones like Numpy where compiling can take 30+ minutes.
- The recursive upgrade introduces more churn on the installed set of packages, which increases the likelihood that something that was already working breaks because of an upgrade to a shared dependency.

The first two of those are things that could possibly be fixed, at least in part, by other solutions (and for which this solution isn't a total fix either). You could fix the breaking of the OS by making pip smarter about not mucking around with the OS files by default. Wheels make it easier to install even hard-to-build libraries like Numpy, but not everything has a wheel, and if you're on anything that isn't Windows, OS X, or manylinux1, then your chances of getting a wheel are basically zero.
The churn on what is installed is only going to be fixed by this patch, which will also reduce the occurrence of the first two issues (by being more conservative when we actually attempt to do anything).
Of course, this is a super subtle sort of difference, and it's hard to nail down all of the exact benefits (they'd be more accurately described as trade-offs, rather than a straight set of benefits). I don't know whether the old behavior, in the cases where it's useful, is useful enough that people would bother using a flag for it. If we add the flag, it becomes hard to ever remove it; if we don't add it now, we could always add it again in the future, so for that reason I lean somewhat towards leaving it out and waiting to see if we get people asking for a way to bring the old behavior back.
> > I think it was clear that we wanted to provide both the current "recursive-latest" and the new default "only-if-needed" upgrades.
>
> Nope, I don't think so

Hmm... I did think that both behaviors were seen as useful. That's what the Pyramid example made me think. It's using the current behavior and it does exactly what is desired.
It seems desirable to be able to say "upgrade pkg and all its (sub-)*dependencies to the latest version". I don't want to upgrade _everything_ in my ecosystem; I just want to get the latest bug-fixes for pkg and its dependencies.
> - Some libraries are very expensive to install/build, particularly ones like Numpy where compiling can take 30+ minutes.

By conservatively upgrading packages, it does make this happen less often.
Edit: You mentioned that.
> - The recursive upgrade introduces more churn on the installed set of packages, which increases the likelihood that something that was already working breaks because of an upgrade to a shared dependency.

This _needs_ a dependency resolver to be fixed. I consider that out-of-scope of this issue.
> If we add the flag, it becomes hard to ever remove it; if we don't add it now, we could always add it again in the future, so for that reason I lean somewhat towards leaving it out and waiting to see if we get people asking for a way to bring the old behavior back.

That works pretty well for me. It adds to why I want a "deprecation" release for the current behaviour: to get people asking for it to stay, rather than asking for it to be re-added.
Edit: s/version/behaviour/
:confused: Any comments on my proposal above?
> > - The recursive upgrade introduces more churn on the installed set of packages, which increases the likelihood that something that was already working breaks because of an upgrade to a shared dependency.
>
> This needs a dependency resolver to be fixed. I consider that out of scope for this issue.

No, this isn't related to the dependency solver thing. This is just "software is hard, and new versions sometimes add new bugs; therefore, the more churn you have, the more likely you are to get bitten by new bugs".
The most stable (in terms of new, not previously encountered bugs) software is software that never changes.
> Any comments on my proposal above?

I'm a little concerned about adding a warning for every invocation of `pip install`, but I'm not opposed to it. It's certainly the safer route, it's more in line with our typical deprecation process, and it gives people a chance to clamor for an option to use the old behavior.
I do think that we need to either deprecate the `--upgrade` flag completely as part of this (probably no-op it and hide it for a long while), or add `--no-upgrade` to get back to the old behavior of `pip install ...`. I don't want a fairly useless `--upgrade` flag lying around in our help. So then the question for a `--[no-]upgrade` flag becomes whether we see the current behavior of `pip install` as useful at all. Here again I don't have a strong opinion; we could use the deprecation period as a chance to find out.
> Any comments on my proposal above?

Honestly, I really don't like the idea that essentially every invocation of `pip install` will give a warning for a full major release cycle. That seems guaranteed to just annoy users, and as a result we'll probably get no useful feedback, just a lot of complaints about the process.
My preferences remain with @njsmith's approach - probably the "just go for it" approach, but if necessary the gradual version.
I have to admit that I find it very hard to understand the impact on my day to day usage of these various proposals. There's a lot of theory and edge cases being discussed, which is obscuring the key points. I think that whatever transition process we adopt, someone should work on a clear "press-release" style description of the proposed changes and their impact, which we can publish on distutils-sig before making the changes. That should allow us to gauge reactions from the wider community. (I don't think this needs a PEP, but I do think it needs publicising).
My instinctive feeling is that I'll be (mildly) happy with the new "as little as possible" upgrade behaviour, mildly irritated by the fact that "install" now upgrades without an explicit flag (but hopefully I'll get used to it reasonably quickly), and otherwise mostly indifferent. My main usage will probably remain `pip install new_thing` to install a new package, and a manual "get all the package names, and do `pip install <all of them at once>`" to simulate "update all". Neither of these will be affected by any of the proposals (except that the new "as little as possible" upgrade strategy will avoid the odd unwanted numpy upgrade attempt that the current behaviour inflicts on me).
For me, the tipping point comes when `--prefer-binary` and "upgrade all" become available. Those will affect my usage, and it won't really be until then that I'll see any benefits (or issues) with the change to upgrade strategy.
> Honestly, I really don't like the idea that essentially every invocation of `pip install` will give a warning for a full major release cycle. That seems guaranteed to just annoy users, and as a result we'll probably get no useful feedback, just a lot of complaints about the process.

Indeed. I didn't think about that in a hurry to leave. Oops!
My point is, I really want pip itself to have a major version deprecation run with such a major change to the main command of it. Any form it takes, I'm game.
I think being selective about when we show the warning message is the way forward.
How do you choose? @njsmith suggested warning only when the behaviour differs. Other than the fact that this essentially doubles the work done in every install execution, as long as we publicise well (in advance and in detail), I think it's a good idea.
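Warning only when the behaviour differs amounts to computing the install plan twice, once per strategy (hence the doubled work), and comparing the results. A sketch, assuming both plans are already available as `{package: version}` dicts; `warn_if_behaviour_changes` and the message wording are hypothetical.

```python
import warnings


def warn_if_behaviour_changes(current_plan, future_plan):
    """Warn only if the future default would install something different.

    Both arguments are hypothetical {package: version} plans: what this release
    will do, and what the new (non-recursive) default would have done instead.
    Returns True when a warning was issued.
    """
    if current_plan == future_plan:
        return False  # same outcome either way: stay silent
    only_now = sorted(set(current_plan.items()) - set(future_plan.items()))
    only_later = sorted(set(future_plan.items()) - set(current_plan.items()))
    warnings.warn(
        "pip install will behave differently in the next major release: "
        f"today it changes {only_now}, in future it would change {only_later}",
        FutureWarning,
        stacklevel=2,
    )
    return True


if __name__ == "__main__":
    # `pip install foo` with foo 1 installed: nothing happens today, an upgrade in future
    warn_if_behaviour_changes({}, {"foo": 2})
```

As the follow-up comment points out, users whose commands happen to produce identical plans never see the message, which is the weakness of this approach.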
Edit:
Or maybe not, on second thought. It won't show the message to everyone like we would want it to. I would want to show it to everyone at least once.
How about some configuration file magic, asking the user to set a flag in the configuration file? This is where an `--upgrade-strategy=default` or similar flag would come in handy.
Any alternate ideas for this?
> the tipping point comes when `--prefer-binary` and "upgrade all" become available. Those will affect my usage, and it won't really be until then that I'll see any benefits (or issues) with the change to upgrade strategy.

True. While this change will fix some issues (unnecessary re-installs) directly, I think it might indirectly help resolve other issues as well.
Similarly to @pradyunsg's last idea, iirc git shows (kinda long) messages when it has introduced or is going to introduce a big change, which you can disable by setting a configuration option from the command line, as mentioned in the message. I've liked that so far.
A temporary option to disable the message wouldn't be the worst possible behavior.
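The git-style approach (keep showing the notice until the user records a choice in their configuration) is cheap to implement. A sketch with the standard `configparser` module; the `[install]` section mirrors the `pip.conf` example proposed earlier, but the `upgrade-strategy` key, its `default` value and the notice text are hypothetical.

```python
import configparser

NOTICE = (
    "pip install will start upgrading already-installed packages in the next major release.\n"
    "To choose a behaviour now and hide this message, set 'upgrade-strategy' in the\n"
    "[install] section of your pip configuration file."
)


def upgrade_notice(config_text):
    """Return the notice to show, or None once the user has recorded a choice.

    config_text is the content of a pip.conf-style INI file; the
    [install] upgrade-strategy key is hypothetical.
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)
    choice = config.get("install", "upgrade-strategy", fallback="default")
    return NOTICE if choice == "default" else None


if __name__ == "__main__":
    message = upgrade_notice("")  # no configuration yet, so the notice is shown
    if message:
        print(message)
```

Because the message disappears as soon as any explicit strategy is configured, every user sees it at least once, but nobody has to see it forever.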
@pradyunsg: Before we get into the nitty-gritty of deprecation strategies... is there any chance I can convince you that the "option B" approach is okay? (Normally I wouldn't try, but given that core devs like @dstufft and @pfmoore are okay with it I guess I will try :-).) I definitely understand why you find just switching to be "rude" to users, but it's a complex trade-off -- not switching is also rude in different ways to different people. For example:
- `--prefer-binary`, etc. etc. It's not enough to say "a deprecation is important"; one has to argue that it's _more_ important than other things one could do with that time.
- 8.1.2 flat out broke a bunch of people's deployments due to a complicated bug involving the interaction between pip, pkg_resources, and devpi. It sucked, but people dealt with it.

Given our limited resources, it's a fact that we're going to sometimes break things, sometimes leave broken things sitting for years without progress, and generally cause users pain. We can't change that, but we can at least be smarter about which _kinds_ of pain we cause users, and "install starts working the way lots of users already expect" is a much more productive outcome than most :-).
@pfmoore:
> You say `pip require foo`: same as the current `pip install foo`. So it'll error if `foo` is installed?

No, right now if `foo` is already installed then `pip install foo` does nothing and exits successfully. I was imagining `pip require` would be a way to talk directly to the constraint resolver: "here's a new constraint, please ensure it is satisfied". Semantically meaningful and well-defined, but a pretty low-level, for-experts interface.
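The "ensure this constraint holds, otherwise do nothing" semantics of the hypothetical `pip require` fit in a few lines. In this sketch, versions are plain tuples, a constraint is just an optional minimum version, and `require`, `installed` and `available` are hypothetical stand-ins for pip's real requirement and index machinery.

```python
def require(name, minimum, installed, available):
    """Hypothetical `pip require name>=minimum` (minimum=None means any version).

    installed: {name: version tuple}; available: {name: [version tuples]}.
    Returns the version that would be installed, or None if nothing needs to change.
    """
    have = installed.get(name)
    if have is not None and (minimum is None or have >= minimum):
        return None  # constraint already satisfied: succeed without touching anything
    candidates = [v for v in available.get(name, []) if minimum is None or v >= minimum]
    if not candidates:
        raise LookupError(f"no available version of {name} satisfies >= {minimum}")
    return max(candidates)  # otherwise pick the newest version that satisfies it


if __name__ == "__main__":
    index = {"foo": [(1, 0), (1, 2), (2, 0)]}
    print(require("foo", None, {"foo": (1, 0)}, index))    # None: foo is already installed
    print(require("foo", (1, 1), {"foo": (1, 0)}, index))  # (2, 0)
```

With `minimum=None` this reproduces the current `pip install foo` behaviour described above: an already-installed `foo` of any version means success with no changes.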
@dstufft: I find `pip install --no-upgrade foo` rather confusing, though. From the name I'd expect that it would do something like... try to install `foo`, but error out if `foo` had a dependency that would force the upgrade of something I already had installed? Which is kinda the opposite of what it would actually do. For me the `require` operation and the `install` operation are conceptually really distinct. See also Guido's comments on how, if you ever find yourself writing a function that takes a boolean arg, and you know that your callers will be passing a constant rather than a variable for that arg, then you should have two functions. So splitting it out into a new command was me trying to imagine what it might look like in a world where we added it for its own sake, rather than just to fulfill our obligation to have a `--no` form of `--upgrade` or whatever. But I'm also just as happy to drop it entirely for now...
Okay, how about this as a strategy:
- `pip install foo` = `pip install --upgrade foo` = non-recursive upgrade (so `pip install foo` now will upgrade if `foo` is installed, and `pip install --upgrade foo` will no longer upgrade all dependencies recursively).
- Add a `pip require foo` command in the future if it turns out to be useful, but defer that for now, because it's not really a priority and it's easier to add stuff than to take it away.
- We keep `--upgrade` around as a no-op indefinitely, but take it out of `--help`, and the reference manual just says "no-op; kept for backwards compatibility". (Maybe in a few years we tear it out entirely, maybe not; I don't care and am happy to just defer that discussion until a few years have passed.)

That avoids the worst gratuitous breakage (there's no reason for `pip install -U foo` to become a hard error and invalidate tons of existing tutorials), but otherwise keeps things radically simple, so we can skip or defer thinking about things like `--no-upgrade` or the ideal spelling for recursive upgrades, and get the important parts moving ASAP.
> It seems desirable to be able to say "upgrade pkg and all its (sub-)*dependencies to the latest version". I don't want to upgrade everything in my ecosystem; I just want to get the latest bug-fixes for pkg and its dependencies.

The problem with this is that in lots of cases, it doesn't really make sense to assign some dependency to any particular dependant. Like, lots of people have environments with ~30 different packages installed, of which 1 is numpy and 29 are packages that depend on numpy. So if I want the new bug-fixes for astropy, should that upgrade my numpy? That might fix some issues with astropy, but it might also break the other 28 packages, who knows. Pyramid's dependency chain includes a number of widely-used utility libraries like `zope.interface` and `repoze.lru` and `setuptools` (why? idk). So recursively upgrading Pyramid might break Twisted (which depends on `zope.interface` and `setuptools` and nothing else). There's no way that "I want the latest bug-fixes for Pyramid" implies "I want the latest `setuptools`" in most users' minds, but that's how `pip install -U` currently interprets it.
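The shared-dependency problem can be shown with a tiny graph. The edges below are only the ones described in the comment above (Pyramid's chain including `zope.interface`, `repoze.lru` and `setuptools`; Twisted depending on `zope.interface` and `setuptools`), a deliberately simplified picture rather than real package metadata, and `closure`/`collateral` are hypothetical helpers.

```python
# Simplified dependency graph, limited to the relationships described in the comment
DEPENDS = {
    "pyramid": {"zope.interface", "repoze.lru", "setuptools"},
    "twisted": {"zope.interface", "setuptools"},
    "zope.interface": set(),
    "repoze.lru": set(),
    "setuptools": set(),
}


def closure(pkg):
    """All transitive dependencies of `pkg` (not including `pkg` itself)."""
    seen, stack = set(), [pkg]
    while stack:
        for dep in DEPENDS[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def collateral(pkg):
    """Other packages whose dependencies a recursive upgrade of `pkg` would also touch."""
    touched = closure(pkg)
    return {
        other: sorted(closure(other) & touched)
        for other in DEPENDS
        if other != pkg and other not in touched and closure(other) & touched
    }


if __name__ == "__main__":
    print(collateral("pyramid"))  # {'twisted': ['setuptools', 'zope.interface']}
```

Asking for "Pyramid and its dependencies" therefore also upgrades everything Twisted relies on, even though Twisted was never mentioned.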
> Similarly to @pradyunsg's last idea, iirc git shows (kinda long) messages when it has introduced or is going to introduce a big change, which you can disable by setting a configuration option from the command line, as mentioned in the message.

That's exactly where I got the idea.
> I've liked that so far.

Ditto. Hence I would like to see it in pip. It's a field-tested process.
I do agree that a warning on every run is a bit too much, but having it show every time until the user acts on it is something that I know, from git, works even for major changes like this.
> is there any chance I can convince you that the "option B" approach is okay?

Maybe. You're right that the trade-offs are complicated, and having to wait a year till the switch isn't the most convenient thing either. Breaking certain niche cases that don't affect _everyone_ is fine. That is just going to happen. Here, we're changing the most used command of pip (in documentation of packages and otherwise). Doing so without a proper warning period might just not be the best of things to do. Nor should this be done without giving people some time to fix their tools/workflow/etc. to work with the new behaviour.
With @njsmith's current proposal, I still don't get a proper warning period, or a way to give people some preview of the upcoming (major) change. That's all, but it's enough that I don't like the proposal. If someone can convince me that dropping these two requirements would be fine, and that it's possible to properly inform people in some other manner that this big change is coming their way, I'm fine with that.
If we get the deprecation nitty-gritty right, it should be possible to implement this in such a manner that the deprecation-release-only stuff stays in one module (module as in English: a class, function or something else), and the next major release just stops invoking that module and removes it. That way, at least the post-deprecation work is minimized.
> #59 has 199 comments from 56 participants, many of them just +1's. Making them wait another year is kinda rude too.

They don't _have_ to wait another year. They can just opt in to the new behaviour. We're just giving time to people whose stuff would break due to the change. Others can just opt in to the nicer behaviour.
> We keep `--upgrade` around as a no-op indefinitely, but take it out of `--help`, and the reference manual just says "no-op; kept for backwards compatibility". (Maybe in a few years we tear it out entirely, maybe not; I don't care and am happy to just defer that discussion until a few years have passed.)
[snip?]
> That avoids the worst gratuitous breakage (there's no reason for `pip install -U foo` to become a hard error and invalidate tons of existing tutorials)

If it wasn't obvious, this would happen in step 1 of my proposal. No one gets bothered by the presence of a no-op `-U`. Its absence would invalidate many packages' documentation and break stuff. We'll keep it till it is rare enough to be safe to remove. That discussion should happen a few years later (let's mark 16th September 2018 for this, for no reason whatsoever).
Regardless of whether I change my position on @njsmith's proposal, we'll keep a no-op `--upgrade` post-deprecation.
> There's no way that "I want the latest bug-fixes for Pyramid" implies "I want the latest `setuptools`" in most users' minds, but that's how `pip install -U` currently interprets it.

True. But this is due to the lack of a dependency resolver. Once it's added, it does _exactly_ what the user wanted. There's only so much we can do till then. Adding a warning in the documentation about the potential breakage of dependencies is sufficient for now IMO, since this behaviour shall become opt-in. And this assumes that packages keep the promises they make through their version numbers. If they break them, there's little pip can do until the packages refine their version specifiers.
As an aside, I think there should be a piece of documentation mentioning that pip may break your dependency graphs.
> So if I want the new bug-fixes for astropy, should that upgrade my numpy?

Not if it breaks your dependency graph, nor if it removes your well-configured numpy. The former case needs a dependency resolver. The latter needs "holding back" of upgrades. Both are out of scope for this discussion.
Until we get those, the most we can do is tell people - "pip doesn't do the right thing all the time and we don't have the resources to fix it. Help would be appreciated."
> This is just "software is hard, and new versions sometimes add new bugs; therefore, the more churn you have, the more likely you are to get bitten by new bugs".

I can only say "sad but true" to this.
I am posting the mental picture of the post-deprecation behaviour that is in my head... just to make sure I don't miss anyone's concerns.
- `pip install` upgrades in a non-eager manner, upgrading dependencies only if needed.
- `pip install --some-flag` upgrades in an eager manner, upgrading dependencies to the latest version allowed by their version specifiers.
- `--upgrade` becomes a no-op. It is kept in `install --help`, documented as "kept for backwards compatibility".
- `pip require` is deferred until someone comes around asking for it. As noted below, this cannot be the case. (edit: it later turned out that I was wrong.. :| )

Once we have decided upon the required behaviour, I'll start working on the implementation. (I'm still familiarizing myself with the implementation details of pip install and #3194 right now.)
Let's finalize the behaviour and how we want to do the deprecation here, and we'll bikeshed the option names in the PR I eventually make.
`pip install --target <dir>` is documented as "By default this will not replace existing files/folders in
Since install shall now start upgrading (replacing) by default, it seems more consistent to replace the existing files and folders by default, and to provide some flag if the user wishes to have the older behaviour of not replacing. AFAIK, this flag is still undecided. `pip require` has similarities. So I think we can't defer the discussion on `pip require`, and we need to have it now.
The overlap with `pip install`, and the need for it presented by `install --target`, makes me want to have the `require` behaviour behind a flag in `install`.
@pradyunsg:
> Here, we're changing the most used command of pip (in documentation of packages and otherwise). Doing so without a proper warning period might just not be the best of things to do. Nor should this be done without giving people some time to fix their tools/workflow/etc. to work with the new behaviour.

It's the most used command of pip, but we're only touching two weird corner cases: `pip install foo` where foo is already installed, and `pip install -U foo` where foo has some recursive dependency that's out of date. While I'm sure there will be some obscure breakage no matter what we do, I can't think of any sensible tools or workflows that would be broken by this. Can you give an example of what you're thinking of?
True. But this is due to the lack of a dependency resolver. Once it's added, it does exactly what the user wanted.
??? no idea what you mean here -- Pyramid recursively depends on setuptools, and my argument is that this demonstrates that "package and its recursive dependencies" doesn't actually correspond to any meaningful concept in the user's mental model. AFAICT this is totally orthogonal to the dependency resolver issue?
pip install --target <dir> ... Since install shall now start upgrading (replacing) by default, it seems more consistent to replace the existing files and folders by default
I think the issue with pip install --target <dir> is that it doesn't really install into an environment at all -- it's used for things like vendoring. And without an environment, the upgrade/install distinction doesn't even make sense. My vote is that we leave it alone -- the current behavior is fine IMO.
pip require has similarities.
It does?
we're only touching two weird corner cases: pip install foo where foo is already installed, and pip install -U foo where foo has some recursive dependency that's out of date.
Hmm... Indeed. While the change is major, I do agree that it's just weird corner cases that we break. But I would really want to get some user input before making the change... It doesn't feel right to make such a change without a deprecation.
If everyone else here (mainly @pfmoore and @dstufft) says that they prefer no-deprecation switch over a deprecation switch, I guess I'll be fine with going ahead and implementing @njsmith's proposal.
True. But this is due to the lack of a dependency resolver. Once it's added, it does exactly what the user wanted.
Pyramid recursively depends on setuptools, and my argument is that this demonstrates that "package and its recursive dependencies" doesn't actually correspond to any meaningful concept in the user's mental model.
I disagree. It is a meaningful thing to want to get the latest possible version of a package and its dependencies. As an example, if I have found that my current environment has an issue related to pkgA, I would want to check against the latest releases of it and all its dependencies to rule out the possibility that this issue was fixed in a new release. I think it's reasonable to expect that to be possible.
Just to be clear, let's not provide the old behavior, for the simple reason that it would give lazy people a way to keep the existing behavior if it works for them. We'll keep it only if we figure out some valid use-case. If we go down the deprecation path, it'll be deprecated but available till end-of-deprecation. If someone wants that behavior, they'll say they do and we'll pull it out of deprecation and let it stay.
AFAICT this is totally orthogonal to the dependency resolver issue?
The dependency resolver comes into play when A and B both depend on C and A is recursively upgraded: C gets upgraded to a version that breaks B, since pip does not look at B's version specifiers when handling A's. This was the example you gave with Pyramid, Twisted and zope.interface being A, B and C respectively.
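To make the A/B/C case concrete, here is a minimal sketch of the check that is missing: after an upgrade, walk every installed package's declared requirements and report the ones that no longer hold. Everything here is hypothetical (package names, version tuples, the find_violations helper), and real PEP 440 specifiers are far richer than the single lower/upper bound used in this toy model.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]
# (dependency name, inclusive lower bound, exclusive upper bound or None)
Requirement = Tuple[str, Version, Optional[Version]]


def satisfies(version: Version, lower: Version, upper: Optional[Version]) -> bool:
    """True if lower <= version < upper; upper=None means no upper bound."""
    return version >= lower and (upper is None or version < upper)


def find_violations(
    installed: Dict[str, Version],
    requires: Dict[Tuple[str, Version], List[Requirement]],
) -> List[str]:
    """List every declared requirement that the installed set no longer satisfies."""
    problems: List[str] = []
    for name, version in sorted(installed.items()):
        for dep, lower, upper in requires.get((name, version), []):
            have = installed.get(dep)
            if have is None or not satisfies(have, lower, upper):
                problems.append(f"{name} {version} needs {dep} in [{lower}, {upper}), has {have}")
    return problems


# Hypothetical metadata: A accepts any C >= 1.0, but B only works with C 1.x.
REQUIRES = {
    ("A", (1, 0)): [("C", (1, 0), None)],
    ("A", (2, 0)): [("C", (1, 0), None)],
    ("B", (1, 0)): [("C", (1, 0), (2, 0))],
}

before = {"A": (1, 0), "B": (1, 0), "C": (1, 0)}
# A recursive upgrade of A bumps C to its newest release, looking only at A's requirement.
after = {"A": (2, 0), "B": (1, 0), "C": (2, 0)}

print(find_violations(before, REQUIRES))  # []
print(find_violations(after, REQUIRES))
```

The second call reports B's requirement on C, which is the breakage a resolver would have prevented.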
pip require has similarities.
It does?
Yes, in that it also does not affect already-installed packages. But on reviewing this, they are more different than similar. This option is more along the lines of --avoid-installed. I don't know why I thought they were similar enough to merge...
@njsmith
No, right now if foo is already installed then pip install foo does nothing and exits successfully
What I see is
>pip install xlrd
Requirement already satisfied (use --upgrade to upgrade): xlrd in c:\users\uk03306\appdata\local\programs\python\python35\lib\site-packages
I'm not sure about the exit status, I was thinking about the user experience. Apologies, I was being sloppy in my wording - I meant that I "get an error message" (maybe it's technically a warning) rather than that pip sets the exit code to error. But either way it's a minor point.
Responding to other emails:
I agree with @njsmith that deprecation is in many ways just as bad an experience for users as a sudden change. In this case I remain in favour of just going straight to the improved version. There's been plenty of debate on the tracker, and lots of people have noted their interest in seeing the new approach land. @pradyunsg if you still feel that we should warn users, then by all means post on distutils-sig (and even python-list if you feel it's warranted) and announce the plan there. There's a risk that doing so results in even more bikeshedding and debate, which may or may not be productive, but that's the nature of packaging changes :-)
I'm also in agreement that I don't see "Pyramid and all its dependencies" as a particularly useful thing to want to upgrade. Pyramid itself, of course. And Pyramid and _selected_ dependencies, quite possibly. And certainly "everything in this virtualenv (which was set up for my Pyramid development)".
Which prompts the thought - how often would people asking for eager upgrades be better served by using virtualenvs and upgrade-all? I can't speak for other people's workflows, but it's certainly how I tend to operate. And of course for many environments, pip freeze and exact version restrictions are the norm, so eager updates would be inappropriate there.
Finally, we've decoupled "pip needs a solver" from this proposal - so arguing that eager is useful once we have a solver isn't relevant right now. Current eager behaviour can break dependencies - so we should remove it, and then maybe reintroduce a working version once we have a solver and we've had feedback that (a not-broken version of) the feature is useful to people.
if you still feel that we should warn users, then by all means post on distutils-sig (and even python-list if you feel it's warranted) and announce the plan there.
I think announcing on distutils-sig sounds fine to me. python-list, I'll think about it.
There's a risk that doing so results in even more bikeshedding and debate, which may or may not be productive, but that's the nature of packaging changes :-)
That's a trade-off. I guess I'll redirect them to the PR for the bikeshedding and take other comments on the mailing list...
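Quick correction: I really should have mentioned the entire help-text of --target.
"""Install packages into <dir>. By default this will not replace existing files/folders in <dir>. Use --upgrade to replace existing packages in <dir> with new versions."""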
If we are making --upgrade a no-op, --target should not depend on it. We need to figure this out.
Finally, we've decoupled "pip needs a solver" from this proposal - so arguing that eager is useful once we have a solver isn't relevant right now. Current eager behaviour can break dependencies - so we should remove it, and then maybe reintroduce a working version once we have a solver and we've had feedback that (a not-broken version of) the feature is useful to people.
Sounds good to me. I guess we can drop the eager upgrade behavior. It's easy to add it if we need to. Removing it (after the switch), not so much. I do think not providing it and advocating use of virtualenv for the job is a good idea.
@pfmoore I take it that you wish to go down the no-deprecation path.
I'm also in agreement that I don't see "Pyramid and all its dependencies" as a particularly useful thing to want to upgrade. Pyramid itself, of course. And Pyramid and selected dependencies, quite possibly.
When you put it that way, it makes sense why what I was saying is not ideal.
Current eager behaviour can break dependencies
I think any package change has the potential to. The non-eager behavior just reduces the number of changes and thus works around this issue well enough to reduce breakages substantially.
Anyway, I take it that it's decided that eager upgrades would be dropped.
We need to figure this out.
Maybe reuse --force-reinstall? I don't know enough about these options to be sure...
@dstufft I'm waiting for your views on deprecation vs no-deprecation.
So, that leaves us with --upgrade and --target only. (and @dstufft's vote)
I'd ask anyone with issues or requirements that they feel haven't been handled to bring them up now. Not that it's the last chance or anything, just a good time to do so.
Current eager behaviour can break dependencies
I think any package change has the potential to.
Specifically current eager behaviour can leave the system in a state where declared dependency requirements (which aren't inconsistent, or otherwise broken) are violated when they were not previously. That is not acceptable, and is what a "proper solver" should address. For the simpler "only as needed" upgrades, my understanding is that the risk of such breakage is minimised even without a solver.
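The difference between the two strategies can be sketched with a toy planner, using the A/B/C shape from earlier in the thread. This is an illustration under simplifying assumptions (integer-tuple versions, one lower/upper bound per requirement, no resolver), not pip's implementation; plan_upgrade and the package data are hypothetical. Neither mode looks at B's requirement on C: only-if-needed avoids the breakage here simply because it leaves C alone.

```python
from typing import Dict, List, Optional, Set, Tuple

Version = Tuple[int, ...]
# (dependency name, inclusive lower bound, exclusive upper bound or None)
Requirement = Tuple[str, Version, Optional[Version]]


def satisfies(version: Version, lower: Version, upper: Optional[Version]) -> bool:
    """True if lower <= version < upper; upper=None means no upper bound."""
    return version >= lower and (upper is None or version < upper)


def newest_matching(candidates: List[Version], lower: Version, upper: Optional[Version]) -> Version:
    """Pick the newest available version inside [lower, upper)."""
    matching = [v for v in candidates if satisfies(v, lower, upper)]
    if not matching:
        raise LookupError(f"no version in [{lower}, {upper})")
    return max(matching)


def plan_upgrade(
    target: str,
    installed: Dict[str, Version],
    available: Dict[str, List[Version]],
    requires: Dict[Tuple[str, Version], List[Requirement]],
    eager: bool,
) -> Dict[str, Version]:
    """Return the {name: version} set after upgrading `target`.

    eager=True:  every dependency reached from `target` moves to the newest
                 version its requirer allows.
    eager=False: a dependency only changes if the installed version does not
                 satisfy the requirement ("only as needed").
    Neither mode consults the requirements of *other* installed packages.
    """
    result = dict(installed)
    result[target] = max(available[target])
    queue = [target]
    visited: Set[str] = set()
    while queue:
        name = queue.pop()
        if name in visited:
            continue
        visited.add(name)
        for dep, lower, upper in requires.get((name, result[name]), []):
            current = result.get(dep)
            if eager or current is None or not satisfies(current, lower, upper):
                result[dep] = newest_matching(available[dep], lower, upper)
            queue.append(dep)
    return result


def changed(before: Dict[str, Version], after: Dict[str, Version]) -> List[str]:
    """Names whose version differs between the two sets."""
    return sorted(name for name in after if before.get(name) != after[name])


# Hypothetical index: C 2.0 exists, but B only supports C 1.x.
AVAILABLE = {"A": [(1, 0), (2, 0)], "C": [(1, 0), (2, 0)]}
REQUIRES = {
    ("A", (2, 0)): [("C", (1, 0), None)],
    ("B", (1, 0)): [("C", (1, 0), (2, 0))],
}
INSTALLED = {"A": (1, 0), "B": (1, 0), "C": (1, 0)}

eager_plan = plan_upgrade("A", INSTALLED, AVAILABLE, REQUIRES, eager=True)
lazy_plan = plan_upgrade("A", INSTALLED, AVAILABLE, REQUIRES, eager=False)
print(changed(INSTALLED, eager_plan))  # ['A', 'C']  (C 2.0 breaks B)
print(changed(INSTALLED, lazy_plan))   # ['A']
```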
So, that leaves us with --upgrade and --target only.
Apart from changing the help text of --target to not refer to --upgrade, I consider --target to be out of scope here. The help text is
Install packages into <dir>. By default this will not replace existing files/folders in <dir>. Use --upgrade to replace existing packages in <dir> with new versions.
I propose we just replace it with
Install packages into <dir>.
Presumably the default will change (as with normal "install") to overwrite by default, and if you don't want to overwrite, you just don't run the install command (same as if you're installing into site-packages). If users want anything more complex, they can work out the appropriate commands, let's not worry about trying to offer suggestions (that may or may not be helpful in practice).
The help text is
Install packages into <dir>. By default this will not replace existing files/folders in <dir>. Use --upgrade to replace existing packages in <dir> with new versions. I propose we just replace it with
Install packages into <dir>.
Hmm... Are you sure that you want to remove the functionality of not replacing existing files/folders?
Are you sure that you want to remove the functionality of not replacing existing files/folders?
It's not me that was advocating that - @dstufft and @njsmith argued strongly that "install" should upgrade when given an already installed package. The only thing I'm adding is that I don't think the behaviour should be different just because the user specified --target.
Maybe having a --no-replace option is needed, but if so it should apply to installs both with and without --target.
Off Topic
At the cost of being picky, a tiny markdown suggestion/request/tip/{whatever_you_want_to_call_it} - keep an empty > line in a block quote to make it dedent... Otherwise it just merges into the higher-level quote...
> > > A
> >
> > B
> C
>
> D
A
B
CD
Do note how B and C came up on the same level of quoting but D actually got the dedent...
Maybe having a --no-replace option is needed, but if so it should apply to installs both with and without --target.
:+1:
At the cost of being picky, a tiny markdown suggestion/request/tip
Thanks. I try to do "preview" but missed that.
I will be starting my implementation work off master on Monday. We're nearly decided on almost everything, and even if @dstufft says we want deprecation, the new behaviour has to be provided anyway.
Here's what I'm going to start implementing:
--upgrade stays but becomes a no-op. Its value is never used anywhere.
--help.
pip install will do upgrades in a non-eager manner, upgrading dependencies only-if-needed.
A new option, --no-replace, that would not allow installing packages over other packages and would move on without errors (like what the current pip install pkgA does for pkgB when pkgA is not installed, pkgB is installed and pkgA depends on pkgB).
I think we decided we'll keep --upgrade around for now (for backwards-compatibility), but not anything about deprecation and _eventual_ removal. Should it be removed using the normal deprecation cycle, starting v9.0 (I think it's alright if we remove it in 10.0/11.0...)?
As an aside, I was thinking: since this change will make the next major version an intentionally backwards-incompatible release, would it make sense to try to push for some other issues to be fixed in the same release? If so, are there any such issues?
It would help maximize the utility of our decision to break backwards compatibility.
waiting on @dstufft's comment
edit: Added "on Monday", moved stuff around.
Hello.
Quick apologies for the lack of activity over the past week... Some other urgent work came up and took some of my time. Anyway, I have started to work on this issue's implementation.
@pradyunsg: I don't understand what --no-replace is for. --target is a weirdo option that almost got deprecated a few months ago, and may or may not survive in the long term, so if it's for --target specifically then it's very low priority and I wouldn't worry about it for now.
Currently --target has a _dependency_ on --upgrade. The current (default) behaviour of --target is to not replace files and folders already in the target-dir. Passing --upgrade changes this to replace files and folders already in the target-dir.
Since install now replaces (read: upgrades) packages by default, it seems to make sense to switch the default behaviour of install --target correspondingly. This would make --upgrade a useless flag for --target, which is what we want (--upgrade becoming a no-op that would eventually be removed). Then, a new option would have to be introduced to provide the current behaviour of --target. This is --no-replace.
Then, for consistency, if --no-replace works with --target runs, it should also work with non-target ones. AFAICS, the latter is new behaviour.
I guess even if --target doesn't survive very long, it might make sense to have a --no-replace that works regardless of --target. I don't know if someone would want that functionality without --target though.
PS: Apologies for littering so many inline-monospace blocks.
I don't think --target (and specifically its current default behaviour) is important enough to warrant adding a new flag just to retain it. IMO, we just switch --target to replace by default, and lose the ability to only add new files (which seems likely to result in broken setups anyway).
Not upgrading an already installed package _is_ a safe operation, but --target doesn't do that, because it doesn't have access to "what is currently installed" information.
So, change the behaviour of --target to stop bothering about already-present directories and just go about replacing them, printing a message as it does so? Or even without printing a message?
Hmm, wait. Sorry, your description confused me (and I didn't go back to check the docs). My above comment was wrong. What I should have said:
Currently --target doesn't replace stuff. That is necessary, because it cannot safely uninstall/upgrade (there's no installed package database with --target and no guarantee that a new version doesn't have a different set of files than the previous version). The current behaviour of --upgrade --target is (AFAICT) unsafe.
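A small sketch of why replacing things in a --target directory is risky when there is no record of what the old version installed. It models an "upgrade" as: each top-level item shipped by the new version replaces a same-named item in the target, and nothing else is removed. That model, the directory names and the replace_top_level_items helper are assumptions for illustration, not a description of pip's actual --target code path.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def make_tree(root: Path, files: Iterable[str]) -> None:
    """Create small placeholder files at the given relative paths under root."""
    for rel in files:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(f"# {rel}\n")


def replace_top_level_items(source: Path, target: Path) -> None:
    """Hypothetical 'upgrade': each top-level item in source replaces a
    same-named item in target; nothing else in target is touched."""
    target.mkdir(parents=True, exist_ok=True)
    for item in source.iterdir():
        dest = target / item.name
        if dest.is_dir():
            shutil.rmtree(dest)
        elif dest.exists():
            dest.unlink()
        shutil.move(str(item), str(dest))


def list_tree(root: Path) -> List[str]:
    """All files under root, as sorted POSIX-style relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    target, build = tmp / "target", tmp / "build"
    # Version 1.0 is already in the target directory.
    make_tree(target, ["pkg/__init__.py", "pkg-1.0.dist-info/METADATA", "pkg_legacy.py"])
    # Version 2.0 dropped the pkg_legacy module and ships new metadata.
    make_tree(build, ["pkg/__init__.py", "pkg-2.0.dist-info/METADATA"])
    replace_top_level_items(build, target)
    result = list_tree(target)

print(result)
```

The stale pkg_legacy.py module and the old pkg-1.0.dist-info directory both survive next to the new version, which is the "different set of files" problem in miniature.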
So --target should keep its current behaviour. This does make it inconsistent with the new install, but that's fine, it's for a completely different use case. I don't have a problem with --upgrade being removed, and as a result --target loses that capability - it's an unsafe operation anyway.
Given that I disagree with changing the default behaviour of --target, there's no need for a --no-replace flag.
I'm not sure what you mean by --target having a dependency on --upgrade.
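It might help the discussion to read the current help text of --target.
"""Install packages into <dir>. By default this will not replace existing files/folders in <dir>. Use --upgrade to replace existing packages in <dir> with new versions."""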
I'm not sure what you mean by --target having a dependency on --upgrade.
To enable replacing existing stuff.
Given that I disagree with changing the default behaviour of --target, there's no need for a --no-replace flag.
If the behaviour of --target is not changed, it would mean the --upgrade flag would need to stay at least as long as --target is there.
I want to remove the need for referring to --upgrade in --target's help.
OK, let me rephrase. The behaviour of --target should (IMO) be changed in one respect only, that --upgrade (and the behaviour it enables) should be removed.
If someone can demonstrate a use case for --upgrade (given that it potentially breaks things) then I'm willing to review that position, but I don't think it's worth keeping "just in case".
The behaviour of --target should (IMO) be changed in one respect only, that --upgrade (and the behaviour it enables) should be removed.
Okay. That makes it clear.
If someone can demonstrate a use case for --upgrade
Not me.
That sounds fine to me too. It strikes me as a nasty wart that --target used --upgrade for this purpose in the first place.
I think we should move the further discussion over to #3806 to avoid having 2 comment threads with simultaneous discussions on the same thing.
Wow this thread has gone critical. Let me just add my strong opposition to changing the meaning of -U. There's absolutely no need to break our users' muscle memory - we can add a new option if we need non-recursive upgrades. That said, what's the use case for non-recursive upgrades _other than_ 'pip install named-thing'?
E.g. I think it's fine to say that explicitly named distributions upgrade implicitly, and -U if provided causes fully recursive upgrading. In all cases without --ignore-dependencies, pip will recursively check for satisfaction.
@rbtcollins: "breaking our users' muscle memory" seems a bit strong -- WRT -U, the changes in the current proposal would be: (a) pip install -U foo is still legal and still upgrades foo, but now non-recursively, (b) it loses the special behavior where combining -U plus --target means "overwrite any existing files". I'm guessing that the latter change is not one that worries you overmuch given that you recently tried to deprecate --target and that most users don't have muscle memory for -U --target (I hope!!). So I guess you're saying specifically that you prefer that pip install foo do a non-recursive upstall of foo, and that pip install -U foo do a recursive upstall of foo?
I could live with this, especially as a transitional state where we deprecate -U at the same time, but it definitely has downsides:
--upgrade. Do we really want to have to explain to new users that if they want to upgrade a package, the right way to do that is to _leave off_ the --upgrade switch?
pip install foo, but on pip < 9 it's pip install -U foo. I don't want to guide people to recursive upgrade, but I don't want to confuse them with pip version numbers either, so what do I put in my tutorial? OTOH if pip >= 9 makes pip install and pip install -U equivalent, then the advice to use pip install -U foo remains correct, just a bit redundant.
pip already has way too many modes that are complicated, poorly documented, hard to maintain, and don't quite do what anyone wants. If we can simplify by getting rid of one of them then that's _great_, and so far the arguments for recursive upgrades have been very thin on the ground. AFAICT no other package manager supports this and no-one complains. Even you seem to be basing your argument (so far) on "this thing exists and we shouldn't change that" rather than "this thing is actually useful and what our users want"...
So I'd much rather we move on and make pip install foo / pip install --upgrade foo do the obvious thing that everyone else does. I think most users' muscle memories will actually be pleasantly surprised to start getting what they were hoping for in the first place :-).
@njsmith's comment provides a nice summary of why we're doing this.
I guess I should link to https://gist.github.com/pradyunsg/4c9db6a212239fee69b429c96cdc3d73 from here. This is the final "proposal" I wrote, that came out of this issue's discussion. It's got a section about "Current State of Affairs" that I think @rbtcollins would like to read.
@njs - I don't think it's too strong: right now, folk know that to get the latest across the board they run 'pip install -U X'. That's the only reason to run install -U ever (today), and so breaking it is breaking its primary use case.
The behaviour with --target is indeed not the case I'm worried about.
FWIW I disagree with your analysis about what people do/don't want. Most projects only test a small number of permutations of versions: latest-with-latest + latest-with-stable, when a stable exists. Upgrading everything is actually safer than upgrading only the named component because folks' lower version specifiers are usually wrong. See #3188 for an enhancement that would make testing lower version limits much easier. I have lost count of how many times I've 'fixed' folks' problems by telling them to 'pip install -U': they've had a package with an incorrect lower minimum.
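One way to act on the "incorrect lower minimum" observation is to test against the lowest versions a project claims to support. The sketch below is a hypothetical helper (not the design proposed in #3188) that rewrites simple name>=X requirement strings into name==X pins for a test environment. It deliberately ignores extras, environment markers and most of the PEP 440 grammar; the packaging library would be the right tool for anything real.

```python
import re
from typing import List, Optional

# A distribution name followed by an optional comma-separated specifier list,
# e.g. "six>=1.9,<2" or "zope.interface >= 4.0".
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def lowest_pin(requirement: str) -> Optional[str]:
    """Turn 'name>=X[,...]' into 'name==X'; return None if there is no >= bound."""
    match = _REQ.match(requirement)
    if not match:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, specs = match.groups()
    for spec in specs.split(","):
        spec = spec.strip()
        if spec.startswith(">="):
            return f"{name}=={spec[2:].strip()}"
    return None


def lowest_pins(requirements: List[str]) -> List[str]:
    """Pin each requirement to its declared minimum, leaving others untouched."""
    pins = []
    for req in requirements:
        pin = lowest_pin(req)
        pins.append(pin if pin is not None else req.strip())
    return pins


print(lowest_pins(["six>=1.9,<2", "requests", "zope.interface >= 4.0"]))
```

Installing the resulting pins in a fresh virtualenv and running the test suite would show whether the declared minimums actually work.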
The actual underlying thing that drives your 'this is wrong' is #2687 as far as I can tell - that's where pip can do the right thing.
Further, the _very last thing_ we want is for pycrypto and friends to stay un-upgraded for months or years because folk don't know they have to do something special to have up to date secure software.
If folk are running very complex venvs, they are opting into the complexity - the common cases are a) full Python installs and b) dedicated venvs. We should steer everyone to b as much as possible because it's inherently more reliable, and that strengthens the argument I'm making that the default should be to be secure, and running as close to what upstream will have tested as possible.
w.r.t. package managers - 'apt install X' will never upstall - it only installs. 'apt upgrade' is global - it upgrades everything. DNF is similar AIUI. I haven't canvassed SUSE's tool, but I'd expect similar behaviour because of the flattened there-can-be-only-one idiom distros use.
Perhaps we should make a higher bandwidth discussion for this? It seems to be pointed in a pretty dangerous direction IMO.
@pradyunsg your assertion about pip's current status in https://gist.github.com/pradyunsg/4c9db6a212239fee69b429c96cdc3d73 is factually incorrect: there is already a --no-dependencies switch which covers off the recursive/non-recursive case. 'pip install -U foo --no-deps && pip install foo' should be semantically equivalent to 'upstall named packages by default' - and I'm fine with that.
No tl;dr. Read it.
@rbtcollins
your assertion about pip's current status in https://gist.github.com/pradyunsg/4c9db6a212239fee69b429c96cdc3d73 is factually incorrect: there is already --no-dependencies switch which covers off the recursive/non-recursive case
I never asserted, in the write-up or (to my memory) anywhere else, that pip does not provide a way to do non-eager upgrades or that it lacks --no-deps. Which part of my "assertion about pip's current status" do you feel is "factually incorrect"?
Do consider re-reading this section and explicitly pointing out any "factually incorrect" points in a comment on the Gist (not here, it'll be noise) so that I can correct them.
That's the only reason to run install -U ever (today), and so breaking it is breaking its primary use case.
No one's going around breaking the world.
pip install -U pkg will still upgrade pkg, which is what most people want and care about. It's what happens with the dependencies that has changed.
-U/--upgrade will be staying until it's felt it's no longer needed because everyone's moved on.
FWIW I disagree with your analysis about what people do/don't want. Most projects only test a small number of permutations of versions: latest-with-latest + latest-with-stable, when a stable exists.
If the package developer provides poor metadata, it is not _wrong_ behaviour on pip's side that it broke the user's environment because of that. It's the responsibility of the package developer to provide proper version constraints. I do agree that #3188 would help the package developer do so.
I don't think it's wrong to expect people to improve the metadata they provide to PyPI (and hence pip).
the very last thing we want is for pycrypto and friends to stay un-upgraded for months or years because folk don't know they have to do something special to have up to date secure software.
Agreed. I do think security packages deserve extra attention. Moreover, any packages that are skipped from upgrades are explicitly listed as such. So, someone looking at the output would know what's happened and could decide whether to take action.
If you care about a security package, after this change, you can simply mention it directly on the CLI, which makes your intentions more explicit and clear. I prefer it this way. This change would force you to mention which packages you care about being up to date.
"explicit is better than implicit"
If folk are running very complex venvs, they are opting into the complexity - the common cases are a) full Python installs and b) dedicated venvs. We should steer everyone to b as much as possible because it's inherently more reliable, and that strengthens the argument I'm making that the default should be to be secure, and running as close to what upstream will have tested as possible.
I agree that everyone should be using virtual environments more often. I also agree that running as close to upstream as possible is also favourable. I find it ironic that you use the word "secure" to defend a behaviour that silently (and often) breaks the dependency-graph.
'pip install -U foo --no-deps && pip install foo' should be semantically equivalent to the 'upstall named packages by default'
It is. The whole motivation of this PR is to provide pip install -U foo --no-deps && pip install foo as pip install foo, because the behaviour that everyone wants most of the time should be directly available. It was discussed and decided that it's better to not provide any way to do eager upgrades.
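Until the new default lands, the two-step equivalent can be scripted. A minimal sketch, assuming pip is driven from Python via sys.executable -m pip so it targets the running interpreter; only_if_needed_commands and upgrade_only_if_needed are hypothetical helper names, and dry_run defaults to True so nothing is installed unless you ask for it.

```python
import subprocess
import sys
from typing import List, Sequence


def only_if_needed_commands(packages: Sequence[str]) -> List[List[str]]:
    """Return the two pip invocations that approximate a non-eager upgrade."""
    pip_install = [sys.executable, "-m", "pip", "install"]
    return [
        # Step 1: upgrade only the named packages, leaving their dependencies alone.
        pip_install + ["--upgrade", "--no-deps", *packages],
        # Step 2: a plain install, which pulls in missing or unsatisfied
        # dependencies but leaves already-satisfied ones where they are.
        pip_install + list(packages),
    ]


def upgrade_only_if_needed(packages: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Print (dry_run=True) or run the commands in order, stopping on the first failure."""
    commands = only_if_needed_commands(packages)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    upgrade_only_if_needed(["foo"])
```

Keeping the command construction separate from execution makes it easy to inspect what would run before passing dry_run=False.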
The actual underlying thing that drives your 'this is wrong' is #2687 as far as I can tell - that's where pip can do the right thing.
It has been concluded in prior discussions (#59, at pypa-dev) that the behaviour in #2687 is not fixable until #988 lands, which will take a fair bit of time, and this behaviour is seen as the safer middle ground in the meantime.
Today, every time someone runs pip install -U pkg, they risk breaking some other package in their environment. While there will still be the same risk even after this PR, the number of times that pip's actions break the environment is reduced.
I'm fine with that
You're fine with having it as an opt-in behaviour to do non-eager upgrades. I'm not fine with breaking the user's environment silently, by default. That's what eager upgrades do as I see it, with #2687 unresolved.
It would be better to not be breaking the user's environment silently. This change is the best we can do for that given the limited development time that gets directly invested in pip.
Perhaps we should make a higher bandwidth discussion for this?
That's the idea behind the "shout-out" on distutils-sig.
(deleted as posted on wrong thread)
Your argument is that fewer folk will be harmed by not upgrading dependencies that don't /have/ to be changed when someone has supplied -U. My argument is that more people will be harmed:
Given these two choices:
A - users are vulnerable to security issues and don't know they are
B - users get occasionally broken environments due to #2687
I don't see how we can possibly choose anything other than B. It would be wildly irresponsible to do otherwise.
Sorry to do bitsy replies, but another thing I observe here is that we're accreting complexity - cargo, for instance, doesn't have anywhere near the fine grained control being proposed here. Largely that's because the language has better primitives for isolation - like JavaScript and Java, Rust can cope with multiple versions of a package in the dependency set - making our resolution work nearly totally irrelevant there _except_ when folk want to collapse down to just-one-version, whereas we have no choice. But I think we need to seek really strong reasons for adding complexity - not just in pip's core, but in the user model. The basic expectation of PyPI is that everyone works together all the time; defaulting to not upgrading is pretty much the opposite of that.
So - are our expectations broken, or are we just reacting to bugs in pip where enough information exists to at least do a better job?
are we just reacting to bugs in pip where enough information exists to at least do a better job
This. We can definitely do _much_ better with the information we have. But it's sad that we don't. There are reasons for that as well but that's not the main point of this discussion.
Given these two choices:
A - users are vulnerable to security issues and don't know they are
B - users get occasionally broken environments due to #2687
I think they're equally bad. The thing is, the user can see that their security package wasn't upgraded and they can (and should) opt-in to that upgrade. That's slightly better than the status quo IMO.
I don't agree that they are equally bad.
Your proposed change moves pip from a do-the-safe-thing model to a review-every-invocation-carefully-because-it-may-have-silently-done-the-wrong-thing. I would expect that to be incredibly worrying for a system administrator [which I have been, so this isn't idle speculation]. We don't however have richly defined personas to point at to allow this to be easily internalised by new contributors to pip.
However, I'm at risk of burning out in this conversation pattern, so I'm signing off and muting the issue; my offer for higher bandwidth conversation - which the sig list is not - remains open, if that's useful - @njsmith and @dstufft can get hold of me on hangouts or IRC or whatever realtime medium is desired.
I'm still waiting on @pfmoore or @njsmith giving me the go-ahead that this PR is fine and we can announce the same on distutils-sig for comments from a larger audience.
Sorry, I hadn't realised you were holding off. Yes, please go ahead and announce. The PR will obviously need to be updated as those discussions progress, but it's certainly fine as a starting point for that process.
@rbtcollins I didn't wish to be overly pushy. Sorry about that.
higher bandwidth conversation
I didn't interpret it as real-time conversation. :sweat:
do-the-safe-thing model to a review-every-invocation-carefully-because-it-may-have-silently-done-the-wrong-thing
I do understand what you are trying to say about the possible security-related implications of this change. I don't have an especially strong grasp of that topic and the associated nuances for a proper discussion on it.
I'll be happy to defer any further discussion to pip's core developers.
Now, I think that both of the behaviours (pip's current and the one that's been proposed) are not-ideal. I'll change my position to not being in favour of either of those. I'll write that mail to distutils-sig.
Further, the _very last thing_ we want is for pycrypto and friends to stay un-upgraded for months or years because folk don't know they have to do something special to have up to date secure software.
If you're saying that a user doing pip install --upgrade django (where django depends on pycrypto, say) currently gets pycrypto upgraded, and in future won't, then yes, that is a change (and agreed, it could result in pycrypto being an older version).
There have been rather too many divisive security discussions recently, and I don't want to be the cause of another one starting. So all I will say here is that there's a trade-off between keeping crypto software up to date versus breaking package X by upgrading a crypto (or other!) dependency to a version that X cannot use as part of an unrelated pip install Y. My personal preference is for the approach this PR takes (but see below), but I acknowledge the problem. One question - what do Linux package managers do? Would apt-get install python-django upgrade an already-installed and compatible python-pycrypto? (Following established Linux package manager approaches would fit well with the basis of the rest of this PR.)
Note that "keep my stuff up to date" (the equivalent of apt-get
upgrade, I guess) is what pip upgrade-all was intended for (see
concern above would be to say "the correct way to keep security packages up
to date is pip upgrade-all" - security upgrades actually _shouldn't_
wait until some other package gets updated. (It's just as true to say that
they shouldn't wait till the user updates his whole system, so the "real"
answer is pip install [-U] pycrypto, but I think we can agree that
user inertia being what it is, that's not always realistic). Maybe the
above is an argument in favour of that subcommand? But I'm not sure if it
can do an acceptable job without a proper dependency resolver.
Note also that in an ideal world, pip would never break anything by doing
an upgrade. Sadly, until we get a full dependency solver, that's not the
case. We've had reported cases of such breakage, I believe. Breakage should
be rare - but let's be fair, so should security exploits. We're discussing
low-probability (but potentially high-impact) scenarios here, and it's
never easy to judge those (if it were, nobody would ever buy lottery
tickets :-))
I don't have an especially strong grasp of that topic and the associated nuances for a proper discussion on it.
Me neither. I think @rbtcollins has brought up an important point. Ping @dstufft as the one person I know of with sufficient understanding of both pip and security to make an informed decision here...
@pfmoore yes, that's what I'm saying, and I'm staring down the barrel of millions of environments no longer receiving upgrades to such libraries with horror.
I agree that having a command to do an environment wide upgrade would help.
w.r.t. the resolver aspect - breakage will always happen even with a resolver: the resolver is not the cause of most breakage I see, rather accidental bugs are. Yes we need it, but it's not a panacea.
@pradyunsg No need to apologise - you've done nothing wrong; I engaged while tired and got stressed at the idea of something I consider a poor choice being pushed into master if I didn't immediately get traction on it.
I'm staring down the barrel of millions of environments no longer
receiving upgrades to such libraries with horror.
This would be something separate from the current change, and I haven't
thought it through at all, but I wonder if there's a need here for packages
(or maybe individual releases) to be marked as "critical", implying that
pip should always try to update those packages when updating anything that
depends on them - essentially a finer-grained, opt-in version of eager
updates. Projects like pycrypto could then mark themselves as critical.
Such a mechanism may be open to abuse, but would this be of any help?
Such a mechanism may be open to abuse, but would this be of any help?
Please no. It's a bad idea to have something like this in a largely un-moderated index like PyPI. Other than the obvious possibility of abuse, it's an extra behaviour that the user will potentially be surprised by. I think there are just better ways to handle such a problem; it's better to delegate this to the end-users to decide what they feel is critical.
@rbtcollins I felt I was the reason you were feeling like you'd burn out.
I think @dstufft already has enough things on his plate. So, FWIW, I'll put forth what I think about the security implications of making non-eager upgrades the default. If nothing else, I'll learn something new.
I don't think anyone runs pip install --upgrade in a production environment without pinned requirements/constraints. I could be wrong about this, but really, they shouldn't. If they do that, they're already exposing their production environment to breakage, as of today. After this change, they still have that risk (reduced but present) and additionally have the risk of not having the latest security upgrades. Really, by not pinning their dependencies, I'd say they opted into a security vulnerability. Would you agree with this?
To me, the only people who run pip install --upgrade are those doing so on their local machines, either during development of an application or as the end users of a library. How much this affects them, I don't know. I'm by no means an informed person on this topic anyway.
Honestly, the more I think about this, the more I feel like sitting down and writing a SAT solver in pure Python for pip.
Ok whew, I ignore this discussion for a few days and it appears to have blown up on both the sig and here :)
I've tried to read over what's gone on in this thread, but well it's long and information dense so I might miss something, however you're about to get a wall of words.
I don't believe it's completely fair to say that the current behavior of pip install -U <foo> makes people more secure across the board. Yes, I can easily point out some projects like PyCrypto or cryptography where regressions are rare (particularly security regressions) and new releases generally include improvements to security. I think focusing only on those misses other cases though, such as the case where a new version of something on PyPI has caused a regression in security. There are, I think, two other cases, both of which boil down to "upgrading or not doesn't affect security at all" but differ in whether an upgrade is an OK thing to do for them or not.
Overall, I don't think that recursive upgrades are a good security mechanism, and if one of our goals is to prevent people from running old, insecure versions of software (and I think it should be a goal) then I think the way to achieve that is not to try and hang ourselves off of the behavior of upgrades (and just pray and hope that they've happened to run an update in some amount of time) but to instead devote time to a dedicated solution to the problem. This may be something like pip list -o but which checks explicitly for security problems, it may be checking the entire installed set of packages against PyPI to see if there are any known security issues with any of them, or it may take on some completely other format. However, I think it's important that this isn't tied to some semi-related functionality and that it actually covers the entire environment and not just whatever the user happens to be using. If someone does pip install requests[security] once, and then from there on out does pip install -U requests -- we're going to completely miss updates to pyopenssl and cryptography and such if we only rely on recursive upgrades.
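A minimal sketch of the kind of dedicated, environment-wide check described above, assuming a hypothetical advisory table that maps a package name to known-vulnerable version ranges. No such table or command exists in pip; the package names and ranges are invented stand-ins for requests, pyopenssl and cryptography, and versions are simplified to dotted integers rather than full PEP 440.

```python
# Hypothetical advisory data: name -> list of (first_affected, first_fixed).
# Invented for illustration; this is NOT real vulnerability data.
ADVISORIES = {
    "cryptolib": [("1.0", "1.3.2")],
    "tlsbinding": [("0.1", "16.0")],
}


def parse(version):
    """Simplified parser: dotted integers only (real pip uses PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def vulnerable(installed, advisories=ADVISORIES):
    """Return sorted (name, version) pairs that fall in a known-bad range.

    `installed` is the *whole* environment ({name: version}), so a dependency
    that was pulled in once and never upgraded is still checked."""
    hits = []
    for name, version in installed.items():
        for first_affected, first_fixed in advisories.get(name, []):
            if parse(first_affected) <= parse(version) < parse(first_fixed):
                hits.append((name, version))
    return sorted(hits)


# "webclient" stands in for requests: the user only ever upgrades it,
# but the check still flags the stale transitive dependency.
env = {"webclient": "2.10.0", "tlsbinding": "0.15.1", "cryptolib": "1.4"}
print(vulnerable(env))  # [('tlsbinding', '0.15.1')]
```

The point of the design is that the check is independent of which package the user happened to upgrade last.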
So pushing aside the security concerns for a moment, I think we need to take a look at what behaviors are most likely to give people what they want. Unfortunately, with an ecosystem as large and with use cases as varied as Python's, I suspect there is no singular answer to "what people want". There are a few interrelated behaviors being discussed here, so let's tackle them one at a time.
For pip install --upgrade <foo>, we have evidence that our current behavior is actively harmful, so much so that projects are going out of their way to lie to pip to prevent triggering that behavior. I think that we can all agree that something people feel the need to actively subvert (not only in their own projects, but also advocating that other projects do the same) likely needs some refinements to how it actually works. In this case the only real solution is to attempt to avoid upgrading (or downgrading!) where possible, and to prefer the already installed version, _unless_ the user has explicitly asked for that to be changed _or_ we can't satisfy the version constraints otherwise. I can't see any other reasonable way to implement this that isn't going to accidentally trigger 30+ minute builds which possibly result in a version that is less suitable for the task at hand (not using an optimized BLAS or something).
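The conservative rule above (keep what is installed unless the user asked for a change or the constraints cannot otherwise be met) can be contrasted with today's recursive behaviour in a toy model. This is a sketch, not pip's resolver: the index, the package "analysis" and its numpy requirement are hypothetical, there are no upper bounds or conflicts, and the strategy names simply echo the --upgrade-strategy idea raised in this thread.

```python
# Toy package index: name -> {version: {dependency: minimum_version}}.
# "analysis" and all requirements here are hypothetical.
INDEX = {
    "analysis": {"1.0": {"numpy": "1.6"}, "1.1": {"numpy": "1.6"}},
    "numpy": {"1.10": {}, "1.11": {}},
}


def parse(version):
    """Simplified parser: dotted integers only (real pip uses PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def latest(name):
    return max(INDEX[name], key=parse)


def upgrade(name, installed, strategy):
    """Upgrade `name` to its latest version; return a new {name: version} dict.

    "eager": also move every dependency to its latest version
    (roughly the current recursive `pip install --upgrade`).
    "only-if-needed": keep an installed dependency that still satisfies its
    minimum; touch it only when it is missing or too old.
    """
    env = dict(installed)  # never mutate the caller's environment
    env[name] = latest(name)
    for dep, minimum in INDEX[name][env[name]].items():
        have = env.get(dep)
        if strategy == "eager" or have is None or parse(have) < parse(minimum):
            env = upgrade(dep, env, strategy)
    return env


installed = {"analysis": "1.0", "numpy": "1.10"}
print(upgrade("analysis", installed, "eager"))           # numpy moves to 1.11
print(upgrade("analysis", installed, "only-if-needed"))  # numpy stays at 1.10
```

Under "only-if-needed", a self-built or otherwise hard-to-replace numpy is left alone as long as it satisfies the declared minimum, which is exactly the case the projects lying to pip are trying to protect.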
Advocating for leaving the current behavior of pip install --upgrade as it is, is essentially advocating against projects that depend on numpy being honest about whether or not they depend on numpy. If someone has another suggestion for how we might solve the numpy problem [1] then I think they should bring it up.
I know that one suggestion has been adding a --non-recursive-upgrade flag or a sort of --upgrade-strategy flag, but I think that these ideas largely serve only to complicate the mental model people have of pip. For the vast majority of packages (particularly pure Python ones that don't have a security-sensitive role) it's not going to matter a whole lot whether we upgrade them or not: upgrading is low cost, but there's also little downside to keeping them pinned to the installed version (unless the person finds a reason, a feature or a bug, to explicitly upgrade _that_ package). However, we're living in the edge cases here, and I only really see two that matter: hard-to-upgrade packages and security-sensitive packages. Like I mentioned above, I think that if we're going to be worrying about ensuring folks get security upgrades we need a real mechanism for that, not a half-hearted hope that an important upgrade got caught in a recursive upgrade at some point. Given that I think we need dedicated support for security-sensitive packages, that for most packages this won't matter, and that for the hard-to-upgrade case there's only really one answer, I think that shunting this behind its own flag is a bad idea without justification for the ongoing cost of maintaining a whole option for this [2]. I also think that the more conservative approach has to be the most obvious approach or we don't really solve things for the hard-to-upgrade crowd, so if we did add a new option, we'd still want to change the behavior of --upgrade by default and add in some explicit option to get the less conservative/safe approach to upgrades.
I don't believe we should allow a package to be able to mark itself for eager (or non eager) upgrades. I think this is something that needs to be consistent amongst all packages for end users to be able to have any hope of having a reasonable mental model of what pip is going to do to their system when they execute some command. That being said, I could see us adding the ability to have people mark versions insecure on PyPI and warn people if they have an insecure version installed on their system (or at the very least, if they're about to install one).
So, now that we've covered --upgrade, the other inter-related issue here is what we should do with pip install versus pip install --upgrade. Personally I think that we should make the two mean the same thing, _HOWEVER_ it might be more reasonable to focus the discussion first on just the behavior of --upgrade and leave pip install alone for now. Once we get the solution to upgrading sorted out we can tackle what we'll do about pip install itself.
[1] Although to be completely honest it's not strictly related to Numpy. I've seen people break their systems or their installs time and time again because of an inadvertent upgrade. It's true that a lot of projects don't have correct lower bounds, but I believe it's equally true (or more so) that they don't have correct upper bounds either. One particularly important thing is that it's possible to determine what the correct lower bounds are at time of packaging, but it's impossible to determine what the correct upper bounds are.
[2] Options are not free, they incur a cost and it's important to attempt to reduce the number of them you have as much as you can. While you typically can't reduce them to zero, a pattern I see far too much in OSS software is the desire to please everyone by adding more and more options, when really it's just a mechanism for avoiding making a decision that may be unpopular with some group of people.
Thanks for the wall of words :). A few responses and some thoughts.
tl;dr: I agree with you about the costs of options and mental models around pip and so forth. I'm still very scared of the implications of what you're proposing.
w.r.t. updating being more secure: we can be pretty confident that if one never upgrades, existing security bugs will eventually be attackable. OTOH if one always upgrades, while there may be occasional security-bugs introduced, they will be removed again in later upgrades.
The key thing there in _either_ case is having a systemic, automated upgrade process taking place, which we don't have today. In the absence of it, I do believe that upgrading-by-default is significantly better.
w.r.t. upper bounds and lower bounds: there's absolutely no facility to get lower bounds right today. You're correct that in a logical sense, one can only state lower bounds accurately, but the reality is that no one states them accurately today: it's entirely reactive, driven by bug reports from people who find out that the lower bound is wrong after debugging it. And there isn't even consensus amongst the folk I've spoken to about what _should_ be done when lower bounds interact with optional things - should folk detect features, or raise the lower bound, or just crash if some incompatible path is taken? If pip had the select-oldest-version thing I proposed, then this wouldn't be the case, and it would be _much_ more reasonable to assume lower bounds would be sane.
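The select-oldest-version idea can be sketched as a switch in how a single requirement is satisfied: instead of the newest version that meets the lower bound, pick the oldest one, so a test run exercises the declared lower bound directly. This is a hypothetical illustration of the proposal, not an existing pip option, and versions are simplified to dotted integers.

```python
def parse(version):
    """Simplified parser: dotted integers only (real pip uses PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def select(available, minimum, oldest=False):
    """Choose a version >= minimum from `available`.

    oldest=False mirrors the usual "newest matching version" choice;
    oldest=True is the hypothetical select-oldest-version mode, which makes
    a wrong lower bound fail in CI instead of in a user's environment.
    """
    candidates = [v for v in available if parse(v) >= parse(minimum)]
    if not candidates:
        raise LookupError(f"no available version >= {minimum}")
    return (min if oldest else max)(candidates, key=parse)


available = ["1.8", "1.9", "1.10", "1.11"]
print(select(available, "1.9"))               # 1.11
print(select(available, "1.9", oldest=True))  # 1.9
```

Comparing parsed tuples rather than strings matters here: as plain strings, "1.9" would sort after "1.10".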
But I've _literally lost count_ of the number of broken environments I've fixed for people by telling them 'pip install -U package'.
I don't really understand the 'numpy problem'? Is that the collection of projects lying to pip about dependencies to avoid upgrades?
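pip gets used in 3 different contexts IME:
- maintaining production environments
- building fresh environments in a reproducible way - e.g. container builds
- testing
For the latter two, I don't care about the upgrade algorithm, as long as it's deterministic.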
For the first one, I care a lot, and it sounds like what we've got here is one project causing the majority of the pain, due to a combination of API and ABI breaks - because numpy actually has one of those?
IF a production-maintenance-thing existed, to do upgrades of everything, then I wouldn't push back on the proposed change to install at all. But it doesn't, and we have no idea how long until one will exist - AIUI the command that might have done it was pushed back on in fact, so we should expect to _not_ have one?
I don't really understand the 'numpy problem'? Is that the collection of projects lying to pip about dependencies to avoid upgrades?
Yes. Libraries like scipy and scikit-learn lie to pip that they don't depend on numpy so that pip doesn't start a half-hour long reinstall of newer numpy over a possibly optimized or even self-compiled numpy.
IF a production-maintenance-thing existed, to do upgrades of everything, then I wouldn't push back on the proposed change to install at all. But it doesn't, and we have no idea how long until one will exist - AIUI the command that might have done it was pushed back on in fact, so we should expect to not have one?
I don't think anyone has pushed back on this, except that people are waiting for the resolver to get finished first because there's a perception that without a resolver, upgrade-all will have an increased tendency to result in inconsistent environments. I dunno, maybe we should just go ahead and implement an upgrade-all command even knowing it will be imperfect to start with... the perfect is the enemy of the good and all that. Maybe we could have it try to upgrade everything, and then automatically run the new pip check code to warn people if stuff broke that they might need to fix.
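A rough sketch of "upgrade everything, then warn about what broke", assuming a toy model in which each package declares a lower and an optional upper bound per dependency. The check function only mimics the spirit of pip check (it is not its implementation), and the packages app and lib are hypothetical.

```python
def parse(version):
    """Simplified parser: dotted integers only (real pip uses PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def upgrade_all(available):
    """Naive upgrade-all: every package goes to its newest available version."""
    return {name: max(versions, key=parse) for name, versions in available.items()}


def check(installed, requires):
    """List unmet requirements. `requires` maps a package to
    {dependency: (minimum, exclusive_maximum_or_None)}."""
    problems = []
    for pkg, deps in sorted(requires.items()):
        if pkg not in installed:
            continue
        for dep, (low, high) in sorted(deps.items()):
            have = installed.get(dep)
            spec = f"{dep}>={low}" + (f",<{high}" if high else "")
            if have is None:
                problems.append(f"{pkg} requires {spec}, which is not installed")
            elif parse(have) < parse(low) or (high and parse(have) >= parse(high)):
                problems.append(f"{pkg} requires {spec}, but {have} is installed")
    return problems


available = {"app": ["1.0"], "lib": ["2.5", "3.0"]}
requires = {"app": {"lib": ("2.0", "3.0")}}
print(check(upgrade_all(available), requires))
# ['app requires lib>=2.0,<3.0, but 3.0 is installed']
```

Without a resolver, the naive upgrade happily installs lib 3.0; the follow-up check at least tells the user which package it broke.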
the 'numpy problem'
Yeah, this is partly that packages like numpy are expensive to install or tend to just error out (think: Windows users without a compiler). For numpy itself this problem is greatly reduced now that we have better wheel support, but there are lots of packages besides numpy that need a compiler. It's extraordinarily frustrating when you're just trying to upgrade some trivial pure-python package and then suddenly Unable to find vcvarsall.bat.
And it's partly that people have finicky preferences about things like numpy. For example, it's common to mix numpy-with-proprietary-MKL-patches installed from Anaconda, while using pip for other packages that Anaconda doesn't ship. And then pip install -U other-package might throw away the Anaconda numpy and replace it with a PyPI numpy, which both gives you a numpy build you don't want + totally breaks your conda environment going forward b/c this core package just got deleted out from under conda.
And it's partly just that numpy is a canonical example of a package that a _lot_ of other packages depend on in complicated ways, so there's a great deal of risk that trying to upgrade package A -> triggers upgrade of numpy -> breaks package B or C or D or ... Recursive upgrades create surprising couplings between different packages.
It's extraordinarily frustrating when you're just trying to upgrade some
trivial pure-python package and then suddenly Unable to find vcvarsall.bat.
Precisely this. You can work around this with --no-deps, or
--only-binary, but it makes what should be a really simple activity
into something really annoying. In my experience, the only way to maintain
an environment is to carefully set up the hard-to-install packages
manually, maintain these by hand, and then don't let pip upgrade them
automatically. That last part isn't easily possible with the current
behaviour of --upgrade.
For a non-numpy example, lxml doesn't build easily on Windows,
doesn't supply wheels, and is a dependency of a lot of things. A new
release of lxml (which I probably don't want to bother upgrading to,
as it needs me to manually download Christoph Gohlke's wheel when he builds
it, and install that by hand) can cause upgrades of all sorts of stuff to
fail. Non-eager upgrades would fix this issue for me.
and then don't let pip upgrade them automatically.
I suppose the proper solution for that would be pinning versions, which https://github.com/pypa/pip/issues/654 tracks, and then emitting a warning if a pinned version causes a dep requirement not to be fulfilled. That will allow users to _really_ manage a package manually.
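A minimal sketch of that idea, assuming a hypothetical "held" set of packages the user manages by hand (such as a self-built numpy or a manually installed lxml wheel): held packages are never upgraded automatically, everything else is, and a warning is emitted when a hold leaves some other package's minimum requirement unmet. The lxml requirement is made up, and this is neither pip's behaviour nor the design settled in the linked issue.

```python
def parse(version):
    """Simplified parser: dotted integers only (real pip uses PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def plan(installed, latest, held, required_minimums):
    """Return (upgrades, warnings).

    installed, latest: {name: version}
    held: names the user has pinned and manages manually
    required_minimums: {name: minimum version some other package needs}
    """
    upgrades, warnings = {}, []
    for name, have in sorted(installed.items()):
        if name in held:
            need = required_minimums.get(name)
            if need is not None and parse(have) < parse(need):
                warnings.append(f"{name} is held at {have}, but >={need} is required")
            continue  # never upgrade a held package automatically
        new = latest.get(name, have)
        if parse(new) > parse(have):
            upgrades[name] = new
    return upgrades, warnings


upgrades, warnings = plan(
    installed={"lxml": "3.5.0", "requests": "2.9.1"},
    latest={"lxml": "3.6.0", "requests": "2.10.0"},
    held={"lxml"},
    required_minimums={"lxml": "3.6.0"},
)
print(upgrades)  # {'requests': '2.10.0'}
print(warnings)  # ['lxml is held at 3.5.0, but >=3.6.0 is required']
```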
For a production environment probably. But for an environment like my laptop, or a development virtualenv I'm using for testing, pinning versions overspecifies the problem (and requires me to manage at the version level). What I actually want is exactly what I stated - "don't upgrade unless I ask you to".
pip gets used in 3 different contexts IME:
- maintaining production environments
- building fresh environments in a reproducible way - e.g. container builds
- testing
@rbtcollins I'm not clear on whether your definition of "production environment" includes everything from "new user who installed Python + some packages" to "advanced users with multiple venvs" to "production deployments running apps/websites"? Your comments about security seem like you're mostly worried about the last category, but imho that is the least interesting one because it caters to the most knowledgeable users. Defaults should be chosen for non-expert users, the ones who installed Python on their laptop/desktop and want to get their analysis done or website to work.
I would like to revive this issue and the related discussion(s?) again.
@rbtcollins Have your concerns been addressed? If not, please point out any outstanding concerns you have.
There were at least 2 people on the distutils-sig discussion who were against the whole idea of pip install doing an upgrade. I remain willing to accept the community consensus but uncomfortable with the idea on a purely personal level. It actually doesn't feel to me as if we're close to a consensus that a bare pip install foo should upgrade an already-installed foo.
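I'm not sure there _is_ a consensus to be had here. More like two use cases that need different behaviours. Or maybe three:
1. Install foo if it isn't present; otherwise leave the installed version alone.
2. Install foo if it isn't present; otherwise upgrade it to the latest version.
3. Upgrade foo if it is already present; otherwise do nothing.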
All of these seem to be valid use cases. All can be implemented in terms of the others with sufficient manual checks or additional scripting. We currently have (1) and (2) via install and install -U (at least in the simple case - I'm deliberately ignoring recursive upgrading for now).
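To make the distinction concrete, here is a small model of the three behaviours (install only if missing, install or upgrade, upgrade only if already present), reduced to a single package with no dependencies. The mode names are hypothetical labels rather than pip options, and versions are simplified to dotted integers.

```python
def parse(version):
    """Simplified parser: dotted integers only (real pip uses PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def outcome(mode, installed, latest):
    """Return the version present afterwards (None = not installed).

    "install":         install if missing, else leave alone  (`pip install foo`)
    "install-upgrade": install if missing, else upgrade      (`pip install -U foo`)
    "upgrade-only":    upgrade if present, else do nothing   (no pip option today)
    """
    if installed is None:
        return None if mode == "upgrade-only" else latest
    if mode == "install":
        return installed
    return max(installed, latest, key=parse)


for mode in ("install", "install-upgrade", "upgrade-only"):
    print(mode, outcome(mode, None, "2.0"), outcome(mode, "1.0", "2.0"))
# install 2.0 1.0
# install-upgrade 2.0 2.0
# upgrade-only None 2.0
```

Only the first mode needs to know what is already installed to predict the result, which is the crux of the disagreement below.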
There are some people arguing that the default behaviour of install causes users to make mistakes because they expect or need behaviour (2) and don't understand that they are getting (1). Maybe that's so - my experience is too limited to say they are wrong. Our defaults may be wrong.
Looking at the practicalities (of the fundamental question "should install upgrade by default?"):
The arguments in favour of changing seem to be:
pip install foo as the way to get foo breaks a user's system, that's a bug - and a bad one. But it seems more likely to be a bug in the dependency management code than a consequence of pip install foo not upgrading an already installed foo. Or it's a documentation bug (confusing "how to install for the first time" with "how to upgrade from a previous version"), but that's not "broken", just confusing.
IMO, we _must_ reach a conclusion over how a simple install of a package with no dependencies works before we start debating the more complex cases. That's the majority use case. Packages with dependencies, conflict resolution, and recursive upgrades should all be considered only once we have a solid and agreed foundation of how a simple package install works.
Personal view - there is nothing wrong with the traditional install and install --upgrade options. They seem clear and natural to me (reiterating: in the simple cases). There's no "upgrade but only if already present" option, but the natural place for that would be a new upgrade command, and I don't think the need is high enough to warrant a new subcommand, so I'm OK with having to manually handle that case.
It is wrong to change the behavior of 'pip install'. There is nothing wrong with install meaning "ensure it is installed", and there is nothing wrong with install meaning "replace the named package with the latest version", and the developers are smart enough to convince themselves that either behavior is more intuitive. But it is wrong to steal the afternoons of thousands of developers who rely on the current behavior, and who will have suddenly broken environments the next time they are foolish enough to upgrade pip.
I didn't like the proposal about changing how the directly-named dependency was installed, but I did like the rest of the proposed changes regarding recursive-ness of package upgrades that do happen and so on.
Just today I had someone ask me for help because they were confused why pip install --pre docker-py did not install 1.9.0rc2, but pip install docker-py==1.9.0rc2 did. They believed it to be a bug in pip, until we figured out that the reason was that they already had a previous version of docker-py installed and that was being used.
This matches my own use: I never invoke pip without -U except out of laziness about typing that extra couple of characters, and when I do omit it, half the time I get annoyed and end up needing to re-run the command with -U on there.
@dholth says it's wrong to steal the afternoons of thousands of developers relying on the current behavior, but what of the afternoons of thousands of developers being bitten by the current behavior? As always, any breakage is always a weighing of the cost of breakage against the benefits. It makes the behavior of pip less surprising by default, you don't have to inspect the environment to figure out what the outcome of some command is going to be, you only need to know the command.
I think whatever the chosen solution is, we'll have to provide an option to enable the old behavior to smooth the transition.
If we go with an upgrading pip install foo we'll need a --no-upgrade.
If we remove the recursive behavior of upgrade we'll need a --recursive.
I agree with a --no-upgrade flag for sure (and keep the --upgrade flag to enable it to be turned back to upgrade if someone has disabled it). I'm not sure about the --recursive flag long term, but I'm not dead set against it.
I think I would experience major breakage by this change, but principally in non-interactive pip invocations, while 'pip install -U' is something that would typically happen interactively, and crucially when someone is doing development work and is available to deal with the consequences. That's why I jokingly suggested we could check isatty() to choose between one behavior or the other. But is there a way to measure either amount of time or is it just a circular volley of opinions? My opinion is that as an experienced person I do have to re-type install with -U, but it is quick, while fixing a virtualenv when I was least expecting it is hundreds of times slower.
Another solution that has already been discussed is to give the new behavior a new name (an easier name to remember than pip install -U?), and educate people on the new best-er practice; if the n00bs who in theory have the most trouble are reading the new documentation and using the new name, problem solved.
While we're on the subject, where is the 'pip rollback' command? Before and after each invocation pip should store the versions and perhaps the wheels of every installed package in a log along with a timestamp. Then if there is a problem you can just go backwards, no fuss.
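A minimal sketch of the snapshot half of such a rollback, using the standard library's importlib.metadata (Python 3.8+) to record installed distributions before and after a pip run. The diff and the generated `pip install name==version` / `pip uninstall -y name` commands are illustrative only: they assume the old versions are still downloadable (or cached as wheels, as suggested above), and the example data is made up.

```python
import time
from importlib.metadata import distributions


def snapshot():
    """Record {name: version} for every distribution visible to this interpreter."""
    packages = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no Name field
            packages[name] = dist.version
    return {"timestamp": time.time(), "packages": packages}


def diff(before, after):
    """{name: (old_version, new_version)} for added, removed or changed packages."""
    old, new = before["packages"], after["packages"]
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


def rollback_commands(changes):
    """pip commands that would restore the 'before' state (not executed here)."""
    commands = []
    for name, (old_version, _new_version) in changes.items():
        if old_version is None:
            commands.append(f"pip uninstall -y {name}")
        else:
            commands.append(f"pip install {name}=={old_version}")
    return commands


before = {"timestamp": 0, "packages": {"numpy": "1.10.4", "six": "1.10.0"}}
after = {"timestamp": 1, "packages": {"numpy": "1.11.0", "six": "1.10.0", "pbr": "1.9.1"}}
print(rollback_commands(diff(before, after)))
# ['pip install numpy==1.10.4', 'pip uninstall -y pbr']
```

In real use, snapshot() would be called immediately before and after each pip invocation and the results appended to a log.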
Yes, I'm also aware that some set of current best practices, which are more work, could also solve some of these same problems, but one person's best practices are just another person's unnecessary extra work.
The fact that the breakage will be principally in non-interactive pip invocations is actually a good point, but I think it points more towards the fact that we should do it. The primary place where the current behavior makes sense is when scripting using pip, and when you're scripting, adding an extra flag to the command is no great burden; however, when you're running pip interactively the default option should be the option that you're most likely going to want.
If you want a rollback command I suggest another issue for it.
Just in terms of UI bikeshedding, I still like the idea of the idempotent/scripting-oriented behavior getting a new verb, like pip require numpy -- to me that does a good job of capturing the conceptual difference (while pip's thicket of flags is super-confusing and their interaction hard to predict), and when scripting IME it's easier to remember to use the verb that means what you want than it is to remember to consistently pass some extra flag every time.
But I think the verb that we teach users first (which is install) should be the verb whose defaults are oriented towards new user needs, meaning interactive use, interpreting pip install django as meaning pip install django==$LATEST, etc.
But is there a way to measure either amount of time or is it just a circular volley of opinions?
This is precisely my point. I don't think there are any compelling (as in, likely to convince the other camp) arguments for either side. And in that case, the status quo wins. My biggest concern here is that we don't (as a project) have a good means of arbitrating this type of situation, and we end up with this hovering over us forever, because there's always the possibility that someone could commit a PR, simply because those who objected the previous time didn't notice a discussion being reopened. What we need is some sort of equivalent of Python's rejected PEPs, which would allow us to say "we've decided (for the following reasons) to do nothing" and then be able to short-cut the process of someone asking to revisit the decision and having to go through all the old fruitless arguments again.
I'd rather find a way to make the current non-default behaviour more easily accessible for people that need it, than waste time in arguments that will simply result in both sides becoming more and more entrenched in their positions. Although I don't really know how to do that - I really don't understand what's so confusing about "install installs, install --upgrade installs but also upgrades if needed".
But I think the verb that we teach users first (which is install) should be the verb whose defaults are oriented towards new user needs, meaning interactive use, interpreting pip install django as meaning pip install django==$LATEST, etc.
Well, while I see your point that we should view interactive use (by new users) as the prime use case, I'm not convinced that implies "install or upgrade". I'd argue that the failure mode of an implied upgrade (you upgrade an existing install without meaning to, and break another part of your system by doing so) is sufficiently bad (even for an experienced user) that it warrants a flag to say "I understand the consequences".
Project instructions saying "use pip install FOO and you're good to go" can (and should) be changed. We shouldn't be driving a decision like this based on other people's erroneous documentation, no matter how much of it there is. The wording should just be "If you don't already have FOO, use pip install FOO and you're good to go. If you have FOO already but want the new version, use pip install --upgrade FOO".
I know there's anecdotal evidence of people spending lots of time trying to work out what went wrong because they didn't include --upgrade. But how are we supposed to evaluate that, given that it's (by definition) impossible to get evidence of how many people have _not_ had any issue with the current behaviour? Make a change and wait for bug reports from people saying "I did pip install foo and it upgraded my existing foo, which broke bar - how do I unpick this mess?" Personally, I don't want to have to support people in that situation...
Mercurial measures by getting usage stats from Facebook, they have a special corporate plugin to record them.
I really don't understand what's so confusing about "install installs, install --upgrade installs but also upgrades if needed".
It's not really that it's confusing at a high level, but that the default behavior requires you to know what's already installed on your system in order to figure out what the outcome of the command is going to be. It's easy to not realize, particularly with virtual environments, what exactly you have installed and assume that you don't have something installed (and then get confused when you're not getting the version you expect).
I have, at any one time, something like 50-100 different virtual environments on my personal computer, one for each project I work on. It's basically impossible for me to know what's installed into a particular environment without sitting there running pip list and then going through the entire list, which takes way more time than I'm ever going to spend.
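For a single environment, the question "what would a plain pip install foo do here?" can at least be answered programmatically with the standard library's importlib.metadata.version (Python 3.8+), run with that environment's own interpreter. The helper names below are hypothetical, and the messages only restate the current install/--upgrade semantics described in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(name):
    """Installed version of `name` in this interpreter's environment, or None."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def describe_plain_install(name):
    """Explain what `pip install <name>` (without --upgrade) would do here."""
    have = installed_version(name)
    if have is None:
        return f"{name}: not installed; pip install {name} would install it"
    return f"{name}: {have} already installed; pip install {name} leaves it alone (use --upgrade)"


for project in ("pip", "some-project-that-is-not-installed"):
    print(describe_plain_install(project))
```

This still has to be run once per virtual environment, which is the inconvenience being described.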
We shouldn't be driving a decision like this based on other people's erroneous documentation, no matter how much of it there is.
I don't think this is entirely true. Neither option is objectively correct, so we're left trying to figure out a subjective answer of what is better, and looking at what mistakes other people made in their documentation is not a bad source of information. To use an extreme example, if literally everyone was doing it the wrong way, then that would suggest that the wrong way is too obvious and the right way isn't obvious enough.
But how are we supposed to evaluate that, given that it's (by definition) impossible to get evidence of how many people have not had any issue with the current behaviour?
Metrics in OSS are a problem :( At some point it'd be great if we can get some so we can see things like "this person ran install and then nothing else" compared to "this person ran install, then almost immediately re-ran it with --upgrade". Unfortunately that's still in the "gee it'd be nice" phase and not anywhere near being done, so we're left with throwing chicken bones and trying to divine reality from imagination.
the default behavior requires you to know what's already installed on your system
While I appreciate that this might be an issue, I'm not really convinced it's that major of a problem. After all, if you do pip install and the package is already present, you get immediate feedback:
(x) C:\Work\Scratch>pip install wheel
Requirement already satisfied (use --upgrade to upgrade): wheel in c:\work\scratch\x\lib\site-packages
So it's not like it's going to take you forever to find out that you need to upgrade, or how to do so.
I'd rather have a safe default with the system able to detect that you may have meant the alternative (plus a clear message telling you what to do) over a default with the potential to break unrelated stuff, and no recovery mechanism.
The more we have this discussion the less I understand the advantage of upgrade as default.
After all, if you do pip install and the package is already present, you get immediate feedback.
Sort of, though it's easy for that to get drowned out in all of the other output with even a moderate amount of packages being installed:
$ pip install Pyramid
Collecting Pyramid
Using cached pyramid-1.7-py2.py3-none-any.whl
Collecting WebOb>=1.3.1 (from Pyramid)
Using cached WebOb-1.6.1-py2.py3-none-any.whl
Collecting translationstring>=0.4 (from Pyramid)
Using cached translationstring-1.3-py2.py3-none-any.whl
Collecting zope.deprecation>=3.5.0 (from Pyramid)
Collecting venusian>=1.0a3 (from Pyramid)
Requirement already satisfied (use --upgrade to upgrade): setuptools in ./lib/python3.5/site-packages (from Pyramid)
Collecting PasteDeploy>=1.5.0 (from Pyramid)
Using cached PasteDeploy-1.5.2-py2.py3-none-any.whl
Collecting repoze.lru>=0.4 (from Pyramid)
Requirement already satisfied (use --upgrade to upgrade): zope.interface>=3.8.0 in ./lib/python3.5/site-packages (from Pyramid)
Installing collected packages: WebOb, translationstring, zope.deprecation, venusian, PasteDeploy, repoze.lru, Pyramid
Successfully installed PasteDeploy-1.5.2 Pyramid-1.7 WebOb-1.6.1 repoze.lru-0.6 translationstring-1.3 venusian-1.0 zope.deprecation-4.1.2
I'd rather have a safe default with the system able to detect that you may have meant the alternative (plus a clear message telling you what to do) over a default with the potential to break unrelated stuff, and no recovery mechanism.
See, I don't think upgrade-by-default is unsafe at all (when you take into account the other change to upgrade). I find more software that doesn't work with whatever older version of something I had installed than software that doesn't work with a newer version. I already know what versions might be getting installed, because I named them explicitly on the command line, so we're not upgrading things that I didn't explicitly call out.
Sort of, though it's easy for that to get drowned out in all of the other output with even a moderate amount of packages being installed:
Good point, maybe it should be highlighted (we use colours for things like warnings, this seems like a good candidate).
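As a rough idea of what that highlighting could look like, here's a minimal Python sketch. It is not pip's actual output code; the highlight() helper, the regex and the choice of yellow ANSI escapes are all assumptions made for illustration. It just picks out the two "already there" messages seen in the transcripts in this thread.

```python
import re

# Hypothetical colour choice: yellow, like a warning, then reset.
YELLOW = "\033[33m"
RESET = "\033[0m"

# Matches the two "nothing happened" messages quoted in this thread,
# with or without the leading indentation pip uses for nested output.
_ALREADY = re.compile(r"^\s*Requirement already (satisfied|up-to-date)")


def highlight(line: str) -> str:
    """Wrap 'Requirement already ...' lines in colour; leave others alone."""
    if _ALREADY.match(line):
        return f"{YELLOW}{line}{RESET}"
    return line


if __name__ == "__main__":
    for line in [
        "Collecting Pyramid",
        "Requirement already satisfied (use --upgrade to upgrade): setuptools",
        "Installing collected packages: WebOb, Pyramid",
    ]:
        print(highlight(line))
```

Only the middle line would come out coloured, which is the point: it stands out from the surrounding "Collecting ..." noise.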
See, I don't think upgrade-by-default is unsafe at all
Well, suppose you have foo 1.0 installed, and bar 1.0 that depends on foo. Suppose bar works with foo 1.0 but not foo 2.0 (but the dependency is just on "foo", not "foo < 2.0" because foo 2.0 wasn't out when bar 1.0 was released, and how was the author to know?) Now if I do pip install --upgrade foo, bar breaks. And I may not even find out that bar is broken for a long time, if it's not something I use a lot. That's not a failure mode I want to have to deal with as the default behaviour - even if it's rare, and even if it's arguably bar's fault for not being more strict in its dependencies.
I don't want to turn this into an exercise in "my failure scenario is worse than yours", as that tends to make a debate way too heated (see a typical security discussion) but I do think that "not unsafe at all" is wrong - at best it's "unlikely to cause an issue".
Of course, you mention "the other change to upgrade" here. There's way too many combinations of things being proposed and becoming dependencies of one another (upgrade, upgrade all, rollback, recursive upgrade, non-eager upgrading, ...). Maybe we should take things one step at a time - why not leave this discussion for now, and focus on getting "safe upgrade" in place. Once we have pip install --upgrade in a place where we can guarantee it won't ever break someone's system, maybe we can reopen the debate on the default behaviour then?
I don't want to turn this into an exercise in "my failure scenario is worse than yours"
Exactly! Let's not.
There's way too many combinations of things being proposed and becoming dependencies of one another (upgrade, upgrade all, rollback, recursive upgrade, non-eager upgrading, ...). Maybe we should take things one step at a time - why not leave this discussion for now, and focus on getting "safe upgrade" in place.
The "other change" is the switch to non-eager upgrades. I feel this issue deals only with the change in behaviour of install and install --upgrade, and thus it should include discussion of upgrade strategies. Everything else (upgrade-all, rollback), we explicitly decoupled when we opened this issue.
Once we have pip install --upgrade in a place where we can guarantee it won't ever break someone's system, maybe we can reopen the debate on the default behaviour then?
This would mean resolving #988 first, which has been stuck for a fairly long time.
I feel we've been bikeshedding and speculating what the user would do for too long. I feel it's no longer reasonable to do that without some metrics which are hard to get reliably. I saw this change as a quick-fix that provided a good middle ground until #988 landed. It's definitely not been quick and it's been debatable if it's a good middle ground. I think it might be worth it to take a step back.
It's already possible to do non-eager upgrades if you want, but figuring out how takes a Google search, which is more difficult than it should be. Even just an option on install to do non-eager upgrades would be better than the status quo.
Also, I don't think anyone wants the current "eager upgrade" behaviour to remain the default. If that's not the case, I must have missed it. So, why not switch to non-eager upgrades by default? As long as we switch the default upgrade strategy to be non-eager and _maybe_ provide a way to do eager upgrades, we'll be better off than the status quo.
So, assuming that no one is opposed to these two points, a minimum disruption change would be:
- pip install --upgrade provides non-eager upgrades by default.
- An option --upgrade-strategy=[eager/non-eager] (with any spelling) to choose your upgrade strategy, iff you really want eager upgrades.
How does this sound?
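To make the eager/non-eager difference concrete, here's a toy Python model. It is not how pip resolves anything: INDEX, DEPS and plan_upgrade() are made-up names, versions are plain tuples, dependencies only carry a minimum version, and there is no backtracking. It only shows which versions each strategy would leave installed after upgrading one named package.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]

# Hypothetical toy index: every release of each project "on PyPI".
INDEX: Dict[str, List[Version]] = {
    "foo": [(1, 0), (2, 0)],
    "numpy": [(1, 10), (1, 11)],
}

# Hypothetical toy metadata: minimum version each release needs of its deps.
DEPS: Dict[Tuple[str, Version], Dict[str, Version]] = {
    ("foo", (1, 0)): {"numpy": (1, 9)},
    ("foo", (2, 0)): {"numpy": (1, 10)},
}


def plan_upgrade(target: str, installed: Dict[str, Version],
                 strategy: str) -> Dict[str, Version]:
    """Versions left installed after upgrading `target` (toy model only)."""
    if strategy not in ("eager", "non-eager"):
        raise ValueError(f"unknown strategy: {strategy}")
    plan: Dict[str, Version] = {}
    todo: List[Tuple[str, Optional[Version]]] = [(target, None)]
    while todo:
        name, minimum = todo.pop()
        if name in plan:
            continue  # toy simplification: first visit wins
        current = installed.get(name)
        latest = max(INDEX[name])
        if name == target or current is None:
            chosen = latest   # explicitly requested, or not installed yet
        elif strategy == "eager":
            chosen = latest   # eager: drag every dependency to its newest release
        elif minimum is not None and current < minimum:
            chosen = latest   # non-eager: upgrade only when the constraint forces it
        else:
            chosen = current  # non-eager: the installed version is good enough
        plan[name] = chosen
        for dep, dep_min in DEPS.get((name, chosen), {}).items():
            todo.append((dep, dep_min))
    return plan
```

With foo 1.0 and numpy 1.10 installed, upgrading foo eagerly also bumps numpy to 1.11 (the potentially 30+ minute rebuild), while non-eager leaves numpy 1.10 alone because it already satisfies foo 2.0's minimum.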
Adding to what I just commented, this is what I feel is the path of least resistance to getting the non-recursive default behaviour I would like to see accepted.
I feel that making install upgrade by default is essentially a separate discussion. It is worth discussing, but it shouldn't hold up the change in upgrade strategy.
(I feel like this would mean a new issue for discussing what I just proposed but I'll take the first-opinions here before doing that)
Just to clarify - I don't believe that non-eager upgrades fix the issue that "pip install --upgrade foo" could upgrade foo from 1.0 to 2.0, but an already-installed bar might declare a dependency on foo (with no version) but not work with 2.0? I can't see that it could (or indeed that it should) and yet that's the scenario that bothers me about making upgrade the default.
Which isn't to imply that I have a problem with your proposal to get non-eager upgrades in place as the first step (I'm +1 on that regardless).
I'd argue that the failure mode of an implied upgrade (you upgrade an existing install without meaning to, and break another part of your system by doing so) is sufficiently bad (even for an experienced user) that it warrants a flag to say "I understand the consequences".
This is literally the failure mode of every single thing that pip does. It's also the failure mode of not running pip (e.g. the publication of a security exploit that targets your current stack will cause it to go from working -> broken without you changing your environment at all). People who run pip are explicitly requesting that whatever change they have specified be made to their environment, with all the risks and benefits that entails.
No-one runs pip install foo when they already _know_ that foo is installed, because that would be silly. So users already have to be prepared for this to break their environment, because installing new packages (and pulling in their arbitrary transitive dependencies) is just as dangerous as upgrading existing packages. In fact, upgrading foo and installing foo are _exactly_ as dangerous, because they do _exactly the same thing_ -- they pull in the exact same versions of the exact same packages.
The argument that a plain pip install numpy should be interpreted as pip install numpy==$LATEST is that this is much simpler and more predictable than the current thing (where pip install numpy is interpreted as pip install numpy==$CURRENTLY_INSTALLED_VERSION unless there is no currently installed version, in which case it's interpreted as pip install numpy==$LATEST -- just look how much longer that took to write). It reduces the state space that the user has to keep track of -- I can't see how reducing the possible outcomes of a command to a strict subset of what they used to be makes it _more_ dangerous :-).
It also has the important benefit that it actually _reduces_ the proliferation of paths through the pip internals -- having separate options for every little thing, no matter how use(ful/less), has a very substantial cost for maintainers, and is how pip became the "Rube Goldberg machine of sadness" described in the #pypa-dev topic.
Project instructions saying "use pip install FOO and you're good to go" can (and should) be changed.
I find it difficult to believe that you would put up with this argument if we were talking about a library API :-(. "Yes, many users of this API function call it with the default values, and yes, those work 95% of the time so that most users don't realize that their code is broken in the other 5% of cases. The solution is to keep that API the way it is and file bug reports forever telling everyone to add the unbreakme=True kwarg to every call. Because this is totally the user's fault."
we end up with this hovering over us forever, because there's always the possibility that someone could commit a PR, simply because those who objected the previous time didn't notice a discussion being reopened.
That's not how it does work, though -- notice that this change had extensive discussion on github and then there was a mailing list heads-up to make sure that no-one was surprised.
I actually kind of wish this is how it worked, because this change would be _fait accompli_ and we could all move on and stop wasting time on this ;-). And jokes aside, it might actually be healthier for the project if someone like dstufft decided to play BDFL in situations like this one. Right now the de facto outcome is that changes are just impossible, and I'm starting to feel like it would be more productive to give up on trying to improve pip, and instead put my energy/recommend others put their energy into figuring out how to make a viable pip fork :-(
Just to clarify - I don't believe that non-eager upgrades fix the issue that "pip install --upgrade foo" could upgrade foo from 1.0 to 2.0, but an already-installed bar might declare a dependency on foo (with no version) but not work with 2.0?
FWIW, I don't think it works even if bar explicitly depends on foo==1.0.
/tmp/pip-testing
$ ls ./repo
bar-1.0.tar.gz foo-1.0.tar.gz foo-2.0.tar.gz
/tmp/pip-testing
$ pip install --find-links ./
/tmp/pip-testing
$ pip install --find-links ./repo bar
Collecting bar
Collecting foo==1.0 (from bar)
Building wheels for collected packages: bar, foo
Running setup.py bdist_wheel for bar ... done
Stored in directory: /home/pradyunsg/.cache/pip/wheels/20/cd/44/f59790040978a7eb9989ce680e85681c252516bd7fc9baf059
Running setup.py bdist_wheel for foo ... done
Stored in directory: /home/pradyunsg/.cache/pip/wheels/27/9a/5f/3e8efff98718d38adb7cf6b20e4435694e8c465085792441be
Successfully built bar foo
Installing collected packages: foo, bar
Successfully installed bar-1.0 foo-1.0
/tmp/pip-testing
$ pip install --upgrade foo
Requirement already up-to-date: foo in /home/pradyunsg/.venvwrap/venvs/tmp-734de48113851ca/lib/python3.5/site-packages
/tmp/pip-testing
$ pip install --find-links ./repo --upgrade foo
Collecting foo
Building wheels for collected packages: foo
Running setup.py bdist_wheel for foo ... done
Stored in directory: /home/pradyunsg/.cache/pip/wheels/9d/3e/ce/b183a52b3e6844394d6cbf5606acadf8c340d48ccfcf02cc1c
Successfully built foo
Installing collected packages: foo
Found existing installation: foo 1.0
Uninstalling foo-1.0:
Successfully uninstalled foo-1.0
Successfully installed foo-2.0
/tmp/pip-testing
$ pip list
bar (1.0)
foo (2.0)
pip (8.1.2)
setuptools (25.0.0)
wheel (0.29.0)
/tmp/pip-testing
$ pip --version
pip 8.1.2 from /home/pradyunsg/.venvwrap/venvs/tmp-734de48113851ca/lib/python3.5/site-packages (python 3.5)
I don't think it works even if bar explicitly depends on foo==1.0.
Right, this is the "pip needs a real resolver" bug, which will get fixed eventually but is a big task so we don't want it to block other things if at all avoidable.
OTOH the case where bar uses an unversioned dependency on foo is basically impossible to get right AFAICT, so I'm not sure what it has to do with anything. The only solution for that is "never touch your venv ever again", and even that isn't guaranteed (because of things like new security holes or changes in external APIs that you need to talk to).
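For the pinned case in the transcript above, the missing piece is a check of the installed set against what each installed distribution declares. Here's a minimal Python sketch of that idea; it's a toy, not pip's resolver, and broken_requirements(), the tuple-based versions and the three supported comparisons are all assumptions for illustration.

```python
import operator
from typing import Dict, List, Tuple

Version = Tuple[int, ...]
# Toy requirement: (project, comparison, version), e.g. ("foo", "==", (1, 0)).
Requirement = Tuple[str, str, Version]

_OPS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}


def _fmt(v: Version) -> str:
    return ".".join(str(part) for part in v)


def broken_requirements(installed: Dict[str, Version],
                        requires: Dict[str, List[Requirement]]) -> List[str]:
    """Report every declared requirement that the installed set violates."""
    problems: List[str] = []
    for dist, reqs in requires.items():
        if dist not in installed:
            continue  # only check distributions that are actually present
        for name, op, wanted in reqs:
            have = installed.get(name)
            if have is None:
                problems.append(f"{dist} requires {name}{op}{_fmt(wanted)}, "
                                f"but {name} is not installed")
            elif not _OPS[op](have, wanted):
                problems.append(f"{dist} requires {name}{op}{_fmt(wanted)}, "
                                f"but {name} {_fmt(have)} is installed")
    return problems


if __name__ == "__main__":
    # The state the transcript ends in: bar pins foo==1.0, yet foo 2.0 is installed.
    after = {"bar": (1, 0), "foo": (2, 0)}
    print(broken_requirements(after, {"bar": [("foo", "==", (1, 0))]}))
```

Note that this only catches declared constraints. If bar just says "foo" with no version, there's nothing to check against, so a bar that silently breaks on foo 2.0 would pass, which is exactly the unversioned case described above.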
In fact, upgrading foo and installing foo are exactly as dangerous, because they do exactly the same thing -- they pull in the exact same versions of the exact same packages.
OK. We really are simply going to _have_ to agree to disagree on this. In my view, it's about the user's perception - "installing foo" is _adding something previously not present_ to your system, whereas "upgrading foo" is _changing something that's already there_. To the user, these are far from being the same thing.
Right now the de facto outcome is that changes are just impossible
OK, I give up. I don't believe there's consensus on this change, and I think it's wrong to implement it without consensus. You know I don't agree with it myself, but that's not the point here. Changes really aren't impossible (we've made plenty of changes, some pretty controversial) but neither side in this argument seems able to convince the other. In my view, that typically results in the status quo winning - but I'm aware that by saying that I'm going to be perceived as implying that "all I have to do to get my way is stall things". IMO, it says something about where we are at the moment that I feel that way :-(
I'm bowing out of this discussion now. If anyone makes an argument that changes my mind, I'll acknowledge that, but otherwise I have nothing more to say. If I'm the last holdout for not making this change, I give my permission to everyone to ignore me - I certainly don't feel that I (or anyone) should have a veto over changes, and I'm completely comfortable accepting a majority decision. If others do still have reservations about this change, they'll have to make their own arguments (but I'd remind participants that not everyone reads github issues - in spite of the discussion going off-track, there were some comments on distutils-sig that IMO deserve a response).
And jokes aside, it might actually be healthier for the project if someone like dstufft decided to play BDFL in situations like this one. Right now the de facto outcome is that changes are just impossible
While I don't think that our current process is optimal I don't think it's quite as bad as "changes are impossible". Generally we previously would do something like bring something up on the pypa-dev ML with a simple majority vote amongst pip core in cases where there wasn't a clear consensus. I think there are three active pip core devs now (myself, @pfmoore, and @xavfernandez) so if all three of us vote you end up with a vote one way or another instead of a tie. Could we use a more formalized process? Yes probably. Could that be a BDFL role? Possibly, but I don't think that's required either.
Sadly, the current ad hoc process typically means that one of the core contributors needs to sit down and decide to push for the change and say "Ok let's vote on this" and declare some ad hoc rules for doing so.
Recall, there was agreement on changing the behavior of --upgrade, which is the major thing that was preventing projects that depend on Numpy from declaring that dependency. This particular change is just an idea that came out of that and is more of a UX thing than anything else. Like any project, unless _you're_ the BDFL there are going to be times when the decision-making process goes against the option you want. I haven't done any of this because I've been focusing on Warehouse lately; pip will be coming back into my crosshairs after that's launched :)
No-one runs pip install foo when they already know that foo is installed, because that would be silly.
I'd guess you're right, but I'd also say a lot of people run pip install -r requirements.txt on a daily basis, with requirements.txt containing foo (which is equivalent to pip install foo), even though they know that foo is installed.
And they are happy with the fact that pip does it quickly without checking if there is something to upgrade.
I'm not against the idea that pip install foo could be equivalent to pip install foo==$LATEST (in fact I like it) but I'm against changing this fundamental behavior without a deprecation period (and an escape option to keep the old behavior).
I'm not sure whether we have discussed this solution already, but this could be a new --strategy option to pip install:
- no-upgrade would be the default in pip 9, and pip install --strategy=no-upgrade would be the current pip install behavior
- eager would be the current pip install --upgrade behavior (and --upgrade a deprecated alias for --strategy=eager)
- non-eager would be the default in pip 10
- oldest-compatible for #3188, etc.
Note that you could also put strategy=non-eager in your pip.conf to directly have it be the default in pip 9.
Could we use a more formalized process? Yes probably.
:+1:
And they are happy with the fact that pip does it quickly without checking if there is something to upgrade.
It'd probably still be pretty quick TBH. We serve responses in less than a ms from the Fastly cache :)
I'm not against the idea that pip install foo could be equivalent to pip install foo==$LATEST (in fact I like it) but I'm against changing this fundamental behavior without a deprecation period (and an escape option to keep the old behavior).
I'm fine with a deprecation period. I'm not sure about a long term option to keep the old behavior. I'm not opposed, I just want to make sure that it's something we really should support long term, options in general coming with a cost, and wanting to make sure the cost is worth it.
How does this sound?
Followed up with #3972.
Closing since #3972 is merged.
We have taken a different path to resolving the behaviour of --upgrade.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Ok whew, I ignore this discussion for a few days and it appears to have blown up on both the sig and here :)
I've tried to read over what's gone on in this thread, but well it's long and information dense so I might miss something, however you're about to get a wall of words.
I don't believe it's completely fair to say that the current behavior of pip install -U <foo> makes people more secure across the board. Yes, I can easily point out some projects like PyCrypto or cryptography where regressions are rare (particularly security regressions) and new releases generally include improvements to security. I think focusing only on those misses other cases though, such as the case where a new version of something on PyPI has caused a regression in security. There are, I think, two other cases, both of which boil down to "upgrading or not doesn't affect security at all" but differ in whether an upgrade is an OK thing to do for them or not.
Overall, I don't think that recursive upgrades are a good security mechanism, and if one of our goals is to prevent people from running old, insecure versions of software (and I think it should be a goal) then I think the way to achieve that is not to hang ourselves off of the behavior of upgrades (and just pray and hope that they've happened to run an update in some amount of time) but to instead devote time to a dedicated solution to the problem. This may be something like pip list -o but which checks explicitly for security problems, it may be checking the entire installed set of packages against PyPI to see if there are any known security issues with any of them, or it may take on some completely other format. However, I think it's important that this isn't tied to some semi-related functionality and that it actually covers the entire environment and not just whatever the user happens to be using. If someone does pip install requests[security] once, and then from there on out does pip install -U requests, we're going to completely miss updates to pyopenssl and cryptography and such if we only rely on recursive upgrades.
So pushing aside the security concerns for a moment, I think we need to take a look at what behaviors are most likely to give people what they want. Unfortunately with an ecosystem as large and with use cases as varied as Python's, I suspect there is no singular answer to "what people want". There are a few interrelated behaviors being discussed here, so let's tackle them one at a time.
For pip install --upgrade <foo>, we have evidence that our current behavior is actively harmful, so much so that projects are going out of their way to lie to pip to prevent triggering that behavior. I think that we can all agree that something people feel the need to actively subvert (not only in their own projects, but also by advocating it to other projects) likely needs some refinement in how it actually works. In this case the only real solution is to attempt to avoid upgrading (or downgrading!) where possible, and to prefer the already installed version, _unless_ the user has explicitly asked for that to be changed _or_ we can't satisfy the version constraints otherwise. I can't see any other reasonable way to implement this that isn't going to accidentally trigger 30+ minute builds which possibly result in a version that is less suitable for the task at hand (not using an optimized BLAS or something).
Advocating for leaving the current behavior of pip install --upgrade as it is, is essentially advocating against projects that depend on numpy being honest about whether or not they depend on numpy. If someone has another suggestion for how we might solve the numpy problem [1] then I think they should bring it up.
I know that one suggestion has been adding a --non-recursive-upgrade flag or some sort of --upgrade-strategy flag, but I think that these ideas largely serve only to complicate the mental model people have of pip. For the vast majority of packages (particularly pure Python ones that don't have a security-sensitive role) it's not going to matter a whole lot whether we upgrade them or not: upgrading is low cost, but there's also little downside to keeping them pinned to the installed version (unless the person finds a reason, a feature or a bug, to explicitly upgrade _that_ package). However, we're living in the edge cases here and I only really see two that matter, hard-to-upgrade packages and security-sensitive packages, and like I mentioned above, I think that if we're going to be worrying about ensuring folks get security upgrades we need a real mechanism for that, not a half-hearted hope that an important upgrade got caught in a recursive upgrade at some point. Given that I think we need dedicated support for security-sensitive packages, that for most packages this won't matter, and that for the hard-to-upgrade case there's only really one answer, I think that shunting this behind its own flag is a bad idea without justification for the ongoing cost of maintaining a whole option for this [2]. I also think that the more conservative approach has to be the most obvious approach or we don't really solve things for the hard-to-upgrade crowd, so if we did add a new option, we'd still want to change the behavior of --upgrade by default and add some explicit option to get the less conservative approach to upgrades.
I don't believe we should allow a package to be able to mark itself for eager (or non-eager) upgrades. I think this is something that needs to be consistent amongst all packages for end users to be able to have any hope of having a reasonable mental model of what pip is going to do to their system when they execute some command. That being said, I could see us adding the ability to have people mark versions insecure on PyPI and warn people if they have an insecure version installed on their system (or at the very least, if they're about to install one).
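Here's a rough Python sketch of the "check the entire environment, not just what you upgraded" idea. importlib.metadata.distributions() is real stdlib (Python 3.8+), but the advisory data and the find_insecure() helper are hypothetical: nothing here queries PyPI, and the advisories dict stands in for whatever "mark this version insecure" data PyPI might someday expose. Names are compared after PEP 503 normalization.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, List, Set, Tuple


def normalize(name: str) -> str:
    # PEP 503 normalization: runs of -, _ and . become a single "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_distributions() -> List[Tuple[str, str]]:
    """(name, version) for every distribution in the current environment."""
    result = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            result.append((name, dist.version))
    return result


def find_insecure(installed: Iterable[Tuple[str, str]],
                  advisories: Dict[str, Set[str]]) -> List[str]:
    """Flag installed (name, version) pairs that appear in `advisories`.

    `advisories` maps a normalized project name to a set of version strings
    marked insecure (hypothetical data source).
    """
    hits = []
    for name, version in installed:
        if version in advisories.get(normalize(name), set()):
            hits.append(f"{name} {version}")
    return sorted(hits)


if __name__ == "__main__":
    # Made-up advisory data, purely for illustration.
    ADVISORIES = {"foo": {"1.0"}}
    for hit in find_insecure(installed_distributions(), ADVISORIES):
        print(f"WARNING: {hit} is marked insecure")
```

Because it walks everything installed rather than the dependency tree of whatever was just named on the command line, something pulled in once via an extra (the requests[security] case) still gets checked.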
So, now that we've covered --upgrade, the other inter-related issue here is what we should do with pip install versus pip install --upgrade. Personally I think that we should make the two mean the same thing, _HOWEVER_ it might be more reasonable to focus the discussion first on just the behavior of --upgrade and leave pip install alone for now. Once we get the solution to upgrading sorted out, we can tackle what we'll do about pip install itself.
[1] Although to be completely honest it's not strictly related to Numpy. I've seen people break their systems or their installs time and time again because of an inadvertent upgrade. It's true that a lot of projects don't have correct lower bounds, but I believe it's equally true (or more so) that they don't have correct upper bounds either. One particularly important thing is that it's possible to determine what the correct lower bounds are at time of packaging, but it's impossible to determine what the correct upper bounds are.
[2] Options are not free, they incur a cost and it's important to attempt to reduce the number of them you have as much as you can. While you typically can't reduce them to zero, a pattern I see far too much in OSS software is the desire to please everyone by adding more and more options, when really it's just a mechanism for avoiding making a decision that may be unpopular with some group of people.