Okay, here's a proposal:

End goal (where we want to end up)

pip install foo: upgrades foo to the latest version; also does the minimum set of installs/upgrades required to satisfy the new version's dependencies
pip install -U foo / pip install --upgrade foo: identical to pip install foo (except maybe they should eventually issue some warning?); kept for back-compat
pip require foo: same as the current pip install foo; has the same effect as installing a package that has Requires-Dist: foo. This is a weird low-level operation that should not be emphasized in the docs, but we keep it for now to provide a less-bumpy transition, plus it exposes a meaningful operation we need to support anyway (the Requires-Dist handling), so it's likely useful for some scripting use case.
pip install --upgrade-recursive foo: same as the current pip install --upgrade foo -- ensures that foo is the latest version _and_ ensures that all transitive dependencies are the latest version. This is a weird marginal option that should not be emphasized in the docs, but we keep it for now to provide a less-bumpy transition.
pip install --upgrade-non-recursive foo: same as the future pip install foo, but explicit to provide a less-bumpy transition.
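
To make the difference between the minimal and the recursive upgrade concrete, here is a toy sketch. It assumes integer versions and a made-up index (`LATEST`, `REQUIRES` and `upgrade` are hypothetical names); it is not pip's resolver, only an illustration of which packages each strategy touches.

```python
# Toy model only: versions are plain integers and each package lists the
# minimum version it needs of each dependency. None of this is pip's real code.
LATEST = {"foo": 2, "bar": 5, "baz": 3}   # newest version on the (made-up) index
REQUIRES = {                              # package -> {dependency: minimum version}
    "foo": {"bar": 4},
    "bar": {"baz": 1},
    "baz": {},
}


def upgrade(target, installed, recursive):
    """Return the new installed set after upgrading `target`.

    recursive=False: dependencies change only if they are missing or too old.
    recursive=True:  every transitive dependency is moved to its latest version.
    """
    result = dict(installed)
    result[target] = LATEST[target]
    stack = [target]
    visited = set()
    while stack:
        name = stack.pop()
        if name in visited:
            continue
        visited.add(name)
        for dep, minimum in REQUIRES[name].items():
            have = result.get(dep)
            if recursive or have is None or have < minimum:
                result[dep] = LATEST[dep]
            stack.append(dep)
    return result


installed = {"foo": 1, "bar": 3, "baz": 2}
print(upgrade("foo", installed, recursive=False))  # bar is too old, baz is left alone
print(upgrade("foo", installed, recursive=True))   # everything moves to latest
```

In this model the non-recursive call upgrades bar only because foo's new version needs it, while the recursive call also upgrades baz even though nothing required that.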

Transition option A

Phase 0: what we have now -- pip install foo doesn't upgrade, pip install --upgrade foo does a recursive upgrade, pip require foo & pip install --upgrade-recursive foo are errors
Phase 1:
- we add pip require foo, pip install --upgrade-recursive foo, pip install --upgrade-non-recursive foo.
- pip install foo and pip install --upgrade foo continue to act like they do now, but are modified to check what they _would_ have done if --upgrade-non-recursive were set, and issue a deprecation warning whenever what they actually do is different from what they will do in the future.
- Users who want to opt-in to the future behavior (and silence the warnings) can use the usual configury to set --upgrade-non-recursive as their default (e.g. adding [install] upgrade-non-recursive = yes to pip.conf)
Phase 2:
- pip install foo and pip install --upgrade foo switch to the new behavior.
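
The Phase 1 check ("warn only when the outcome would differ") could look roughly like the sketch below. It assumes both outcomes have already been computed as {package: version} mappings; the function name and the plan format are hypothetical, not anything pip provides.

```python
import warnings


def warn_if_behaviour_will_change(current_plan, future_plan):
    """Emit a deprecation warning only when the two plans disagree.

    Both arguments are hypothetical {package: version} mappings describing what
    would end up installed under today's behaviour and under the future
    --upgrade-non-recursive behaviour. Returns the packages that differ.
    """
    names = set(current_plan) | set(future_plan)
    changed = sorted(n for n in names if current_plan.get(n) != future_plan.get(n))
    if changed:
        warnings.warn(
            "a future release would handle these packages differently: "
            + ", ".join(changed),
            DeprecationWarning,
            stacklevel=2,
        )
    return changed


# Today `pip install foo` leaves an installed foo 1 alone; the future default
# would upgrade it, so only foo is reported.
print(warn_if_behaviour_will_change({"foo": 1, "bar": 5}, {"foo": 2, "bar": 5}))
```

Note that producing both plans means working out the install twice, which is the extra cost of this option.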

Transition option B

KISS: skip phase 1 and go directly from phase 0 to phase 2. Rationale: it's not clear that this will actually break anything, people are going to be somewhat confused and annoyed in either case, it's entirely possible they'll be more confused and annoyed by the phased transition than by the actual change, we have limited resources, and we're eager to get to the shiny new future.

In this version we can also probably skip adding --upgrade-non-recursive, since it's redundant as soon as it's introduced.

Comment

I'm sorta expecting that everyone will push back and insist on transition option A instead of transition option B. But I'd actually be happy with either one, so instead of pre-emptively compromising I'm going to let someone else make that argument (if they want to) :-).

njsmith on 8 Jun 2016

Hmm, I don't see the added-value of your pip require ? It looks like a duplicate of pip install ? Or maybe a pip install --no-upgrade ?

xavfernandez on 8 Jun 2016

I'm happy enough with option B.

But I don't follow your description. You say pip require foo: same as the current pip install foo. So it'll error if foo is installed? And pip install --upgrade-recursive foo: same as the current pip install --upgrade foo. I thought there were problems with the existing install --upgrade behaviour (beyond it not being the default) - there's a whole load of discussion somewhere about needing a SAT solver. Is your proposal that we don't do anything about those issues? Or am I misremembering and there's not actually a problem with the current --upgrade behaviour?

pfmoore on 8 Jun 2016

I'm happy with option B.

I don't like the idea of a pip require command for the same reasons I didn't like the split pip install and pip upgrade commands. Having two commands that do sort of the same thing, but not quite, forces people to decide up front which one to use, versus using flags. I also think that it's good practice for boolean flags (ones that toggle something on/off) to have an inverse wherever it makes sense, to allow people to compose commands better.

So with all that in mind, here's what I would do:

pip install --upgrade ... now does a "minimal" upgrade by default, upgrading anything named on the command line/requirements file to the latest version, but only updating dependencies if required.
pip install --no-upgrade ... behaves as pip install does now, similarly to your pip require command, and just ensures that a version, any version, of the named requirements is installed.
pip install ... has its default switched from an implicit --no-upgrade to an implicit --upgrade.

You might notice that there's nothing like the current behavior listed so far, an "upgrade everything in the dependency path to the latest version" sort of flag. I'm on the fence about whether we really want something like that (and if we want it, do we want to keep it forever, or would it just be a temporary shim to ease transition). Another thing to keep in mind when deciding this is how the theoretical pip upgrade command affects this decision. In other words, if we have a command to upgrade all the installed items, do we foresee people ever wanting to upgrade X and all of its dependencies?

If we do want something like the current --upgrade behavior, then I think I see two options:

--recursive / --no-recursive To turn on the old or new behavior (but what would these do if --no-upgrade was selected? Silent no-op? Error?).
--upgrade-strategy=(minimal / recursive) to switch between two different strategies, a bit wordier than --[no-]recursive, but also makes it easier to add additional strategies if we ever find ourselves in need of them.
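
As a rough illustration of how these flags would compose, here is a sketch using argparse. The parser is hypothetical and the flag names simply follow the suggestions above; it is not pip's actual command-line handling.

```python
import argparse

# Hypothetical parser for illustration; flag names follow the suggestions in
# this comment and are not pip's actual command-line interface.
parser = argparse.ArgumentParser(prog="pip install")
parser.add_argument("requirements", nargs="+")
parser.add_argument(
    "--upgrade", dest="upgrade", action="store_true", default=True,
    help="upgrade the named requirements (the proposed default)",
)
parser.add_argument(
    "--no-upgrade", dest="upgrade", action="store_false",
    help="only ensure that some version is installed",
)
parser.add_argument(
    "--upgrade-strategy", choices=["minimal", "recursive"], default="minimal",
    help="how dependencies are treated when upgrading",
)

args = parser.parse_args(["--upgrade-strategy", "recursive", "foo"])
print(args.upgrade, args.upgrade_strategy, args.requirements)
```

With a single --upgrade-strategy option, the "what does --recursive mean with --no-upgrade" question reduces to ignoring the strategy when upgrade is off, and a further strategy is just another entry in `choices`.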

In terms of the dependency resolver, I don't think these two issues are really intertwined that much. Our resolving is currently a problem in both the pip install and the pip install --upgrade case, and I believe it will continue to be a problem with the proposed changes. It's something that needs to be fixed, but I don't think it has any bearing on what we do here (although it likely does have some bearing on the hypothetical pip upgrade command).

dstufft on 8 Jun 2016

I'm not aware of a strong requirement for the current behaviour (by "strong" I mean "anything other than backward compatibility"). But if people did need it, they can get it by simply listing all of the dependencies on the command line.

It's pretty easy to write a script to show all (recursive) dependencies of a package:

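# reqs.py
import sys
from pkg_resources import get_distribution

def reqs(req):
    """Return the installed distribution for req and all of its transitive dependencies."""
    results = []
    seen = set()
    queue = [get_distribution(req)]
    while queue:
        current = queue.pop()
        if current.key in seen:
            # Already visited; this also guards against dependency cycles.
            continue
        seen.add(current.key)
        results.append(current)
        for r in current.requires():
            queue.append(get_distribution(r))
    return results

if __name__ == '__main__':
    # Print one project name per line, de-duplicated and sorted.
    print('\n'.join(sorted(set(d.project_name for d in reqs(sys.argv[1])))))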

Then you just do pip install $(reqs.py foo) to get an "eager install" of foo and its dependencies. I'm sure there are shortcomings with this approach, but is the problem common enough to warrant a more complex solution?
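
On current Python versions the same walk can be sketched with importlib.metadata from the standard library, since pkg_resources has since been deprecated. This is only a sketch under simplifying assumptions: the name parsing is a plain regular expression, name normalization is limited to lower-casing, requirements tied to an extra are skipped, and other environment markers are not evaluated. `transitive_deps` and its `get_requires` parameter are hypothetical names.

```python
import re
from importlib import metadata

# Matches the leading project name of a requirement string such as "bar>=1.0".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def transitive_deps(name, get_requires=metadata.requires):
    """Return the sorted names of `name` and its transitive dependencies.

    `get_requires` maps a project name to a list of requirement strings (or
    None); it defaults to importlib.metadata.requires, which only knows about
    installed distributions, and can be swapped for a stub.
    """
    seen = set()
    queue = [name]
    while queue:
        current = queue.pop()
        key = current.lower()
        if key in seen:
            continue
        seen.add(key)
        for req in get_requires(current) or []:
            if "extra ==" in req:
                # Only needed when an optional extra is requested; skip it.
                continue
            match = _NAME.match(req)
            if match:
                queue.append(match.group(1))
    return sorted(seen)
```

Like the script above, this only describes what is already installed, so it cannot see dependencies that a newer version would add.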

pfmoore on 8 Jun 2016

@pfmoore well that script only works if no dependencies have changed between the currently installed versions and the to-be-upgraded-to versions (and of course, it assumes everything is already installed).

That being said, the only real use case I can come up with is installing a project into an environment that already has stuff installed into it, but wanting to have the latest version of dependencies. IOW, a framework like Pyramid might prefer that new users install its dependencies using the recursive upgrade. HOWEVER, even in this scenario (which is the only one I can think of), if the hypothetical Pyramid's version specifiers are all correct, then the end user should expect it to work regardless (and it's similar in nature to what folks would get already in the current pip install behavior with something already installed).

If someone does want "Pyramid, and all of its dependencies up to date", it's somewhat nicer than the proposed way of doing that (combining the two proposals), which would be pip install Pyramid && pip upgrade (which isn't exactly the same, since pip upgrade would do more than just Pyramid).

So my hesitation is that I struggle to come up with a scenario where it's the clear-cut right thing to do, though it could make some edge cases moderately nicer. We could always leave it out and, if we come across people asking for it, add it in again at that point.

dstufft on 8 Jun 2016

I dislike both A and B. I don't like the idea of introducing a new command, nor do I want to switch to the new behavior without some "deprecation" style period for the current behavior. Hence I put forth my own proposal below.

I'm not aware of a strong requirement for the current behaviour

Me neither. Yet, I don't want to break someone's working code without telling them. I would find it rude. 'Don't do unto others what you don't want others to do unto you.' This is why I think I don't want to switch with no warning as in @njsmith's B option either.

If someone does want "Pyramid, and all of its dependencies up to date", it's somewhat nicer than the proposed way of doing that (combining the two proposals), which would be pip install Pyramid && pip upgrade (which isn't exactly the same, since pip upgrade would do more than just Pyramid).

As I understand it, if someone wants "Pyramid, and all of its dependencies up to date", after the switch to the new behavior, it's pip install --upgrade-strategy=eager Pyramid. That would eagerly upgrade Pyramid and its dependencies to the latest version, regardless of whether an upgrade is necessary.

I thought it was clear that we wanted to provide both the current "recursive-latest" and the new default "only-if-needed" upgrades. That just emphasizes that I need to post the commonly accepted ideas.

Proposal

1. Make a major version release that deprecates current behavior and provides a warning on use of these commands with opt-in flags and configuration to the new behavior.
- Flag(s) should be provided to allow the user to check out the new behavior to be introduced. Using the flag(s) in this version would imply --upgrade.
- _maybe_, pip install --upgrade warns that this flag will become a no-op in the next release.
- pip install warns that the behavior is changing in the next release and that the current behavior won't be available then.
Possibly, both warnings provide a link to documentation that suggests to the user what they should do.
2. Switch to new behavior in next major version release.
- If someone really needs the current behavior, a --no-upgrade flag may be added. But I don't want to see that unless someone _really_ needs it.

Bikeshed: Options and flags in 1. I prefer to add a --upgrade-strategy=(eager / non-eager / default) as the flag in 1 and switch the default strategy to eager in 2.

pradyunsg on 8 Jun 2016

Also worth pointing out, explicitly, there is no need for a dependency resolver in pip for this. While with the new behavior it's still possible to break some ~~line~~ edge in the entire dependency graph, it becomes less likely if you upgrade less often.

how dependencies are handled

Uniformly independent of depth. The user can choose between eager and non-eager upgrades. They are as I had defined in my earlier write-up.

what happens with constraints (and when they conflict)

I would say whatever happens today.

binary vs source

To be handled in #3785. Until then, keep as is.

pradyunsg on 8 Jun 2016

I think it was clear that we wanted to provide both the current "recursive-latest" and the new default "only-if-needed" upgrades.

Nope, I don't think so. The "only if needed" behaviour is, as far as I know, agreed by everyone as what we would like to have available. But I understood the current behaviour to be generally considered as having issues. Whether those issues all revolve around the "pip needs a proper dependency resolver" problem, and we're OK with keeping the current behaviour until that is fixed, I don't really know.

pfmoore on 8 Jun 2016

The main problem(s) with the current behavior (that isn't actually a result of the lack of a real resolver) is that the "greedy-ness" of it causes things to be upgraded that might not otherwise be upgraded. On the tin that doesn't seem like a big problem, however it has some subtle (and some not so subtle) interactions:

It makes it more likely that something like sudo pip will inadvertently break someone's OS because it makes it more likely we'll recurse into a dependency provided by the OS (even if the user invoking pip had no idea that would be affected).
Some libraries are very expensive to ~~install~~ build, particularly ones like Numpy where compiling can take 30+ minutes.
The recursive upgrade introduces more churn on the installed set of packages, which increases the likelihood that something that was already working, breaks because of an upgrade to a shared dependency.

The first two of those are things that could possibly be fixed, at least in part, by other solutions (and for which this solution isn't a total fix either). You could fix the breaking of the OS by making pip smarter about not mucking around with the OS files by default. Wheels make it easier to install even hard-to-build libraries like Numpy, but not everything has a Wheel, and if you're on anything that isn't Windows, OS X, or manylinux1 then your chances of getting a wheel are basically zero.

The churn on what is installed is only going to be fixed by this patch, as well as reducing the occurrence of the first two issues (by being more conservative when we actually attempt to do anything).

dstufft on 8 Jun 2016

Of course, this is a super subtle sort of difference and it's hard to nail down all of the exact benefits (they'd be more accurately described as trade-offs, rather than a straight set of benefits). I don't know if the old behavior is something that, in the cases it's useful, it's useful enough that people would bother using a flag for it or not. If we add the flag, it becomes hard to ever remove it; if we don't add it now, we could always add it again in the future. For that reason I lean somewhat towards leaving it out and waiting to see if we get people asking for a way to bring the old behavior back.

dstufft on 8 Jun 2016

I think it was clear that we wanted to provide both the current "recursive-latest" and the new default "only-if-needed" upgrades.

Nope, I don't think so

Hmm... I did think that both behaviors were seen as useful. That's what the Pyramid example made me think. It's using the current behavior and it does exactly what is desired.

It seems desirable to be able to say "upgrade pkg and all its (sub-)*dependencies to latest version". I don't want to upgrade _everything_ in my ecosystem, I just want to get the latest bug-fixes for pkg and dependencies.

Some libraries are very expensive to ~~install~~ build, particularly ones like Numpy where compiling can take 30+ minutes.

By conservatively upgrading packages, it does make this happen less often.

Edit: You mentioned that.

The recursive upgrade introduces more churn on the installed set of packages, which increases the likelihood that something that was already working, breaks because of an upgrade to a shared dependency.

This _needs_ a dependency resolver to be fixed. I consider that out-of-scope of this issue.

If we add the flag, it becomes hard to ever remove it, if we don't add it now, we could always add it again in the future, so for that reason I lean somewhat towards leaving it out and waiting to see if we get people asking for a way to bring the old behavior back.

That works pretty well with me. Adds to why I want a "deprecation" release for the current behaviour to get people asking for it to stay, rather than re-added.

Edit: s/version/behaviour/

:confused: Any comments on my proposal above?

pradyunsg on 8 Jun 2016

The recursive upgrade introduces more churn on the installed set of packages, which increases the likelihood that something that was already working, breaks because of an upgrade to a shared dependency.
This needs a dependency resolver to be fixed. I consider that out-of-scope of this issue.

No, this isn't related to the dependency solver thing. This is just "software is hard, and new versions sometimes add new bugs, therefore, the more churn you have, the more likely you are to get bit by new bugs".

The most stable (in terms of new, not previously encountered bugs) software is software that never changes.

Any comments on my proposal above?

I'm a little concerned about adding a warning for every invocation of pip install, but I'm not opposed to it-- it's certainly the safer route though and it's one that's more in line with our typical deprecation process and it gives a chance for people to clamor for an option to use the old behavior.

I do think that we need to either deprecate the --upgrade flag completely as part of this (probably no-op it and hide it for a long while), or we need to add --no-upgrade to get back to the old behavior of pip install .... I don't want a fairly useless --upgrade flag laying around in our help. So then the question for a --[no-]upgrade flag becomes whether we see the current behavior of pip install useful at all. Here again I don't have a strong opinion-- We could use the deprecation period again as a chance to see.

dstufft on 8 Jun 2016

Any comments on my proposal above?

Honestly, I really don't like the idea that essentially every invocation of pip install will give a warning for a full major release cycle. That seems guaranteed to just annoy users, and as a result we'll probably get no useful feedback, just a lot of complaints about the process.

My preferences remain with @njsmith's approach - probably the "just go for it" approach, but if necessary the gradual version.

I have to admit that I find it very hard to understand the impact on my day to day usage of these various proposals. There's a lot of theory and edge cases being discussed, which is obscuring the key points. I think that whatever transition process we adopt, someone should work on a clear "press-release" style description of the proposed changes and their impact, which we can publish on distutils-sig before making the changes. That should allow us to gauge reactions from the wider community. (I don't think this needs a PEP, but I do think it needs publicising).

My instinctive feeling is that I'll be (mildly) happy with the new "as little as possible" upgrade behaviour, mildly irritated by the fact that "install" now upgrades without an explicit flag (but hopefully I'll get used to it reasonably quickly) and otherwise mostly indifferent. My main usage will probably remain pip install new_thing to install a new package and a manual "get all the package names, and do pip install <all of them at once>" to simulate "update all". Neither of these will be affected by any of the proposals (except that the new "as little as possible" upgrade strategy will avoid the odd unwanted numpy upgrade attempt that the current behaviour inflicts on me).

For me, the tipping point comes when --prefer-binary and "upgrade all" become available. Those will affect my usage, and it won't really be until then that I'll see any benefits (or issues) with the change to upgrade strategy.

pfmoore on 8 Jun 2016

Honestly, I really don't like the idea that essentially every invocation of pip install will give a warning for a full major release cycle. That seems guaranteed to just annoy users, and as a result we'll probably get no useful feedback, just a lot of complaints about the process.

Indeed. I didn't think about that in a hurry to leave. Oops!

My point is, I really want pip itself to have a major version deprecation run with such a major change to the main command of it. Any form it takes, I'm game.

I think being selective about when we show the warning message is the way forward.

How do you choose? @njsmith suggested only when the behaviour differs. Other than the fact that it's essentially doubling the work done in every install execution, as long as we publicise well (in advance and in detail), I think it's a good idea.

edit

Or maybe not on second thought. It won't be showing the message to everyone like we would want to. I would want to show it to everyone at least once.

How about some configuration file magic, asking the user to set a flag in the configuration file? This is where an --upgrade-strategy=default or similar flag would come in handy.

Any alternate ideas for this?

the tipping point comes when --prefer-binary and "upgrade all" become available. Those will affect my usage, and it won't really be until then that I'll see any benefits (or issues) with the change to upgrade strategy.

True. While this change will fix some issues (unnecessary re-installs) directly, I think it ~~will~~ might indirectly help resolve other issues as well.

pradyunsg on 8 Jun 2016

Similarly to @pradyunsg's last idea, iirc git shows (kinda long) messages for when it introduced or is going to introduce a big change that you can disable by setting a configuration via commandline that is mentioned in the message. I've liked that so far.
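
A minimal sketch of that git-style approach: keep showing the notice until the user records a choice in their configuration. The section name, option name, and message text are all hypothetical, and the configuration is read with the standard-library configparser rather than pip's own configuration code.

```python
import configparser

NOTICE = (
    "The default upgrade behaviour of 'pip install' is changing.\n"
    "Set 'upgrade-strategy' in the [install] section of your configuration\n"
    "file to choose a behaviour and silence this message."
)


def pending_notice(config_text):
    """Return the notice text, or None once the user has made a choice.

    `config_text` is the content of an ini-style configuration file. The
    section and option names are hypothetical.
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)
    if config.has_option("install", "upgrade-strategy"):
        return None
    return NOTICE


print(pending_notice("[install]\n"))                              # notice text
print(pending_notice("[install]\nupgrade-strategy = default\n"))  # None
```

Any explicit value silences the message, so even a user who wants to stay on the default has a way to acknowledge the change once.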

FichteFoll on 9 Jun 2016

A temporary option to disable the message wouldn't be the worst possible behavior.

dstufft on 9 Jun 2016

@pradyunsg: Before we get into the nitty-gritty of deprecation strategies... is there any chance I can convince you that the "option B" approach is okay? (Normally I wouldn't try, but given that core devs like @dstufft and @pfmoore are okay with it I guess I will try :-).) I definitely understand why you find just switching to be "rude" to users, but it's a complex trade-off -- not switching is also rude in different ways to different people. For example:

The longer we delay the switch, the longer we're continuing to inflict the annoying current behavior on our users -- note that #59 has 199 comments from 56 participants, many of them just +1's. Making them wait another year is kinda rude too.
Deprecation periods are complicated and difficult -- they intrinsically impose extra costs on users. Pip gets 10 million downloads/month just from PyPI, so e.g. your proposed message will be shown at least 10 million times. Multiply by how long it takes to read something like that, make some decision, update some config file, etc., and then maybe do it over again in a year when the defaults actually switch.
If we're ever going to get this swamp drained then at some point we gotta get moving. Waiting a year between each improvement is really painful.
And deprecation cycles are costly for developers -- we're already extremely, extremely short on developer resources, so there's a very real cost to spending time implementing complex deprecation logic, keeping track of the schedule, coming back a year later and reminding ourselves what we decided, etc. That's time that could be spent on improving warehouse, implementing a proper resolver, pushing forward --prefer-binary, etc. etc. It's not enough to say "a deprecation is important", one has to argue that it's _more_ important than other things one could do with that time.

8.1.2 flat out broke a bunch of people's deployments due to a complicated bug involving the interaction between pip, pkg_resources, and devpi. It sucked but people dealt with it. Given our limited resources, it's a fact that we're going to sometimes break things and sometimes leave broken things sitting for years without progress and generally cause users pain. We can't change that, but we can at least be smarter about which _kinds_ of pain we cause users, and "install starts working the way lots of users already expect" is a much more productive outcome than most :-).

@pfmoore:

You say pip require foo: same as the current pip install foo. So it'll error if foo is installed?
No, right now if foo is already installed then pip install foo does nothing and exits successfully. I was imagining pip require would be a way to directly talk to the constraint resolver: "here's a new constraint, please ensure it is satisfied". Semantically meaningful and well-defined, but a pretty low-level for-experts interface.

@dstufft: I find pip install --no-upgrade foo rather confusing, though -- from the name I'd expect that it would do something like... try to install foo but error out if foo had a dependency that would force the upgrade of something I already had installed? Which is kinda the opposite of what it would actually do. For me the require operation and the install operation are conceptually really distinct -- see also Guido's comments on how if you ever find yourself writing a function that takes a boolean arg, and you know that your callers will be passing a constant rather than a variable for that arg, then you should have two functions. So splitting it out into a new command was me trying to imagine what it might look like in a world where we added it for its own sake, rather than just to fulfill our obligation to have a --no form of --upgrade or whatever. But I'm also just as happy to drop it entirely for now...

Okay, how about this as a strategy:

9.0 makes pip install foo = pip install --upgrade foo = non-recursive upgrade
We make a nice little writeup explaining the actual effect this has (pip install foo now will upgrade if foo is installed; pip install --upgrade foo will no longer upgrade all dependencies recursively)
We provide some script like @pfmoore's above and in the release notes say "if you really want a recursive upgrade, try this..."
We make a mental note to consider adding a pip require foo command in the future if it turns out to be useful, but defer that for now because it's not really a priority and it's easier to add stuff than to take it away
We keep --upgrade around as a no-op indefinitely, but take it out of --help, and the reference manual just says "no-op; kept for backwards compatibility". (Maybe in a few years we tear it out entirely, maybe not -- I don't care and am happy to just defer that discussion until a few years have passed.)

That avoids the worst gratuitous breakage (there's no reason for pip install -U foo to become a hard error and invalidate tons of existing tutorials), but otherwise keeps things radically simple, so we can skip or defer thinking about things like --no-upgrade or the most ideal spelling for recursive upgrades and get the important parts moving ASAP.
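
Keeping -U/--upgrade as a hidden no-op is cheap to express. The sketch below uses argparse purely for illustration; the parser is hypothetical and is not how pip's option handling is actually written.

```python
import argparse

# Hypothetical parser, not pip's real option handling: -U/--upgrade is still
# accepted so existing tutorials keep working, but it is hidden from --help
# and its value is never consulted.
parser = argparse.ArgumentParser(prog="pip install")
parser.add_argument("requirements", nargs="+")
parser.add_argument(
    "-U", "--upgrade", action="store_true",
    help=argparse.SUPPRESS,  # omit the option from the generated help text
)

args = parser.parse_args(["-U", "foo"])
print(args.requirements)
```

Here `pip install -U foo` and `pip install foo` parse to the same requirements, and the flag no longer shows up in the help output.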

njsmith on 9 Jun 2016

It seems desirable to be able to say "upgrade pkg and all its (sub-)*dependencies to latest version". I don't want to upgrade everything in my ecosystem, I just want to get the latest bug-fixes for pkg and dependencies.

The problem with this is that in lots of cases, it doesn't really make sense to assign some dependency to any particular dependant. Like, lots of people have environments with ~30 different packages installed, of which 1 is numpy and 29 are packages that depend on numpy. So if I want the new bug-fixes for astropy, should that upgrade my numpy? That might fix some issues with astropy but it might also break the other 28 packages, who knows. Pyramid's dependency chain includes a number of widely-used utility libraries like zope.interface and repoze.lru and setuptools (why? idk). So recursively upgrading Pyramid might break Twisted (which depends on zope.interface and setuptools and nothing else). There's no way that "I want the latest bug-fixes for Pyramid" implies "I want the latest setuptools" in most users' minds -- but that's how pip install -U currently interprets it.
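
The shared-dependency point can be illustrated with a toy graph. The edges below are made up for the example (they are not the projects' real metadata), and `dependants` is a hypothetical helper.

```python
# Toy dependency graph; some names come from the example above, but the edges
# are illustrative and not the projects' real metadata.
DEPENDS_ON = {
    "astropy": ["numpy"],
    "scipy": ["numpy"],
    "pandas": ["numpy"],
    "numpy": [],
}


def dependants(package, graph):
    """Return every package that directly or transitively depends on `package`."""
    found = set()
    changed = True
    while changed:
        changed = False
        for name, deps in graph.items():
            if name not in found and (package in deps or found.intersection(deps)):
                found.add(name)
                changed = True
    return sorted(found)


# Upgrading numpy as a side effect of upgrading astropy touches all of these:
print(dependants("numpy", DEPENDS_ON))
```

A recursive upgrade of astropy changes numpy for every package in that list, not only for astropy.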

njsmith on 9 Jun 2016

Similarly to @pradyunsg's last idea, iirc git shows (kinda long) messages for when it introduced or is going to introduce a big change that you can disable by setting a configuration via commandline that is mentioned in the message.

That's exactly where I got the idea.

I've liked that so far.

Ditto. Hence I would like to see it in pip. It's a field-tested process.

I do agree that every-run-warning is a bit too much but having it show all the time until the user acts on it is something I know, from git, works even for major changes like this.

is there any chance I can convince you that the "option B" approach is okay?

Maybe. You're right that the trade-offs are complicated and having to wait a year till the switch isn't the most convenient thing either. Breaking certain niche-cases that don't affect _everyone_ is fine. That is just going to happen. Here, we're changing the most used command of pip (in documentation of packages and otherwise). Doing so without a proper warning period might just not be the best of things to do. Nor should this be done without giving people some time to fix their tools/workflow/etc to work with the new behaviour.

With @njsmith's current proposal, I still don't get a proper warning or give people some preview of the upcoming (major) change. That's all, but it's enough that I don't like the proposal. If someone can convince me that dropping these two requirements would be fine and it's possible to properly inform people that this, a big change, is coming their way in some other manner, I'm fine with that.

If we get the deprecation nitty-gritties right, it should be possible to implement this in such a manner that the deprecation-release-only stuff stays in one module (module as in English; a class, function or something else) and the next major release just stops invoking that module and removes it. That way at least the post-deprecation work is minimized.

#59 has 199 comments from 56 participants, many of them just +1's. Making them wait another year is kinda rude too.

They don't _have_ to wait another year. They can just opt-in to the new behaviour. We're just giving time to people whose stuff broke due to the change. Others can just opt-in to the nicer behaviour.

We keep --upgrade around as a no-op indefinitely, but take it out of --help, and the reference manual just says "no-op; kept for backwards compatibility". (Maybe in a few years we tear it out entirely, maybe not -- I don't care and am happy to just defer that discussion until a few years have passed.)
[snip?]
That avoids the worst gratuitous breakage (there's no reason for pip install -U foo to become a hard error and invalidate tons of existing tutorials)

If it wasn't obvious, this would happen in my proposal's 1. No one gets bothered by a no-op -U's presence. Its absence will invalidate many packages' documentation and break stuff. We'll keep it till it is rare enough to be safe to remove. That discussion should happen a few years later. (let's mark 16th September 2018 for this, for no reason whatsoever)

Regardless of whether I change my position on @njsmith's proposal, we'll keep a no-op --upgrade post-deprecation.

There's no way that "I want the latest bug-fixes for Pyramid" implies "I want the latest setuptools" in most users' minds -- but that's how pip install -U currently interprets it.

True. But this is due to the lack of a dependency resolver. Once it's added, it does _exactly_ what the user wanted. There's only so much we can do till then. Adding a warning in the documentation about the potential breakage of the dependencies is sufficient for now IMO, since this behaviour shall become opt-in. And this assumes that the packages maintain their promises made through version-numbers. If they break, there's little pip can do until packages refine their version-specifiers.

As an aside, I think there should be a piece of documentation mentioning that pip may break your dependency graphs.

So if I want the new bug-fixes for astropy, should that upgrade my numpy?

Not if it breaks your dependency graph. Nor if it removes your well-configured numpy. The former case needs a dependency resolver. The latter needs "holding back" of upgrades. Both are out-of-scope in this discussion.

Until we get those, the most we can do is tell people - "pip doesn't do the right thing all the time and we don't have the resources to fix it. Help would be appreciated."

This is just "software is hard, and new versions sometimes add new bugs, therefore, the more churn you have, the more likely you are to get bit by new bugs".

I can only say, sad but true to this.

pradyunsg on 9 Jun 2016

I am posting the mental picture of the post-deprecation behaviour that I have in my head... Just to make sure I don't miss out on anyone's concerns.

pip install upgrades in a non-eager manner, upgrading dependencies only-if-needed.
- TBD: if we also want to add a no-op flag, which depends on the deprecation path
pip install --some-flag upgrades in an eager manner, upgrading dependencies to the latest version allowed by version-specifiers.
- TBD: if wanted
--upgrade becomes a no-op. It is kept in install --help, documented as "kept for backwards compatibility".
- TBD if it is removed from help, I say no
pip require is deferred until someone comes around asking for it. As noted below, this cannot be the case. (edit: it later turned out that I was wrong.. :| )
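To make the difference between the two modes concrete, here is a minimal sketch of "only-if-needed" versus "eager" upgrade planning. It is a toy model and not pip's implementation: versions are plain integers, each requirement is just a minimum version, the requirements of a package are assumed not to vary between its releases, and the function name plan_upgrades and all version numbers are made up for illustration.

```python
def plan_upgrades(target, installed, latest, requires, eager=False):
    """Return {package: new_version} for upgrading `target`.

    installed: {name: currently installed version}
    latest:    {name: newest available version}
    requires:  {name: {dependency: minimum acceptable version}}
    eager:     if True, bring every transitive dependency to its latest version;
               if False, touch a dependency only when it is missing or too old.
    """
    plan = {}
    # The package named on the command line is always brought up to date.
    if installed.get(target) != latest[target]:
        plan[target] = latest[target]

    stack = [target]
    seen = {target}
    while stack:
        pkg = stack.pop()
        for dep, minimum in requires.get(pkg, {}).items():
            current = plan.get(dep, installed.get(dep))
            if eager or current is None or current < minimum:
                wanted = latest[dep]
            else:
                wanted = current  # already satisfies the requirement: leave it alone
            if wanted != installed.get(dep):
                plan[dep] = wanted
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return plan


# Hypothetical environment: only zope.interface is too old for the new pyramid.
installed = {"pyramid": 1, "setuptools": 20, "zope.interface": 4}
latest = {"pyramid": 2, "setuptools": 23, "zope.interface": 5}
requires = {"pyramid": {"setuptools": 0, "zope.interface": 5}}

non_eager_plan = plan_upgrades("pyramid", installed, latest, requires)
eager_plan = plan_upgrades("pyramid", installed, latest, requires, eager=True)
```

In this example the non-eager plan upgrades pyramid and zope.interface only, while the eager plan also pulls in the newest setuptools even though the installed one already satisfies the requirement.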

Once we have decided upon the required behaviour, I'll start working on the implementation. (I'm still familiarizing myself with the implementation details of pip install and #3194 right now.)

Let's finalize the behaviour and how we want to do the deprecation here and we'll bikeshed the option names in the PR I eventually make.

pip install --target <dir> is documented as "By default this will not replace existing files/folders in <dir>."

Since install shall now start upgrading (replacing) by default, it seems more consistent to replace the existing files and folders by default and provide some flag if the user wishes to have the older behaviour of not-replacing. AFAIK, this flag is undecided on. pip require has similarities. So, I think we can't defer the discussion on pip require and need to do it now.

The overlap with pip install and the need for it presented by install --target makes me want to have the require behaviour behind a flag in install.

pradyunsg on 9 Jun 2016

@pradyunsg:

Here, we're changing the most used command of pip (in documentation of packages and otherwise). Doing so without a proper warning period might just not be the best of things to do. Nor should this be done without giving people some time to fix their tools/workflow/etc to work with the new behaviour.

It's the most used command of pip, but we're only touching two weird corner cases: pip install foo where foo is already installed, and pip install -U foo where foo has some recursive dependency that's out of date. While I'm sure there will be some obscure breakage no matter what we do, I can't think of any sensible tools or workflows that would be broken by this -- can you give an example of what you're thinking of?

True. But this is due to the lack of a dependency resolver. Once it's added, it does exactly what the user wanted.

??? no idea what you mean here -- Pyramid recursively depends on setuptools, and my argument is that this demonstrates that "package and its recursive dependencies" doesn't actually correspond to any meaningful concept in the user's mental model. AFAICT this is totally orthogonal to the dependency resolver issue?

pip install --target <dir> ... Since install shall now start upgrading (replacing) by default, it seems more consistent to replace the existing files and folders by default

I think the issue with pip install --target <dir> is that it doesn't really install into an environment at all -- it's used for things like vendoring. And without an environment, the upgrade/install distinction doesn't even make sense. My vote is that we leave it alone -- the current behavior is fine IMO.

pip require has similarities.

It does?

njsmith on 9 Jun 2016

we're only touching two weird corner cases: pip install foo where foo is already installed, and pip install -U foo where foo has some recursive dependency that's out of date.

Hmm... Indeed. While the change is major, I do agree that it's just weird corner cases that we break. But I would really want to get some user input before making the change... It doesn't feel right to make such a change without a deprecation.

If everyone else here (mainly @pfmoore and @dstufft) says that they prefer no-deprecation switch over a deprecation switch, I guess I'll be fine with going ahead and implementing @njsmith's proposal.

True. But this is due to the lack of a dependency resolver. Once it's added, it does exactly what the user wanted.

Pyramid recursively depends on setuptools, and my argument is that this demonstrates that "package and its recursive dependencies" doesn't actually correspond to any meaningful concept in the user's mental model.

I disagree. It is a meaningful thing to want to get the latest possible version of a package and its dependencies. As an example, if I have found that my current environment has an issue related to pkgA, I would want to check against the latest releases of it and all its dependencies to eliminate the possibility of this being an issue that got fixed in a new release. I think it's reasonable to expect that to be possible.

Just to be clear, Let's not provide the old behavior for the simple reason that it provides lazy people a way to keep the existing behavior if it works for them. We'll keep it only if we figure out some valid use-case. If we go down the deprecation path, it'll be deprecated but available till end-of-deprecation. If someone wants that behavior, they'll say they do and we'll pull it out of deprecation and let it stay.

AFAICT this is totally orthogonal to the dependency resolver issue?

The dependency resolver comes into play when A and B both depend on C, A is recursively upgraded, breaking C for B since pip does not care about B's version specifiers when handling A's. This was the example you gave with Pyramid, Twisted and zope.interface being A, B and C respectively.
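A small sketch of that A/B/C situation, under loud assumptions: versions are integers, each requirement is a half-open range (minimum inclusive, maximum exclusive), and the helper name violated_requirements plus all the numbers are hypothetical. It only checks an environment after the fact; it is not a resolver and not how pip works internally.

```python
def violated_requirements(installed, requirements):
    """List (package, dependency) pairs whose declared version range is not met.

    installed:    {name: installed version}
    requirements: {package: {dependency: (min_inclusive, max_exclusive)}}
    """
    broken = []
    for pkg, deps in sorted(requirements.items()):
        if pkg not in installed:
            continue  # requirements of packages that are not installed do not matter
        for dep, (low, high) in sorted(deps.items()):
            version = installed.get(dep)
            if version is None or not (low <= version < high):
                broken.append((pkg, dep))
    return broken


# A accepts any C >= 1, while B only works with C 1.x (made-up numbers).
requirements = {"A": {"C": (1, 99)}, "B": {"C": (1, 2)}}

before = {"A": 1, "B": 1, "C": 1}
# Eager upgrade of A: C is pulled to its latest version without consulting B.
after_eager = {"A": 2, "B": 1, "C": 2}
# Only-if-needed upgrade of A: C already satisfies A, so it is left alone.
after_non_eager = {"A": 2, "B": 1, "C": 1}

broken_before = violated_requirements(before, requirements)
broken_eager = violated_requirements(after_eager, requirements)
broken_non_eager = violated_requirements(after_non_eager, requirements)
```

Here only the eager upgrade leaves B with an unsatisfied requirement on C. The non-eager path avoids the breakage in this case purely because it changes less, not because it checks B's specifiers.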

pip require has similarities.

It does?

Yes, in that it also does not affect already-installed packages. But on reviewing this, they are more different than similar. This option is more along the lines of --avoid-installed. I don't know why I thought they were similar enough to merge...

pradyunsg on 9 Jun 2016

@njsmith

No, right now if foo is already installed then pip install foo does nothing and exits successfully

What I see is

>pip install xlrd
Requirement already satisfied (use --upgrade to upgrade): xlrd in c:\users\uk03306\appdata\local\programs\python\python35\lib\site-packages

I'm not sure about the exit status, I was thinking about the user experience. Apologies, I was being sloppy in my wording - I meant that I "get an error message" (maybe it's technically a warning) rather than that pip sets the exit code to error. But either way it's a minor point.

Responding to other emails:

I agree with @njsmith that deprecation is in many ways just as bad an experience for users as a sudden change. In this case I remain in favour of just going straight to the improved version. There's been plenty of debate on the tracker, and lots of people have noted their interest in seeing the new approach land. @pradyunsg if you still feel that we should warn users, then by all means post on distutils-sig (and even python-list if you feel it's warranted) and announce the plan there. There's a risk that doing so results in even more bikeshedding and debate, which may or may not be productive, but that's the nature of packaging changes :-)

I'm also in agreement that I don't see "Pyramid and all its dependencies" as a particularly useful thing to want to upgrade. Pyramid itself, of course. And Pyramid and _selected_ dependencies, quite possibly. And certainly "everything in this virtualenv (which was set up for my Pyramid development)".

Which prompts the thought - how often would people asking for eager upgrades be better served by using virtualenvs and upgrade-all? I can't speak for other people's workflows, but it's certainly how I tend to operate. And of course for many environments, pip freeze and exact version restrictions are the norm, so eager updates would be inappropriate there.

Finally, we've decoupled "pip needs a solver" from this proposal - so arguing that eager is useful once we have a solver isn't relevant right now. Current eager behaviour can break dependencies - so we should remove it, and then maybe reintroduce a working version once we have a solver and we've had feedback that (a not-broken version of) the feature is useful to people.

pfmoore on 9 Jun 2016

if you still feel that we should warn users, then by all means post on distutils-sig (and even python-list if you feel it's warranted) and announce the plan there.

I think announcing on distutils-sig sounds fine to me. python-list, I'll think about it.

There's a risk that doing so results in even more bikeshedding and debate, which may or may not be productive, but that's the nature of packaging changes :-)

That's a trade-off. I guess I'll redirect them to the PR for the bikeshedding and take other comments on the mailing list...

Quick correction: I really should have mentioned the entire help-text of --target.

""" Install packages into <dir>. By default this will not replace existing files/folders in <dir>. Use --upgrade to replace existing packages in <dir> with new versions. """

If we are making --upgrade a no-op, --target should not depend on it. We need to figure this out.

Finally, we've decoupled "pip needs a solver" from this proposal - so arguing that eager is useful once we have a solver isn't relevant right now. Current eager behaviour can break dependencies - so we should remove it, and then maybe reintroduce a working version once we have a solver and we've had feedback that (a not-broken version of) the feature is useful to people.

Sounds good to me. I guess we can drop the eager upgrade behavior. It's easy to add it if we need to. Removing it (after the switch), not so much. I do think not providing it and advocating use of virtualenv for the job is a good idea.

pradyunsg on 9 Jun 2016

@pfmoore I take it that you wish to go down the no-deprecation path.

I'm also in agreement that I don't see "Pyramid and all its dependencies" as a particularly useful thing to want to upgrade. Pyramid itself, of course. And Pyramid and selected dependencies, quite possibly.

When you put it that way, it makes sense why what I was saying is not ideal.

Current eager behaviour can break dependencies

I think any package change has the potential to. The non-eager behavior just reduces the number of changes and thus works around this issue fairly well enough to reduce breakages substantially.

Anyway, I take it that it's decided that eager upgrades would be dropped.

We need to figure this out.

Maybe reuse --force-reinstall? I don't know enough about these options to be sure...

@dstufft I'm waiting for your views on deprecation vs no-deprecation.

pradyunsg on 9 Jun 2016

So, that leaves us with --upgrade and --target only. (and @dstufft's vote)

I request anyone with any issues/requirements, that they feel haven't been handled, to bring them up now. Not that it's the last chance or anything, just a good time to do so.

pradyunsg on 9 Jun 2016

Current eager behaviour can break dependencies
I think any package change has the potential to.

Specifically current eager behaviour can leave the system in a state where declared dependency requirements (which aren't inconsistent, or otherwise broken) are violated when they were not previously. That is not acceptable, and is what a "proper solver" should address. For the simpler "only as needed" upgrades, my understanding is that the risk of such breakage is minimised even without a solver.

So, that leaves us with --upgrade and --target only.

Apart from changing the help text of --target to not refer to --upgrade, I consider --target to be out of scope here. The help text is

Install packages into <dir>. By default this will not replace existing files/folders in <dir>. Use --upgrade to replace existing packages in <dir> with new versions.

I propose we just replace it with

Install packages into <dir>.

Presumably the default will change (as with normal "install") to overwrite by default, and if you don't want to overwrite, you just don't run the install command (same as if you're installing into site-packages). If users want anything more complex, they can work out the appropriate commands, let's not worry about trying to offer suggestions (that may or may not be helpful in practice).

pfmoore on 9 Jun 2016

The help text is

Install packages into <dir>. By default this will not replace existing files/folders in <dir>. Use --upgrade to replace existing packages in <dir> with new versions.

I propose we just replace it with

Install packages into <dir>.

Hmm... Are you sure that you want to remove the functionality of not replacing existing files/folders?

pradyunsg on 9 Jun 2016

Are you sure that you want to remove the functionality of not replacing existing files/folders?

It's not me that was advocating that - @dstufft and @njsmith argued strongly that "install" should upgrade when given an already installed package. The only thing I'm adding is that I don't think the behaviour should be different just because the user specified --target.

Maybe having a --no-replace option is needed, but if so it should apply to both --target and non---target installs.

pfmoore on 9 Jun 2016

Off Topic

At the cost of being picky, a tiny markdown suggestion/request/tip/{whatever_you_want_to_call_it} - Keep an empty > line in block quote to make it dedent... Otherwise it just merges into the higher-level quote...

> > > A
> >
> > B
> C
>
> D

A

B
C

D

Do note how B and C came up on the same level of quoting but D actually got the dedent...

pradyunsg on 9 Jun 2016

Maybe having a --no-replace option is needed, but if so it should apply to both --target and non---target installs.

:+1:

pradyunsg on 9 Jun 2016

At the cost of being picky, a tiny markdown suggestion/request/tip

Thanks. I try to do "preview" but missed that.

pfmoore on 9 Jun 2016

I will be starting my implementation work off master, on Monday. We're nearly decided on almost everything and even if @dstufft says we want deprecation, the new behaviour to be introduced has to be provided anyway.

Here's what I'm going to start implementing:

--upgrade stays but becomes a no-op. Its value is never used anywhere.
- I'm keeping it in --help.
pip install will do upgrades in a non-eager manner, upgrading dependencies only-if-needed.
No eager upgrade options. Current behaviour would become unavailable.
Will implement --no-replace that would not allow installing packages over other packages and move on without errors (like what current pip install pkgA does for pkgB when pkgA is not installed, pkgB is installed and pkgA depends on pkgB).

I think we decided we'll keep --upgrade around for now (for backwards-compatibility) but not about deprecation and _eventual_ removal. Should it be removed using the normal deprecation cycle, starting v9.0 (I think it's alright if we remove it in 10.0/11.0...)?

As an aside, I was thinking, since this change will make the next major version an intentionally-backwards-incompatible release; Would it make sense to try to push for some other issues to be fixed in the same release? If so, are there any such issues?

It would help maximize the utility of our decision to break backwards compatibility.

waiting on @dstufft's comment

edit: Added "on Monday", moved stuff around.

pradyunsg on 10 Jun 2016

Hello.

Quick apologies for the lack of activity over the past week... Some other urgent work came up and took some of my time. Anyway, I have started to work on this issue's implementation.

pradyunsg on 18 Jun 2016

@pradyunsg: I don't understand what --no-replace is for. --target is a weirdo option that almost got deprecated a few months ago, and may or may not survive in the long term, so if it's for --target specifically then it's very low priority and I wouldn't worry about it for now.

njsmith on 21 Jun 2016

Currently --target has a _dependency_ on --upgrade. The current (default) behaviour of --target is to not replace files and folders already in the target-dir. Passing --upgrade changes this to replace files and folders already in the target-dir.

Since install now replaces (read: upgrades) packages by default, it seems to make sense to switch the default behaviour of install --target correspondingly. This would make --upgrade a useless flag for --target, which is what we want (--upgrade becoming a no-op that would eventually be removed). Then, a new option would have to be introduced to retain the current behaviour of --target. This is the --no-replace.

Then, for consistency, if --no-replace works with --target runs, it should also work with non-target ones. AFAICS, the latter is new behaviour.

I guess even if --target doesn't survive very long, it might make sense to have a --no-replace that works regardless of --target. I don't know if someone would want that functionality without --target though.

PS: Apologies for littering so many inline-monospace blocks.

pradyunsg on 21 Jun 2016

I don't think --target (and specifically its current default behaviour) is important enough to warrant adding a new flag just to retain it. IMO, we just switch --target to replace by default, and lose the ability to only add new files (which seems likely to result in broken setups anyway).

Not upgrading an already installed package _is_ a safe operation, but --target doesn't do that, because it doesn't have access to "what is currently installed" information.

pfmoore on 21 Jun 2016

So, change the behaviour of --target to stop bothering about already-present directories and just go about replacing them, printing a message as it does so? Even no message printing?

pradyunsg on 21 Jun 2016

Hmm, wait. Sorry, your description confused me (and I didn't go back to check the docs). Sorry. My above comment was wrong. What I should have said:

Currently --target doesn't replace stuff. That is necessary, because it cannot safely uninstall/upgrade (there's no installed package database with --target and no guarantee that a new version doesn't have a different set of files than the previous version). The current behaviour of --upgrade --target is (AFAICT) unsafe.

So --target should keep its current behaviour. This does make it inconsistent with the new install, but that's fine, it's for a completely different use case. I don't have a problem with --upgrade being removed, and as a result --target loses that capability - it's an unsafe operation anyway.

Given that I disagree with changing the default behaviour of --target, there's no need for a --no-replace flag.

I'm not sure what you mean by --target having a dependency on --upgrade.

pfmoore on 21 Jun 2016

It might help the discussion to read the current help text of --target.

""" Install packages into <dir>. By default this will not replace existing files/folders in <dir>. Use --upgrade to replace existing packages in <dir> with new versions. """

I'm not sure what you mean by --target having a dependency on --upgrade.

To enable replacing existing stuff.

Given that I disagree with changing the default behaviour of --target, there's no need for a --no-replace flag.

If the behaviour of --target is not changed, it would mean --upgrade flag would need to stay at least as long as --target is there.

I want to remove the need for referring to --upgrade in --target's help.

pradyunsg on 21 Jun 2016

OK, let me rephrase. The behaviour of --target should (IMO) be changed in one respect only, that --upgrade (and the behaviour it enables) should be removed.

If someone can demonstrate a use case for --upgrade (given that it potentially breaks things) then I'm willing to review that position, but I don't think it's worth keeping "just in case".

pfmoore on 21 Jun 2016

The behaviour of --target should (IMO) be changed in one respect only, that --upgrade (and the behaviour it enables) should be removed.

Okay. That makes it clear.

If someone can demonstrate a use case for --upgrade

Not me.

pradyunsg on 21 Jun 2016

That sounds fine to me too. It strikes me as a nasty wart that --target used --upgrade for this purpose in the first place.

njsmith on 21 Jun 2016

I think we should move the further discussion over to #3806 to avoid having 2 comment threads with simultaneous discussions on the same thing.

pradyunsg on 22 Jun 2016

Wow this thread has gone critical. Let me just add my strong opposition to changing the meaning of -U. There's absolutely no need to break our users' muscle memory - we can add a new option if we need non-recursive upgrades. That said, what's the use case for non-recursive upgrades _other than_ 'pip install named-thing'?

E.g. I think it's fine to say that explicitly named distributions upgrade implicitly, and -U if provided causes fully recursive upgrading. In all cases without --ignore-dependencies, pip will recursively check for satisfaction.

rbtcollins on 24 Jun 2016


@rbtcollins: "breaking our users muscle memory" seems a bit strong -- WRT -U, the changes in the current proposal would be: (a) pip install -U foo is still legal and still upgrades foo, but now non-recursively, (b) it loses the special behavior where combining -U plus --target means "overwrite any existing files". I'm guessing that the latter change is not one that worries you overmuch given that you recently tried to deprecate --target and that most users don't have muscle memory for -U --target (I hope!!). So I guess you're saying specifically that you prefer that pip install foo do a non-recursive upstall of foo, and that pip install -U foo do a recursive upstall of foo?

I could live with this, especially as a transitional state where we deprecate -U at the same time, but it definitely has downsides:

AFAICT recursive upgrade is never actually what anyone wants? It has very serious downsides, and even in the cases where it sorta makes sense, it seems to be because the code's semantics ("package and its dependencies") sort of accidentally match up with user's semantics ("this package + the related packages that in my head I think of shipping together as a unit"). See my post upthread -- _in lots of cases, it doesn't really make sense to assign some dependency to any particular dependant..._. Maybe users sometimes think "I want to upgrade my Pyramid ecosystem", but even when they do, they don't expect that to upgrade setuptools -- yet that's what recursive upgrade does. So in the long run, recursive upgrade seems like just the wrong thing to me -- maybe there's a valid use case here, but if there is we should figure out how to address that use case directly.
Even if a recursive upgrade switch is a thing that is needed, there's no way that the right long-term solution is for that switch to be spelled --upgrade. Do we really want to have to explain to new users that if they want to upgrade a package, the right way to do that is to _leave off_ the --upgrade switch?
It also makes life rather difficult for people writing documentation if the "best available upgrade command" on pip >= 9 is pip install foo, but on pip < 9 it's pip install -U foo. I don't want to guide people to recursive upgrade, but I don't want to confuse them with pip version numbers either, so what do I put in my tutorial? OTOH if pip >= 9 makes pip install and pip install -U equivalent, then the advice to use pip install -U foo remains correct, just a bit redundant.
Plus supporting these two different modes is complicated and annoying -- pip already has way too many modes that are complicated, poorly documented, hard to maintain, and don't quite do what anyone wants. If we can simplify by getting rid of one of them then that's _great_, and so far the arguments for recursive upgrades have been very thin on the ground. AFAICT no other package manager supports this and no one complains. Even you seem to be basing your argument (so far) on "this thing exists and we shouldn't change that" rather than "this thing is actually useful and what our users want"...

So I'd much rather we move on and make pip install foo / pip install --upgrade foo do the obvious thing that everyone else does. I think most users' muscle memories will actually be pleasantly surprised to start getting what they were hoping for in the first place :-).

njsmith on 25 Jun 2016

@njsmith's comment provides a nice summary of why we're doing this.

I guess I should link to https://gist.github.com/pradyunsg/4c9db6a212239fee69b429c96cdc3d73 from here. This is the final "proposal" I wrote, that came out of this issue's discussion. It's got a section about "Current State of Affairs" that I think @rbtcollins would like to read.

pradyunsg on 25 Jun 2016

@njs - I don't think it's too strong: right now, folk know that to get the latest across the board they run 'pip install -U' X. That's the only reason to run install -U ever (today), and so breaking it is breaking its primary use case.

The behaviour with --target is indeed not the case I'm worried about.

FWIW I disagree with your analysis about what people do/don't want. Most projects only test a small number of permutations of versions: latest-with-latest + latest-with-stable, when a stable exists. Upgrading everything is actually safer than upgrading only the named component because folks' lower version specifiers are usually wrong. See #3188 for an enhancement that would make testing lower version limits much easier. I have lost count of how many times I've 'fixed' folks' problems by telling them to 'pip install -U': they've had a package with an incorrect lower minimum.

The actual underlying thing that drives your 'this is wrong' is #2687 as far as I can tell - that's where pip can do the right thing.

Further, the _very last thing_ we want is for pycrypto and friends to stay un-upgraded for months or years because folk don't know they have to do something special to have up to date secure software.

If folk are running very complex venvs, they are opting into the complexity - the common cases are a) full Python installs and b) dedicated venvs. We should steer everyone to b as much as possible because it's inherently more reliable, and that strengthens the argument I'm making that the default should be to be secure, and running as close to what upstream will have tested as possible.

w.r.t. package managers - 'apt install X' will never upstall - it only installs. 'apt upgrade' is global - it upgrades everything. DNF is similar AIUI. I haven't canvassed suse's tool, but I'd expect similar behaviour because of the flattened there-can-be-only-one idiom distros use.

Perhaps we should make a higher bandwidth discussion for this? It seems to be pointed in a pretty dangerous direction IMO.

@pradyunsg your assertion about pip's current status in https://gist.github.com/pradyunsg/4c9db6a212239fee69b429c96cdc3d73 is factually incorrect: there is already --no-dependencies switch which covers off the recursive/non-recursive case. 'pip install -U foo --no-deps && pip install foo' should be semantically equivalent to the 'upstall named packages by default' - and I'm fine with that.

rbtcollins on 25 Jun 2016

No tl;dr. Read it.

@rbtcollins

your assertion about pip's current status in https://gist.github.com/pradyunsg/4c9db6a212239fee69b429c96cdc3d73 is factually incorrect: there is already --no-dependencies switch which covers off the recursive/non-recursive case

I never asserted that pip does not provide the possibility to do non-eager upgrades or that there is the lack of a --no-deps in the write-up or (in my memory) anywhere else. Which part of my "assertion about pip's current status" do you feel is "factually incorrect"?

Do consider re-reading this section and explicitly pointing out any "factually incorrect" points in a comment on the Gist (not here, it'll be noise) so that I can correct them.

That's the only reason to run install -U ever (today), and so breaking it is breaking its primary use case.

No one's going around breaking the world.

pip install -U pkg will still upgrade pkg which is what most people want and care about. It's what happens with the dependencies that has changed.
The (now no-op) -U/--upgrade will be staying until it's felt it's no longer needed because everyone's moved on.

FWIW I disagree with your analysis about what people do/don't want. Most projects only test a small number of permutations of versions: latest-with-latest + latest-with-stable, when a stable exists.

If the package developer provides poor metadata, it is not _wrong_ behaviour on pip's side that it broke the user's environment because of that. It's the responsibility of the package developer to provide proper version constraints. I do agree that #3188 would help the package developer do so.

I don't think it's wrong to expect people to improve the metadata they provide to PyPI (and hence pip).

the very last thing we want is for pycrypto and friends to stay un-upgraded for months or years because folk don't know they have to do something special to have up to date secure software.

Agreed. I do think that if it's secure software, there should be extra attention given to the security packages. Moreover, any packages that are skipped from upgrades are explicitly listed as such. So, someone looking at the output would know what's happened and determine if they wish to take action.

If you care about a security package, after this change, you can simply mention it directly on the CLI, which makes your intentions more explicit and clear. I prefer it this way. This change would force you to mention which packages you care about being up to date.

"explicit is better than implicit"

If folk are running very complex venvs, they are opting into the complexity - the common cases are a) full Python installs and b) dedicated venvs. We should steer everyone to b as much as possible because it's inherently more reliable, and that strengthens the argument I'm making that the default should be to be secure, and running as close to what upstream will have tested as possible.

I agree that everyone should be using virtual environments more often. I also agree that running as close to upstream as possible is also favourable. I find it ironic that you use the word "secure" to defend a behaviour that silently (and often) breaks the dependency-graph.

'pip install -U foo --no-deps && pip install foo' should be semantically equivalent to the 'upstall named packages by default'

It is. The whole motivation of this PR is to provide pip install -U foo --no-deps && pip install foo as pip install foo because the behaviour that everyone wants most of the time should be directly available. It was discussed and decided that it's better to not provide any way to do eager upgrades.
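For anyone who wants that behaviour with today's pip, the two-step recipe quoted above can be scripted. This is only a sketch: the helper names are hypothetical, and invoking pip as "python -m pip" via sys.executable is my choice rather than something from the discussion. The -U and --no-deps flags are the ones mentioned above.

```python
import subprocess
import sys


def non_eager_upgrade_commands(package, python=sys.executable):
    """Build the two pip invocations that emulate a non-eager upgrade.

    Step 1 upgrades only the named package, ignoring its dependencies.
    Step 2 is a plain install, which installs any dependency that is missing
    or no longer satisfies the new version's requirements.
    """
    return [
        [python, "-m", "pip", "install", "-U", "--no-deps", package],
        [python, "-m", "pip", "install", package],
    ]


def run_non_eager_upgrade(package):
    """Run both steps in order, stopping at the first failure (like `&&`)."""
    for command in non_eager_upgrade_commands(package):
        subprocess.run(command, check=True)
```

Calling run_non_eager_upgrade("foo") would run both commands against the current interpreter's environment; non_eager_upgrade_commands alone just returns the argument lists, so it is safe to inspect.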

The actual underlying thing that drives your 'this is wrong' is #2687 as far as I can tell - that's where pip can do the right thing.

It has been concluded in prior discussions (#59, at pypa-dev) that the behaviour in #2687 is not fixable until #988 lands, which ~~may~~ will take a fair bit of time, and this behaviour is seen as the safer-middle-ground in the mean time.

Today, every time someone runs pip install -U pkg, they risk breaking some other package in their environment. While there will still be the same risk even after this PR, the number of times that pip's actions result in the environment breaking are reduced.

I'm fine with that

You're fine with having it as an opt-in behaviour to do non-eager upgrades. I'm not fine with breaking the user's environment silently, by default. That's what eager upgrades do as I see it, with #2687 unresolved.

It would be better to not be breaking the user's environment silently. This change is the best we can do for that given the limited development time that gets directly invested in pip.

Perhaps we should make a higher bandwidth discussion for this?

That's the idea behind the "shout-out" on distutils-sig.

pradyunsg on 25 Jun 2016

(deleted as posted on wrong thread)

pradyunsg on 25 Jun 2016

Your argument is that less folk will be harmed by not upgrading dependencies that don't /have/ to be changed when someone has supplied -U. My argument is that more people will be harmed:

it changes from secure by default to insecure by default. Users cannot be presumed to be qualified or interested in establishing which packages are security sensitive and which are not. A UX model which depends on users doing that extra work is actively harmful IMO
it changes from being ideal in the recommended cases to conservative in the pathological case and harmful in the recommended case

Given these two choices:
A - users are vulnerable to security issues and don't know they are
B - users get occasionally broken environments due to #2687

I don't see how we can possibly choose anything other than B. It would be wildly irresponsible to do otherwise.

rbtcollins on 25 Jun 2016

Sorry to do bitsy replies, but another thing I observe here is that we're accreting complexity - cargo, for instance, doesn't have anywhere near the fine grained control being proposed here. Largely that's because the language has better primitives for isolation - like JavaScript and Java, Rust can cope with multiple versions of a package in the dependency set - making our resolution work nearly totally irrelevant there _except_ when folk want to collapse down to just-one-version, whereas we have no choice. But I think we need to seek really strong reasons for adding complexity - not just in pip's core, but in the user model. The basic expectation of PyPI is that everyone works together all the time; defaulting to not upgrading is pretty much the opposite of that.

So - are our expectations broken, or are we just reacting to bugs in pip where enough information exists to at least do a better job?

rbtcollins on 25 Jun 2016

are we just reacting to bugs in pip where enough information exists to at least do a better job

This. We can definitely do _much_ better with the information we have. But it's sad that we don't. There are reasons for that as well but that's not the main point of this discussion.

Given these two choices:
A - users are vulnerable to security issues and don't know they are
B - users get occasionally broken environments due to #2687

I think they're equally bad. The thing is, the user can see that their security package wasn't upgraded and they can (and should) opt-in to that upgrade. That's slightly better than the status quo IMO.

pradyunsg on 25 Jun 2016

I don't agree that they are equally bad.

Users in automated environments will not see or review output, so they won't know that a package wasn't upgraded.
Users that do see won't know that a given package is security sensitive.

Your proposed change moves pip from a do-the-safe-thing model to a review-every-invocation-carefully-because-it-may-have-silently-done-the-wrong-thing. I would expect that to be incredibly worrying for a system administrator [which I have been, so this isn't idle speculation]. We don't however have richly defined personas to point at to allow this to be easily internalised by new contributors to pip.

However, I'm at risk of burning out in this conversation pattern, so I'm signing off and muting the issue; my offer for higher bandwidth conversation - which the sig list is not - remains open, if that's useful - @njsmith and @dstufft can get hold of me on hangouts or IRC or whatever realtime medium is desired.

rbtcollins on 25 Jun 2016

I'm still waiting on @pfmoore or @njsmith giving me the go-ahead that this PR is fine
and we can announce the same on distutils-sig for comments from a larger
audience.

Sorry, I hadn't realised you were holding off. Yes, please go ahead and
announce. The PR will obviously need to be updated as those discussions
progress, but it's certainly fine as a starting point for that process.

pfmoore on 25 Jun 2016

@rbtcollins I didn't wish to be overly pushy. Sorry about that.

higher bandwidth conversation

I didn't interpret this as real-time conversation. :sweat:

do-the-safe-thing model to a review-every-invocation-carefully-because-it-may-have-silently-done-the-wrong-thing

I do understand what you are trying to say about the possible security-related implications of this change. I don't have an especially strong grasp of that topic and the associated nuances for a proper discussion on it.

I'll be happy to defer any further discussion to pip's core developers.

Now, I think that both of the behaviours (pip's current and the one that's been proposed) are not-ideal. I'll change my position to not being in favour of either of those. I'll write that mail to distutils-sig.

pradyunsg on 25 Jun 2016

Further, the _very last thing_ we want is for pycrypto and friends to
stay un-upgraded for months or years because folk don't know they have to
do something special to have up to date secure software.

If you're saying that a user doing pip install --upgrade django
(where django depends on pycrypto, say) currently gets pycrypto upgraded,
and in future won't, then yes, that is a change (and agreed, it could
result in pycrypto being an older version).

There have been rather too many divisive security discussions recently, and
I don't want to be the cause of another one starting. So all I will say
here is that there's a trade-off between keeping crypto software up to date
versus breaking package X by upgrading a crypto (or other!) dependency to a
version that X cannot use as part of an unrelated pip install Y. My
personal preference is for the approach this PR takes (but see below), but
I acknowledge the problem. One question - what do Linux package managers
do? Would apt-get install python-django upgrade an already-installed
and compatible python-pycrypto? Following established Linux package
manager approaches would fit well with the basis of the rest of this PR.

Note that "keep my stuff up to date" (the equivalent of apt-get upgrade, I guess) is what pip upgrade-all was intended for (see #59). That's not part of this proposal. My preferred response to your
concern above would be to say "the correct way to keep security packages up
to date is pip upgrade-all" - security upgrades actually _shouldn't_
wait until some other package gets updated. (It's just as true to say that
they shouldn't wait till the user updates his whole system, so the "real"
answer is pip install [-U] pycrypto but I think we can agree that
user inertia being what it is, that's not always realistic). Maybe the
above is an argument in favour of that subcommand? But I'm not sure if it
can do an acceptable job without a proper dependency resolver.

Note also that in an ideal world, pip would never break anything by doing
an upgrade. Sadly, until we get a full dependency solver, that's not the
case. We've had reported cases of such breakage, I believe. Breakage should
be rare - but let's be fair, so should security exploits. We're discussing
low-probability (but potentially high-impact) scenarios here, and it's
never easy to judge those (if it were, nobody would ever buy lottery
tickets :-))

pfmoore on 25 Jun 2016

I don't have an especially strong grasp of that topic and the associated nuances for a proper discussion on it.

Me neither. I think @rbtcollins has brought up an important point. Ping @dstufft as the one person I know of with sufficient understanding of both pip and security to make an informed decision here...

pfmoore on 25 Jun 2016

@pfmoore yes, thats what I'm saying, and I'm staring down the barrel of millions of environments no longer receiving upgrades to such libraries with horror.

I agree that having a command to do an environment wide upgrade would help.

w.r.t. resolver aspect - breakage will always happen even with a resolver: the resolver is not the cause of most breakage I see, rather accidental bugs are. Yes we need it, but it's not a panacea.

rbtcollins on 26 Jun 2016

@pradyunsg No need to apologise - you've done nothing wrong; I engaged while tired and got stressed at the idea of something I consider a poor choice being pushed into master if I didn't immediately get traction on it.

rbtcollins on 26 Jun 2016

I'm staring down the barrel of millions of environments no longer
receiving upgrades to such libraries with horror.

This would be something separate from the current change, and I haven't
thought it through at all, but I wonder if there's a need here for packages
(or maybe individual releases) to be marked as "critical", implying that
pip should always try to update those packages when updating anything that
depends on them - essentially a finer-grained, opt-in version of eager
updates. Projects like pycrypto could then mark themselves as critical.

Such a mechanism may be open to abuse, but would this be of any help?

pfmoore on 26 Jun 2016

Such a mechanism may be open to abuse, but would this be of any help?

Please no. It's a bad idea to have something like this in a largely un-moderated index like PyPI. Other than the obvious possibility of abuse, it's an extra behaviour that the user will potentially be surprised by. I think there are just better ways to handle such a problem; it's better to delegate this to the end-users to decide what they feel is critical.

@rbtcollins I felt I was the reason you were feeling like you'll be burning out.

I think @dstufft already has enough things on his plate. So, FWIW, I'll put forth what I think about the security front of non-eager upgrade as the default. If nothing else, I'll learn something new.

I don't think anyone runs pip install --upgrade in a production environment without pinned requirements/constraints. I could be wrong about this, but really, they shouldn't. If they do that, they're opening themselves to breakage of their production environment already, as of today. Post this change, they still have that risk (reduced but present) and additionally have the risk of not having the latest security upgrades. Really, by not pinning their dependencies, I'd say they opted into a security vulnerability. Would you agree with this?

To me, the only people who run pip install --upgrade are those doing so on their local-machines, either during development of an application or as the end users of a library. How much this affects them, I won't know. I'm by no means an informed person on this topic anyway.

Honestly, the more I think about this, the more I feel like sitting down and writing a SAT solver in pure Python for pip.

pradyunsg on 26 Jun 2016

Ok whew, I ignore this discussion for a few days and it appears to have blown up on both the sig and here :)

I've tried to read over what's gone on in this thread, but well it's long and information dense so I might miss something, however you're about to get a wall of words.

I don't believe it's completely fair to say that the current behavior of pip install -U <foo> makes people more secure across the board. Yes, I can easily point out some projects like PyCrypto or cryptography where regressions are rare (particularly security regressions) and new releases generally include improvements to security. I think focusing only on those misses other cases though, such as the case where a new version of something on PyPI has caused a regression in security. There are, I think, two other cases though, both of which boil down to "upgrading or not doesn't affect security at all" but differ in whether an upgrade is an OK thing to do for them or not.

Overall, I don't think that recursive upgrades are a good security mechanism, and if one of our goals is to prevent people from running old, insecure versions of software (and I think it should be a goal) then I think the way to achieve that is not to try and hang ourselves off of the behavior of upgrades (and just pray and hope that they've happened to run an update in some amount of time) but to instead devote time to a dedicated solution to the problem. This may be something like pip list -o but which checks explicitly for security problems, it may be checking the entire installed set of packages against PyPI to see if there are any known security issues with any of them, or it may take on some completely other format. However, I think it's important that this isn't tied to some semi related functionality and that it actually covers the entire environment and not just whatever the user happens to be using. If someone does pip install requests[security] once, and then from there on out does pip install -U requests, we're going to completely miss updates to pyopenssl and cryptography and such if we only rely on recursive upgrades.
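To make the "check the entire installed set" idea concrete, here is a minimal sketch in Python. It is not an existing pip feature: the function names and the advisory mapping are hypothetical, and the package name and version in the usage note are invented for illustration. The only real API used is importlib.metadata from the standard library.

```python
from importlib import metadata


def installed_versions():
    """Map every distribution in the current environment to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            versions[name.lower()] = dist.version
    return versions


def flagged(installed, advisories):
    """Return sorted names whose installed version is listed as insecure.

    `advisories` is a hypothetical mapping {name: set_of_insecure_versions};
    where such data would come from is left open, as it is in the discussion.
    """
    return sorted(
        name
        for name, version in installed.items()
        if version in advisories.get(name, ())
    )
```

Calling flagged(installed_versions(), {"examplecrypto": {"1.0"}}) would report examplecrypto only if exactly that version is installed. The point is that the check walks the whole environment rather than just the dependencies of whatever package happens to be upgraded.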

So pushing aside the security concerns for a moment, I think we need to take a look at what behaviors are most likely to give people what they want. Unfortunately with an ecosystem as large and with as varying use cases as Python I suspect there is no singular answer to "what people want". There are a few interrelated behaviors being discussed here, so let's tackle them one at a time.

For pip install --upgrade <foo>, we have evidence of the fact that our current behavior is actively harmful, so much to the point that projects are going out of their way to lie to pip to prevent triggering that behavior. I think that we can all agree that something which people feel the need to actively subvert (not only in their own projects, but also advocate to other projects) likely needs some refinements to how it actually works. In this case the only real solution is to attempt to avoid upgrading (or downgrading!) where possible, and to prefer the already installed version, _unless_ the user has explicitly asked for that to be changed _or_ we can't satisfy the version constraints otherwise. I can't see any other reasonable way to implement this that isn't going to accidentally trigger 30+ minute builds which possibly result in a version that is less suitable for the task at hand (not using an optimized BLAS or something).
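The difference between the current eager behavior and the "prefer the already installed version" rule can be shown with a toy model. This is only a sketch under simplifying assumptions: versions are tuples, each requirement is just a minimum version, the index contents are made up, and none of this is pip's actual resolver code.

```python
# Hypothetical index: {package: {version: [(dependency, minimum_version), ...]}}
INDEX = {
    "foo": {(1, 0): [("numpy", (1, 8))], (2, 0): [("numpy", (1, 8))]},
    "numpy": {(1, 8): [], (1, 9): [], (1, 11): []},
}


def latest(name):
    return max(INDEX[name])


def plan_upgrade(target, installed, eager):
    """Return {name: new_version} for upgrading `target`.

    eager=True models the recursive behavior: every dependency goes to latest.
    eager=False keeps an installed dependency unless it fails the minimum.
    The toy index has no dependency cycles, so the loop terminates.
    """
    changes = {}
    todo = [(target, None, True)]  # (name, minimum, explicitly_requested)
    while todo:
        name, minimum, requested = todo.pop()
        current = changes.get(name, installed.get(name))
        satisfied = current is not None and (minimum is None or current >= minimum)
        if requested or eager or not satisfied:
            chosen = latest(name)
        else:
            chosen = current  # leave the working install alone
        if chosen != installed.get(name):
            changes[name] = chosen
        for dep_name, dep_min in INDEX[name][chosen]:
            todo.append((dep_name, dep_min, False))
    return changes
```

With foo 1.0 and numpy 1.9 installed, plan_upgrade("foo", ..., eager=True) also replaces numpy, while eager=False touches only foo. A numpy older than the stated minimum is still upgraded in both modes.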

Advocating for leaving the current behavior of pip install --upgrade as it is, is essentially advocating against projects that depend on numpy being honest about whether or not they depend on numpy. If someone has another suggestion for how we might solve the numpy problem [1] then I think they should bring it up.

I know that one suggestion has been adding a --non-recursive-upgrade flag or a sort of --upgrade-strategy flag, but I think that these ideas largely serve only to complicate the mental model people have of pip. For the vast majority of packages (particularly pure python ones that don't have a security sensitive role) it's not going to matter a whole lot whether we upgrade them or not, upgrading is low cost but there's little downside to keeping them pinned to the installed version (unless the person finds a reason, a feature or a bug to explicitly upgrade _that_ package). However we're living in the edge cases here and I only really see the two that matter, hard-to-upgrade packages and security sensitive packages, and like I mentioned above I think that if we're going to be worrying about ensuring folks get security upgrades we need a real mechanism for that, not a half hearted hope that an important upgrade got caught in a recursive upgrade at some point. Given that I think we need dedicated support for security sensitive packages, for most packages this won't matter, and for the hard-to-upgrade case there's only really one answer, I think that shunting this behind its own flag is a bad idea without justification for the ongoing cost of maintaining a whole option for this [2]. I also think that the more conservative approach has to be the most obvious approach or we don't really solve things for the hard-to-upgrade crowd, so if we did add a new option, we'd still want to change the behavior of --upgrade by default and add in some explicit option to get the less conservative/safe approach to upgrades.

I don't believe we should allow a package to be able to mark itself for eager (or non eager) upgrades. I think this is something that needs to be consistent amongst all packages for end users to be able to have any hope of having a reasonable mental model of what pip is going to do to their system when they execute some command. That being said, I could see us adding the ability to have people mark versions insecure on PyPI and warn people if they have an insecure version installed on their system (or at the very least, if they're about to install one).

So, now that we've covered --upgrade, the other inter-related issue here is what should we do with pip install versus pip install --upgrade. Personally I think that we should make the two mean the same thing, _HOWEVER_ it might be more reasonable to focus the discussion first on just the behavior of --upgrade and leave pip install alone for now. Once we get the solution to upgrading sorted out we can tackle what we'll do about pip install itself.

[1] Although to be completely honest it's not strictly related to Numpy. I've seen people break their systems or their installs time and time again because of an inadvertent upgrade. It's true that a lot of projects don't have correct lower bounds, but I believe it's equally true (or more so) that they don't have correct upper bounds either. One particularly important thing is that it's possible to determine what the correct lower bounds are at time of packaging, but it's impossible to determine what the correct upper bounds are.

[2] Options are not free, they incur a cost and it's important to attempt to reduce the number of them you have as much as you can. While you typically can't reduce them to zero, a pattern I see far too much in OSS software is the desire to please everyone by adding more and more options, when really it's just a mechanism for avoiding making a decision that may be unpopular with some group of people.

dstufft on 26 Jun 2016

Thanks for the wall of words :). A few responses and some thoughts.

tl;dr: I agree with you about the costs of options and mental models around pip and so forth. I'm still very scared of the implications of what you're proposing.

w.r.t. updating being more secure: we can be pretty confident that if one never upgrades, existing security bugs will eventually be attackable. OTOH if one always upgrades, while there may be occasional security-bugs introduced, they will be removed again in later upgrades.

The key thing there in _either_ case is having a systemic, automated upgrade process taking place, which we don't have today. In the absence of it, I do believe that upgrading-by-default is significantly better.

w.r.t. upper bounds and lower bounds: there's absolutely no facility to get lower bounds right today. You're correct that in a logical sense, one can only state lower bounds accurately, but the reality is that no one states them accurately today - it's entirely responsive to bugs from people where they find out that the lower bound is wrong after debugging it. And there isn't even consensus amongst the folk I've spoken to about what _should_ be done when lower bounds interact with optional things - should folk detect features, or raise the lower bound, or just crash if some incompatible path is taken? If pip had the select-oldest-version thing I proposed, then this wouldn't be the case, and it would be _much_ more reasonable to assume lower bounds would be sane.

But I've _literally lost count_ of the number of broken environments I've fixed for people by telling them 'pip install -U package'.

I don't really understand the 'numpy problem'? Is that the collection of projects lying to pip about dependencies to avoid upgrades?

pip gets used in 3 different contexts IME:

maintaining production environments
building fresh environments in a reproducible way - e.g. container builds
testing

For the latter two, I don't care about the upgrade algorithm, as long as it's deterministic.
For the first one, I care a lot, and it sounds like what we've got here is one project causing the majority of the pain, due to a combination of API and ABI breaks - because numpy actually has one of those?

IF a production-maintenance-thing existed, to do upgrades of everything, then I wouldn't push back on the proposed change to install at all. But it doesn't, and we have no idea how long until one will exist - AIUI the command that might have done it was pushed back on in fact, so we should expect to _not_ have one?

rbtcollins on 27 Jun 2016

I don't really understand the 'numpy problem'? Is that the collection of projects lying to pip about dependencies to avoid upgrades?

Yes. Libraries like scipy and scikit-learn lie to pip that they don't depend on numpy so that pip doesn't start a half-hour long reinstall of newer numpy over a possibly optimized or even self-compiled numpy.

pradyunsg on 27 Jun 2016

IF a production-maintenance-thing existed, to do upgrades of everything, then I wouldn't push back on the proposed change to install at all. But it doesn't, and we have no idea how long until one will exist - AIUI the command that might have done it was pushed back on in fact, so we should expect to not have one?

I don't think anyone has pushed back on this, except that people are waiting for the resolver to get finished first because there's a perception that without a resolver, upgrade-all will have an increased tendency to result in inconsistent environments. I dunno, maybe we should just go ahead and implement an upgrade-all command even knowing it will be imperfect to start with... the perfect is the enemy of the good and all that. Maybe we could have it try to upgrade everything, and then automatically run the new pip check code to warn people if stuff broke that they might need to fix.
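A rough sketch of "try to upgrade everything, then run pip check" could look like the following. The function names are hypothetical and this is not a proposed implementation of upgrade-all. It only drives pip subcommands and options that exist in current pip (pip list --outdated --format=json, pip install --upgrade, pip check) through subprocess.

```python
import json
import subprocess
import sys


def outdated_names(pip_list_json):
    """Extract package names from `pip list --outdated --format=json` output."""
    return [entry["name"] for entry in json.loads(pip_list_json)]


def upgrade_all_then_check():
    """Upgrade every outdated package, then report whether the environment is consistent."""
    pip = [sys.executable, "-m", "pip"]
    listing = subprocess.run(
        pip + ["list", "--outdated", "--format=json"],
        capture_output=True,
        text=True,
        check=True,
    )
    names = outdated_names(listing.stdout)
    if names:
        # Imperfect by design: without a resolver this can leave conflicts behind.
        subprocess.run(pip + ["install", "--upgrade"] + names, check=False)
    # `pip check` exits non-zero when installed packages have missing or
    # incompatible requirements, which is the warning step suggested above.
    return subprocess.run(pip + ["check"]).returncode == 0
```

Running the check unconditionally, even when nothing was upgraded, means an environment that was already broken gets reported as well.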

the 'numpy problem'

Yeah, this is partly that packages like numpy are expensive to install or tend to just error out (think: Windows users without a compiler). For numpy itself this problem is greatly reduced now that we have better wheel support, but there are lots of packages besides numpy that need a compiler. It's extraordinarily frustrating when you're just trying to upgrade some trivial pure-python package and then suddenly Unable to find vcvarsall.bat.

And it's partly that people have finicky preferences about things like numpy. For example, it's common to mix numpy-with-proprietary-MKL-patches installed from Anaconda, while using pip for other packages that Anaconda doesn't ship. And then pip install -U other-package might throw away the Anaconda numpy and replace it with a PyPI numpy, which both gives you a numpy build you don't want + totally breaks your conda environment going forward b/c this core package just got deleted out from under conda.

And it's partly just that numpy is a canonical example of a package that a _lot_ of other packages depend on in complicated ways, so there's a great deal of risk that trying to upgrade package A -> triggers upgrade of numpy -> breaks package B or C or D or ... Recursive upgrades create surprising couplings between different packages.

njsmith on 27 Jun 2016

It's extraordinarily frustrating when you're just trying to upgrade some
trivial pure-python package and then suddenly Unable to find vcvarsall.bat.

Precisely this. You can work around this with --no-deps, or
--only-binary, but it makes what should be a really simple activity
into something really annoying. In my experience, the only way to maintain
an environment is to carefully set up the hard-to-install packages
manually, maintain these by hand, and then don't let pip upgrade them
automatically. That last part isn't easily possible with the current
behaviour of --upgrade.

For a non-numpy example, lxml doesn't build easily on Windows,
doesn't supply wheels, and is a dependency of a lot of things. A new
release of lxml (which I probably don't want to bother upgrading to,
as it needs me to manually download Christoph Gohlke's wheel when he builds
it, and install that by hand) can cause upgrades of all sorts of stuff to
fail. Non-eager upgrades would fix this issue for me.

pfmoore on 27 Jun 2016

and then don't let pip upgrade them automatically.

I suppose the proper solution for that would be pinning versions, which https://github.com/pypa/pip/issues/654 tracks, and then emitting a warning if a pinned version causes a dep requirement not to be fulfilled. That will allow users to _really_ manage a package manually.

FichteFoll on 27 Jun 2016

For a production environment probably. But for an environment like my laptop, or a development virtualenv I'm using for testing, pinning versions overspecifies the problem (and requires me to manage at the version level). What I actually want is exactly what I stated - "don't upgrade unless I ask you to".

pfmoore on 27 Jun 2016

pip gets used in 3 different contexts IME:

maintaining production environments

building fresh environments in a reproducible way - e.g. container builds

testing

@rbtcollins I'm not clear on whether your definition of "production environment" includes everything from "new user who installed Python + some packages" to "advanced users with multiple venvs" to "production deployments running apps/websites"? Your comments about security seem like you're mostly worried about the last category, but imho that is the least interesting one because it caters to the most knowledgeable users. Defaults should be chosen for non-expert users, the ones who installed Python on their laptop/desktop and want to get their analysis done or website to work.

rgommers on 27 Jun 2016

I would like to revive this issue and the related discussion(s?) again.

@rbtcollins Have your concerns been addressed? If not, please point out any outstanding concerns you have.

pradyunsg on 21 Jul 2016

There were at least 2 people on the distutils-sig discussion who were against the whole idea of pip install doing an upgrade. I remain willing to accept the community consensus but uncomfortable with the idea on a purely personal level. It actually doesn't feel to me as if we're close to a consensus that a bare pip install foo should upgrade an already-installed foo.

I'm not sure there _is_ a consensus to be had here. More like two use cases that need different behaviours. Or maybe three:

Automated script (or user who doesn't want to manually check) that wants to "ensure that foo is available" but doesn't want to make unnecessary changes to the system. Idempotent install.
User who wants the latest version of foo. Install or upgrade.
Automated script that wants to ensure foo (if present) is up to date. Upgrade without install.
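The three use cases above differ only in what happens for each combination of "is foo installed?" and "is a newer foo available?". A small decision table in Python makes that explicit. The Mode names and the decide function are made up for illustration and do not correspond to pip options, and versions are modelled as plain tuples.

```python
from enum import Enum


class Mode(Enum):
    ENSURE_INSTALLED = 1    # case 1: idempotent install
    INSTALL_OR_UPGRADE = 2  # case 2: user wants the latest version
    UPGRADE_ONLY = 3        # case 3: upgrade without install


def decide(mode, installed, latest):
    """Return "install", "upgrade" or "nothing".

    `installed` is the installed version or None; `latest` is the newest
    available version.
    """
    if installed is None:
        # Only case 3 refuses to add a package that is not already present.
        return "nothing" if mode is Mode.UPGRADE_ONLY else "install"
    if mode is Mode.ENSURE_INSTALLED or installed >= latest:
        return "nothing"
    return "upgrade"
```

For example, decide(Mode.ENSURE_INSTALLED, (1, 0), (2, 0)) gives "nothing", while the same versions under Mode.INSTALL_OR_UPGRADE or Mode.UPGRADE_ONLY give "upgrade".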

All of these seem to be valid use cases. All can be implemented in terms of the others with sufficient manual checks or additional scripting. We currently have (1) and (2) via install and install -U (at least in the simple case - I'm deliberately ignoring recursive upgrading for now).

There are some people arguing that the default behaviour of install causes users to make mistakes because they expect or need behaviour (2) and don't understand that they are getting (1). Maybe that's so - my experience is too limited to say they are wrong. Our defaults may be wrong.

Looking at the practicalities (of the fundamental question "should install upgrade by default?"):

There's no consensus over how a bare install should treat an already present requirement. But always requiring an option is a bad UI, so we need to decide.
Doing nothing if the requirement is present minimises the chance of an unexpected change being made to the user's system. And note that rolling back such an unexpected change is non-trivial.
Doing nothing if the requirement is present is the current behaviour, so it has the benefit of not breaking users' scripts or existing documentation.
The typical English use of "install" doesn't include the sense of "upgrade". If you replace something with a later version, you'd typically say that you "upgraded" it, not that you "installed" it.

The arguments in favour of changing seem to be:

It encourages people to think they are up to date while running out of date software (bad because of bugs, and possibly security holes). Surely that's just a case of people not understanding what the command they run is doing?
It misses an opportunity to keep people up to date. But is it the install command's role to do that? Wouldn't an "upgrade all my stuff" command be better for this? The argument here seems to be that we should protect people from themselves, and do upgrades they didn't explicitly request.
The current behaviour breaks systems. It's hard to respond to this without specifics. If pip install foo as the way to get foo breaks a user's system, that's a bug - and a bad one. But it seems more likely to be a bug in the dependency management code, than because pip install foo didn't upgrade an already installed foo. Or a documentation bug (confusing "how to install for the first time" with "how to upgrade from a previous version"), but that's not "broken", just confusing.
It matches the behaviour of other systems. The debate on distutils-sig seems to indicate that this is far from being as clear-cut as it sounded when first stated.

IMO, we _must_ reach a conclusion over how a simple install of a package with no dependencies works before we start debating the more complex cases. That's the majority use case. Packages with dependencies, conflict resolution, recursive upgrades, should all be considered only once we have a solid and agreed foundation of how a simple package install works.

Personal view - there is nothing wrong with the traditional install and install --upgrade options. They seem clear and natural to me (reiterating: in the simple cases). There's no "upgrade but only if already present" option, but the natural place for that would be a new upgrade command, and I don't think the need is high enough to warrant a new subcommand, so I'm OK with having to manually handle that case.

pfmoore on 21 Jul 2016

It is wrong to change the behavior of 'pip install'. There is nothing wrong with install meaning "ensure it is installed", and there is nothing wrong with install meaning "replace the named package with the latest version", and the developers are smart enough to convince themselves that either behavior is more intuitive. But it is wrong to steal the afternoons of thousands of developers who rely on the current behavior, and who will have suddenly broken environments the next time they are foolish enough to upgrade pip.

I didn't like the proposal about changing how the directly-named dependency was installed, but I did like the rest of the proposed changes regarding recursive-ness of package upgrades that do happen and so on.

dholth on 21 Jul 2016

Just today I had someone ask me for help because they were confused why pip install --pre docker-py did not install 1.9.0rc2, but pip install docker-py==1.9.0rc2 did. They believed it to be a bug in pip, until it was figured out that the reasoning for that is they already had a previous version of docker-py installed and that was being used.

This matches my own use, I never invoke pip without a -U except out of laziness to type that extra couple of characters, and when I do omit it, half the time I get annoyed and end up needing to re-run the command with a -U on there.

@dholth says it's wrong to steal the afternoons of thousands of developers relying on the current behavior, but what of the afternoons of thousands of developers being bitten by the current behavior? As always, any breakage is always a weighing of the cost of breakage against the benefits. It makes the behavior of pip less surprising by default, you don't have to inspect the environment to figure out what the outcome of some command is going to be, you only need to know the command.

dstufft on 21 Jul 2016

I think whatever the chosen solution is, we'll have to provide an option to enable the old behavior to smooth the transition.
If we go with an upgrading pip install foo we'll need a --no-upgrade.
If we remove the recursive behavior of upgrade we'll need a --recursive.

xavfernandez on 21 Jul 2016

I agree with a --no-upgrade flag for sure (and keep the --upgrade flag to enable it to be turned back to upgrade if someone has disabled it). I'm not sure about the --recursive flag long term, but I'm not dead set against it.

dstufft on 21 Jul 2016

I think I would experience major breakage by this change, but principally in non-interactive pip invocations, while 'pip install -U' is something that would typically happen interactively, and crucially when someone is doing development work and is available to deal with the consequences. That's why I jokingly suggested we could check isatty() to choose between one behavior or the other. But is there a way to measure either amount of time or is it just a circular volley of opinions? My opinion is that as an experienced person I do have to re-type install with -U, but it is quick, while fixing a virtualenv when I was least expecting it is hundreds of times slower.
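For illustration only, the isatty() idea could be sketched roughly like this. `default_upgrade` is a made-up helper name, not anything that exists in pip, and the sketch assumes that "interactive" simply means "stdin is a terminal".

```python
import sys


def default_upgrade(stream=None):
    """Hypothetical helper: pick the default for a plain `install`.

    Interactive (TTY) callers would get upgrade-by-default, while
    scripts and CI jobs would keep the current "ensure it is installed"
    behavior. This is an assumption for illustration, not pip's logic.
    """
    stream = sys.stdin if stream is None else stream
    return bool(stream.isatty())
```

The obvious drawback is that the same command line would then behave differently in a terminal and in a cron job, which is the kind of environment-dependent outcome the rest of this thread is trying to avoid.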

Another solution that has already been discussed is to give the new behavior a new name (an easier name to remember than pip install -U?), and educate people on the new best-er practice; if the n00bs who in theory have the most trouble are reading the new documentation and using the new name, problem solved.

While we're on the subject, where is the 'pip rollback' command? Before and after each invocation pip should store the versions and perhaps the wheels of every installed package in a log along with a timestamp. Then if there is a problem you can just go backwards, no fuss.
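A rough sketch of what such a log might record, assuming we only track package names and versions (no wheels). pip has no rollback command; `snapshot` and `rollback_plan` are hypothetical names, and the "log" here is just an in-memory list of JSON lines.

```python
import json
import time


def snapshot(installed, log, now=None):
    """Append a timestamped record of {name: version} to the log (a list)."""
    record = {
        "time": time.time() if now is None else now,
        "packages": dict(installed),
    }
    log.append(json.dumps(record, sort_keys=True))


def rollback_plan(log, current):
    """Compare the current environment with the most recent snapshot.

    Returns (pins, removals): requirement strings that would restore the
    recorded versions, and names that were not present in the snapshot.
    """
    previous = json.loads(log[-1])["packages"]
    pins = [
        f"{name}=={version}"
        for name, version in sorted(previous.items())
        if current.get(name) != version
    ]
    removals = sorted(set(current) - set(previous))
    return pins, removals
```

Usage would be something like calling `snapshot(...)` before an install, then feeding the pins from `rollback_plan(...)` back into an ordinary pinned install if something breaks.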

Yes, I'm also aware that some set of current best practices, which are more work, could also solve some of these same problems, but one person's best practices are just another person's unnecessary extra work.

dholth on 21 Jul 2016

The fact that the breakage will be principally in non-interactive pip invocations is actually a good point, but I think it's more an argument that we should do it. The primary place where the current behavior makes sense is when scripting with pip, and when you're scripting, adding an extra flag to the command is no great burden; however, when you're running pip interactively the default option should be the one that you're most likely to want.

If you want a rollback command I suggest another issue for it.

dstufft on 21 Jul 2016

Just in terms of UI bikeshedding, I still like the idea of the idempotent/scripting-oriented behavior getting a new verb, like pip require numpy -- to me that does a good job of capturing the conceptual difference (while pip's thicket of flags is super-confusing and their interaction hard to predict), and when scripting IME it's easier to remember to use the verb that means what you want than it is to remember to consistently pass some extra flag every time.

But I think the verb that we teach users first (which is install) should be the verb whose defaults are oriented towards new user needs, meaning interactive use, interpreting pip install django as meaning pip install django==$LATEST, etc.

njsmith on 21 Jul 2016

But is there a way to measure either amount of time or is it just a circular volley of opinions?

This is precisely my point. I don't think there are any compelling (as in, likely to convince the other camp) arguments for either side. And in that case, the status quo wins. My biggest concern here is that we don't (as a project) have a good means of arbitrating this type of situation, and we end up with this hovering over us forever, because there's always the possibility that someone could commit a PR, simply because those who objected the previous time didn't notice a discussion being reopened. What we need is some sort of equivalent of Python's rejected PEPs, which would allow us to say "we've decided (for the following reasons) to do nothing" and then be able to shortcut the process of someone asking to revisit the decision and having to go through all the old fruitless arguments again.

I'd rather find a way to make the current non-default behaviour more easily accessible for people that need it, than waste time in arguments that will simply result in both sides becoming more and more entrenched in their positions. Although I don't really know how to do that - I really don't understand what's so confusing about "install installs, install --upgrade installs but also upgrades if needed".

But I think the verb that we teach users first (which is install) should be the verb whose defaults are oriented towards new user needs, meaning interactive use, interpreting pip install django as meaning pip install django==$LATEST, etc.

Well, while I see your point that we should view interactive use (by new users) as the prime use case, I'm not convinced that implies "install or upgrade". I'd argue that the failure mode of an implied upgrade (you upgrade an existing install without meaning to, and break another part of your system by doing so) is sufficiently bad (even for an experienced user) that it warrants a flag to say "I understand the consequences".

Project instructions saying "use pip install FOO and you're good to go" can (and should) be changed. We shouldn't be driving a decision like this based on other people's erroneous documentation, no matter how much of it there is. The wording should just be "If you don't already have FOO, use pip install FOO and you're good to go. If you have FOO already but want the new version, use pip install --upgrade FOO".

I know there's anecdotal evidence of people spending lots of time trying to work out what went wrong because they didn't include --upgrade. But how are we supposed to evaluate that, given that it's (by definition) impossible to get evidence of how many people have _not_ had any issue with the current behaviour? Make a change and wait for bug reports from people saying "I did pip install foo and it upgraded my existing foo, which broke bar - how do I unpick this mess?" Personally, I don't want to have to support people in that situation...

pfmoore on 22 Jul 2016

Mercurial measures by getting usage stats from Facebook, they have a special corporate plugin to record them.

dholth on 22 Jul 2016

I really don't understand what's so confusing about "install installs, install --upgrade installs but also upgrades if needed".

It's not really that it's confusing at a high level, but that the default behavior requires you to know what's already installed on your system in order to figure out what the outcome of the command is going to be. It's easy to not realize, particularly with virtual environments, what exactly you have installed and assume that you don't have something installed (and then get confused when you're not getting the version you expect).

I have, at any one time, something like 50-100 different virtual environments on my personal computer, one for each project I work on. It's basically impossible for me to know what's installed into a particular environment without sitting there and hitting pip list and then going through the entire list which takes way more time than I'm ever going to do.

We shouldn't be driving a decision like this based on other people's erroneous documentation, no matter how much of it there is.

I don't think this is entirely true. Neither option is objectively correct so we're left trying to figure out a subjective answer of what is better, and looking at what mistakes other people made in their documentation is not a bad source of information. To use an extreme example, if literally everyone was doing it the wrong way, then that would suggest that the wrong way is too obvious and the right way isn't obvious enough.

But how are we supposed to evaluate that, given that it's (by definition) impossible to get evidence of how many people have not had any issue with the current behaviour?

Metrics in OSS is a problem :( At some point it'd be great if we can get some so we can see things like "this person ran install and then nothing else" compared to "this person ran install, then almost immediately re-ran it with --upgrade". Unfortunately that's still in the "gee it'd be nice" phase and not anywhere near being done so we're left with throwing chicken bones and trying to divine reality from imagination.

dstufft on 22 Jul 2016

the default behavior requires you to know what's already installed on your system

While I appreciate that this might be an issue, I'm not really convinced it's that major of a problem. After all, if you do pip install and the package is already present, you get immediate feedback:

(x) C:\Work\Scratch>pip install wheel
Requirement already satisfied (use --upgrade to upgrade): wheel in c:\work\scratch\x\lib\site-packages

So it's not like it's going to take you forever to find out that you need to upgrade, or how to do so.

I'd rather have a safe default with the system able to detect that you may have meant the alternative (plus a clear message telling you what to do) over a default with the potential to break unrelated stuff, and no recovery mechanism.

The more we have this discussion the less I understand the advantage of upgrade as default.

pfmoore on 22 Jul 2016

After all, if you do pip install and the package is already present, you get immediate feedback.

Sort of, though it's easy for that to get drowned out in all of the other output with even a moderate amount of packages being installed:

$ pip install Pyramid
Collecting Pyramid
  Using cached pyramid-1.7-py2.py3-none-any.whl
Collecting WebOb>=1.3.1 (from Pyramid)
  Using cached WebOb-1.6.1-py2.py3-none-any.whl
Collecting translationstring>=0.4 (from Pyramid)
  Using cached translationstring-1.3-py2.py3-none-any.whl
Collecting zope.deprecation>=3.5.0 (from Pyramid)
Collecting venusian>=1.0a3 (from Pyramid)
Requirement already satisfied (use --upgrade to upgrade): setuptools in ./lib/python3.5/site-packages (from Pyramid)
Collecting PasteDeploy>=1.5.0 (from Pyramid)
  Using cached PasteDeploy-1.5.2-py2.py3-none-any.whl
Collecting repoze.lru>=0.4 (from Pyramid)
Requirement already satisfied (use --upgrade to upgrade): zope.interface>=3.8.0 in ./lib/python3.5/site-packages (from Pyramid)
Installing collected packages: WebOb, translationstring, zope.deprecation, venusian, PasteDeploy, repoze.lru, Pyramid
Successfully installed PasteDeploy-1.5.2 Pyramid-1.7 WebOb-1.6.1 repoze.lru-0.6 translationstring-1.3 venusian-1.0 zope.deprecation-4.1.2

I'd rather have a safe default with the system able to detect that you may have meant the alternative (plus a clear message telling you what to do) over a default with the potential to break unrelated stuff, and no recovery mechanism.

See, I don't think upgrade-by-default is unsafe at all (when you take into account the other change to upgrade). I find more software that doesn't work with whatever older version of something I had installed than I do software that doesn't work with a newer version. I already know what versions might be getting installed, because I named them explicitly on the command line, so we're not upgrading things that I didn't explicitly call out.

dstufft on 22 Jul 2016

Sort of, though it's easy for that to get drowned out in all of the other output with even a moderate amount of packages being installed:

Good point, maybe it should be highlighted (we use colours for things like warnings, this seems like a good candidate).

See, I don't think upgrade-by-default is unsafe at all

Well, suppose you have foo 1.0 installed, and bar 1.0 that depends on foo. Suppose bar works with foo 1.0 but not foo 2.0 (but the dependency is just on "foo", not "foo < 2.0" because foo 2.0 wasn't out when bar 1.0 was released, and how was the author to know?) Now if I do pip install --upgrade foo, bar breaks. And I may not even find out that bar is broken for a long time, if it's not something I use a lot. That's not a failure mode I want to have to deal with as the default behaviour - even if it's rare, and even if it's arguably bar's fault for not being more strict in its dependencies.
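To make the scenario concrete, here is a toy model (not pip code; `declared_ok` and the `(name, max_major)` tuple are invented for this sketch). It only shows what the declared metadata can and cannot tell you, under the assumption that an upper bound is expressed as an exclusive major version.

```python
def declared_ok(requirement, installed):
    """True if `installed` satisfies a (name, max_major) requirement.

    max_major=None models an unversioned dependency such as plain "foo";
    max_major=2 models "foo < 2.0". Hypothetical representation, for
    illustration only.
    """
    name, max_major = requirement
    version = installed.get(name)
    if version is None:
        return False
    major = int(version.split(".")[0])
    return max_major is None or major < max_major


# State of the environment after upgrading foo from 1.0 to 2.0.
after_upgrade = {"foo": "2.0", "bar": "1.0"}

# bar 1.0 declared just "foo": the metadata still looks satisfied,
# even though bar does not actually work with foo 2.0.
unversioned_looks_fine = declared_ok(("foo", None), after_upgrade)

# Had bar declared "foo < 2.0", the conflict would at least be detectable.
bounded_detects_conflict = not declared_ok(("foo", 2), after_upgrade)
```

In other words, with an unversioned dependency there is nothing in the metadata for any tool to act on, so the breakage can only be discovered by running bar.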

I don't want to turn this into an exercise in "my failure scenario is worse than yours", as that tends to make a debate way too heated (see a typical security discussion) but I do think that "not unsafe at all" is wrong - at best it's "unlikely to cause an issue".

Of course, you mention "the other change to upgrade" here. There's way too many combinations of things being proposed and becoming dependencies of one another (upgrade, upgrade all, rollback, recursive upgrade, non-eager upgrading, ...). Maybe we should take things one step at a time - why not leave this discussion for now, and focus on getting "safe upgrade" in place. Once we have pip install --upgrade in a place where we can guarantee it won't ever break someone's system, maybe we can reopen the debate on the default behaviour then?

pfmoore on 22 Jul 2016

I don't want to turn this into an exercise in "my failure scenario is worse than yours"

Exactly! Let's not.

There's way too many combinations of things being proposed and becoming dependencies of one another (upgrade, upgrade all, rollback, recursive upgrade, non-eager upgrading, ...). Maybe we should take things one step at a time - why not leave this discussion for now, and focus on getting "safe upgrade" in place.

The "other change" is the switch to non-eager upgrades. I feel, this issue deals only with change in behaviour of install and install --upgrade and thus, it should involve discussion on upgrade strategies. Everything else (upgrade-all, rollback), we explicitly decoupled when we opened this issue.

Once we have pip install --upgrade in a place where we can guarantee it won't ever break someone's system, maybe we can reopen the debate on the default behaviour then?

This would mean resolving #988 first which has been stuck for a fairly long amount of time.

I feel we've been bikeshedding and speculating what the user would do for too long. I feel it's no longer reasonable to do that without some metrics which are hard to get reliably. I saw this change as a quick-fix that provided a good middle ground until #988 landed. It's definitely not been quick and it's been debatable if it's a good middle ground. I think it might be worth it to take a step back.

It's already possible to do non-eager upgrades if you want, but figuring that out takes a Google search, which is more difficult than it should be. Even if pip only provides an option on install to do non-eager upgrades, it'll be better than the status quo.

Also, I don't think anyone wants the current "eager upgrade" default to be the default. If that's not the case, I must have missed it. So, why not switch to non-eager upgrades by default? As long as we switch the default upgrade strategy to be non-eager and _maybe_ provide a way to do eager upgrades, we'll be better off than status-quo.

So, assuming that no one is opposed to these two points, a minimum disruption change would be:

pip install --upgrade provides non-eager upgrades by default.
Add a --upgrade-strategy=[eager/non-eager] (with any spelling) to choose your upgrade strategy, iff you really want to provide eager upgrades.
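The difference between the two strategies can be sketched on a toy index as follows. This is a simplified model, not pip's implementation: `plan_upgrade` is a hypothetical function, versions are compared by equality only, and version specifiers are not modelled, so an already-installed dependency always counts as "good enough" in non-eager mode.

```python
def plan_upgrade(target, installed, index, deps, strategy="non-eager"):
    """Return {name: version} changes needed to upgrade `target`.

    installed: {name: installed version}
    index:     {name: latest available version}
    deps:      {name: [dependency names]}
    """
    changes = {}
    if installed.get(target) != index[target]:
        changes[target] = index[target]

    stack = list(deps.get(target, []))
    seen = {target}
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name not in installed:
            # Missing dependency: has to be installed under either strategy.
            changes[name] = index[name]
        elif strategy == "eager" and installed[name] != index[name]:
            # Eager: also bring already-installed dependencies to latest.
            changes[name] = index[name]
        stack.extend(deps.get(name, []))
    return changes
```

With `foo` and `dep` installed at 1.0, and `foo` 2.0 depending on `dep` and a not-yet-installed `new`, the non-eager plan touches only `foo` and `new`, while the eager plan also upgrades `dep`.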

How does this sound?

pradyunsg on 23 Jul 2016

Adding to what I just commented, this is what I feel is the path of least resistance, to get the non-recursive default behaviour through which I would like to see get through.

I feel, making install upgrade by default is essentially a separate discussion. It is something worth discussing but I feel that it shouldn't hold up the change in upgrade-strategy.

(I feel like this would mean a new issue for discussing what I just proposed but I'll take the first-opinions here before doing that)

pradyunsg on 23 Jul 2016

Just to clarify - I don't believe that non-eager upgrades fix the issue that "pip install --upgrade foo" could upgrade foo from 1.0 to 2.0, but an already-installed bar might declare a dependency on foo (with no version) but not work with 2.0? I can't see that it could (or indeed that it should) and yet that's the scenario that bothers me about making upgrade the default.

Which isn't to imply that I have a problem with your proposal to get non-eager upgrades in place as the first step (I'm +1 on that regardless).

pfmoore on 23 Jul 2016

I'd argue that the failure mode of an implied upgrade (you upgrade an existing install without meaning to, and break another part of your system by doing so) is sufficiently bad (even for an experienced user) that it warrants a flag to say "I understand the consequences".

This is literally the failure mode of every single thing that pip does. It's also the failure mode of not running pip (e.g. the publication of a security exploit that targets your current stack will cause it to go from working -> broken without you changing your environment at all). People who run pip are explicitly requesting that whatever change they have specified be made to their environment, with all the risks and benefits that entails.

No-one runs pip install foo when they already _know_ that foo is installed, because that would be silly. So users already have to be prepared for this to break their environment, because installing new packages (and pulling in their arbitrary transitive dependencies) is just as dangerous as upgrading existing packages. In fact, upgrading foo and installing foo are _exactly_ as dangerous, because they do _exactly the same thing_ -- they pull in the exact same versions of the exact same packages.

The argument that a plain pip install numpy should be interpreted as pip install numpy==$LATEST is that this is much simpler and predictable than the current thing (where pip install numpy is interpreted as pip install numpy==$CURRENTLY_INSTALLED_VERSION unless there is no currently installed version, in which case it's interpreted as pip install numpy==$LATEST -- just look how much longer that took to write). It reduces the state space that the user has to keep track of -- I can't see how reducing the possible outcomes of a command to a strict subset of what they used to be makes it _more_ dangerous :-).
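Spelled out as code, the two interpretations look something like this. Both functions are hypothetical and ignore dependencies entirely; they only model which version of the named package you end up with.

```python
def resolve_current(name, installed, latest):
    """Current behavior: keep whatever is installed, else take the latest."""
    return installed.get(name, latest[name])


def resolve_proposed(name, installed, latest):
    """Proposed behavior: always the latest, whatever is installed."""
    return latest[name]
```

The result of `resolve_proposed` is one of the two possible results of `resolve_current`, and it does not depend on `installed` at all, which is the "smaller state space" point above.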

It also has the important benefit that it actually _reduces_ the proliferation of paths through the pip internals -- having separate options for every little thing, no matter how use(ful/less), has a very substantial cost for maintainers, and is how pip became the "Rube Goldberg machine of sadness" described in the #pypa-dev topic.

Project instructions saying "use pip install FOO and you're good to go" can (and should) be changed.

I find it difficult to believe that you would put up with this argument if we were talking about a library API :-(. "Yes, many users of this API function call it with the default values, and yes, those work 95% of the time so that most users don't realize that their code is broken in the other 5% of cases. The solution is to keep that API the way it is and file bug reports forever telling everyone to add the unbreakme=True kwarg to every call. Because this is totally the user's fault."

we end up with this hovering over us forever, because there's always the possibility that someone could commit a PR, simply because those who objected the previous time didn't notice a discussion being reopened.

That's not how it does work, though -- notice that this change had extensive discussion on github and then there was a mailing list heads-up to make sure that no-one was surprised.

I actually kind of wish this is how it worked, because this change would be _fait accompli_ and we could all move on and stop wasting time on this ;-). And jokes aside, it might actually be healthier for the project if someone like dstufft decided to play BDFL in situations like this. Right now the de facto outcome is that changes are just impossible, and I'm starting to feel like it would be more productive to give up on trying to improve pip, and instead put my energy/recommend others put their energy into figuring out how to make a viable pip fork :-(

njsmith on 24 Jul 2016

Just to clarify - I don't believe that non-eager upgrades fix the issue that "pip install --upgrade foo" could upgrade foo from 1.0 to 2.0, but an already-installed bar might declare a dependency on foo (with no version) but not work with 2.0?

FWIW, I don't think it works even if bar explicitly depends on foo==1.0.

/tmp/pip-testing
$ ls ./repo
bar-1.0.tar.gz  foo-1.0.tar.gz  foo-2.0.tar.gz

/tmp/pip-testing
$ pip install --find-links ./

/tmp/pip-testing
$ pip install --find-links ./repo bar
Collecting bar
Collecting foo==1.0 (from bar)
Building wheels for collected packages: bar, foo
  Running setup.py bdist_wheel for bar ... done
  Stored in directory: /home/pradyunsg/.cache/pip/wheels/20/cd/44/f59790040978a7eb9989ce680e85681c252516bd7fc9baf059
  Running setup.py bdist_wheel for foo ... done
  Stored in directory: /home/pradyunsg/.cache/pip/wheels/27/9a/5f/3e8efff98718d38adb7cf6b20e4435694e8c465085792441be
Successfully built bar foo
Installing collected packages: foo, bar
Successfully installed bar-1.0 foo-1.0

/tmp/pip-testing
$ pip install --upgrade foo
Requirement already up-to-date: foo in /home/pradyunsg/.venvwrap/venvs/tmp-734de48113851ca/lib/python3.5/site-packages

/tmp/pip-testing
$ pip install --find-links ./repo --upgrade foo
Collecting foo
Building wheels for collected packages: foo
  Running setup.py bdist_wheel for foo ... done
  Stored in directory: /home/pradyunsg/.cache/pip/wheels/9d/3e/ce/b183a52b3e6844394d6cbf5606acadf8c340d48ccfcf02cc1c
Successfully built foo
Installing collected packages: foo
  Found existing installation: foo 1.0
    Uninstalling foo-1.0:
      Successfully uninstalled foo-1.0
Successfully installed foo-2.0

/tmp/pip-testing
$ pip list
bar (1.0)
foo (2.0)
pip (8.1.2)
setuptools (25.0.0)
wheel (0.29.0)

/tmp/pip-testing
$ pip --version
pip 8.1.2 from /home/pradyunsg/.venvwrap/venvs/tmp-734de48113851ca/lib/python3.5/site-packages (python 3.5)

pradyunsg on 24 Jul 2016

I don't think it works even if bar explicitly depends on foo==1.0.

Right, this is the "pip needs a real resolver" bug, which will get fixed eventually but is a big task so we don't want it to block other things if at all avoidable.

OTOH the case where bar uses an unversioned dependency on foo is basically impossible to get right AFAICT, so I'm not sure what it has to do with anything. The only solution for that is "never touch your venv ever again", and even that isn't guaranteed (because of things like new security holes or changes in external APIs that you need to talk to).

njsmith on 24 Jul 2016

In fact, upgrading foo and installing foo are exactly as dangerous, because they do exactly the same thing -- they pull in the exact same versions of the exact same packages.

OK. We really are simply going to _have_ to agree to disagree on this. In my view, it's about the user's perception - "installing foo" is _adding something previously not present_ to your system, whereas "upgrading foo" is _changing something that's already there_. To the user, these are far from being the same thing.

Right now the de facto outcome is that changes are just impossible

OK, I give up. I don't believe there's consensus on this change, and I think it's wrong to implement it without consensus. You know I don't agree with it myself, but that's not the point here. Changes really aren't impossible (we've made plenty of changes, some pretty controversial) but neither side in this argument seems able to convince the other. In my view, that typically results in the status quo winning - but I'm aware that by saying that I'm going to be perceived as implying that "all I have to do to get my way is stall things". IMO, it says something about where we are at the moment that I feel that way :-(

I'm bowing out of this discussion now. If anyone makes an argument that changes my mind, I'll acknowledge that, but otherwise I have nothing more to say. If I'm the last holdout for not making this change, I give my permission to everyone to ignore me - I certainly don't feel that I (or anyone) should have a veto over changes, and I'm completely comfortable accepting a majority decision. If others do still have reservations about this change, they'll have to make their own arguments (but I'd remind participants that not everyone reads github issues - in spite of the discussion going off-track, there were some comments on distutils-sig that IMO deserve a response).

pfmoore on 24 Jul 2016

And jokes aside, it might actually be healthier for the project if someone like dstufft decided to play BDFL in situations like this. Right now the de facto outcome is that changes are just impossible

While I don't think that our current process is optimal I don't think it's quite as bad as "changes are impossible". Generally we previously would do something like bring something up on pypa-dev ML with a simple majority vote amongst pip core in cases that there wasn't a clear consensus. I think there are three active pip core devs now (Myself, @pfmoore, and @xavfernandez) so if all three of us vote you end up with a vote one way or another instead of a tie. Could we use a more formalized process? Yes probably. Could that be a BDFL role? Possibly, but I don't think that's required either.

Sadly, the current ad hoc process typically means that one of the core contributors needs to sit down and decide to push for the change and say "Ok let's vote on this" and declare some ad hoc rules for doing so.

Recall, there was agreement on changing the behavior of --upgrade, which is the major thing that was preventing things like projects depending on Numpy from declaring their dependency. This particular change is just an idea that came out of that and is more of a UX thing than anything else. Like any project, unless _you're_ the BDFL there are going to be times when the decision making process goes against the option you want. I haven't done any of this because I've been focusing on Warehouse lately, pip will be coming back in my cross hairs after that's launched :)

dstufft on 24 Jul 2016

No-one runs pip install foo when they already know that foo is installed, because that would be silly.

I'd guess you're right, but I'd also say a lot of people are running pip install -r requirements.txt with requirements.txt containing foo (which is equivalent to pip install foo) even though they know that foo is installed on a daily basis.
And they are happy with the fact that pip does it quickly without checking if there is something to upgrade.

I'm not against the idea that pip install foo could be equivalent to pip install foo==$LATEST (in fact I like it) but I'm against changing this fundamental behavior without a deprecation period (and an escape option to keep the old behavior).
I'm not sure we have discussed this solution already, but this could be a new --strategy option to pip install:

no-upgrade would be the default in pip 9 and pip install --strategy=no-upgrade would be the current pip install behavior
eager would be the current pip install --upgrade behavior (and --upgrade a deprecated alias for --strategy=eager)
non-eager would be the default in pip 10
we could imagine an oldest-compatible for #3188, etc

Note that you could also put strategy=non-eager in your pip.conf to directly have it being the default in pip 9.
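As a sketch of how the option surface could fit together, here is a small argparse model. It is not pip's actual option parser: `build_parser`, the `pip-install-sketch` program name, and the `default` parameter (standing in for a value read from pip.conf) are all assumptions made for illustration.

```python
import argparse

# Strategy names as proposed in this comment.
STRATEGIES = ("no-upgrade", "eager", "non-eager")


def build_parser(default="no-upgrade"):
    """Build a toy `install` parser.

    `default` stands in for whatever the release or the user's
    configuration selects (no-upgrade in "pip 9", non-eager in "pip 10").
    """
    parser = argparse.ArgumentParser(prog="pip-install-sketch")
    parser.add_argument("--strategy", choices=STRATEGIES, default=default)
    # -U/--upgrade kept as an alias for --strategy=eager; SUPPRESS means
    # it only overrides the strategy when the flag is actually given.
    parser.add_argument(
        "-U", "--upgrade",
        dest="strategy",
        action="store_const",
        const="eager",
        default=argparse.SUPPRESS,
    )
    parser.add_argument("requirements", nargs="+")
    return parser
```

Changing the default between releases then becomes a one-line change, and anyone who needs the old behavior can pin it explicitly on the command line or in configuration.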

Could we use a more formalized process? Yes probably.

:+1:

xavfernandez on 25 Jul 2016

And they are happy with the fact that pip does it quickly without checking if there is something to upgrade.

It'd probably still be pretty quick TBH. We serve responses in less than a ms from the Fastly cache :)

I'm not against the idea that pip install foo could be equivalent to pip install foo==$LATEST (in fact I like it) but I'm against changing this fundamental behavior without a deprecation period (and an escape option to keep the old behavior).

I'm fine with a deprecation period. I'm not sure about a long term option to keep the old behavior. I'm not opposed, I just want to make sure that it's something we really should support long term, options in general coming with a cost, and wanting to make sure the cost is worth it.

dstufft on 26 Jul 2016

How does this sound?

Followed up with #3972.

pradyunsg on 16 Sep 2016

Closing since #3972 is merged.

We have taken a different path to resolving the behaviour of --upgrade.

pradyunsg on 5 Feb 2017

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock[bot] on 2 Jun 2019

Pip: Make install command upgrade packages by default
