Pip: Simplify managing packages from the Python REPL

Created on 16 Mar 2018 · 24 Comments · Source: pypa/pip

For a variety of reasons, folks may end up in a situation where it isn't straightforward to get an operating system level command shell for the interactive Python environment they're currently using (e.g. Windows start menu link, Jupyter notebook kernels, etc).

The most robust currently available option for handling that scenario is to do something like the following:

import subprocess
import sys

def install(*args, **kwds):
    """Install packages into the current environment"""
    cli_args = []
    for k, v in kwds.items():  # iterate over (name, value) pairs
        cli_args.append("--" + k.replace('_', '-'))
        cli_args.append(v)
    cli_args += args
    # subprocess.run expects the whole command as a single list
    subprocess.run([sys.executable, "-m", "pip", "install", *cli_args])

This basic approach is far from perfect (e.g. it will bypass Pipfile when using pipenv, it doesn't autoreload if you upgrade an already installed and imported package, and it will fail cryptically if pip isn't installed), but those are problems that could be resolved by a more robust implementation provided by pip itself.

For example:

if Pipfile or Pipfile.lock is seen, emit a warning to stderr about it being bypassed (such a warning could potentially be added to pip by default regardless of how it's invoked)
after the installation operation completes, read the RECORD entries for any just installed modules, and compare them to __spec__.origin and importlib.util.source_from_cache(__spec__.origin) for all of the modules in sys.modules.values() and emit a warning to stderr suggesting a Python session restart if any of the just installed files is already loaded in the module cache
if this were a public module level API in pip, then it's the from pip import install step that would fail if pip wasn't available in the current environment
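The module cache check in the second point could be sketched roughly as follows. This is only an illustration, not anything pip provides: the function names are hypothetical, the list of installed file paths is assumed to have already been read from the RECORD entries, and the source_from_cache comparison for bytecode-only modules is left out.

```python
import os
import sys


def loaded_module_files():
    """Map the on-disk file of each already-imported module to its name."""
    loaded = {}
    for name, module in list(sys.modules.items()):
        spec = getattr(module, "__spec__", None)
        origin = getattr(spec, "origin", None)
        # Built-in and frozen modules have no real file on disk; skip them
        if origin and os.path.isfile(origin):
            loaded[os.path.realpath(origin)] = name
    return loaded


def warn_if_already_loaded(installed_files):
    """Warn on stderr if any just-installed file is already imported.

    `installed_files` is assumed to be an iterable of file paths taken
    from the RECORD entries of the packages that were just installed.
    Returns the sorted names of the affected modules.
    """
    loaded = loaded_module_files()
    stale = sorted(
        loaded[path]
        for path in (os.path.realpath(f) for f in installed_files)
        if path in loaded
    )
    if stale:
        print(
            "Already-imported modules were reinstalled; restart Python "
            "to pick up the changes: " + ", ".join(stale),
            file=sys.stderr,
        )
    return stale
```

For example, calling warn_if_already_loaded([json.__file__]) after import json would report json as needing a restart.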

This basic approach of running f"{sys.executable} -m pip install ..." could also be generalised to other pip CLI subcommands, and most of the others won't have the same cache consistency problems that install does.

public api feature request

ncoghlan

👍2 ❤1

Most helpful comment

@reynoldsnlp I find this particular issue hard to work on myself because it means I page the full scope of the problem back into my brain and decide "Nah, I'm gonna go play computer games instead".

Helping someone else learn-by-doing though? That's still fun :)

ncoghlan on 4 Jan 2019

👍2

All 24 comments

(Note: this isn't a new idea, but we've historically just debated the question on distutils-sig. With pip 10 moving the implementation API out to a private submodule, that opens up the opportunity to define a public Python level API that invokes the CLI in a subprocess.)

ncoghlan on 16 Mar 2018

+5000

This is a fantastic suggestion. Even if it is not ideal for all applications, it would be a pythonic solution to a major point of confusion for many beginners. Even a rough implementation that only worked for the most basic use case, and failed gracefully with something like raise NotImplementedError('Your environment does not allow for this function. Please use pip from the command line.') would be a huge improvement for usability.
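A rough sketch of that kind of graceful failure, assuming the helper lives outside pip and only checks whether pip can be imported at all. The function name and its parameter are hypothetical:

```python
import importlib.util


def ensure_module_available(name="pip"):
    """Raise a readable error if `name` cannot be imported here."""
    # find_spec returns None when a top-level module cannot be found
    if importlib.util.find_spec(name) is None:
        raise NotImplementedError(
            "Your environment does not allow for this function. "
            "Please use pip from the command line."
        )
```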

So many of my students waste hours trying to figure out how to install modules properly, and inevitably end up in my office frustrated and discouraged.

>>> from pip import install
>>> install(useful_module)

So simple!

reynoldsnlp on 13 Dec 2018

I would think that making the --user flag default would be the safest way to implement this.

reynoldsnlp on 13 Dec 2018

I would like to suggest spearheading this by implementing the calls to pip, and the surrounding checks, in a library, in order to support experimentation and to avoid pinning the tool down to just very recent pip versions.

RonnyPfannschmidt on 14 Dec 2018

@RonnyPfannschmidt I'm not sure I understand. Are you saying that this shouldn't be implemented in the pip package because it is only possible for the latest versions? (i.e., are you recommending that this should be done by a separate package?) If so, could you explain a little more why?

reynoldsnlp on 28 Dec 2018

@reynoldsnlp If the functionality is in pip, then it will be intrinsically tied to the environment that you're working in. If users switch environments to one that has an older version of pip, then their preferred way of working will start failing.

By contrast, if the functionality is in a helper library, then all that matters is the version of that library, regardless of which version of pip they have access to. And unlike upgrading pip, adding a new helper library to an environment is unlikely to break anything.

The downside of it being in a helper library is that it makes discoverability much, much worse, so while I think that may be useful as a model for initially iterating on implementation details, having pip itself expose a meta API along these lines seems preferable in the long run (then the helper library would only be needed when using older pip versions, similar to many standard library backport modules)

ncoghlan on 28 Dec 2018

I agree that in the long run, having pip expose the helper API is the way to go. Is the best way to approach this to start with a helper library for older versions and then later implement it inside pip? If so, then should we do it in pypa, or should I make a first stab at this independently?

reynoldsnlp on 28 Dec 2018

If this were in pip, it would be strongly coupled to pip's releases, and it would be entirely impossible to do the absolutely necessary experimentation and removals.

RonnyPfannschmidt on 28 Dec 2018

@reynoldsnlp To be 100% clear, yes this should be done outside of pip initially. The advantages are:

Faster iteration cycles (not tied to pip's releases)
More chance of someone doing it (not reliant on pip's limited resources)

Once it's in a working state, and has some level of user base (to give some assurance that key user requirements haven't been missed) then we can look at how (and whether) to merge into pip.

pfmoore on 28 Dec 2018

I would be happy to take this on, although I'm sure that I'm not the most qualified. I'll take a shot at it and post updates here.

reynoldsnlp on 28 Dec 2018

Please see a first shot here: https://github.com/reynoldsnlp/pip_inside/blob/master/pip_inside/__init__.py

Do these look like good ways to call the function?

I like method 1 for simplicity for human users; just copy-paste that pip install ... thing from a tutorial.

@ncoghlan what is the purpose of .replace('_', '-') in your snippet? I copied it out of blind faith, but why is it there?

reynoldsnlp on 28 Dec 2018

That's certainly a good starting point. It would obviously need a lot more real-world use to see if it met people's needs (see the points @ncoghlan made in his original post - for example, how should the implementation handle upgrading a package that's already imported in the current process: reload, error, or just ignore the problem and leave it for the user to figure out?)

what is the purpose of .replace('_', '-') in your snippet

Basically, Python function arguments can't contain -, so to supply something like --disable-pip-version-check, the caller needs to call install(..., disable_pip_version_check=True). You could of course use any translation convention you like, but this is simple and obvious.

pfmoore on 28 Dec 2018

👍1

@reynoldsnlp That's a really nice start! And you're right that being able to copy and paste a full pip command line directly into a Python string argument would be a neat feature to have.

The possibility of encountering spaces in local path names makes that a bit trickier than it might otherwise be though, so you'll likely want to use shlex.split rather than the string split() method (that will mean that backslashes and nested quotes can be used to skip splitting on some spaces, so copying and pasting from an actual command shell example should work).
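To illustrate the difference, here is a small comparison of the two splitting approaches on a command containing a quoted path with spaces. The path and package name are made up for the example:

```python
import shlex

command = "pip install --target '/tmp/target dir with spaces' simplewheel"

# str.split() breaks on every space, including those inside the quotes
naive_args = command.split()

# shlex.split() follows POSIX shell quoting rules, so the quoted path
# survives as a single argument with the quotes removed
shell_like_args = shlex.split(command)

print(naive_args)
print(shell_like_args)
```

Note that shlex.split treats backslashes as escape characters by default, so Windows-style paths pasted into the string would need quoting or doubled backslashes.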

Another bug I noticed in my initial sketch is that it really needs to distinguish between two kinds of keyword argument:

string values: pass to the subprocess call as two arguments (option + value)
boolean flags: pass to the subprocess call as --option-name if the value is true, otherwise do nothing with it

That could likely be handled by doing a typecheck for isinstance(value, six.string_types), and treating all non-string cases as boolean options.

Graceful handling of edge cases is going to be an ongoing theme with this utility, so the next step would be to start adding pytest test cases, using a layout similar to the existing one in pip (https://github.com/pypa/pip/tree/master/tests/):

Grab a couple of simple test packages from https://github.com/pypa/pip/tree/master/tests/data/packages (you won't need the whole set, since you're only testing pip_inside's ability to call pip correctly, not pip's ability to install a wide variety of packages). https://github.com/pypa/pip/blob/master/tests/data/packages/simplewheel-1.0-py2.py3-none-any.whl and v2.0 of the same package would probably be good ones to use
Create tests/functional/test_install_via_api_wrapper.py to hold the test cases that could later be dropped directly into pip's own test suite (if/when the API gets added back to the main project)

Some possible example test cases to start with given the potential problems I noted above:

install("pip install --target /tmp/<tempfile-generated-dir-name>/target_dir_without_spaces '<full-path-to-sample-wheel>'")
install("pip install --target '/tmp/<tempfile-generated-dir-name>/target dir with spaces' '<full-path-to-sample-wheel>'")
install("<full-path-to-v1.0-sample-wheel>", target="/tmp/<tempfile-generated-dir-name>/") followed by install("<full-path-to-v2.0-sample-wheel>", target="/tmp/<tempfile-generated-dir-name>/", upgrade=True)

In the long run, you'd change the test cases to mock out subprocess.run and just check that the arguments passed in are the ones you expect to be receiving (since that will run much much faster than actually doing the installs, and will let you get rid of the data files from the test suite).

Initially, though, you want to build confidence that the test cases are actually testing what you want them to test, so it's better to actually run pip for real with a specified target directory.
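For the later, mocked style of test, a minimal sketch might look like the following. The install function here is a simplified stand-in defined only for this example, not pip_inside's actual implementation, and the test name is hypothetical:

```python
import subprocess
import sys
from unittest import mock


def install(*args):
    """Simplified stand-in wrapper: run `pip install` in a subprocess."""
    return subprocess.run([sys.executable, "-m", "pip", "install", *args])


def test_install_passes_expected_arguments():
    # Replace subprocess.run so that no real installation takes place
    with mock.patch("subprocess.run") as fake_run:
        install("--target", "/tmp/target dir with spaces", "simplewheel")
    # Only the arguments handed to the subprocess call are checked
    fake_run.assert_called_once_with(
        [
            sys.executable, "-m", "pip", "install",
            "--target", "/tmp/target dir with spaces", "simplewheel",
        ]
    )
```

Because the test is a plain function with a test_ prefix, pytest would collect it without any extra configuration.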

ncoghlan on 31 Dec 2018

👍1

@ncoghlan Thanks for the detailed feedback/instructions. I am enjoying working on this and learning as I go, but I am also painfully aware that my work will not be up to your level without your help. If you ever feel like I'm getting in the way (of the project that was your idea in the first place!), I would not be offended. :-)

By the way, even though the project is currently being developed separately from pypa, it seems that continuing at least a general conversation here as a pypa issue (and keeping this pypa issue open) would be good. To keep the discussion on this page from getting too cluttered, perhaps more specific issues could be added as issues on my repo. I would be happy to add anyone interested as contributors over there.

reynoldsnlp on 1 Jan 2019

Regarding boolean flags, I had resolved the boolean flag issue by just checking if the value is True, but I like the flexibility of treating any non-string value as a boolean flag. It's a little strange that you could pass in user=False, and the result would be pip install --user, but there is no reason to ever pass False and the added flexibility of using 1 or whatever else is probably worth it.

reynoldsnlp on 1 Jan 2019

/cc @theacodes in case she has any inputs on this. :)

pradyunsg on 1 Jan 2019

@reynoldsnlp I'd still suggest checking the truth value of the passed-in non-string types, so all you'd be inferring from the type is whether the keyword referred to a boolean option flag or not. That is:

target="/some/dir/name" as a keyword argument -> pass "--target" and "/some/dir/name" as a pair of CLI arguments
target="" as a keyword argument -> throw ValueError because passing an empty string doesn't make sense
user=True as a keyword argument -> pass "--user" as a CLI argument
user=1 as a keyword argument -> pass "--user" as a CLI argument
user=False as a keyword argument -> no change to CLI arguments
user=0 as a keyword argument -> no change to CLI arguments
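The rules above could be sketched as below. This assumes Python 3 only, so it checks for str rather than six.string_types, and the function name is hypothetical:

```python
def kwargs_to_cli_args(**kwds):
    """Translate keyword arguments into pip-style CLI arguments."""
    cli_args = []
    for name, value in kwds.items():
        option = "--" + name.replace("_", "-")
        if isinstance(value, str):
            # String values become an option/value pair
            if not value:
                raise ValueError(option + " requires a non-empty value")
            cli_args.extend([option, value])
        elif value:
            # Any other truthy value is treated as a boolean flag
            cli_args.append(option)
        # Falsy non-string values leave the CLI arguments unchanged
    return cli_args
```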

ncoghlan on 2 Jan 2019

Behaviour with boolean options could get a bit weird (because pip's boolean options themselves are weird). As a user, I'd expect user=True to force a user install, and user=False to force a non-user install. There's no --no-user option, so to force a non-user install is a no-op, except that environment variables and config file options can make --user the default. There are other cases like --no-cache-dir and --no-color which have a False option but no True option, and the default is True.

As a starting point, @ncoghlan's description is sufficient (with modification for the default-True cases), but the implications should be clearly documented and you should be prepared for users to be confused even so (expectations for function arguments are different from expectations for command line options, and pip's command line options are inconsistent anyway).

PS There's also --use-pep517, which is tri-state (omitted, --use-pep517 and --no-use-pep517 are all valid and all mean different things).

pfmoore on 2 Jan 2019

One advantage of having this as an API built into pip would be that it would allow the option to bypass all the CLI complexity, and hit the internal settings directly. That would allow the API to be designed properly as a programming API without exposing all of the quirks of the command line API.

The comments made by @ncoghlan in https://github.com/pypa/pip/issues/5069#issuecomment-450301743 about the trade-offs between an external helper and an internal part of pip still apply though.

If the prototype is intended ultimately for merging into pip, I'd be OK with it using pip's internals, on the understanding that doing so makes it completely unsupported as a long-term external project, and the external form would be purely a way of iterating on the design.

pfmoore on 2 Jan 2019

@pfmoore I was thinking that initially the --no-some-option settings could be handled as no_some_option=True, and have a PR for that at https://github.com/reynoldsnlp/pip_inside/pull/5.

However, the issue of tri-state options is a good point to raise, so I've filed https://github.com/reynoldsnlp/pip_inside/issues/6 over on @reynoldsnlp's repo to go into that in more detail (as long as this is living outside pip, I think we want to minimise how much knowledge the wrapper needs about the different install options, but if we assume all non-string options are tri-state, then we can likely live with the error messages that pip will throw for the cases where the CLI is currently a bit inconsistent)
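One way to picture the "treat every non-string option as tri-state" idea is the sketch below. It is only an assumption about how such a mapping might work, not a description of what pip_inside does, and the function name is hypothetical:

```python
def flag_to_cli_args(name, value):
    """Map a tri-state keyword value onto zero or one CLI flags.

    None  -> option omitted, pip's own default applies
    True  -> --option-name
    False -> --no-option-name (pip rejects this where no such flag exists)
    """
    option = name.replace("_", "-")
    if value is None:
        return []
    return ["--" + option] if value else ["--no-" + option]
```

With this mapping, user=False would produce --no-user, which pip does not accept, so the caller would see pip's own error message for that case.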

@reynoldsnlp Don't worry about potentially holding things up - by creating your project you've already stimulated more progress in the past few days than we'd made in years of kicking the general idea around.

It's one of those problems where simply asking "Will this work well enough? If not, why not?" is a spectacularly valuable contribution, as it lets us break the problem down into more manageable chunks, rather than getting overwhelmed trying to solve the entire thing before we even get started on anything usable :)

ncoghlan on 2 Jan 2019

@reynoldsnlp Don't worry about potentially holding things up - by creating your project you've already stimulated more progress in the past few days than we'd made in years of kicking the general idea around.

Absolutely! If I seem like I'm criticising, or trying to tear down what you're doing, I'm definitely not. The work you've done has got me thinking about aspects of pip's CLI that had been bothering me subconsciously for some time, but I'd never managed to articulate properly.

pfmoore on 2 Jan 2019

The other thing to keep in mind is that while I've been adding issues to the repo, there are no deadlines or obligations for any of this work.

Hopefully there's a personal learning pay-off for you in getting some code review and project setup recommendations from folks that have been working on Python open source projects for quite a while, but if it ever gets to feeling like more trouble than it's worth, then it's completely fine to say you've done as much as you want to for now, and take a break (either for a specified time or indefinitely).

The code will still be there for someone else to start from, and you've already made a significant contribution just in getting @pfmoore & I to start working out a few more details for a potentially viable API design :)

ncoghlan on 2 Jan 2019

@ncoghlan and @pfmoore, I definitely did not feel like you were being negative about my work! You've both been extremely helpful. I just noticed that some of your comments in this thread, explaining to me what to do, probably took you longer to write than it would have taken you to just write the code yourselves.

I have wanted to give back to the python community for a long time, and ironically, now that I'm trying to give back, I'm getting even more benefits by getting such great feedback and guidance. :-)

I'm teaching a new course on machine translation this semester, so my time is limited, but I have a goal to work on this at least a little every day.

reynoldsnlp on 2 Jan 2019

❤1

Helping someone else learn-by-doing though? That's still fun :)

ncoghlan on 4 Jan 2019

👍2
