For a variety of reasons, folks may end up in a situation where it isn't straightforward to get an operating system level command shell for the interactive Python environment they're currently using (e.g. Windows start menu link, Jupyter notebook kernels, etc).
The most robust currently available option for handling that scenario is to do something like the following:
    import subprocess, sys

    def install(*args, **kwds):
        """Install packages into the current environment"""
        cli_args = []
        # Translate keyword arguments into "--option-name value" pairs
        for k, v in kwds.items():
            cli_args.append("--" + k.replace('_', '-'))
            cli_args.append(v)
        cli_args += args
        # Run pip with the same interpreter that is running this session
        subprocess.run([sys.executable, "-m", "pip", "install", *cli_args])
This basic approach is far from perfect (e.g. it will bypass Pipfile when using pipenv, it doesn't autoreload if you upgrade an already installed and imported package, and it will fail cryptically if pip isn't installed), but those are problems that could be resolved by a more robust implementation provided by pip itself.
For example:
- if a Pipfile or Pipfile.lock is seen, emit a warning to stderr about it being bypassed (such a warning could potentially be added to pip by default regardless of how it's invoked)
- check __spec__.origin and importlib.util.source_from_cache(__spec__.cached) for all of the modules in sys.modules.values() and emit a warning to stderr suggesting a Python session restart if any of the just installed files is already loaded in the module cache
- if the API is provided by pip, then it's the from pip import install step that would fail if pip wasn't available in the current environment
This basic approach of running f"{sys.executable} -m pip install ..." could also be generalised to other pip CLI subcommands, and most of the others won't have the same cache consistency problems that install does.
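The first two checks in the list above could be sketched roughly as follows. This is only an illustration, not anything pip provides: the function names (warn_if_pipfile, loaded_module_files, already_loaded) are hypothetical, and the caller is assumed to already know which files were just installed.

```python
import sys
from pathlib import Path


def warn_if_pipfile(directory="."):
    """Warn on stderr if a Pipfile or Pipfile.lock would be bypassed."""
    found = [name for name in ("Pipfile", "Pipfile.lock")
             if (Path(directory) / name).exists()]
    if found:
        print("warning: bypassing " + " and ".join(found), file=sys.stderr)
    return found


def loaded_module_files():
    """Collect the origin paths of every module currently in sys.modules."""
    paths = set()
    for module in list(sys.modules.values()):
        spec = getattr(module, "__spec__", None)
        origin = getattr(spec, "origin", None)
        # Built-in and frozen modules have placeholder origins, not real files.
        if origin and origin not in ("built-in", "frozen"):
            paths.add(str(Path(origin).resolve()))
    return paths


def already_loaded(installed_files):
    """Return the just-installed files that are already in the module cache."""
    loaded = loaded_module_files()
    resolved = (str(Path(p).resolve()) for p in installed_files)
    return sorted(p for p in resolved if p in loaded)
```

If already_loaded() returns anything, the wrapper could print a message to stderr suggesting a Python session restart.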
(Note: this isn't a new idea, but we've historically just debated the question on distutils-sig. With pip 10 moving the implementation API out to a private submodule, that opens up the opportunity to define a public Python level API that invokes the CLI in a subprocess.)
+5000
This is a fantastic suggestion. Even if it is not ideal for all applications, it would be a pythonic solution to a major point of confusion for many beginners. Even a rough implementation that only worked for the most basic use case, and failed gracefully with something like raise NotImplementedError('Your environment does not allow for this function. Please use pip from the command line.') would be a huge improvement for usability.
So many of my students waste hours trying to figure out how to install modules properly, and inevitably end up in my office frustrated and discouraged.
>>> from pip import install
>>> install(useful_module)
So simple!
I would think that making the --user flag default would be the safest way to implement this.
I would like to suggest spearheading this in a library: implement the calls to pip and the surrounding checks there, in order to support experimentation and to avoid pinning the tool down to just very recent pip versions.
@RonnyPfannschmidt I'm not sure I understand. Are you saying that this shouldn't be implemented in the pip package because it is only possible for the latest versions? (i.e., are you recommending that this should be done by a separate package?) If so, could you explain a little more why?
@reynoldsnlp If the functionality is in pip, then it will be intrinsically tied to the environment that you're working in. If users switch environments to one that has an older version of pip, then their preferred way of working will start failing.
By contrast, if the functionality is in a helper library, then all that matters is the version of that library, regardless of which version of pip they have access to. And unlike upgrading pip, adding a new helper library to an environment is unlikely to break anything.
The downside of it being in a helper library is that it makes discoverability much, much worse, so while I think that may be useful as a model for initially iterating on implementation details, having pip itself expose a meta API along these lines seems preferable in the long run (then the helper library would only be needed when using older pip versions, similar to many standard library backport modules).
I agree that in the long run, having pip expose the helper API is the way to go. Is the best way to approach this to start with a helper library for older versions and then later implement it inside pip? If so, then should we do it in pypa, or should I make a first stab at this independently?
If this was in pip, it would be strongly coupled to pip releases, and it would be entirely impossible to do the absolutely necessary experimentation and removals.
@reynoldsnlp To be 100% clear, yes this should be done outside of pip initially. The advantages are:
Once it's in a working state, and has some level of user base (to give some assurance that key user requirements haven't been missed) then we can look at how (and whether) to merge into pip.
I would be happy to take this on, although I'm sure that I'm not the most qualified. I'll take a shot at it and post updates here.
Please see a first shot here: https://github.com/reynoldsnlp/pip_inside/blob/master/pip_inside/__init__.py
Do these look like good ways to call the function?
I like method 1 for simplicity for human users; just copy-paste that pip install ... thing from a tutorial.
@ncoghlan what is the purpose of .replace('_', '-') in your snippet? I copied it out of blind faith, but why is it there?
That's certainly a good starting point. It would obviously need a lot more real-world use to see if it met people's needs (see the points @ncoghlan made in his original post - for example, how should the implementation handle upgrading a package that's already imported in the current process: reload, error, or just ignore the problem and leave it for the user to figure out?)
> what is the purpose of .replace('_', '-') in your snippet
Basically, Python keyword argument names can't contain -, so to supply something like --disable-pip-version-check, the caller needs to call install(..., disable_pip_version_check=True). You could of course use any translation convention you like, but this is simple and obvious.
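As a tiny illustration of that convention (the helper name option_name is made up for this example):

```python
def option_name(keyword):
    """Translate a Python keyword argument name into a pip-style long option."""
    return "--" + keyword.replace("_", "-")


print(option_name("disable_pip_version_check"))  # --disable-pip-version-check
```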
@reynoldsnlp That's a really nice start! And you're right that being able to copy and paste a full pip command line directly into a Python string argument would be a neat feature to have.
The possibility of encountering spaces in local path names makes that a bit trickier than it might otherwise be though, so you'll likely want to use shlex.split rather than the string split() method (that will mean that backslashes and nested quotes can be used to skip splitting on some spaces, so copying and pasting from an actual command shell example should work).
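To make the difference concrete, here is a small comparison on a made-up command line (the path and package name are placeholders):

```python
import shlex

command = "pip install --target '/tmp/target dir with spaces' simplewheel"

# str.split() breaks the quoted path apart at every space
naive = command.split()

# shlex.split() honours the quotes, like a POSIX shell would
shell_like = shlex.split(command)

print(len(naive))   # 8
print(shell_like)   # ['pip', 'install', '--target', '/tmp/target dir with spaces', 'simplewheel']
```

One caveat: in its default POSIX mode shlex.split treats backslashes as escape characters, so unquoted Windows-style paths would likely need extra care.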
Another bug I noticed in my initial sketch is that it really needs to distinguish between two kinds of keyword argument:
- options that take a value: pass --option-name followed by the value
- boolean flags: pass --option-name if the value is true, otherwise do nothing with it
That could likely be handled by doing a typecheck for isinstance(value, six.string_types), and treating all non-string cases as boolean options.
Graceful handling of edge cases is going to be an ongoing theme with this utility, so the next step would be to start adding pytest test cases, using a layout similar to the existing one in pip (https://github.com/pypa/pip/tree/master/tests/):
- some sample wheels as test data (the goal is to test pip_inside's ability to call pip correctly, not pip's ability to install a wide variety of packages). https://github.com/pypa/pip/blob/master/tests/data/packages/simplewheel-1.0-py2.py3-none-any.whl and v2.0 of the same package would probably be good ones to use
- tests/functional/test_install_via_api_wrapper.py to hold the test cases that could later be dropped directly into pip's own test suite (if/when the API gets added back to the main project)
Some possible example test cases to start with given the potential problems I noted above:
- install("pip install --target /tmp/<tempfile-generated-dir-name>/target_dir_without_spaces '<full-path-to-sample-wheel>'")
- install("pip install --target '/tmp/<tempfile-generated-dir-name>/target dir with spaces' '<full-path-to-sample-wheel>'")
- install("<full-path-to-v1.0-sample-wheel>", target="/tmp/<tempfile-generated-dir-name>/") followed by install("<full-path-to-v2.0-sample-wheel>", target="/tmp/<tempfile-generated-dir-name>/", upgrade=True)
In the long run, you'd change the test cases to mock out subprocess.run and just check that the arguments passed in are the ones you expect to be receiving (since that will run much much faster than actually doing the installs, and will let you get rid of the data files from the test suite).
Initially, though, you want to build confidence that the test cases are actually testing what you want them to test, so it's better to actually run pip for real with a specified target directory.
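For the longer-term mocked variant, a test might look something like the sketch below. The install function here is a deliberately simplified, hypothetical stand-in for the real wrapper (it is not the pip_inside code), and the wheel name is a placeholder.

```python
import subprocess
import sys
from unittest import mock


def install(*args, target=None):
    """Hypothetical, simplified stand-in for the wrapper under test."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if target is not None:
        cmd += ["--target", target]
    cmd += args
    return subprocess.run(cmd)


def test_target_with_spaces_is_one_argument():
    # Replace subprocess.run so that no real install happens
    with mock.patch("subprocess.run") as fake_run:
        install("sample.whl", target="/tmp/target dir with spaces")
    # Only the argument list handed to pip is checked
    fake_run.assert_called_once_with(
        [sys.executable, "-m", "pip", "install",
         "--target", "/tmp/target dir with spaces", "sample.whl"]
    )
```

pytest would collect the test_ function automatically; since nothing is actually installed, no sample wheels are needed for this style of test.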
@ncoghlan Thanks for the detailed feedback/instructions. I am enjoying working on this and learning as I go, but I am also painfully aware that my work will not be up to your level without your help. If you ever feel like I'm getting in the way (of the project that was your idea in the first place!), I would not be offended. :-)
By the way, even though the project is currently being developed separately from pypa, it seems that continuing at least a general conversation here as a pypa issue (and keeping this pypa issue open) would be good. To keep the discussion on this page from getting too cluttered, perhaps more specific issues could be added as issues on my repo. I would be happy to add anyone interested as contributors over there.
Regarding boolean flags, I had resolved the boolean flag issue by just checking if the value is True, but I like the flexibility of treating any non-string value as a boolean flag. It's a little strange that you could pass in user=False, and the result would be pip install --user, but there is no reason to ever pass False and the added flexibility of using 1 or whatever else is probably worth it.
/cc @theacodes in case she has any inputs on this. :)
@reynoldsnlp I'd still suggest checking the truth value of the passed-in non-string types, so all you'd be inferring from the type is whether the keyword referred to a boolean option flag or not. That is:
- target="/some/dir/name" as a keyword argument -> pass "--target" and "/some/dir/name" as a pair of CLI arguments
- target="" as a keyword argument -> throw ValueError because passing an empty string doesn't make sense
- user=True as a keyword argument -> pass "--user" as a CLI argument
- user=1 as a keyword argument -> pass "--user" as a CLI argument
- user=False as a keyword argument -> no change to CLI arguments
- user=0 as a keyword argument -> no change to CLI arguments
Behaviour with boolean options could get a bit weird (because pip's boolean options themselves are weird). As a user, I'd expect user=True to force a user install, and user=False to force a non-user install. There's no --no-user option, so to force a non-user install is a no-op, except that environment variables and config file options can make --user the default. There are other cases like --no-cache-dir and --no-color which have a False option but no True option, and the default is True.
As a starting point, @ncoghlan's description is sufficient (with modification for the default-True cases), but the implications should be clearly documented and you should be prepared for users to be confused even so (expectations for function arguments are different from expectations for command line options, and pip's command line options are inconsistent anyway).
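The keyword handling rules that @ncoghlan lists above could be sketched roughly like this. It is a Python 3 only illustration (plain str instead of six.string_types), the function name keywords_to_cli_args is made up, and it does nothing special for the default-True cases mentioned above.

```python
def keywords_to_cli_args(**kwds):
    """Translate keyword arguments into a list of pip CLI arguments."""
    cli_args = []
    for key, value in kwds.items():
        option = "--" + key.replace("_", "-")
        if isinstance(value, str):
            # String values belong to options that take an argument
            if not value:
                raise ValueError(f"{key}: an empty string is not a valid value")
            cli_args += [option, value]
        elif value:
            # Any other truthy value switches a boolean flag on
            cli_args.append(option)
        # Falsy non-string values leave the CLI arguments unchanged
    return cli_args


print(keywords_to_cli_args(target="/some/dir/name", user=1))
# ['--target', '/some/dir/name', '--user']
```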
PS There's also --use-pep517, which is tri-state (omitted, --use-pep517 and --no-use-pep517 are all valid and all mean different things).
One advantage of having this as an API built into pip would be that it would allow the option to bypass all the CLI complexity, and hit the internal settings directly. That would allow the API to be designed properly as a programming API without exposing all of the quirks of the command line API.
The comments made by @ncoghlan in https://github.com/pypa/pip/issues/5069#issuecomment-450301743 about the trade-offs between an external helper and an internal part of pip still apply though.
If the prototype is intended ultimately for merging into pip, I'd be OK with it using pip's internals, on the understanding that doing so makes it completely unsupported as a long-term external project, and the external form would be purely a way of iterating on the design.
@pfmoore I was thinking that initially the --no-some-option settings could be handled as no_some_option=True, and have a PR for that at https://github.com/reynoldsnlp/pip_inside/pull/5.
However, the issue of tri-state options is a good point to raise, so I've filed https://github.com/reynoldsnlp/pip_inside/issues/6 over on @reynoldsnlp's repo to go into that in more detail (as long as this is living outside pip, I think we want to minimise how much knowledge the wrapper needs about the different install options, but if we assume all non-string options are tri-state, then we can likely live with the error messages that pip will throw for the cases where the CLI is currently a bit inconsistent).
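One possible reading of "assume all non-string options are tri-state" is sketched below. The mapping (None omits the option, a true value passes --option, a false value passes --no-option) is an assumption made for illustration, not something settled in this thread, and tri_state_args is a hypothetical name.

```python
def tri_state_args(**kwds):
    """Map None / truthy / falsy keyword values onto omitted / --opt / --no-opt."""
    cli_args = []
    for key, value in kwds.items():
        name = key.replace("_", "-")
        if value is None:
            # Omitted: let pip apply its own default
            continue
        cli_args.append("--" + name if value else "--no-" + name)
    return cli_args


print(tri_state_args(use_pep517=False))  # ['--no-use-pep517']
```

With this mapping, user=False would produce --no-user, which pip does not accept, so the caller would see pip's own error message for the inconsistent cases.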
@reynoldsnlp Don't worry about potentially holding things up - by creating your project you've already stimulated more progress in the past few days than we'd made in years of kicking the general idea around.
It's one of those problems where simply asking "Will this work well enough? If not, why not?" is a spectacularly valuable contribution, as it lets us break the problem down into more manageable chunks, rather than getting overwhelmed trying to solve the entire thing before we even get started on anything usable :)
> @reynoldsnlp Don't worry about potentially holding things up - by creating your project you've already stimulated more progress in the past few days than we'd made in years of kicking the general idea around.
Absolutely! If I seem like I'm criticising, or trying to tear down what you're doing, I'm definitely not. The work you've done has got me thinking about aspects of pip's CLI that had been bothering me subconsciously for some time, but I'd never managed to articulate properly.
The other thing to keep in mind is that while I've been adding issues to the repo, there are no deadlines or obligations for any of this work.
Hopefully there's a personal learning pay-off for you in getting some code review and project setup recommendations from folks that have been working on Python open source projects for quite a while, but if it ever gets to feeling like more trouble than it's worth, then it's completely fine to say you've done as much as you want to for now, and take a break (either for a specified time or indefinitely).
The code will still be there for someone else to start from, and you've already made a significant contribution just in getting @pfmoore & I to start working out a few more details for a potentially viable API design :)
@ncoghlan and @pfmoore, I definitely did not feel like you were being negative about my work! You've both been extremely helpful. I just noticed that some of your comments in this thread probably took you longer to write, explaining to me what to do, than it would have taken you to just write the code yourselves.
I have wanted to give back to the python community for a long time, and ironically, now that I'm trying to give back, I'm getting even more benefits by getting such great feedback and guidance. :-)
I'm teaching a new course on machine translation this semester, so my time is limited, but I have a goal to work on this at least a little every day.
@reynoldsnlp I find this particular issue hard to work on myself because it means I page the full scope of the problem back into my brain and decide "Nah, I'm gonna go play computer games instead".
Helping someone else learn-by-doing though? That's still fun :)