Hello all,
The situation
Currently, there is no easy way to override the default PyPI index URL to use a URL pointed at a mirror. In corporate environments, requiring developers to use a repository mirror is quite common:
Unfortunately, this doesn't appear to be easily accommodated by pipenv. Although the mirror could be explicitly added to the Pipfile as the source for these packages, this breaks portability.
There should be a way to override the location of the PyPI index, by specifying a (true) mirror. This would only be applicable to PyPI, and not to other third-party repositories (these would still be specified explicitly in the Pipfile).
General proposal
Docker accommodates this situation by allowing the user to specify a registry mirror in the daemon's configuration file. Likewise, it'd be great if the pipenv user could specify a (true) mirror for PyPI, via an environment variable, configuration file, or command line parameter. If this value is set, pipenv should use the mirror for all PyPI packages, even if a connection to PyPI is available. In some corporate environments, PyPI remains unblocked, but policy dictates that the mirror is used for the other reasons mentioned above.
Implementation considerations
Related discussion
This has been discussed in #python and #pypa on Freenode. After some constructive back-and-forth, it was decided that it'd be helpful to open an issue here for discussion. I appreciate everyone's effort towards resolving this issue.
/cc @uranusjr @ncoghlan @altendky @njsmith
I am persuaded that this is a thing that happens commonly (corporate FW / caching proxy). I feel we need an override setting to specify a mirror to use instead of pypi if we find it in the Pipfile, like PIPENV_PYPI_MIRROR or PIPENV_PYPI_CACHING_PROXY or something like that, to specify that it should be tried first: sliced into sources in front of pypi, basically.
Does that seem like it accomplishes the goal? If so, we can tag in the implementation genie to tell us why this is good or bad (@ncoghlan)
I'll start with a note of caution: until PyPI has implemented a package signing mechanism akin to PEP 458 to provide a TLS-independent way for pip to ensure that packages that nominally originate from PyPI actually match what PyPI published, then offering the ability to transparently redirect traffic to a different server is genuinely concerning from a security perspective.
Unfortunately, that particular attack vector is already open by way of pip.conf, so offering something comparable at the pipenv level isn't going to make anything any worse than it already is.
Beyond that, I think a general purpose repository URL rewriting mechanism could actually be easier to document and explain than something PyPI specific, at least at the base capability layer. Something like:
pipenv --override-source-url 'default=https://pypi-proxy.example.com/api' --override-source-url 'https://pypi.python.org/simple=https://pypi-proxy.example.com/api' --override-source-url 'https://pypi.org/simple=https://pypi-proxy.example.com/api' install
(The only PyPI specific bit there would be using "default" to refer to pip's default download source, as specified in pip.conf).
Spelling out the entire source URL override map every time would be unwieldy to use in practice though, so a couple of options for CLI sugar might look like:
pipenv --override-source-urls <config file> install
pipenv --pypi-mirror https://pypi-proxy.example.com/api install
Whether or not to expose the --override-source-url layer immediately is a different question. It might make more sense to implement the simpler --pypi-mirror option first, and merely keep the possibility of --override-source-url and --override-source-urls as possible future options in mind while doing so.
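To make the layering concrete, here is a minimal sketch of how the --pypi-mirror sugar could expand into the same {given URL: override URL} map that repeated --override-source-url options would build. The function names and the PYPI_URLS tuple are hypothetical and are not pipenv's actual implementation; the URLs are the ones quoted in this thread.

```python
from typing import Dict, Iterable

# Assumption: the PyPI index URLs mentioned in this thread.
PYPI_URLS = ("https://pypi.org/simple", "https://pypi.python.org/simple")


def parse_overrides(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn 'given=override' strings into a {given URL: override URL} map."""
    overrides = {}
    for pair in pairs:
        # Split on the first '=' only, so the override URL may contain '='.
        given, sep, override = pair.partition("=")
        if not sep or not given or not override:
            raise ValueError(f"expected 'given=override', got {pair!r}")
        overrides[given] = override
    return overrides


def pypi_mirror_overrides(mirror: str) -> Dict[str, str]:
    """Expand the --pypi-mirror sugar into the equivalent override map."""
    overrides = {"default": mirror}
    overrides.update({url: mirror for url in PYPI_URLS})
    return overrides
```

With this shape, the simple option and the general option share one internal representation, so the general map could stay an implementation detail until it is worth exposing.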
A general {given URL: override URL} mapping was my first thought too, but on further consideration, there are some arguments for special-casing PyPI:
PyPI is pretty unique in having a well-known public URL and lots of mirrors
PyPI actually has multiple URLs (e.g., we'll probably have Pipfiles floating around for a while with both https://pypi.python.org/simple and https://pypi.org/simple, and maybe also https://pypi.python.org/simple/ and https://pypi.org/simple/ with the trailing slash?), and it'd be nice if we could solve this once instead of forcing each user to figure it out themselves
@njsmith See the --pypi-mirror <URL> sugar suggestion in the last part of my post. If the initial implementation focused solely on that, then the general URL rewriting capability could start out as an internal implementation detail (driven by the fact that "PyPI" has multiple URLs that all ultimately resolve to the same place), and then be considered for exposure as a feature in its own right later on (after it's been confirmed that it's working as desired for the primary --pypi-mirror use case).
Ah, right, I missed that :-)
Is there a general rule mapping command line arguments to some kind of more persistent configuration? I imagine that most users of this would want to set it up once and then forget it.
@ncoghlan wrote:
I'll start with a note of caution: until PyPI has implemented a package signing mechanism akin to PEP 458 to provide a TLS-independent way for pip to ensure that packages that nominally originate from PyPI actually match what PyPI published, then offering the ability to transparently redirect traffic to a different server is genuinely concerning from a security perspective.
If I'm reading my Pipfile.lock correctly, there is no relationship stored between a package and which source it was installed from. Given that the existing feature set allows multiple sources to be specified, isn't that creating a similar issue? A sync could end up getting a package from a different source than the one that was used when creating the lockfile.
Pipfile.lock stores a list of acceptable artifact hashes for each pinned dependency, so once you've done a lock, surreptitiously replacing packages is difficult. At lock generation time, explicitly opting in to a source in Pipfile is saying "I trust this source not to mess with me, and will use TLS to verify that I'm actually talking to this point of origin". (I think there's an issue somewhere discussing the prospect of binding particular packages to particular source repos, although it may be in pip or one of the other PyPA repos, rather than here.)
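As an illustration of why the locked hashes make substitution difficult, here is a minimal sketch of the check. It assumes lock entries of the form "sha256:<hex digest>"; the function name is hypothetical and this is not the code pip or pipenv actually run.

```python
import hashlib
from typing import Iterable


def artifact_is_acceptable(data: bytes, allowed_hashes: Iterable[str]) -> bool:
    """Return True if the artifact's sha256 digest is among the locked hashes.

    Assumes each entry looks like "sha256:<hex digest>".
    """
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    return digest in set(allowed_hashes)
```

A mirror that serves different bytes for a pinned package would fail this check, whichever source the download came from.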
Changing the default index URL (or adding an extra index URL) in pip.conf, or using the override feature proposed here through a config file or shell profile based mechanism is different: that's saying "I, or some arbitrary process I ran at some point in time with write access to my home directory (such as an sdist's setup.py file), decided to configure my settings to trust this source of packages". And even a signing scheme like PEP 458 isn't a complete defence against those kinds of shenanigans if the public keys used for verification are themselves stored somewhere inside your home directory rather than in a directory that requires elevated privileges to modify.
There are good reasons why organisations with strict security requirements execute builds on locked down servers with only limited access to the internet at large, or otherwise monitor for these kinds of problems at the network level :)
Note also that if you use multiple indexes and a package comes from the non-primary index, it will be indicated in the lockfile.
The PEP 458 concerns were essentially what I had in mind, since different URLs that in actuality point at PyPI are not the same thing as a local copy of PyPI that you claim is the same.
I, or some arbitrary process I ran at some point in time with write access to my home directory (such as an sdist's setup.py file), decided to configure my settings to trust this source of packages
If this is your threat model, then I don't see how anything pipenv can do will affect it much. Someone who can modify your home directory config can also do things like insert a new directory on $PATH and insert a fake pipenv in there that does whatever they want.
@njsmith this is also pip's threat model, because package installation requires that the execution of arbitrary code from sdist setup.py files be allowed. That code indeed could overwrite things in your home directory like your settings, or add things to your path, or any number of things. That's why explicitly privileging PyPI (a known, trusted index) and requiring hash checking is a good step toward security. It allows centralized control and elimination of known security threats, and identity verification of the packages you are downloading in a distributed fashion. What did the lockfile you downloaded say about the hash you should be getting? It doesn't match what you're getting from the index? In order for this mode of operation to fail you need to have failures at more than one of the local machine, index and network layers, because you're talking about having multiple corrupted packages in your application stack working in concert, verifying hashes against a trusted index, and in many cases the hashes themselves came from yet another uninvolved source. So now you need to have, at a minimum, all of the hash checking in both pip and pipenv somehow tampered with such that it generates hashes that are identical to the ones you are hoping for, but installs yet other malicious things?
I guess what I'm saying is, if your local machine is compromised there is nothing pip or pipenv is going to do to save you. But we can ensure that the package you're downloading is the one you were looking for, from the place you were supposed to search for it, which can provide one element in the chain of security.
@ncoghlan @njsmith how does this all factor in with the move to push back against sudo pip install... and the general sense I think we all have that if you're going to use pip, you probably shouldn't also use your system package manager to install Python things, broadly speaking? This isn't really a pipenv question maybe, but it's where the discussion is right now and this might guide the next steps...
@techalchemy I don't see any connection to this topic at all? I think the conclusion of all the above is that letting users override which mirror pipenv uses for PyPI doesn't introduce any additional threats, and doing sudo pipenv doesn't even make sense in the first place, right?
@njsmith no, I don't think anyone should use sudo pipenv. Like I mentioned, it's not really on topic, but since we went a bit down the threat model path, I thought it was worth exploring. Specifically:
And even a signing scheme like PEP 458 isn't a complete defence against those kinds of shenanigans if the public keys used for verification are themselves stored somewhere inside your home directory rather than in a directory that requires elevated privileges to modify.
There are good reasons why organisations with strict security requirements execute builds on locked down servers with only limited access to the internet at large, or otherwise monitor for these kinds of problems at the network level :)
If a defense at least in some capacity relies on keys being stored in a privileged location, but we are advising against using privileged Python installs, I think it's possibly worth discussing. Maybe I'm wrong. But it definitely seems related to @ncoghlan's comment (but not sudo pipenv, that should never be a thing).
Yeah that probably seemed like it came out of nowhere, just a random thought. Hopefully the additional context clears it up some
I vote we keep this issue on the topic of helping folks who need to use PyPI mirrors, rather than getting into a speculative discussion of how we might implement TUF. (Anyway, I don't think there's much we can or should do to try to defend against an attacker who has arbitrary write access to the user's home directory.)
Okay, so let's define the behavior that we would expect or prefer. My current working understanding is that:
If --pypi-mirror is passed or PIPENV_PYPI_MIRROR is set, we should prefer that
It should override PyPI only, not other URLs. I guess there are probably only a few different PyPI URLs in use, so they can be listed, and if we miss one then someone will file a bug, it'll get added, and pretty soon we'll have all of them.
Seems like the right approach to me.
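A minimal sketch of that lookup might look like the following. The thread does not say which setting wins when both are present, so letting the command line value take precedence over the environment variable is an assumption here, and resolve_pypi_mirror is a hypothetical name rather than a pipenv function.

```python
import os
from typing import Mapping, Optional


def resolve_pypi_mirror(
    cli_value: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the mirror URL, or None if no mirror was requested.

    Assumption: an explicit --pypi-mirror value beats PIPENV_PYPI_MIRROR.
    """
    if cli_value:
        return cli_value
    # Treat an empty environment variable the same as an unset one.
    return environ.get("PIPENV_PYPI_MIRROR") or None
```

Returning None when nothing is configured lets callers leave the Pipfile sources untouched in the common case.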
What @njsmith said matches my perspective as well. The 3 repo URLs I'd suggest replacing in an initial PR would be:
[...] (pip's default setting)
The trailing-slash-or-not is likely better handled as a URL normalisation step, rather than by listing the URLs separately.
Note that the requests Pipfile does have a trailing slash (at time of writing), so we probably do need to handle this one way or another.
Right, my thought was:
[...] (str.rstrip would likely be good enough for the task, even though it would remove an arbitrary number of trailing slashes, or else we could be stricter about it, and remove at most one trailing slash)
Awesome. I think this is enough to work with and simple enough to build. Thanks all!
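The two trailing-slash options mentioned above can be compared with a small sketch. The helper names and the PYPI_URLS set are hypothetical; only str.rstrip and the URLs themselves come from the discussion.

```python
# Assumption: the PyPI index URLs mentioned in this thread, stored without
# a trailing slash.
PYPI_URLS = {"https://pypi.org/simple", "https://pypi.python.org/simple"}


def strip_all_trailing_slashes(url: str) -> str:
    """Lenient normalisation: str.rstrip removes every trailing slash."""
    return url.rstrip("/")


def strip_one_trailing_slash(url: str) -> str:
    """Strict normalisation: remove at most one trailing slash."""
    return url[:-1] if url.endswith("/") else url


def is_pypi_url(url: str, normalise=strip_all_trailing_slashes) -> bool:
    """Check a source URL against the known PyPI URLs after normalisation."""
    return normalise(url) in PYPI_URLS


print(is_pypi_url("https://pypi.org/simple/"))   # True
print(is_pypi_url("https://pypi.org/simple//"))  # True (lenient)
print(is_pypi_url("https://pypi.org/simple//", strip_one_trailing_slash))  # False (strict)
```

The two variants only differ on URLs with repeated trailing slashes, which is why the lenient version is probably good enough in practice.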
Hope mirror feature could be added soon~
I am encountering this issue as well. The situation is:
My deployment strategy already sets up a system-wide pip.conf that refers to the internal PyPI server. Surprisingly, I found that this configuration is ignored by Pipenv.
I'm noticing that if I were to move or rename the internal PyPI server, then several applications with Pipfiles would have to be updated and their Pipfile.lock files regenerated. A mirror option would provide the desired functionality. It would also work, and feel less redundant, if Pipenv could just read the system configuration for pip.
PRs welcome on this one btw
Hi. I have the same need but I would split this override feature into another ticket.
Here is my expected behavior proposal:
And in a second ticket, the --override options can be implemented. It makes sense, for example, inside a CI or something.
As a side note: we heavily use pipenv in production now, but I need to remind everyone too often that they need to change their Pipfile manually when they start a new project to hit our Artifactory PyPI repository (for information, Nexus also does a PyPI cache for free and it works great!). We have a very limiting firewall and it is a very good practice inside a company to cache external dependencies, so they can be backed up and checked for vulnerabilities, for instance.
If there were a simple feature similar to a general or user configuration file (like we already have for pip or npm) that we could deploy on all our workstations so our developers make fewer mistakes, that would be perfect for me.
Maybe I missed something, but this seems like a regression. We've been on 11.6.0 for a while, and pipenv happily delegated to the settings in our pip.conf, which point to an internal pypi mirror.
Any idea when this broke? It makes pipenv completely unusable in our context. I'm having trouble seeing this as a "missing feature" when it was apparently working fine for a long time.
To be clear: after upgrading to 2018.05.18, even with the mirror specified in our Pipfile[.lock], pipenv tries to install new packages from pypi.org.
Maybe what I'm seeing is a separate issue from this one...
@brettdh It is hard to tell without seeing your environment, but I'd think it is not the same issue. I'd suggest you do some bisecting between releases to see exactly where this changed, and open a new issue for it.
I'm working on the PR for this.
I do think this was regressed vis-a-vis the default setting. It may have been caught in a wave of updates for pip 10 which are not released yet, but I believe we can pick this up without too much difficulty if @JacobHenner isn't already adding it.
I presume you're talking about using devpi as a caching proxy for the official PyPI. For pip itself, you would need to modify /etc/pip.conf and /usr/lib64/python3.6/distutils/distutils.cfg for pip to use your local devpi server for all requests.
However, it looks like pipenv ignores these system-wide settings, so you are forced to modify the [[source]] config setting in Pipfile to reference your devpi server. But then if you publish your Pipfile externally, external contributors have to remove your [[source]] settings to actually build their own environment.
I think that pipenv should just respect the global settings from /etc/pip.conf and /usr/lib.../distutils.cfg
@polski-g
I presume you're talking about using devpi as caching proxy for official PyPi
Nexus Repository, but yeah, same idea.
However, it looks like pipenv ignores these system-wide settings
As @techalchemy mentioned, I believe that pipenv (11.6.0) used to respect pip.conf (homedir as well), but the latest version does not. Specifically, there's a hard-coded pypi.org URL somewhere (dependency resolution, IIRC) that can't be overridden.
I think that pipenv should just respect the global settings from /etc/pip.conf and /usr/lib.../distutils.cfg
Agreed, though personally I haven't had to modify distutils.cfg in my use case.
IIRC there was a resolution to not respect pip.conf, but you'll need to dig deep into the issue tracker to find it. In any case, the ship has sailed, and with PyPI mirroring almost done, this is unlikely to change in the near future.
I'm fairly confident this feature will ship in the next release (which will ship in the next day or two with luck)
Also, I'm not sure about this, but it's possible we might just need to call .load() after we create the config parser here to get the config defaults:
https://github.com/pypa/pipenv/blob/master/pipenv/project.py#L573-#L577
@uranusjr as long as the mirroring configuration works (i.e. doesn't use that hardcoded pypi.org URL I mentioned), I don't see any problem with pipenv having its own configuration for this and ignoring pip's.
@brettdh Would you be able to check out my branch and confirm it meets your use case in your environment?
@JacobHenner yep, thanks. My initial testing with the --pypi-mirror option (pipenv install, pipenv lock) looks like it works fine. I left a small suggestion on the PR.
I'm a bit concerned, though, that hardcoded URLs to pypi.org still appear scattered across the pipenv sources. I can't be sure which ones are correctly overridden from [[source]] entries, and I can't remember exactly which workflow caused my issue above. So it's hard to tell if it's fixed. 😬
Yeah, following this release I am planning a major code cleanup. CLI stuff moving to the CLI, bubbling exceptions there and handling all the exits there, deduping duplicated code, etc. It's going to be a lot of work and help will be appreciated if anyone wants to volunteer :p
Just pulled the recent version and it is still hardcoding the pypi.org in the sources. Is the goal to take the environmental variable or the pypi-mirror and put that as the default for [[source]]?
edit:
Just dug through the code.. Looks like you have
if PIPENV_TEST_INDEX:
    DEFAULT_SOURCE = {
        u"url": PIPENV_TEST_INDEX,
        u"verify_ssl": True,
        u"name": u"custom",
    }
else:
    DEFAULT_SOURCE = {
        u"url": u"https://pypi.org/simple",
        u"verify_ssl": True,
        u"name": u"pypi",
    }
I think if you changed that "if PIPENV_TEST_INDEX" check to use the environment variable PIPENV_PYPI_MIRROR, it would be a good start.
The solution discussed here has long been implemented. The snippet you quoted is a default, i.e. used if you do not provide a source when creating the Pipfile.
No, the source should not change in the Pipfile. The goal of this change was to allow users to override PyPI URLs with a mirror, _without_ changing the Pipfile.
@JacobHenner The mirror handling code postprocesses the source list and replaces pypi.org URLs with references to the specified mirror.
That's what allows the mirror override to work even if there is an explicit pypi.org entry in the Pipfile. pipenv then relies on that same logic to override its own default source as well.
If there are currently cases where that postprocessing isn't being applied correctly, that's a new bug report against the already implemented feature, rather than a feature request.
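The postprocessing described above can be sketched roughly as follows. This is an illustration under assumptions, not pipenv's actual code: apply_pypi_mirror and PYPI_URLS are hypothetical names, and sources are modelled as plain dicts with the url, verify_ssl and name keys seen in the DEFAULT_SOURCE snippet earlier in the thread.

```python
from typing import Dict, List

# Assumption: the PyPI index URLs mentioned in this thread.
PYPI_URLS = {"https://pypi.org/simple", "https://pypi.python.org/simple"}


def apply_pypi_mirror(sources: List[Dict], mirror: str) -> List[Dict]:
    """Return a copy of the source list with PyPI URLs swapped for the mirror.

    Non-PyPI sources pass through unchanged, and the input list (that is,
    the data read from the Pipfile) is never modified.
    """
    rewritten = []
    for source in sources:
        if source["url"].rstrip("/") in PYPI_URLS:
            # Copy the entry so only the in-memory view points at the mirror.
            source = dict(source, url=mirror)
        rewritten.append(source)
    return rewritten
```

Because the rewrite happens on a copy at load time, the Pipfile on disk keeps its portable pypi.org entry, and the default source goes through the same path.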
I think that last comment was intended for @kylecribbs?
@JacobHenner Ah, sorry - I misinterpreted your comment as saying that this change hadn't achieved its original goal, rather than as a response to Kyle that aimed to clarify what that outcome actually was.