"pip install xxx --download yyy --no-deps" runs "python setup.py egg_info" after downloading (to generate the requires.txt dependency list?). This seems unnecessary.
This is a problem in the case of "cryptography", since its setup.py uses setup_requires, which automatically downloads its dependencies at the slightest nudge, even just --help (cryptography issue #716).
This impedes using pip install --download to build a collection of trusted (checksummed) modules for later installation.
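For illustration, the "collection of trusted (checksummed) modules" described above could be recorded with a small manifest script. This is only a sketch: the helper names sha256_of and build_manifest are hypothetical, and it assumes the archives have already been downloaded into a single directory.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(download_dir: Path) -> Dict[str, str]:
    """Map each file name in download_dir (hypothetical layout) to its digest."""
    return {
        p.name: sha256_of(p)
        for p in sorted(download_dir.iterdir())
        if p.is_file()
    }
```

Hashing only reads the archive bytes, so nothing inside the downloaded packages is executed at this step.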
The python setup.py egg_info command is run for metadata discovery. It's an unfortunate fact of the Python packaging ecosystem that anything related to packaging always involves arbitrary code execution (referring to setup.py). The setuptools setup_requires feature is a particularly nasty example of this (don't get me wrong, I understand it has its uses, but it's still nasty). So I believe this should be fixed on the side of the cryptography package, because we can't expect pip to reasonably deal with setup_requires:
Setup requirements are installed whenever the setup.py script is evaluated in _any_ way, even if you run something very simple which seems like it should be a read-only operation, e.g. python setup.py --help or python setup.py --version.
As a bonus, setuptools uses easy_install for setup requirements, which can cause several problems: using pip install --no-index will not stop easy_install from downloading remote package archives, and using pip install --index-url=... will not instruct easy_install to use the custom index URL; instead the URL has to be set in ~/.pydistutils.cfg (you can't even configure it using an environment variable).
python setup.py --help-commands. Setuptools can't decide what setup requirements will do (the possibilities are unbounded :-) so setuptools errs on the side of caution and always installs setup requirements before doing anything else with setup.py.
setup.py script).
It is a shame that the setup_requires feature of setuptools is so rudimentary; it would have been nice if there was a canonical and supported way to tell setuptools which setup commands really need setup requirements. That might have avoided this mess.
@xolox thanks for the detailed description, that helped me understand what was going on here.
Another unfortunate side effect of the behavior reported by @tlynn is that we can't run parallel downloads for packages that share transitive dependencies. When using something like
pip install --exists-action=i --download-dir ./download-cache <lots of dependencies that share transitive dependencies>
one sees lots of OSErrors about paths not existing, presumably because invocations of setup.py egg_info on packages that are still being downloaded result in files not being found.
This is probably a convoluted example, but I figured I'd share my use case here.
When we implement this, we should take care to prevent path traversal bugs like mentioned in https://github.com/pypa/pip/issues/731.
@dstufft Can this be closed since the --download flag has been removed?
I think the issue is still valid for pip download --no-deps?
I guess so.
What about not running setup.py egg_info on pip install --no-deps? (Even without --no-download)
That would allow working around a current pylxd issue https://github.com/lxc/pylxd/issues/308#issuecomment-394951850
how about additionally not running egg-info if the tarball ships an egg_info file to begin with
i believe these days the setuptools-generated packages can be trusted
however this should be introduced over a grace period where pip warns about differences between shipped and freshly generated
i just ran into this when trying to use pip as sane caching downloader of other stuff
(it's just really neat for travis use since the download uses the pip cache which is cached + all the bells and whistles)
i just tried to disable this for downloading and the issue spans resolver, preparer and a few other utilities - i will need to retry with more time
for an sdist the internals will always fetch a class IsSDist(DistAbstraction) from pip._internal.operations.prepare; its prep_for_dist will always run egg_info and then assert_source_matches_version
with my limited understanding of the internals this is no longer a small saturday morning hack but rather a more nuanced refactoring and i have to pull out
As well as setup_requires, this also affects pyproject.toml. pip download --no-deps will download and install anything listed in there. You can stop this by passing --no-build-isolation, but that shouldn't be necessary because the command didn't ask for a build in the first place.
@RonnyPfannschmidt, I don't think we can trust PKG-INFO or similar without some indicator that it's OK. I made a thread over here to discuss one possible approach.
Updated title to reflect current CLI.
I guess this bug will be old enough to drink and vote some day ;)
If anyone else needs, there's a workaround demonstrated here (warning: very hacky monkeypatch)
https://stackoverflow.com/a/60325681/674039
The hack no longer works :(
@wimglenn seems to be worse:
did not work with e.g. pip download --no-deps pygobject
Seriously, stuff works fine for 18 versions then basic stuff breaks...
Yeah, confirmed as of about pip 20.1 the patch no longer suffices.
I modified download.py inside the pip distribution to actually check no_deps and pass it down to sdist by adding the parameter to the RequirementPreparer...
And it works!!
OFC this was done manually on my user-installed version of pip, not on top of a git repo or anything. So I don't have a patch right now, per se...
If anyone wants to file a PR to fix this instead of complaining how this hasn't been fixed / doesn't work, that'd be appreciated. Otherwise, please understand that you're not contributing positively to this issue and making it less likely that this gets fixed (read to understand why).
@pradyunsg I've provided a roadmap to a fix. That maybe helps someone someday down the road?
Everything I have said is true. I don't have time to fix the code, maybe someone else can submit a patch with the work I've done.
The simple patch works for you, but not for pip in general, since it needs to deal with way more edge cases. This is basically #7995 but for legacy setuptools projects. The root problem is that source distribution metadata is not trustworthy, and it's difficult to avoid building metadata since pip needs to check for package integrity. The thing we really need to do before any of this can reasonably happen is to have standardisation on essential sdist metadata (namely package name and version) somehow. There have been efforts on this; feel free to contribute to them.
I still don't understand, why does pip need to check for package integrity on pip download --no-deps? The user just wants to download the artifact, and get whatever release file it would have otherwise found (given the python version and platform, plus any relevant pip config such as extra index url). In practice the name and version are right there in the filename anyway. I understand that there can be pathological cases where the filename lied about the package metadata, but why should that be a blocker? That's not really a concern for someone just using pip as a client to download a distribution.
pip download foo-1.0 could find a file foo-1.0.tar.gz which contained a project called bar, version 2.0.
Pip has to get the package metadata (by building) to confirm that the filename matches the metadata. One of the standardisation efforts @uranusjr is talking about is to make this case invalid, which would remove the need for this particular check.
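The mismatch being described can be illustrated with a small sketch that compares the name and version implied by an sdist filename with the Name and Version fields of the PKG-INFO file shipped inside the archive. This is not how pip performs the check: as noted earlier in the thread, pip builds the metadata because the shipped PKG-INFO is not considered trustworthy. Both helper names are hypothetical, and a .tar.gz sdist with PKG-INFO one level below the archive root is assumed.

```python
import tarfile
from email.parser import Parser
from typing import BinaryIO, Optional, Tuple


def name_version_from_filename(filename: str) -> Tuple[str, str]:
    """Split 'foo-1.0.tar.gz' into ('foo', '1.0'); assumes a .tar.gz sdist."""
    stem = filename[: -len(".tar.gz")]
    name, _, version = stem.rpartition("-")
    return name, version


def name_version_from_pkg_info(fileobj: BinaryIO) -> Optional[Tuple[str, str]]:
    """Read Name/Version from the shipped PKG-INFO without running any code."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            # Assumption: PKG-INFO sits exactly one directory below the root.
            is_top_level = member.name.count("/") == 1
            if member.isfile() and is_top_level and member.name.endswith("/PKG-INFO"):
                extracted = tar.extractfile(member)
                if extracted is None:
                    return None
                # PKG-INFO uses RFC 822 style headers, so the email parser can read it.
                headers = Parser().parsestr(extracted.read().decode("utf-8"))
                return headers["Name"], headers["Version"]
    return None
```

A caller would compare the two tuples and report an error when they differ, which is the foo-1.0.tar.gz versus bar 2.0 situation from the comment above.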
It could, but who is actually doing that in practice? It seems like a hypothetical edge case that doesn't matter much, at least not to this issue.
With the setup.py execution you could also make a dist that's foo v0.1 usually but calls itself bar v0.2 on Tuesdays. Just because that's possible it doesn't mean that pip needs to attempt to check for that.
It is a hypothetical edge case, that's the point. We're trying to standardise stuff so that people can't do stupid things like that, and we can assume everything's consistent, but until we get that done, we need to check so that we can give a proper error. The alternative is to risk breaking people's systems.
Maybe a "no safety net" pip would be better for some users. I honestly don't know. As a pip developer it's very frustrating having to consider edge cases like this, but every time we think we can assume things are sensible, we get a bug report saying someone did what we didn't expect 🙁
I can see how that could risk breaking pip install, but I still don't see how it's relevant to this issue? If pip download --no-deps foo==1.0 should find a file foo-1.0.tar.gz which actually contained a project called bar, version 2.0, then I would still expect (and want) the same thing from pip, i.e. to download the file foo-1.0.tar.gz to current working directory _and not extract nor execute any of the code inside_.
I can see how that could risk breaking pip install, but I still don't see how it's relevant to this issue?
Mostly because there's so much shared code that having a special case for download --no-deps (and precisely that only) would be pretty difficult.
Feel free to take a look at the code and if you think you can work out how to do it cleanly, propose a PR. But I'll warn you that there's a good chance it'll get rejected, so unless you find an exceptionally clean solution, you should be prepared to see it as mainly a learning exercise...
I would still expect (and want) the same thing from pip, i.e. to download the file foo-1.0.tar.gz to current working directory
Honestly, why not just get the PyPI URL and download it directly? You seem to be going to a lot of effort (and expecting others to as well) to basically download a file whose name you know. (If you need multiple files, a quick script to read the PyPI index to get the target URL wouldn't be hard, either). Not every interaction with PyPI needs to be via pip...
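A minimal sketch of the "quick script" idea, assuming a pinned project name and version and using PyPI's JSON API. The helper names fetch_release and sdist_url are hypothetical. The sketch deliberately ignores Requires-Python, wheel compatibility tags and any custom index configuration, so it will not necessarily select the same file that pip would.

```python
import json
from typing import Optional
from urllib.request import urlopen


def fetch_release(name: str, version: str) -> dict:
    """Fetch release metadata from PyPI's JSON API (performs a network call)."""
    with urlopen(f"https://pypi.org/pypi/{name}/{version}/json") as resp:
        return json.load(resp)


def sdist_url(release: dict) -> Optional[str]:
    """Return the URL of the first sdist listed for the release, if any."""
    for entry in release.get("urls", []):
        if entry.get("packagetype") == "sdist":
            return entry["url"]
    return None
```

Calling sdist_url(fetch_release(name, version)) gives a URL that can be fetched directly, without unpacking the archive or running its setup.py.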
Honestly, why not just get the PyPI URL and download it directly?
Hi Paul, the reason not to just get the PyPI URL and download directly is that I want to get the same file that pip install would have chosen. And I don't know the filename ahead of time; the input is not necessarily a project name + version (pinned) but a general requirement specifier.
It seems easy enough to just write a simple PyPI client, but in reality there are several factors at play here, and you would end up reinventing a lot of the code already in pip to do it correctly. Requires-Python metadata means you need to know about the runtime of pip, the compatibility tags for wheels are quite involved to handle correctly, and there are pip's possible configurations such as PIP_INDEX_URL and PIP_EXTRA_INDEX_URL.
So I figure the only way to reliably download the correct release file (correct meaning "same one that pip would choose") is to use pip itself. Since there is no public API here, that means using the command line interface in a subprocess.
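A rough sketch of that subprocess approach follows. The function names are hypothetical; the only pip options used are --no-deps and --dest. As this issue describes, pip may still run build steps for an sdist when invoked this way, so the sketch shows the calling pattern rather than a way around the problem.

```python
import subprocess
import sys
from typing import List


def pip_download_command(requirement: str, dest: str) -> List[str]:
    """Build the argv for 'pip download --no-deps' using the current interpreter."""
    return [
        sys.executable, "-m", "pip", "download",
        "--no-deps",
        "--dest", dest,
        requirement,
    ]


def pip_download(requirement: str, dest: str) -> None:
    """Run pip in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.run(pip_download_command(requirement, dest), check=True)
```

Running pip through sys.executable -m pip keeps the download tied to the same interpreter, and therefore the same compatibility tags and pip configuration, as the calling process.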
Mostly because there's so much shared code
Erm, I don't understand all the stuff about the sdist and metadata.
My patch basically is just to turn off the sdist stuff on --download-only - line 33 in sdist.py goes from:
should_isolate = self.req.use_pep517 and build_isolation
to
should_isolate = self.req.use_pep517 and build_isolation and no_deps
You read in the flag instead of throwing it away (and overloading all the appropriate functions) and voila, it works?
Seems pretty clean to me.
This is basically what I had said before:
I modified download.py inside the pip distribution to actually check no_deps and pass it down to sdist by adding the parameter to the RequirementPreparer ...
As for writing a client - that's a terribly short-sighted solution. Pip servers can change, retrieval mechanisms can change, certificates, security mechanisms, etc., not to mention the code duplication, needing to keep two packages up to date versus one...
Again I don't get this metadata thing and I think perhaps this is where the whole over-complication comes from (sounds like you have something crazy going on there?), but I think even in the case that metadata has some magic to it 'download-only' equates to 'turn the magic off' - there's no need to run the magic beyond resolving the package name => file package?
@uranusjr
I read some of the threads you'd posted - seems to me --download-only or --no-deps should just short circuit all of the metadata logic (in sdist).
Why?
Because this use-case sidesteps all that - I don't care what the filename is or the format of sdist, I just want to know what pip thinks the package is, and give me that, let me figure out all the rest. If it can't find it or is unsure, I expect a reasonable warning or error. And then it doesn't need to step on anyone's complex backend build scripts which are doing fancier things with pip and require integrity etc.
If I'm downloading the source myself, there's a good chance I really don't care at all about integrity - I'm going to build it myself and possibly patch it myself because whatever the official packaging did didn't work and may never work (on my system). So I think the concerns here are all irrelevant? And I would propose fixing sdist but for now just ignoring its broken-ness, for a download.
https://github.com/pypa/pip/compare/master...smaudet:bugfix-no-deps-download-only
This is a potential patch against 20.1.1 - probably could be ported to master/whatever but this seems really minimal, tries to assume as little as possible, and also might help fix a lot of your other bugs that are all open around the flags all doing weird things here?
Cheers.
I really don't see why build isolation should be affected by the value of --no-deps. This seems like it should be wrong to me. Does it pass the full test suite?
@pfmoore Hmm? Seems logical to me...
download implies no build and hence no build isolation, and no-deps implies don't even try to get dependency packages, which it seems would be the only reason to have build isolation (why else would anything try to build?)
It's unclear what download should do if you do want deps, perhaps download every dep (with no build and hence no build isolation)? That wasn't my use case... but I wasn't trying to fix the entire pip porcelain.
--no-deps isn't just for downloads. Maybe in the context of download it makes sense for --no-deps to imply no isolation, but IMO that's a bit too much context sensitivity in how the option should be interpreted, and it would be more confusing overall to users.
I wasn't trying to fix the entire pip porcelain
Understood - but that's the problem, whoever implements this does have to consider the whole picture...
Not to belabor this, but
whoever implements this does have to consider the whole picture...
I think you are not considering that part of that whole picture, is the now where the porcelain is broken. "Don't let perfection be the enemy of good" ...
Anyways, I'm sure I don't have enough clout to convince you (plural) that you are wrong, but ... I am of the firm belief that the community is sitting around here trying to be perfect instead of being practical or pragmatic.
I didn't attempt to PR anything for a _reason_ - if someone else wants to fight this political battle good for them, for everyone else that encounters this bug, I hope my patch helps you. :)
I think you are not considering that part of that whole picture, is the now where the porcelain is broken. "Don't let perfection be the enemy of good" ...
You may be right. It's certainly true that there's no obvious reason why the specific case of pip download --no-deps that you're pointing out couldn't just download, and that would fit better with user expectations.
I don't care enough about this use case to implement a fix myself (personally, I'd just go to PyPI and download the files directly). If someone wants to submit a PR, and is willing to discuss/respond to larger code quality and maintainability issues, then we can see where it goes.
But understand, my reservations aren't about making things perfect, they are about maintainability questions like:
Anyway, as I say, this won't go anywhere until someone is interested enough to write some code, so let's leave it there.