In order to work out the dependencies that it needs to download, pip already knows how to interrogate wheel METADATA files and setup.py egg_info invocations. Once PEP 517 reaches an acceptable state, pip will presumably also gain support for extracting this metadata from pyproject.toml-based sdists.
Currently, there's no straightforward way for a user to request the PEP 345 dependency metadata for a project of interest that takes advantage of features like pip's cache of downloaded sdists and built wheel files.
Independently of any future enhancements to PyPI to make this kind of information available over HTTPS, it could be useful to offer something like a client-side pip metadata command that extracts the METADATA info and writes it to stdout.
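To make the idea concrete, here is a minimal sketch of the extraction step on its own, assuming the wheel is already available locally (for example from pip's cache). The function names are made up for illustration and are not part of pip; the sketch only relies on a wheel being a zip archive whose METADATA file uses RFC 822 style headers.

```python
import io
import sys
import zipfile
from email.parser import Parser


def read_wheel_metadata(wheel_file):
    """Return the parsed METADATA message from a wheel (a zip archive).

    Hypothetical helper for illustration; not a pip API.
    """
    with zipfile.ZipFile(wheel_file) as archive:
        # A wheel keeps its metadata at <name>-<version>.dist-info/METADATA.
        candidates = [
            name for name in archive.namelist()
            if name.endswith(".dist-info/METADATA") and name.count("/") == 1
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one top-level METADATA file")
        text = archive.read(candidates[0]).decode("utf-8")
    # METADATA uses RFC 822 style headers, so the email parser can read it.
    return Parser().parsestr(text)


def print_dependencies(wheel_file, out=sys.stdout):
    """Write one Requires-Dist entry per line to the given stream."""
    metadata = read_wheel_metadata(wheel_file)
    for requirement in metadata.get_all("Requires-Dist") or []:
        out.write(requirement + "\n")
```

Something like print_dependencies("some_project-1.0-py3-none-any.whl") would then list the declared dependencies; locating, downloading, or building the wheel is the part pip would have to supply.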
Would extending pip show be enough for this?
Off topic: would having this metadata stored statically in the sdist be covered in an sdist 2.0 PEP, or as a part of PEP 517?
@pradyunsg Yes, I think pip show --metadata <package> would be a decent way of spelling it.
As far as static extraction goes, this info is usually already there as PKG-INFO, but it isn't 100% reliable, since the key requirement for making an installable sdist is to provide a setup.py command that does the right thing.
@dstufft Does Warehouse currently extract file listings from uploaded archives? I'm thinking there are some interesting questions we could ask & answer around the actual formats used to publish sdists based on that data (most notably, setup.py vs setup.cfg vs PKG-INFO, and various combinations thereof).
If Warehouse doesn't have it, I'll see what kind of access I can get to the openshift.io data set. I know we're extracting full archive manifests, but I'm not sure whether it's currently possible for me to run arbitrary queries over that data.
Yes, I think pip show --metadata would be a decent way of spelling it.
Awesome!
2 questions:
<package> would be a proper PEP 440 Version Specifier and the distribution whose metadata is shown would be the latest among what would be selected? (yes?)
pip show is currently limited to querying installed packages, which means I'm now less certain as to the suitability of using it for this purpose and would defer to folks like @dstufft and @pfmoore on that front.
It may make more sense to offer a new pip pkginfo subcommand that uses the selection criterion you describe, downloads (or builds) and caches the wheel file, then emits the parsed metadata.
As far as formatting goes, I agree it would make sense to emit a JSON-ified version of METADATA, rather than the Key:Value format used by PEP 345 and pip show.
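As a rough illustration of what "JSON-ified" could mean, the sketch below turns Key:Value metadata text into a dict that json.dumps can serialise. The key naming (lowercase, with hyphens replaced by underscores) and the function name are assumptions made for this example, not an agreed format; the set of repeatable fields follows the "multiple use" fields in PEP 345.

```python
import json
from email.parser import Parser

# Fields that PEP 345 allows to appear more than once; they become JSON lists.
MULTIPLE_USE_FIELDS = {
    "platform", "supported-platform", "classifier", "requires-dist",
    "provides-dist", "obsoletes-dist", "requires-external", "project-url",
}


def metadata_to_dict(text):
    """Convert Key:Value metadata text into a JSON-friendly dict.

    Hypothetical helper; the output key style is an assumption.
    """
    message = Parser().parsestr(text)
    result = {}
    for key in message.keys():  # keys() repeats a name once per occurrence
        json_key = key.lower().replace("-", "_")
        if key.lower() in MULTIPLE_USE_FIELDS:
            result[json_key] = message.get_all(key)
        else:
            result[json_key] = message[key]
    return result
```

Calling json.dumps(metadata_to_dict(text), indent=2) on the contents of a METADATA or PKG-INFO file would then give output that other tools can consume without writing their own parser.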
I think we should leave pip show as relating to installed packages only. For querying an index, we need a whole extra batch of options (--index-url, --find-links, etc.) and they don't make sense on pip show. I'd suggest pip query as the name. We could then have a relatively consistent set of subcommands:
- pip show: Show info for installed packages
- pip query: Get info for any package, installed or not
- pip list: Get summary info for all installed packages
- pip search: Search the package index
We should strive for a somewhat consistent interface, so (for example) I'd recommend following pip list and using --format=json for getting JSON data from pip query, and a default format that's human readable (probably key-value like pip show uses). The human-readable form would be --format=default. This same formatting could later be extended cleanly to pip show if there were any interest.
I don't think any pip command should return JSON format by default - the default output should be for humans. But having an option to emit machine readable data (i.e. JSON) is a great idea.
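A small sketch of that interface, purely to show the shape of the option handling: pip query is only the name proposed in this thread, and the parser and render function below are hypothetical stand-ins rather than pip code. Human-readable key-value output is the default, and JSON is opt-in via --format=json.

```python
import argparse
import json


def build_parser():
    # "pip query" is the subcommand proposed here; it is not an existing pip command.
    parser = argparse.ArgumentParser(prog="pip query")
    parser.add_argument("requirement")
    parser.add_argument("--format", choices=["default", "json"], default="default")
    return parser


def render(metadata, output_format):
    """Format a metadata dict for humans (default) or machines (json)."""
    if output_format == "json":
        return json.dumps(metadata, indent=2, sort_keys=True)
    lines = []
    for key, value in metadata.items():
        # Repeatable fields are stored as lists and printed one per line.
        values = value if isinstance(value, list) else [value]
        lines.extend("{}: {}".format(key, item) for item in values)
    return "\n".join(lines)
```

Keeping the choice of format in one render step means the same flag could be added to other subcommands without touching how the metadata is gathered.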
I was originally against this idea when I first started reading this, as a "grab the dependency information" command seemed too niche to really promote to a top level command. However, I think the idea of a pip query command that is similar to pip show, but operates on a repository, makes a lot of sense, and the dependency information can be just one of the pieces of information that it shows.
I also agree entirely that pip's interface should default to a human readable one, and it should use a --format=json option to return JSONified data. That is probably something we should try to extend to as many of our commands as possible TBH.
And to answer @ncoghlan's question: No, there is no file content extraction happening in Warehouse.
Aye, while dependency extraction was the main use case I had in mind (hence the issue title), what I really meant was extraction of all the PEP 345 metadata in a way that's implicitly compatible with pip's local artifact caching, such that folks wanting to do their own automated analysis of PyPI components can more readily do things like:
If/when Warehouse does gain this metadata extraction capability for uploaded sdists, running the command as a strictly time-limited client operation in a sandboxed environment with no network access also seems like it would be the safest way of actually doing it.
This seems related to issue #484.