This issue follows up on the discussion on python-distutils: http://code.activestate.com/lists/python-distutils-sig/25409/
To get a dependency resolver for Python, there needs to be a way to get the dependencies of a package. To avoid useless network traffic, the dependencies of a package ("install_requires" in setup.py) need to be accessible via an API.
This is going to be possible once PEP426 is in place.
Off topic: How does PEP426 get developed? It is soon three years old. What can I do to get it implemented?
1) Finish the draft, here are some issues: https://bitbucket.org/pypa/pypi-metadata-formats/issues?status=new&status=open&component=Metadata%202.x (note that issues are in bitbucket but the code is now in github)
2) Implement PEP 426 for pip
It looks like https://github.com/python/peps/blob/master/pep-0426.txt is the current PEP 426 draft, right?
@domenkozar, I don't see anything that looks like a current issue at that bitbucket link. Have they been addressed?
Any other current issues?
Thanks :)
Is there anything currently available and up-to-date that's better than downloading all the metadata as json via the pypi-data app, and doing
jq '{name: .info.name, requires: .info.requires_dist}' */* > requires.json
I note some issues with pypi-data at https://github.com/nathforge/pypi-data/issues/2
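For anyone without jq at hand, the same extraction can be sketched in Python. This assumes the downloaded metadata is a directory tree with one JSON document per file, two levels down (matching the `*/*` glob in the command above). The name `collect_requires` is made up for illustration and is not part of pypi-data.

```python
import json
from pathlib import Path


def collect_requires(root):
    """Yield {"name", "requires"} records for every JSON file under root/*/*.

    Assumption: each file holds one PyPI JSON document with an "info"
    object, mirroring what the jq filter above reads.
    """
    for path in sorted(Path(root).glob("*/*")):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as handle:
            info = json.load(handle).get("info", {})
        # Same projection as the jq filter: project name plus requires_dist.
        yield {"name": info.get("name"), "requires": info.get("requires_dist")}
```

Calling `list(collect_requires("."))` from the dump directory and writing the result out with `json.dump` should give roughly the same data as requires.json.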
Is there any update on this? Wondering if it is now possible to get the list of a package's dependencies without a full download of the package.
It is not in the general case, because of limitations in the packaging formats.
From a quick analysis of the package metadata using the JSON API, it looks like out of ~120k packages in the PyPI index, only ~17k have a non-null info->requires_dist field. While some packages indeed don't have any dependencies, I imagine most do. This means the field currently cannot be relied upon for dependency resolution.
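A count like this can be reproduced with a few lines of Python. This is only a sketch: it assumes you already have the "info" objects from the JSON API loaded as dicts, and `requires_dist_coverage` is a hypothetical helper name.

```python
def requires_dist_coverage(infos):
    """Return (with_requires, total) for an iterable of JSON API "info" dicts.

    A project is counted only when requires_dist is present and not null.
    Note that an empty list still counts as non-null here.
    """
    total = 0
    with_requires = 0
    for info in infos:
        total += 1
        if info.get("requires_dist") is not None:
            with_requires += 1
    return with_requires, total
```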
I saw that PEP 426 has a deferred status and was wondering whether there are any open issues that aim to somewhat improve the situation with requires_dist in the current system, without necessarily doing the in-depth redesign of the metadata API discussed in PEP 426. Thanks.
Thanks for bringing up and discussing this issue, and sorry for the slow response! (Context, for those who don't already know it: Warehouse needs to get to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable. Towards that end, the folks working on Warehouse have gotten limited funding to concentrate on improving and deploying it, and have kicked off work towards our development roadmap. Along the way we've been able to reply to some of the older issues in the repo as well.)
Since this feature isn't something that the legacy site has, and we're prioritizing replacing the old site, I've moved it to a future milestone.
@ncoghlan am I right that @rth should be looking at pypa/packaging-problems#102 and pypa/packaging-problems#54?
Thanks and sorry again for the wait.
Thanks for the detailed response @brainwane and for linking to those issues!
I know that there are higher priority issues with the migration to Warehouse (and thank you for working on that!). I just commented for future reference while experimenting with the PyPI JSON API...
Glad to help, @rth.
For reference: PEP 426 has been withdrawn.
As you're experimenting with the JSON API, check out the other API/feeds issues in case any of them have questions you can answer! And if you have questions, please feel free to ask them here, on #pypa-dev on Freenode, or on the pypa-dev mailing list.
Note that while PEP 426 (metadata 2.0) has been withdrawn, PEP 566 (metadata 2.1) has been accepted, and that includes a canonical conversion from the baseline key:value representation to a JSON compatible representation: https://www.python.org/dev/peps/pep-0566/#json-compatible-metadata
This means that at least for projects that upload wheel files, it will be feasible for Warehouse to extract and publish the corresponding dependency metadata in a standards-backed way (since the conversion rules can also be applied to metadata 1.2).
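As a rough illustration of that conversion, here is a minimal Python sketch. It covers only part of the PEP 566 rules: keys are lowercased with hyphens turned into underscores, multiple-use fields become lists, and the message body becomes the description. The `MULTIPLE_USE` set is a hand-picked subset, Keywords handling is left out, and `metadata_to_json` is a made-up name, so treat this as an approximation rather than a reference implementation.

```python
from email.parser import HeaderParser

# Assumption: a subset of the fields marked "multiple use" in the core
# metadata specification, already in transformed (lowercase) form.
MULTIPLE_USE = {
    "classifier", "requires_dist", "requires_external", "project_url",
    "provides_extra", "provides_dist", "obsoletes_dist",
}


def metadata_to_json(text):
    """Convert key:value metadata text into a JSON-compatible dict."""
    message = HeaderParser().parsestr(text)
    result = {}
    for header in message.keys():
        key = header.lower().replace("-", "_")
        if key in result:
            continue  # repeated header, already collected via get_all()
        values = message.get_all(header)
        result[key] = values if key in MULTIPLE_USE else values[0]
    body = message.get_payload()
    if body:
        # The message body, if present, is stored under "description".
        result["description"] = body
    return result
```

With this, repeated Requires-Dist headers end up as a single `requires_dist` list, which is the same key the JSON API already exposes.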
> PEP 566 (metadata 2.1) has been accepted,
> This means that at least for projects that upload wheel files, it will be feasible for Warehouse to extract and publish the corresponding dependency metadata in a standards-backed way
That's really good news. Thank you for the explanations!
What determines the value of "requires_dist" given in the JSON API response? When I look at one of my uploads it's there, e.g. https://pypi.org/pypi/oyaml/0.2/json, which correctly says requires_dist=[pyyaml]. But for boto3, https://pypi.org/pypi/boto3/1.6.3/json has requires_dist=[], and that's not right: it should list botocore, jmespath, and s3transfer.
Both of these projects are specifying the metadata the same way, by passing install_requires in the setup.py:setup kwargs. I heard somewhere that it's related to whether you upload a wheel or an sdist first, but this explanation doesn't make much sense to me..?
Metadata extraction currently only happens for the first uploaded artifact, and unlike wheel archives, sdists aren't required to contain metadata in a format that an index server knows how to read.
Allowing subsequent wheel uploads to supplement the metadata extracted from an sdist would be a nice Warehouse enhancement (but is separate from this issue).
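To make the wheel case concrete: a wheel is a zip archive with a `<name>-<version>.dist-info/METADATA` file in key:value format, so the dependency list can be read without executing any project code. The sketch below uses only the standard library; `wheel_requires_dist` is a hypothetical helper and not how Warehouse itself does the extraction.

```python
import zipfile
from email.parser import HeaderParser


def wheel_requires_dist(wheel_path):
    """Return the Requires-Dist entries from a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Look for the top-level <name>-<version>.dist-info/METADATA entry.
            if (len(parts) == 2 and parts[0].endswith(".dist-info")
                    and parts[1] == "METADATA"):
                text = archive.read(name).decode("utf-8")
                message = HeaderParser().parsestr(text)
                # get_all() returns None when the header never appears.
                return message.get_all("Requires-Dist") or []
    raise ValueError("no .dist-info/METADATA found in %s" % wheel_path)
```

An sdist offers no equivalent guarantee, which is the asymmetry described above.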
After checking with @dstufft in relation to https://github.com/pypa/python-packaging-user-guide/issues/450, it seems recent versions of twine and setuptools should be uploading full project metadata regardless of the nature of the first uploaded artifact (sdist or wheel).
So the most likely cause of incomplete metadata now is the use of older upload clients (and older releases will be missing this data as well, since it needs to be generated client side and then delivered to PyPI as part of the release publication process).
@ncoghlan It would be a different story if the "requires_dist" key was not returned at all, which would be PyPI saying "I don't have this information". But my issue is that it's actually returning incorrect data, i.e. the "requires_dist" key is there, and it has a value (empty array):
$ curl -s "https://pypi.org/pypi/boto3/1.6.3/json" | jq ".info.requires_dist"
[]
Users don't seem to have a way to tell the difference between a package which genuinely has no 3rd-party requirements, and one with incorrectly parsed requirements, apart from downloading the distribution.
I think perhaps you should backfill these for all existing distributions, or at least all existing distributions which have a bdist present in the index, so the API no longer returns incorrect data. That should be easy enough to script and run as a one-off. Thoughts?
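The ambiguity can be spelled out in a small Python sketch. `fetch_info` mirrors the curl call above using the same URL pattern and needs network access; `describe_requires_dist` is a hypothetical helper whose three labels are made up for this example.

```python
import json
from urllib.request import urlopen


def fetch_info(project, version):
    """Fetch the "info" object from the PyPI JSON API (needs network access)."""
    url = "https://pypi.org/pypi/%s/%s/json" % (project, version)
    with urlopen(url) as response:
        return json.load(response)["info"]


def describe_requires_dist(info):
    """Classify what a client can conclude from requires_dist alone."""
    requires = info.get("requires_dist")
    if requires is None:
        return "unknown"   # the index says it has no data
    if not requires:
        return "empty"     # ambiguous: no dependencies, or metadata never extracted
    return "declared"
```

For the boto3 response shown above, a client gets "empty" and cannot tell which of the two readings applies without downloading the distribution.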
@wimglenn I think that's 3 separate questions:
[...] pypi.python.org upload API has legacy in the name: it needs to be replaced with something more robust, but doing so becomes yet another compatibility migration to manage for index server implementations and upload clients).
[...] wheel or egg files. This seems plausible, since metadata can be extracted from those without running arbitrary Python code (and explicit rules could be defined for handling the cases where different binary artifacts include different metadata files). The constraint is then a combination of developer time, compute resources, and privileged backend database access, so it seems unlikely that will happen without specific funding from a user organisation or redistributor that wants to see it happen (or a successful grant application from the PSF Packaging Working Group).
[1] For example, see https://github.com/pypa/twine/blob/master/twine/repository.py#L122 for upload, https://github.com/pypa/twine/blob/master/twine/package.py#L83 for extraction
setup.py is just Python; people can, and do, do a lot of stuff in there.
For us, the main reason to check dependencies is to get this for recently published packages, so we would hope these are easy to check. Being able to check even some large percentage of packages on PyPI would be useful, even if we sometimes get an error returned instead of the dependency list.
thinking in sets.
I'm trying to obtain dependencies for packages I control, and I would like for requires_dist to come from pypi. I do not want to publish wheels, though, for convoluted reasons. From the discussion here, it seems to me that it should still work.
I am using setuptools 39.0.1, with dependencies in setup.py in install_requires, setup_requires, and extras_require. Packages are published to PyPI using twine. Published sdists, however, do not have requires_dist set in the json response.
Inspecting the files that are being published, I see:
[...] ./src/<package>.egg-info relative to setup.py. I can switch back to a non-src layout if it helps.
I understand that dependencies are dynamic and can be complicated. However, the sdist that I am building does have a well-defined dependency list in its .egg-info. Could that not be used by PyPI / Warehouse, with the understanding that the metadata provided by PyPI / Warehouse is an approximation only, and that the dependencies might be environment dependent in general?
If there is any other way in which I can get requires_dist to show up on the json api at pypi.org/pypi, I'd like to know so that I can get this information in there. I suspect manually modifying PKG-INFO as part of the build process and injecting the contents of .egg-info/requires.txt might just do the trick, but I would like to get setuptools to do it instead, if possible. Is there some existing means by which I can make it so?
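For reference, the requires.txt that setuptools writes into .egg-info is simple enough to read directly: unconditional requirements come first, followed by bracketed sections such as `[extra]` or `[extra:marker]`. The parser below is only a sketch of that layout, which is a setuptools implementation detail rather than a standard; `parse_requires_txt` is a made-up name.

```python
def parse_requires_txt(text):
    """Group requirement lines by section.

    The "" key holds unconditional requirements; other keys are the raw
    section headers (for example "dev" or ":python_version<'3'").
    """
    sections = {"": []}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A bracketed header starts a new extra/marker section.
            current = line[1:-1]
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```

The entries under the "" key are the ones that would correspond to plain Requires-Dist lines in PKG-INFO.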
Edits:
Looking at other packages I have, I can confirm that:
So it would seem the
The pip maintainers would like this because it would really help with the resolver improvements and automated testing improvements they're making over the next few months.
PyPI's JSON API does not come from a PEP, so our options are adding this to the existing simple API, standardizing PyPI's JSON API, or defining an entirely new replacement for the simple API. Personally I'd lean towards the last option, but if we're going to do that, then we probably want to spend some time figuring out exactly which problems with the simple API we're trying to solve.
@pradyunsg @uranusjr @pfmoore As we work to roll out and test the new resolver https://github.com/pypa/pip/issues/6536, or think about future versions, how much would you benefit from even a prototype or minimal version of this feature?
It would help, but it would need to be an extension to the standard for the simple API to be of significant benefit. We definitely don't want to end up special-casing PyPI/Warehouse in pip's code, and while we could add a test for whether an index supports a new API, I'd want that to be standardised (at least provisionally) or we're going to hit all sorts of backward compatibility and maintenance issues down the line.
Also, this would only be of minimal benefit unless it exposed metadata for sdists, which is a hard problem. If it only handled wheels, the only benefit would be reduced download volumes for wheels that ended up not being used. And pip's caches probably make that cost relatively small.
Personally, unless it was a standardised feature that provided sdist metadata [1], I feel that the benefits would be marginal enough that I'd expect us to defer any work to use it until after the release of the new resolver, as "post-release performance tuning" work.
[1] Or better still, metadata based on project and version alone, but that's not realistically achievable in the timescales we're looking at.
pypa/warehouse#8254 is a proposal that would address this as well.
Assuming PEP 643 gets approved, we will have reliable metadata available for wheels and (increasing numbers of) sdists. Extracting that metadata and exposing it via PyPI becomes an even more attractive prospect at that point.
Pip could likely work with either the JSON API or an extension to the simple API, but either one would need standardising first.