What's the problem this feature will solve?
To query things like metadata for a requirement, you need to prepare that requirement (and ultimately, get an "abstract distribution" for it). The function that does this is currently a method of the resolver, meaning that any new resolver won't have access to it.
Rather than reimplement the same (or similar) functionality, move the function out of the resolver.
Describe the solution you'd like
Have a function in pip's internal API that returns an abstract distribution object for a requirement. Basically `Resolver._get_abstract_dist_for`, but either as a standalone function or a method on `InstallRequirement` (it's extremely unclear to me why this isn't a method on the requirement already...).
The new resolver will need to call this API to get dependencies for a requirement.
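As a rough sketch of what that could look like (all names here are hypothetical stand-ins for pip internals, not the actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for pip internals; names and shapes are
# assumptions for illustration only.

@dataclass
class Link:
    url: str

    @property
    def is_wheel(self) -> bool:
        return self.url.endswith(".whl")

@dataclass
class InstallRequirement:
    name: str
    link: Optional[Link] = None
    editable: bool = False

class AbstractDistribution:
    """Something we can ask for metadata/dependencies."""
    def __init__(self, req: InstallRequirement) -> None:
        self.req = req

class SourceDistribution(AbstractDistribution):
    """Metadata has to be produced by running a build step."""

class WheelDistribution(AbstractDistribution):
    """Metadata can be read straight out of the wheel."""

def get_abstract_dist_for(req: InstallRequirement) -> AbstractDistribution:
    """Standalone equivalent of Resolver._get_abstract_dist_for.

    Assumes req.link is already populated; the earlier "specifier ->
    link" step stays with the resolver.
    """
    if req.link is None:
        raise ValueError("requirement has no link yet; resolve one first")
    if not req.editable and req.link.is_wheel:
        return WheelDistribution(req)
    return SourceDistribution(req)
```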
Alternative Solutions
We could reimplement the necessary state changes independently for the new resolver. This would duplicate work, though, as well as introducing the risk of discrepancies.
The "project" model is a more ambitious refactoring that moves all of this functionaility into a stateful "project" object. This feature is a smaller, less ambitious step to achieve just what is needed right now.
Additional context
This method is part of the process whereby an `InstallRequirement` moves through stages, from being just an abstract description of what is needed to a concrete, installable object. Those state-change methods should be easily identifiable and located close to each other, for ease of reference and understanding, but historically they were scattered across the codebase, with requirement objects being progressed as dictated by the needs of the control flow.
Creating an abstract dist only makes sense for an `InstallRequirement` where we have identified the location of the code (a URL, or a local directory/file). There is an earlier state change, from "specifier" to actual object, that is much more tightly integrated with the current resolver (`populate_link`). This is currently all part of the one `_make_abstract_dist_for` method, and will need to be separated out.
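To make the staging concrete, here is the lifecycle as I understand it, reusing the hypothetical names from the sketch above (the comments mark where the proposed split falls):

```python
# Illustrative only - stage boundaries, not real pip code.
req = InstallRequirement(name="requests")  # abstract: just a specifier/description

# Earlier state change, tightly coupled to the resolver (populate_link today):
req.link = Link("https://example.com/requests-2.0-py3-none-any.whl")

# Proposed standalone step: turn a located requirement into an abstract dist:
dist = get_abstract_dist_for(req)  # WheelDistribution in this case

# Later stages (prepare/build metadata, then install) follow from here.
```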
It is likely that this change will need to be implemented as a series of smaller refactorings.
Would it make sense to memoize the requirement's "abstract distribution" as well?
The result of `_get_abstract_dist_for()` changes when the `InstallRequirement` mutates (sometimes), so… no, I guess? The term doesn't make sense to me here, given that the input parameter is mutable.
Having said that, I do think there's value in trying to reduce the number of calls we make to get metadata. In theory the metadata for a given wheel should never change, so memoising/caching it is plausible, even if the `InstallRequirement` itself is mutable. In practice, there are lots of environment dependencies (because build steps can basically do anything they want - "if it's Tuesday, we depend on setuptools"), so getting such caching right is hard. (And yes, I know that caching is a slightly different problem than per-process memoisation.)
It's something I'd like to look at longer-term, but I think we should get the code right before trying to optimise.
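If we do go there, the key would have to be something stable rather than the mutable requirement - e.g. the wheel URL or hash. A minimal per-process sketch, assuming wheel metadata really is immutable (the function name and stubbed body are mine, not pip's):

```python
import functools

@functools.lru_cache(maxsize=None)
def metadata_for_wheel(wheel_url: str) -> str:
    """Memoise metadata per wheel URL - safe only if the assumption
    that a given wheel's metadata never changes actually holds."""
    # Stubbed: real code would fetch the wheel and read its METADATA file.
    name = wheel_url.rsplit("/", 1)[-1]
    return f"Metadata-Version: 2.1\nName: {name}"
```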
Honestly, I think it's OK to break users that do fancy things like "depend on X if it's Tuesday". Basically, if we ask a package what it depends on and we get different answers on different tries, I think it's reasonable to blame the package for doing too-smart things.
> I think it's reasonable to blame the package for doing too-smart things.
I agree, and I actually think that it might be good to deliberately make that assumption (for example, by adding some form of cache for dependency information - whether just in-process or persistent).
As a pip-level decision, we can frame this as "pip doesn't support packages that dynamically generate dependency information that changes based on the environment" and point to our caching implementation for the precise rules.
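For instance, a persistent cache that bakes in exactly that assumption - dependencies are a pure function of (name, version) - might look something like this (purely a sketch; the class and file format are made up):

```python
import json
from pathlib import Path
from typing import List, Optional

class DependencyCache:
    """Persistent dependency cache assuming deps depend only on
    (name, version) - i.e. the rule pip would be declaring."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._data = json.loads(path.read_text()) if path.exists() else {}

    def get(self, name: str, version: str) -> Optional[List[str]]:
        return self._data.get(f"{name}=={version}")

    def set(self, name: str, version: str, deps: List[str]) -> None:
        self._data[f"{name}=={version}"] = deps
        self.path.write_text(json.dumps(self._data))
```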
But I think that for the long-term benefit of the ecosystem, it would be a good idea to place some restrictions in the standards on how dynamic project metadata is allowed to be. For example, the metadata standard could gain a statement along the lines of "package metadata MUST only depend on [some set of relatively stable conditions]". Working out what a good set of conditions is would be a lot easier if we had evidence that pip assumed a particular set of rules and the world hadn't fallen apart 🙂
Strategy-wise, another alternative is to copy useful functions out as needed. We've done a lot of work to break up situations where we had previously "de-duplicated" some code, because it's so much harder to reason about when there are multiple callers making different assumptions. If there is some common structure that's useful, it will be much easier to see after making a copy of the function work for the new use case in isolation.