Context: https://github.com/pypa/pip/pull/8014/files#r407079100
I think it would be best to implement something like:
@dataclasses.dataclass()
class CandidateSequence(collections.abc.Sequence):
before: Sequence[Candidate]
found_iter: Iterator[Candidate]
after: Sequence[Candidate]
# Implement the sequence protocol and __reverse__()
# https://docs.python.org/3.8/reference/datamodel.html#object.__reversed__
class Factory:
def find_matches(self):
def found_iter():
found = self._factory.finder.find_best_candidate(...)
for ican in found.iter_applicable():
yield self._make_candidate_from_link(...)
dist = self._installed_dists.get(name)
if dist is not None:
after = [dist]
else:
after = []
return CandidateSequence(before=[], found_iter=found_iter(), after=after)
We would be able to skip network I/O if the installed version is good enough.
Hmm... Wouldn't the main change in resolvelib be dropping a reversed()
call here? I understand that flipping the order would require changing tests in resolvelib, as well as changes in the implementation on pip's side, but it would result in less code to maintain overall IIUC. I'm not sure if I'm missing context or something else here.
What's the reasoning behind introducing this complexity into pip?
Maybe I was misunderstanding want you meant; I thought you were looking to eliminate the network call to fetch the candidate list from the index?
Flipping the list order wouldn’t work, since the provider does not know whether it needs to build that list before returning it to the resolver. The provider still needs to append those remote candidates after the installed one, even if they never got accessed by the resolver, because otherwise the resolver would straight up fail when it needs to backtrack but doesn’t have those candidates to work with.
From what I can see, the only way to avoid that network call, is to delay the decision whether to network until the resolver actually tries to access the list. (And that alone might not be enough in at lot of the situations either.)
Hmm... I hear you. I was thinking the provider could return a generator that resolvelib uses (w/ highest preference first; i.e. change the find_matches API).
I think this needs a higher bandwidth discussion, since we'd also need to look into changing our implementation of get_preference if we're going to do this.
Yeah, changing it to return an iterator makes sense to me. This can likely be applied to other parts of the ResolveLib to minimise prepare calls (e.g. in criteria it can use generators instead of eagerly filtering on the candidate list).
Agreed, a highest-preference-first iterator would be an easier thing for providers to implement. If that's a possible API change for resolvelib, I'd be in favour of doing it there.
I’m anticipating the most difficulty would be occur when a parent package backtracks. Say both A 1.0 and 2.0 depend on B. If the resolver visited 2.0 but backtracked, it needs to re-visit the available B candidates when trying A 1.0. Either the resolver needs to call find_matches()
for the same package again, or it needs to implement some sort of “reusable iterator” structure like the code snippet in the issue description.
@uranusjr Would the following API change be possible for resolvelib -- if find_matches returns a generator, use it as-is, otherwise reverse the order?
No, the resolver much take something that can be iterated through multiple times, for backtracking.
Also, I checked earlier, the current provider implementation requires significant changes to be able to return a lazily evaluated iterator.