Pip: New Resolver: Avoiding network access if the installed candidate is good enough

Created on 11 Apr 2020 · 9Comments · Source: pypa/pip

Context: https://github.com/pypa/pip/pull/8014/files#r407079100

I think it would be best to implement something like:

@dataclasses.dataclass()
class CandidateSequence(collections.abc.Sequence):
    before: Sequence[Candidate]
    found_iter: Iterator[Candidate]
    after: Sequence[Candidate]

    # Implement the sequence protocol and __reverse__()
    # https://docs.python.org/3.8/reference/datamodel.html#object.__reversed__


class Factory:
    def find_matches(self):
        def found_iter():
            found = self._factory.finder.find_best_candidate(...)
            for ican in found.iter_applicable():
                yield self._make_candidate_from_link(...)

    dist = self._installed_dists.get(name)
    if dist is not None:
        after = [dist]
    else:
        after = []
    return CandidateSequence(before=[], found_iter=found_iter(), after=after)

We would be able to skip network I/O if the installed version is good enough.

new resolver triage

Source

uranusjr

All 9 comments

Hmm... Wouldn't the main change in resolvelib be dropping a reversed() call here? I understand that flipping the order would require changing tests in resolvelib, as well as changes in the implementation on pip's side, but it would result in less code to maintain overall IIUC. I'm not sure if I'm missing context or something else here.

What's the reasoning behind introducing this complexity into pip?

pradyunsg on 11 Apr 2020

Maybe I was misunderstanding want you meant; I thought you were looking to eliminate the network call to fetch the candidate list from the index?

Flipping the list order wouldn’t work, since the provider does not know whether it needs to build that list before returning it to the resolver. The provider still needs to append those remote candidates after the installed one, even if they never got accessed by the resolver, because otherwise the resolver would straight up fail when it needs to backtrack but doesn’t have those candidates to work with.

From what I can see, the only way to avoid that network call, is to delay the decision whether to network until the resolver actually tries to access the list. (And that alone might not be enough in at lot of the situations either.)

uranusjr on 11 Apr 2020

Hmm... I hear you. I was thinking the provider could return a generator that resolvelib uses (w/ highest preference first; i.e. change the find_matches API).

I think this needs a higher bandwidth discussion, since we'd also need to look into changing our implementation of get_preference if we're going to do this.

pradyunsg on 11 Apr 2020

Yeah, changing it to return an iterator makes sense to me. This can likely be applied to other parts of the ResolveLib to minimise prepare calls (e.g. in criteria it can use generators instead of eagerly filtering on the candidate list).

uranusjr on 11 Apr 2020

👍1

Agreed, a highest-preference-first iterator would be an easier thing for providers to implement. If that's a possible API change for resolvelib, I'd be in favour of doing it there.

pfmoore on 11 Apr 2020

👍1

I’m anticipating the most difficulty would be occur when a parent package backtracks. Say both A 1.0 and 2.0 depend on B. If the resolver visited 2.0 but backtracked, it needs to re-visit the available B candidates when trying A 1.0. Either the resolver needs to call find_matches() for the same package again, or it needs to implement some sort of “reusable iterator” structure like the code snippet in the issue description.

uranusjr on 11 Apr 2020

👀1 👍1

@uranusjr Would the following API change be possible for resolvelib -- if find_matches returns a generator, use it as-is, otherwise reverse the order?

pradyunsg on 24 Sep 2020

No, the resolver much take something that can be iterated through multiple times, for backtracking.

uranusjr on 24 Sep 2020

Also, I checked earlier, the current provider implementation requires significant changes to be able to return a lazily evaluated iterator.

uranusjr on 24 Sep 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

'extra' not working in pip 8.1.1

therefromhere · 3Comments

Traceback running setup.py clean: ModuleNotFoundError: No module named 'setuptools'

cjerdonek · 3Comments

Don't deprecate dependency_links or let pip use a requirements.txt file when installing from git

wrouesnel · 3Comments

Direct URL PEP 508 support and installing sub-dependencies from Git

dmfigol · 3Comments

New resolver cannot installs distributions that only have pre releases

sbidoul · 3Comments