Conan: [RFC] conan dependency resolution policy

Created on 13 Jul 2020 · 14 comments · Source: conan-io/conan

Let's say I have the following packages:

  • subdep/1.0, subdep/1.1.
  • dep/1.0 that was built with subdep/1.0, where dep has requires='subdep/[>=1.0]'.
  • pkg with requires='dep/[>=1.0]'.
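
For illustration, a rough sketch of what these recipes might look like (three separate, hypothetical conanfile.py files shown together; only the relevant attributes):

from conans import ConanFile

class SubdepConan(ConanFile):
    name = "subdep"
    version = "1.0"  # later re-published as 1.1

class DepConan(ConanFile):
    name = "dep"
    version = "1.0"
    requires = "subdep/[>=1.0]"  # resolved to subdep/1.0 when dep/1.0 was built

class PkgConan(ConanFile):
    name = "pkg"
    version = "1.0"
    requires = "dep/[>=1.0]"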

Now, when trying to build pkg there will be the following dependency resolution:

    dep/1.0 from local cache - Cache
    subdep/1.1 from local cache - Cache
Packages
    dep/1.0:13d87f321c5de7f5c36592ea7338ce73e9adb226 - Missing
    subdep/1.1:3f6a4e73399b653b1e362ca7aca98ae112a41b0f - Cache

Which means that pkg can't be built without rebuilding dep with subdep/1.1.

In other words, the only available dependency resolution policy (DRP) for conan is build_the_whole_dep_chain_with_the_latest_available_versions.

There are several problems with such DRP:

  • Let's say we have a super-long dep-chain pkg>dep>subdep>...>(sub×N)dep.
    If a new (sub×N)dep is published, the whole chain is invalidated, i.e. it's impossible to build pkg without rebuilding every (sub×i)dep.
    It's much worse if we have a build environment, where each dependency is pre-built on CI. Each dependency has a lot of tests outside of conanfile.py, so it's impossible to use --build=missing, since that would skip all the CI tests. Now, a new release of the dependency is published and the whole dep-chain is invalidated, which means that every developer working on a package that has this transitive dependency won't be able to work on that package until the whole chain is rebuilt (even if said dependency does not introduce any conflicts).
  • Let's say we have a dep-chain from the example above (i.e. pkg>dep>subdep). We have dep/1.0 that was built with subdep/1.0. Next, subdep/1.1 is released, but it's actually incompatible with dep. So, pkg can't be built, because it requires subdep/1.1 (as transitive dependency), but dep can't be rebuilt, because it's incompatible with subdep/1.1.
  • Let's say we have a dep-chain from the example above (i.e. pkg>dep>subdep) and we want to use a pre-built fixed version of dep in pkg via requires='dep/1.0'. This will not work (without manually specifying subdep version in pkg).

TLDR: current DRP is only usable in two scenarios:

  • Hard-code ALL dependency versions in recipe, which is not really scalable and nearly impossible to maintain for big dependency chains.
  • Build all dependencies from sources and upload them when building a package, which is not possible if there is a need for external tests (e.g. on CI) and/or if there is a controlled environment where packages are isolated (i.e. it's impossible to upload subdep as a part of the dep creation process).

The only proper way to solve this is to make DRP customizable. For example:

  • use-prebuilt-strict: uses only those pre-built packages that were used during the build of the relevant package. E.g. if pkg depends on dep and there is only one pre-built dep/1.0 that was built with subdep/1.0, then dep resolution of pkg will result in dep/1.0 and subdep/1.0.
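
A rough sketch of the idea (hypothetical helper names and data, not actual conan APIs):

# prebuilt_index: imaginary view of the server, mapping a pre-built reference to the
# dependency versions recorded when that binary was built
prebuilt_index = {"dep/1.0": {"subdep": "1.0"}}

def resolve_use_prebuilt_strict(ref):
    if ref not in prebuilt_index:
        raise LookupError("no pre-built binary for %s" % ref)
    # reuse exactly the dependency versions the binary was built with
    return prebuilt_index[ref]

print(resolve_use_prebuilt_strict("dep/1.0"))  # -> {'subdep': '1.0'}, not subdep/1.1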


All 14 comments

Attached an example that illustrates the problem: conan_dep_bug.zip

Repro steps:

conan create ./subdep @
conan create ./dep @
# change version in subdep/conanfile.py to 1.1
conan create ./subdep @
conan create ./pkg @
# ^ this will fail

I think Conan would benefit tremendously from a minimal version range selection algorithm in the conan.conf (with a potential default behaviour candidate in some later major version). Conan revolves around binaries, and the issue is spot on that it's nearly impossible to generate binaries for ever-changing dependencies in time.

There were issues asking for a complex multi-version binary presence checking in the same manner, but the benefit of this algorithm is that it's much simpler and faster. It also brings other great benefits excellently described in https://research.swtch.com/vgo-mvs
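
To make the idea concrete, here is a rough sketch of the MVS selection rule from that article (simplified; naive string comparison of versions, all plumbing omitted):

def mvs_select(requirements):
    # requirements: (package_name, minimum_required_version) pairs collected from
    # every node of the graph; MVS picks, per package, the highest of the declared
    # minimums -- never a newer version just because it was published
    selected = {}
    for name, min_version in requirements:
        if name not in selected or min_version > selected[name]:
            selected[name] = min_version
    return selected

# dep asks for subdep>=1.0 and nothing asks for more, so subdep/1.0 is selected
# even though subdep/1.1 exists
print(mvs_select([("dep", "1.0"), ("subdep", "1.0")]))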

Hi @TheQwertiest

This issue seems to be describing many different things, for example:

Let's say we have a dep-chain from the example above (i.e. pkg>dep>subdep). We have dep/1.0 that was built with subdep/1.0. Next, subdep/1.1 is released, but it's actually incompatible with dep. So, pkg can't be built, because it requires subdep/1.1 (as transitive dependency), but dep can't be rebuilt, because it's incompatible with subdep/1.1.

If a subdep/1.1 is released but it is incompatible with dep, then there are only 2 options:

  • dep contains a version range like [~1.0], so the 1.1 is excluded.
  • or subdep needs to do a major version bump, to 2.0 if it is breaking.

This is a pure versioning problem, not related to binaries at all. It is purely a version definition issue, this shouldn't depend on Conan, it should be the same in any language or package manager.

There is another thing that is not fully clear either. I guess that you have changed the default_package_id_mode to something like minor_mode, patch_mode, full_version_mode, haven't you? The conan_dep_bug.zip case you provided doesn't fail unless you change the package_id mode.

If you have changed that, then, you are exactly defining that behavior: a change in the dependency subdep/1.0 to subdep/1.1 forces a re-build of the downstream consumers. If you don't want to force a re-build, then you can select some of the other less restrictive package_id modes.
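
For reference, a sketch of how the mode can be relaxed per recipe in Conan 1.x (illustrative only; the global default is the default_package_id_mode setting under [general] in conan.conf):

from conans import ConanFile

class DepConan(ConanFile):
    name = "dep"
    version = "1.0"
    requires = "subdep/[>=1.0]"

    def package_id(self):
        # with semver_mode, only the major version of subdep (for versions >= 1.0.0)
        # is encoded in dep's package_id, so going from subdep/1.0 to subdep/1.1 does
        # not invalidate dep's binary; full_version_mode()/full_package_mode() are the
        # stricter modes discussed above
        self.info.requires["subdep"].semver_mode()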

What it seems you are suggesting, if I understood it correctly, has already been requested and considered a few times: resolving to different versions based on the existence or non-existence of binaries for a given configuration. And we have always concluded this is not something possible or desired, and would be something extremely problematic, not to say unfeasible. This is the rationale:

  • Having a dependency version and failing to retrieve it, falling back to a previous version, would hide all kinds of CI, build and pipeline errors that result in a missing binary, producing lots of confusion, debugging and tons of support tickets, questions, etc. This is the same reason Conan raises by default when a binary is missing instead of building it from sources. A devops principle is that if subdep/1.1 is there, it should be usable, or it shouldn't be there.

  • There are known devops best practices to implement this "it shouldn't be there if it is not usable". The recommended approach is using multiple server repos and doing "promotions", something similar to the merge concept in source control. This is not the place to fully describe this, but the idea is that there is a "develop" Artifactory repo that contains the state of what is usable by developers and CI. When a subdep/1.1 is done, it goes first to a temporary "build" Artifactory repo. The downstream consumers of subdep/1.1 are built, and only when they are built and everything is working fine, those packages can be promoted from the "build" repo to the "develop" repo.

  • In any case, even if we thought it was a really necessary approach, it is still unfeasible because of the computational complexity. Right now the complexity is managed because the dependency resolution is done in 2 steps: first the "recipes" graph is resolved, and then the binaries are computed. If a missing binary meant having to fetch a previous version of the conanfile, parse and evaluate it, and reconstruct and re-evaluate the whole dependency graph, maybe needing to fetch, load and evaluate other dependency versions as well, the complexity would compound even more when using revisions and/or multiple repositories. This would result in graphs being evaluated in minutes instead of a few seconds for simple and relatively small repos, and would be completely intractable (hours) in relatively large setups, which are very common among many of the Conan users.

Please let us know if this clarifies a bit the issue.

This is a pure versioning problem, not related to binaries at all. It is purely a version definition issue, this shouldn't depend on Conan, it should be the same in any language or package manager.

I have to disagree: different package/dependency managers use different policies, i.e. they don't all have "use the latest version of transitive dependency even if it's not actually used anywhere" as a single and only way. And, heck, even the ancient ivy has a lot of different policies: https://ant.apache.org/ivy/history/2.5.0/settings/conflict-managers.html.

I would even say that most managers can easily implement the following basic and very common scenario without hardcoding versions in the recipe/definition files:

Let's say we have the following project:

  • interface/1.0, interface/2.0
  • interface_implementation/1.0, which was built with interface/1.0.
  • pkg that depends on hard-coded interface_implementation/1.0.

Now, a user of pkg wants to use the same interface as interface_implementation.
Such a simple scenario is impossible with the current conan DRP (unless the version was hard-coded in interface_implementation).

There is another thing that is not fully clear either. I guess that you have changed the default_package_id_mode to something like minor_mode, patch_mode, full_version_mode, haven't you? The conan_dep_bug.zip case you provided doesn't fail unless you change the package_id mode.
If you have changed that, then, you are exactly defining that behavior: a change in the dependency subdep/1.0 to subdep/1.1 forces a re-build of the downstream consumers. If you don't want to force a re-build, then you can select some of the other less restrictive package_id modes.

I would argue that the primary purpose of default_package_id_mode is the differentiation of produced binaries. I.e. if a dependency changes, then a different binary is produced, hence a new package id.
Loosening the package id requirement would just be a crutch, since it would only mask (and not solve) the dependency resolution problem at the cost of losing proper binary differentiation.

A devops principle is that if subdep/1.1 is there, it should be usable, or it shouldn't be there.

Exactly. It should be usable, but it doesn't mean that its usage should be enforced everywhere.
The current DRP is only suitable for the live-at-the-HEAD approach (there is also the freeze-the-world approach, where every version is hard-coded, but it's not suitable for obvious reasons). But this is not always possible, and I would say that it's actually quite a common scenario to have multiple simultaneously supported versions of the same product. I.e. one might have two supported branches of the product, [email protected]_channel and [email protected]_channel, where they depend on interface_implementation/1.0 and interface_implementation/2.0 respectively. Thus it would be really weird to enforce a different (and possibly incompatible) version of interface when building a new version of [email protected]_channel (which depends on the hard-coded interface_implementation/1.0).

The recommended approach is using multiple server repos and doing "promotions", something similar to the merge concept in source control. This is not the place to fully describe this, but the idea is that there is a "develop" Artifactory repo that contains the state of what is usable by developers and CI. When a subdep/1.1 is done, it goes first to a temporary "build" Artifactory repo. The downstream consumers of subdep/1.1 are built, and only when they are built and everything is working fine, those packages can be promoted from the "build" repo to the "develop" repo.

Sorry, I don't understand how this is related to this issue.
If this is an answer to the "it's much worse if we have a build environment, where each dependency is pre-built on CI" part: this doesn't change much, since the test_repo is used only during the build (and testing) of a specific package. But in the end that package still ends up in the final release repo, and the whole dependency tree will be immediately invalidated anyway.

In any case, even if we thought it was a really necessary approach, it is still unfeasible because of the computational complexity.

IMO, this is an over-complicated scenario. Currently conan uses a strict resolution scheme (by enforcing the latest version available), so a new use-from-prebuilt (naming is hard, sry) policy can be made strict as well:

# Pseudocode: DepSet, resolve(), has_binary() and get_resolved_lock_from_binary() are placeholders.
class Lock:
    def __init__(self, deps):
        self.deps = deps  # requirements recorded when the binary was built

def resolver(lock, cur_deps=None):
    cur_deps = cur_deps if cur_deps is not None else DepSet()
    for d in lock.deps:
        resolved_dep = resolve(d)
        if cur_deps.has(resolved_dep):
            if cur_deps.conflicts(resolved_dep):
                raise HasConflictError(resolved_dep)
            continue
        if not has_binary(resolved_dep):
            raise BinaryNotFoundError(resolved_dep)
        cur_deps.add(resolved_dep)
        # recurse into the lock that was stored with the pre-built binary
        resolver(get_resolved_lock_from_binary(resolved_dep), cur_deps)

This has a complexity of N*complexity_of_resolve, where N is the number of defined requirements in all packages.

This is a wrong hypothesis:

This has a complexity of N*complexity_of_resolve, where N is the number of defined requirements in all packages.

You cannot evaluate if there is a binary while resolving the graph (on the way up, while resolving requires), because the binary package_id is computed as a function of all dependencies, down to the package revision of every transitive dependency, which means the existence of all the upstream dependencies' binaries needs to be resolved first. When a binary is not found and you need to try a previous version, you need to destroy the whole subgraph, reconstruct it resolving everything again, and at each node, from top to bottom, repeat the has_binary evaluation. This complexity is combinatorial and makes the problem intractable in practice.

You cannot evaluate if there is a binary while resolving the graph (on the way up, while resolving requires), because the binary package_id is computed as a function of all dependencies, down to the package revision of every transitive dependency, which means the existence of all the upstream dependencies' binaries needs to be resolved first. When a binary is not found and you need to try a previous version, you need to destroy the whole subgraph, reconstruct it resolving everything again, and at each node, from top to bottom, repeat the has_binary evaluation. This complexity is combinatorial and makes the problem intractable in practice.

Sorry, I still don't understand. My code could be simplified even further down to:

# Pseudocode: resolve(), has_binary(), get_lock_from_binary(), Lock and generate_package_id are placeholders.
class Package:
    def __init__(self, deps):
        self.deps = deps

def resolve_deps(pkg):
    current_lock = Lock()
    for d in pkg.deps:
        resolved_dep = resolve(d)
        if not has_binary(resolved_dep):
            raise BinaryNotFoundError(resolved_dep)
        dep_lock = get_lock_from_binary(resolved_dep)  # the lock stored with a binary has fixed versions
        current_lock.update(dep_lock)  # throws if there is a conflict
    return (generate_package_id(current_lock), current_lock)

So, complexity is actually len(pkg.deps)*complexity_of_resolve. Each binary already has a resolved dep graph, so there is no need for fallbacks or graph regeneration.

This is the process of evaluating App->LibC->LibB->LibA:

  • App conanfile.py is loaded and evaluated. It has a dependency to LibC/[version_range], or to LibC/version and using revisions.
  • A match for the LibC recipe conanfile.py is searched for in the local cache and/or in the server. If a version that satisfies the declaration is found, it is fetched/downloaded, and its conanfile.py is loaded and evaluated. It has a dependency to LibB.
  • At this point, has_binary(LibC) cannot be evaluated, because the graph is not complete. We need to keep computing the graph.
  • The process is repeated, finding, fetching, loading and evaluating LibB and LibA
  • Once the full graph is computed, the computation of binaries starts from the top, in this case LibA.
  • The package_id of LibA can be computed now. Once it is computed, the local cache and the servers are searched for this binary. If it is found, the package revision prev of this binary for LibA is available and we can move downstream.
  • The process is repeated for LibB: it computes its package_id, looks for a binary, gets the prev.
  • When we get to LibC, compute a package_id, look for a binary, and the binary is missing, the proposed approach will try to use a previous version of LibC. That means fetching a different conanfile.py. As it can declare other dependency versions, it is necessary to invalidate the whole upstream subgraph, remove both LibB and LibA and start over: first re-building the whole dependency graph, then re-starting the evaluation of binaries from the top to the bottom of the graph.

It is important to highlight that a previous version or revision of a package has a different conanfile.py that can declare different dependencies, so it needs a full upstream sub-graph re-evaluation.

The missing binaries can happen at any node: it is possible that the next iteration will produce a missing binary for LibB, and then we need to restart, invalidating LibA again; and maybe we succeed and find a version-binary combination for all of LibA, LibB and LibC, then we move down to App, its binary is missing, and again it is needed to destroy the graph and start from scratch with a previous version or revision of App. This is clearly combinatorial and, what makes it worse, typically with large factors, as the number of versions or revisions can very easily be large.
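
Condensed into pseudocode, the current two-phase process (and where a version fallback would break it) looks roughly like this; all helpers are placeholders:

def evaluate(app):
    # Phase 1: resolve version ranges, fetch and evaluate every conanfile.py,
    # building the full recipe graph (App -> LibC -> LibB -> LibA)
    graph = build_recipe_graph(app)

    # Phase 2: only now can package_ids be computed, starting from the most
    # upstream node, because each id is a function of everything above it
    for node in graph.upstream_to_downstream():
        pid = compute_package_id(node, graph)
        if not binary_exists(node, pid):
            # the proposed fallback would pick an older version of `node` here,
            # i.e. a different conanfile.py with possibly different requires, so
            # the whole upstream subgraph must be discarded and phase 1 repeated;
            # this is what makes the approach combinatorial
            raise MissingBinaryError(node)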

@memsharded What do you think about the proposed minimal version range selection? It doesn't require a graph invalidation, but it has a much higher chance of having binaries ready, which matches the intent of reusing existing binaries.

@memsharded What do you think about the proposed minimal version range selection? It doesn't require a graph invalidation, but it has a much higher chance of having binaries ready, which matches the intent of reusing existing binaries.

Yes, this is a very different story, and it seems more doable from the implementation side, but we would need to understand the use case better, because it seems that many users would be very surprised by this behavior, so it needs to be opt-in. And I am missing the user story: how is a version range useful if it always resolves to the lower version? In practice it would be the same as pinning the version to the lower bound of the version range. Please open a new issue if you would like to keep discussing this.

This is the process of evaluating App->LibC->LibB->LibA

It seems you are describing the current DRP. My suggestion was about implementing a new DRP (which will co-exist with the current one). Here is a proposed DRP:

  • App conanfile.py is loaded and evaluated. It has a dependency to LibC/[version_range], or to LibC/version and using revisions.

  • A match for LibC is searched for in the local cache and/or in the server. If a version that satisfies the declaration and for which has_binary(LibC) is true is found, the LibC/conan.lock is loaded and evaluated. If there is no binary, an error is raised.

  • LibC/conan.lock is converted to a graph and merged into App's current graph (see the merge sketch after this list). If there is a conflict during the merge, an error is raised.

  • Package ids for the dependencies LibA and LibB are retrieved from the graph that was formed from LibC/conan.lock.

  • Verify that LibA and LibB (with the package ids from the previous step) have binaries. If there is no binary, an error is raised.

  • Generate package_id('App') from the resulting graph.
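
A possible shape for the lock-merge step mentioned above (purely illustrative, hypothetical data layout):

def merge_lock(app_graph, dep_lock):
    # app_graph / dep_lock: {reference: (pinned_version, package_id)}; dep_lock comes
    # from the conan.lock published next to the pre-built dependency binary
    for ref, pinned in dep_lock.items():
        if ref in app_graph and app_graph[ref] != pinned:
            raise ValueError("conflict on %s: %s vs %s" % (ref, app_graph[ref], pinned))
        app_graph[ref] = pinned
    return app_graph

# both locks pin LibA to the same version/package_id, so the merge succeeds
graph = {}
merge_lock(graph, {"LibA": ("1.0", "id_a"), "LibB": ("1.0", "id_b")})
merge_lock(graph, {"LibA": ("1.0", "id_a"), "LibC": ("1.0", "id_c")})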

I never said that the suggested DRP can be implemented within the current DRP: it might share some code, but it will most likely have a different and separate implementation.

A match for LibC is searched for in the local cache and/or in the server. If a version that satisfies the declaration and for which has_binary(LibC) is true is found, the LibC/conan.lock is loaded and evaluated. If there is no binary, an error is raised.

You can, and most times will, have tons of matches for the same LibC/1.0@user/stable. Not only different versions matching a range, but also many different package_ids and/or different package revisions that match both the version ranges and the current settings/options values. You will have a different binary for every change in the upstream versions or options:

  • Different binaries with different package_ids of LibC/1.0@user/stable will result from depending on different versions of LibB and LibA.
  • Different binaries of LibC/1.0@user/stable will also result from having different options in the upstream: if LibB and/or LibA is shared or static, the resulting LibC binary is different.

Every different binary will induce a different upstream subgraph, and there is no possible ordering based on package_id, so selection would be quite random or require the full re-evaluation of every subgraph. When conflicts happen downstream, you cannot directly error: you should go and try a different binary of LibC, which will bring its own subgraph, which in turn can conflict again. The combinatorial problem is still there.

This is an intrinsic issue of the C/C++ building model: if a package binary can change based on its dependencies, the dependencies are defined by the package itself, and every different version will have a different definition, then looking for a joint compatible set of versions and binaries is basically a SAT problem, known to be NP-hard, with the added constraint that this is not a pure mathematical model but requires API calls to servers, downloading, loading, parsing and evaluating files from disk.

TLDR:
Conan's dependency graph is generated using only (static or dynamic) version ranges and the corresponding recipes. Package id generation is applied only after the graph is generated, and that's why it's impossible to check (and use) the build status of a package during graph generation.

The whole process could be illustrated like this:

  1. Let's say we have app that depends on dep1 and dep2:
    app > dep1/[VER_RANGE_1]
    v
    dep2/[VER_RANGE_2]

  2. First, the graph is generated:
    app > dep1/VER_1: conanfile.py
    v
    dep2/VER_2: conanfile.py

  3. Then, package id generation is applied to each recipe from the graph:
    app > dep1/VER_1: PACKAGE_ID_1
    v
    dep2/VER_2: PACKAGE_ID_2

  4. And only then the package status is checked:
    app > dep1/VER_1/PACKAGE_ID_1: missing
    v
    dep2/VER_2/PACKAGE_ID_2: available

I.e. graph generation is complete at step 2 and the graph can't be modified in the following steps.

looking for a joint compatible set of versions and binaries is basically a SAT problem, known to be NP-hard, with the added constraint that this is not a pure mathematical model but requires API calls to servers, downloading, loading, parsing and evaluating files from disk.

The complexity can be reduced if additional constraints are applied. As mentioned before, the only thing that is needed to resolve the whole graph for app is the package_ids of its direct dependencies. Transitive dependencies can be resolved via lock files that are published with those direct dependencies.

So, taking all this into account, here is a rough idea of the proposed algorithm:

Prerequisites:

  1. When the package is published, additional files are uploaded with the package:

    • conan.lock that was used during the build.

    • package_id_partial.txt: generated with the same algorithm as the usual package_id, but without taking dependency versions into account. I.e. two packages will have the same package_id_partial if they were built from the same sources and with the same options/settings (even if their dependency versions are different). A rough sketch of this follows the list.
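
A rough sketch of how such a partial id could be approximated with today's recipe API (assuming the collection-level unrelated_mode() helper from Conan 1.x; the actual proposal is to store this hash in an extra file next to the normal package_id, not to replace it):

from conans import ConanFile

class SomeDep(ConanFile):  # hypothetical recipe
    name = "somedep"
    version = "1.0"
    requires = "subdep/[>=1.0]"

    def package_id(self):
        # drop the requirements from the id computation, so the result depends only
        # on settings/options -- roughly the "partial" id described above
        self.info.requires.unrelated_mode()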

app lock file generation:

  1. Direct dependencies' versions are resolved by generating conan_partial.lock for the app:

    • conan_partial.lock is a lock file that uses the package_id_partial algorithm for package_id calculation.
  2. Direct dependencies' package_ids are resolved:

    • package_id_partial is retrieved using the lock file from step 1.
    • The repo is searched for packages that contain a package_id_partial.txt with the retrieved value.
    • If there is no such package, an error is raised.
  3. The final lock file is generated by fetching conan.lock from each resolved direct dependency and merging them into the app's conan.lock.

Such an algorithm would have pretty low complexity (at worst direct_deps_count × dep_packages_with_same_version_count) and does not require any modification of recipes and/or of the graph generation logic (i.e. it could be implemented outside of conan via hooks and/or additional scripts). There are two features that are required for this algorithm to work though:

  1. Ability to calculate the package_id without building the package. This is needed for generating package_id_partial.txt in prerequisite 1.
  2. Ability to globally enforce a package_id generation algorithm (via an env var, so that it can be applied to a single conan invocation). This is needed for prerequisite 1 (together with feature 1) and for step 1 of the lock file generation.

IMO, both of these features could be useful outside of this algorithm (esp. feature 2.).
