bazel sync command should not download unused external dependencies

Created on 9 Aug 2018 · 6 Comments · Source: bazelbuild/bazel

Description of the problem / feature request:

When running bazel sync, all of the repos in the WORKSPACE file are cloned/downloaded, regardless of whether they are specified in the deps attribute of any target.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  1. Create a new folder

  2. Create a WORKSPACE file with the following content:

load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
  name = "com_google_protobuf",
  remote = "https://github.com/google/protobuf",
  branch = "master",
)

  3. Create a BUILD file with the following content:

package(default_visibility = ["//visibility:public"])

  4. Run bazel sync

  5. Run bazel info output_base to get the output base directory

  6. Look at the external directory in that folder (ls <output_base>/external)

  7. See the folder com_google_protobuf, which should not be there since no target depends on it

What operating system are you running Bazel on?

macOS High Sierra 10.13.5

What's the output of bazel info release?

development version

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

Built bazel by running bazel build //src:bazel on a cloned repo at commit: 2c9c05b3960914b9120566bf680f2280c1857f82

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

https://github.com/bazelbuild/bazel.git
2c9c05b3960914b9120566bf680f2280c1857f82
2c9c05b3960914b9120566bf680f2280c1857f82

Any other information, logs, or outputs that you want to share?

  • When running bazel build //... the com_google_protobuf folder is not created.
  • When running bazel sync on a WORKSPACE file that is using the jvm_maven_import_external rule without a target depending on it, the jar is downloaded to the external folder similar to the behavior of the git_repository rule.

Labels: P4, area-ExternalDeps, team-XProduct, feature request

All 6 comments

cc @aehlig

This is a feature request (the current specification says that all repositories are updated unconditionally, no questions asked), and also low priority as

  • a normal WORKSPACE file will only contain repositories somewhat relevant to the project, and
  • the main intended use case for sync is to update a resolved file that is committed and used by the majority of developers of the project.

Moreover, there could also be legitimate use cases for WORKSPACE-only repositories and syncs in them: those provide a consistent snapshot of a group of related projects. Such a snapshot can be used by developers of, and CI systems for, any of those projects as a baseline for the state of the rest of the world.

Moreover, note that this feature is not as simple as scanning all BUILD files in the project. External repositories might (and often do) depend on other external repositories. These dependencies can only be discovered after the respective repository has been downloaded. (This also implies that only fetching what is reachable from //... comes at the cost of handling downloads sequentially that otherwise would be done in parallel.)
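
To make the sequential-discovery cost concrete, here is a minimal Python sketch. It is not how Bazel is implemented; the function name fetch_in_waves, the deps_of mapping, and the repository names are all hypothetical. The mapping stands in for information that, in reality, only becomes readable once a repository has been downloaded, so each wave has to finish before the next one can start.

```python
from typing import Dict, List, Set


def fetch_in_waves(root_deps: List[str], deps_of: Dict[str, List[str]]) -> List[Set[str]]:
    """Return the set of repositories fetched in each sequential wave.

    root_deps: external repositories referenced directly by targets in //...
    deps_of:   hypothetical stand-in for what each repository declares as its
               own external dependencies (known only after it is fetched).
    """
    fetched: Set[str] = set()
    waves: List[Set[str]] = []
    frontier = set(root_deps)
    while frontier:
        # Repositories within one wave could be downloaded in parallel.
        waves.append(frontier)
        fetched |= frontier
        # Only after this wave is downloaded can its dependencies be read.
        discovered: Set[str] = set()
        for repo in frontier:
            discovered.update(deps_of.get(repo, []))
        frontier = discovered - fetched
    return waves


if __name__ == "__main__":
    deps_of = {
        "slices_view": ["slices_builders", "slices_core"],
        "slices_builders": ["slices_core"],
        "slices_core": [],
        "unused_repo": [],  # declared in WORKSPACE, but nothing depends on it
    }
    for number, wave in enumerate(fetch_in_waves(["slices_view"], deps_of), start=1):
        print(number, sorted(wave))
```

In this toy graph the reachable repositories arrive in two waves, and unused_repo is never fetched. An unconditional sync would instead fetch all four in a single parallel batch.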

Without this feature, bazel sync is incompatible with a project like gmaven_rules, which pre-computes a graph of external repositories in a bzl file. The bzl file has thousands of targets like the following example, which provides almost instantaneous resolution of transitive dependencies.

aar_import_external(
    name = 'com_android_support_slices_view_28_0_0_alpha1',
    licenses = ['notice'],  # apache
    aar_urls = ['https://dl.google.com/dl/android/maven2/com/android/support/slices-view/28.0.0-alpha1/slices-view-28.0.0-alpha1.aar'],
    aar_sha256 = '',
    deps = [
        '@com_android_support_slices_builders_28_0_0_alpha1//aar',
        '@com_android_support_slices_core_28_0_0_alpha1//aar',
        '@android_arch_lifecycle_extensions_1_1_0//aar',
        '@com_android_support_recyclerview_v7_28_0_0_alpha1//aar',
    ],
)

We intend to provide the resolved file from the server side with a different mechanism.
Instead of using bazel sync to create the resolved file and committing it to the git repository, we create that file in a different, more efficient way and fetch it to the dev machine with custom bazel scripts.
On the server side, we subscribe to push events from all 2nd party repositories and keep track of the latest commits (and shallow_since dates).
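
As a rough illustration of the bookkeeping described above, the following Python sketch keeps the newest commit seen per repository. It is an assumption-laden toy, not our actual service: the PushEvent fields, the LatestCommits class, and the repository names are hypothetical, and the output is a plain mapping rather than Bazel's resolved file format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PushEvent:
    """Hypothetical, simplified push notification."""
    repo: str        # repository identifier
    commit: str      # commit hash that was pushed
    timestamp: int   # commit time, seconds since the epoch


class LatestCommits:
    """Tracks the most recent commit seen for each repository."""

    def __init__(self) -> None:
        self._latest: Dict[str, PushEvent] = {}

    def record(self, event: PushEvent) -> None:
        # Events may arrive out of order, so keep the newest by timestamp.
        current: Optional[PushEvent] = self._latest.get(event.repo)
        if current is None or event.timestamp >= current.timestamp:
            self._latest[event.repo] = event

    def snapshot(self) -> Dict[str, str]:
        # Repository name -> latest known commit, in sorted order.
        return {repo: ev.commit for repo, ev in sorted(self._latest.items())}


if __name__ == "__main__":
    tracker = LatestCommits()
    tracker.record(PushEvent("repo_a", "aaa111", 100))
    tracker.record(PushEvent("repo_b", "bbb222", 150))
    tracker.record(PushEvent("repo_a", "aaa333", 200))
    tracker.record(PushEvent("repo_a", "aaa222", 120))  # late arrival, ignored
    print(tracker.snapshot())
```

Because the tracker only reacts to push events, nothing is cloned to learn that a repository is unchanged.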

The bazel sync command, on the other hand, re-clones all external repositories, even if they have not changed since the last sync.
It also clones external repositories that have no target that depends on them (this issue).

Using a resolved file generated by an external process allows us to avoid cloning, which gives results that are closer to real time.
It also enables us to keep the list of external repositories fixed for all internal repositories, with the bazel build command executing only the external repository rules that appear in some target's dependencies.
Another benefit is that the git log history is not cluttered by commits that only change the resolved file.

One of the drawbacks of our solution is that the internal format of the resolved file may change, and we will have to follow it.
It also adds another single point of failure to the system (the server-side), instead of just having the dev machine interacting with GitHub (where we manage our repositories).

We are waiting for workspace resolution to become better optimized before we integrate it.
We hope that bazel sync will go in that direction. Cloning shallow branches, for example, is a great move towards a lean bazel sync command.

Another nice improvement would be if one could do bazel sync @repo-name. Often I just want to update one or two external dependencies unconditionally, and waiting potentially many minutes because a lot of deps have to be redownloaded is quite expensive.

I'm just checking if anything like this was added. If there is an external dependency configured as a git_repository, and the branch it points to has updates that we would like to pull in, could one make bazel fetch just those? Thanks!
