Bazel should be able to take a list of "dirty" files as input, and prune the target list based on it.
In CI contexts bazel often ends up doing a lot of unnecessary/redundant work. For example, in CI a common pattern is to run
bazel test //...
on a clean worker. This requires computing the full build graph, downloading all remote repositories, getting cache hits (and currently downloading all outputs) for every action in the graph, etc. As a result, it can be unnecessarily slow for mostly-cached builds, since most of this work is unrelated to the handful of files that have actually changed or to the targets that transitively depend on them. This is especially true in monorepos, where a PR may only affect one corner of the codebase.
What a lot of users end up doing is hacking together something of their own using bazel query - something logically equivalent to
bazel test `bazel query "rdeps(//..., set(<dirty files>))"`
...except hopefully with proper handling of non-src dependencies (e.g. 'data', etc), BUILD/bzl files, WORKSPACE files, _removed_ files, etc.
Unsurprisingly, this is hard to get right, and awkward to have to insert into a build flow. So it's a power feature only for users sufficiently motivated to improve build performance. But bazel knows how to do this internally - every incremental build is doing effectively this computation based on the files in the workspace that were touched or added or removed.
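For illustration, here is a minimal sketch of the query-based workaround described above. The function name `build_rdeps_query` is hypothetical, and the sketch assumes every changed path is workspace-relative, still exists, and contains no characters that need quoting in the query language. It deliberately ignores the hard cases listed above (removed files, BUILD/bzl changes, WORKSPACE changes), which is exactly why a built-in feature would be preferable.

```shell
#!/usr/bin/env bash
# Sketch only, not a complete solution: read changed file paths (one per line)
# from stdin and print a Bazel query expression that selects everything in
# //... that transitively depends on them.
# Assumptions: paths are workspace-relative, the files still exist, and no
# path needs quoting. Removed files and BUILD/.bzl/WORKSPACE edits are NOT
# handled.
build_rdeps_query() {
  local files=()
  local path
  # The "|| [ -n ... ]" keeps a final line that has no trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    # Skip blank lines.
    [ -n "$path" ] && files+=("$path")
  done
  echo "rdeps(//..., set(${files[*]}))"
}

# Possible usage, with dirty_files.txt produced by `git diff --name-only`:
#   bazel query "$(build_rdeps_query < dirty_files.txt)"
```

The function only builds the expression, so it can be inspected or logged in CI before anything is handed to bazel.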
The request, then, is to build a proper feature around this - support something like:
git diff --name-only base...pr > dirty_files.txt
bazel test //... --prune_file=dirty_files.txt
so that CI systems can easily and scalably do pruned test runs, and without jeopardizing correctness by implementing this themselves.
_(Recommended priority: low. It _can_ be worked around via query; I only have a few examples of users doing this themselves so far. Filing this as a place to collect +1's, but suggest waiting for evidence of interested users before taking it on.)_
We quite quickly had to switch away from the pure remote cache approach to something like what you are suggesting here, and our monorepo is far from the biggest one. We have also run into the challenges that you mention, like handling changes to external repositories and BUILD files correctly. I think it would help a lot of people. Some related discussions: https://github.com/bazelbuild/bazel/issues/7962 as well as https://groups.google.com/forum/m/#!msg/bazel-discuss/I9udqWIcEdI/iczVgWLOBQAJ
Edit: Also, for reference, we based our CI script on https://github.com/bazelbuild/bazel/blob/master/scripts/ci/ci.sh, but it has been heavily modified since.
Oh yeah, in our CI we use the query result twice: once to know what to build/test and once to know what to deploy/run. We take the result from the initial query and then filter it for rules with the deployable tag. So it would be nice if the --prune_file argument could also be used for query, or if the list of targets could be retrieved some other way after testing. Maybe with aquery and the --skyframe_state flag?
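As a rough sketch of the second use described above, the affected-target expression can be wrapped in an `attr` filter on `tags`. The helper name `deployable_query` is hypothetical, and the bracket-and-comma regex is an assumption that tag lists are matched in their `[a, b]` string form; it may need adjusting for a particular setup.

```shell
#!/usr/bin/env bash
# Sketch only: wrap an existing Bazel query expression so that it keeps just
# the rules carrying a given tag (default: "deployable").
# Assumption: the tags attribute is matched against its "[a, b]" string form,
# so the regex anchors the tag between "[" or " " and "," or "]".
deployable_query() {
  local inner="$1"
  local tag="${2:-deployable}"
  printf 'attr("tags", "[\\[ ]%s[,\\]]", %s)\n' "$tag" "$inner"
}

# Possible usage, reusing whatever expression selected the affected targets:
#   bazel query "$(deployable_query 'rdeps(//..., set(a/b.cc))')"
```

This still needs a second query invocation, which is the overhead a query-compatible --prune_file (or a way to read back the pruned target list) would avoid.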