Bazel: Tests with "exclusive" tag do not get results from remote cache

Created on 22 Sep 2017 · 21 Comments · Source: bazelbuild/bazel

Description of the problem / feature request / question:

I am trying out the remote cache functionality described here. Currently I'm running against a local Hazelcast instance, and this works great most of the time. However, tests with the exclusive tag, despite being cached as normal in the local cache, do not seem to use the remote cache and always rerun if there is no passing result in the local cache. This seems like an oversight?

If possible, provide a minimal example to reproduce the problem:

BUILD

cc_test(
  name = "test",
  srcs = ["test.cc"],
  tags = ["exclusive"], # comment this out to fix the problem
)

test.cc

#include <chrono>
#include <thread>

int main() {
  std::this_thread::sleep_for(std::chrono::seconds(10));
  return 0;
}

.bazelrc

startup --host_jvm_args=-Dbazel.DigestFunction=SHA1

build --spawn_strategy=remote
build --experimental_strict_action_env
build --remote_rest_cache=http://localhost:5701/hazelcast/rest/maps/cache

test --spawn_strategy=remote
test --experimental_strict_action_env
test --remote_rest_cache=http://localhost:5701/hazelcast/rest/maps/cache

Environment info

  • Operating System:
    Ubuntu

  • Bazel version (output of bazel info release):
    release 0.5.3

  • Hazelcast version
    3.8.6

Anything else, information or logs or outputs that would be helpful?

Here's an example run. The second test will pass in 0.0s if either the exclusive tag is removed, or the second bazel clean command is omitted.

$ bazel clean && bazel test //... && bazel clean && bazel test //...                                         
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Analysed target //:test (9 packages loaded).
INFO: Found 1 test target...
Target //:test up-to-date:
  bazel-bin/test
INFO: Elapsed time: 10.405s, Critical Path: 10.20s
INFO: Build completed successfully, 7 total actions
//:test                                                                  PASSED in 10.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Analysed target //:test (9 packages loaded).
INFO: Found 1 test target...
Target //:test up-to-date:
  bazel-bin/test
INFO: Elapsed time: 10.209s, Critical Path: 10.02s
INFO: Build completed successfully, 7 total actions
//:test                                                                  PASSED in 10.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
P2 team-Remote-Exec bug

All 21 comments

/cc @philwo because I remember there was some discussion about this: the test also drops out of the sandbox when we do that.

@buchgr

Reassigning to @buchgr, because remote execution.

@buchgr Is there a timeline for fixing this?

@RNabel yes. should be in 0.29.0

@buchgr it looks like your fix https://github.com/bazelbuild/bazel/pull/8983 got rolled back and, as far as I can tell, is not included in 0.29.

Yeah, it did get rolled back :(. I broke tons of code internally. I'll try to roll it forward for the 1.0 release.

@buchgr any updates on this ticket?

Yes, unfortunately it got rolled back because it broke many internal tests. I'll need to fix those before I can roll it forward. I don't have the cycles to do this right now.

This patch is extremely beneficial, any timeline towards getting this merged?

I unfortunately no longer work on Bazel and won't have the time to fix all the internal tests broken by this change.

Is there anyone who would be able to pick this up, or is it possible for someone external to pick this work up? This would be really helpful. We currently have to work around this by calling the exclusive tests ourselves after the tests that can be run in parallel.

Hi, we've been using the patch from #8983 for a couple of months now without any issues. What I noticed is that this change enables both sandboxing and caching. Without knowing how the internal tests failed, could it be that you just need to add a local tag to disable sandboxing?

We would be really interested in getting this into Bazel as well. We have tests that need to run exclusively but we occasionally miss dependencies that would be caught with sandboxing.

Is there any timeline for getting this fixed? If not, how about introducing a flag to enable #8983 conditionally? That way many teams could benefit from caching of exclusive tests while the internal tests stay green. Thoughts?

Here is a concrete use-case for why this bug is important:
We have a lot of tests that require usage of the GPU. These tests use the exclusive tag to prevent OOM issues that can arise when running multiple tests that require the GPU at once. Because of this bug these tests have to run every single time that we do a test CI build.

In my opinion this bug breaks one of the core features of bazel: targets are being built which do not need to be built because the dependencies did not change.

After some hours I was able to figure out a workaround for my use case.

We have a macro that wraps our test rule. I edited it so that when the "exclusive" tag is set for a test, instead of creating that test target normally, we create two test targets.

The first target is tagged exclusive as normal but also gets tagged gpu_test. This ensures that developers can still run bazel test foo/... and still have all their tests run as expected.

The second target is tagged manual and gpu_test. This test is effectively identical to the first but it will be excluded from any sort of foo/... queries in bazel.
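The two targets described above could be produced by a macro along the following lines. This is only a sketch, not the commenter's actual macro: the names `exclusive_aware_test` and `test_rule` and the `_cacheable` suffix are hypothetical, and the underlying rule is passed in as an argument (for example `native.cc_test`) so that the snippet stays self-contained. It is written in the subset of Starlark that is also valid Python.

```python
# Sketch of a test-wrapping macro (Starlark, also valid Python).
# `exclusive_aware_test`, `test_rule` and the "_cacheable" suffix are
# hypothetical names chosen for illustration.

def exclusive_aware_test(name, test_rule, tags = None, **kwargs):
    """Declares a test; an exclusive test also gets a manual twin.

    Args:
      name: target name.
      test_rule: the underlying test rule or macro, e.g. native.cc_test.
      tags: tags requested by the caller.
      **kwargs: forwarded unchanged to test_rule.
    """
    tags = list(tags or [])
    if "exclusive" not in tags:
        # Non-exclusive tests are declared exactly as requested.
        test_rule(name = name, tags = tags, **kwargs)
        return

    # Target 1: still exclusive, so `bazel test foo/...` behaves as before.
    test_rule(name = name, tags = tags + ["gpu_test"], **kwargs)

    # Target 2: the same test without "exclusive". The "manual" tag keeps
    # it out of wildcard patterns, so only a dedicated CI step runs it.
    twin_tags = [t for t in tags if t != "exclusive"] + ["manual", "gpu_test"]
    test_rule(name = name + "_cacheable", tags = twin_tags, **kwargs)
```

In a .bzl file this might be called as `exclusive_aware_test(name = "foo_test", test_rule = native.cc_test, srcs = ["foo_test.cc"], tags = ["exclusive"])`.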

The effect is that in CI we run bazel test with --test_tag_filters=-exclusive,-gpu_test to exclude the exclusive targets from our main bazel test step. We then add a second test step that queries for the tests tagged both manual and gpu_test and passes them to bazel test with --jobs=1, which effectively runs them one at a time while maintaining remote caching.

Note that the purpose of the gpu_test tag is to ensure that we aren't running other unnecessary manual tests in this step.
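The CI steps described above can be sketched as three command lines, built here as argument lists. The function names and the default `//...` pattern are assumptions for illustration, and the exact query the commenter used is not shown in this thread. The sketch relies on `bazel query` keeping manual targets when it expands wildcard patterns, unlike `bazel test`.

```python
# Builds the CI invocations as argv lists (function names are hypothetical).

def parallel_test_command(pattern = "//..."):
    # Step 1: everything except the exclusive and gpu_test-tagged targets.
    return [
        "bazel",
        "test",
        "--test_tag_filters=-exclusive,-gpu_test",
        pattern,
    ]

def manual_gpu_query_command(pattern = "//..."):
    # Step 2a: list tests tagged both "manual" and "gpu_test".
    # attr() does a regex search on the tag list, so "manual" would also
    # match a longer tag that merely contains that word.
    query = 'attr(tags, "manual", attr(tags, "gpu_test", tests(%s)))' % pattern
    return ["bazel", "query", query]

def serial_test_command(labels):
    # Step 2b: run the labels printed by step 2a with a single job.
    return ["bazel", "test", "--jobs=1"] + list(labels)
```

A CI script would run the first command, feed the output of the second into the third, and skip the third when the query prints no labels.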

@coeuvre For reference, this was fixed in cl/260916180, but rolled back in cl/261644804.

#8983 has been rolled forward with a fix as 5e5eb86e5390e9e822848ccc031e0846a520858d:

  1. Add --incompatible_exclusive_test_sandboxed flag so users can enable this feature conditionally.
  2. With that flag enabled, users who want to run exclusive tests locally can add a 'local' tag.
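As a rough illustration of the two points above, a BUILD fragment might look like the sketch below. The target name and source file are reused from the reproduction at the top of this issue. The .bazelrc line in the comment is an assumption about how the flag would be applied.

```python
# BUILD file fragment (Starlark). Assumes .bazelrc contains:
#   test --incompatible_exclusive_test_sandboxed
cc_test(
    name = "test",
    srcs = ["test.cc"],
    # "exclusive": run alone. "local": opt out of the sandboxing that the
    # flag above would otherwise apply to exclusive tests.
    tags = ["exclusive", "local"],
)
```

Note that Bazel documents the local tag as also excluding a test from remote caching and remote execution, so it is probably best added only to tests that cannot run sandboxed.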

Which release contains this flag?

It should be contained in release 4.0. You can track the release here: #12455.