Bazel: Tests with "exclusive" tag do not get results from remote cache

Created on 22 Sep 2017 · 21 Comments · Source: bazelbuild/bazel

Description of the problem / feature request / question:

I am trying out the remote cache functionality described here. Currently I'm running against a local Hazelcast instance. This works great most of the time. One thing I noticed is that tests with the exclusive tag, despite being cached as normal in the local cache, do not seem to use the remote cache and always rerun if there is no passing result in the local cache. This seems like an oversight?

If possible, provide a minimal example to reproduce the problem:

BUILD

cc_test(
  name = "test",
  srcs = ["test.cc"],
  tags = ["exclusive"], # comment this out to fix the problem
)

test.cc

#include <chrono>
#include <thread>

int main() {
  std::this_thread::sleep_for(std::chrono::seconds(10));
  return 0;
}

.bazelrc

startup --host_jvm_args=-Dbazel.DigestFunction=SHA1

build --spawn_strategy=remote
build --experimental_strict_action_env
build --remote_rest_cache=http://localhost:5701/hazelcast/rest/maps/cache

test --spawn_strategy=remote
test --experimental_strict_action_env
test --remote_rest_cache=http://localhost:5701/hazelcast/rest/maps/cache

Environment info

Operating System:
Ubuntu
Bazel version (output of bazel info release):
release 0.5.3
Hazelcast version
3.8.6

Anything else, information or logs or outputs that would be helpful?

Here's an example run. The second test will pass in 0.0s if either the exclusive tag is removed, or the second bazel clean command is omitted.

$ bazel clean && bazel test //... && bazel clean && bazel test //...
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Analysed target //:test (9 packages loaded).
INFO: Found 1 test target...
Target //:test up-to-date:
  bazel-bin/test
INFO: Elapsed time: 10.405s, Critical Path: 10.20s
INFO: Build completed successfully, 7 total actions
//:test                                                                  PASSED in 10.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Analysed target //:test (9 packages loaded).
INFO: Found 1 test target...
Target //:test up-to-date:
  bazel-bin/test
INFO: Elapsed time: 10.209s, Critical Path: 10.02s
INFO: Build completed successfully, 7 total actions
//:test                                                                  PASSED in 10.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.

P2 team-Remote-Exec bug

grandseiken

👍4

Most helpful comment

Is there a timeline for getting this fixed? If not, how about introducing a flag to enable #8983 conditionally? That way many teams could benefit from caching of exclusive tests while the internal tests stay green. Thoughts?

stepango on 4 Jun 2020

👍6

All 21 comments

/cc @philwo, because I remember there was some discussion about this: it also drops out of the sandbox when we do that.

damienmg on 22 Sep 2017

@buchgr

hlopko on 10 Oct 2017

Reassigning to @buchgr, because remote execution.

philwo on 19 Jul 2018

@buchgr Is there a timeline for fixing this?

RNabel on 5 Aug 2019

@RNabel Yes, it should be in 0.29.0.

buchgr on 6 Aug 2019

🎉3

@buchgr it looks like your fix https://github.com/bazelbuild/bazel/pull/8983 got rolled back and, as far as I can tell, was not included in 0.29.

phb on 21 Aug 2019

Yeah, it did get rolled back :(. I broke tons of code internally. I'll try to roll it forward for the 1.0 release.

buchgr on 21 Aug 2019

@buchgr any updates on this ticket?

mandrean on 2 Dec 2019

Yes, unfortunately it got rolled back because it broke many internal tests. I'll need to fix those before I can roll it forward. I don't have the cycles to do this right now.

buchgr on 2 Dec 2019

This patch is extremely beneficial, any timeline towards getting this merged?

Qinusty on 30 Jan 2020

I unfortunately no longer work on Bazel and won't have the time to fix all the internal tests broken by this change.

buchgr on 3 Feb 2020

Is there anyone who would be able to pick this up, or is it possible for someone external to pick this work up? This would be really helpful. We currently have to work around this by calling the exclusive tests ourselves after the tests that can be run in parallel.

olib963 on 21 Feb 2020

Hi, we’ve been using the patch from #8983 for a couple of months now without any issues. What I noticed is that this change enables both sandboxing and caching. Without knowing how the internal tests failed, could it be that you just need to add a local tag to disable sandboxing?

cocreature on 18 May 2020

We would be really interested in getting this into Bazel as well. We have tests that need to run exclusively but we occasionally miss dependencies that would be caught with sandboxing.

mcwilson07 on 28 May 2020

stepango on 4 Jun 2020

👍6

Here is a concrete use-case for why this bug is important:
We have a lot of tests that require usage of the GPU. These tests use the exclusive tag to prevent OOM issues that can arise when running multiple tests that require the GPU at once. Because of this bug these tests have to run every single time that we do a test CI build.

In my opinion this bug breaks one of the core features of bazel: targets are being built which do not need to be built because the dependencies did not change.

JayThomason on 25 Aug 2020

👍5

After some hours I was able to figure out a workaround for my use case.

We have a macro that wraps our test rule. I edited it so that when the "exclusive" tag is set for a test, instead of just creating that test target normally, we actually create two test targets.

The first target is tagged exclusive as normal but also gets tagged gpu_test. This ensures that developers can still run bazel test foo/... and still have all their tests run as expected.

The second target is tagged manual and gpu_test. This test is effectively identical to the first but it will be excluded from any sort of foo/... queries in bazel.

The effect is that for CI we run bazel test with --test_tag_filters=-exclusive,-gpu_test to exclude the exclusive targets from our bazel test step. We also add a new test step in which we query for the tests tagged as manual and gpu_test and pass those to bazel test with --jobs=1, which effectively runs the tests one at a time while maintaining remote caching.

Note that the purpose of the gpu_test tag is to ensure that we aren't running other unnecessary manual tests in this step.
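
The two CI steps described above might look roughly like the following shell sketch. It is not taken from the original setup: the package pattern //foo/... and the function names are illustrative assumptions, and only the flags and tags named in the comment are used. Note that attr() treats its pattern as a regular expression, so a tag that merely contains "manual" or "gpu_test" would also match.

```shell
# Sketch only: function names and the default //foo/... pattern are hypothetical.

# Build a query expression for tests tagged both "manual" and "gpu_test".
gpu_manual_query() {
  local pattern="${1:-//foo/...}"
  printf 'attr(tags, "manual", attr(tags, "gpu_test", tests(%s)))' "$pattern"
}

# Step 1: run the regular tests in parallel, skipping the exclusive/GPU targets.
run_parallel_tests() {
  bazel test "${1:-//foo/...}" --test_tag_filters=-exclusive,-gpu_test
}

# Step 2: run the manual GPU twins with --jobs=1 so they execute one at a time
# without the exclusive tag, which keeps remote caching available.
run_gpu_tests() {
  local targets
  targets="$(bazel query "$(gpu_manual_query "$1")")" || return 1
  [ -z "$targets" ] && return 0
  # Intentionally unquoted: one target label per word.
  bazel test --jobs=1 $targets
}
```

A CI job would then call run_parallel_tests //foo/... followed by run_gpu_tests //foo/....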

JayThomason on 27 Aug 2020

@coeuvre For reference, this was fixed in cl/260916180, but rolled back in cl/261644804.

philwo on 21 Sep 2020

#8983 has been rolled forward with a fix as 5e5eb86e5390e9e822848ccc031e0846a520858d:

Add --incompatible_exclusive_test_sandboxed flag so users can enable this feature conditionally.
With that flag enabled, users who want to run exclusive tests locally can add a 'local' tag.
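
As a rough illustration of opting in, the sketch below appends the flag to a bazelrc file so that every bazel test invocation picks it up. The helper name and the default file path are assumptions made for this example, and whether the flag exists depends on the Bazel release in use.

```shell
# Hypothetical helper: add the opt-in flag to a bazelrc file unless it is already present.
enable_exclusive_sandboxing() {
  local rcfile="${1:-.bazelrc}"
  local line='test --incompatible_exclusive_test_sandboxed'
  touch "$rcfile"
  # -x matches whole lines, -F treats the flag as a literal string.
  grep -qxF -- "$line" "$rcfile" || printf '%s\n' "$line" >> "$rcfile"
}
```

After calling enable_exclusive_sandboxing .bazelrc, rerunning the reproduction from the issue (bazel clean && bazel test //... twice) should show whether the second run is now served from the cache. Tests that must keep running outside the sandbox would carry tags = ["exclusive", "local"], as the comment above suggests.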