I am trying out the remote cache functionality described here. Currently I'm running against a local Hazelcast instance. This works great most of the time. One thing I noticed is that tests with the `exclusive` tag, despite being cached as normal in the local cache, do not seem to use the remote cache and will always rerun if there is no passing result in the local cache. This seems like an oversight?
BUILD:
```
cc_test(
    name = "test",
    srcs = ["test.cc"],
    tags = ["exclusive"],  # comment this out to fix the problem
)
```
test.cc:
```
#include <chrono>
#include <thread>

int main() {
  std::this_thread::sleep_for(std::chrono::seconds(10));
  return 0;
}
```
.bazelrc:
```
startup --host_jvm_args=-Dbazel.DigestFunction=SHA1
build --spawn_strategy=remote
build --experimental_strict_action_env
build --remote_rest_cache=http://localhost:5701/hazelcast/rest/maps/cache
test --spawn_strategy=remote
test --experimental_strict_action_env
test --remote_rest_cache=http://localhost:5701/hazelcast/rest/maps/cache
```
Operating System: Ubuntu

Bazel version (output of `bazel info release`): release 0.5.3

Hazelcast version: 3.8.6
Here's an example run. The second test will pass in 0.0s if either the `exclusive` tag is removed or the second `bazel clean` command is omitted.
```
$ bazel clean && bazel test //... && bazel clean && bazel test //...
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Analysed target //:test (9 packages loaded).
INFO: Found 1 test target...
Target //:test up-to-date:
  bazel-bin/test
INFO: Elapsed time: 10.405s, Critical Path: 10.20s
INFO: Build completed successfully, 7 total actions
//:test PASSED in 10.0s
Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Analysed target //:test (9 packages loaded).
INFO: Found 1 test target...
Target //:test up-to-date:
  bazel-bin/test
INFO: Elapsed time: 10.209s, Critical Path: 10.02s
INFO: Build completed successfully, 7 total actions
//:test PASSED in 10.0s
Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
```
/cc @philwo because I remember there was some discussion about how it also drops out of the sandbox when we do that.
@buchgr
Reassigning to @buchgr, because remote execution.
@buchgr Is there a timeline for fixing this?
@RNabel Yes, it should be in 0.29.0.
@buchgr It looks like your fix https://github.com/bazelbuild/bazel/pull/8983 got rolled back and, as far as I can tell, was not included in 0.29.
Yeah, it did get rolled back :(. I broke tons of code internally. I'll try to roll it forward for the 1.0 release.
@buchgr any updates on this ticket?
Yes, unfortunately it got rolled back because it broke many internal tests. I'll need to fix those before I can roll it forward. I don't have the cycles to do this right now.
This patch is extremely beneficial; is there any timeline for getting it merged?
I unfortunately no longer work on Bazel and won't have the time to fix all the internal tests broken by this change.
Is there anyone who would be able to pick this up, or is it possible for someone external to take this work on? That would be really helpful; we currently have to work around this by running the exclusive tests ourselves after the tests that can run in parallel.
Hi, we've been using the patch from #8983 for a couple of months now without any issues. What I noticed is that this change enables both sandboxing and caching. Without knowing how the internal tests failed, could it be that you just need to add a `local` tag to disable sandboxing?
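If sandboxing is indeed what breaks those tests, the suggestion amounts to something like this (a hypothetical sketch, reusing the `cc_test` from the original report):

```
cc_test(
    name = "test",
    srcs = ["test.cc"],
    # "exclusive" still serializes the test; the "local" tag forces it to
    # run with the local (unsandboxed) strategy.
    tags = ["exclusive", "local"],
)
```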
We would be really interested in getting this into Bazel as well. We have tests that need to run exclusively but we occasionally miss dependencies that would be caught with sandboxing.
Is there any timeline for getting this fixed? If not, how about introducing a flag to enable #8983 conditionally? That way many teams could benefit from exclusive-test caching while the internal tests stay green. Thoughts?
Here is a concrete use case for why this bug is important:
We have a lot of tests that require the GPU. These tests use the `exclusive` tag to prevent the OOM issues that can arise when several GPU tests run at once. Because of this bug, these tests have to rerun every single time we do a CI test build.
In my opinion this bug breaks one of the core features of Bazel: targets are being rebuilt even though their dependencies did not change.
After some hours I was able to figure out a workaround for my use case.
We have a macro that wraps our test rule, which I edited so that when the `exclusive` tag is set for a test, instead of creating that test target normally we actually create two test targets (see the sketch after this description).
The first target is tagged `exclusive` as normal but also gets tagged `gpu_test`. This ensures that developers can still run `bazel test foo/...` and have all their tests run as expected.
The second target is tagged `manual` and `gpu_test`. This test is effectively identical to the first, but it will be excluded from any sort of `foo/...` wildcard queries in Bazel.
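A minimal sketch of what such a macro could look like (hypothetical: the macro name, the `_ci` suffix, and the use of `cc_test` are illustrative, not our exact code):

```
# tests.bzl -- hypothetical wrapper macro; names are illustrative.
def gpu_cc_test(name, tags = [], **kwargs):
    if "exclusive" in tags:
        # Target picked up by `bazel test foo/...`: still exclusive, as before.
        native.cc_test(
            name = name,
            tags = tags + ["gpu_test"],
            **kwargs
        )
        # CI-only duplicate: "manual" hides it from wildcard expansion, and
        # dropping "exclusive" is what keeps it remotely cacheable; CI
        # serializes it with --jobs=1 instead.
        native.cc_test(
            name = name + "_ci",
            tags = [t for t in tags if t != "exclusive"] + ["manual", "gpu_test"],
            **kwargs
        )
    else:
        native.cc_test(name = name, tags = tags, **kwargs)
```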
The effect is that for CI we run `bazel test` with `--test_tag_filters=-exclusive,-gpu_test` to exclude the exclusive targets from our main test step, and add a second test step where we query for the tests tagged `manual` and `gpu_test` and pass those to `bazel test` with `--jobs=1`, effectively running them one at a time while maintaining remote caching. Note that the purpose of the `gpu_test` tag is to ensure that we aren't running other, unrelated manual tests in this step.
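Concretely, the two CI steps look something like this (a sketch; the `//foo/...` pattern is illustrative, and the `attr` patterns may need tightening for real tag lists):

```
# Step 1: everything except the exclusive/GPU tests, fully parallel.
bazel test --test_tag_filters=-exclusive,-gpu_test //foo/...

# Step 2: find the manual GPU duplicates and run them one at a time.
# They are not tagged "exclusive", so remote cache hits still apply.
bazel query 'attr(tags, "manual", attr(tags, "gpu_test", tests(//foo/...)))' \
  | xargs bazel test --jobs=1
```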
@coeuvre For reference, this was fixed in cl/260916180, but rolled back in cl/261644804.
The fix is now gated behind the `--incompatible_exclusive_test_sandboxed` flag so users can enable this feature conditionally.

Which release contains this flag?
It should be included in release 4.0. You can track the release here: #12455.
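Once you're on a release that has it, enabling the behavior should just be a matter of setting the flag, e.g. in .bazelrc (untested sketch):

```
# Opt in to sandboxed (and therefore remotely cacheable) exclusive tests.
test --incompatible_exclusive_test_sandboxed
```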