Tooling for coverage fails when running under remote execution (LcovMerger specifically)
Run a test target with bazel coverage --spawn_strategy=remote --remote_executor=... //path/to:test
Ubuntu 14.04
bazel info release
release 0.10.1
Relevant portion of coverage execution
+ exec bazel-out/k8-fastbuild/bin/external/bazel_tools/tools/test/LcovMerger --coverage_dir=%execroot%/_coverage/path/to/test/test --output_file=%execroot%/bazel-out/k8-fastbuild/testlogs/path/to/test/coverage.dat
%execroot%/bazel-out/k8-fastbuild/bin/external/bazel_tools/tools/test/LcovMerger: Cannot locate runfiles directory. (Set $JAVA_RUNFILES to inhibit searching.)
Is this bug still an issue? cc @lberki @iirina
I see the same issue when running some of our C++/Python tests with remote execution. Bazel version is a slightly patched version of 0.25.3.
I've come up with a workaround that basically involves the below patch. I have a feeling there is a more correct way to do this--possibly by including some implicit dependencies in the relevant coverage-aware Bazel rules, but I am not familiar enough with the codebase to efficiently tackle that.
commit ba16394fa207709b2b2a2fdc22de30cc72cf4398 (HEAD -> rbe-coverage)
Author: Victor Robertson <[email protected]>
Date: Wed Sep 11 23:01:49 2019 -0700
Fix coverage for RBE [EP-12079]
This patch fixes code coverage in RBE by removing the distinction
between JAVA_RUNFILES and TEST_SRCDIR in the collect_coverage.sh script.
This is important as Bazel will not send JAVA_RUNFILES to the remote
environment. Unfortunately, this means that a Java sdk will be included
with every coverage test--perhaps this isn't so bad.
This also means that tests with coverage support must include the
dependencies normally provided by Bazel. In cruise/cruise, we use the
following additional dependencies:
- @embedded_jdk//:jdk
- @bazel_tools//tools/jdk:JacocoCoverageRunner
- @bazel_tools//tools/test/CoverageOutputGenerator/java/com/google/devtools/coverageoutputgenerator:Main
diff --git a/tools/test/collect_coverage.sh b/tools/test/collect_coverage.sh
index a9e212ce5d..df3b22cc78 100755
--- a/tools/test/collect_coverage.sh
+++ b/tools/test/collect_coverage.sh
@@ -188,7 +188,8 @@ if [[ $DISPLAY_LCOV_CMD ]] ; then
echo "-----------------"
fi
-# JAVA_RUNFILES is set to the runfiles of the test, which does not necessarily
-# contain a JVM (it does only if the test has a Java binary somewhere). So let
-# the LCOV merger discover where its own runfiles tree is.
-JAVA_RUNFILES= exec $LCOV_MERGER_CMD
+# cruise: Since JAVA_RUNFILES would never be pushed to RBE, we allow it to
+# remain as it is and ensure that TEST_SRCDIR has the right dependencies to run
+# coverage reports. Normally this would be preceded with JAVA_RUNFILES= to
+# clear it out.
+exec $LCOV_MERGER_CMD
As mentioned in the patch, this means our test configurations include this little kludge. You can imagine how this is used (data = add_coverage_test_runtime_data()) along with the associated config.
def add_coverage_test_runtime_data():
    return select({
        "@//tools:is_coverage_build": [
            "@embedded_jdk//:jdk",
            "@bazel_tools//tools/jdk:JacocoCoverageRunner",
            "@bazel_tools//tools/test/CoverageOutputGenerator/java/com/google/devtools/coverageoutputgenerator:Main",
        ],
        "//conditions:default": [],
    })
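To make the usage concrete, a hypothetical BUILD fragment might look like the following; the load path, target name, and sources are made up for illustration, not taken from the reporter's repository:

```python
# Assumed location of the macro above; adjust to wherever it lives.
load("//tools:coverage.bzl", "add_coverage_test_runtime_data")

cc_test(
    name = "my_test",
    srcs = ["my_test.cc"],
    # Adds the JDK and coverage tools only when //tools:is_coverage_build
    # is set; expands to an empty list otherwise.
    data = add_coverage_test_runtime_data(),
)
```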
Hopefully this information helps.
The test action only stages the files-to-build, but not the runfiles tree. The coverage tool used by Google is a single, self-contained executable file. This is at odds with the new coverage tool, which is built as a Java binary, and the wrapper script for java_binary expects a runfiles tree. This happens to work locally because the wrapper script manages to escape the symlink sandbox by following a symlink back to the exec root (it wouldn't work with stricter local sandboxing).
I'm still not sure how best to fix it, but I can improve the situation. I'm reluctant to change how the test action stages the coverage tool. However, I have two changes to make the coverage tools repository self-contained (this is a nice cleanup in any case and makes it more flexible in the future), and I have a prototype for making the coverage tools repository compatible with the test action setup by making it work without a runfiles tree. That works on Linux and MacOS because it currently assumes the presence of /bin/bash.
I'm still not sure how best to fix it, but I can improve the situation. I'm reluctant to change how the test action stages the coverage tool.
Why the reluctance? Wouldn't adding the coverage binary properly as a tool to the test action (i.e., fix #4033) be the cleanest resolution?
Definitely maybe. Merging the runfiles tree into the test's runfiles tree can potentially cause conflicts, and we also don't want tests to interact with it directly - both the tool and the contract are subject to change. We could stage the runfiles tree separately from the test runfiles tree, although I suspect that that isn't possible in Bazel right now. At the same time, the tool is a single self-contained deploy jar - it's only the java_binary wrapper script that needs a runfiles tree, not the tool itself.
The JVM could be in runfiles, so I don't think they're trivial to dispense with for the coverage merger.
I suppose you could split the coverage merging into a separate spawn like test.xml generation, but the test itself would need to glob all the intermediate coverage artifacts into a zip or something, which is pointless overhead.
Actually, I discussed splitting it into a separate spawn earlier today, and I think that would have some advantages. We are already close to the per-action execution time limit in some cases, and doing coverage work in the same action is problematic. We can use tree artifacts to track the intermediate artifacts to avoid zipping / unzipping.
You're right about the JVM.
I'm wondering if moving the lcov merger to a post-process would allow us to use an aspect to attach it to the test. We currently require that all test rules depend on the lcov_merger in order to get the postprocessing into the test action, and that requires extra work from rule authors and is also not documented anywhere.
That will probably fix #6293, too.
Fully integrating Go coverage requires some path to generate lcov data. However, I agree that moving to a post-process (aspect or no) would make it more flexible; it allows pulling back the coverage data even if the coverage tools for language X don't support lcov conversion yet (or ever). There's a similar problem with Python coverage which I looked into today - coverage.py doesn't support generating lcov, so right now it is impossible to integrate into Bazel's coverage system, and you can't even manually post-process because the files are silently dropped.
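For the manual post-processing case, translating line data into LCOV records is mechanical. The sketch below is illustrative only: `to_lcov` is a made-up helper, and the input shape (a dict of source path to hit line numbers) is an assumption, not coverage.py's API. It only counts hit lines, so LF equals LH here.

```python
# Hypothetical converter from per-file hit-line data to LCOV text.
# Not a coverage.py or Bazel API; names and input shape are assumed.
def to_lcov(executed_lines):
    """executed_lines: dict mapping source path -> set of hit line numbers."""
    out = []
    for path, lines in sorted(executed_lines.items()):
        out.append("SF:%s" % path)             # source file record
        for line in sorted(lines):
            out.append("DA:%d,1" % line)       # line number, execution count
        out.append("LH:%d" % len(lines))       # lines hit
        out.append("LF:%d" % len(lines))       # lines found (hit lines only)
        out.append("end_of_record")
    return "\n".join(out) + "\n"
```

Writing the result next to the fetched raw outputs would give the rest of the lcov toolchain something it can consume.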
I'll look into declaring a tree artifact for the coverage output dir, if I find the time.
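As a rough Starlark sketch of that idea (not the actual Bazel change; the rule name, attribute, and flags are hypothetical), the coverage output directory would be declared as a tree artifact and handed to a separate merge action:

```python
# Illustrative only. A tree artifact is a directory output whose contents
# are not known until the producing action runs, which fits the per-test
# intermediate coverage files.
def _coverage_postprocess_impl(ctx):
    coverage_dir = ctx.actions.declare_directory(ctx.label.name + ".coverage")
    merged = ctx.actions.declare_file(ctx.label.name + ".coverage.dat")

    # In the real design the test action would populate coverage_dir;
    # only the separate merge action is shown here.
    ctx.actions.run(
        outputs = [merged],
        inputs = [coverage_dir],
        executable = ctx.executable._lcov_merger,  # assumed attribute
        arguments = [
            "--coverage_dir=" + coverage_dir.path,
            "--output_file=" + merged.path,
        ],
    )
    return [DefaultInfo(files = depset([merged]))]
```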
I was just hoping to not have to pass --test_env=LCOV_MERGER=/bin/true.
It looks like it works, and it makes it easier to debug. Unfortunately, the shell wrappers put a bunch of unrelated stuff into the _coverage directory. :-/
As a workaround, you can do the following (currently requires Bazel built from HEAD):
bazel build //tools/test/CoverageOutputGenerator/java/com/google/devtools/coverageoutputgenerator:coverage_output_generator.zip
unzip path/to/clone/bazel-bin/tools/test/CoverageOutputGenerator/java/com/google/devtools/coverageoutputgenerator/coverage_output_generator.zip
bazel coverage //java/foo:foo_test --config=remote --override_repository=remote_coverage_tools=/path/to/unzipped/coverage/output/generator/
Alternatively, you can pull back the full coverage outputs and manually post-process them:
bazel coverage //java/foo:foo_test --config=remote --experimental_fetch_all_coverage_outputs
The downside of this second approach is that it also pulls back a bunch of (potentially) large, unnecessary files.
It seems like it should be possible to move the coverage merging to a post-process using the coverage tree artifact as an intermediary.
Ulf, do you plan to do anything here?
I tried the --experimental_fetch_all_coverage_outputs flag and I just get:
<snip>/_coverage/gcov is a symbolic link to an absolute path /usr/bin/gcov. Symlinks to absolute paths in action outputs are not supported.
@finn-ball Bazel puts a bunch of files into what should be the coverage output directory, and one of them is a symlink. However, I am under the impression that this error message comes from the remote cache, which would imply that you're using the flag for local execution. Do you see problems with coverage in local execution?
(When I wrote the flag, I was planning to gradually clean up the coverage runner to avoid putting these files in the coverage directory.)
With the --experimental_fetch_all_coverage_outputs flag, remote execution works together with --test_env=LCOV_MERGER=/bin/true.
However, I am under the impression that this error message comes from the remote cache, which would imply that you're using the flag for local execution. Do you see problems with coverage in local execution?
@ulfjack - By using remote-execution, am I not also using the remote cache anyway?
Ah, I see where that comes from - the remote executor checks the symlinks it gets from the remote system before it re-creates them locally, and it causes that error message.
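The check described above can be illustrated with a small sketch (illustrative only, not Bazel's actual remote-execution code): walk the action's output directory and flag any symlink whose target is an absolute path, which is exactly what trips over _coverage/gcov pointing at /usr/bin/gcov.

```python
import os

# Sketch of the validation: action outputs may contain relative symlinks,
# but symlinks to absolute paths cannot be re-created portably, so they
# are rejected.
def find_absolute_symlinks(output_dir):
    """Return paths under output_dir that are symlinks to absolute targets."""
    offenders = []
    for root, dirs, files in os.walk(output_dir):
        for name in dirs + files:
            path = os.path.join(root, name)
            if os.path.islink(path) and os.path.isabs(os.readlink(path)):
                offenders.append(path)
    return offenders
```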
Indeed. If I use --test_env=LCOV_MERGER=/bin/true, then my coverage.dat files are returned but they are empty. Are there any other ways around this?
This has been opened for more than 2 years now. Can't this be escalated a bit?
I'm working on this. Hopefully, I have a fix soon.
Actually, I discussed splitting it into a separate spawn earlier today, and I think that would have some advantages. We are already close to the per-action execution time limit in some cases, and doing coverage work in the same action is problematic. We can use tree artifacts to track the intermediate artifacts to avoid zipping / unzipping.
You're right about the JVM.
Hey Ulf, it looks like I have something that is working after splitting coverage post-processing into its own spawn. But now I was wondering why your initial thought was to split it into a different spawn in the same action and not a new separate action altogether.
Good point. I think I simply didn't consider it. Putting it in a separate action definitely has advantages.
@oquenchil - excellent! Have you merged your fix? I just tried this again with bazel v3.7.2 and I still get the same errors as before.
You have to use the flag --experimental_split_coverage_postprocessing. See this test as an example.