Bazel: sh_test does not work remotely on Windows

Created on 3 Apr 2018  路  12Comments  路  Source: bazelbuild/bazel

Description of the problem / feature request:

sh_test rules (and probably others) do not work on Windows when using remote execution, due to a missing runfiles manifest file.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

On a Windows Server 2016 VM with Bazel 0.11.1 installed:

  1. Build and run Local Remote Worker:
$ cd $home
$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel
$ bazel build //src/tools/remote:worker
$ bazel-out/x64_windows-fastbuild/bin/src/tools/remote/worker.exe --listen_port=8080 --work_path=C:\tmp\lre --debug
  1. Create a Bazel directory with a simple shell test:
$ cd $home
$ mkdir bazel_test
$ cd bazel_test

Copy these files to there:
BUILD: https://gist.github.com/jasharpe/4566c496b222eaf45771ee815b1544e7
.bazelrc: https://gist.github.com/jasharpe/9edadeee13396b0efdddd6d21faeae3a
pass.sh: https://gist.github.com/jasharpe/70df48ca39a799825cc03099e9083086

  1. Demonstrate that it works locally:
$ cd $home\bazel_test
$ bazel --bazelrc=.bazelrc test --test_output=all :pass_test
  1. Demonstrate that it does not work remotely:
$ cd $home\bazel_test
$ bazel --bazelrc=.bazelrc test --test_output=all --config=remote :pass_test

The error printed is something like:

grep: /c/tmp/lre/build-e5526681-466d-4e6b-b5a4-a774ea7e3238/bazel-out/x64_windows-fastbuild/bin/pass_test.exe.runfiles/MANIFEST: No such file or directory
external/bazel_tools/tools/test/test-setup.sh: line 230: : command not found

Indeed if you check this directory, it does not contain a file called MANIFEST:

 ls C:\tmp\lre\build-e5526681-466d-4e6b-b5a4-a774ea7e3238\bazel-out\x64_windows-fastbuild\bin\pass_test.exe.runfiles\


    Directory:
    C:\tmp\lre\build-e5526681-466d-4e6b-b5a4-a774ea7e3238\bazel-out\x64_windows-fastbuild\bin\pass_test.exe.runfiles


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         4/3/2018   6:24 PM                __main__

What operating system are you running Bazel on?

Windows Server 2016 (1709)

What's the output of bazel info release?

release 0.11.1

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

N/A

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

N/A

Have you found anything relevant by searching the web?

Closest thing seems #1593 and #2296 but I'm not sure it's the same problem, and both were closed a while ago.

Any other information, logs, or outputs that you want to share?

Replace these lines with your answer.

If the files are large, upload as attachment or provide link.

windows bug

All 12 comments

3726 is pretty relevant as well, in particular "the manifest is not present in a sandbox", with reply "I'm not sure we have adequately defined / designed a more permanent solution, but I'm sure it isn't making the current runfiles manifests (that contain absolute paths) available in the sandbox."

My repro attempt failed with this error:

C:\tmp\lre-test>bazel --bazelrc=.bazelrc test --test_output=all --config=remote :all
ERROR: Failed to init auth credentials: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
java.lang.NullPointerException
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:871)
        at com.google.devtools.build.lib.events.Event.<init>(Event.java:52)
        at com.google.devtools.build.lib.events.Event.error(Event.java:165)
        at com.google.devtools.build.lib.events.Event.error(Event.java:200)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:474)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:218)
        at com.google.devtools.build.lib.runtime.CommandExecutor.exec(CommandExecutor.java:58)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:851)
        at com.google.devtools.build.lib.server.GrpcServerImpl.access$2100(GrpcServerImpl.java:109)
        at com.google.devtools.build.lib.server.GrpcServerImpl$2.lambda$run$0(GrpcServerImpl.java:916)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
java.lang.NullPointerException
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:871)
        at com.google.devtools.build.lib.events.Event.<init>(Event.java:52)
        at com.google.devtools.build.lib.events.Event.error(Event.java:165)
        at com.google.devtools.build.lib.events.Event.error(Event.java:200)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:474)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:218)
        at com.google.devtools.build.lib.runtime.CommandExecutor.exec(CommandExecutor.java:58)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:851)
        at com.google.devtools.build.lib.server.GrpcServerImpl.access$2100(GrpcServerImpl.java:109)
        at com.google.devtools.build.lib.server.GrpcServerImpl$2.lambda$run$0(GrpcServerImpl.java:916)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Server terminated abruptly (error code: 14, error message: '', log file: 'c:\users\laszlocsomor\appdata\local\temp\_bazel_laszlocsomor\fhlz3hz6/server/jvm.out')

The jvm.out file is empty.

I don't know what file I should set to GOOGLE_APPLICATION_CREDENTIALS.

Ah, sorry. This works when using a GCE VM. It may be sufficient to set --auth_enabled=false. This works for me and I'm hopeful it would resolve that problem. I've updated the gist for the .bazelrc to reflect this.

Update:

It turns out that remote execution for Windows provides quite a different environment. In fact, a
MANIFEST file is not required at all! I was able to make some hacky (but simple) changes to Bazel and get the sh_test above working.

Because of the nature of remote execution, all the runfiles are actually present in the runfiles tree on the remote worker (this is because you must ship actual files to remote workers on all platforms, so symlinks are resolved before packaging up the input tree). This contrasts to local execution of Windows, where the runfiles tree is empty, and the MANIFEST is used to find actual files.

This means that remote Windows execution looks a lot like remote Linux execution from the point of view of runfiles.

Two places are affected by this for Windows remote execution:

  1. For tests, in test-setup.sh, it is correct to take the !RUNFILES_MANIFEST_ONLY branch (that is, the first one) that has an implementation of rlocation using the TEST_SRCDIR. In particular for tests, this means that this setting of RUNFILES_MANIFEST_ONLY should be conditional on whether we're executing remotely.

    I'm not sure what the best solution is for making the setting of RUNFILES_MANIFEST_ONLY aware of whether execution is local or remote.

  2. For tests and non-tests, launchers need to be aware somehow of whether they are executing locally or remotely, and use or not use the manifest as appropriate.

    I believe that when executing remotely, the launcher implementation of rlocation should skip the manifest lookup, and instead just return basically $RUNFILES_DIR + "/" + path.

    I'm not sure what the best solution is for the launcher knowing whether or not it is executing locally or remotely.

I will prepare an example pull request containing my hacky change for illustration purposes.

Thanks for investigating!
The runfile library can already switch to a directory based runfile lookup if RUNFILES_MANIFEST_ONLY isn't set to 1
See https://github.com/bazelbuild/bazel/blob/2a5512fa3041df96b140e96a30112d5137be8b63/src/tools/runfiles/java/com/google/devtools/build/runfiles/Runfiles.java#L55-L61

But we do need to make change to the launcher

Yun, I guess the launcher can use the runfiles library when it's ready. Unfortunately I'm finding insufficient time lately to finish anything, including the runfiles library.

@jasharpe , the thanks --auth_enabled=false Bazel flag did indeed help.

I discussed with @meteorcloudy and I will try to make a pull request to (for now) change the launcher to attempt to fall back to a directory approach if the MANIFEST file is missing. I think this simplifies things since it'll just work in both environments without having to pass any state in. (For tests the launcher can check the setting of RUNFILES_MANIFEST_ONLY, but this doesn't work for other remotely run executables like tools for genrules. So the options are this falling back strategy, or adding some additional environment variables.)

Any objections to that?

The issue of not setting RUNFILES_MANIFEST_ONLY for remote tests I will handle separately.

As far as I know the cases I will need to handle for remote execution are:

  • Tests.
  • Data dependencies of tests.
  • Tools of genrules.

Since I promised to share my hacky change to make sh_tests work, here it is:

DISCLAIMER: Hacky!

https://github.com/jasharpe/bazel/compare/master...jasharpe:hacky_runfiles

May be useful in understanding what issues I'm trying to deal with.

The repro case in the initial issue is for a test (sh_test in particular), but another interesting case for a fix to be tested on is genrules. I have created a repro case for this. Here's the BUILD file:

sh_binary(
    name = "tool",
    srcs = ["tool.sh"],
)

genrule(
    name = "genrule",
    srcs = [],
    outs = ["foo.txt"],
    cmd = "$(location :tool) > \"$@\"",
    tools = [":tool"],
)

tool.sh can just be an empty file.

If you build this genrule remotely you get:

LAUNCHER ERROR: Couldn't find MANIFEST file under C:/<redacted>/bazel-out/host/bin/tool.exe.runfiles\

One more thing that satisfies my own curiosity, I just discovered that the reason manifests aren't written for remote builds is this: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/analysis/SourceManifestAction.java#L141

One more thing that satisfies my own curiosity, I just discovered that the reason manifests aren't written for remote builds is this: https://github.com/bazelbuild/bazel/blob/47d1e21d182eaee34a13a11ebf2da3eb733037cf/src/main/java/com/google/devtools/build/lib/analysis/SourceManifestAction.java#L140-L141

The comment's reasoning makes sense though.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ajaysaini-sgvu picture ajaysaini-sgvu  路  3Comments

cyberbono3 picture cyberbono3  路  3Comments

alexandrvb picture alexandrvb  路  3Comments

GaofengCheng picture GaofengCheng  路  3Comments

kastiglione picture kastiglione  路  3Comments