sh_test rules (and probably others) do not work on Windows when using remote execution, due to a missing runfiles manifest file.
On a Windows Server 2016 VM with Bazel 0.11.1 installed:
$ cd $home
$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel
$ bazel build //src/tools/remote:worker
$ bazel-out/x64_windows-fastbuild/bin/src/tools/remote/worker.exe --listen_port=8080 --work_path=C:\tmp\lre --debug
$ cd $home
$ mkdir bazel_test
$ cd bazel_test
Copy these files into that directory:
BUILD: https://gist.github.com/jasharpe/4566c496b222eaf45771ee815b1544e7
.bazelrc: https://gist.github.com/jasharpe/9edadeee13396b0efdddd6d21faeae3a
pass.sh: https://gist.github.com/jasharpe/70df48ca39a799825cc03099e9083086
$ cd $home\bazel_test
$ bazel --bazelrc=.bazelrc test --test_output=all :pass_test
$ cd $home\bazel_test
$ bazel --bazelrc=.bazelrc test --test_output=all --config=remote :pass_test
The error printed is something like:
grep: /c/tmp/lre/build-e5526681-466d-4e6b-b5a4-a774ea7e3238/bazel-out/x64_windows-fastbuild/bin/pass_test.exe.runfiles/MANIFEST: No such file or directory
external/bazel_tools/tools/test/test-setup.sh: line 230: : command not found
Indeed if you check this directory, it does not contain a file called MANIFEST:
ls C:\tmp\lre\build-e5526681-466d-4e6b-b5a4-a774ea7e3238\bazel-out\x64_windows-fastbuild\bin\pass_test.exe.runfiles\
Directory: C:\tmp\lre\build-e5526681-466d-4e6b-b5a4-a774ea7e3238\bazel-out\x64_windows-fastbuild\bin\pass_test.exe.runfiles
Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         4/3/2018   6:24 PM                __main__
Windows Server 2016 (1709)
bazel info release: release 0.11.1
bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel: N/A
git remote get-url origin ; git rev-parse master ; git rev-parse HEAD: N/A
The closest issues seem to be #1593 and #2296, but I'm not sure they describe the same problem, and both were closed a while ago.
My repro attempt failed with this error:
C:\tmp\lre-test>bazel --bazelrc=.bazelrc test --test_output=all --config=remote :all
ERROR: Failed to init auth credentials: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:871)
at com.google.devtools.build.lib.events.Event.<init>(Event.java:52)
at com.google.devtools.build.lib.events.Event.error(Event.java:165)
at com.google.devtools.build.lib.events.Event.error(Event.java:200)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:474)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:218)
at com.google.devtools.build.lib.runtime.CommandExecutor.exec(CommandExecutor.java:58)
at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:851)
at com.google.devtools.build.lib.server.GrpcServerImpl.access$2100(GrpcServerImpl.java:109)
at com.google.devtools.build.lib.server.GrpcServerImpl$2.lambda$run$0(GrpcServerImpl.java:916)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:871)
at com.google.devtools.build.lib.events.Event.<init>(Event.java:52)
at com.google.devtools.build.lib.events.Event.error(Event.java:165)
at com.google.devtools.build.lib.events.Event.error(Event.java:200)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:474)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:218)
at com.google.devtools.build.lib.runtime.CommandExecutor.exec(CommandExecutor.java:58)
at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:851)
at com.google.devtools.build.lib.server.GrpcServerImpl.access$2100(GrpcServerImpl.java:109)
at com.google.devtools.build.lib.server.GrpcServerImpl$2.lambda$run$0(GrpcServerImpl.java:916)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Server terminated abruptly (error code: 14, error message: '', log file: 'c:\users\laszlocsomor\appdata\local\temp\_bazel_laszlocsomor\fhlz3hz6/server/jvm.out')
The jvm.out file is empty.
I don't know what file I should set to GOOGLE_APPLICATION_CREDENTIALS.
Ah, sorry. This works when using a GCE VM. It may be sufficient to set --auth_enabled=false. This works for me and I'm hopeful it would resolve that problem. I've updated the gist for the .bazelrc to reflect this.
Update:
It turns out that remote execution for Windows provides quite a different environment. In fact, a MANIFEST file is not required at all! I was able to make some hacky (but simple) changes to Bazel and get the sh_test above working.
Because of the nature of remote execution, all the runfiles are actually present in the runfiles tree on the remote worker (you must ship actual files to remote workers on all platforms, so symlinks are resolved before the input tree is packaged). This contrasts with local execution on Windows, where the runfiles tree is empty and the MANIFEST is used to find the actual files.
This means that remote Windows execution looks a lot like remote Linux execution from the point of view of runfiles.
Two places are affected by this for Windows remote execution:
For tests, in test-setup.sh, it is correct to take the !RUNFILES_MANIFEST_ONLY branch (that is, the first one) that has an implementation of rlocation using the TEST_SRCDIR. In particular for tests, this means that this setting of RUNFILES_MANIFEST_ONLY should be conditional on whether we're executing remotely.
I'm not sure what the best solution is for making the setting of RUNFILES_MANIFEST_ONLY aware of whether execution is local or remote.
For tests and non-tests, launchers need to be aware somehow of whether they are executing locally or remotely, and use or not use the manifest as appropriate.
I believe that when executing remotely, the launcher implementation of rlocation should skip the manifest lookup, and instead just return basically $RUNFILES_DIR + "/" + path.
I'm not sure what the best solution is for the launcher knowing whether or not it is executing locally or remotely.
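To make the two lookup strategies discussed above concrete, here is a simplified sketch of an rlocation that picks its branch from RUNFILES_MANIFEST_ONLY. It is not the actual code from test-setup.sh or the launcher (the launcher is C++), and it assumes the manifest holds one "runfiles_path real_path" entry per line, separated by a single space.

```shell
# Simplified sketch only; not the real test-setup.sh implementation.
# Assumption: each manifest line is "<runfiles path> <real path>".
rlocation() {
  if [ "${RUNFILES_MANIFEST_ONLY:-}" = "1" ]; then
    # Manifest branch (local Windows execution): look up the real path.
    awk -v key="$1" \
      'index($0, key " ") == 1 { print substr($0, length(key) + 2); exit }' \
      "${RUNFILES_MANIFEST_FILE}"
  else
    # Directory branch (Linux, or a remote worker): the file is expected
    # to sit inside the runfiles tree itself.
    printf '%s\n' "${TEST_SRCDIR}/$1"
  fi
}
```

With RUNFILES_MANIFEST_ONLY unset, `rlocation __main__/pass.sh` simply prints `$TEST_SRCDIR/__main__/pass.sh`, which is the behavior a remote Windows test would need.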
I will prepare an example pull request containing my hacky change for illustration purposes.
Thanks for investigating!
The runfiles library can already switch to a directory-based runfiles lookup if RUNFILES_MANIFEST_ONLY isn't set to 1.
See https://github.com/bazelbuild/bazel/blob/2a5512fa3041df96b140e96a30112d5137be8b63/src/tools/runfiles/java/com/google/devtools/build/runfiles/Runfiles.java#L55-L61
But we do need to make a change to the launcher.
Yun, I guess the launcher can use the runfiles library when it's ready. Unfortunately I'm finding insufficient time lately to finish anything, including the runfiles library.
@jasharpe, thanks, the --auth_enabled=false Bazel flag did indeed help.
I discussed with @meteorcloudy and I will try to make a pull request to (for now) change the launcher to attempt to fall back to a directory approach if the MANIFEST file is missing. I think this simplifies things since it'll just work in both environments without having to pass any state in. (For tests the launcher can check the setting of RUNFILES_MANIFEST_ONLY, but this doesn't work for other remotely run executables like tools for genrules. So the options are this falling back strategy, or adding some additional environment variables.)
Any objections to that?
The issue of not setting RUNFILES_MANIFEST_ONLY for remote tests I will handle separately.
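The fallback idea can be sketched in shell as follows. This is only an illustration of the strategy, not the proposed launcher change (which would be in C++); the function name rlocation_with_fallback is hypothetical, and the manifest format (one "runfiles_path real_path" pair per line, separated by a single space) is an assumption.

```shell
# Hypothetical sketch of the "fall back to the directory" strategy.
# Assumption: each manifest line is "<runfiles path> <real path>".
rlocation_with_fallback() {
  path="$1"
  manifest="${RUNFILES_DIR}/MANIFEST"
  if [ -f "$manifest" ]; then
    # MANIFEST exists (local Windows execution): use it for the lookup.
    while IFS= read -r line || [ -n "$line" ]; do
      case "$line" in
        "$path "*)
          printf '%s\n' "${line#"$path "}"
          return 0
          ;;
      esac
    done < "$manifest"
    return 1  # path is not listed in the manifest
  fi
  # MANIFEST missing (e.g. on a remote worker): assume the runfiles tree
  # contains the real files and join the path onto RUNFILES_DIR.
  printf '%s\n' "${RUNFILES_DIR}/${path}"
}
```

The point of this design is that no extra state has to be passed in: the same code path would serve tests and genrule tools, locally and remotely, because the decision depends only on whether the MANIFEST file is present.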
As far as I know the cases I will need to handle for remote execution are:
Since I promised to share my hacky change to make sh_tests work, here it is:
DISCLAIMER: Hacky!
https://github.com/jasharpe/bazel/compare/master...jasharpe:hacky_runfiles
May be useful in understanding what issues I'm trying to deal with.
The repro case in the initial issue is for a test (sh_test in particular), but another interesting case for a fix to be tested on is genrules. I have created a repro case for this. Here's the BUILD file:
sh_binary(
name = "tool",
srcs = ["tool.sh"],
)
genrule(
name = "genrule",
srcs = [],
outs = ["foo.txt"],
cmd = "$(location :tool) > \"$@\"",
tools = [":tool"],
)
tool.sh can just be an empty file.
If you build this genrule remotely you get:
LAUNCHER ERROR: Couldn't find MANIFEST file under C:/<redacted>/bazel-out/host/bin/tool.exe.runfiles\
One more thing, to satisfy my own curiosity: I just discovered that the reason manifests aren't written for remote builds is this: https://github.com/bazelbuild/bazel/blob/47d1e21d182eaee34a13a11ebf2da3eb733037cf/src/main/java/com/google/devtools/build/lib/analysis/SourceManifestAction.java#L140-L141
The comment's reasoning makes sense though.