Bazel: Rules_go tests are failing on Windows with Bazel 1.0

Created on 11 Oct 2019  ·  23 Comments  ·  Source: bazelbuild/bazel

https://buildkite.com/bazel/rules-go-golang/builds/1491

Executing tests from //tests/legacy/examples/stamped_bin:stamped_test
-----------------------------------------------------------------------------
--- FAIL: TestBuild (1.83s)
    stamped_test.go:101: Starting local Bazel server and connecting to it...
        Server crashed during startup. Now printing c:\users\b\_bazel_b\ibnjdh6i\server\jvm.out
        I/O Error: C:/users/b/_bazel_b/ibnjdh6i/server/server_info.rawproto.tmp -> C:/users/b/_bazel_b/ibnjdh6i/server/server_info.rawproto (Permission denied)
        exit status 37
--- FAIL: TestBuildWithoutStamp (1.83s)
    stamped_test.go:111: expected tests to have failed (instead got exit code 37)
FAIL
cleanup error: remove D:\b\jylqiqkr\bazel_testing\bazel_go_test\main: The process cannot access the file because it is being used by another process

Bazel 1.0 crashes inside the test due to some permission denied error.

We didn't catch this in the downstream tests because the rules_go tests invoke bazel from PATH. On CI, that is actually Bazelisk, but Bazelisk always points to the latest stable version instead of using Bazel@HEAD.

P1 area-Windows breakage release blocker team-XProduct bug

All 23 comments

@dslomov This is probably a release blocker for 1.0.1, but I'm still investigating the true culprit.

But I found Bazel@HEAD didn't fail, so there must be a commit after 1.0 that already fixed the problem.

Another bisect showed that https://github.com/bazelbuild/bazel/commit/b698e20cb9640bb7f3a6934224960233dac03ef8 fixed this issue. We'll need to either revert b7f5605d0634b8e571665bfb5c407f091e1ac4a4 or cherry-pick b698e20cb9640bb7f3a6934224960233dac03ef8 into 1.0.

/cc related parties
@dslomov for 1.0.1 release
@philwo for Bazelisk improvement to catch similar problem in future
@michajlo for the original culprit
@laszlocsomor for the fix
@jayconrod for the rules_go owner

What I still don't understand is how b7f5605 caused the failure and how b698e20 fixed it...

I can confirm this is happening for me as well: occasionally on my local machine, but every time on the server.

Should this block 1.1? (#9982)

Yes. This issue has not occurred at HEAD since b698e20, but I don't think that's a proper fix.

I've seen this bug on local builds too. I have no stable repro, but once it hits, it stays; you must remove the ".rawproto" file to fix it.
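The manual recovery described above could be sketched as a small shell helper. This is only an illustration: `remove_stale_server_info` is a hypothetical name, and the file location under the output base (`server/server_info.rawproto`) is taken from the paths in the error message, so check it against your own setup first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bazel): delete a leftover
# server_info.rawproto so the next server start can write a fresh one.
# $1 is the Bazel output base, e.g. the directory containing "server/".
remove_stale_server_info() {
  output_base=$1
  # Refuse to run without an explicit output base.
  [ -n "$output_base" ] || return 1
  # -f keeps this quiet when the file is already gone.
  rm -f -- "$output_base/server/server_info.rawproto"
}
```

Run it only while no Bazel server is using that output base, passing the directory shown in the error message, for example `remove_stale_server_info /path/to/output_base` (placeholder path).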

I have a stable repro locally and found a fix for this. Sending a CL now.

The mv there is supposed to be atomic, moveFile isn't. Can we find another way to work around the issue?

@meteorcloudy I wanted to follow up on your original comment:

We didn't catch this in the downstream tests because the rules_go tests invoke bazel from PATH. On CI, that is actually Bazelisk, but Bazelisk always points to the latest stable version instead of using Bazel@HEAD.

Is there a way that rules_go tests can invoke the version of Bazel being tested? From an example test log, I see that bazel is invoked directly as D:\temp\tmpzmgcwe5d\bazel-bin\src\bazel.exe.

I remember running into an issue like this a long time ago back on Jenkins, and there used to be a BAZEL environment variable. I'd be happy to make the tests use something like that if it were added back, but it might be easiest just to add that temp directory to the front of PATH, assuming there's nothing else in the way.

Currently, we set USE_BAZEL_VERSION=D:\temp\tmpzmgcwe5d\bazel-bin\src\bazel.exe for the downstream pipeline. If you just invoke Bazel from PATH, Bazelisk will correctly invoke the Bazel binary being tested. You don't have to do anything now. :)
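As a rough sketch of that setup, a wrapper can set USE_BAZEL_VERSION for a single command so that a `bazel` on PATH (Bazelisk, per the comment above) picks up the binary under test. The function name `run_with_bazel_under_test` and the paths in the usage note are hypothetical; only the USE_BAZEL_VERSION variable comes from this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with USE_BAZEL_VERSION pointing at a
# locally built Bazel binary. The variable is set only for that one command.
# $1 is the path to the Bazel binary; the remaining arguments are the command.
run_with_bazel_under_test() {
  bazel_binary=$1
  shift
  USE_BAZEL_VERSION="$bazel_binary" "$@"
}
```

A test runner could then be started as `run_with_bazel_under_test /path/to/bazel-bin/src/bazel bazel test //...` (placeholder path), leaving the surrounding environment unchanged.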

Why was this fix not included in 1.1? :(

Edit: It seems like the fix is in.
However, if the file exists from a previous version, Bazel won't recover. Assuming the file is absent seems fragile. Maybe check whether serverInfoFile exists and, if it does, delete it first?

If Bazel crashes before "deleteAtExit" runs, it hits this unrecoverable error, and manual deletion is required.

618e5a2 is included in 1.1. The moveFile function always tries to delete the target file before moving, which is why switching to it works.
Rules_go is green now: https://buildkite.com/bazel/rules-go-golang
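The delete-then-move behavior described above can be illustrated with a short shell sketch. This is not Bazel's implementation (which is Java); `move_file` is a hypothetical name and the file names are only borrowed from the error message. As noted earlier in the thread, the two steps are not atomic.

```shell
#!/bin/sh
# Hypothetical helper, not Bazel's code: remove the destination first,
# then rename the temporary file into place.
move_file() {
  src=$1
  dst=$2
  # Deleting first means a stale destination cannot block the rename.
  # Between the two steps a reader may briefly see no file at all.
  rm -f -- "$dst" && mv -- "$src" "$dst"
}

# Demo in a scratch directory: a stale file is replaced by the fresh one.
demo_dir=$(mktemp -d)
printf 'stale' > "$demo_dir/server_info.rawproto"
printf 'fresh' > "$demo_dir/server_info.rawproto.tmp"
move_file "$demo_dir/server_info.rawproto.tmp" "$demo_dir/server_info.rawproto"
cat "$demo_dir/server_info.rawproto"
```

The trade-off is the one raised in the thread: a plain rename is meant to be atomic, while delete-then-move gives that up in exchange for tolerating a leftover destination file.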

I just did a quick test in Bazel 1.1 on Windows:

touch /path/to/workspace/server/server_info.rawproto
bazel build //...

Starting local Bazel server and connecting to it...
Server crashed during startup. Now printing /path/to/workspace/server\jvm.out
I/O Error: /path/to/workspace/server/server_info.rawproto.tmp -> /path/to/workspace/server/server_info.rawproto (Permission denied)
pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ bazel clean --expunge
INFO: Invocation ID: b3f448cb-677e-4372-9533-50a08d489c76
INFO: Reading 'startup' options from c:\tools\msys64\home\pcloudy\.bazelrc: --output_user_root=C:/src/tmp
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=320
INFO: Options provided by the client:
  Inherited 'build' options: --python_path=C:/Python36/python.exe
INFO: Reading rc options for 'clean' from c:\tools\msys64\home\pcloudy\workspace\my_tests\bazel\.bazelrc:
  Inherited 'build' options: --android_aapt=aapt2
INFO: Reading rc options for 'clean' from c:\tools\msys64\home\pcloudy\.bazelrc:
  Inherited 'build' options: --curses=yes --color=yes --verbose_failures --announce_rc --disk_cache=C:/src/tmp/bazel_disk_cache
INFO: Starting clean.
WARNING: Waiting for server process to terminate (waited 5 seconds, waiting at most 60)
WARNING: Waiting for server process to terminate (waited 10 seconds, waiting at most 60)

pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ mkdir -p /c/src/tmp/wa6d46al/server

pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ touch /c/src/tmp/wa6d46al/server/server_info.rawproto

pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ ls /c/src/tmp/wa6d46al/server/server_info.rawproto
/c/src/tmp/wa6d46al/server/server_info.rawproto

pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ bazel
Starting local Bazel server and connecting to it...
                                                           [bazel release 1.1.0]
Usage: bazel <command> <options> ...

Available commands:
  analyze-profile     Analyzes build profile data.
  aquery              Analyzes the given targets and queries the action graph.
  build               Builds the specified targets.
  canonicalize-flags  Canonicalizes a list of bazel options.
  clean               Removes output files and optionally stops the server.
  coverage            Generates code coverage report for specified test targets.
  cquery              Loads, analyzes, and queries the specified targets w/ configurations.
  dump                Dumps the internal state of the bazel server process.
  fetch               Fetches external repositories that are prerequisites to the targets.
  help                Prints help for commands, or the index.
  info                Displays runtime info about the bazel server.
  license             Prints the license of this software.
  mobile-install      Installs targets to mobile devices.
  print_action        Prints the command line args for compiling a file.
  query               Executes a dependency graph query.
  run                 Runs the specified target.
  shutdown            Stops the bazel server.
  sync                Syncs all repositories specified in the workspace file
  test                Builds and runs the specified test targets.
  version             Prints version information for bazel.

Getting more help:
  bazel help <command>
                   Prints help and options for <command>.
  bazel help startup_options
                   Options for the JVM hosting bazel.
  bazel help target-syntax
                   Explains the syntax for specifying targets.
  bazel help info-keys
                   Displays a list of keys used by the info command.

@ozio85 Are you sure you are using 1.1? I didn't hit this problem when I tried it.

I can run bare bazel too; you have to run an actual command, like build or clean.

Edit: clean seems to work though..

Edit 2: Hmm.. after running clean, it seems to be working again. I will keep investigating whether it has something to do with first running 0.29.1, then 1.1.0.

My error! I had mistakenly uploaded Bazel 1.0.0 to our Artifactory but named it Bazel 1.1.0, so I was running 1.1.0 locally while the server ran 1.0.0! Sorry for the trouble!

No worries at all, glad it worked! \o/
