https://buildkite.com/bazel/rules-go-golang/builds/1491
Executing tests from //tests/legacy/examples/stamped_bin:stamped_test
-----------------------------------------------------------------------------
--- FAIL: TestBuild (1.83s)
stamped_test.go:101: Starting local Bazel server and connecting to it...
Server crashed during startup. Now printing c:\users\b\_bazel_b\ibnjdh6i\server\jvm.out
I/O Error: C:/users/b/_bazel_b/ibnjdh6i/server/server_info.rawproto.tmp -> C:/users/b/_bazel_b/ibnjdh6i/server/server_info.rawproto (Permission denied)
exit status 37
--- FAIL: TestBuildWithoutStamp (1.83s)
stamped_test.go:111: expected tests to have failed (instead got exit code 37)
FAIL
cleanup error: remove D:\b\jylqiqkr\bazel_testing\bazel_go_test\main: The process cannot access the file because it is being used by another process
Bazel 1.0 crashes inside the test due to some permission denied error.
We didn't catch this in downstream test because the rules_go tests invoke bazel from PATH. On CI, it would actually be Bazelisk, but Bazelisk always points to the latest stable version instead of using Bazel@HEAD.
@dslomov This is probably a release blocker for 1.0.1, but I'm still investigating the true culprit.
https://github.com/bazelbuild/bazel/commit/b7f5605d0634b8e571665bfb5c407f091e1ac4a4 is probably the culprit.
But I found Bazel@HEAD didn't fail, so there must be a commit after 1.0 that already fixed the problem.
Another bisect told me https://github.com/bazelbuild/bazel/commit/b698e20cb9640bb7f3a6934224960233dac03ef8 fixed this issue. We'll need to either revert b7f5605d0634b8e571665bfb5c407f091e1ac4a4 or cherry-pick b698e20cb9640bb7f3a6934224960233dac03ef8 into 1.0
/cc related parties
@dslomov for 1.0.1 release
@philwo for Bazelisk improvement to catch similar problem in future
@michajlo for the original culprit
@laszlocsomor for the fix
@jayconrod for the rules_go owner
What I still don't understand is how did b7f5605 caused the failure and how did b698e20 fix it...
I can just confirm this happening for me as well. Occationally on my local machine, but every time on the server.
Should this block 1.1? (#9982)
Yes, although this issue is not happening from HEAD since b698e20, but I don't think that's a proper fix.
I've seen this bug on local builds too. I have no stable repro, but once it hits, it stays; you must remove the ".rawproto" file to fix it.
I have a stable reproduce locally and found a fix for this. Sending a CL now.
The mv there is supposed to be atomic, moveFile isn't. Can we find another way to work around the issue?
Reopening because of a rollback: https://github.com/bazelbuild/bazel/commit/ac99c298f377f21707738eecc83252c932522a86
@meteorcloudy I wanted to follow up on your original comment:
We didn't catch this in downstream test because the rules_go tests invoke bazel from PATH. On CI, it would actually be Bazelisk, but Bazelisk always points to the latest stable version instead of using Bazel@HEAD.
Is there a way that rules_go tests can invoke the version of Bazel being tested? From an example test log, I see that bazel is invoked directly as D:\temp\tmpzmgcwe5d\bazel-bin\src\bazel.exe.
I remember running into an issue like this a long time ago back on Jenkins, and there used to be a BAZEL environment variable. I'd be happy to make the tests use something like that if it were added back, but it might be easiest just to add that temp directory to the front of PATH, assuming there's nothing else in the way.
So currently, we set USE_BAZEL_VERSION=D:\temp\tmpzmgcwe5d\bazel-bin\src\bazel.exe for downstream pipeline. If you just invoke Bazel from path, Bazelisk will correctly invoke the bazel binary being tested. You don't have to do anything now. :)
Why was this fix not included in 1.1? :(
Edit: It seems like the fix is in.
However, if the file exists from a previous version, Bazel won't recover. I think it is a bit unstable to assume there is no file? Maybe check if serverInfoFile exists, and if it is, delete it first?
If Bazel crashes before "deleteAtExit" happens, this unrecoverable error (or manual deletion is required) happens.
618e5a2 is included in 1.1, the moveFile function always try to delete the target file before moving that's why switching to this function will work.
Rules_go is green now: https://buildkite.com/bazel/rules-go-golang
I just did a quick test in Bazel 1.1 on Windows:
touch /path/to/workspace/server/server_info.rawproto
bazel build //...
Starting local Bazel server and connecting to it...
Server crashed during startup. Now printing /path/to/workspace/server\jvm.out
I/O Error: /path/to/workspace/server/server_info.rawproto.tmp -> /path/to/workspace/server/server_info.rawproto (Permission denied)
pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ bazel clean --expunge
INFO: Invocation ID: b3f448cb-677e-4372-9533-50a08d489c76
INFO: Reading 'startup' options from c:\tools\msys64\home\pcloudy\.bazelrc: --output_user_root=C:/src/tmp
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=320
INFO: Options provided by the client:
Inherited 'build' options: --python_path=C:/Python36/python.exe
INFO: Reading rc options for 'clean' from c:\tools\msys64\home\pcloudy\workspace\my_tests\bazel\.bazelrc:
Inherited 'build' options: --android_aapt=aapt2
INFO: Reading rc options for 'clean' from c:\tools\msys64\home\pcloudy\.bazelrc:
Inherited 'build' options: --curses=yes --color=yes --verbose_failures --announce_rc --disk_cache=C:/src/tmp/bazel_disk_cache
INFO: Starting clean.
WARNING: Waiting for server process to terminate (waited 5 seconds, waiting at most 60)
WARNING: Waiting for server process to terminate (waited 10 seconds, waiting at most 60)
pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ mkdir -p /c/src/tmp/wa6d46al/server
pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ touch /c/src/tmp/wa6d46al/server/server_info.rawproto
pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ ls /c/src/tmp/wa6d46al/server/server_info.rawproto
/c/src/tmp/wa6d46al/server/server_info.rawproto
pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ bazel
Starting local Bazel server and connecting to it...
[bazel release 1.1.0]
Usage: bazel <command> <options> ...
Available commands:
analyze-profile Analyzes build profile data.
aquery Analyzes the given targets and queries the action graph.
build Builds the specified targets.
canonicalize-flags Canonicalizes a list of bazel options.
clean Removes output files and optionally stops the server.
coverage Generates code coverage report for specified test targets.
cquery Loads, analyzes, and queries the specified targets w/ configurations.
dump Dumps the internal state of the bazel server process.
fetch Fetches external repositories that are prerequisites to the targets.
help Prints help for commands, or the index.
info Displays runtime info about the bazel server.
license Prints the license of this software.
mobile-install Installs targets to mobile devices.
print_action Prints the command line args for compiling a file.
query Executes a dependency graph query.
run Runs the specified target.
shutdown Stops the bazel server.
sync Syncs all repositories specified in the workspace file
test Builds and runs the specified test targets.
version Prints version information for bazel.
Getting more help:
bazel help <command>
Prints help and options for <command>.
bazel help startup_options
Options for the JVM hosting bazel.
bazel help target-syntax
Explains the syntax for specifying targets.
bazel help info-keys
Displays a list of keys used by the info command.
@ozio85 Are you sure you are using 1.1? I don't have this problem when I tried it.
I too can run bazel, you have to run an actual command, like build or clean.
Edit: clean seems to work though..
Edit 2: Hmm.. after running clean, it seems like it is working again. I will keep investigating if it has something todo with first running 0.29.1, then 1.1.0
My error! I had by mistake uploaded Bazel 1.0.0 to our artifactory, but named it Bazel 1.1.0 .. so i was running 1.1.0 locally, but the server ran 1.0.0..! Sorry for the trouble!
No worries at all, glad it worked! \o/