Error log:
https://buildkite.com/bazel/google-bazel-presubmit/builds/36688#7500893b-351a-4549-86ae-19803c73cbd1
It looks like something changed so that the Bazel frontend is unable to communicate with the server inside the Docker container used for RBE autoconfig.
We see this too in our projects with RBE and Bazel 3.4.0:
Starting local Bazel server and connecting to it...
... still trying to connect to local Bazel server after 10 seconds ...
... still trying to connect to local Bazel server after 20 seconds ...
... still trying to connect to local Bazel server after 30 seconds ...
... still trying to connect to local Bazel server after 40 seconds ...
... still trying to connect to local Bazel server after 50 seconds ...
... still trying to connect to local Bazel server after 60 seconds ...
... still trying to connect to local Bazel server after 70 seconds ...
... still trying to connect to local Bazel server after 80 seconds ...
... still trying to connect to local Bazel server after 90 seconds ...
... still trying to connect to local Bazel server after 100 seconds ...
... still trying to connect to local Bazel server after 110 seconds ...
FATAL: couldn't connect to server (2502) after 120 seconds.
Yes, we're trying to diagnose and get out a patch as fast as we can.
Same issue here in a Docker runner on CircleCI:
jobs:
  build:
    docker:
      - image: circleci/node:12.16.1
        environment:
          NODE_OPTIONS: --max_old_space_size=4096
    resource_class: medium+
```
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
... still trying to connect to local Bazel server after 10 seconds ...
... still trying to connect to local Bazel server after 20 seconds ...
... still trying to connect to local Bazel server after 30 seconds ...
... still trying to connect to local Bazel server after 40 seconds ...
... still trying to connect to local Bazel server after 50 seconds ...
... still trying to connect to local Bazel server after 60 seconds ...
... still trying to connect to local Bazel server after 70 seconds ...
... still trying to connect to local Bazel server after 80 seconds ...
... still trying to connect to local Bazel server after 90 seconds ...
... still trying to connect to local Bazel server after 100 seconds ...
... still trying to connect to local Bazel server after 110 seconds ...
FATAL: couldn't connect to server (1521) after 120 seconds.
Makefile:5: recipe for target 'build' failed
make: *** [build] Error 37
```
@nathanhleung Can you confirm that in your case no remote execution or rbe_autoconfig stuff is involved and this happens just when running Bazel 3.4.0 inside the Docker container on CircleCI?
Repro without RBE:
docker run --rm -i -t l.gcr.io/google/rbe-ubuntu16-04@sha256:5464e3e83dc656fc6e4eae6a01f5c2645f1f7e95854b3802b85e86484132d90e bash
# wget https://github.com/bazelbuild/bazelisk/releases/download/v1.3.0/bazelisk-linux-amd64
# chmod +x bazelisk-linux-amd64
# touch WORKSPACE
# ./bazelisk-linux-amd64 info
Starting local Bazel server and connecting to it...
... still trying to connect to local Bazel server after 10 seconds ...
... still trying to connect to local Bazel server after 20 seconds ...
... still trying to connect to local Bazel server after 30 seconds ...
... still trying to connect to local Bazel server after 40 seconds ...
... still trying to connect to local Bazel server after 50 seconds ...
... still trying to connect to local Bazel server after 60 seconds ...
... still trying to connect to local Bazel server after 70 seconds ...
> @nathanhleung Can you confirm that in your case no remote execution or rbe_autoconfig stuff is involved and this happens just when running Bazel 3.4.0 inside the Docker container on CircleCI?
Not sure how to confirm, but this is our Bazel install step:
install_bazel:
  # From https://docs.bazel.build/versions/master/install-ubuntu.html
  steps:
    - run: |
        sudo apt install curl gnupg apt-transport-https
        curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
        echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt \
        stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
    - run: sudo apt-get update && sudo apt-get -y install bazel
.bazelrc generated by this script:
#!/bin/bash
# Exit on error
set -e
echo "# Generated by ./scripts/generate_bazelrc.sh" > .bazelrc
echo "build --remote_cache=https://$BAZEL_CACHE_USER:[email protected]" >> .bazelrc
(nothing else)
And build command:
bazel build //src/server_bin_deploy.jar
We have confirmed that https://github.com/bazelbuild/bazel/commit/0415511cf8afc1cabef0fa2ec52715b6b7ea94ef fixes the issue.
To avoid conflicts, we should also cherry-pick https://github.com/bazelbuild/bazel/commit/08bf9066c51365a288e9a4b546c8f07416ceaaa2.
(My observation has been that enabling IPv6 in the container fixes this issue.)
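To see whether a container is in the configuration described above, it can help to check whether the IPv6 loopback address exists at all. A rough sketch, assuming a Linux container where `/proc/net/if_inet6` lists the configured IPv6 addresses; the helper name `ipv6_loopback_available` is made up for illustration and is not part of Bazel or Docker:

```shell
# Hypothetical helper: succeeds if ::1 is configured on some interface.
# Takes an optional path so the check can be pointed at a saved copy of
# /proc/net/if_inet6; by default it reads the live file.
ipv6_loopback_available() {
  if_inet6_path="${1:-/proc/net/if_inet6}"
  # Each line starts with the address as 32 hex digits;
  # ::1 is 31 zeros followed by a 1.
  [ -r "$if_inet6_path" ] &&
    grep -Eq '^0{31}1 ' "$if_inet6_path"
}

if ipv6_loopback_available; then
  echo "IPv6 loopback (::1) is available"
else
  echo "IPv6 loopback (::1) is not available"
fi
```

If the check reports that `::1` is missing, starting the container with `docker run --sysctl net.ipv6.conf.all.disable_ipv6=0` is one possible way to turn IPv6 on. Whether that is allowed depends on the host and the CI provider, so treat it as something to try rather than a guaranteed fix.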
Patch release can be found here: https://releases.bazel.build/3.4.1/rc1/index.html
Downstream pipeline for Bazel 3.4.1rc1 running here: https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/1564
The bug is mitigated and Bazel 3.4.1 is released. Assigning to Yun, who offered to look into creating a test (and probably has an incentive to roll forward the gRPC update).
FYI @olekw
This is now fixed with the release of Bazel 3.4.1.
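For setups that launch Bazel through bazelisk, as in the repro earlier in this thread, one way to pick up the fix is to pin the workspace to 3.4.1. A minimal sketch, assuming bazelisk reads the version from a `.bazelversion` file in the workspace root; the `pin_bazel_version` helper is hypothetical and only writes that file:

```shell
# Hypothetical helper: write the desired Bazel version into .bazelversion
# in the given workspace directory (defaults to the current directory).
pin_bazel_version() {
  version="$1"
  workspace="${2:-.}"
  printf '%s\n' "$version" > "$workspace/.bazelversion"
}

# Example (run from the workspace root):
#   pin_bazel_version 3.4.1
```

Setting `USE_BAZEL_VERSION=3.4.1` in the environment should have the same effect for a single bazelisk invocation. Installs that come from the apt repository, like the CircleCI setup above, would pick up the new version through a normal package upgrade instead.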
BTW, https://github.com/netty/netty/issues/10402 is the underlying issue. It could be worked around by not attempting to bind [::1] if io.netty.util.NetUtil.isIpV4StackPreferred() returns true.