Bazel: Change of --output_base causes subsequent builds to fail

Created on 24 Jan 2020  路  5Comments  路  Source: bazelbuild/bazel

Description of the problem / feature request:

Change in --output_base parameter causes the build to break with very obscure error messages.

Feature requests: what underlying problem are you trying to solve with this feature?

When build is done using some IDEs (VSCode or IntelliJ) they tend to set their own --output_base, different from the one configured by the user for the command line builds. This should not cause any problems, but unfortunately it appears that after --output_base changes the build is broken.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Select any simplest Bazel project. Build it with bazel --output_base=C:/blah1 build ... so it builds successfully. Then try to build it with bazel --output_base=C:/blah2 build ... this time it breaks with the error messages which don't make any sense.

What operating system are you running Bazel on?

The problem does not seem to be OS dependent.

What's the output of bazel info release?

2.0.0

Any other information, logs, or outputs that you want to share?

I figured that the problem is must probably caused by the stale "courtesy" symlink in WORKSPACE folder. After the first build Bazel creates "courtesy" symlink such as bazel-<workspace_name> in the workspace folder and it points inside output_base. When we issue second build command with the different output_base Bazel is smart enough to realize that and spawn second build server process, but unfortunately that stale bazel-<workspace_name> symlink still stays and points inside the old output_base which seems to confuse Bazel. Running bazel clean between builds or simply deleting of that symlink fixes the problem. It looks like when Bazel discovers the change in startup parameters which warrants spawning new build server it should at the same time remove existing courtesy symlinks as they are not valid anymore and cause the build to fail.

P2 team-Core team-XProduct bug

Most helpful comment

The basic problem here is that the bazel-$WORKSPACE convenience symlink is blindly traversed by Bazel: you can even do bazel build //bazel-$WORKSPACE/... and it will load the labels without any issues, and even build the targets there if you're lucky. When you switch output bases, the symlinks are broken, but the fundamental issue is visiting that convenience symlink in the first place.

My suggestion for a workaround for now is to do echo "bazel-$WORKSPACE" >> .bazelignore. The .bazelignore file in the root of the workspace tells Bazel not to consider those directories. Of course, Bazel should be smart enough to not consider them on its own.

cc @mhy1992

All 5 comments

Reproduced with Bazel 2.0:

03:30:25 /tmp/ws
$ cat BUILD
genrule(
    name = "g",
    outs = ["g.txt"],
    cmd = "touch $@",
)
03:30:28 /tmp/ws
$ cat WORKSPACE
03:30:31 /tmp/ws
$ bazel --output_base=/tmp/one build //... && bazel --output_base=/tmp/two build //...
INFO: Analyzed target //:g (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:g up-to-date:
  bazel-bin/g.txt
INFO: Elapsed time: 0.119s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
ERROR: error loading package 'bazel-ws/external/bazel_tools/tools/build_defs/pkg': Label '//tools/python:private/defs.bzl' is invalid because 'tools/python' is not a package; perhaps you meant to put the colon here: '//:tools/python/private/defs.bzl'?
INFO: Elapsed time: 0.219s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (12 packages loaded)
    currently loading: bazel-ws/external/bazel_tools/tools/test/CoverageOutp\
utGenerator/java/com/google/devtools/coverageoutputgenerator ... (2 packages\
)
    Fetching @rules_java; fetching

I'm seeing something very similar, but even more basic when using the latest pre-release vscode-bazel (which sets --output_base by default). vscode-bazel runs this command:

bazel --output_base=/tmp/ee79067f914abe58284ab7a8abdc7f7d query ...:* --output=package

It fails with the following output:

Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Call stack for the definition of repository 'rules_cc' which is a http_archive (rule definition at /tmp/ee79067f914abe58284ab7a8abdc7f7d/external/bazel_tools/tools/build_defs/repo/http.bzl:292:16):
 - /tmp/ee79067f914abe58284ab7a8abdc7f7d/external/bazel_tools/tools/build_defs/repo/utils.bzl:205:9
 - /DEFAULT.WORKSPACE.SUFFIX:302:1
ERROR: error loading package 'bazel-sdk/external/bazel_tools/third_party/jarjar': Label '//tools/jdk:remote_java_tools_aliases.bzl' is invalid because 'tools/jdk' is not a package; perhaps you meant to put the colon here: '//:tools/jdk/remote_java_tools_aliases.bzl'?

If I remove the --output_base, the query succeeds just fine. If I change the --output_base to --output_user_root, the query succeeds just fine. It's almost like bazel cannot modify things in install_base when outputBase has been specified like that. Perhaps the query is sandboxed and overriding output_base is causing install_base to not be in the sandbox? Just wild-guessing.

For reference, I've seen a number of different but very similar errors and they all involve downloading http_archive rules (or similar download rules) for setting up a workspace prior to running a query. I think the intent of output_base was to not disturb the main workspace environment (and thus be able to run concurrently), but maybe the install_base is write-only in that case? For what it's worth the install_base HAD been populated with the downloads prior to running the query with --output_base, so I'm guessing the need to re-download is because changing output_base invalidated something. Maybe that's a clue?

Is anyone actively working on this? This is a pretty annoying bug that affects how e.g. Jenkins build nodes have to be spawned because the cannot handle multiple executors being isolated with output_base. I'd be willing to investigate a fix but am not familiar enough with bazel's codebase to even know where to start looking. Any pointers would be appreciated :)

I found a workaround, but I don't know why it works:

  • Let $WORKSPACE=app
  • Remove the existing symlink bazel-app
  • Replace with a dummy file echo "dummy" >| bazel-app
  • The builds no longer fail, but you get a warning: failed to create one or more convenience symlinks for prefix 'bazel-'

I tried setting --symlink_prefix, --experimental_use_sandboxfs, but it did not work. It's as if 'bazel-' is hardcoded somewhere.

I hope that helps.

The basic problem here is that the bazel-$WORKSPACE convenience symlink is blindly traversed by Bazel: you can even do bazel build //bazel-$WORKSPACE/... and it will load the labels without any issues, and even build the targets there if you're lucky. When you switch output bases, the symlinks are broken, but the fundamental issue is visiting that convenience symlink in the first place.

My suggestion for a workaround for now is to do echo "bazel-$WORKSPACE" >> .bazelignore. The .bazelignore file in the root of the workspace tells Bazel not to consider those directories. Of course, Bazel should be smart enough to not consider them on its own.

cc @mhy1992

Was this page helpful?
0 / 5 - 0 ratings