Bazel: Multiplatform output paths are safe, correct, and efficient

Created on 25 Oct 2018 · 18 comments · Source: bazelbuild/bazel

Tracking issue on Bazel Configurability Roadmap

By "multiplatform" I mean any scenario where two different rules in the same build build with different settings. This also includes non-platform settings like app version, but "multiplatform" is a concise term to capture the essence.

Long story short: bazel-out/$(cpu)-$(compilation_mode)/... doesn't work well for multiplatform builds:

Unrelated actions can inadvertently write to the same output path (correctness issue)
CPU is redundant for CPU-agnostic actions (efficiency issue: switching the CPU shouldn't require re-executing these actions; see https://github.com/bazelbuild/bazel/issues/6527)
Actions that depend on flags other than CPU or compilation mode write to the same path when those flags change (correctness issue)
All the above can destroy remote execution efficiency
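A minimal Python sketch (not Bazel code) of the first three problems. It assumes, for illustration only, that the output directory is derived from nothing but the CPU and compilation mode; the configuration dicts and the app_version flag are hypothetical.

```python
# Hypothetical model of the legacy naming scheme: only two settings feed the path.
def legacy_output_dir(config: dict) -> str:
    return f"bazel-out/{config['cpu']}-{config['compilation_mode']}"


# Two hypothetical configurations that differ only in a non-platform flag.
release = {"cpu": "k8", "compilation_mode": "fastbuild", "app_version": "1.0"}
beta = {"cpu": "k8", "compilation_mode": "fastbuild", "app_version": "2.0"}

# Correctness issue: different settings, same output directory.
assert legacy_output_dir(release) == legacy_output_dir(beta)

# Efficiency issue: a CPU-agnostic action (e.g. a Java compile) lands in a
# different directory just because the CPU changed, so the paths it sees differ.
arm = dict(release, cpu="aarch64")
print(legacy_output_dir(release))
print(legacy_output_dir(arm))
```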

This issue tracks the long and complicated effort of making a better output path syntax. Expect the next deliverable on this to be a design doc.

P2 team-Configurability feature request


gregestren


2018 EOY update:

See https://github.com/bazelbuild/bazel/issues/6527#issuecomment-458357722.

More updates coming Q1 2019.

gregestren on 29 Jan 2019

April '19 update:

Detailed plans at Experimental Content-Based Output Paths (please comment!).

The goal is to get an --experimental prototype available this summer that automatically caches multiplatform Java compilation.

gregestren on 30 Apr 2019

P1 issue review: still relevant, still very much intend to explore this but I simply am unable to put time into it at the moment. Hoping I can pick this up next quarter.

gregestren on 11 May 2020

This is being up-prioritized with about one developer's full-time commitment over the next 3 months.

gregestren on 15 Jul 2020


Is there an escalation path at Google (i.e., someone we could reach out to) that could help align on business priorities?

michaelmartak on 29 Jul 2020

Write to me as a technical contact ([email protected]) explaining your needs as best you can. I'm happy to chat technical concerns and CC in folks who can help with business priorities.

gregestren on 29 Jul 2020

To elaborate on https://github.com/bazelbuild/bazel/issues/6526#issuecomment-658927627,

I believe the generic solution described in this issue and https://github.com/bazelbuild/bazel/issues/6526#issuecomment-488103473 is nuanced enough that it'd have to go through a long experimental phase before we could consider productionizing parts.

I'd still like to get to that phase because it would still let interested folks opt in, explore, and help evolve its path.

But we're also trying to explore if there are more limited variations we could hack out more quickly while avoiding the deeper design issues. That's going to be the focus of the current up-prioritization. We have an idea of something (hopefully) quick and dirty that could approximate a lot of this, probably with a small code injection into the remote executor client. I'll continue to follow up here.

Speaking of, is anyone interested in this and not using remote execution?

gregestren on 29 Jul 2020

> Speaking of, is anyone interested in this and not using remote execution?

We are. Our setup has macOS dev machines and Linux CI machines for Android builds. We're hoping to use remote execution at some point, but at the moment we only use a remote cache. We think this might help there, since right now the two platforms don't share cache hits.

keith on 29 Jul 2020


Acknowledged, thanks.

gregestren on 30 Jul 2020

Same as @keith for my team's use case, except plain old Java -> jars rather than Android.

plaird on 30 Jul 2020


I'm sorry I haven't updated this for a while. Quick update: I recently experimented with a limited form of this, as suggested at https://github.com/bazelbuild/bazel/issues/6526#issuecomment-665836057. Initial results look promising.

I want to do another test over a sample project (maybe Bazel itself?) to verify the results. Then I need to look at injection points, since Bazel has different APIs for delegating to local and remote executors and this change is likely to live in the implementation layer.

I'm spending a good chunk of this week doing the above. As always, please ping (or reach out to me directly) if you're wondering what's up in between updates.

gregestren on 27 Oct 2020

Thanks for the update @gregestren! Your work in this area is very much appreciated!

Would you elaborate on the scope of the "limited form" of #6526? Which use cases do you expect it to support, and which not?

I interpret this to mean that a complete, generic, production-quality solution for #6526 is still the final goal, but realistically more than a year away. Is that correct? Would you dare to give a very rough time estimate?

Again, thank you for all your effort in this area. It is very important for us not to explode the executor workload when using transitions for our C/C++ applications, in cases like: https://groups.google.com/g/bazel-discuss/c/zVEc7gzbyu0

ulrfa on 12 Nov 2020

@ulrfa sure!

The generic approach I outlined in https://github.com/bazelbuild/bazel/issues/6526#issuecomment-488103473 tries to balance a variety of needs, including the need that the paths the executor sees are identical to what appears in Bazel's final output tree. That makes actions that write manifests or debug symbol paths safe.

If we drop that requirement, that opens up a much simpler algorithm: strip the config-specific info completely from the paths before shipping them to the executor, then add it back when writing outputs to Bazel's output tree. So bazel-out/x86-fastbuild-someconfighash/mypkg/myoutput gets staged as bazel-out/mypkg/myoutput, cache-checked on the executor accordingly, executed, and rewritten back to its original path when done.
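The strip-and-restore round trip described above might look like the following Python sketch. The helper names strip_config and restore_config are hypothetical, and the sketch assumes the configuration occupies exactly one path segment directly under bazel-out/; it illustrates the idea, not Bazel's actual implementation.

```python
from __future__ import annotations

from pathlib import PurePosixPath

OUTPUT_ROOT = "bazel-out"


def strip_config(path: str) -> tuple[str, str]:
    """Split bazel-out/<config>/rest into (bazel-out/rest, <config>)."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != OUTPUT_ROOT:
        raise ValueError(f"not a configured output path: {path}")
    config, rest = parts[1], parts[2:]
    return str(PurePosixPath(OUTPUT_ROOT, *rest)), config


def restore_config(stripped: str, config: str) -> str:
    """Re-insert the configuration segment when writing back to the output tree."""
    parts = PurePosixPath(stripped).parts
    if not parts or parts[0] != OUTPUT_ROOT:
        raise ValueError(f"not a stripped output path: {stripped}")
    return str(PurePosixPath(OUTPUT_ROOT, config, *parts[1:]))


original = "bazel-out/x86-fastbuild-someconfighash/mypkg/myoutput"
staged, config = strip_config(original)

# A second (hypothetical) configuration stages to the very same path.
other = "bazel-out/arm-fastbuild-otherhash/mypkg/myoutput"
assert strip_config(other)[0] == staged
assert restore_config(staged, config) == original
```

Because both configurations stage identical paths, the executor can match them to the same cached result, assuming the command line and inputs are otherwise identical.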

That exposes the risks from my first paragraph. But not every action has that risk. Lots of actions truly don't care what their input or output paths look like. So this new approach would introduce criteria for which actions are "safe" in this regard and rewrite paths for safe actions. We could presumably start with a small and conservative safety set, then expand as we vet more actions.
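One way to express a "small and conservative safety set" is an explicit allowlist keyed on action mnemonics. Javac, CppCompile, and Genrule are real Bazel action mnemonics, but the allowlist contents and the should_strip_paths helper below are hypothetical.

```python
# Hypothetical allowlist: actions believed not to embed their own input/output
# paths in their outputs. Expand only after vetting each new action type.
PATH_INSENSITIVE_MNEMONICS = frozenset({"Javac"})


def should_strip_paths(mnemonic: str) -> bool:
    # Anything not explicitly vetted (e.g. C++ compiles that record debug
    # symbol paths) keeps its full configuration-specific paths.
    return mnemonic in PATH_INSENSITIVE_MNEMONICS


for m in ("Javac", "CppCompile", "Genrule"):
    print(m, should_strip_paths(m))
```

Defaulting to "don't strip" means an unvetted action can only lose cache sharing, never produce wrong paths.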

Java actions I think are particularly good candidates for this. C++ has the extra challenge of debug mode symbol paths. But that's only a certain subset of C++ actions. Not all of them.

For https://groups.google.com/g/bazel-discuss/c/zVEc7gzbyu0, another complementary idea is "trimming" - if it's really only the binary that consumes the flag, we could simply remove that flag from configurations in its dependencies. I already have a tool we could conceptually use to make this happen. But it'd require preprocessing: every time a BUILD file changes you'd have to rerun that tool to annotate the BUILD rules. A 100% automatic approach would be ideal.
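A toy Python model of the trimming idea, under assumptions not taken from Bazel: every target declares the flags it reads directly, and a target's trimmed configuration keeps only flags read somewhere in its transitive dependencies. The target labels and the //config:variant flag are hypothetical.

```python
from functools import lru_cache

# Hypothetical dependency graph and per-target flag reads.
DEPS = {
    "//app:bin": ["//lib:a", "//lib:b"],
    "//lib:a": ["//lib:b"],
    "//lib:b": [],
}
READS = {
    "//app:bin": {"//config:variant"},  # only the binary reads this flag
    "//lib:a": set(),
    "//lib:b": set(),
}


@lru_cache(maxsize=None)
def transitive_reads(target: str) -> frozenset:
    """Flags read by the target or anything it depends on."""
    flags = set(READS[target])
    for dep in DEPS[target]:
        flags |= transitive_reads(dep)
    return frozenset(flags)


def trimmed_config(target: str, config: dict) -> dict:
    """Drop every flag the target's subgraph never reads."""
    keep = transitive_reads(target)
    return {k: v for k, v in config.items() if k in keep}


cfg_x = {"//config:variant": "x"}
cfg_y = {"//config:variant": "y"}
# The binary sees different configurations...
assert trimmed_config("//app:bin", cfg_x) != trimmed_config("//app:bin", cfg_y)
# ...but its dependencies trim to the same (empty) configuration, so their
# outputs could be shared between the two builds.
assert trimmed_config("//lib:a", cfg_x) == trimmed_config("//lib:a", cfg_y) == {}
```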

Time-wise, I'd like to share some clearer experimental results on some Java actions over the next month or two. If that all looks good I don't see why we can't enable this limited approach by, say, January. It might take more tweaks to figure out the C++ nuances.

gregestren on 16 Nov 2020

Thanks @gregestren!

> C++ has the extra challenge of debug mode symbol paths. But that's only a certain subset of C++ actions. Not all of them.

What subset of C++ actions do you mean? Does the subset include all actions compiling source code with debug symbol paths? Unfortunately we need to compile our C/C++ code with debug symbol paths.

> For https://groups.google.com/g/bazel-discuss/c/zVEc7gzbyu0, another complementary idea is "trimming" - if it's really only the binary that consumes the flag, we could simply remove that flag from configurations in its dependencies. I already have a tool we could conceptually use to make this happen. But it'd require preprocessing: every time a BUILD file changes you'd have to rerun that tool to annotate the BUILD rules. A 100% automatic approach would be ideal.

Trimming is interesting! I guess that would also reduce build graph size and RAM requirements. I will have a look at your tool! But unfortunately, we have a deep build graph, with many configuration options consumed by lots of cc_library targets. It would be hard for us without an automatic approach.

Is your final goal an automatic trimming solution and/or an output path solution that handles C/C++ code with debug symbol paths? If so, could you give a rough time estimate?

I'm sorry to bother you about time estimates. We are considering whether to go all-in with transitions, and your input about what to expect, and roughly when, is essential to that decision.

ulrfa on 17 Nov 2020

> What subset of C++ actions do you mean? Does the subset include all actions compiling source code with debug symbol paths? Unfortunately we need to compile our C/C++ code with debug symbol paths.

Yes, I mean actions that rely on paths for resolving debug symbols vs. those that don't. It's not just the actions, though; it's also whatever consumes those paths (like gdb). If you're not actually debugging, maybe this doesn't matter. But if you need debug symbol paths, I guess that's not the case?

This isn't to say there aren't options. We could conceivably rewrite the symbol paths after the fact. But that'd be a specialized effort.

> Trimming is interesting! I guess that would also reduce build graph size and RAM requirements. I will have a look at your tool! But unfortunately, we have a deep build graph, with many configuration options consumed by lots of cc_library targets. It would be hard for us without an automatic approach.

The key point in my mind is that if your top-level binary is the only one that actually consumes the flag in question, then we'd have some real options, no matter what the cc_librarys in the subgraph do. If those cc_librarys really need to behave differently based on these options, then by definition they wouldn't be shareable anyway. We'd need more details on exactly how the flag is used to clarify assumptions.

> Is your final goal an automatic trimming solution and/or an output path solution that handles C/C++ code with debug symbol paths? If so, could you give a rough time estimate?

That would be wonderful, but it's an ambitious goal that I can't credibly put a timeline on. I'm trying to focus effort on incremental steps forward, so we can see credible practical progress vs. a reallllly long wait with unclear outcome.

So in my view, the approach for now is to identify optimizable use cases and optimize them, rather than try to automatically make everything work at peak efficiency.

> I'm sorry to bother you about time estimates. We are considering whether to go all-in with transitions, and your input about what to expect, and roughly when, is essential to that decision.

No worries. I'm not sure my input is helping you with this decision. I guess I'm ultimately saying we need to understand the precise requirements of specific builds and aim optimizations at improving those builds (and whatever other builds have the same patterns). So the real answer, as usual, is in the details.

gregestren on 23 Nov 2020

Hi!
I work on the same project as ulrfa. I have also posted this question in the forum: https://groups.google.com/g/bazel-discuss/c/zVEc7gzbyu0/m/5UcZ8aXOBQAJ

I try here to describe our use case:

We build C/C++ applications for an embedded system, with very large build graphs and many configurable options, using "User-defined build settings" (https://docs.bazel.build/versions/master/skylark/config.html).

Examples of configurable options are:

Select Bazel targets based on hardware configuration
Select Bazel targets based on the environment the application will be used in, such as a test environment or a customer environment
Select Bazel targets or set C defines (-D flag) in cc_* targets for stubbed testing, where some parts of the system are stubbed out for testing purposes

The targets affected by the options can be at any level in the dependency tree.
Many options have private visibility and only affect part of the system, but we depend on the correct command-line options being set when the application is built.

The typical use case is that you build one application with some specified configurable options.

If you do this on the command line everything will be built in the default output tree.
If you change one option or build another application with one option that differs, only the targets that are affected by the option will be rebuilt, everything else can be reused.

If you do this in a transition, nothing can be reused between the builds.

This causes a lot of rebuilds whenever something in, e.g., common code changes. It also requires much larger remote cache storage.

We need to be able to debug the application targets: we use gdb and depend on the debug symbols being correct so that it can show the source files.