When the output directory hash for the bazel-out subdirectory for generated artifacts is computed on a transition, it is based upon a delta of the build settings before and after the transition. As a result, multiple output directories with different hashes are getting generated for targets that are built with the exact same build settings.
==> BUILD.bazel <==
load(":stuff.bzl", "fr")
fr(
name = "FW",
)
==> stuff.bzl <==
def _tr_impl(settings, attr):
return {
"//command_line_option:cpu": "ios_x86_64",
}
tr = transition(
implementation = _tr_impl,
inputs = ["//command_line_option:cpu"],
outputs = ["//command_line_option:cpu"],
)
def _fr_impl(ctx):
return []
fr = rule(
cfg = tr,
implementation = _fr_impl,
attrs = {
"_whitelist_function_transition": attr.label(
default = "@bazel_tools//tools/whitelists/function_transition_whitelist",
),
},
)
Run these commands:
> bazelisk cquery FW
# output includes "//sample:FW (3079a19075a19231a074cb24e8a1e6d40cc62546a3a4a625aed22fcbdfa6c465)"
> bazelisk cquery FW --cpu=ios_x86_64
# output includes "//sample:FW (e4ca851ff6d7872e00a13f59323655db1a538255f7c9f581cbc45ab47b005464)"
> bazelisk config 3079a19075a19231a074cb24e8a1e6d40cc62546a3a4a625aed22fcbdfa6c465 > /tmp/a
> bazelisk config e4ca851ff6d7872e00a13f59323655db1a538255f7c9f581cbc45ab47b005464 > /tmp/b
> diff /tmp/a /tmp/b
1,2c1,2
< BuildConfiguration 3079a19075a19231a074cb24e8a1e6d40cc62546a3a4a625aed22fcbdfa6c465:
< Skyframe Key: BuildConfigurationValue.Key[3079a19075a19231a074cb24e8a1e6d40cc62546a3a4a625aed22fcbdfa6c465]
---
> BuildConfiguration e4ca851ff6d7872e00a13f59323655db1a538255f7c9f581cbc45ab47b005464:
> Skyframe Key: BuildConfigurationValue.Key[e4ca851ff6d7872e00a13f59323655db1a538255f7c9f581cbc45ab47b005464]
24c24
< affected by starlark transition: [cpu]
---
> affected by starlark transition: []
79c79
< transition directory name fragment: ST-20f68df4034c
---
> transition directory name fragment: null
Observe that the transition directory name fragments are different, even though the other build settings are the same.
Mac OS 10.15.6.
bazel info release?release 3.5.0
bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.It does not include development version.
git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?I am not running bazel from a local git checkout of the bazel repo. I am using the release build.
The FunctionTransitionUtil.java file in bazel is intentionally computing the output directory hash only based upon the options that are affected by the starlark transition. It looks like this commit updated bazel to look at the delta of build settings: https://github.com/bazelbuild/bazel/commit/6360ff7ed15c2008c534d58022da4b0a2c739e6e cc @juliexxia
In this branch, run repro.sh to reproduce the problem: https://github.com/segiddins/bazel-config-multiple-hashes.git
My team has confirmed this is a regression since bazel version 3.2.0.
In this PR, I also reproduced the problem: https://github.com/bazel-ios/rules_ios/pull/124
cc @juliexxia, this is a regression we found in Bazel 3.5 in the configuration workstream, and I think you're the right person to triage this?
Yep! apologies, I'm a little confused by the issue: historically, starlark transitions always updated the output directory. What 6360ff7 did was make it so that this only happened in cases where options were actually given new values. The output directory of starlark transitioned targets being different isn't new so I'm confused about what the regression is here. Are you saying the output directory wasn't being updated in 3.2.0 and now is?
The FunctionTransitionUtil.java file in bazel is intentionally computing the output directory hash only based upon the options that are affected by the starlark transition. It looks like this commit updated bazel to look at the delta of build settings: 6360ff7 cc @juliexxia
Are you saying this is undesired behavior?
FWIW, you're right, the repro above should not have an updated transition directory fragment because as you say, it's the same configuration. But not having the updated transition directory fragment is new as of 6360ff7.
So I _think_ I can sum up the issue as: "bazel is using multiple output directories for configurations _that are otherwise identical_". This means targets are getting configured & built multiple times for identical configurations, and can mess up things like rules that try to do IDE configuration, because targets that should be unique in fact are showing up under multiple output directories.
This makes starlark configurations _path dependent_, in a sense, where the _incoming_ configuration can have an effect on the output configuration's config hash. I don't think this is intentional? My expectation as a rule author is that "identical" configurations should always map to the config hash, and the set of affected by starlark transition flags should not, by itself, change the config hash.
So I think I can sum up the issue as: "bazel is using multiple output directories for configurations that are otherwise identical". This means targets are getting configured & built multiple times for identical configurations, and can mess up things like rules that try to do IDE configuration, because targets that should be unique in fact are showing up under multiple output directories.
Yep! This all makes sense and I agree. Before 6360ff7, this wasn't the case, we were hashing all the values that were set by transitions regardless of they were the same value as they came into the transition. After 6360ff7, what you're saying should hold true as we slimmed down the group of post-transition configurations that get an updated output directory to those where the transition actually changed the value of an option (by comparing the output value to the input value).
What I'm confused about is the timeline/regression. Starlark transitions have historically always updated the output directory and only recently, with 6360ff7, have started not always updating the output directory, only doing so when a flag is actually changed. So I'm confused about how you weren't seeing updated output directory paths before 6360ff7 and are seeing them now since it should be exactly the opposite. I wanted to clarify that's indeed what you're seeing so I can dig further.
I think we maybe were seeing updating output directory paths, but they were being updated to the same value? Because all the outputs were being taken into consideration?
Again, pure speculation, but I feel like what needs to be done is look at the entire output configuration diffed against the "default" configuration, rather than only considering the diff of transition input/output flags against the incoming configuration? Not sure if that makes any sense.
So I'm confused about how you weren't seeing updated output directory paths before 6360ff7 and are seeing them now since it should be exactly the opposite
From looking at the diff, I have a hunch that the change from https://github.com/bazelbuild/bazel/commit/6360ff7ed15c2008c534d58022da4b0a2c739e6e#diff-01af620735f7a29048977650332a354cR415-R416 to only looking at inputs to looking at inputs and outputs might be the trigger?
I think we maybe were seeing updating output directory paths, but they were being updated to the same value?
It would be really helpful to get confirmation/clarification about what you were seeing in old bazel that you're seeing differently in new bazel and why it's causing issues for you.
Again, pure speculation, but I feel like what needs to be done is look at the entire output configuration diffed against the "default" configuration, rather than only considering the diff of transition input/output flags against the incoming configuration? Not sure if that makes any sense.
That's the purpose of the "affected by starlark transition" bit. So the way the logic works:
# what happens when a transition is performed
maybe_new_values = get the map returned by the transition implementation function
this_time_affected_options = {}
for option, new_value in maybe_new_values:
if new_value != input value:
put option in affected_options
if affected_options != empty:
String to_hash
affected_options = affected_optins.add_all(this_time_affected_options)
for option in affected_options
to_hash+=option + value
transition directory name fragment = hash(to_hash)
The key here is that we maintain a list of affected options over an entire build and we only update the transition directory name fragment when a transition is performed. So given the following graph structure:
A
I transition1 that sets --foo=bar
v
B
I transition2 that sets --foo=blargh
v
C
I transition3 that sets --zoom=zop
v
D
If you build A with --foo=bar, then in theory* you get the following configured targets:
A: transition directory name fragment = null, affected by starlark transition = {}
B: transition directory name fragment = null, affected by starlark transition = {}
C: transition directory name fragment = hash_of("foo=blargh"), affected by starlark transition = {foo}
D: transition directory name fragment = hash_of("foo=blargh, zoom=zop"), affected by starlark transition = {foo,zop}
From looking at the diff, I have a hunch that the change from 6360ff7#diff-01af620735f7a29048977650332a354cR415-R416 to only looking at inputs to looking at inputs and outputs might be the trigger?
All this does is give the input dict to the transition access to the default values of the output settings. Build settings that are set to their default value are not stored in the configuration (so that configurations with non-set build settings and explicitly-set-to-default build settings look the same) so if we don't have a recorded value for the build setting in the configuration, we need to know what the default value of the setting is so we can either record the new value or recognize the new value as the default and continue to ignore it.
I said in theory* in the previous post because it sounds like you're getting different results. But again, need some clarity on what you're seeing.
What would help provide that clarity? Given I have the sort of single-file repro, I'm happy to do whatever debugging / logging you think would be helpful
Do you mind verifying for me that you're running Bazel 3.5.0? Is it possible to run it with Bazel 3.5.0 proper and see if you get the same results? I notice you're running bazelisk - are you setting the bazel version anywhere? See: https://github.com/bazelbuild/bazelisk#how-does-bazelisk-know-which-bazel-version-to-run
I'm running the following repro (adapted from yours) with regular bazel 3.5.1 and I'm not seeing the same results you are:
==> BUILD.bazel <==
load(":stuff.bzl", "fr", "other")
fr(
name = "FW",
)
other(name = "other")
==> stuff.bzl <==
def _tr_impl(settings, attr):
return {
"//command_line_option:cpu": "ios_x86_64",
}
tr = transition(
implementation = _tr_impl,
inputs = [],
outputs = ["//command_line_option:cpu"],
)
def _fr_impl(ctx):
return []
fr = rule(
cfg = tr,
implementation = _fr_impl,
attrs = {
"_whitelist_function_transition": attr.label(
default = "@bazel_tools//tools/whitelists/function_transition_whitelist",
),
},
)
other = rule(implementation = _fr_impl)
$ bazel cquery :all --cpu=ios_x86_64
INFO: Analyzed 3 targets (0 packages loaded, 38 targets configured).
INFO: Found 3 targets...
//:FR (9ea510b1ff7233388168cb46511ecff9b13a00b0bd26a252d2099dc6b8a22a89)
//:other (9ea510b1ff7233388168cb46511ecff9b13a00b0bd26a252d2099dc6b8a22a89)
INFO: Elapsed time: 0.088s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
$ bazel config 9ea510b1ff7233388168cb46511ecff9b13a00b0bd26a252d2099dc6b8a22a89 | grep "transition"
INFO: Displaying config with id 9ea510b1ff7233388168cb46511ecff9b13a00b0bd26a252d2099dc6b8a22a89
affected by starlark transition: []
transition directory name fragment: null
Another helpful verification would be at what bazel version were you not seeing this behavior because I would expect this behavior to be in all bazel versions before 3.5.0
https://github.com/segiddins/bazel-config-multiple-hashes.git shows this is a regression between 3.4.0 and 3.5.0
There seem to be some build failures In that repro. Where are the config hashes that you use at the end coming from? I don't see them show up anytime before the config command:
"bazel config 1908601e8157a3765e39ede2fc98d23c9c1be24080e5589fc71fa6a15d7d9ff1 28e83bd7c518774a6378a7ee5024c969ab10fdacb062577d0f528bb70b3e6917"
Can you run the repro I wrote above? It's smaller and should give a clear signal.
There seem to be some build failures In that repro
The build failures are intentional -- that's how I grab the pair of configs to diff.
I or @amberdixon will try to run your repro.
Hi Julie! Sorry about the delayed response here. Here are my responses to your comments and my notes about the repro case.
Are you saying the output directory wasn't being updated in 3.2.0 and now is?
That's fine that the output directory gets updated. But it seems like the parameters used to compute the new directory hash should be a function of all the build settings in the new configuration, not just the changed settings. If you only look at the changed settings on the transition, then different hashes will end up getting used for targets that are actually configured with the same settings.
Are you saying this is undesired behavior?
After 6360ff7, what you're saying should hold true as we slimmed down the group of post-transition configurations that get an updated output directory to those where the transition actually changed the value of an option (by comparing the output value to the input value).
The intent of 6360ff7 is to not change the output directory hash on a no-op. That is perfectly fine. However the bug (we think) is that this commit computes the hash from the changed build settings, not from all of them. This becomes a problem with one or more build settings changes. Therefore, this commit is not just affecting no-op transitions .. it's affecting all transitions.
You presented an example with these targets and transitions:
A: transition directory name fragment = null, affected by starlark transition = {}
B: transition directory name fragment = null, affected by starlark transition = {}
C: transition directory name fragment = hash_of("foo=blargh"), affected by starlark transition = {foo}
D: transition directory name fragment = hash_of("foo=blargh, zoom=zop"), affected by starlark transition = {foo,zop}
For D, the transition directory name fragment will only be hash_of("zoom=zop"). This is beacuse foo=blargh has not changed on the transition between C -> D.
Let's consider the following scenario. In your dependency graph, let's say there is another way of getting to D from E, as follows:
E
I transition3 that sets --zoom=zop, --foo=blargh
v
D
In either case, D is configured with --zoom=zop, --foo=blargh. But we see that transitioning from C->D vs. from E->D results in a different output directory has for D.
C: transition directory name fragment = hash_of("foo=blargh"), affected by starlark transition = {foo}
D: transition directory name fragment = hash_of("zoom=zop"), affected by starlark transition = {zop}
E: transition directory name fragment = null, affected by starlark transition = {}
D: transition directory name fragment = hash_of("foo=blargh, zoom=zop"), affected by starlark transition = {foo,zop}
All this does is give the input dict to the transition access to the default values of the output settings. Build settings that are set to their default value are not stored in the configuration (so that configurations with non-set build settings and explicitly-set-to-default build settings look the same) so if we don't have a recorded value for the build setting in the configuration, we need to know what the default value of the setting is so we can either record the new value or recognize the new value as the default and continue to ignore it.
Perhaps the problem here is that the code doesn't actually know what the "default" settings are. Therefore it regards any unchanged setting as a "default."
Confirmed we are using Bazel 3.5.0 with bazelisk, as follows:
~/Development/rules_ios/sample master * cat ../.bazelversion
3.5.0
~/Development/rules_ios/sample master * ls ../WORKSPACE # the workspace root is at the rules_ios directory
../WORKSPACE
~/Development/rules_ios/sample master * ls WORKSPACE
ls: WORKSPACE: No such file or directory
~/Development/rules_ios/sample master * ls
BUILD.bazel stuff.bzl
In your repro case, you ran bazel cquery :all --cpu=ios_x86_64. However, in my repro case, I first ran bazel cquery without the cpu=ios_x86_64 flag. That caused the transition to kick in and set the cpu flag. Since the cpu flag was being changed, its value affected the output directory hash.
Subsequently, I ran bazel cquery with the cpu=ios_x86_64 flag. In this case, the transition did not change anything. Therefore, no new output directory hash was computed.
But now unfortunately we find that there are different hashes being used for FW, even though in either case, it's being built with the same CPU flag.
I hope that helps illustrate the problem!
On my end, I will try to repro this problem on an older bazel version and also on 3.5.1.
For D, the transition directory name fragment will only be hash_of("zoom=zop"). This is because foo=blargh has not changed on the transition between C -> D.
This isn't true because the "affected by starlark transition" is a list that is maintained and grown over the entire build. This list is what's used to determine what options will be hashed to form the output directory. So what happens in the code is that D should have the same output directory regardless of it was built via E or C (assuming they set the same values) because in both cases, the list of "affected by starlark transition" contains both foo and zop
Here's the code that adds the newly updated options to the running list of options affected by starlark options
https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/starlark/FunctionTransitionUtil.java;l=373?q=FunctionTransitionUtil&ss=bazel
Here's the code that runs over the entire running list of changed options, not just the newly updated options:
https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/starlark/FunctionTransitionUtil.java;l=385?q=FunctionTransitionUtil&ss=bazel
In your repro case, you ran bazel cquery :all --cpu=ios_x86_64. However, in my repro case, I first ran bazel cquery without the cpu=ios_x86_64 flag. That caused the transition to kick in and set the cpu flag. Since the cpu flag was being changed, its value affected the output directory hash.
Subsequently, I ran bazel cquery with the cpu=ios_x86_64 flag. In this case, the transition did not change anything. Therefore, no new output directory hash was computed.
What is the unexpected behavior here? The behavior of the output directory hash changing without --cpu=ios_x86_64 set on the command line and not changing with --cpu=ios_x86_64 set on the command line is what I would expect to happen.
This isn't true because the "affected by starlark transition" is a list that is maintained and grown over the entire build. This list is what's used to determine what options will be hashed to form the output directory. So what happens in the code is that D should have the same output directory regardless of it was built via E or C (assuming they set the same values) because in both cases, the list of "affected by starlark transition" contains both foo and zop
Ah yes, you are right, the "affected by starlark transition" does seem to aggregate settings that are affected by transitions. However, I still think there are some problems with the current implementation. In my company's repo, I have a scenario where all the build settings for a target are the same, regardless of where we traverse into it from:
AppTarget -> D
TestTarget -> D
D
In each case, D ends up with the same set of build settings. And yet 3 different directory hashes are generated and the "affected by starlark transition" list looks different in each case. I'll work on providing you with a repro case for this scenario.
What is the unexpected behavior here? The behavior of the output directory hash changing without --cpu=ios_x86_64 set on the command line and not changing with --cpu=ios_x86_64 set on the command line is what I would expect to happen.
In the original repro case I presented, we see the FW is being built with the same value for cpu. We see that there is no diff in /tmp/a and /tmp/b, other than which fields were affected by starlark transition. At the end of the day, the built artifact is being built with the same settings and therefore should map to the same output directory hash (I think).
In each case, D ends up with the same set of build settings. And yet 3 different directory hashes are generated and the "affected by starlark transition" list looks different in each case. I'll work on providing you with a repro case for this scenario.
Gotcha, that doesn't sound right. Thanks in advance for the repro. That would be very helpful. Is it possible the flags on the original command line/blazerc is different in the different cases? That would also lead to different directory hashes even if D ended up in the same configuration across the builds.
In the original repro case I presented, we see the FW is being built with the same value for cpu. We see that there is no diff in /tmp/a and /tmp/b, other than which fields were affected by starlark transition. At the end of the day, the built artifact is being built with the same settings and therefore should map to the same output directory hash (I think).
The last sentence is correct! Thanks for clarifying. Using your repro I was unable to get the same results from my end, I got the expected same output directory hash. Not sure what's going on but hopefully a further repro will help us get to the bottom of it.
It would be helpful if you could try setting up and running the series of command I posted above as well (in https://github.com/bazelbuild/bazel/issues/12171#issuecomment-702313974)
I have tried the following on bazel versions 3.5.0 and 3.5.1, but am seeing the same behavior.
I have a scenario where all the build settings for a target are the same, regardless of where we traverse into it from:
AppTarget -> D
TestTarget -> D
DIn each case, D ends up with the same set of build settings. And yet 3 different directory hashes are generated and the "affected by starlark transition" list looks different in each case. I'll work on providing you with a repro case for this scenario.
I think this particular issue has to do with the way the AppleCrosstoolTransition from the AppleBinary rule in the bazel repo gets applied. (There is different from the issue I reported in the initial description in the bug report, where the directory output hash differs, depending on whether or not the build setting was applied via the command line or via a transition.)
Here's how to repro the issue with AppleCrosstoolTransition problem:
==> stuff.bzl <==
def _tor_impl(settings, attr):
return {
"//command_line_option:cpu": "ios_x86_64",
}
tor = transition(
implementation = _tor_impl,
inputs = [],
outputs = [
"//command_line_option:cpu",
],
)
def _ftor_impl(ctx):
return []
ftor = rule(
implementation = _ftor_impl,
attrs = {
"deps": attr.label_list(cfg=tor),
"_whitelist_function_transition": attr.label(
default = "@bazel_tools//tools/whitelists/function_transition_whitelist",
),
},
)
==> BUILD.bazel <==
load("@build_bazel_rules_apple//apple:ios.bzl", "ios_application")
load("@rules_cc//cc:defs.bzl", "objc_library")
load(":stuff.bzl", "ftor")
load(":stufforig.bzl", "fr")
fr(
name = "FWorig",
)
exports_files([
"Info.plist",
])
objc_library(
name = "D",
srcs = glob(["FW.h", "FW.m"]),
)
ftor(
name="TestTarget",
deps = [":D"],
)
ios_application(
name = "AppTarget",
deps = [":D"],
bundle_id = "ios.app",
infoplists = [":Info.plist"],
families = [
"iphone",
"ipad",
],
minimum_os_version = "12.0",
)
In this case, you see that D is hashing to 3 different directory hashes:
~/sample-bug-report-bazel bazelisk cquery :all
INFO: Analyzed 6 targets (1 packages loaded, 13 targets configured).
INFO: Found 6 targets...
//:AppTarget (df1be34ef9742c8a8fbf877248919bd886d53a7454913b50c6d6349e4be2a8b5)
//:AppTarget.__internal__.apple_binary (e00315f4608662cd39dfa498f714a535dca6a0152986056deab77e9fc1fc2511)
//:AppTarget.swift_runtime_linkopts (df1be34ef9742c8a8fbf877248919bd886d53a7454913b50c6d6349e4be2a8b5)
//:AppTarget.swift_runtime_linkopts (fcdd87f586a42f4db4b5f3495f5afb0aa75a975f985cbdb6fb009552b188c547)
//:AppTarget_entitlements (df1be34ef9742c8a8fbf877248919bd886d53a7454913b50c6d6349e4be2a8b5)
//:AppTarget_entitlements (fcdd87f586a42f4db4b5f3495f5afb0aa75a975f985cbdb6fb009552b188c547)
//:D (bb37f927e397a8b32cb6c9b78a0ff679ed31cb2fe60c868e9034d1c1337b3ece)
//:D (e00315f4608662cd39dfa498f714a535dca6a0152986056deab77e9fc1fc2511)
//:D (fcdd87f586a42f4db4b5f3495f5afb0aa75a975f985cbdb6fb009552b188c547)
//:TestTarget (df1be34ef9742c8a8fbf877248919bd886d53a7454913b50c6d6349e4be2a8b5)
INFO: Elapsed time: 0.219s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
~/sample-bug-report-bazel
Notice that D has 3 different configs. For the config bb37f927e397a8b32cb6c9b78a0ff679ed31cb2fe60c868e9034d1c1337b3ece, D is getting built via an ongoing edge transition from TestTarget, which applies the build setting "cpu." For the config e00315f4608662cd39dfa498f714a535dca6a0152986056deab77e9fc1fc2511 however, D is getting built via AppTarget, which applies AppleCrosstoolTransition. In either case, the build settings for D do not differ. But in the first case, the "affected by starlark transition" array includes cpu, whereas in the second case it does not.
~/sample-bug-report-bazel bazelisk config bb37f927e397a8b32cb6c9b78a0ff679ed31cb2fe60c868e9034d1c1337b3ece e00315f4608662cd39dfa498f714a535dca6a0152986056deab77e9fc1fc2511
INFO: Displaying diff between configs bb37f927e397a8b32cb6c9b78a0ff679ed31cb2fe60c868e9034d1c1337b3ece and e00315f4608662cd39dfa498f714a535dca6a0152986056deab77e9fc1fc2511
Displaying diff between configs bb37f927e397a8b32cb6c9b78a0ff679ed31cb2fe60c868e9034d1c1337b3ece and e00315f4608662cd39dfa498f714a535dca6a0152986056deab77e9fc1fc2511
FragmentOptions com.google.devtools.build.lib.analysis.config.CoreOptions {
affected by starlark transition: [cpu], []
transition directory name fragment: ST-20f68df4034c, null
}
This is a problem, since in the second case, the AppleCrossTool transition is actually updating the cpu build setting:
~/sample-bug-report-bazel bazelisk cquery 'allpaths(AppTarget, D)' --transitions=full
INFO: Analyzed 2 targets (0 packages loaded, 0 targets configured).
INFO: Found 2 targets...
AppleCrosstoolTransition -> //:D (fcdd87f586a42f4db4b5f3495f5afb0aa75a975f985cbdb6fb009552b188c547)
ComposingTransition -> //:AppTarget.__internal__.apple_binary (e00315f4608662cd39dfa498f714a535dca6a0152986056deab77e9fc1fc2511)
deps#//:D#((AppleBinaryTransition + AppleCrosstoolTransition) + (TestTrimmingTransition + ConfigFeatureFlagTaggedTrimmingTransition)) ( -> 82a7b165ed9b0c3f60255615cb8e9f7976b06f58eff45ab48d0bcdd99e4e1bb1)
platforms:[@local_config_platform//:host] -> [[]]
cpu:darwin_x86_64 -> [ios_x86_64]
apple configuration distinguisher:APPLE_CROSSTOOL -> [APPLEBIN_IOS]
apple_platform_type:macos -> [ios]
apple_split_cpu: -> [x86_64]
ios_minimum_os:null -> [com.google.devtools.build.lib.rules.apple.DottedVersion$Option@1707a3]
deps#//:AppTarget.swift_runtime_linkopts#(AppleBinaryTransition + (TestTrimmingTransition + ConfigFeatureFlagTaggedTrimmingTransition)) ( -> 82a7b165ed9b0c3f60255615cb8e9f7976b06f58eff45ab48d0bcdd99e4e1bb1)
platforms:[@local_config_platform//:host] -> [[]]
cpu:darwin_x86_64 -> [ios_x86_64]
apple configuration distinguisher:APPLE_CROSSTOOL -> [APPLEBIN_IOS]
apple_platform_type:macos -> [ios]
apple_split_cpu: -> [x86_64]
ios_minimum_os:null -> [com.google.devtools.build.lib.rules.apple.DottedVersion$Option@1707a3]
//:AppTarget.swift_runtime_linkopts (fcdd87f586a42f4db4b5f3495f5afb0aa75a975f985cbdb6fb009552b188c547)
deps#//:D#(AppleCrosstoolTransition + (TestTrimmingTransition + ConfigFeatureFlagTaggedTrimmingTransition)) ( -> fcdd87f586a42f4db4b5f3495f5afb0aa75a975f985cbdb6fb009552b188c547)
//:AppTarget (df1be34ef9742c8a8fbf877248919bd886d53a7454913b50c6d6349e4be2a8b5)
deps#//:AppTarget.__internal__.apple_binary#((AppleCrosstoolTransition + NoTransition) + (TestTrimmingTransition + ConfigFeatureFlagTaggedTrimmingTransition)) ( -> 114a3a25c1d8e83729ab0498cf22b9e0edbba990567fc771ebf62ff4ce51fa56)
platforms:[@local_config_platform//:host] -> [[]]
cpu:darwin -> [darwin_x86_64]
apple configuration distinguisher:UNKNOWN -> [APPLE_CROSSTOOL]
INFO: Elapsed time: 0.103s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
Is it possible the flags on the original command line/blazerc is different in the different cases? That would also lead to different directory hashes even if D ended up in the same configuration across the builds.
In the example above, we don't have different command line or bazelrc setting ... but isn't it a bug that would result in different directory hashes, even if D ended up in the same configuration across builds?
It would be helpful if you could try setting up and running the series of command I posted above as well (in #12171 (comment))
I ran your repro steps. I saw the same result as you. However, in order to repro this issue, I think you would need to run the following steps. First, run everything with the cpu command line flag, just like you did. Then try running it without the command line flag. You will notice that FW is now pointing to another output directory hash ... despite the fact that the build settings for FW have not changed (in the second case, the cpu build setting for FW gets applied from the transition, not the command line).
~/sample-bug-report-bazel bazelisk cquery --cpu=ios_x86_64 :all
INFO: Build option --cpu has changed, discarding analysis cache.
INFO: Analyzed 2 targets (0 packages loaded, 9 targets configured).
INFO: Found 2 targets...
//:FW (9a6098a784cca06503a5a50a9b561f408b1e05a1d2fa52a39f25e778b3fe29c8)
//:other (9a6098a784cca06503a5a50a9b561f408b1e05a1d2fa52a39f25e778b3fe29c8)
INFO: Elapsed time: 0.105s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
~/sample-bug-report-bazel bazelisk cquery :all
INFO: Build option --cpu has changed, discarding analysis cache.
INFO: Analyzed 2 targets (0 packages loaded, 14 targets configured).
INFO: Found 2 targets...
//:FW (fc05326ef17847a4afeca3b58dfb49098eb8bb048b976e7bceb4e1309e4727e0)
//:other (df1be34ef9742c8a8fbf877248919bd886d53a7454913b50c6d6349e4be2a8b5)
INFO: Elapsed time: 0.086s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
~/sample-bug-report-bazel bazelisk config 9a6098a784cca06503a5a50a9b561f408b1e05a1d2fa52a39f25e778b3fe29c8 fc05326ef17847a4afeca3b58dfb49098eb8bb048b976e7bceb4e1309e4727e0
INFO: Displaying diff between configs 9a6098a784cca06503a5a50a9b561f408b1e05a1d2fa52a39f25e778b3fe29c8 and fc05326ef17847a4afeca3b58dfb49098eb8bb048b976e7bceb4e1309e4727e0
Displaying diff between configs 9a6098a784cca06503a5a50a9b561f408b1e05a1d2fa52a39f25e778b3fe29c8 and fc05326ef17847a4afeca3b58dfb49098eb8bb048b976e7bceb4e1309e4727e0
FragmentOptions com.google.devtools.build.lib.analysis.config.CoreOptions {
affected by starlark transition: [], [cpu]
transition directory name fragment: null, ST-20f68df4034c
}
Hi @juliexxia ! I was wondering if my latest comment was helpful. I was hoping that the bazel team could address the issue where different directory hashes are used for artifacts built with the same set of build settings. We do see that different hashes are getting used when the build settings are applied from the command line vs. from a transition. It has necessitated a number of workarounds on our end. Thanks!
Hi! Apologies it's been a busy couple of weeks. I'll read through you last couple of responses and get back to you in the next couple of days.
Notice that D has 3 different configs. For the config bb37f927e397a8b32cb6c9b78a0ff679ed31cb2fe60c868e9034d1c1337b3ece, D is getting built via an ongoing edge transition from TestTarget, which applies the build setting "cpu." For the config e00315f4608662cd39dfa498f714a535dca6a0152986056deab77e9fc1fc2511 however, D is getting built via AppTarget, which applies AppleCrosstoolTransition. In either case, the build settings for D do not differ. But in the first case, the "affected by starlark transition" array includes cpu, whereas in the second case it does not.
AppleCrosstoolTransition is a native transition defined in java code: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/rules/objc/AppleCrosstoolTransition.java;l=39?q=AppleCrosstoolTransition&ss=bazel. The affected by starlark transition andtransition directory name fragment bits are only set by starlark transitions. Which leads to...
In the example above, we don't have different command line or bazelrc setting ... but isn't it a bug that would result in different directory hashes, even if D ended up in the same configuration across builds
It's not ideal. Ideally we would have perfectly deduplicated outputs, but it's not technically a bug. Bazel doesn't actually guarantee any output directory structure; how we do output directories has changed over time and will continue to change. When it's difficult to get the de-duping just right, we prefer to play it safe and create new output paths instead of risking clobbering existing artifacts. For example, we use the affected by starlark transition andtransition directory name fragment as an overly cautious way of making sure starlark transitions aren't overwriting anything.
You will notice that FW is now pointing to another output directory hash ... despite the fact that the build settings for FW have not changed
But the build settings for FW have changed in the second case because the default value of --cpu is not ios_x86_64, it's one of the values here depending on your OS: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/config/AutoCpuConverter.java;l=35;drc=a5a4f47295ccb6cd81cf8732772c849b1a8b9d58. Let's say the default value of --cpu is detected as "k8" for examples sake.
In the bazelisk cquery --cpu=ios_x86_64 :all case, FW and other have the same configuration hash because the transition sets --cpu to to same value as the command line - we register that as a no change and don't update anything. Both FW and other are built with --cpu=ios_x86_64.
In the ~/sample-bug-report-bazel bazelisk cquery :all case, FW and other have different configuration hashes because for :FW, --cpu=ios_x86_64 and for :other, --cpu=k8.
The key here is affected by starlark transition updates to all native flags that are changed in value from the command line configuration.
I think the motivating problem we're facing is that, before this change, it was possible for a starlark transition to force a target into a canonical configuration, so when it appeared in the configured target graph multiple times, it would always be a single target. That meant, for example, you wouldn't try to link two (identical) static libraries -- which is super important when building native code! Now, because of the native options, there's no way to force a single configuration for a target -- either you get a consistent value for the native option (and an inconsistent affected by starlark transition), or vice-versa.
I think that gets at the crux of the issue -- with this change, there's no way for us to write custom starlark transitions that interplay safely with the native transitions.
A couple questions -
Does the bug title still accurately represent what we're talking about here? If not, can it be updated?
before this change, it was possible for a starlark transition to force a target into a canonical configuration, so when it appeared in the configured target graph multiple times, it would always be a single target.
Which change are you talking about specifically? Can you give a minimal example of the multiple configured targets that used to all have the same configuration and now don't?
Historically it's never been possible for starlark transitions to put a configured target into any configuration that a native transition could but it in. Historically, transition directory name fragment has always been updated by a starlark transition and never updated by a native transition. Recently I made a change so that starlark transitions that don't actually change anything (e.g. set --cpu=k8 when --cpu already =k8) don't update transition directory name fragment or affected by starlark transition. So in the new world we are strictly closer to the concept of starlark transitions being able to produce configurations that are the same as native transitions.
Sorry, I didn't mean put in the same configuration as a native transition. I meant it was possible to force a target with a star lark transition into a single (given a reasonable universe of incoming configurations) output configuration. Eg something that looked like a constant transition (same output hash given some restricted set of inputs) would lead to the same output configuration. This is no longer true because now only changed settings get taken into consideration when computing the directory fragment, meaning a constant transition can still result in multiple fragments, depending on the input configurations. @amberdixon To correct me if I'm wrong
This is no longer true because now only changed settings get taken into consideration when computing the directory fragment, meaning a constant transition can still result in multiple fragments, depending on the input configurations. @amberdixon To correct me if I'm wrong
Thank you, that's clarifying. Is there a scenario in a single build where this is the case? For example:
either you get a consistent value for the native option (and an inconsistent affected by starlark transition), or vice-versa.
it would be helpful to have a repro and command line that shows this happening.
Hey all, I think I've finally wrangled a small repro of the situation you're discussing and a fix is on the way. Stay tuned.
I spent some time looking at this issue in the context of rules_ios and it looks like there are a few cases that result in "different transitions on the CLI".
First different transitions were applied differently when swift_library was built with ios_application vs rules_ios's apple_framework from either the CLI or in Bazel. This was fixed in the following PRs by auditing transitions / configs and fixing the cases where they were different:
There is a test case in this PR https://github.com/bazel-ios/rules_ios/pull/196
Building a dependency of an ios_applicaiton will net a different build when the dep is built directly via CLI or with another top level target type. This is because rules_apple and rules_ios apply a iOS specific transition. A workaround is to build the target in question along side a rules_apple.ios_unit_test, rules_apple.ios_applicaiton, and now a rules_ios.apple_framework. IMO it'd be cool if we could apply iOS transitions to the CLI build in the same way they were applied with the ios_application rule.
Most helpful comment
Hey all, I think I've finally wrangled a small repro of the situation you're discussing and a fix is on the way. Stay tuned.