Since Bazel 3.4, rule_implementation_hash in bazel query's proto output doesn't always change for rule implementation changes
-- ./WORKSPACE --
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "io_bazel_rules_go",
sha256 = "08c3cd71857d58af3cda759112437d9e63339ac9c6e0042add43f4d94caf632d",
urls = [
"https://mirror.bazel.build/github.com/bazelbuild/rules_go/releases/download/v0.24.2/rules_go-v0.24.2.tar.gz",
"https://github.com/bazelbuild/rules_go/releases/download/v0.24.2/rules_go-v0.24.2.tar.gz",
],
)
load("@io_bazel_rules_go//go:deps.bzl", "go_register_toolchains", "go_rules_dependencies")
go_rules_dependencies()
go_register_toolchains()
-- ./rules/heatpipe.bzl --
load("@io_bazel_rules_go//go:def.bzl", "go_rule")
def _heatpipe_impl(ctx):
ctx.actions.write(ctx.outputs.types_out, "types1")
ctx.actions.write(ctx.outputs.fx_out, "fx1")
return [
DefaultInfo(files = depset([ctx.outputs.types_out, ctx.outputs.fx_out])),
]
_heatpipe_rule = go_rule(
_heatpipe_impl,
attrs = {
"topic": attr.string(
doc = "The topic for the schema",
mandatory = True,
),
"types_out": attr.output(
doc = "The __types.go file to be generated",
mandatory = True,
),
"fx_out": attr.output(
doc = "The __fx.go file to be generated",
mandatory = True,
),
},
)
def heatpipe(name, topic, types_out = "types.go", fx_out = "fx.go"):
_heatpipe_rule(
name = name,
topic = topic,
types_out = types_out,
fx_out = fx_out,
)
-- ./user_created/BUILD.bazel --
load("//rules:heatpipe.bzl", "heatpipe")
heatpipe(
name = "heatpipe",
topic = "hp-yodlee-integration-user_created",
)
-- ./rules/BUILD.bazel --
$ bazel query //user_created:heatpipe --output=proto | protoc --decode_raw | grep rule_implementation_hash -A2
Loading: 0 packages loaded
Loading: 1 packages loaded
Loading: 1 packages loaded
1: "$rule_implementation_hash"
2: 2
5: "45b15fdaa24bca7551749305bd313cb60bd329deabd30dc53d653fc3145d5d1d"
$ sed -i '' "s/types1/types2/" rules/heatpipe.bzl
$ bazel query //user_created:heatpipe --output=proto | protoc --decode_raw | grep rule_implementation_hash -A2
Loading: 0 packages loaded
Loading: 1 packages loaded
Loading: 1 packages loaded
1: "$rule_implementation_hash"
2: 2
5: "45b15fdaa24bca7551749305bd313cb60bd329deabd30dc53d653fc3145d5d1d"
macOS
bazel info release?release 3.5.0-homebrew
@alexjski 9f2cab5d84678e492d8f5848525633f1c73f0994 is suspect as the rule hash is now the transitive hash of the defining module not necessarily the top-level loading one.
@linzhp Thank you very much for filing this! @michajlo saw this Github entry and suspected the underlying bug was the explanation for some very mysterious nondeterminism we've been recently seeing internally at Google with a system that relies on caches of the rule_implementation_hash thing. After staring at your repro (ty for that!) and 9f2cab5, I'm pretty sure this issue is relevant.
@benjaminp Thank you very much for your prompt and spot-on insights, as always. You are completely correct about the culprit commit. The bug here is pretty blatant.
Here is the setup of my more minimal repro:
$ find hash -type f | while read P; do echo --- $P ---; cat $P; done
--- hash/a/a.bzl ---
load("//hash/b:b.bzl", "make_a")
def f():
return "old"
a = make_a(f)
--- hash/a/BUILD ---
--- hash/b/b.bzl ---
def make_a(impl):
return rule(
implementation = impl
)
--- hash/b/BUILD ---
--- hash/BUILD ---
load("//hash/a:a.bzl", "a")
a(name = "t")
I've also edited the issue title to more precisely describe the bug.
Github isn't letting me assign to @alexjski (Github "org" membership issue?) but, rest assured, Alex (or someone else) will work on this with very high priority tomorrow or Monday.
@alandonovan , for awareness
The rule definition environment (RDE---a concept that desperately needs an explanatory doc comment somewhere, not least because the term is overloaded as the name of an unrelated Java class) is based on the hash of the source of the .bzl file containing the immediate call to rule(), plus all bzl files it transitively loads. It is not based on the .bzl file that is being initialized. In other words, it's the innermost frame on the call stack, not the outermost. This was discussed at length in the context of cr/311173293 aka 9f2cab5d84678e492d8f5848525633f1c73f0994 (see https://github.com/bazelbuild/bazel/blob/460ab684b4203cf87663038e67db352775adfdde/src/main/java/com/google/devtools/build/lib/analysis/starlark/StarlarkRuleClassFunctions.java#L352) and the sequence of changes of which it was a part; Alex Jurkowski @alexjski has the freshest knowledge here.
BTW, Nathan: head dir/* is your friend for listing files preceded by their file name. (Add -n 999 if any file is long.)
The rule definition environment (RDE---a concept that desperately needs an explanatory doc comment somewhere, not least because the term is overloaded as the name of an unrelated Java class) is based on the hash of the source of the .bzl file containing the immediate call to
rule(), plus all bzl files it transitively loads. It is not based on the .bzl file that is being initialized. In other words, it's the innermost frame on the call stack, not the outermost. This was discussed at length in the context of cr/311173293 aka 9f2cab5 (seehttps://github.com/bazelbuild/bazel/blob/460ab684b4203cf87663038e67db352775adfdde/src/main/java/com/google/devtools/build/lib/analysis/starlark/StarlarkRuleClassFunctions.java#L352
) and the sequence of changes of which it was a part; Alex Jurkowski @alexjski has the freshest knowledge here.
What is the use for that definition? In the most extreme thought experiment, every rule in the universe could have the same hash if they all were defined through the make_a in haxorz 's example.
I suggest the innermost module at rule export time as a more intuitive and鈥攊n the context of this bug, anyway鈥攃orrect definition. (I suppose that's the same as the old method because module globals are immutable after creation.)
BTW, Nathan:
head dir/*is your friend for listing files preceded by their file name. (Add-n 999if any file is long.)
head -v for the filenames.
+1 to what @benjaminp said. I was going to say the same thing (and give a more extreme version of the thought experiment where make_a uses **kwargs). The current/new meaning of "rule definition environment" is not useful. The old meaning was more useful. Let's go back to something as useful as the old meaning.
Alan, chat me internally if you want to see some real examples of code like my make_a.
[@alandonovan] This was discussed at length in the context of cr/311173293 aka 9f2cab5 and the sequence of changes of which it was a part
I admittedly wasn't part of that discussion, nor did I read through it last night before posting here. Do you happen to remember if the situation in this Github issue's title was considered during the design discussion? If not, why not? And if so, why was the consequence considered WAI and not-bad?
[@alandonovan] BTW, Nathan: head dir/* is your friend for listing files preceded by their file name. (Add -n 999 if any file is long.)
Awesome, thanks!
What is the use for that definition?
Good question; I'm struggling to page back into memory our lengthy conversations about this, which unfortunately left no residue in the code. A superficial observation: the offending CL at least made the treatment of the label portion consistent with the digest portion. Before, the label came from the innermost .bzl file and the digest from the outermost .bzl file on the call stack.
At least a part of the conversation was about the infeasibility of deeply hashing Starlark function values, and the assumption that Starlark will ~soon support nested def statements with lexical scope, aka closures, which means that a rule class created by a function in Q.bzl may be a closure over values supplied by a function in P.bzl, where P loads Q. This seems to argue for the RDE of a RuleClass being a property of the label + digest of the outermost frame, which is the opposite of what the change did. Alex, do you remember more?
every rule in the universe could have the same hash if they all were defined through the make_a in haxorz 's example.
If the purpose of the hash is to detect changes in the logic of a rule, then that's not necessarily wrong, as all the rules will have the same load graph of bzl sources.
@benjaminp, you are absolutely right that my change (9f2cab5) caused the issue described -- that's an excellent find. @linzhp, thank you for filing that issue and for the repro.
We have a fix for the bug already (b9bb10252a004047f9a997165aa1ab7d6c6fa2e7). We will cherry-pick that into Bazel 3.5.1 release.
Personally, I would like to apologize to everyone involved for having caused that issue. I did not realize this edge case when working on that change, which I thought was a pure cleanup effort. I am sorry to everyone who has been affected by failures related to this and spent time diagnosing/recovering from problems which were caused by it.
Thanks for getting this resolved so quickly
Alex, the fault was mine: you worked on the clean-up at my behest, and the suggestion to make the behavioral change along with the refactoring was mine based on a misunderstanding of the inconsistency between the label and digest portions. Thanks again for making the change, and for unmaking it when the problem emerged.
Am I wrong or did the fix not make it into 3.5.1 after all?
@rohansingh what makes you think so? I tried the opening example with bazelisk and 3.5.0 and could repro. After switching to 3.5.1 the hash changed.
I was just going by the fix commit that @alexjski mentioned above, b9bb102. That commit doesn't seem to be in 3.5.1 according to GitHub.
Ah, ok - see the list of commits here: https://github.com/bazelbuild/bazel/commits/3.5.1
It shows https://github.com/bazelbuild/bazel/commit/f1f941194ce04b96328154241fcfad0392b5cb38 which is the same as the one that you reference - it has a different hash since it was a cherrypick. You can see the 3.5.1 tag there.
Ah, my mistake. Thanks!
This is still an issue in 3.6. Did you forget to include the fix?
@linzhp I fear you are correct, that commit is missing from 3.6.0. cc @laurentlb
Verified the issue is fixed in 3.7
Most helpful comment
@linzhp Thank you very much for filing this! @michajlo saw this Github entry and suspected the underlying bug was the explanation for some very mysterious nondeterminism we've been recently seeing internally at Google with a system that relies on caches of the
rule_implementation_hashthing. After staring at your repro (ty for that!) and 9f2cab5, I'm pretty sure this issue is relevant.@benjaminp Thank you very much for your prompt and spot-on insights, as always. You are completely correct about the culprit commit. The bug here is pretty blatant.
Here is the setup of my more minimal repro:
I've also edited the issue title to more precisely describe the bug.