IF
--incompatible_override_toolchain_transition
is in effect;bazel cquery deps(Foo)
THEN
WARNING: Targets were missing from graph` is generated.The small repro with Python toolchain is published here
We get exactly the same problem with C++ toolchain as well.
Windows, Linux, OSX.
bazel info release
?3.4.1
This means that something referred to that target, but it hadn't actually been analyzed as part of the build (in this case, '//:tab_python_runtime_pairis discovered as the
toolchainattribute of
//:tab_python_toolchain, but since
toolchain` is a no-dependency label, there was no call to actually analyze it).
What is your preference? Do not show the message? Make it clearer what is happening?
Adding @juliexxia as a cquery expert while we determine the right course of action.
Frankly I still don't quite understand what causes the warning. And the fact that the warning shows up only when --incompatible_override_toolchain_transition
is in effect makes it even more confusing.
In this case warning does not look actionable, like - what we supposed to fix to make the warning go away?
Looks like, without --incompatible_override_toolchain_transition
, cquery sees the expected dependency on the toolchain //:tab_python_runtime_pair
(in the host configuration), which in turn depends on @python_linux//:tab_python_runtime
(also in the host configuration).
With --incompatible_override_toolchain_transition
, the toolchain is still //:tab_python_runtime_pair
, but in the target configuration, and there is no dependency to @python_linux
, because //:tab_python_runtime_pair
isn't being fully analyzed.
If I try to build //:main
, I get an error:
ERROR: /usr/local/google/home/jcater/repos/cquery_targets_missing_repro/BUILD.bazel:16:10: Couldn't build runfiles for //:main: //:main: missing input file 'external/python_linux/python', owner: '@python_linux//:python'
This may be due to me having to complete the BUILD files for linux (your repro only has Windows paths). Based on the Windows config, I assumed that the correct URL for linux would be https://www.python.org/ftp/python/3.8.5/Python-3.8.5.tgz
, is this correct?
It is not the correct path and I don't see the correct one there. Need 30 min to make Linux compatible repro.
Thanks!
re: this specific warning in the query environment.
It's difficult to be totally sure without running the full repro, but the error is coming from here: https://github.com/bazelbuild/bazel/blob/487d0b25213e723339f4f67143f5554e67ffcb3e/src/main/java/com/google/devtools/build/lib/query2/PostAnalysisQueryEnvironment.java#L260 which seems to indicate that there's a mismatch between what targets the query engine expects to be in the graph and what targets are actually there (which we already knew).
We ran into a similar issue a while back with mysteriously not-there garden variety (not-toolchain) nodes in the cquery graph which may be related? https://github.com/bazelbuild/bazel/blob/487d0b25213e723339f4f67143f5554e67ffcb3e/src/main/java/com/google/devtools/build/lib/query2/PostAnalysisQueryEnvironment.java#L424
Basically, I think that while trying to get all dependencies of a particular target in the graph, if any calculated dependency does not fall into any cases in this if/else, this warning will print. Not totally sure without a repro though.
Repro updated.
Command to run is bazel cquery "deps(main)"
I just added almost empty folder named pseudo_python and pointed new_local_repository
to it. We don't need to execute any actual Python to see the repro.
Thanks!
Okay, I see the same behavior: with the flag, //:tab_python_runtime_pair
is in the target config (as expected), and there's no dependency on @python_linux
. Without the flag, //:tab_python_runtime_pair
is in the host config (also as expected), and there is a dependency on @python_linux
. That's unexpected, and I'll debug a bit to try and see why.
One question I had: why are you using the flag --incompatible_override_toolchain_transition
? No toolchains have actually migrated yet, and it's only intended for compatibility testing, not for users. It's possible that part of this failure is because the python rules aren't ready for this yet.
We are not using --incompatible_override_toolchain_transition
directly, but we are using --all_incompatible_changes
end except for that warning everything else works just fine.
I am not saying this warning is a blocker by any means, more like an indication of the possible bug which better be discovered early than later.
It's definitely surprising, I'm going to keep investigating.
Okay, added some test code to PostAnalysisQueryEnvironment to dig into skyframe and report configured targets with the same label but which aren't the same, and here we are:
WARNING: Targets were missing from graph: [ConfiguredTargetKey{label=//:tab_python_runtime_pair, config=BuildConfigurationValue.Key[285b04bb4645367b5ec24e973f3b8f58c7ba97955812264d6468841139055aed]}]
WARNING: Found similar configured target keys: [ConfiguredTargetKeyWithToolchainContext{label=//:tab_python_runtime_pair, config=BuildConfigurationValue.Key[285b04bb4645367b5ec24e973f3b8f58c7ba97955812264d6468841139055aed], toolchainContextKey=ToolchainContextKey{configurationKey=BuildConfigurationValue.Key[285b04bb4645367b5ec24e973f3b8f58c7ba97955812264d6468841139055aed], requiredToolchainTypeLabels=[@bazel_tools//tools/python:toolchain_type, @bazel_tools//tools/cpp:toolchain_type], execConstraintLabels=[], shouldSanityCheckConfiguration=false}}]
So, the issue is that, when toolchain transitions are in use, toolchains actually have ConfiguredTargetKeyWithToolchainContext
(with an added toolchain context key), not a ConfiguredTargetKey
(without). Unfortunately, the cquery infrastructure doesn't use the right key types, so it's not found.
Coincidentally, @juliexxia was working on this just yesterday, so it should be fixed soon.
Thanks, @konste, for bringing this to my attention.
As far as I understand the fix is released with the version 3.7.0. Is that right?
I ask because we have switched to 3.7.0 and I still see this issue repro, although on a smaller scale than before.
If it is indeed supposed to be fixed in 3.7.0. then I will work on a repro.
31b0453b84eb2131ff075b9022cb2d6c95e1c71a is in 3.7.0. If you're still seeing strange results, and if it's not a problem to come up with a repro, please do and we'll take a look some more.
I have tried my original repro here and it still reproduces on both Windows and Linux. Try the command bazel cquery "deps(main)"
.
@katre This issue needs to be re-activated. Hopefully, it requires just a little amendment to your existing fix and it would not take much of your time.
@katre could we please get some traction on this issue so the fix can make it to 4.0? It produces confusing WARNING for our users which we cannot suppress and we get support tickets for that.
I'm sorry this has been sitting for so long. I have been snowed under by other projects and this has slipped through.
Unfortunately, since the cut for 4.0 is happening shortly and I am busy with BazelCon activities, I don't know if I'll be able to debug this further and identify a fix before the cut.
@katre Thank you for getting back to me! I realized that meeting 4.0 cut time is tough and retract my ask for it. Still, it would be great to have this issue fixed this year. We try to stay at the edge and upgrade to every monthly Bazel release, often including RCs when they contain the fixes we need, so missing 4.0 cut off is not that important as we hopefully can get the fix only about a month later.
Okay, I've debugged this further and I can see the problem, but not the solution.
Here's the data flow:
bazel cquery deps(main)
corresponds to a call to DepsFunction.eval
with a ConfiguredTargetQueryEnvironment
to handle the actual processing.DepsFunction.eval
is a generic type T
, which for ConfiguredTargetQueryEnvironment
is a ConfiguredTarget
object.ConfiguredTarget
objects do not know their own ConfiguredTargetKey
. This is for both memory reasons (to keep as little data in memory as possible) and correctness reasons (in theory, multiple ConfiguredTargetKey
instances could resolve to the same ConfiguredTarget
).DepsFunction.eval
calls into PostAnalysisQueryEnvironment.getFwdDeps
. This takes a ConfiguredTarget
, and finds all of the SkyKey
instances that are dependencies of it.ConfiguredTarget
for //:main
, and finds the dependency keys. This includes many types, but the one of interest is for //:tab_python_runtime_pair
, and is actually ConfiguredTargetKeyWithToolchainContext{label=//:tab_python_runtime_pair, config=BuildConfigurationValue.Key[cfc863e38d78b2e0698e051b076d50722ce20dda390189a09d4b6038e48fa3a6], toolchainContextKey=ToolchainContextKey{configurationKey=BuildConfigurationValue.Key[cfc863e38d78b2e0698e051b076d50722ce20dda390189a09d4b6038e48fa3a6], requiredToolchainTypeLabels=[@bazel_tools//tools/python:toolchain_type, @bazel_tools//tools/cpp:toolchain_type], execConstraintLabels=[], shouldSanityCheckConfiguration=false}}
.getFwdDeps
then processes the keys it discovered, locating the ConfiguredTarget
values for each, and storing them as PostAnalysisQueryEnvironment.ClassifiedDependency
instances. Importantly, at this point the SkyKey
that was used to find the ConfiguredTarget
is lost (in PostAnalysisQueryEnvironment.targetifyValues
.PostAnalysisQueryEnvironment.ClassifiedDependency
object, it would still be lost, because getFwdDeps
has to return a Set<ConfiguredTarget>
back to DepsFunction
. This is because DepsFunction
only knows the type the ConfiguredTargetQueryEnvironment
is parameterized with, which is ConfiguredTarget
.DepsFunction.eval
, the ConfiguredTarget
for //:tab_python_runtime_pair
is checked, but when ConfiguredTargetQueryEnvironment.getSkyKey
is called, it returns a plain ConfiguredTargetKey
, not a ConfiguredTargetKeyWithToolchainContext
.ConfiguredTarget
can be found to match, this giving the observed message.Okay, that's a lot. The root cause here is losing the correct SkyKey
instance between rounds of DepsFunction.eval
. There are a few possible solutions:
ConfiguredTarget
does not know its own ConfiguredTargetKey
. This has the drawbacks mentioned above.PostAnalysisQueryEnvironment.getFwdDeps
to do a second search: if there is no target for a given key, do a looser search for any ConfiguredTarget
with the same label and config key, but ignore the ToolchainContext
.//:tab_python_runtime_pair
? It would then be present twice, with different keys, and the wrong one might be selected.ConfiguredTargetQueryEnvironment
to use a different type, which retains the key. Using PostAnalysisQueryEnvironment.ClassifiedDependency
might work, if it was extended to also store the key. In this way, we can be sure that successive rounds of DepsFunction.eval
use the correct SkyKey
for each target.One important note: this analysis is focused on the proper handling of deps
, but this could conceivably happen in any cquery
function, so we need to be sure to handle it in a generic fashion for all of them.
@juliexxia, please take a look at my analysis and let me know what you think.
Thanks for this thorough investigation @katre.
- Change the invariant that a ConfiguredTarget does not know its own ConfiguredTargetKey. This has the drawbacks mentioned above.
I think this would have a lot of side effects outside of this issue (the ones you mentioned) and is probably a non-starter.
- Update PostAnalysisQueryEnvironment.getFwdDeps to do a second search: if there is no target for a given key, do a looser search for any ConfiguredTarget with the same label and config key, but ignore the ToolchainContext.
This could conceivably work, but what if something in the build did depend directly on //:tab_python_runtime_pair? It would then be present twice, with different keys, and the wrong one might be selected.- Change ConfiguredTargetQueryEnvironment to use a different type, which retains the key. Using PostAnalysisQueryEnvironment.ClassifiedDependency might work, if it was extended to also store the key. In this way, we can be sure that successive rounds of DepsFunction.eval use the correct SkyKey for each target.
My intuition is that (3) is on the principle the right choice here. Though (2) maybe not actually turn up a huge problem today, it feels more like a bandaid that could lead to more issues down the road - and a tricky bandaid to get right at that. Though ConfiguredTargetKey(WithToolchainContext)
s are an implementation detail that the user really doesn't need to know about, with the divergence between ConfiguredTargetKey
and ConfiguredTargetKeyWithToolchainContext
I think in reality they are the granularity at which cquery should operate.
@katre do you have a sense on how invasive of a change this might be? I'm not positive it has to be too too complicated if we're replacing cquery's ConfiguredTarget
parameterization with some other object that is like a ConfiguredTarget
+ ConfiguredTargetKey
and the existing logic can be mostly kept the same with just one more layer of abstraction?
https://github.com/bazelbuild/bazel/pull/12508 is the still-in-progress PR. It manages the main issues, but there's something wrong with the alias handling that's causing test failures. Also needs a lot of documentation and polish. Currently updates about 25 files, but that will change as I do some cleanup work and fix some APIs.
Since this seems like the right approach I'll keep pushing but it's not a small or easy change.
Okay, I have a clean-looking fix, but it requires a decent bit of cleanup. I'll send out the stages for review over the next week or so (given the break for US holidays this week).
Super! Thanks @katre for a kind of unexpected ton of work! But as far as I understand this fix prevents much more serious problems in the future than the cosmetic issue I initially reported.
Which version of Bazel shall we expect to see the fix in?
Bazel 4.1 would be the release to contain this. I'm not sure it's something we can cherrypick into 4.0, unfortunately.
Most helpful comment
Okay, I have a clean-looking fix, but it requires a decent bit of cleanup. I'll send out the stages for review over the next week or so (given the break for US holidays this week).