I'm setting some environment variables via --action_env, but they are sometimes ignored when building a library.
In this specific case I'm trying to build TensorFlow 2.3.0 with Bazel 3.4.1 and setting CPATH via --action_env. The compilation fails to find some headers, and inspecting the output shows that CPATH is not passed to all compiler invocations.
To reproduce: build TensorFlow according to the instructions with --action_env=CPATH=/foo and inspect the output.
I observed the issue at tensorflow/python/tools/BUILD:226:10 (C++ compilation of rule '//tensorflow/core/platform/default:mutex'). That rule is defined at https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/core/platform/default/BUILD#L206:
cc_library(
    name = "mutex",
    srcs = [
        "mutex.cc",
        "mutex_data.h",
    ],
    hdrs = ["//tensorflow/core/platform:mutex.h"],
    tags = [
        "manual",
        "no_oss",
        "nobuilder",
    ],
    textual_hdrs = ["mutex.h"],
    deps = [
        "//tensorflow/core/platform",
        "//tensorflow/core/platform:macros",
        "//tensorflow/core/platform:thread_annotations",
        "//tensorflow/core/platform:types",
        "@nsync//:nsync_cpp",
    ],
)
Log output:
(cd /tmp/easybuild-tmp/eb-tHWBp_/tmphugHLy-bazel-build/execroot/org_tensorflow && \
exec env - \
LD_LIBRARY_PATH=<redacted> \
PATH=<redacted> \
PWD=/proc/self/cwd \
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/platform/default/_objs/mutex/mutex.d '-frandom-seed=bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/platform/default/_objs/mutex/mutex.o' -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/ppc-opt-exec-50AE0418/bin -iquote external/com_google_absl -iquote bazel-out/ppc-opt-exec-50AE0418/bin/external/com_google_absl -iquote external/nsync -iquote bazel-out/ppc-opt-exec-50AE0418/bin/external/nsync -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -g0 -g0 '-std=c++14' -c tensorflow/core/platform/default/mutex.cc -o bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/platform/default/_objs/mutex/mutex.o)
Execution platform: @local_execution_config_platform//:platform
As you can see, only the default environment variables are set and none of my action_env variables are (I set more than just CPATH).
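To check this systematically rather than by eye, the environment of each printed command can be extracted from the --subcommands output. Below is a minimal Python sketch, assuming the command is printed in the "exec env - \" form shown above with one NAME=value per line. The helpers passed_env_vars and missing_action_env are hypothetical and not part of Bazel, and the sample log is a shortened version of the one above.

```python
import re

# Matches one "NAME=value \" line of an "exec env - \" block
# (format assumed from the --subcommands log above).
_ENV_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=")


def passed_env_vars(command_block: str) -> set[str]:
    """Return the names of the variables set after 'exec env -'."""
    names = set()
    in_env = False
    for line in command_block.splitlines():
        stripped = line.strip()
        if stripped.startswith("exec env -"):
            in_env = True
            continue
        if in_env:
            match = _ENV_LINE.match(stripped)
            if match is None:
                break  # first non-assignment line is the compiler invocation
            names.add(match.group(1))
    return names


def missing_action_env(command_block: str, expected: list[str]) -> list[str]:
    """Return the expected --action_env names that the command did not receive."""
    passed = passed_env_vars(command_block)
    return [name for name in expected if name not in passed]


# Shortened copy of the failing mutex.cc command from the log above.
LOG = r"""(cd /tmp/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=<redacted> \
    PATH=<redacted> \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -c tensorflow/core/platform/default/mutex.cc)"""

print(missing_action_env(LOG, ["CPATH", "PATH"]))  # ['CPATH']
```

Running this over every command in the log flags exactly the invocations that lost the action_env variables.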
Since this cc_library invocation is as simple as it gets, and I see the same behavior with other such invocations (e.g. https://github.com/tensorflow/tensorflow/issues/37861#issuecomment-686418236), I suspect either a bug in Bazel or a misunderstanding of the rules by the TF team or by me. In the latter case I'd kindly ask for clarification.
I've also verified that with TF 2.1.0 and Bazel 0.29.1 this was not an issue: it worked as expected for the same file (the rule invocation, at least, is unchanged).
However, for other compilations it does work. For example, for SUBCOMMAND: # @com_google_absl//absl/strings:strings [action 'Compiling external/com_google_absl/absl/strings/internal/memutil.cc', configuration: cab848d308c51b34c791977a9ba0d73e541cabe37f3e6da7b163b34d8bf29b6b, execution platform: @local_execution_config_platform//:platform] I see all my action_env variables being passed, and that target is also defined with cc_library: https://github.com/abseil/abseil-cpp/blob/df3ea785d8c30a9503321a3d35ee7d35808f190d/absl/strings/BUILD.bazel#L30
What could be the reason for that?
Potentially related to https://github.com/bazelbuild/bazel/issues/12049
Examining the output, I've noticed that the SUBCOMMAND: # <name> [action 'Compiling <cc>', configuration: <hash>] lines show differing values for "configuration". For example, with "5145bac2d59ecd1d49c139f258a492145fb9a97392eff5f5e72d0b94901ca2a6" all action_env values are present, but with "10b8e9fd33cf42203bb848e842763cc669d976ce3de802a9b8c5cf443259dc2c" only the minimal environment is used.
Does this help? What is this "configuration" and how is it influenced?
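To see which targets end up in which configuration, the SUBCOMMAND header lines can be grouped by their configuration hash. A minimal Python sketch, assuming the header format shown in the -s output in this thread; actions_by_configuration is a hypothetical helper, and the hash is treated as an opaque identifier (the sketch says nothing about how Bazel computes it). The sample lines are the two headers quoted in this issue.

```python
import re
from collections import defaultdict

# Header format assumed from the --subcommands output quoted in this issue.
_HEADER = re.compile(
    r"SUBCOMMAND: # (?P<target>\S+) "
    r"\[action '(?P<action>.*?)', configuration: (?P<config>[0-9a-f]+)"
)


def actions_by_configuration(log_text: str) -> dict[str, list[str]]:
    """Map each configuration hash to the targets whose actions used it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for match in _HEADER.finditer(log_text):
        groups[match.group("config")].append(match.group("target"))
    return dict(groups)


LOG = (
    "SUBCOMMAND: # @com_google_absl//absl/strings:strings [action 'Compiling "
    "external/com_google_absl/absl/strings/internal/memutil.cc', configuration: "
    "cab848d308c51b34c791977a9ba0d73e541cabe37f3e6da7b163b34d8bf29b6b, "
    "execution platform: @local_execution_config_platform//:platform]\n"
    "SUBCOMMAND: # //tensorflow/core/framework:summary_proto_genproto [action "
    "'ProtoCompile tensorflow/core/framework/summary.pb.cc', configuration: "
    "8b4acdbe969a46c4d3fa5723b9b1930adc1b1ea5a72f6ca24e87d24e41e941cc, "
    "execution platform: @local_execution_config_platform//:platform]\n"
)

for config, targets in actions_by_configuration(LOG).items():
    print(config[:12], targets)
```

Listing the targets per hash makes it easier to spot which group of targets consistently lacks the action_env variables.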
This is likely #4008.
I'm not sure it is the same, because in my case it is not a genrule that fails but a cc_library. But let's see: when building TF with only one process, it fails on '//tensorflow/cc:cc_op_gen_main', which is defined as:
cc_library(
    name = "cc_op_gen_main",
    srcs = [
        "framework/cc_op_gen.cc",
        "framework/cc_op_gen.h",
        "framework/cc_op_gen_main.cc",
    ],
    ...
)
That is only used by tf_gen_op_wrapper_cc:
def tf_gen_op_wrapper_cc(
        name,
        out_ops_file,
        pkg = "",
        op_gen = clean_dep("//tensorflow/cc:cc_op_gen_main"),
        deps = None,
        include_internal_ops = 0,
        # ApiDefs will be loaded in the order specified in this list.
        api_def_srcs = []):
    # Construct an op generator binary for these ops.
    tool = out_ops_file + "_gen_cc"
    if deps == None:
        deps = [pkg + ":" + name + "_op_lib"]
    tf_cc_binary(
        name = tool,
        copts = tf_copts(),
        linkopts = if_not_windows(["-lm", "-Wl,-ldl"]) + lrt_if_needed(),
        linkstatic = 1,  # Faster to link this one-time-use binary dynamically
        deps = [op_gen] + deps,
    )
    ...
So there it appears in deps, not in tools. I'm not sure how to go further up the call chain. Can I make Bazel print the chain of rules it is currently following?
I think you can use -s?
That is --subcommands, yes. I'm already using it; that is what led me to see where the action_env is missing. Now I would like to know the rule invocation chain, to see where it might be coming from and why the action_env is missing there.
I noticed that for SUBCOMMAND: # //tensorflow/core/framework:summary_proto_genproto [action 'ProtoCompile tensorflow/core/framework/summary.pb.cc', configuration: 8b4acdbe969a46c4d3fa5723b9b1930adc1b1ea5a72f6ca24e87d24e41e941cc, execution platform: @local_execution_config_platform//:platform] the environment is also missing. According to #4008 you could add use_default_shell_env = True, but that is already set. So maybe it is not that issue?
Edit: My best lead so far is the change in "configuration" I mentioned in https://github.com/bazelbuild/bazel/issues/12059#issuecomment-689647000
Dependencies like llvm-project are compiled with the action_env variables under the working configuration, while the failing ones use a different configuration hash. What is that configuration? When and how does it change? How can I find it in the TF source?
OK, it looks like you were on the right track with #4008. I bisected the TF sources between 2.2 and 2.3 to find the commit that introduced the breaking change, and found it to be this one: https://github.com/tensorflow/tensorflow/commit/f827c023906e7d30f0e5f2992b111ab34153310a Its parent commit 3006330ea0c3e88651195ac7c7de654291377ebb works. However, in this case the attribute involved is not "tools" but "exec_tools"; with "tools" it worked.
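The bisection described above is a binary search over the commit range, which is what git bisect automates. A minimal Python sketch, assuming a linear history ordered oldest to newest, a known-good first commit, a known-bad last commit, and a breakage that persists once introduced. The predicate builds_ok is hypothetical and stands in for "build TF with --action_env and check that the variables reach every compiler invocation"; the commit names are placeholders, not real TF hashes.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], builds_ok: Callable[[str], bool]) -> str:
    """Binary-search for the first commit where builds_ok returns False.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if builds_ok(commits[mid]):
            good = mid  # breakage was introduced after mid
        else:
            bad = mid   # breakage was introduced at or before mid
    return commits[bad]


# Placeholder history: c0..c2 build correctly, c3 introduces the regression.
history = ["c0", "c1", "c2", "c3", "c4", "c5"]
print(first_bad_commit(history, lambda c: c in {"c0", "c1", "c2"}))  # c3
```

Each step halves the remaining range, so a range of N commits needs only about log2(N) full TF builds, which matters when each build takes hours.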
It would be awesome if there were a quick fix so that action_env variables are passed through to dependencies built via an exec_tools dependency.
Is this still something we need to work on?
Yes. As you noticed in https://github.com/tensorflow/tensorflow/pull/43156#issuecomment-704367191, it isn't clear why and when tools or exec_tools should be used. If exec_tools is correct, then there needs to be some way to pass environment variables through to it; otherwise it will be impossible to build e.g. TF on some systems.
It's possible we no longer need exec_tools; it was an intermediate step in removing Py2 stuff. We can start making PRs to remove it and see what breaks or works.