I have a problem using Bazel to build rules_scala on a custom platform. I believe it is because the shell environment variables are not propagated correctly to a subprocess spawned by some Bazel actions. Please see the log in the last section.
Operating System:
CentOS 6.7
Bazel version (output of bazel info release):
0.8.1
The command I ran to build rules_scala is bazel build -s --verbose_failures --sandbox_debug //src/...
>>>>> # //src/scala/io/bazel/rules_scala/tut_support:tut_compiler [action 'scala //src/scala/io/bazel/rules_scala/tut_support:tut_compiler']
(cd /home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
exec env - \
/bin/bash -c '
rm -f bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar @bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar_zipper_args
')
>>>>> # @scala//:scala-reflect [action 'Extracting interface @scala//:scala-reflect [for host]']
(cd /home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
exec env - \
LD_LIBRARY_PATH=/cm/shared/gcc/6.4.0/lib64:/cm/shared/gcc/6.4.0/lib:/cm/shared/apps/mpc/1.0.3/lib:/cm/shared/apps/gcc/4.9.2/gmp-6.0/lib:/cm/shared/apps/mpfr/3.1.3/lib:/cm/shared/apps/sqlite3/3.15.0/lib:/cm/shared/apps/libevent/2.1.5-beta/lib:/cm/shared/apps/cudnn/6.0/lib64:/cm/shared/apps/java/jdk1.8.0_112/lib:/cm/shared/apps/python/3.6.0/lib:/cm/shared/apps/cuda/8.0/lib64:/cm/shared/apps/binutils/2.25/src/lib:/cm/shared/apps/slurm/current/lib/slurm:/cm/shared/apps/slurm/current/lib:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/compiler/lib/intel64:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/mkl/lib/intel64:/home-4/[email protected]/lib:/home-4/[email protected]/opt/lib \
PATH=/cm/shared/gcc/6.4.0/bin:/home-4/[email protected]/go/bin:/home-4/[email protected]/.local/bin:/home-4/[email protected]/opt/go/bin:/home-4/[email protected]/maven/bin:/home-4/[email protected]/arcanist/bin:/home-4/[email protected]/opt/bin:/cm/shared/apps/sqlite3/3.15.0/bin:/cm/shared/apps/tmux/2.1/bin:/cm/shared/apps/libevent/2.1.5-beta/bin:/cm/shared/apps/java/jdk1.8.0_112/bin:/cm/shared/apps/python/3.6.0/bin:/cm/shared/apps/cuda/8.0/bin:/cm/shared/apps/binutils/2.25/src/bin:/cm/shared/apps/binutils:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/opt/dell/srvadmin/bin \
external/bazel_tools/tools/jdk/ijar/ijar external/scala/lib/scala-reflect.jar bazel-out/host/genfiles/external/scala/_ijar/scala-reflect/external/scala/lib/scala-reflect-ijar.jar)
>>>>> # @scala//:scala-reflect [action 'Extracting interface @scala//:scala-reflect']
(cd /home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
exec env - \
LD_LIBRARY_PATH=/cm/shared/gcc/6.4.0/lib64:/cm/shared/gcc/6.4.0/lib:/cm/shared/apps/mpc/1.0.3/lib:/cm/shared/apps/gcc/4.9.2/gmp-6.0/lib:/cm/shared/apps/mpfr/3.1.3/lib:/cm/shared/apps/sqlite3/3.15.0/lib:/cm/shared/apps/libevent/2.1.5-beta/lib:/cm/shared/apps/cudnn/6.0/lib64:/cm/shared/apps/java/jdk1.8.0_112/lib:/cm/shared/apps/python/3.6.0/lib:/cm/shared/apps/cuda/8.0/lib64:/cm/shared/apps/binutils/2.25/src/lib:/cm/shared/apps/slurm/current/lib/slurm:/cm/shared/apps/slurm/current/lib:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/compiler/lib/intel64:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/mkl/lib/intel64:/home-4/[email protected]/lib:/home-4/[email protected]/opt/lib \
PATH=/cm/shared/gcc/6.4.0/bin:/home-4/[email protected]/go/bin:/home-4/[email protected]/.local/bin:/home-4/[email protected]/opt/go/bin:/home-4/[email protected]/maven/bin:/home-4/[email protected]/arcanist/bin:/home-4/[email protected]/opt/bin:/cm/shared/apps/sqlite3/3.15.0/bin:/cm/shared/apps/tmux/2.1/bin:/cm/shared/apps/libevent/2.1.5-beta/bin:/cm/shared/apps/java/jdk1.8.0_112/bin:/cm/shared/apps/python/3.6.0/bin:/cm/shared/apps/cuda/8.0/bin:/cm/shared/apps/binutils/2.25/src/bin:/cm/shared/apps/binutils:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/opt/dell/srvadmin/bin \
external/bazel_tools/tools/jdk/ijar/ijar external/scala/lib/scala-reflect.jar bazel-out/local-fastbuild/genfiles/external/scala/_ijar/scala-reflect/external/scala/lib/scala-reflect-ijar.jar)
>>>>> # @scala//:scala-compiler [action 'Extracting interface @scala//:scala-compiler']
(cd /home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
exec env - \
LD_LIBRARY_PATH=/cm/shared/gcc/6.4.0/lib64:/cm/shared/gcc/6.4.0/lib:/cm/shared/apps/mpc/1.0.3/lib:/cm/shared/apps/gcc/4.9.2/gmp-6.0/lib:/cm/shared/apps/mpfr/3.1.3/lib:/cm/shared/apps/sqlite3/3.15.0/lib:/cm/shared/apps/libevent/2.1.5-beta/lib:/cm/shared/apps/cudnn/6.0/lib64:/cm/shared/apps/java/jdk1.8.0_112/lib:/cm/shared/apps/python/3.6.0/lib:/cm/shared/apps/cuda/8.0/lib64:/cm/shared/apps/binutils/2.25/src/lib:/cm/shared/apps/slurm/current/lib/slurm:/cm/shared/apps/slurm/current/lib:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/compiler/lib/intel64:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/mkl/lib/intel64:/home-4/[email protected]/lib:/home-4/[email protected]/opt/lib \
PATH=/cm/shared/gcc/6.4.0/bin:/home-4/[email protected]/go/bin:/home-4/[email protected]/.local/bin:/home-4/[email protected]/opt/go/bin:/home-4/[email protected]/maven/bin:/home-4/[email protected]/arcanist/bin:/home-4/[email protected]/opt/bin:/cm/shared/apps/sqlite3/3.15.0/bin:/cm/shared/apps/tmux/2.1/bin:/cm/shared/apps/libevent/2.1.5-beta/bin:/cm/shared/apps/java/jdk1.8.0_112/bin:/cm/shared/apps/python/3.6.0/bin:/cm/shared/apps/cuda/8.0/bin:/cm/shared/apps/binutils/2.25/src/bin:/cm/shared/apps/binutils:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/opt/dell/srvadmin/bin \
external/bazel_tools/tools/jdk/ijar/ijar external/scala/lib/scala-compiler.jar bazel-out/local-fastbuild/genfiles/external/scala/_ijar/scala-compiler/external/scala/lib/scala-compiler-ijar.jar)
>>>>> # @io_bazel_rules_scala_org_tpolecat_tut_core//jar:jar [action 'Extracting interface @io_bazel_rules_scala_org_tpolecat_tut_core//jar:jar']
(cd /home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
exec env - \
LD_LIBRARY_PATH=/cm/shared/gcc/6.4.0/lib64:/cm/shared/gcc/6.4.0/lib:/cm/shared/apps/mpc/1.0.3/lib:/cm/shared/apps/gcc/4.9.2/gmp-6.0/lib:/cm/shared/apps/mpfr/3.1.3/lib:/cm/shared/apps/sqlite3/3.15.0/lib:/cm/shared/apps/libevent/2.1.5-beta/lib:/cm/shared/apps/cudnn/6.0/lib64:/cm/shared/apps/java/jdk1.8.0_112/lib:/cm/shared/apps/python/3.6.0/lib:/cm/shared/apps/cuda/8.0/lib64:/cm/shared/apps/binutils/2.25/src/lib:/cm/shared/apps/slurm/current/lib/slurm:/cm/shared/apps/slurm/current/lib:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/compiler/lib/intel64:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/mkl/lib/intel64:/home-4/[email protected]/lib:/home-4/[email protected]/opt/lib \
PATH=/cm/shared/gcc/6.4.0/bin:/home-4/[email protected]/go/bin:/home-4/[email protected]/.local/bin:/home-4/[email protected]/opt/go/bin:/home-4/[email protected]/maven/bin:/home-4/[email protected]/arcanist/bin:/home-4/[email protected]/opt/bin:/cm/shared/apps/sqlite3/3.15.0/bin:/cm/shared/apps/tmux/2.1/bin:/cm/shared/apps/libevent/2.1.5-beta/bin:/cm/shared/apps/java/jdk1.8.0_112/bin:/cm/shared/apps/python/3.6.0/bin:/cm/shared/apps/cuda/8.0/bin:/cm/shared/apps/binutils/2.25/src/bin:/cm/shared/apps/binutils:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/opt/dell/srvadmin/bin \
external/bazel_tools/tools/jdk/ijar/ijar external/io_bazel_rules_scala_org_tpolecat_tut_core/jar/tut-core_2.11-0.4.8.jar bazel-out/local-fastbuild/genfiles/external/io_bazel_rules_scala_org_tpolecat_tut_core/jar/_ijar/jar/external/io_bazel_rules_scala_org_tpolecat_tut_core/jar/tut-core_2.11-0.4.8-ijar.jar)
ERROR: /home-4/[email protected]/rules_scala/src/scala/scripts/BUILD:41:1: error executing shell command: '
rm -f bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar @ba...' failed (Exit 1): process-wrapper failed: error executing command
(cd /home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
exec env - \
/home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala/_bin/process-wrapper '--timeout=-1' '--kill_delay=15' /bin/bash -c '
rm -f bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar @bazel-out/local-fastbuild/bin/src/scala/scripts/bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar_zipper_args
').
/home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala/_bin/process-wrapper: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala/_bin/process-wrapper)
INFO: Elapsed time: 0.820s, Critical Path: 0.14s
From the log message, you can see the last command doesn't have PATH and LD_LIBRARY_PATH propagated but the others have. I highlighted the command that reports the error:
(cd /home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
exec env - \
/home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala/_bin/process-wrapper '--timeout=-1' '--kill_delay=15' /bin/bash -c '
rm -f bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar @bazel-out/local-fastbuild/bin/src/scala/scripts/bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar_zipper_args
')
Do you mean this command?
(cd /home-4/[email protected]/.cache/bazel/[email protected]/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
exec env - \
/bin/bash -c '
rm -f bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar @bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar_zipper_args
')
Why does it require PATH and LD_LIBRARY_PATH env vars?
Hi, I am sorry about the confusion. I have edited the original post to clarify. In fact, I would like to make two points:
1) The command I pointed out last is the one that causes the error. Apparently this is because process-wrapper is a cc_binary compiled with a customized toolchain. All of its libraries, such as libstdc++, are found via ${LD_LIBRARY_PATH}. However, when this action is created, exec env - clears the parent process's env vars, so the process-wrapper binary cannot be linked against the correct libstdc++ at run time. In fact, the command wrapped by process-wrapper could execute successfully by itself, since the java command doesn't need any external C++ libraries.
2) In general, I found that all bazel_tools implemented in C++ (https://github.com/bazelbuild/bazel/tree/9d9ac15b69530edd83c1b95f98a70efa8f98a27a/src/main/tools) need use_default_shell_env = True. This is because all of them are linked against libstdc++. If you simply do exec env -, you will probably break the runtime linkage.
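The effect of exec env - described in the two points above can be reproduced outside Bazel. The following is only a minimal sketch: the helper name show_child_env and the directory /opt/example/lib64 are made up for illustration, and it runs a plain /bin/sh child rather than process-wrapper.

```shell
#!/bin/sh
# Start a child shell with an empty environment, the way the logged
# "exec env - ..." command lines do, and print what the child sees.
# Optional arguments are NAME=value pairs to pass through explicitly.
show_child_env() {
  # "env -" starts the command with no inherited variables.
  env - "$@" /bin/sh -c 'printf "%s\n" "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"'
}

# Hypothetical toolchain directory, standing in for the custom gcc lib path.
LD_LIBRARY_PATH=/opt/example/lib64
export LD_LIBRARY_PATH

show_child_env                                      # prints LD_LIBRARY_PATH=<unset>
show_child_env "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"   # prints LD_LIBRARY_PATH=/opt/example/lib64
```

The first call mirrors the failing action (nothing forwarded); the second mirrors the ijar actions in the log, where PATH and LD_LIBRARY_PATH are listed explicitly after env -.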
@aehlig Can we propagate PATH and LD_LIBRARY_PATH for every action running with process-wrapper?
A simple question: why are some rules wrapped with process-wrapper while others aren't? Is this because of sandboxing? I don't want to scan the whole codebase, so I am asking you directly.
@meteorcloudy @aehlig I tried to add --action_env to force the bazel executor to take PATH and LD_LIBRARY_PATH yesterday but I didn't succeed. It turns out that sometimes 'action_env' works but sometimes not. I am just wondering why bazel doesn't honor the shell envs in java rules if bazel needs to build some cc utilities like 'process_wrapper'. I tested process_wrapper with cc rules and python rules. They work with it but when I am trying to build java header jars with process_wrapper, bazel doesn't honor the PATH and LD_LIBRARY_PATH envs.
I appreciate the idea of sandboxing everything into an isolated environment. But to do this successfully, one would have to either
1) move the sources of the C++ compiler and libraries fully inside the workspace, or
2) create symbolic links to those libraries or binaries in some folder inside the Bazel workspace. In that case, the env vars could be ignored.
We're not going to forward env variables by default - doing so is fundamentally incompatible with remote execution and remote caching, which are both important to us (and many of our users). There is a separate issue that --action_env is not forwarded to all actions at #3320.
@ulfjack Then why do you make cc_rules respect shell envs? I mean this behavior will break a lot of local compilation in customized environments. And at user level, there is no way to control whether envs could be passed to a specific rule so that there is no way to fix it.
Actually the problem here is compiling a java header jar, which doesn't need host envs at all. However, when the rule is wrapped by process-wrapper, the process-wrapper tool itself needs to link against the C++ library. This is a little ridiculous, since the helper breaks the work that it is helping with.
I don't really understand why passing envs would disturb remote execution. Assuming users make the remote environment identical to the local one, why is it a problem?
It's a two-step process - there's a piece of code that's responsible for configuring the C++ toolchain that's separate from the rules. The rules themselves don't respect shell envs. The code that's configuring the toolchain can be swapped out as a whole for code that does not look at the local envs. If there are specific issues with the existing code for local execution, then we can discuss that separately.
It's correct that action_env is not working for all actions right now. There's a separate bug for that.
That said, it seems pretty unusual to require LD_LIBRARY_PATH to run basic binaries.
The problem isn't the remote environment, but the local one. Any env variable that we forward to the remote machines as part of remote execution poisons the remote cache. For example, if you and your colleague have an env variable USERNAME, and that's forwarded to the remote cache, then you cannot get any cache hits from your colleague and vice versa. Even worse, if you have env variables that are more volatile (changing quickly), you can't even get cache hits from yourself.
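To make the cache-poisoning argument concrete, here is a toy model in which the cache key is a checksum over the command line plus every forwarded variable. This is a sketch only: action_key is a made-up helper and cksum stands in for whatever digest Bazel really uses, so it illustrates the principle rather than Bazel's actual key computation.

```shell
#!/bin/sh
# Toy cache key: checksum and byte count over the command line and every
# forwarded NAME=value pair. Illustrative only; not Bazel's real digest.
action_key() {
  # $1 = command line, remaining args = variables forwarded to the action
  cmd=$1
  shift
  printf '%s\n' "$cmd" "$@" | cksum
}

cmd='ijar scala-reflect.jar scala-reflect-ijar.jar'
key_alice=$(action_key "$cmd" USERNAME=alice)
key_bob=$(action_key "$cmd" USERNAME=bob)
key_clean_1=$(action_key "$cmd")
key_clean_2=$(action_key "$cmd")

if [ "$key_alice" != "$key_bob" ]; then
  echo "USERNAME forwarded: keys differ, so no shared cache hits"
fi
if [ "$key_clean_1" = "$key_clean_2" ]; then
  echo "nothing forwarded: keys match, so the result can be shared"
fi
```

The same command produces two different keys as soon as a per-user variable is part of the input, which is the situation described above.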
I think the problem is pretty serious.
When I install Bazel from source with compile.sh, I get:
ERROR: /home/mpi/tensorflow/downloads/bazel-0.5.4/src/main/protobuf/BUILD:70:1: error executing shell command: 'cp 'bazel-out/local-opt/bin/src/main/protobuf/command_server_java_grpc_srcs.jar' 'bazel-out/local-opt/bin/src/main/protobuf/command_server_java_grpc_srcs.srcjar'' failed (Exit 127): bash failed: error executing command
(cd /tmp/bazel_DUyJtkXd/out/execroot/io_bazel && \
exec env - \
/bin/bash -c 'cp '\''bazel-out/local-opt/bin/src/main/protobuf/command_server_java_grpc_srcs.jar'\'' '\''bazel-out/local-opt/bin/src/main/protobuf/command_server_java_grpc_srcs.srcjar'\''').
/bin/bash: cp: command not found
Target //src:bazel failed to build
INFO: Elapsed time: 49.533s, Critical Path: 30.28s
+ fail 'Could not build Bazel'
+ local exitCode=1
+ [[ 1 = \0 ]]
I'm sure I have set "build --action_env", but it does not seem to propagate to the "exec env - " command!
@ulfjack is there any progress on the bug "action_env is not working for all actions"?
Because of this bug, compile.sh doesn't work. I think it's a very serious bug.
What makes you think that the error you posted is due to env variables?
@ulfjack
exec env - /bin/bash -c
This command clears all the env variables and discards any action_env I set.
Even the basic "cp" cannot be found.
What kind of setup do you have such that cp isn't in /bin or /usr/bin?
You should check if this works with a more recent bazel release. If I read the code correctly, this is from a genrule, and they should already forward the action env, even if not all actions do.
@ulfjack
I'm sure the "cp" command is not under /bin or /usr/bin, and I do not have root privileges. Could you please help me propagate "$PATH" to the Bazel build?
I have tried every solution I could find on Google (including --action_env and use_default_shell_env), but none of them work.
Did you try with a more recent Bazel release?
@ulfjack , I used bazel 0.10.0.
Is there any news about this issue?
We have to build bazel with another gcc than /usr/bin/gcc. Then we run into the problem where process-wrapper fails for targets, since the local /usr/lib64/libstdc++.so is too old:
.../execroot/flexbs/_bin/process-wrapper: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found
.../execroot/flexbs/_bin/process-wrapper: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found
...
Our machine park runs several different Linux distributions, and the local gcc is generally too old for Bazel. So building Bazel with the local compiler and install it locally is not an option for us. Instead we build all our tools with our own gcc, installed on a network disk, which produces binaries that work on all other platforms.
Is it possible to build Bazel in a way where process-wrapper and other internal tools become independent of the environment, for instance by linking them statically or passing --rpath to the linker?
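Before rebuilding anything, it can help to confirm which GLIBCXX_* version tags the system libstdc++ actually provides. The sketch below scans the library file for those tags; the helper name provides_glibcxx and the default path /usr/lib64/libstdc++.so.6 are assumptions (the path is taken from the error messages above and may differ on other systems).

```shell
#!/bin/sh
# Check whether a libstdc++ shared object contains a given GLIBCXX_* version
# tag by scanning the file for the embedded tag strings.
provides_glibcxx() {
  # $1 = path to libstdc++.so.6, $2 = required tag, e.g. GLIBCXX_3.4.20
  grep -a -o 'GLIBCXX_[0-9.]*' "$1" | sort -u | grep -q -x -F "$2"
}

lib=/usr/lib64/libstdc++.so.6   # assumed location; adjust for your system
if [ -r "$lib" ]; then
  if provides_glibcxx "$lib" GLIBCXX_3.4.20; then
    echo "$lib provides GLIBCXX_3.4.20"
  else
    echo "$lib does not provide GLIBCXX_3.4.20"
  fi
fi
```

If the tag is missing from the system library but present in the one shipped with the newer gcc, any binary built with the newer gcc and run without LD_LIBRARY_PATH fails in the way shown above.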
My problem turned out to have a simple solution: We can link process-wrapper statically when we build Bazel (by updating src/main/tools/BUILD).
@emusand That's what I did. Apparently, current Bazel implementations are not in favor of customized Linux environments. And I don't think this will be fixed.
I have recently updated a bunch of actions to correctly take --action_env into account - the changes should all be in 0.14.0. If you know about specific actions that still don't do it correctly, please do let me know.
I would also be open to merging a patch that makes process-wrapper be statically linked by default. (And possibly change it so that it's a pure C binary and doesn't need to link against libstdc++.)
I certainly want Bazel to work better with unusual setups, within reason, but I won't be able to do all the work myself.
@emusand : can you share how you were able to get around this, what you had to add in the BUILD file?
Hi Rahul,
I added the link option "-static" to statically link all tool binaries. First I added the parameter linkopts = ['-static'] to the tool cc_binary rules in the BUILD files. Then I realized that I just had to add the option "bazel --linkopt=-static" on the command line, without needing to patch the BUILD files.
Any update?
any update? This is still an issue as of latest release
I think @philwo was suggesting that local actions get the local PATH forwarded, even if it's not used for remote actions.
(That wouldn't help for LD_LIBRARY_PATH.)
If someone has a repro for me, I may be able to take a look.
repro is a bit tough, shove a really old gcc in /usr/ and install a new one in /usr/local/. Even if you set path to be /usr/local/bin and LD_LIBRARY_PATH to /usr/local/lib you'll hit certain steps that try to grab libstdc++ from /usr/lib (I have hit this with ray and tensor flow)
If you start from a CentOS 6.5 docker image, yum install gcc (4.8) into /usr/bin, then manually install gcc 4.9.3 into /usr/local/bin, you'll be unable to compile tensorflow 1.* or ray, even though both are compatible with 4.9.3 (and even if you set up your paths to point to /usr/local/bin/gcc).
I am experiencing this issue (building ray via Bazel on ppc64le machine (Summit at Oak Ridge)). I tried the "link static" suggestion and my system did not like that (all sorts of missing libraries). I am wondering if there is a different workaround? E.g., if I wanted to hack the Bazel code/configuration (I am building Bazel from source), what would I change to implement the suggestion above to simply pass PATH and LD_LIBRARY_PATH to the step that uses process-wrapper? There is a hint above about "use_default_shell_env = True", but I'm not sure I can follow it. Would I just have to turn it to true in every call to run_shell()/run() in the Bazel's "*.bzl" files? (I tried something like that to no effect, but I may have done it wrong.) Something more? Thank you!
I have encountered very similar problems when building Ray via Bazel (v1.1.0). I tried different methods and eventually solved it without hacking the Bazel code, using only a set of ENV variable settings (details below). I believe the same approach can be applied to build other projects like TensorFlow. I am on a ppc64le machine using a gcc (v7.3.0) at a customized location, because the gcc at /usr/bin or /usr/local/bin is too old and I have no root privilege to upgrade it. Before we continue, make sure that PATH and LD_LIBRARY_PATH are properly set for the new gcc:
$ export PATH=/private/var/packages/gcc/7.3.0/bin:$PATH
$ export LD_LIBRARY_PATH=/private/var/packages/gcc/7.3.0/lib64:$LD_LIBRARY_PATH
Rebuild Bazel (in order to statically link process-wrapper with libstdc++)
The first (and perhaps the most important) problem that I needed to deal with is the "`GLIBCXX_3.4.21' not found" error. An example of the error message is:
/private/var/.cache/bazel/_bazel_name/install/863382820ae9540178f3de18543a9280/_embedded_binaries/process-wrapper:
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found
(required by /private/var/.cache/bazel/_bazel_name/install/863382820ae9540178f3de18543a9280/_embedded_binaries/process-wrapper)
As indicated by this very GitHub issue, this is caused by ENV vars (esp. LD_LIBRARY_PATH) not being propagated to process-wrapper when we use Bazel to build a project like Ray. My workaround is to make process-wrapper statically linked with libstdc++. To achieve this, I need to rebuild Bazel itself, because process-wrapper is a helper tool that is bundled into the Bazel executable and is placed in the install base the first time Bazel is launched (a good explanation can be found here).
Let's dive into the detailed steps. First, before we rebuild Bazel, we can check that process-wrapper is indeed dynamically linked to libstdc++ using the ldd command:
$ cd /private/var/.cache/bazel/_bazel_name/install/863382820ae9540178f3de18543a9280/_embedded_binaries/
$ ldd process-wrapper
linux-vdso64.so.1 => (0x00003fff9b490000)
libstdc++.so.6 => /private/var/packages/gcc/7.3.0/lib64/libstdc++.so.6 (0x00003fff9b260000)
libm.so.6 => /lib64/libm.so.6 (0x00003fff9b150000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00003fff9b110000)
libgcc_s.so.1 => /private/var/packages/gcc/7.3.0/lib64/libgcc_s.so.1 (0x00003fff9b0d0000)
libc.so.6 => /lib64/libc.so.6 (0x00003fff9aee0000)
/lib64/ld64.so.2 (0x00003fff9b4b0000)
The fact that libstdc++.so.6 appears in the output of ldd confirms that process-wrapper is dynamically linked to libstdc++ (similarly for libgcc_s.so.1). We need to remove its dynamic dependencies on libstdc++.so.6 and libgcc_s.so.1 by rebuilding Bazel from source. After following step 1 and 2.1 in the instruction, I used the following command for step 2.2:
$ env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" BAZEL_LINKOPTS=-static-libstdc++:-static-libgcc BAZEL_LINKLIBS=-l%:libstdc++.a:-lm bash ./compile.sh
There are a few things to note in the above command. (1) I used the ENV variable BAZEL_LINKOPTS to instruct the build process to make Bazel statically linked to libstdc++ and libgcc. (2) I used the ENV variable BAZEL_LINKLIBS to specify the necessary libraries. It's important to explicitly specify the static libstdc++ library -l%:libstdc++.a here, because gcc will still dynamically link the output binary to libstdc++ even if the options -static-libstdc++ and -lstdc++ are given (a more detailed explanation about this can be found here). After Bazel is successfully rebuilt, we can launch the new Bazel executable and check the library dependencies of process-wrapper, which is placed in a different directory than before:
$ cd /private/var/.cache/bazel/_bazel_name/install/9d07c1e5d1f16ee5678323cb375d7c07/_embedded_binaries/
$ ldd process-wrapper
linux-vdso64.so.1 => (0x00003fffad4c0000)
libm.so.6 => /lib64/libm.so.6 (0x00003fffad3b0000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00003fffad370000)
libc.so.6 => /lib64/libc.so.6 (0x00003fffad180000)
/lib64/ld64.so.2 (0x00003fffad4e0000)
Notice that both libstdc++.so.6 and libgcc_s.so.1 disappeared. We succeeded! It means that we don't need LD_LIBRARY_PATH to correctly invoke process-wrapper anymore.
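The before/after ldd comparison can also be scripted, for example to check other embedded tools the same way. This is a sketch under assumptions: has_gcc_runtime_deps is a made-up helper that reads ldd-style output on stdin, and the sample text is abbreviated from the second ldd listing above.

```shell
#!/bin/sh
# Succeeds if ldd-style output on stdin still lists the GCC runtime libraries
# (libstdc++ or libgcc_s) that the static-link flags are meant to remove.
has_gcc_runtime_deps() {
  grep -E -q '(^|[[:space:]])(libstdc\+\+|libgcc_s)\.so'
}

# Against a real binary one would run (path is a placeholder):
#   ldd /path/to/_embedded_binaries/process-wrapper | has_gcc_runtime_deps

# Abbreviated sample of the ldd output after the rebuild.
sample='libm.so.6 => /lib64/libm.so.6 (0x00003fffad3b0000)
libc.so.6 => /lib64/libc.so.6 (0x00003fffad180000)'

if printf '%s\n' "$sample" | has_gcc_runtime_deps; then
  echo "still depends on libstdc++ or libgcc_s"
else
  echo "no dynamic libstdc++ or libgcc_s dependency"
fi
```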
Build Ray (using the new Bazel we just rebuilt)
Now we can build Ray (v0.7.7) using the new Bazel that we just rebuilt in the previous step. Assuming that the other library dependencies of Ray, such as Apache Arrow, have been properly installed (details on how to install them can be found here), I used the following command to build Ray:
$ cd ray/python
$ env BAZEL_LINKOPTS=-static-libstdc++:-static-libgcc BAZEL_LINKLIBS=-l%:libstdc++.a:-lm BAZEL_CXXOPTS=-std=gnu++0x python setup.py bdist_wheel
Notice that I am still using the ENV variables BAZEL_LINKOPTS and BAZEL_LINKLIBS to instruct Bazel to statically link libstdc++ and libgcc when building Ray. This is because Bazel will create many intermediate binaries, and we still need to make sure they are all statically linked to libstdc++ and libgcc. I also used another ENV variable, BAZEL_CXXOPTS, to instruct Bazel to use the option -std=gnu++0x for gcc. This is because by default gcc will use a higher C++ standard (i.e., gnu++14), which would result in compilation errors when it tries to compile plasma (for details, see here).
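The odd-looking value -l%:libstdc++.a is easier to read once the list syntax is spelled out. As far as I understand Bazel's C++ auto-configuration (this is my reading, not something stated in this thread), these variables are split on ":" and "%:" is an escaped literal colon. The helper below is a hypothetical shell imitation of that splitting, not Bazel's own code, and it ignores other escapes such as "%%".

```shell
#!/bin/sh
# Imitate splitting a colon-separated option list where "%:" stands for a
# literal colon. Prints one resulting flag per line.
# Assumption: the input never contains the placeholder text "@COLON@".
split_escaped() {
  printf '%s\n' "$1" |
    sed -e 's/%:/@COLON@/g' |   # protect escaped colons
    tr ':' '\n' |               # split on the remaining (real) separators
    sed -e 's/@COLON@/:/g'      # restore the literal colons
}

split_escaped '-static-libstdc++:-static-libgcc'
# prints:
#   -static-libstdc++
#   -static-libgcc

split_escaped '-l%:libstdc++.a:-lm'
# prints:
#   -l:libstdc++.a
#   -lm
```

Under that reading, the linker receives -l:libstdc++.a, which asks GNU ld for the file named exactly libstdc++.a, followed by -lm.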
When the above step is successful, you will see a .whl file called ray-0.7.7-cp37-cp37m-linux_ppc64le.whl in the directory ray/python/dist. The following command can be used to install Ray as a python library:
$ cd ray/python/dist
$ pip install ray-0.7.7-cp37-cp37m-linux_ppc64le.whl
Now, you can use Ray in python!
@forestliurui this looks SUPER helpful, I will try to use the same process to build ray in my environment
@timkpaine Thanks. I am happy if this could be helpful to others. @pgraf Maybe this is also helpful in your case.
I just ran into this again: the generate-xml.sh shell script contained in Bazel is also run via the process-wrapper, but any stdout/stderr is suppressed. Hence all I got was:
ERROR: /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/TensorFlow/tensorflow-r2.4/tensorflow/c/BUILD:613:11: failed (Exit 1): generate-xml.sh failed: error executing command
(cd /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmpb7zFlQ-bazel-tf/20db8ac50b74c328e6dea9b20829b459/execroot/org_tensorflow && \
exec env - \
PATH=/usr/bin:/bin \
TEST_BINARY=tensorflow/c/c_test \
TEST_NAME=//tensorflow/c:c_test \
TEST_SHARD_INDEX=0 \
TEST_TOTAL_SHARDS=0 \
/dev/shm/output_user_root/20db8ac50b74c328e6dea9b20829b459/execroot/org_tensorflow/external/bazel_tools/tools/test/generate-xml.sh bazel-out/ppc-opt/testlogs/tensorflow/c/c_test/test.log bazel-out/ppc-opt/testlogs/tensorflow/c/c_test/test.xml 21 0)
So only a very generic failure. After days of digging, I modified the Bazel sources and found that there are log files containing that execution's stdout/stderr, and they told me it is the process-wrapper again.
So take this as another datapoint to do something about this please