I'm on the grpc/grpc project. We recently switched to using thread-local on one of our key structs. Although this project builds fine with make on Mac and Linux, it stopped building with bazel on Mac after we made this switch. Looking into it, it seems like tensorflow/serving also has the same issue.
$ git clone git@github.com:grpc/grpc
$ cd grpc
$ bazel build //:grpc
Output of bazel info release: release 0.8.1-homebrew
Related issues are tensorflow/serving#1 and grpc/grpc#13856
$ bazel build --verbose_failures //:grpc
INFO: Analysed target //:grpc (0 packages loaded).
INFO: Found 1 target...
ERROR: /Users/vpai/Git/grpc/BUILD:224:1: Linking of rule '//:grpc' failed (Exit 1): cc_wrapper.sh failed: error executing command
(cd /private/var/tmp/_bazel_vpai/5285458b308b3aadd65cb54a5ac76b0c/execroot/com_github_grpc_grpc && \
exec env - \
APPLE_SDK_PLATFORM=MacOSX \
APPLE_SDK_VERSION_OVERRIDE=10.13 \
PATH=/Users/vpai/google-cloud-sdk/bin:/usr/local/git/current/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/go/bin \
TMPDIR=/var/folders/xd/2k15ssh10lz6088_k_2lddtw007xtq/T/ \
XCODE_VERSION_OVERRIDE=9.1.0 \
external/local_config_cc/cc_wrapper.sh -fobjc-link-runtime -Wl,-S -shared -o bazel-out/darwin-fastbuild/bin/libgrpc.so bazel-out/darwin-fastbuild/bin/_objs/grpc/src/core/lib/surface/init.o bazel-out/darwin-fastbuild/bin/_objs/grpc/src/core/plugin_registry/grpc_plugin_registry.o -pthread -headerpad_max_install_names -lc++ -no-canonical-prefixes -undefined dynamic_lookup)
Use --sandbox_debug to see verbose messages from the sandbox
clang: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld: illegal thread local variable reference to regular symbol __ZN9grpc_core7ExecCtx9exec_ctx_E for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Target //:grpc failed to build
INFO: Elapsed time: 0.504s, Critical Path: 0.31s
FAILED: Build did NOT complete successfully
Can you isolate the difference between your make and bazel link invocation? You can execute bazel with --subcommands to emit all command lines. You can then compare which options are different between your make build and bazel build.
Hi there! Thanks for the response. Here's an example command from make that actually builds the relevant .o; note that generating the .a is just a matter of libtool in that case:
c++ -Ithird_party/protobuf/src -Ithird_party/googletest/googletest/include -Ithird_party/googletest/googlemock/include -Ithird_party/boringssl/include -Ithird_party/cares -Ithird_party/cares/cares -g -Wall -Wextra -Werror -Wno-long-long -Wno-unused-parameter -DOSATOMIC_USE_INLINED=1 -O2 -fPIC -I. -Iinclude -I/Users/vpai/Git/grpc/gens -I/usr/local/include -DNDEBUG -DINSTALL_PREFIX=\"/usr/local\" -std=c++11 -stdlib=libc++ -fno-exceptions -fno-rtti -fno-exceptions -MMD -MF /Users/vpai/Git/grpc/objs/opt/src/core/lib/iomgr/exec_ctx.dep -c -o /Users/vpai/Git/grpc/objs/opt/src/core/lib/iomgr/exec_ctx.o src/core/lib/iomgr/exec_ctx.cc
Here's the result of finding the version from the c++ command:
$ c++ -v
Apple LLVM version 9.0.0 (clang-900.0.38)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
And here's what I get from bazel in building the object file (which is fine):
SUBCOMMAND: # //:grpc_base_c [action 'Compiling src/core/lib/iomgr/exec_ctx.cc']
(cd /private/var/tmp/_bazel_vpai/c6cc7641f7bf8c8335597635bcca2d28/execroot/com_github_grpc_grpc && \
exec env - \
APPLE_SDK_PLATFORM=MacOSX \
APPLE_SDK_VERSION_OVERRIDE=10.13 \
PATH=/Users/vpai/google-cloud-sdk/bin:/usr/local/git/current/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/go/bin \
TMPDIR=/var/folders/xd/2k15ssh10lz6088_k_2lddtw007xtq/T/ \
XCODE_VERSION_OVERRIDE=9.1.0 \
external/local_config_cc/wrapped_clang '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -O0 -DDEBUG '-std=c++11' -iquote . -iquote bazel-out/darwin-fastbuild/genfiles -iquote external/bazel_tools -iquote bazel-out/darwin-fastbuild/genfiles/external/bazel_tools -iquote external/com_google_absl -iquote bazel-out/darwin-fastbuild/genfiles/external/com_google_absl -iquote external/com_github_madler_zlib -iquote bazel-out/darwin-fastbuild/genfiles/external/com_github_madler_zlib -isystem include -isystem bazel-out/darwin-fastbuild/genfiles/include -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/com_github_madler_zlib -isystem bazel-out/darwin-fastbuild/genfiles/external/com_github_madler_zlib -MD -MF bazel-out/darwin-fastbuild/bin/_objs/grpc_base_c/src/core/lib/iomgr/exec_ctx.d '-frandom-seed=bazel-out/darwin-fastbuild/bin/_objs/grpc_base_c/src/core/lib/iomgr/exec_ctx.o' -D__CLANG_SUPPORT_DYN_ANNOTATION__ '-isysroot __BAZEL_XCODE_SDKROOT__' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c src/core/lib/iomgr/exec_ctx.cc -o bazel-out/darwin-fastbuild/bin/_objs/grpc_base_c/src/core/lib/iomgr/exec_ctx.o)
And then what I get from bazel in actually doing the library build:
SUBCOMMAND: # //:grpc [action 'Linking libgrpc.so']
(cd /private/var/tmp/_bazel_vpai/c6cc7641f7bf8c8335597635bcca2d28/execroot/com_github_grpc_grpc && \
exec env - \
APPLE_SDK_PLATFORM=MacOSX \
APPLE_SDK_VERSION_OVERRIDE=10.13 \
PATH=/Users/vpai/google-cloud-sdk/bin:/usr/local/git/current/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/go/bin \
TMPDIR=/var/folders/xd/2k15ssh10lz6088_k_2lddtw007xtq/T/ \
XCODE_VERSION_OVERRIDE=9.1.0 \
external/local_config_cc/cc_wrapper.sh -fobjc-link-runtime -Wl,-S -shared -o bazel-out/darwin-fastbuild/bin/libgrpc.so bazel-out/darwin-fastbuild/bin/_objs/grpc/src/core/lib/surface/init.o bazel-out/darwin-fastbuild/bin/_objs/grpc/src/core/plugin_registry/grpc_plugin_registry.o -pthread -headerpad_max_install_names -lc++ -no-canonical-prefixes -undefined dynamic_lookup)
ERROR: /Users/vpai/Git/grpc-bazel-ready/BUILD:224:1: Linking of rule '//:grpc' failed (Exit 1): cc_wrapper.sh failed: error executing command
(cd /private/var/tmp/_bazel_vpai/c6cc7641f7bf8c8335597635bcca2d28/execroot/com_github_grpc_grpc && \
exec env - \
APPLE_SDK_PLATFORM=MacOSX \
APPLE_SDK_VERSION_OVERRIDE=10.13 \
PATH=/Users/vpai/google-cloud-sdk/bin:/usr/local/git/current/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/go/bin \
TMPDIR=/var/folders/xd/2k15ssh10lz6088_k_2lddtw007xtq/T/ \
XCODE_VERSION_OVERRIDE=9.1.0 \
external/local_config_cc/cc_wrapper.sh -fobjc-link-runtime -Wl,-S -shared -o bazel-out/darwin-fastbuild/bin/libgrpc.so bazel-out/darwin-fastbuild/bin/_objs/grpc/src/core/lib/surface/init.o bazel-out/darwin-fastbuild/bin/_objs/grpc/src/core/plugin_registry/grpc_plugin_registry.o -pthread -headerpad_max_install_names -lc++ -no-canonical-prefixes -undefined dynamic_lookup)
Use --sandbox_debug to see verbose messages from the sandbox
clang: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld: illegal thread local variable reference to regular symbol __ZN9grpc_core7ExecCtx9exec_ctx_E for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Target //:grpc failed to build
There are certainly differences but nothing seems to speak to thread-local variables as far as I can tell.
We've had some further discussion about this in grpc/grpc#13856 (cc @yashykt). I can make this work by using thread-local variables based on the pthread_key functions rather than the __thread declaration. This looks like what we would expect from a versioning issue. Is there a way to tell which version of clang bazel is using, or any other versioning info that would help here?
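For readers unfamiliar with the pthread_key approach mentioned above, here is a minimal sketch of a per-thread slot built on pthread keys. All names (tls_sketch, SetSlot, GetSlot, SlotsAreIndependent) are made up for illustration; this is not gRPC's actual TLS macro layer, only the general technique it could map to.

```cpp
#include <pthread.h>

#include <cassert>
#include <cstdint>

// Hypothetical names throughout; not gRPC's real TLS macros.
namespace tls_sketch {

static pthread_key_t g_key;
static pthread_once_t g_key_once = PTHREAD_ONCE_INIT;

// Creates the key exactly once, no matter which thread gets here first.
static void MakeKey() {
  // No destructor: the slot stores a plain integer, not an owned pointer.
  int rc = pthread_key_create(&g_key, nullptr);
  assert(rc == 0);
  (void)rc;
}

// Stores a value in the calling thread's slot.
inline void SetSlot(intptr_t value) {
  pthread_once(&g_key_once, MakeKey);
  pthread_setspecific(g_key, reinterpret_cast<void*>(value));
}

// Reads the calling thread's slot (0 if this thread never set it).
inline intptr_t GetSlot() {
  pthread_once(&g_key_once, MakeKey);
  return reinterpret_cast<intptr_t>(pthread_getspecific(g_key));
}

// Thread body: writes its own value and reports what it reads back.
static void* Worker(void* arg) {
  SetSlot(reinterpret_cast<intptr_t>(arg));
  return reinterpret_cast<void*>(GetSlot());
}

// Returns true if a write made by a second thread is invisible to the caller.
inline bool SlotsAreIndependent() {
  SetSlot(1);
  pthread_t t;
  void* seen = nullptr;
  pthread_create(&t, nullptr, Worker, reinterpret_cast<void*>(intptr_t{2}));
  pthread_join(t, &seen);
  return reinterpret_cast<intptr_t>(seen) == 2 && GetSlot() == 1;
}

}  // namespace tls_sketch
```

Because the slot is reached through ordinary function calls, no object file refers to a thread-local symbol directly, which is presumably why this variant sidesteps the linker error.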
For the osx-with-xcode toolchain it's a little bit complicated: you have to look in the generated external/local_config_cc/cc_wrapper.sh file, where you'll see the compiler invocation.
Hmm, found that and it's just an invocation of /usr/bin/clang which is the same as my standard c++
$ /usr/bin/clang -v
Apple LLVM version 9.0.0 (clang-900.0.38)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
I have a workaround, if that helps. We have been using thread-local for quite a while in gRPC (using our own portable macros that effectively expand to __thread on Mac) but this was our first use of that in a header file. grpc/grpc#13916 removes the use of that in a header file and keeps it exclusively in the matching source file, and bazel builds with that just fine on Mac.
We can use this workaround for now, but I hope that we get a longer-term solution that allows us to use thread-local in header files at some point. I know that we're not the only ones to see this issue when building with bazel on mac.
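To make the workaround concrete, here is a rough single-file sketch of the pattern: the header exposes only accessor declarations, and the thread-local itself is confined to the source file. ExecCtxSketch, g_current and DemoRoundTrip are hypothetical names, not the real grpc_core::ExecCtx code, and the file split is only marked with comments.

```cpp
#include <cassert>

// ---- exec_ctx_sketch.h (hypothetical header) ----
class ExecCtxSketch {
 public:
  // Declarations only: no thread-local symbol is visible to includers.
  static ExecCtxSketch* Get();
  static void Set(ExecCtxSketch* ctx);
};

// ---- exec_ctx_sketch.cc (hypothetical source file) ----
namespace {
// The thread-local has internal linkage and never leaves this file.
__thread ExecCtxSketch* g_current = nullptr;
}  // namespace

ExecCtxSketch* ExecCtxSketch::Get() { return g_current; }
void ExecCtxSketch::Set(ExecCtxSketch* ctx) { g_current = ctx; }

// ---- some caller ----
void DemoRoundTrip() {
  ExecCtxSketch ctx;
  assert(ExecCtxSketch::Get() == nullptr);  // nothing installed yet
  ExecCtxSketch::Set(&ctx);
  assert(ExecCtxSketch::Get() == &ctx);
  ExecCtxSketch::Set(nullptr);  // restore the empty state
}
```

The cost is an out-of-line call on every access, which is the trade-off against inlined accessors in the header.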
Looks like this is still a problem with bazel 1.0
I've just run across this problem using grpc 1.25.0 on Mac. If you clone https://gitlab.com/BuildGrid/buildbox/buildbox-common/blob/edbaunton/bazel/WORKSPACE and run bazel build //... the problem can be reproduced:
ld: illegal thread local variable reference to regular symbol __ZN9grpc_core7ExecCtx9exec_ctx_E for architecture x86_64
@edbaunton your problem might have been solved by https://github.com/grpc/grpc/pull/20510 (available in 1.26.0 release).
@jtattermusch it's (still) failing for me with 1.26 as well as master:
$ bazel build :all
INFO: Running bazel wrapper (see //tools/bazel for details), bazel version 1.0.0 will be used instead of system-wide bazel installation.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 615 0 615 0 0 2123 0 --:--:-- --:--:-- --:--:-- 2120
100 39.2M 100 39.2M 0 0 1896k 0 0:00:21 0:00:21 --:--:-- 1302k
Starting local Bazel server and connecting to it...
INFO: Writing tracer profile to '/private/var/tmp/_bazel_nathan/029e4463ebaf163bdbac3eb656d38193/command.profile.gz'
INFO: SHA256 (https://boringssl.googlesource.com/boringssl/+archive/83da28a68f32023fd3b95a8ae94991a07b1f6c62.tar.gz) = 7b4fafe3e4af9d2acb33dfe18f22cc3b07d23bac3a53d3096d33ec0427268883
DEBUG: Rule 'boringssl' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "7b4fafe3e4af9d2acb33dfe18f22cc3b07d23bac3a53d3096d33ec0427268883"
DEBUG: Call stack for the definition of repository 'boringssl' which is a http_archive (rule definition at /private/var/tmp/_bazel_nathan/029e4463ebaf163bdbac3eb656d38193/external/bazel_tools/tools/build_defs/repo/http.bzl:262:16):
- /private/var/tmp/_bazel_nathan/029e4463ebaf163bdbac3eb656d38193/external/com_github_grpc_grpc/bazel/grpc_deps.bzl:125:9
- /source/grpc/grpc/bazel/test/python_test_repo/WORKSPACE:8:1
INFO: Analyzed 19 targets (63 packages loaded, 3530 targets configured).
INFO: Found 19 targets...
ERROR: /private/var/tmp/_bazel_nathan/029e4463ebaf163bdbac3eb656d38193/external/com_github_grpc_grpc/BUILD:329:1: Linking of rule '@com_github_grpc_grpc//:grpc' failed (Exit 1) cc_wrapper.sh failed: error executing command external/local_config_cc/cc_wrapper.sh -lc++ -fobjc-link-runtime -Wl,-S -shared -o bazel-out/darwin-fastbuild/bin/external/com_github_grpc_grpc/libgrpc.so ... (remaining 8 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
ld: illegal thread local variable reference to regular symbol __ZN9grpc_core7ExecCtx9exec_ctx_E for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
INFO: Elapsed time: 78.994s, Critical Path: 21.15s
INFO: 200 processes: 200 darwin-sandbox.
FAILED: Build did NOT complete successfully
$ clang --version
Apple clang version 11.0.0 (clang-1100.0.33.16)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
$ git rev-parse HEAD
528dbb13e400654f9c73d8e26e70be48f27052c4
$ bazel version
Bazelisk version: v1.2.1
INFO: Running bazel wrapper (see //tools/bazel for details), bazel version 1.0.0 will be used instead of system-wide bazel installation.
Build label: 1.0.0
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Oct 10 10:19:27 2019 (1570702767)
Build timestamp: 1570702767
Build timestamp as int: 1570702767
Ok, the fix is actually in https://github.com/grpc/grpc/pull/13929 (which was merged a long time ago).
For me, the build does pass (bazel 1.0.0 and clang Apple LLVM version 9.1.0)
bazel build :all -> INFO: Build completed successfully, 1361 total actions
Can you double check that your build actually defines GPR_PTHREAD_TLS (and not GPR_GCC_TLS) as seen here: https://github.com/grpc/grpc/pull/13929/files#diff-dd49c38c34f9401e10f1921931d6c121R215 . For that to happen, you need to have build --copt -DGRPC_BAZEL_BUILD in your bazelrc
https://github.com/grpc/grpc/pull/13929/files#diff-4c3a4e61471476647ad1c0987a744414R2
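As a simplified illustration of why the define matters: the TLS flavor is picked by the preprocessor, so every translation unit that includes the header needs to see the same flag. The SKETCH_* macros and SelectedTlsFlavor below are hypothetical stand-ins; the real checks live in gRPC's platform header and cover more cases.

```cpp
#include <cassert>
#include <cstring>

// Simplified, hypothetical selection logic. Assumption: pthread-based TLS is
// chosen only on Apple platforms when GRPC_BAZEL_BUILD is defined.
#if defined(__APPLE__) && defined(GRPC_BAZEL_BUILD)
#define SKETCH_PTHREAD_TLS 1
#else
#define SKETCH_GCC_TLS 1
#endif

// Reports which flavor this translation unit was compiled with.
inline const char* SelectedTlsFlavor() {
#if defined(SKETCH_PTHREAD_TLS)
  return "pthread";
#else
  return "gcc";
#endif
}

void DemoFlavor() {
  // Exactly one flavor is selected at preprocessing time.
  assert(std::strcmp(SelectedTlsFlavor(), "gcc") == 0 ||
         std::strcmp(SelectedTlsFlavor(), "pthread") == 0);
}
```

If one object file is compiled with the flag and another without it, the two disagree about how the variable is stored, so the flag has to reach every compile action that includes the header.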
@jtattermusch yeah, that works, but now this configuration option leaks into my project. I'm trying to use the grpc rules, not just build grpc. Is there some way to embed this into the grpc workspace?
The #define is checked in a header, so presumably anyone who includes the header would also need to set the requisite define. I am not a bazel expert, but for a gRPC user to be able to get this flag transparently, the gRPC rules might be able to leverage this new Skylark functionality.
Should this flag be enabled everywhere or only on Mac?
@NathanHowell I'm happy to accept patches (agreed that specifying the --copt in the .rc file might not be ideal).
Looks like defining --copt=-DGRPC_BAZEL_BUILD in our bazel.rc file is not ideal, as users don't inherit this setting (and they end up with the wrong type of thread locals being used on Mac).
The #define is checked in a header, so presumably anyone who includes the header would also need to set the requisite define. I am not a bazel expert, but for a gRPC user to be able to get this flag transparently, the gRPC rules might be able to leverage this new Skylark functionality.
Should this flag be enabled everywhere or only on Mac?
Only on mac
Is the defines attribute not enough?
https://docs.bazel.build/versions/master/be/c-cpp.html#cc_library.defines
It propagates the define to every dependent rule.
FYI you also sometimes need build --host_copt=-DGRPC_BAZEL_BUILD not just --copt in user .bazelrc to fix the issue.
I think that was before https://github.com/grpc/grpc/pull/24247. See https://github.com/grpc/grpc/issues/13856 for more details.
Nope, I have my build fail on master unless I pass both options, also the --copt=-DGRPC_BAZEL_BUILD is still in https://github.com/grpc/grpc/blob/master/tools/bazel.rc#L4 so I'd assume it would get removed once not required anymore.
What is the exact command you're running in "your build"?
I am including grpc as a dependency of my project. I can't seem to reproduce the issue using grpc repository directly. Still I wanted to put the host_copt out there for people using non-master versions. I am not sure how/where the difference between grpc from a git clone and from building it as an http_archive appears in my pipeline.
Ok, when I tried master, it was a week ago, before the fix had landed. I can confirm that --copt and --host_copt are only needed for versions prior to the fix. Current master builds successfully on Mac without any additions to bazelrc, both in darwin_fastbuild and host.
I can confirm this was fixed by https://github.com/grpc/grpc/pull/24247. More context is here: https://github.com/grpc/grpc/issues/13856
I don't have the permissions to close this issue myself, so please feel free to do that.
@vjpai
I believe this is also the cause of https://github.com/abseil/abseil-cpp/issues/848
The summary here is this:
cc_library targets have an implicit output for a "nodeps" shared library (.so). This artifact is used with cc_test with --dynamic_mode set. When you build //package:all, these will be included.
Due to the two-level namespaces used by Mach-O, you must link against all shared libraries containing the symbols you reference. This is directly at odds with these "nodeps" artifacts, so the crosstool sets -undefined dynamic_lookup.
This mode is deprecated on every platform but macOS, meanwhile TLS is a new(er) feature on Apple platforms.
For whatever reason, they do not work together. I think the fix here is to just remove support for the supports_dynamic_linker feature in the Apple crosstools.
SGTM to remove that feature.
Just for clarity, I'd note that despite the rather generic-sounding name, the bazel "supports_dynamic_linker" feature basically only controls whether these weird nodeps libraries are created (and used for cc_test). It does not impact the ability to create dynamic libraries in any other context.
(@jyknight: thanks for that clarification, that's very reassuring.)
Based on the advice of @derekmauro, disabling supports_dynamic_linker on macOS by specifying the following in .bazelrc appears to fix the problem:
build:macos --features=-supports_dynamic_linker
This does mean cc_test targets are no longer dynamically linked, which is a bit unfortunate in terms of disk space usage, though.
I was experiencing the same problem and I was able to get that workaround to get me a successful test run. (Though to get my bazelrc file to see the rule I had to add --enable_platform_specific_config to my command line invocation of bazel. I'm on bazel version 3.3.0).
This workaround also uncovered a latent bug I had, where a header file had a function implementation that wasn't inlined, and I started getting duplicate symbol errors. Those were easy enough to fix in my own code.
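For anyone hitting the same duplicate symbol errors, the usual fix looks roughly like this. The header name, Twice and DemoTwice are hypothetical, and the sketch assumes the problem is a plain function defined in a header that several .cc files include.

```cpp
#include <cassert>

// ---- util_sketch.h (hypothetical header) ----
#ifndef UTIL_SKETCH_H_
#define UTIL_SKETCH_H_

// Without `inline`, every .cc file that includes this header emits its own
// strong definition of Twice, and a static link reports duplicate symbols.
// With `inline`, the linker is allowed to merge the identical copies.
inline int Twice(int x) { return 2 * x; }

#endif  // UTIL_SKETCH_H_

// ---- some caller ----
void DemoTwice() { assert(Twice(21) == 42); }
```

Moving the definition into a single .cc file and leaving only the declaration in the header works as well.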
It now occurs to me that I had for a long time been building on MacOS just fine, because I habitually build everything with --dynamic_mode=off. (That has been helping me discover several subtle bugs early, so I consider that good practice.) It was only when I started a new project and didn't provide that flag that I started hitting the MacOS TLS bug.
Is there a possibility of retaining dynamic linking of tests on macOS? That doesn't seem like something that should be fundamentally impossible.
With the current shipping version(s) of Xcode, it is impossible.
While there are some tests that will happen to work, it's not a mode that Bazel can/should support on Darwin.
Given that -undefined dynamic_lookup is deprecated, it seems unlikely that Apple will fix this.
I think -undefined dynamic_lookup is also how Python extension modules are built on macOS, unless the information at this link is outdated:
https://github.com/boostorg/build/issues/69
Separately, though, what prevents bazel from providing the shared library dependencies to the linker?
@trybka It looks like your fix (https://github.com/bazelbuild/bazel/commit/ec5553352f2f661d39ac4cf665dd9b3c779e614c) is causing breakages in rules_go and rules_rust on macOS. Could you take a look?
At first glance I'd've said that the BUILD file is simply broken: https://github.com/bazelbuild/rules_rust/blob/8c388e1b816d0a7e5a7d3cc5d213be7f35299cf5/examples/ffi/rust_calling_c/c/BUILD#L23
A cc_library named "native_matrix_so" implicitly generates an output named "libnative_matrix_so.so" (along with "libnative_matrix_so.a"), yet this build file declares a separate cc_binary rule which has the same name. Thus, the output names conflict. So, the failure looks expected. Only... apparently it was working before. That it was _working_ seems unexpected to me!
Ahh...so, here's the Bazel code which registers the failure outputs:
https://github.com/bazelbuild/bazel/blob/308bce36cba46095fe41866e703710035ddddada/src/main/java/com/google/devtools/build/lib/rules/cpp/CcLibrary.java#L254
That code generates a dummy "error" output, when !supportsDynamicLinker. But, in the case where it _is_ supported, it only generates a lib*.so output when srcs is not empty. So...that explains things here.
This is unfortunate -- and perhaps could be improved in Bazel, so that the behavior is more consistent. But, this also seems like a bug in the example BUILD file to have written rules with conflicting (or potentially-conflicting) output names, and can be easily solved by renaming one of those two rules.
This is a different issue. Looking at the rules_go build file:
https://github.com/bazelbuild/rules_go/blob/5e733237761fbe70d10afd3156e73355392a66b2/tests/legacy/examples/cgo/cc_dependency/BUILD.bazel#L22
For some reason, this seems to go to extra effort to do something broken -- it's explicitly using the nodeps shared-library output from cc_library (via the filegroup asking for output_group = "dynamic_library",). This is odd, and I don't know any good reason why it'd want to do that. I imagine someone was just confused and wrote this accidentally?
The typical way to do this -- and what this example ought to be doing I think -- is to use a cc_binary instead of the cc_library/filegroup pair. Something like:
cc_binary(
    name = "c_version_so",
    srcs = ["c_version.c", "c_version.h"],
    linkshared = True,
)