TensorFlow Cython extension fails to link on Mac with Bazel 0.23.0 but not with Bazel 0.22.0.
Repro (may require installing TF's build deps first):
$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ yes '' | ./configure # Choose default options.
$ bazel build tensorflow/python:framework_fast_tensor_util
With bazel 0.23.0, the build action seems to be:
external/local_config_cc/cc_wrapper.sh -lc++ -fobjc-link-runtime -shared -o bazel-out/darwin-opt/bin/tensorflow/python/framework/fast_tensor_util.so -Wl,-force_load,bazel-out/darwin-opt/bin/tensorflow/python/_objs/framework/fast_tensor_util.so/fast_tensor_util.o -headerpad_max_install_names -no-canonical-prefixes '-mmacosx-version-min=10.14'
which produces
...
INFO: From Compiling tensorflow/python/framework/fast_tensor_util.cpp:
In file included from bazel-out/darwin-opt/genfiles/tensorflow/python/framework/fast_tensor_util.cpp:581:
In file included from bazel-out/darwin-opt/genfiles/external/local_config_python/numpy_include/numpy/arrayobject.h:4:
In file included from bazel-out/darwin-opt/genfiles/external/local_config_python/numpy_include/numpy/ndarrayobject.h:12:
In file included from bazel-out/darwin-opt/genfiles/external/local_config_python/numpy_include/numpy/ndarraytypes.h:1822:
bazel-out/darwin-opt/genfiles/external/local_config_python/numpy_include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it with " \
^
1 warning generated.
ERROR: /Users/phawkins/p/jax/tensorflow/tensorflow/python/BUILD:6155:1: Linking of rule '//tensorflow/python:framework/fast_tensor_util.so' failed (Exit 1)
Undefined symbols for architecture x86_64:
"_PyBaseObject_Type", referenced from:
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_1AppendBFloat16ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_3AppendFloat16ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_5AppendFloat32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_7AppendFloat64ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_9AppendInt32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_11AppendUInt32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_13AppendInt64ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
...
"_PyBuffer_Release", referenced from:
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_1AppendBFloat16ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__Pyx__GetBufferAndValidate(bufferinfo*, _object*, __Pyx_TypeInfo*, int, int, int, __Pyx_BufFmt_StackElem*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_3AppendFloat16ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_5AppendFloat32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_7AppendFloat64ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_9AppendInt32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
__pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_11AppendUInt32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
...
"_PyBytes_FromStringAndSize", referenced from:
_PyInit_fast_tensor_util in fast_tensor_util.o
"_PyCFunction_NewEx", referenced from:
_PyInit_fast_tensor_util in fast_tensor_util.o
"_PyCFunction_Type", referenced from:
__Pyx_PyObject_Append(_object*, _object*) in fast_tensor_util.o
__Pyx_PyObject_CallOneArg(_object*, _object*) in fast_tensor_util.o
...
On bazel 0.22.0, the corresponding build action is:
external/local_config_cc/cc_wrapper.sh -lc++ -fobjc-link-runtime -shared -o bazel-out/darwin-opt/bin/tensorflow/python/framework/fast_tensor_util.so -Wl,-force_load,bazel-out/darwin-opt/bin/tensorflow/python/_objs/framework/fast_tensor_util.so/fast_tensor_util.o -headerpad_max_install_names -no-canonical-prefixes -undefined dynamic_lookup
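Note in particular that -undefined dynamic_lookup has gone missing, and I think that's what causes the link errors: without it, the macOS linker requires every symbol to be resolved at link time, whereas symbols such as _PyBaseObject_Type are only provided by the Python interpreter when it loads the extension.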
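Because the difference between the two link commands is a single flag pair, it can be checked mechanically. The sketch below assumes a POSIX shell; the function name `has_dynamic_lookup` and the abbreviated command strings are illustrative, not taken from Bazel or TensorFlow.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bazel or TensorFlow): succeed if the given
# link command contains "-undefined dynamic_lookup" as two adjacent arguments.
# It does not recognize the equivalent -Wl,-undefined,dynamic_lookup spelling.
has_dynamic_lookup() {
  # Pad with spaces so the pattern also matches at the start or end of the line.
  case " $* " in
    *" -undefined dynamic_lookup "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Abbreviated forms of the Bazel 0.22.0 and 0.23.0 link commands from this report.
cmd_0_22='cc_wrapper.sh -lc++ -shared -o fast_tensor_util.so fast_tensor_util.o -headerpad_max_install_names -no-canonical-prefixes -undefined dynamic_lookup'
cmd_0_23="cc_wrapper.sh -lc++ -shared -o fast_tensor_util.so fast_tensor_util.o -headerpad_max_install_names -no-canonical-prefixes '-mmacosx-version-min=10.14'"

has_dynamic_lookup "$cmd_0_22" && echo "0.22.0: flag present"
has_dynamic_lookup "$cmd_0_23" || echo "0.23.0: flag missing"
```

Pasting a full command from `bazel build -s` output in place of the abbreviated strings works the same way.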
(Unrelated: the additional -mmacosx-version-min flag is also surprising to me. Is there a way to set that minimum version when targeting an older macOS than the build host?)
Operating system: macOS 10.14.3
Output of bazel info release:
$ bazel info release
Starting local Bazel server and connecting to it...
release 0.23.0
Output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD:
$ git remote get-url origin ; git rev-parse master ; git rev-parse HEAD
https://github.com/tensorflow/tensorflow.git
f1bbf1d83e88e044a10066ac7fe1975bf76c9c58
f1bbf1d83e88e044a10066ac7fe1975bf76c9c58
No.
I'm likely hitting the same bug on bazel 0.23.1 (see link above).
Problem persists at bazel HEAD (a24b5f389e54d631cfc91ae55b4001e01aa38931).
I can confirm that adding -undefined dynamic_lookup makes the build succeed (at least for @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so), though of course that isn't much of a surprise. Trying a full TensorFlow build now. Here's the hack I used to add -undefined dynamic_lookup:
diff --git a/tools/osx/crosstool/cc_toolchain_config.bzl.tpl b/tools/osx/crosstool/cc_toolchain_config.bzl.tpl
index 0a1d49986d..603d0ee25b 100644
--- a/tools/osx/crosstool/cc_toolchain_config.bzl.tpl
+++ b/tools/osx/crosstool/cc_toolchain_config.bzl.tpl
@@ -5379,7 +5379,7 @@ def _impl(ctx):
ACTION_NAMES.objc_executable,
ACTION_NAMES.objcpp_executable,
],
- flag_groups = [flag_group(flags = ["-headerpad_max_install_names"])],
+ flag_groups = [flag_group(flags = ["-undefined", "dynamic_lookup", "-headerpad_max_install_names"])],
with_features = [with_feature_set(not_features = [
"bitcode_embedded",
"bitcode_embedded_markers",
The above hack to add -undefined dynamic_lookup seems to make TensorFlow work.
@hlopko This sounds like an issue with cpp rules and crosstool.
Definitely a C++ rules bug. I think it was introduced in January (https://github.com/bazelbuild/bazel/commit/2d0e27e8bc7452758e8f50b51fd470efb8111e1f); I will verify once I have my Mac with me. I'm surprised this was not reported before (and not covered by any of our tests :( ).
I can confirm the bug! At first I thought it had something to do with Xcode.
So the workaround is to run Bazel with the BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 environment variable set. This bug only affects the toolchain that is used when Bazel detects Xcode; that toolchain handles both C++ and ObjC. If you use the C++-only toolchain, it passes -undefined dynamic_lookup as expected.
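As a minimal sketch of that workaround, assuming a POSIX shell, the variable can be set for a single invocation. The wrapper name `bazel_cpp_only` is hypothetical; exporting BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 in the shell session would have the same effect for every command.

```shell
#!/bin/sh
# Hypothetical wrapper: run bazel with BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 set for
# this invocation only, so Bazel's C++-only toolchain is used instead of the
# Xcode-based C++/ObjC one.
bazel_cpp_only() {
  BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 bazel "$@"
}

# Usage, with the target from the original report:
#   bazel_cpp_only build tensorflow/python:framework_fast_tensor_util
```

Per the comment above, this does not help builds that also contain ObjC targets.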
Given there is a workaround that mostly works (it doesn't when the build also contains ObjC), is a cherry-pick into 0.24 or a 0.23 patch release warranted, @katre?
Adding the release-blocker label to increase visibility, but feel free to remove it and close this issue if you decide not to cherry-pick. Thanks!
Is this error TensorFlow-specific, or does it affect all macOS builds of C++/ObjC?
Nope, it affects all C++ builds on Mac that involve shared libraries with undefined symbols (see the test in https://github.com/bazelbuild/bazel/commit/314cf1f9e4b332955c4800b2451db4e926c3e092).
I will cherrypick this into 0.24.0 (once the other release-blocker is fixed).
Setting BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 as an environment variable does not work for me.
So it still fails? Can you share the generated command (run Bazel with -s)?
Yes, in theory I could, but the output is a huge, rather unreadable mess. I ran bazel build -s --config=opt //tensorflow/tools/pip_package:build_pip_package
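One way to cut `-s` output down to what matters for this bug is to keep only the linking subcommands. This is a hypothetical awk filter (the name `link_subcommands` is illustrative); it assumes the `SUBCOMMAND: ... [action 'Linking ...']` header format and the cc_wrapper.sh command lines shown in this thread, and that the subcommands are printed on stderr, hence the `2>&1` in the usage line.

```shell
#!/bin/sh
# Hypothetical filter: reduce `bazel build -s` output to the linking
# subcommands. For each "SUBCOMMAND: ... [action 'Linking ...']" header it
# prints the header and the first following line that invokes cc_wrapper.sh,
# dropping the cd/env lines in between. (The "." stands in for the single
# quote, which cannot appear inside the single-quoted awk program.)
link_subcommands() {
  awk '
    /^SUBCOMMAND: / { linking = ($0 ~ /\[action .Linking /); if (linking) print; next }
    linking && /cc_wrapper\.sh/ { print; linking = 0 }
  '
}

# Usage:
#   bazel build -s --config=opt //tensorflow/tools/pip_package:build_pip_package 2>&1 | link_subcommands
```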
@rudolfwilliam What if you just build @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so as I did above?
@girving Thank you for the advice, but it does not work either! Maybe I am just missing something.
I just came across this bug when building the pip package for TensorFlow Federated. It is my first time using Bazel.
@girving How would I apply the -undefined dynamic_lookup option in the Bazel build? I realize that this is a real noob question.
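A possible answer, as an untested sketch: Bazel's `--linkopt` flag adds an option to C++ link commands, so the flag pair can be passed without patching the toolchain. The wrapper name is hypothetical, and the sketch assumes the failing shared-library links pick up `--linkopt` values. Unlike the toolchain patch above, it affects every link action, including executables.

```shell
#!/bin/sh
# Hypothetical wrapper (untested sketch): ask the macOS linker to leave
# undefined symbols for dynamic lookup in every C++ link action, using the
# single-token -Wl, spelling of "-undefined dynamic_lookup".
# Caveat: executables get the flag too, so a genuinely missing symbol would
# only be reported when the program runs.
bazel_build_dynamic_lookup() {
  bazel build --linkopt=-Wl,-undefined,dynamic_lookup "$@"
}

# Usage:
#   bazel_build_dynamic_lookup @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so
```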
@rudolfwilliam The point of building less is that you get a smaller -s output, not that it works.
Ok, yes you are right, the output gets a lot smaller. But I really do not know what all that means:
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Analysed target @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so (15 packages loaded, 142 targets configured).
INFO: Found 1 target...
SUBCOMMAND: # @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so [action 'Linking external/protobuf_archive/python/google/protobuf/internal/_api_implementation.so']
(cd /private/var/tmp/_bazel_klaus-rudolfkladny/e1bca1ffcaa1597d473e8195dd072d14/execroot/org_tensorflow && \
exec env - \
APPLE_SDK_PLATFORM=MacOSX \
APPLE_SDK_VERSION_OVERRIDE=10.14 \
PATH='/anaconda3/bin:/Users/klaus-rudolfkladny/.cargo/bin:/Library/Frameworks/Python.framework/Versions/3.6/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/usr/local/share/dotnet:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Applications/Xamarin Workbooks.app/Contents/SharedSupport/path-bin' \
PYTHON_BIN_PATH=/anaconda3/bin/python \
PYTHON_LIB_PATH=/anaconda3/lib/python3.6/site-packages \
TF_DOWNLOAD_CLANG=1 \
TF_NEED_CUDA=0 \
TF_NEED_OPENCL_SYCL=0 \
TF_NEED_ROCM=0 \
XCODE_VERSION_OVERRIDE=10.1.0 \
external/local_config_download_clang/cc_wrapper.sh -lc++ -fobjc-link-runtime -shared -o bazel-out/darwin-opt/bin/external/protobuf_archive/python/google/protobuf/internal/_api_implementation.so -Wl,-force_load,bazel-out/darwin-opt/bin/external/protobuf_archive/_objs/python/google/protobuf/internal/_api_implementation.so/api_implementation.o -headerpad_max_install_names -no-canonical-prefixes '-mmacosx-version-min=10.14')
After that I receive the usual error.
@rudolfwilliam that log suggests that you're still using the C++ + ObjC toolchain. What is your Bazel version? Did you run Bazel as BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 bazel build @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so?
TensorFlow is also currently broken on macOS by this issue:
https://buildkite.com/bazel/tensorflow/builds/2410#5888e757-f673-4295-a3cf-922ae3d7026d
Is this going to be fixed in 0.24.0?
Yes, I am waiting for a fix for another release-blocker before creating the next candidate.
@hlopko I am using Bazel 0.23.2, which apparently is the most recent version. Yes, I set BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 as an environment variable.
Then I have no idea what's going on; I cannot reproduce this myself. 0.24 will be released today-ish, and it has a fix. So let's close this issue, and if your issue persists in 0.24, please create a new issue. Thank you all!
Please follow #6968 for status on the release of 0.24.0. Note that a new release blocker has been identified.
Also note that you can (and should!) test your use case with the 0.24.0rc7 that was sent out a few days ago: https://releases.bazel.build/0.24.0/rc7/index.html