Bazel: TensorFlow Cython extension fails to link on Mac with Bazel 0.23.0 but not with Bazel 0.22.0.

Created on 2 Mar 2019 · 28 comments · Source: bazelbuild/bazel

Description of the problem / feature request:

TensorFlow Cython extension fails to link on Mac with Bazel 0.23.0 but not with Bazel 0.22.0.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Repro, may require installing TF's build deps first.

$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ yes '' | ./configure  # Choose default options.
$ bazel build tensorflow/python:framework_fast_tensor_util

With bazel 0.23.0, the build action seems to be:

  external/local_config_cc/cc_wrapper.sh -lc++ -fobjc-link-runtime -shared -o bazel-out/darwin-opt/bin/tensorflow/python/framework/fast_tensor_util.so -Wl,-force_load,bazel-out/darwin-opt/bin/tensorflow/python/_objs/framework/fast_tensor_util.so/fast_tensor_util.o -headerpad_max_install_names -no-canonical-prefixes '-mmacosx-version-min=10.14'

which produces

...
INFO: From Compiling tensorflow/python/framework/fast_tensor_util.cpp:
In file included from bazel-out/darwin-opt/genfiles/tensorflow/python/framework/fast_tensor_util.cpp:581:
In file included from bazel-out/darwin-opt/genfiles/external/local_config_python/numpy_include/numpy/arrayobject.h:4:
In file included from bazel-out/darwin-opt/genfiles/external/local_config_python/numpy_include/numpy/ndarrayobject.h:12:
In file included from bazel-out/darwin-opt/genfiles/external/local_config_python/numpy_include/numpy/ndarraytypes.h:1822:
bazel-out/darwin-opt/genfiles/external/local_config_python/numpy_include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with "          "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it with " \
 ^
1 warning generated.
ERROR: /Users/phawkins/p/jax/tensorflow/tensorflow/python/BUILD:6155:1: Linking of rule '//tensorflow/python:framework/fast_tensor_util.so' failed (Exit 1)
Undefined symbols for architecture x86_64:
  "_PyBaseObject_Type", referenced from:
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_1AppendBFloat16ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_3AppendFloat16ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_5AppendFloat32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_7AppendFloat64ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_9AppendInt32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_11AppendUInt32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_13AppendInt64ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      ...
  "_PyBuffer_Release", referenced from:
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_1AppendBFloat16ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __Pyx__GetBufferAndValidate(bufferinfo*, _object*, __Pyx_TypeInfo*, int, int, int, __Pyx_BufFmt_StackElem*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_3AppendFloat16ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_5AppendFloat32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_7AppendFloat64ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_9AppendInt32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      __pyx_pw_10tensorflow_6python_9framework_16fast_tensor_util_11AppendUInt32ArrayToTensorProto(_object*, _object*, _object*) in fast_tensor_util.o
      ...
  "_PyBytes_FromStringAndSize", referenced from:
      _PyInit_fast_tensor_util in fast_tensor_util.o
  "_PyCFunction_NewEx", referenced from:
      _PyInit_fast_tensor_util in fast_tensor_util.o
  "_PyCFunction_Type", referenced from:
      __Pyx_PyObject_Append(_object*, _object*) in fast_tensor_util.o
      __Pyx_PyObject_CallOneArg(_object*, _object*) in fast_tensor_util.o
...

On bazel 0.22.0, the corresponding build action is:

  external/local_config_cc/cc_wrapper.sh -lc++ -fobjc-link-runtime -shared -o bazel-out/darwin-opt/bin/tensorflow/python/framework/fast_tensor_util.so -Wl,-force_load,bazel-out/darwin-opt/bin/tensorflow/python/_objs/framework/fast_tensor_util.so/fast_tensor_util.o -headerpad_max_install_names -no-canonical-prefixes -undefined dynamic_lookup

Note in particular the -undefined dynamic_lookup seems to have gone missing, and I think that's causing the link errors.
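A quick way to compare the two link actions is to check each command line for the flag pair. The following is a minimal POSIX shell sketch, not anything Bazel provides: `has_dynamic_lookup` is a hypothetical helper name, and it assumes the two tokens appear next to each other separated by a single space, as they do in the 0.22.0 command above.

```shell
# has_dynamic_lookup is a hypothetical helper, not part of Bazel.
# It takes one linker command line as a single string and reports
# whether "-undefined dynamic_lookup" appears in it as whole tokens.
has_dynamic_lookup() {
  # Pad with spaces so the match also works at the start or end of the line.
  case " $1 " in
    *" -undefined dynamic_lookup "*) echo "present" ;;
    *) echo "missing" ;;
  esac
}
```

Pass the whole command in quotes, for example the line printed by bazel build -s for the failing link action.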

(Unrelated: the additional -mmacosx-version-min flag is also surprising to me. Is there a way to set that minimum version when targeting an older macOS than the build host?)

What operating system are you running Bazel on?

Mac OS 10.14.3

What's the output of bazel info release?

$ bazel info release
Starting local Bazel server and connecting to it...
release 0.23.0

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

$ git remote get-url origin ; git rev-parse master ; git rev-parse HEAD
https://github.com/tensorflow/tensorflow.git
f1bbf1d83e88e044a10066ac7fe1975bf76c9c58
f1bbf1d83e88e044a10066ac7fe1975bf76c9c58

Have you found anything relevant by searching the web?

No.

Labels: P1, release blocker, team-Rules-CPP, bug

All 28 comments

I'm likely hitting the same bug on bazel 0.23.1 (see link above).

Problem persists at bazel HEAD (a24b5f389e54d631cfc91ae55b4001e01aa38931).

I can confirm that adding -undefined dynamic_lookup makes the build succeed (at least for @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so), though of course that isn't much of a surprise. Trying a full TensorFlow build now. Here's the hack I used to add -undefined dynamic_lookup:

diff --git a/tools/osx/crosstool/cc_toolchain_config.bzl.tpl b/tools/osx/crosstool/cc_toolchain_config.bzl.tpl
index 0a1d49986d..603d0ee25b 100644
--- a/tools/osx/crosstool/cc_toolchain_config.bzl.tpl
+++ b/tools/osx/crosstool/cc_toolchain_config.bzl.tpl
@@ -5379,7 +5379,7 @@ def _impl(ctx):
                     ACTION_NAMES.objc_executable,
                     ACTION_NAMES.objcpp_executable,
                 ],
-                flag_groups = [flag_group(flags = ["-headerpad_max_install_names"])],
+                flag_groups = [flag_group(flags = ["-undefined", "dynamic_lookup", "-headerpad_max_install_names"])],
                 with_features = [with_feature_set(not_features = [
                     "bitcode_embedded",
                     "bitcode_embedded_markers",

The above hack to add -undefined dynamic_lookup seems to make TensorFlow work.

@hlopko This sounds like an issue with cpp rules and crosstool.

Definitely C++ rules bug. I think it was introduced in January (https://github.com/bazelbuild/bazel/commit/2d0e27e8bc7452758e8f50b51fd470efb8111e1f), will verify once I have my mac with me. I'm surprised this was not reported before (and not covered by any of our tests :( ).

I can confirm the bug! I first thought it had something to do with Xcode.

So the workaround is to run Bazel with the BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 environment variable set. This bug only affects the toolchain that Bazel selects when it detects Xcode, which handles both C++ and ObjC. If you use the C++-only toolchain, it will pass -undefined dynamic_lookup as expected.
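As a rough sketch of that workaround, the variable can be set for a single invocation from the shell. `build_cpp_only` and `DRY_RUN` are hypothetical names introduced here for illustration; only BAZEL_USE_CPP_ONLY_TOOLCHAIN and the target label come from this thread.

```shell
# build_cpp_only is a hypothetical wrapper, not a Bazel feature.
# It runs `bazel build` with BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 set for that one call.
# With DRY_RUN=1 (also hypothetical) it only prints the command it would run.
build_cpp_only() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 bazel build $*"
  else
    BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 bazel build "$@"
  fi
}
```

For example, `build_cpp_only -s //tensorflow/python:framework_fast_tensor_util` should show a link command that again ends in -undefined dynamic_lookup if the C++-only toolchain was picked up. As noted in the thread, this is not expected to help builds that contain ObjC.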

Given there is a workaround that works somewhat (it doesn't when somebody has a build with ObjC), is the cherrypick into 0.24 or patch release of 0.23 warranted @katre?

Adding release blocker label to increase visibility, but feel free to remove and close this issue when you decide not to cherrypick. Thanks!

Is this error Tensorflow-specific, or for all macOS builds of c++/objc?

Nope, affects all c++ builds on mac that involve shared libraries with undefined symbols (see the test in https://github.com/bazelbuild/bazel/commit/314cf1f9e4b332955c4800b2451db4e926c3e092).

I will cherrypick this into 0.24.0 (once the other release-blocker is fixed).

To use BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 as an environment variable does not work for me.

So it still fails? Can you share the generated command (run Bazel with -s)?

Yes, theoretically I could, but it outputs a quite unreadable huge mess. I ran bazel build -s --config=opt //tensorflow/tools/pip_package:build_pip_package

@rudolfwilliam What if you just build @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so as I did above?

@girving Thank you for the advice, but it does not work either! Maybe I am just missing something.

I just came across this bug when building the pip for Tensorflow Federated. First time using Bazel.

@girving How would I apply the -undefined dynamic_lookup option in the bazel build? I realize that this is a real noob question.

@rudolfwilliam The point of building less is that you get a smaller -s output, not that it works.
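To shrink the -s output further, it can be filtered down to the shared-library link invocations. This is a minimal sketch under two assumptions taken from the logs in this thread: the compiler wrapper is named cc_wrapper.sh, and the link action passes -shared. `link_lines` is a hypothetical helper name.

```shell
# link_lines is a hypothetical helper, not part of Bazel.
# It reads `bazel build -s` output on stdin and keeps only the lines that
# invoke cc_wrapper.sh with -shared, i.e. the shared-library link commands.
link_lines() {
  grep 'cc_wrapper\.sh' | grep -- ' -shared '
}
```

A possible use is `bazel build -s <target> 2>&1 | link_lines`, where the `2>&1` assumes the subcommand log is written to stderr.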

Ok, yes you are right, the output gets a lot smaller. But I really do not know what all that means:

Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Analysed target @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so (15 packages loaded, 142 targets configured).
INFO: Found 1 target...
SUBCOMMAND: # @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so [action 'Linking external/protobuf_archive/python/google/protobuf/internal/_api_implementation.so']
(cd /private/var/tmp/_bazel_klaus-rudolfkladny/e1bca1ffcaa1597d473e8195dd072d14/execroot/org_tensorflow && \
  exec env - \
    APPLE_SDK_PLATFORM=MacOSX \
    APPLE_SDK_VERSION_OVERRIDE=10.14 \
    PATH='/anaconda3/bin:/Users/klaus-rudolfkladny/.cargo/bin:/Library/Frameworks/Python.framework/Versions/3.6/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/usr/local/share/dotnet:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Applications/Xamarin Workbooks.app/Contents/SharedSupport/path-bin' \
    PYTHON_BIN_PATH=/anaconda3/bin/python \
    PYTHON_LIB_PATH=/anaconda3/lib/python3.6/site-packages \
    TF_DOWNLOAD_CLANG=1 \
    TF_NEED_CUDA=0 \
    TF_NEED_OPENCL_SYCL=0 \
    TF_NEED_ROCM=0 \
    XCODE_VERSION_OVERRIDE=10.1.0 \
  external/local_config_download_clang/cc_wrapper.sh -lc++ -fobjc-link-runtime -shared -o bazel-out/darwin-opt/bin/external/protobuf_archive/python/google/protobuf/internal/_api_implementation.so -Wl,-force_load,bazel-out/darwin-opt/bin/external/protobuf_archive/_objs/python/google/protobuf/internal/_api_implementation.so/api_implementation.o -headerpad_max_install_names -no-canonical-prefixes '-mmacosx-version-min=10.14')

After that I receive the usual error.

@rudolfwilliam that log suggests that you're still using the C++ + ObjC toolchain. What is your bazel version? Did you run bazel as BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 bazel build @protobuf_archive//:python/google/protobuf/internal/_api_implementation.so?

TensorFlow is also currently broken on macos by this issue:
https://buildkite.com/bazel/tensorflow/builds/2410#5888e757-f673-4295-a3cf-922ae3d7026d

Is this going to be fixed in 0.24.0?

Yes, I am waiting for a fix for another release-blocker before creating the next candidate.

@hlopko I am using bazel version 0.23.2 which apparently is the most recent version. Yes, I set BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 as an environment variable.

Then I have no idea what's going on. I cannot reproduce this myself. 0.24 will be released today-ish, and it has a fix. So let's close this issue, and if your issue persists in 0.24, pls create a new issue. Thank you all!

Please follow #6968 for status on the release of 0.24.0. Note that a new release blocker has been identified.

Also note that you can (and should!) test your use case with the 0.24.0rc7 that was sent out a few days ago: https://releases.bazel.build/0.24.0/rc7/index.html
