I'm trying to build TensorFlow with Intel MKL on macOS, which depends on the precompiled Intel MKL library. The build fails with dyld: Library not loaded: @rpath/libmklml.dylib, which is that third-party library. I set up the rule according to the documentation, as follows:
# third_party/mkl/BUILD
cc_library(
    name = "intel_binary_blob",
    srcs = [
        "@mkl//:libmklml",
        "@mkl//:libiomp5",
    ],
    hdrs = ["@mkl//:mkl_headers"],
    strip_include_prefix = "/external/mkl/include",
    visibility = ["//visibility:public"],
)
where mkl_headers, libmklml and libiomp5 are, respectively:
# third_party/mkl/mkl.BUILD
# Note: This file will be symlinked into the download location
filegroup(
    name = "mkl_headers",
    srcs = glob(["include/*"]),
    visibility = ["//visibility:public"],
)
load("@org_tensorflow//tensorflow:tensorflow.bzl",
     "if_darwin",
     "if_linux_x86_64")
filegroup(
    name = "libmklml",
    srcs = if_darwin(["lib/libmklml.dylib"])
         + if_linux_x86_64(["lib/libmklml_intel.so"]),
    visibility = ["//visibility:public"],
)
filegroup(
    name = "libiomp5",
    srcs = if_darwin(["lib/libiomp5.dylib"])
         + if_linux_x86_64(["lib/libiomp5.so"]),
    visibility = ["//visibility:public"],
)
The same build works on Linux. With macOS's SIP restrictions in mind, I replaced the compiler with the latest LLVM from Homebrew and the default Python with Homebrew's version, but neither change helped.
Error log:
dyld: Library not loaded: @rpath/libmklml.dylib
Referenced from: /private/var/tmp/_bazel_shilei/fe2ffd3189bc27eabf3757c054a2a694/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/gen_parsing_ops_py_wrappers_cc
Reason: image not found
The easiest way to reproduce the problem is to execute the following command on macOS:
bazel build --copt=-DINTEL_MKL_DNN --config=mkl -c opt //tensorflow/tools/pip_package:build_pip_package
macOS 10.13.13
bazel info release: release 0.11.0-homebrew
If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel: brew install bazel
Nope.
ERROR: tensorflow/tensorflow/python/BUILD:1429:1: Executing genrule //tensorflow/python:parsing_ops_pygenrule failed (Aborted): bash failed: error executing command /bin/bash bazel-out/darwin-py3-opt/genfiles/tensorflow/python/parsing_ops_pygenrule.genrule_script.sh
dyld: Library not loaded: @rpath/libmklml.dylib
Referenced from: /private/var/tmp/_bazel_shilei/fe2ffd3189bc27eabf3757c054a2a694/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/gen_parsing_ops_py_wrappers_cc
Reason: image not found
bazel-out/darwin-py3-opt/genfiles/tensorflow/python/parsing_ops_pygenrule.genrule_script.sh: line 2: 21161 Abort trap: 6
@mhlopko : could you please confirm that this is an issue with the C++ rules, and if so, comment on priority / best person to work on it / ETA?
Hi @laszlocsomor, to check whether the issue is in the rules, I built a toy project that depends only on the external library. Here is the project's file structure:
test_bazel
├── BUILD
├── WORKSPACE
├── main.cc
├── main.hpp
├── third_party
│   ├── BUILD
│   ├── mkl
│   │   ├── BUILD
│   │   ├── LICENSE
│   │   ├── build_defs.bzl
│   │   └── mkl.BUILD
│   └── repo.bzl
└── workspace.bzl
The contents of the relevant files are:
// main.cc
#include "main.hpp"
int main() {
  func();
  return 0;
}
// main.hpp
#include "i_malloc.h" // Header from the external precompiled MKL library
#include <iostream>
void func() {
  std::cout << "Hello World from Bazel." << std::endl;
}
# BUILD
cc_binary(
    name = "main",
    srcs = [
        "main.cc",
        "main.hpp",
    ],
    deps = [
        "//third_party/mkl:intel_binary_blob",
    ],
)
where //third_party/mkl:intel_binary_blob is the same as in my first post.
Building the project with bazel build //:main succeeds:
test_bazel > bazel build //:main
WARNING: /Users/shilei/Documents/codes/test_bazel/third_party/mkl/BUILD:15:12: in srcs attribute of cc_library rule //third_party/mkl:intel_binary_blob: please do not import '@mkl//:lib/libmklml.dylib' directly. You should either move the file to this package or depend on an appropriate rule there
WARNING: /Users/shilei/Documents/codes/test_bazel/third_party/mkl/BUILD:15:12: in srcs attribute of cc_library rule //third_party/mkl:intel_binary_blob: please do not import '@mkl//:lib/libiomp5.dylib' directly. You should either move the file to this package or depend on an appropriate rule there
WARNING: /Users/shilei/Documents/codes/test_bazel/third_party/mkl/BUILD:13:1: in linkstatic attribute of cc_library rule //third_party/mkl:intel_binary_blob: setting 'linkstatic=1' is recommended if there are no object files
INFO: Analysed target //:main (0 packages loaded).
INFO: Found 1 target...
Target //:main up-to-date:
bazel-bin/main
INFO: Elapsed time: 1.173s, Critical Path: 1.00s
INFO: Build completed successfully, 3 total actions
But when I tried to run the generated binary, it failed with the same "Library not loaded" error:
bazel-bin > ./main
dyld: Library not loaded: @rpath/libmklml.dylib
Referenced from: /private/var/tmp/_bazel_shilei/197de9994dc99e0d9bcfa4519b2bf57e/execroot/test_bazel/bazel-out/darwin-fastbuild/bin/./main
Reason: image not found
Abort trap: 6
However, there were some warnings I hadn't noticed before: please do not import '@mkl//:lib/libiomp5.dylib' directly. You should either move the file to this package or depend on an appropriate rule there. As you can see in my rules, these two files are imported through filegroup rules:
filegroup(
    name = "libmklml",
    srcs = if_darwin(["lib/libmklml.dylib"])
         + if_linux_x86_64(["lib/libmklml_intel.so"]),
    visibility = ["//visibility:public"],
)
filegroup(
    name = "libiomp5",
    srcs = if_darwin(["lib/libiomp5.dylib"])
         + if_linux_x86_64(["lib/libiomp5.so"]),
    visibility = ["//visibility:public"],
)
As before, these rules work on Linux, and the generated binary runs there.
I also tried to fix the warning by moving the cc_library rule into mkl.BUILD and pointing the srcs and hdrs attributes directly at the files, for example hdrs = glob(["include/*"]). More precisely, the content of mkl.BUILD is:
# mkl.BUILD
filegroup(
    name = "LICENSE",
    srcs = [
        "license.txt",
    ],
    visibility = ["//visibility:public"],
)
cc_library(
    name = "intel_binary_blob",
    srcs = [
        "lib/libmklml.dylib",
        "lib/libiomp5.dylib",
    ],
    hdrs = glob(["include/*"]),
    strip_include_prefix = "include",
    visibility = ["//visibility:public"],
)
# BUILD of this project
cc_binary(
    name = "main",
    srcs = [
        "main.cc",
        "main.hpp",
    ],
    deps = [
        "@mkl//:intel_binary_blob",
    ],
)
This fails too. So I wonder whether there is an issue in how the cc_library rule handles a third-party _precompiled_ library on macOS.
@tianshilei1992 , thank you for such a detailed bug report!
I started looking at it, but I noticed something and wanted to point it out, in the hope that it happens to be the solution: have you looked at the cc_import rule to import the .so/.dylib instead of using a cc_library?
@tianshilei1992 , I have good news and bad news.
The somewhat good news is that this is a known issue. The bad news is, as @mhlopko tells me, that fixing it is going to take more time.
The other somewhat good news is that I reduced your example to an even smaller one: https://github.com/laszlocsomor/projects/commit/c802b5953c4ca9046d1023ba59b4e6bcfb384ce2
@laszlocsomor , thanks for letting me know. That's sad to hear.
Thanks for reducing the example. My original goal was to build the code on different platforms with the same command, which is why the conditional statements are there.
Hi @laszlocsomor ,
I have run into a situation similar to the one described above.
Is there a rough idea of when to expect a fix from @mhlopko ? And if it will take a while, could a patch be provided that works around the problem in the meantime?
Thanks in advance!
@tianshilei1992 I expect this is caused by how @rpath works on darwin; I think we just have to patch the binary so it finds dylibs properly. https://github.com/bazelbuild/bazel/blob/master/tools/cpp/osx_cc_wrapper.sh.tpl shows what we already do. But I cannot repro because of unsupported option '-fopenmp'. Do you build with clang or gcc? Which version? How did you install it on Mac?
Also, can you try with Bazel built at HEAD? My latest fix to osx_cc_wrapper is not yet released (https://github.com/bazelbuild/bazel/commit/f98a7a2fedb3e714cef1038dcb85f83731150246), maybe you'll be lucky :) You can also take a look at https://github.com/bazelbuild/bazel/issues/4594, https://github.com/bazelbuild/bazel/issues/3811 and https://github.com/bazelbuild/bazel/issues/1576, which might be duplicates of this one.
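For readers unfamiliar with @rpath, here is a minimal Python sketch of the lookup being discussed. It is a simplified model, not dyld's actual implementation: it only substitutes each rpath entry for the @rpath prefix, expands @loader_path and @executable_path, and ignores DYLD_* environment variables and other fallbacks. The function name expand_rpath_candidates and the example paths are hypothetical.

```python
import posixpath


def expand_rpath_candidates(install_name, rpaths, loader_dir, executable_dir=None):
    """Return the paths that would be tried for an install name.

    install_name   -- e.g. "@rpath/libmklml.dylib" as recorded in the binary
    rpaths         -- the binary's rpath entries, in order
    loader_dir     -- directory of the binary that references the library
    executable_dir -- directory of the main executable (defaults to loader_dir)
    """
    if executable_dir is None:
        executable_dir = loader_dir

    prefix = "@rpath/"
    if not install_name.startswith(prefix):
        # Not rpath-relative: there is only one place to look.
        return [install_name]

    leaf = install_name[len(prefix):]
    candidates = []
    for entry in rpaths:
        # rpath entries may themselves be relative to the loading binary.
        entry = entry.replace("@loader_path", loader_dir)
        entry = entry.replace("@executable_path", executable_dir)
        candidates.append(posixpath.normpath(posixpath.join(entry, leaf)))
    return candidates


if __name__ == "__main__":
    # Hypothetical rpath entry and binary directory, for illustration only.
    for path in expand_rpath_candidates(
        "@rpath/libmklml.dylib",
        ["@loader_path/_solib_darwin/mkl"],
        "/tmp/bazel-bin",
    ):
        print(path)
```

In this model, a binary with no rpath entries yields no candidates at all, which is consistent with the "image not found" error reported above.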
@laszlocsomor I'm not sure if your repo is a repro of this issue. The problem you observe is that you renamed libmylib-macos.dylib to something else, and main-with-lib won't find it. If you rename it back, all is green. Or is that what's happening with libmkl too?
@mhlopko We build with clang. You can install it on Mac with
brew install llvm and brew install libomp, and then set the environment variables CC=/path/to/new/clang and CXX=/path/to/new/clang++.
In the meantime, can you try with Bazel built from HEAD? See "Compiling bazel" at https://www.bazel.build/contributing.html.
Hi @mhlopko , I built the latest Bazel (HEAD->master) using bazel build //src:bazel and used the generated binary in bazel-bin/src/bazel to build TensorFlow, but that build failed with error: unknown target CPU 'armv7-a', even though I was building on macOS. I also tried to build the toy project mentioned above with the generated binary through bazel/bazel-bin/src/bazel build //:main; unfortunately, it failed too. The detailed error output follows.
test_bazel > ../../bazel/bazel-bin/src/bazel build //:main --verbose_failures
INFO: Analysed target //:main (0 packages loaded).
INFO: Found 1 target...
ERROR: /Users/shilei/Documents/codes/test_bazel/BUILD:1:1: Linking of rule '//:main' failed (Exit 1): cc_wrapper.sh failed: error executing command
(cd /private/var/tmp/_bazel_shilei/197de9994dc99e0d9bcfa4519b2bf57e/execroot/test_bazel && \
exec env - \
PWD=/proc/self/cwd \
external/local_config_cc/cc_wrapper.sh -o bazel-out/darwin-fastbuild/bin/main -Wl,-rpath,@loader_path/_solib_darwin/_U_S_Sthird_Uparty_Smkl_Clibiomp5___Uexternal_Smkl_Slib -Wl,-rpath,@loader_path/_solib_darwin/_U_S_Sthird_Uparty_Smkl_Clibmklml___Uexternal_Smkl_Slib -Lbazel-out/darwin-fastbuild/bin/_solib_darwin/_U_S_Sthird_Uparty_Smkl_Clibiomp5___Uexternal_Smkl_Slib -Lbazel-out/darwin-fastbuild/bin/_solib_darwin/_U_S_Sthird_Uparty_Smkl_Clibmklml___Uexternal_Smkl_Slib bazel-out/darwin-fastbuild/bin/_objs/main/main.pic.o -liomp5 -lmklml -undefined dynamic_lookup -headerpad_max_install_names -lstdc++ -lm -Wl,-S)
Use --sandbox_debug to see verbose messages from the sandbox
clang-5.0: error: unable to execute command: Executable "ld" doesn't exist!
clang-5.0: error: linker command failed with exit code 1 (use -v to see invocation)
Target //:main failed to build
INFO: Elapsed time: 0.226s, Critical Path: 0.05s
FAILED: Build did NOT complete successfully
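The link command in the log above already carries -Wl,-rpath,... arguments. As a rough aid for comparing a working and a failing link line, here is a small Python sketch that extracts those entries. The helper name rpath_entries is hypothetical, and it handles only the single-argument -Wl,-rpath,<dir> form that appears in this log, not other spellings of the flag.

```python
import shlex


def rpath_entries(link_command):
    """Return the directories passed as -Wl,-rpath,<dir> in a link command."""
    flag = "-Wl,-rpath,"
    entries = []
    for arg in shlex.split(link_command):
        if arg.startswith(flag):
            entries.append(arg[len(flag):])
    return entries


if __name__ == "__main__":
    # Shortened, hypothetical command line modelled on the log above.
    command = (
        "external/local_config_cc/cc_wrapper.sh -o bazel-out/bin/main "
        "-Wl,-rpath,@loader_path/_solib_darwin/example "
        "-Lbazel-out/bin/_solib_darwin/example main.pic.o -lmklml"
    )
    for entry in rpath_entries(command):
        print(entry)
```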
I don't know whether my Bazel build is correct, because I didn't bootstrap it first. According to https://docs.bazel.build/versions/master/install-compile-source.html#compiling-bazel-from-source-bootstrapping, I should first run compile.sh with the release version and then build the development version. However, an error occurred when I tried to build the release version.
ERROR: /Users/shilei/Documents/bazel-0/src/main/java/com/google/devtools/build/lib/BUILD:1177:1: Executing genrule //src/main/java/com/google/devtools/build/lib:merge_licenses failed (Exit 126): bash failed: error executing command
(cd /tmp/bazel_ZrLMMUQs/out/execroot/io_bazel && \
exec env - \
PATH=/usr/local/opt/python/bin:/usr/local/Cellar/llvm/5.0.1/bin:/Users/shilei/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin \
/bin/bash bazel-out/darwin-opt/genfiles/src/main/java/com/google/devtools/build/lib/merge_licenses.genrule_script.sh)
xargs: src/main/java/com/google/devtools/build/lib/merge_licenses.sh: Permission denied
Target //src:bazel failed to build
INFO: Elapsed time: 272.270s, Critical Path: 66.37s
FAILED: Build did NOT complete successfully
This is super strange. Please verify the following:
bazel clean --expunge && bazel build //tensorflow:foo
works with bazel 0.11.0, and fails with bazel@HEAD. That would be very strange because we have tensorflow on our CI and it's green currently. No local changes? No extra options passed to bazel?
The same info would be useful for your toy project. The error message says that your C++ toolchain is not autodetected properly, but I don't remember any big changes there. Did you by chance upgrade clang recently? If you run bazel clean --expunge && bazel build //:main with released bazel, does it work? Can you send me contents of working and failing $execroot/external/local_config_cc? Do you have xcode installed? If yes, does anything change when exporting BAZEL_USE_CPP_ONLY_TOOLCHAIN=1 environment variable?
The bootstrap procedure is only needed when you don't already have Bazel:
You can build Bazel from source without using an existing Bazel binary by doing the following:
If you have bazel, then bazel build //src:bazel is the way to build.
Thanks!
@mhlopko Yes, released Bazel can build the toy project, but only build it: the generated binary fails to run because of the @rpath problem.
Xcode is not installed. I use the clang installed via Homebrew (brew install llvm) to support the OpenMP flags. The relevant environment variables CC and CXX are set correctly:
CC=/usr/local/Cellar/llvm/5.0.1/bin/clang
CXX=/usr/local/Cellar/llvm/5.0.1/bin/clang++
When using my built bazel, the contents of $execroot/external/local_config_cc are:
test_bazel > ll /private/var/tmp/_bazel_shilei/197de9994dc99e0d9bcfa4519b2bf57e/external/local_config_cc/
total 136
drwxr-xr-x 9 shilei wheel 306 Mar 26 15:41 ./
drwxr-xr-x 8 shilei wheel 272 Mar 26 15:41 ../
-rwxr-xr-x 1 shilei wheel 3085 Mar 26 15:41 BUILD*
-rwxr-xr-x 1 shilei wheel 29272 Mar 26 15:41 CROSSTOOL*
-rw-r--r-- 1 shilei wheel 111 Mar 26 15:41 WORKSPACE
-rwxr-xr-x 1 shilei wheel 3283 Mar 26 15:41 cc_wrapper.sh*
lrwxr-xr-x 1 shilei wheel 114 Mar 26 15:41 dummy_toolchain.bzl@ -> /private/var/tmp/_bazel_shilei/197de9994dc99e0d9bcfa4519b2bf57e/external/bazel_tools/tools/cpp/dummy_toolchain.bzl
drwxr-xr-x 3 shilei wheel 102 Mar 26 15:41 tools/
-rwxr-xr-x 1 shilei wheel 19276 Mar 26 15:41 xcode-locator-bin*
UPDATE: If I unset these two environment variables, the toy project builds with my self-built Bazel and the generated binary runs correctly. I think this at least shows that the @rpath problem has been fixed.
As for the TensorFlow build, I ran $PATH/TO/BUILT/bazel build --copt=-DINTEL_MKL_DNN --config=mkl -c opt //tensorflow/tools/pip_package:build_pip_package to build the MKL-DNN version. The full log is:
(tf_mkldnn) dl_framework-intel_tensorflow > /Users/shilei/Documents/bazel/bazel-bin/src/bazel build --copt=-DINTEL_MKL_DNN --config=mkl -c opt //tensorflow/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
..............
DEBUG: /Users/shilei/Documents/codes/dl_frameworks/dl_framework-intel_tensorflow/tensorflow/workspace.bzl:54:5:
Current Bazel is not a release version, cannot check for compatibility.
DEBUG: /Users/shilei/Documents/codes/dl_frameworks/dl_framework-intel_tensorflow/tensorflow/workspace.bzl:55:5: Make sure that you are running at least Bazel 0.5.4.
WARNING: /private/var/tmp/_bazel_shilei/fe2ffd3189bc27eabf3757c054a2a694/external/protobuf_archive/WORKSPACE:1: Workspace name in /private/var/tmp/_bazel_shilei/fe2ffd3189bc27eabf3757c054a2a694/external/protobuf_archive/WORKSPACE (@com_google_protobuf) does not match the name given in the repository's definition (@protobuf_archive); this will cause a build error in future versions
WARNING: /Users/shilei/Documents/codes/dl_frameworks/dl_framework-intel_tensorflow/third_party/mkl/BUILD:15:12: in srcs attribute of cc_library rule //third_party/mkl:intel_binary_blob: please do not import '@mkl//:lib/libmklml.dylib' directly. You should either move the file to this package or depend on an appropriate rule there
WARNING: /Users/shilei/Documents/codes/dl_frameworks/dl_framework-intel_tensorflow/third_party/mkl/BUILD:15:12: in srcs attribute of cc_library rule //third_party/mkl:intel_binary_blob: please do not import '@mkl//:lib/libiomp5.dylib' directly. You should either move the file to this package or depend on an appropriate rule there
WARNING: /Users/shilei/Documents/codes/dl_frameworks/dl_framework-intel_tensorflow/third_party/mkl/BUILD:13:1: in linkstatic attribute of cc_library rule //third_party/mkl:intel_binary_blob: setting 'linkstatic=1' is recommended if there are no object files
WARNING: /Users/shilei/Documents/codes/dl_frameworks/dl_framework-intel_tensorflow/tensorflow/core/BUILD:1865:1: in includes attribute of cc_library rule //tensorflow/core:framework_headers_lib: '../../external/nsync/public' resolves to 'external/nsync/public' not below the relative path of its package 'tensorflow/core'. This will be an error in the future. Since this rule was created by the macro 'cc_header_only_library', the error might have been caused by the macro implementation in /Users/shilei/Documents/codes/dl_frameworks/dl_framework-intel_tensorflow/tensorflow/tensorflow.bzl:1163:30
WARNING: /private/var/tmp/_bazel_shilei/fe2ffd3189bc27eabf3757c054a2a694/external/grpc/WORKSPACE:1: Workspace name in /private/var/tmp/_bazel_shilei/fe2ffd3189bc27eabf3757c054a2a694/external/grpc/WORKSPACE (@com_github_grpc_grpc) does not match the name given in the repository's definition (@grpc); this will cause a build error in future versions
WARNING: /Users/shilei/Documents/codes/dl_frameworks/dl_framework-intel_tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:exporter': No longer supported. Switch to SavedModel immediately.
WARNING: /Users/shilei/Documents/codes/dl_frameworks/dl_framework-intel_tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:gc': No longer supported. Switch to SavedModel immediately.
INFO: Analysed target //tensorflow/tools/pip_package:build_pip_package (259 packages loaded).
INFO: Found 1 target...
ERROR: /private/var/tmp/_bazel_shilei/fe2ffd3189bc27eabf3757c054a2a694/external/jpeg/BUILD:269:1: C++ compilation of rule '@jpeg//:simd_armv7a' failed (Exit 1)
error: unknown target CPU 'armv7-a'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 68.472s, Critical Path: 3.18s
FAILED: Build did NOT complete successfully
Now the interesting thing is why the target CPU was detected as armv7-a in the last build.
Hi @laszlocsomor @mhlopko
I noticed that version 0.12 has been released, and I gave it a try. The good news is that Bazel 0.12 seems to fix the @rpath issue on macOS. Our project builds with the latest release.
Hi @tianshilei1992
Did you figure out why the target architecture was detected as armv7-a? I am running into a similar issue (on bazel 0.12) and scratching my head. Feel free to contact me elsewhere if you want to continue this conversation outside of the issue...
I had the same problem; I think it's a bug in Bazel 0.12.0. I solved it by installing version 0.11.1 here.
Hi @FrederickGeek8 @csarron , I don't know how you installed Bazel, but the wrong platform detection didn't occur with the released version, which I installed with Homebrew through brew install bazel. My guess is that you built from source, because I hit that problem when I built from source before 0.12 was released.
@tianshilei1992 I installed it from homebrew by brew upgrade bazel and it caused the problem. Anyway, if the release version works, it's not a big issue.