We are having a few issues building with GPU support on the latest HEAD. We initially got past some local_config_cuda errors by pinning to yesterday's latest head (https://github.com/tensorflow/tensorflow/commit/a5f8f42), but we are now receiving the error below when attempting to build.
Based on CUDA 7.5, cuDNN 5.1.3.
Bazel 0.3.1
# env
CUDA_HOME=/usr/local/cuda
CUDA_PATH=/usr/local/cuda-7.5
CUDA_TOOLKIT_PATH=/usr/local/cuda-7.5
CUDA_VERSION=7.5
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-7.5/targets/x86_64-linux/lib:/usr/lib/x86_64-linux-gnu/
LIBRARY_PATH=/usr/local/cuda/lib64/stubs:
TF_CUDA_COMPUTE_CAPABILITIES=3.0
TF_CUDA_VERSION=7.5
TF_CUDNN_VERSION=5.1.3
TF_NEED_CUDA=1
bazel build -c opt --config=cuda --verbose_failures tensorflow_serving/...
INFO: Reading 'startup' options from /root/.bazelrc: --batch
____Loading package: tensorflow_serving/servables/tensorflow/testdata
____Loading...
Unhandled exception thrown during build; message: Unrecoverable error while evaluating node 'CONFIGURATION_FRAGMENT:com.google.devtools.build.lib.skyframe.ConfigurationFragmentValue$ConfigurationFragmentKey@8db88dd6' (requested by nodes 'CONFIGURATION_COLLECTION:com.google.devtools.build.lib.skyframe.ConfigurationCollectionValue$ConfigurationCollectionKey@54fe1e0', 'CONFIGURATION_FRAGMENT:com.google.devtools.build.lib.skyframe.ConfigurationFragmentValue$ConfigurationFragmentKey@e90862b7', 'CONFIGURATION_FRAGMENT:com.google.devtools.build.lib.skyframe.ConfigurationFragmentValue$ConfigurationFragmentKey@ff24d609')
____Elapsed time: 1.529s
java.lang.RuntimeException: Unrecoverable error while evaluating node 'CONFIGURATION_FRAGMENT:com.google.devtools.build.lib.skyframe.ConfigurationFragmentValue$ConfigurationFragmentKey@8db88dd6' (requested by nodes 'CONFIGURATION_COLLECTION:com.google.devtools.build.lib.skyframe.ConfigurationCollectionValue$ConfigurationCollectionKey@54fe1e0', 'CONFIGURATION_FRAGMENT:com.google.devtools.build.lib.skyframe.ConfigurationFragmentValue$ConfigurationFragmentKey@e90862b7', 'CONFIGURATION_FRAGMENT:com.google.devtools.build.lib.skyframe.ConfigurationFragmentValue$ConfigurationFragmentKey@ff24d609')
at com.google.devtools.build.skyframe.ParallelEvaluator$Evaluate.run(ParallelEvaluator.java:1070)
at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:474)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: com.google.devtools.build.lib.packages.NoSuchTargetException: no such target '@org_tensorflow//third_party/gpus/crosstool:crosstool': target 'crosstool' not declared in package 'third_party/gpus/crosstool' defined by /root/.cache/bazel/_bazel_root/1b03e6b0b95a8320062041ca0659e00e/external/org_tensorflow/third_party/gpus/crosstool/BUILD
at com.google.devtools.build.lib.rules.cpp.CrosstoolConfigurationLoader.getCrosstoolProtofromBuildFile(CrosstoolConfigurationLoader.java:179)
at com.google.devtools.build.lib.rules.cpp.CrosstoolConfigurationLoader.findCrosstoolConfiguration(CrosstoolConfigurationLoader.java:239)
at com.google.devtools.build.lib.rules.cpp.CrosstoolConfigurationLoader.readCrosstool(CrosstoolConfigurationLoader.java:281)
at com.google.devtools.build.lib.rules.cpp.CppConfigurationLoader.createParameters(CppConfigurationLoader.java:128)
at com.google.devtools.build.lib.rules.cpp.CppConfigurationLoader.create(CppConfigurationLoader.java:73)
at com.google.devtools.build.lib.rules.cpp.CppConfigurationLoader.create(CppConfigurationLoader.java:48)
at com.google.devtools.build.lib.skyframe.ConfigurationFragmentFunction.compute(ConfigurationFragmentFunction.java:78)
at com.google.devtools.build.skyframe.ParallelEvaluator$Evaluate.run(ParallelEvaluator.java:1016)
... 4 more
Caused by: com.google.devtools.build.lib.packages.NoSuchTargetException: no such target '@org_tensorflow//third_party/gpus/crosstool:crosstool': target 'crosstool' not declared in package 'third_party/gpus/crosstool' defined by /root/.cache/bazel/_bazel_root/1b03e6b0b95a8320062041ca0659e00e/external/org_tensorflow/third_party/gpus/crosstool/BUILD
at com.google.devtools.build.lib.packages.Package.makeNoSuchTargetException(Package.java:559)
at com.google.devtools.build.lib.packages.Package.getTarget(Package.java:543)
at com.google.devtools.build.lib.skyframe.SkyframePackageLoaderWithValueEnvironment.getTarget(SkyframePackageLoaderWithValueEnvironment.java:71)
at com.google.devtools.build.lib.skyframe.ConfigurationFragmentFunction$ConfigurationBuilderEnvironment.getTarget(ConfigurationFragmentFunction.java:193)
at com.google.devtools.build.lib.rules.cpp.CrosstoolConfigurationLoader.getCrosstoolProtofromBuildFile(CrosstoolConfigurationLoader.java:177)
... 11 more
It seems like the crosstool/BUILD file is empty and the CROSSTOOL.tpl hasn't been converted to a CROSSTOOL file.
$ ls tensorflow/third_party/gpus/crosstool/
BUILD BUILD.tpl CROSSTOOL.tpl LICENSE clang
$ cat tensorflow/third_party/gpus/crosstool/BUILD
$
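One way to spot this state quickly is to look for generated files that are empty or absent next to their .tpl templates. This is only a minimal sketch, assuming the directory layout shown in the listing above; `find_empty_generated` is a hypothetical helper name, and the demo directory is made up for illustration rather than taken from a real checkout.

```shell
#!/bin/sh
# Hypothetical helper: for every FOO.tpl in a directory, report whether the
# corresponding generated file FOO is empty or missing.
find_empty_generated() {
    dir="$1"
    for tpl in "$dir"/*.tpl; do
        [ -e "$tpl" ] || continue          # no templates in this directory
        out="${tpl%.tpl}"                  # BUILD.tpl -> BUILD
        if [ ! -e "$out" ]; then
            echo "missing: $out"
        elif [ ! -s "$out" ]; then
            echo "empty: $out"
        fi
    done
}

# Demo on a throwaway directory that mimics the listing above:
# an empty BUILD, a BUILD.tpl, and a CROSSTOOL.tpl with no CROSSTOOL.
demo_dir="$(mktemp -d)"
: > "$demo_dir/BUILD"
echo '# template' > "$demo_dir/BUILD.tpl"
echo '# template' > "$demo_dir/CROSSTOOL.tpl"
report="$(find_empty_generated "$demo_dir")"
echo "$report"
```

Against a real checkout the call would be something like `find_empty_generated tensorflow/third_party/gpus/crosstool`.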
Any insight or place to poke around would be awesome, thanks!
As a note, we are building inside a Docker container based on the one in the repository. The only differences are that the base image is FROM nvidia/cuda:7.5-cudnn5-devel and that we explicitly define the ENV variables.
I also have been able to build tensorflow from within serving without issue:
.../serving/tensorflow $ bazel build -c opt --config=cuda tensorflow/...
I'm having the exact same issues with the same versions. Also basing my image on nvidia/cuda:7.5-cudnn5-devel.
edit: issue is also present when using cuDNN 5.0.5
I have the same issue, not inside a Docker container, in a regular build with:
bazel build -c opt --config=cuda tensorflow_serving/...
Building Tensorflow standalone with GPU support works fine, but not as part of a tensorflow serving build. (Cuda 8.0 and Cudnn 5.1.5)
+cc @damienmg since the exception might not be expected.
I'll try to repro this to debug.
Indeed I don't know what's happening but we should not see a stacktrace. @davidzchen, can you file a bug report to bazel with your findings?
@bjelkenhed, is there any way to work around this? I got the same issue.
@bjelkenhed @wangyongliang I tried using a few different commits and also tried manually editing the tensorflow/third_party/gpus/crosstool/BUILD files based on what was previously generated. I couldn't get anything to work, but this is also the first time I've used Bazel. I'd also appreciate a workaround if anyone can think of one.
@wangyongliang @KellenSunderland We have not been able to find any workaround for this problem. If anyone is able to build tensorflow serving with GPU support at the moment it would be very good to know.
I had the same issue. First, to check that your TF CUDA config is correct, run cd tensorflow && bazel query 'kind(rule, @local_config_cuda//...)' --output label_kind, which should produce:
config_setting rule @local_config_cuda//cuda:using_nvcc
config_setting rule @local_config_cuda//cuda:using_clang_opt
config_setting rule @local_config_cuda//cuda:using_clang
config_setting rule @local_config_cuda//cuda:darwin
cc_library rule @local_config_cuda//cuda:cupti_headers
cc_library rule @local_config_cuda//cuda:cupti_dsos
cc_library rule @local_config_cuda//cuda:cudart_static
cc_library rule @local_config_cuda//cuda:cuda
cc_library rule @local_config_cuda//cuda:curand
cc_library rule @local_config_cuda//cuda:cufft
cc_library rule @local_config_cuda//cuda:cudnn
cc_library rule @local_config_cuda//cuda:cudart
cc_library rule @local_config_cuda//cuda:cuda_headers
cc_library rule @local_config_cuda//cuda:cublas
cc_toolchain_suite rule @local_config_cuda//crosstool:toolchain
cc_toolchain rule @local_config_cuda//crosstool:cc-compiler-local
cc_toolchain rule @local_config_cuda//crosstool:cc-compiler-darwin
filegroup rule @local_config_cuda//crosstool:empty
Running the same command from the TensorFlow Serving repository root will fail (with errors) for two reasons:
1. The crosstool in tools/bazel.rc is invalid (AFAIK). Change @org_tensorflow//third_party/gpus/crosstool to @local_config_cuda//crosstool:toolchain.
2. The cuda_configure repository rule will fail (I haven't looked into why exactly), but essentially a bazel clean --expunge && export TF_NEED_CUDA=1 will fix this.
Then run bazel query 'kind(rule, @local_config_cuda//...)' again and all is well (for me at least); the CUDA toolchain should be created in $(bazel info output_base)/external/local_config_cuda/cuda
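The two steps above can be sketched as a small script. This is an illustration under assumptions, not a verified fix: the function names `patch_crosstool_top` and `reconfigure_cuda` are hypothetical, and the sample line written to the demo file is an assumed form of the crosstool entry in tools/bazel.rc (check your own file before running the substitution). The demo only touches a temporary file; `reconfigure_cuda` is defined but not called, since it needs Bazel and a serving checkout.

```shell
#!/bin/sh
# Step 1: point the crosstool reference at the generated
# @local_config_cuda toolchain. A .bak copy of the file is kept.
patch_crosstool_top() {
    rc_file="$1"
    sed -i.bak \
        's|@org_tensorflow//third_party/gpus/crosstool|@local_config_cuda//crosstool:toolchain|' \
        "$rc_file"
}

# Step 2 plus the verification query. Run from the serving repository root.
reconfigure_cuda() {
    bazel clean --expunge
    export TF_NEED_CUDA=1
    bazel query 'kind(rule, @local_config_cuda//...)'
}

# Demo on a throwaway file. The line below is an ASSUMED example of what
# the entry in tools/bazel.rc looks like, not a copy of the real file.
demo_rc="$(mktemp)"
printf '%s\n' 'build:cuda --crosstool_top=@org_tensorflow//third_party/gpus/crosstool' > "$demo_rc"
patch_crosstool_top "$demo_rc"
cat "$demo_rc"
```

In a real checkout this would be `patch_crosstool_top tools/bazel.rc` followed by `reconfigure_cuda`. Note that the substitution assumes the label appears without an explicit `:crosstool` suffix; if yours has one, adjust the pattern.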
@rayglover-ibm Ray, I got the following error after the change of bazel.rc
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/local_config_cuda/cuda/BUILD:52:1: no such target '//tensorflow:darwin': target 'darwin' not declared in package 'tensorflow' defined by /serving/tensorflow/BUILD and referenced by '@local_config_cuda//cuda:cudart_static'.
Any thoughts? Thanks a lot.
@denverdino Try updating the TensorFlow submodule to the latest commit. I think that darwin config condition was recently fixed by @davidzchen.
@rayglover-ibm It works like a charm! Thanks!
Thanks for looking into this, @rayglover-ibm!
FYI, the fix for the Darwin configuration condition will be merged into tensorflow upstream in tensorflow/tensorflow#4676. I am currently working on fixing the references in serving to targets in the @org_tensorflow workspace and will submit a PR for that shortly.
FYI, if anyone is looking for a solution to the issue the OP posted, @rayglover-ibm's solution works for this problem. I did as he suggested in step 1 and step 2, and this time the build succeeded.
I had the same problem, but then I reinstalled (downgraded) bazel from 4.2 to 3.2, and with @rayglover-ibm's suggested fix it started compiling. It looks like the bazel version is important too.
Hello, I tested @rayglover-ibm's solution, and it works well for the latest master of tf-serving. I am wondering why this fix has not been merged into the tf-serving repo. Is there any plan to support GPU officially? Thanks!
Closing - please see the latest Docker examples for bringing up a build environment.
@gautamvasudevan That Docker setup requires nvidia-docker to use the GPU. Sometimes we just want to run the model_server as it is, without worrying about Docker. Could you please help provide proper instructions for building a GPU-based TF Serving model server from source?