I am trying to compile tensorflow_model_server from master.
Error:
ERROR: no such target '@org_tensorflow//third_party/gpus/crosstool:crosstool': target 'crosstool' not declared in package 'third_party/gpus/crosstool' defined by /home/username/.cache/bazel/_bazel_jorge/2fd988219920b10e9ede8d3b5720f3d2/external/org_tensorflow/third_party/gpus/crosstool/BUILD.
Steps to reproduce:
cd tensorflow
./configure
cd ..
bazel build -c opt --config=cuda --genrule_strategy=standalone --spawn_strategy=standalone --verbose_failures //tensorflow_serving/model_servers:tensorflow_model_server
As a side note, when I compile tensorflow_model_server from an external project it works, but the result doesn't have GPU support.
The solutions in https://github.com/tensorflow/serving/issues/225 don't work.
EDITED: finally I made this script in order to compile it with CUDA support: https://gist.github.com/jorgemf/0f2025a45e1568663f4c20551a5881f1
I am facing the same problem.
I was able to make it compile. Here is a script to do so: https://gist.github.com/jorgemf/0f2025a45e1568663f4c20551a5881f1
You only need to modify the variables and the exports with the values you want and everything works.
It works because:
./configure doesn't export the variables, so they are not visible when compiling TensorFlow Serving
In serving/tools/bazel.rc you have to replace @org_tensorflow//third_party/gpus/crosstool with @local_config_cuda//crosstool:toolchain
serving/tensorflow/third_party/gpus/cuda_configure.bzl when it is available
-c opt --config=cuda --spawn_strategy=standalone as options to compile //tensorflow_serving/model_servers:tensorflow_model_server but it should work for other targets
@jorgemf works for me
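The point above about ./configure not exporting variables can be seen in isolation. This is a minimal sketch using a made-up variable name, DEMO_TF_VAR (not a real TensorFlow setting): a variable that is assigned but not exported stays in the current shell, so a child process such as bazel would not see it.

```shell
#!/bin/sh
# DEMO_TF_VAR is a hypothetical name used only for this illustration.
unset DEMO_TF_VAR

# Assigned but not exported: child processes do not inherit it.
DEMO_TF_VAR=1
without_export=$(sh -c 'echo "${DEMO_TF_VAR:-unset}"')

# Exported: child processes inherit it.
export DEMO_TF_VAR
with_export=$(sh -c 'echo "${DEMO_TF_VAR:-unset}"')

echo "without export: $without_export"
echo "with export:    $with_export"
```

This is presumably why the script exports each TF_* setting before calling ./configure and bazel.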
@jorgemf I got a successful compile with your script, but the result doesn't seem to have GPU support.
After adding with tf.device("/gpu") to mnist_saved_model.py, I got the following error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device to node 'Variable_1': Could not satisfy explicit device specification '/device:GPU:*' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
Steps to reproduce:
git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving
export TF_NEED_CUDA=1
export TF_NEED_GCP=1
export TF_NEED_JEMALLOC=1
export TF_NEED_HDFS=0
export TF_NEED_OPENCL=0
export TF_ENABLE_XLA=0
export TF_CUDA_VERSION=8.0
export TF_CUDNN_VERSION=5
export TF_CUDA_COMPUTE_CAPABILITIES="3.5,5.2,6.1"
export CUDA_TOOLKIT_PATH="/usr/local/cuda"
export CUDNN_INSTALL_PATH="/usr/local/cuda"
export GCC_HOST_COMPILER_PATH="/usr/bin/gcc"
export PYTHON_BIN_PATH="/home/opt/anaconda/envs/py2/bin/python"
export CC_OPT_FLAGS="-march=native"
export PYTHON_LIB_PATH="/home/opt/anaconda/envs/py2/lib/python2.7/site-packages"
cd tensorflow
./configure
cd ..
# Ref: https://github.com/tensorflow/serving/issues/318#issuecomment-283498443
sed -i.bak 's/@org_tensorflow\/\/third_party\/gpus\/crosstool/@local_config_cuda\/\/crosstool:toolchain/g' tools/bazel.rc
bazel build -c opt --config=cuda --spawn_strategy=standalone //tensorflow_serving/model_servers:tensorflow_model_server
# add `with tf.device("/gpu")` to `mnist_saved_model.py`
sed -i '138s/.*/with tf.device("\/gpu"):/' tensorflow_serving/example/mnist_saved_model.py
sed -i '139s/.*/  if __name__ == "__main__":/' tensorflow_serving/example/mnist_saved_model.py
sed -i '140s/.*/    tf.app.run()/' tensorflow_serving/example/mnist_saved_model.py
bazel build //tensorflow_serving/example:mnist_saved_model
bazel-bin/tensorflow_serving/example/mnist_saved_model /tmp/mnist_model
"/gpu" in with tf.device("/gpu") is not a device; it should be with tf.device("/gpu:0"). To check whether the binary has been compiled with GPU support, you only need to run nvidia-smi while the program is running: if it has GPU support, it will use all the GPU memory even before creating any graph.
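The nvidia-smi check can also be scripted. This is a minimal sketch, assuming a hypothetical helper named uses_gpu that reads lines of the form "pid, process_name, used_memory" on stdin and reports whether a given process name appears among them. The sample input is illustrative, not real output.

```shell
#!/bin/sh
# uses_gpu NAME: hypothetical helper. Reads CSV lines of the form
# "pid, process_name, used_memory" on stdin and succeeds (exit 0)
# if any process_name field contains NAME.
uses_gpu() {
    awk -F', *' -v name="$1" 'index($2, name) { found = 1 } END { exit !found }'
}

# Canned input with made-up values, standing in for nvidia-smi output.
sample='1234, /usr/bin/python, 10980 MiB
5678, bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server, 7600 MiB'

if printf '%s\n' "$sample" | uses_gpu tensorflow_model_server; then
    echo "tensorflow_model_server is listed as a GPU process"
else
    echo "tensorflow_model_server is not listed as a GPU process"
fi
```

On a machine with the NVIDIA driver installed, the same helper could be fed live data with: nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader | uses_gpu tensorflow_model_server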
Thanks @jorgemf for the compile script. It worked for me too, and it was a lot simpler than my solution at #349 :). However, it doesn't seem to be using the GPU for me either. tensorflow_model_server does not appear as a process listed by nvidia-smi.
My saved model does not explicitly request GPU allocation, but it should use the GPU by default, if available. And, as you say, tf-serving should allocate most GPU RAM on launch, and it clearly doesn't.
This used to work with an older version of serving, so I guess it's related to recent changes.
@vtablan have you set $TENSORFLOW_SERVING_REPO_PATH? Otherwise it might not work.
I have just tested and it doesn't compile. I am not sure whether it is my script's fault or due to some internal change. Anyway, I cannot review the script for every commit. Here is the error:
ERROR: 417f6219aa9e6aa8dd92c15ce8c78038/external/tf_serving/tensorflow_serving/batching/BUILD:141:1: C++ compilation of rule '@tf_serving//tensorflow_serving/batching:batching_session' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 174 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
Try a version from 15 days ago, the same one I used to compile it. It should work. In my experience TensorFlow Serving is under development and is broken a lot of the time.
Interesting... I had cloned my repository just before posting my previous comment, and it compiled fine for me with your script. I had edited your script to hardcode the location of the repository, the use of python3, and the associated python path. Other than that, I have made no changes to your script.
What I was saying above is that the tensorflow_model_server binary produced by the compilation does not seem to be using the GPU. I can work with that (for my model, CPU is fast enough at application time), so I don't think I'll spend more time on this. If I find a few minutes, I might try compiling the r0.5.1 version, which is meant to be a 'release', and see if that fares any better.
I'll post an update if that's successful.
@jorgemf Success - I now have a tf-model_server that does indeed use the GPU. To get there I used:
Thanks again for providing the script!
Master compiles now for me with GPU support. Closing the issue.
@jorgemf I wonder whether the script is the latest version. Below is the error:
compile_tensorflow_serving.sh: 35: compile_tensorflow_serving.sh: function: not found
/usr/local/lib/python2.7/dist-packages,/usr/lib/python2.7/dist-packages
compile_tensorflow_serving.sh: 63: compile_tensorflow_serving.sh: Syntax error: "}" unexpected
@sailor88128 It might be that you are using another shell. It works on Linux only.
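The "function: not found" and "}" unexpected messages above have the form that dash (the default /bin/sh on Ubuntu) prints when it meets the bash-only function keyword, so one thing worth checking is whether the script was started with sh rather than bash. This is a minimal sketch that uses a throwaway script rather than the real compile_tensorflow_serving.sh:

```shell
#!/bin/sh
# Write a tiny script that uses the bash-only "function" keyword.
tmp_script=$(mktemp)
cat > "$tmp_script" <<'EOF'
function greet {
    echo "hello from bash"
}
greet
EOF

# Running it explicitly with bash works regardless of what /bin/sh points to.
bash_output=$(bash "$tmp_script")
echo "$bash_output"

rm -f "$tmp_script"
```

If that is the cause, invoking the script as bash compile_tensorflow_serving.sh instead of sh compile_tensorflow_serving.sh may avoid these two errors.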
I am just using it in nvidia-docker with Ubuntu 16. Oh, I use cudnn 6.0.21, but I have changed 7.0 to 6.0 in the script. Is that the problem?
@sailor88128 Yes it is. The script is very specific because otherwise it doesn't work. You have to use the versions in the script and the correct bazel version, which I do not remember now. Otherwise it won't compile.
Oh, got it. Thanks a lot.
Is there any image (dockerfile) you used, with ubuntu 16 + tensorflow-gpu + cuda8.0 + tf serving?
@jorgemf
@sailor88128, no. I used my local machine. You can try the official images of tensorflow: https://hub.docker.com/r/tensorflow/tensorflow/tags/