I have been trying to set up tensorflow_serving in a Docker container for the past day or so with no luck.
I followed the instructions very carefully:
docker build --pull -t $USER/tensorflow-serving-devel -f Dockerfile.devel .
docker run -it $USER/tensorflow-serving-devel
git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving/tensorflow
./configure
cd ..
bazel test tensorflow_serving/...
But I always end up here:
[4,197 / 6,007] Still waiting for 200 jobs to complete:
Running (standalone):
Compiling external/org_tensorflow/tensorflow/core/kernels/argmax_op.cc\
[for host], 996 s
Compiling external/org_tensorflow/tensorflow/core/kernels/batch_matmul\
_op_real.cc [for host], 992 s
Compiling external/org_tensorflow/tensorflow/core/kernels/bias_op.cc [\
for host], 623 s
Compiling external/org_tensorflow/tensorflow/core/kernels/cwise_op_gre\
ater.cc [for host], 130 s
Compiling external/org_tensorflow/tensorflow/core/kernels/cwise_op_equ\
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/org_tensorflow/tensorflow/core/kernels/BUILD:1296:1: C++ compilation of rule '@org_tensorflow//tensorflow/core/kernels:cwise_op' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 ... (remaining 103 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.8/README.Bugs> for instructions.
INFO: Elapsed time: 2312.066s, Critical Path: 2111.50s
//tensorflow_serving/batching:basic_batch_scheduler_test NO STATUS
//tensorflow_serving/batching:batch_scheduler_retrier_test NO STATUS
//tensorflow_serving/batching:batch_scheduler_test NO STATUS
//tensorflow_serving/batching:batching_session_test NO STATUS
//tensorflow_serving/batching:shared_batch_scheduler_test NO STATUS
//tensorflow_serving/batching:streaming_batch_scheduler_test NO STATUS
//tensorflow_serving/batching/test_util:puppet_batch_scheduler_test NO STATUS
//tensorflow_serving/core:aspired_versions_manager_benchmark NO STATUS
//tensorflow_serving/core:aspired_versions_manager_builder_test NO STATUS
//tensorflow_serving/core:aspired_versions_manager_test NO STATUS
//tensorflow_serving/core:basic_manager_test NO STATUS
//tensorflow_serving/core:caching_manager_test NO STATUS
//tensorflow_serving/core:eager_load_policy_test NO STATUS
//tensorflow_serving/core:eager_unload_policy_test NO STATUS
//tensorflow_serving/core:loader_harness_test NO STATUS
//tensorflow_serving/core:manager_test NO STATUS
//tensorflow_serving/core:servable_data_test NO STATUS
//tensorflow_serving/core:servable_id_test NO STATUS
//tensorflow_serving/core:servable_state_monitor_test NO STATUS
//tensorflow_serving/core:simple_loader_test NO STATUS
//tensorflow_serving/core:source_adapter_test NO STATUS
//tensorflow_serving/core:source_router_test NO STATUS
//tensorflow_serving/core:static_manager_test NO STATUS
//tensorflow_serving/core:static_source_router_test NO STATUS
//tensorflow_serving/core:storage_path_test NO STATUS
//tensorflow_serving/model_servers:server_core_test NO STATUS
//tensorflow_serving/resources:resource_tracker_test NO STATUS
//tensorflow_serving/resources:resource_util_test NO STATUS
//tensorflow_serving/servables/hashmap:hashmap_source_adapter_test NO STATUS
//tensorflow_serving/servables/tensorflow:session_bundle_source_adapter_test NO STATUS
//tensorflow_serving/servables/tensorflow:simple_servers_test NO STATUS
//tensorflow_serving/sources/storage_path:file_system_storage_path_source_test NO STATUS
//tensorflow_serving/sources/storage_path:static_storage_path_source_test NO STATUS
//tensorflow_serving/util:any_ptr_test NO STATUS
//tensorflow_serving/util:cleanup_test NO STATUS
//tensorflow_serving/util:event_bus_test NO STATUS
//tensorflow_serving/util:fast_read_dynamic_ptr_benchmark NO STATUS
//tensorflow_serving/util:fast_read_dynamic_ptr_test NO STATUS
//tensorflow_serving/util:inline_executor_test NO STATUS
//tensorflow_serving/util:observer_test NO STATUS
//tensorflow_serving/util:optional_test NO STATUS
//tensorflow_serving/util:periodic_function_test NO STATUS
//tensorflow_serving/util:threadpool_executor_test NO STATUS
//tensorflow_serving/util:unique_ptr_with_deps_test NO STATUS
Executed 0 out of 45 tests: 1 fails to build and 44 were skipped.
I would really appreciate knowing what's going on here and why the container built from the official GitHub repo isn't working.
This is an internal compiler error which usually suggests a memory limitation. See https://github.com/tensorflow/serving/issues/182 for example.
You can try bazel build --local_resources 5000,1.0,1.0 tensorflow_serving/... for example (or whatever constraints make sense for your machine/container).
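To make the suggestion above concrete, here is a rough sketch, not a definitive recipe. It checks a saved build log for the "Killed" message shown in the original report and, if found, prints a build command with explicit resource limits. In Bazel versions of this era, the three --local_resources values are available RAM in MB, CPU cores, and I/O capacity. The file name build.log and both helper function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: if a saved build log shows cc1plus being killed, print a build
# command with explicit resource limits.

# True if the log contains the message gcc prints when cc1plus is killed,
# which usually points at the process running out of memory.
compiler_was_killed() {
  grep -q 'internal compiler error: Killed' "$1"
}

# Format a --local_resources value: RAM in MB, CPU cores, I/O capacity.
local_resources_value() {
  printf '%s,%s,%s\n' "$1" "$2" "$3"
}

# build.log is a hypothetical file holding the output of a failed build.
if [ -f build.log ] && compiler_was_killed build.log; then
  echo "bazel build --local_resources $(local_resources_value 5000 1.0 1.0) tensorflow_serving/..."
fi
```

The 5000,1.0,1.0 values are the ones from this thread; pick numbers that fit the memory your container actually has.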
I'm running into a similar issue. I've tried changing the version of gcc to see if that fixes the errors, but I'm getting similar errors with both gcc 4.9.4 and 4.8.5.
Here's the output of running bazel build --local_resources 5000,1.0,1.0 tensorflow_serving/... with gcc 4.9.4:
ERROR: /serving/tensorflow_serving/servables/tensorflow/BUILD:176:1: Linking of rule '//tensorflow_serving/servables/tensorflow:simple_servers_test' failed: gcc failed: error executing command /usr/bin/gcc -o bazel-out/local-fastbuild/bin/tensorflow_serving/servables/tensorflow/simple_servers_test -pthread -Wl,-no-as-needed -B/usr/bin -B/usr/bin -pass-exit-codes '-Wl,--build-id=md5' ... (remaining 3 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
...
collect2: error: ld returned 1 exit status
INFO: Elapsed time: 100.527s, Critical Path: 81.51s
and when running using gcc (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5:
root@6ff2fb9120b0:/serving# bazel build --local_resources 5000,1.0,1.0 tensorflow_serving/...
INFO: Reading 'startup' options from /root/.bazelrc: --batch
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
WARNING: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/inception_model/WORKSPACE:1: Workspace name in /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/inception_model/WORKSPACE (@inception) does not match the name given in the repository's definition (@inception_model); this will cause a build error in future versions.
INFO: Found 176 targets...
ERROR: /serving/tensorflow_serving/util/BUILD:110:1: Linking of rule '//tensorflow_serving/util:fast_read_dynamic_ptr_benchmark' failed: gcc failed: error executing command /usr/bin/gcc -o bazel-out/local-fastbuild/bin/tensorflow_serving/util/fast_read_dynamic_ptr_benchmark '-Wl,-rpath,$ORIGIN/../../_solib_k8/' -Lbazel-out/local-fastbuild/bin/_solib_k8 -pthread ... (remaining 8 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
...
1> const, 0, Eigen::InnerStride<-1> > >::size() const'
collect2: error: ld returned 1 exit status
INFO: Elapsed time: 141.595s, Critical Path: 120.39s
docker stats shows a memory limit of 15.65 GB.
I was having this problem on OS X. Using --local_resources 5000,1.0,1.0 together with increasing the container's resources seems to have made everything work (it's still compiling and has been for over an hour). On OS X you can increase the container resources in the Docker application's preferences. Hope this helps...
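A quick way to check whether raising the Docker memory setting actually took effect is to look at the memory visible from inside the container. This is a minimal sketch, assuming a Linux container where /proc/meminfo is readable. The function name and the 4096 MB threshold are illustrative assumptions, not values from this thread. Note that a per-container limit set with docker run --memory may not show up in /proc/meminfo.

```shell
#!/usr/bin/env bash
# Print total memory in MB from a meminfo-style file (default: /proc/meminfo).
mem_total_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# 4096 MB is an illustrative threshold, not a figure from the thread.
if [ -r /proc/meminfo ] && [ "$(mem_total_mb)" -lt 4096 ]; then
  echo "Only $(mem_total_mb) MB visible; consider raising the Docker memory setting."
fi
```

If the reported number is well below what the build needs, lowering the first value passed to --local_resources is another option.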
@perdasilva did this fix the bazel build issue? (and how long did it run for in the end?)
My environment: OS X 10.11.6, MacBook Pro, 16 GB RAM.
I followed the Docker installation instructions as per this GitHub link (refer to the Jupyter Notebook), with a small tweak.
The tweak: I specified the bazel build step like this, and it worked with no errors this time (it took about 2 hours 40 minutes to run, with lots of warnings and other non-critical messages popping up in the terminal):
bazel build -c opt --jobs 1 --local_resources 2048,0.5,1.0 --verbose_failures tensorflow_serving/...
I can probably reduce the run time further by tweaking this command a bit more. (Also, I used bazel 0.5.4 instead of 0.5.1, as I was not able to build with 0.5.1.)
Command:
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
Output:
usage: bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
Flags:
--port=8500 int32 port to listen on
--enable_batching=false bool enable batching
--batching_parameters_file="" string If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
--model_config_file="" string If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
--model_name="default" string name of model (ignored if --model_config_file flag is set
--model_base_path="" string path to export (ignored if --model_config_file flag is set, otherwise required)
--file_system_poll_wait_seconds=1 int32 interval in seconds between each poll of the file system for new model version
--tensorflow_session_parallelism=0 int64 Number of threads to use for running a Tensorflow session. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
--platform_config_file="" string If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching is ignored.)
Looks ok!