Serving: Tensorflow Serving docker failed compilation

Created on 28 Oct 2016 · 4 Comments · Source: tensorflow/serving

I have been trying to set up tensorflow_serving in a Docker container for the past day or so, with no luck.

I followed the instructions very carefully:

docker build --pull -t $USER/tensorflow-serving-devel -f Dockerfile.devel .

docker run -it $USER/tensorflow-serving-devel

git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving/tensorflow
./configure
cd ..
bazel test tensorflow_serving/...

But I always end up here:

[4,197 / 6,007] Still waiting for 200 jobs to complete:
      Running (standalone):
        Compiling external/org_tensorflow/tensorflow/core/kernels/argmax_op.cc\
 [for host], 996 s
        Compiling external/org_tensorflow/tensorflow/core/kernels/batch_matmul\
_op_real.cc [for host], 992 s
        Compiling external/org_tensorflow/tensorflow/core/kernels/bias_op.cc [\
for host], 623 s
        Compiling external/org_tensorflow/tensorflow/core/kernels/cwise_op_gre\
ater.cc [for host], 130 s
        Compiling external/org_tensorflow/tensorflow/core/kernels/cwise_op_equ\
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/org_tensorflow/tensorflow/core/kernels/BUILD:1296:1: C++ compilation of rule '@org_tensorflow//tensorflow/core/kernels:cwise_op' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 ... (remaining 103 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.8/README.Bugs> for instructions.
INFO: Elapsed time: 2312.066s, Critical Path: 2111.50s
//tensorflow_serving/batching:basic_batch_scheduler_test              NO STATUS
//tensorflow_serving/batching:batch_scheduler_retrier_test            NO STATUS
//tensorflow_serving/batching:batch_scheduler_test                    NO STATUS
//tensorflow_serving/batching:batching_session_test                   NO STATUS
//tensorflow_serving/batching:shared_batch_scheduler_test             NO STATUS
//tensorflow_serving/batching:streaming_batch_scheduler_test          NO STATUS
//tensorflow_serving/batching/test_util:puppet_batch_scheduler_test   NO STATUS
//tensorflow_serving/core:aspired_versions_manager_benchmark          NO STATUS
//tensorflow_serving/core:aspired_versions_manager_builder_test       NO STATUS
//tensorflow_serving/core:aspired_versions_manager_test               NO STATUS
//tensorflow_serving/core:basic_manager_test                          NO STATUS
//tensorflow_serving/core:caching_manager_test                        NO STATUS
//tensorflow_serving/core:eager_load_policy_test                      NO STATUS
//tensorflow_serving/core:eager_unload_policy_test                    NO STATUS
//tensorflow_serving/core:loader_harness_test                         NO STATUS
//tensorflow_serving/core:manager_test                                NO STATUS
//tensorflow_serving/core:servable_data_test                          NO STATUS
//tensorflow_serving/core:servable_id_test                            NO STATUS
//tensorflow_serving/core:servable_state_monitor_test                 NO STATUS
//tensorflow_serving/core:simple_loader_test                          NO STATUS
//tensorflow_serving/core:source_adapter_test                         NO STATUS
//tensorflow_serving/core:source_router_test                          NO STATUS
//tensorflow_serving/core:static_manager_test                         NO STATUS
//tensorflow_serving/core:static_source_router_test                   NO STATUS
//tensorflow_serving/core:storage_path_test                           NO STATUS
//tensorflow_serving/model_servers:server_core_test                   NO STATUS
//tensorflow_serving/resources:resource_tracker_test                  NO STATUS
//tensorflow_serving/resources:resource_util_test                     NO STATUS
//tensorflow_serving/servables/hashmap:hashmap_source_adapter_test    NO STATUS
//tensorflow_serving/servables/tensorflow:session_bundle_source_adapter_test NO STATUS
//tensorflow_serving/servables/tensorflow:simple_servers_test         NO STATUS
//tensorflow_serving/sources/storage_path:file_system_storage_path_source_test NO STATUS
//tensorflow_serving/sources/storage_path:static_storage_path_source_test NO STATUS
//tensorflow_serving/util:any_ptr_test                                NO STATUS
//tensorflow_serving/util:cleanup_test                                NO STATUS
//tensorflow_serving/util:event_bus_test                              NO STATUS
//tensorflow_serving/util:fast_read_dynamic_ptr_benchmark             NO STATUS
//tensorflow_serving/util:fast_read_dynamic_ptr_test                  NO STATUS
//tensorflow_serving/util:inline_executor_test                        NO STATUS
//tensorflow_serving/util:observer_test                               NO STATUS
//tensorflow_serving/util:optional_test                               NO STATUS
//tensorflow_serving/util:periodic_function_test                      NO STATUS
//tensorflow_serving/util:threadpool_executor_test                    NO STATUS
//tensorflow_serving/util:unique_ptr_with_deps_test                   NO STATUS

Executed 0 out of 45 tests: 1 fails to build and 44 were skipped.

I would really appreciate knowing what's going on here and why the container built from the official GitHub repo isn't working.

All 4 comments

This is an internal compiler error which usually suggests a memory limitation. See https://github.com/tensorflow/serving/issues/182 for example.

You can try bazel build --local_resources 5000,1.0,1.0 tensorflow_serving/... for example (or whatever constraints make sense for your machine/container).
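One way to check whether a failed build matches this pattern is to search the saved build output for the compiler's "Killed" message. The sketch below assumes the output was captured to a file; the file name build.log and the helper looks_like_oom are hypothetical and are not part of Bazel or TensorFlow Serving.

```shell
#!/usr/bin/env bash
# Sketch: detect the "compiler was killed" signature in a saved build log.
# Assumption: the log was captured beforehand, for example with
#   bazel test tensorflow_serving/... 2>&1 | tee build.log
# (build.log is a hypothetical file name).

# Return 0 if the given log contains gcc's "Killed" internal-compiler-error
# line, which usually means the process was terminated for lack of memory.
looks_like_oom() {
  grep -q 'internal compiler error: Killed' "$1"
}

if [ -f build.log ]; then
  if looks_like_oom build.log; then
    echo "Build log suggests the compiler ran out of memory."
  else
    echo "No 'Killed' compiler error found; the failure may have another cause."
  fi
fi
```

A linker failure such as "collect2: error: ld returned 1 exit status" does not match this check, so it can help separate memory problems from other build errors.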

I'm running into a similar issue. I tried changing the gcc version to see whether that fixed the errors, but I get similar errors with both gcc 4.9.4 and 4.8.5.

Here's the output of running bazel build --local_resources 5000,1.0,1.0 tensorflow_serving/... with gcc 4.9.4:

ERROR: /serving/tensorflow_serving/servables/tensorflow/BUILD:176:1: Linking of rule '//tensorflow_serving/servables/tensorflow:simple_servers_test' failed: gcc failed: error executing command /usr/bin/gcc -o bazel-out/local-fastbuild/bin/tensorflow_serving/servables/tensorflow/simple_servers_test -pthread -Wl,-no-as-needed -B/usr/bin -B/usr/bin -pass-exit-codes '-Wl,--build-id=md5' ... (remaining 3 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
...
collect2: error: ld returned 1 exit status
INFO: Elapsed time: 100.527s, Critical Path: 81.51s

and when running using gcc (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5:

root@6ff2fb9120b0:/serving# bazel build --local_resources 5000,1.0,1.0 tensorflow_serving/...
INFO: Reading 'startup' options from /root/.bazelrc: --batch
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
WARNING: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/inception_model/WORKSPACE:1: Workspace name in /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/inception_model/WORKSPACE (@inception) does not match the name given in the repository's definition (@inception_model); this will cause a build error in future versions.
INFO: Found 176 targets...
ERROR: /serving/tensorflow_serving/util/BUILD:110:1: Linking of rule '//tensorflow_serving/util:fast_read_dynamic_ptr_benchmark' failed: gcc failed: error executing command /usr/bin/gcc -o bazel-out/local-fastbuild/bin/tensorflow_serving/util/fast_read_dynamic_ptr_benchmark '-Wl,-rpath,$ORIGIN/../../_solib_k8/' -Lbazel-out/local-fastbuild/bin/_solib_k8 -pthread ... (remaining 8 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
...
1> const, 0, Eigen::InnerStride<-1> > >::size() const'
collect2: error: ld returned 1 exit status
INFO: Elapsed time: 141.595s, Critical Path: 120.39s

docker stats shows a memory limit of 15.65 GB.

I was having this problem on OS X. I found that using --local_resources 5000,1.0,1.0 together with increasing the container resources seems to have made everything work (it's still compiling and has been for more than an hour). On OS X you can increase the container resources through the Docker application's preferences. Hope this helps...
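To see how much memory the Docker daemon actually has before and after changing the preferences, something like the following may help. It is only a sketch: it assumes the docker CLI is installed and can reach the daemon, and the helper bytes_to_mib is a hypothetical name introduced here.

```shell
#!/usr/bin/env bash
# Sketch: report the total memory visible to the Docker daemon.
# Assumption: `docker info --format '{{.MemTotal}}'` prints a byte count;
# on OS X this reflects the memory given to Docker's VM in the preferences.

# Convert a byte count to whole MiB.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

if command -v docker >/dev/null 2>&1; then
  total_bytes=$(docker info --format '{{.MemTotal}}' 2>/dev/null)
  if [ -n "$total_bytes" ]; then
    echo "Memory available to Docker: $(bytes_to_mib "$total_bytes") MiB"
  fi
fi
```

If the reported figure is well below the machine's physical RAM, raising the limit in the Docker preferences is probably the first thing to try.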

@perdasilva did this fix the bazel build issue? (and how long did it run for in the end?)

Update 1

My environment: OSX 10.11.6, Macbook Pro, 16 GB RAM.

I followed the Docker installation instructions from this GitHub repository (see the Jupyter notebook), with a small tweak.

The tweak: I specified the bazel build step like this, and it worked with no errors this time (it took about 2 hours 40 minutes to run, with lots of warnings and other non-critical messages popping up in the terminal):

bazel build -c opt --jobs 1 --local_resources 2048,0.5,1.0 --verbose_failures  tensorflow_serving/...

I can probably reduce the run time further by tuning these flags. (Also, I used bazel 0.5.4 instead of 0.5.1, because I could not get the build to work with 0.5.1.)
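As a starting point for that tuning, the RAM figure passed to --local_resources could be derived from the memory currently available rather than hard-coded. This is a rough sketch, not a recommended setting: the 75% headroom factor and the helper names ram_budget_mb and mem_available_kb are assumptions made here, and inside a container /proc/meminfo may report the host's (or the Docker VM's) memory rather than the container's own limit.

```shell
#!/usr/bin/env bash
# Sketch: suggest a RAM budget (in MB) for bazel's --local_resources flag.
# Assumptions: Linux with a MemAvailable line in /proc/meminfo; the 75%
# headroom factor is arbitrary and should be adjusted to taste.

# Print 75% of the given kB figure, expressed in whole MB.
ram_budget_mb() {
  echo $(( $1 * 3 / 4 / 1024 ))
}

# Read MemAvailable (in kB) from a meminfo-style file (default: /proc/meminfo).
mem_available_kb() {
  awk '/^MemAvailable:/ {print $2}' "${1:-/proc/meminfo}"
}

avail_kb=$(mem_available_kb 2>/dev/null)
if [ -n "$avail_kb" ]; then
  # Only print the command; the other flags are the ones used above.
  echo "bazel build -c opt --jobs 1 --local_resources $(ram_budget_mb "$avail_kb"),0.5,1.0 --verbose_failures tensorflow_serving/..."
fi
```

The script only prints a command line, so the suggested value can be reviewed before starting a multi-hour build.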

Verifying the build

Command:

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server

Output:

usage: bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
Flags:
    --port=8500                         int32   port to listen on
    --enable_batching=false             bool    enable batching
    --batching_parameters_file=""       string  If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
    --model_config_file=""              string  If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
    --model_name="default"              string  name of model (ignored if --model_config_file flag is set
    --model_base_path=""                string  path to export (ignored if --model_config_file flag is set, otherwise required)
    --file_system_poll_wait_seconds=1   int32   interval in seconds between each poll of the file system for new model version
    --tensorflow_session_parallelism=0  int64   Number of threads to use for running a Tensorflow session. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
    --platform_config_file=""           string  If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching is ignored.)

Looks ok!
