Serving: Connect Failed Error while querying the server with inception_client.py

Created on 6 Sep 2017 · 13 Comments · Source: tensorflow/serving

Hi, I was following the steps for TensorFlow Serving as per the guide.

I was able to complete all the steps successfully up until 'Start the Server'. When I try to execute the command below to query the model, I get the following error. I get this error when both the server and the client are on my local machine, and also when they are both on the AWS instance. I also get it when I run the server on my AWS instance and the client on my local machine.

root@xxxxxxxxxxx:/serving# bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=Test.jpg
Traceback (most recent call last):
  File "/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 56, in <module>
    tf.app.run()
  File "/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 51, in main
    result = stub.Predict(request, 1000.0) # 10 secs timeout
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 309, in __call__
    self._request_serializer, self._response_deserializer)
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 195, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.UNAVAILABLE, details="Connect Failed")

I would appreciate thoughts and suggestions.
PS: I have tried increasing the timeout, but that didn't help.
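A StatusCode.UNAVAILABLE error with details "Connect Failed" generally means the client could not open a connection to the server at all, so the request never reaches it and a longer deadline on stub.Predict cannot help. A quick way to confirm this is to test whether anything is listening on the port. The following is a minimal Python 3 sketch using only the standard library; can_connect is a hypothetical helper name, not part of inception_client.py.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds.

    This only checks that something is listening on the port; it does not
    verify that the listener is a TensorFlow ModelServer.
    """
    try:
        # create_connection resolves the host and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Same address the client is pointed at with --server=localhost:9000.
    print(can_connect("localhost", 9000))
```

If this prints False from the environment where the client runs, the problem is networking or server startup rather than the request itself.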


arush1204

Most helpful comment

Ah, ok, your ModelServer didn't actually start successfully because it hit a problem that was recently introduced. We'll need to update the tutorial documentation, but could you please first try the suggested fix?

1) Re-export the inception model, but this time update the command to:
bazel-bin/tensorflow_serving/example/inception_saved_model --checkpoint_dir=inception-v3 --output_dir=/tmp/inception-export
(note: only the output_dir changed)

2) Run the ModelServer using the new path:
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=/tmp/inception-export &> inception_log &
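The server log in this thread shows why the relative path failed: "Expected model inception to have an absolute path; got base_path()=inception-export". If you script the server launch, one option is to resolve the path with os.path.abspath before building the command. This is a minimal Python 3 sketch; build_server_cmd is a hypothetical helper, and the flags are the ones used in the command above. An absolute path to the same export directory should satisfy the absolute-path check.

```python
import os


def build_server_cmd(model_name, base_path, port=9000,
                     binary="bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server"):
    """Build the tensorflow_model_server argument list with an absolute base path.

    os.path.abspath resolves base_path against the current working directory,
    so run this from the directory that contains the export (e.g. /serving).
    """
    abs_base = os.path.abspath(base_path)
    return [
        binary,
        "--port=%d" % port,
        "--model_name=%s" % model_name,
        "--model_base_path=%s" % abs_base,
    ]


if __name__ == "__main__":
    print(" ".join(build_server_cmd("inception", "inception-export")))
```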

Please let me know if this works and I'll update the tutorial.

kirilg on 7 Sep 2017


All 13 comments

It seems the client failed to connect to the server. Did you run the server in a separate process (a separate terminal tab)? Did it come up successfully? Can you include the command you used to run the server and the logs it printed?

kirilg on 6 Sep 2017

@kirilg Thanks for your thoughts.
I did run the server in a separate session.

Here is how I started the server:

First I run my docker container: nvidia-docker run -v /home/ubuntu/rest/tf_serving:/tf_serving -it $USER/inception_serving
Then I run the server:
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=inception-export &> inception_log &
Here's the output: [1] 16

Afterwards, I open a new session: nvidia-docker run -v /home/ubuntu/rest/tf_serving:/tf_serving -it $USER/inception_serving
Then I run the following in the new session to query the server: bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=./Test.jpg

Here's the inception_log:

2017-09-06 05:04:22.197984: I tensorflow_serving/model_servers/main.cc:147] Building single TensorFlow model file config:  model_name: inception model_base_path: inception-export
2017-09-06 05:04:22.198218: F tensorflow_serving/model_servers/main.cc:410] Non-OK-status: ServerCore::Create(std::move(options), &core) status: Invalid argument: Expected model inception to have an absolute path; got base_path()=inception-export

arush1204 on 7 Sep 2017

Thanks @kirilg for the suggestion. Unfortunately, I get the same error.

I re-exported the inception model with the new output_dir you suggested above.
Then I ran the ModelServer using the new path.

I'm assuming that the ModelServer didn't start.
When I run the ModelServer with the old path and then try to run another command in the same session, I get the following Aborted message at the end:

root@xxxxxxxxx:/serving# bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=inception-export &> inception_log &

[3] 2895

root@xxxxxxxxx:/serving# ls
AUTHORS          WORKSPACE              bazel-serving     inception-v3-2016-03-01.tar.gz  tools
CONTRIBUTING.md  Xiang_Xiang_panda.jpg  bazel-testlogs    inception_log
LICENSE          bazel-bin              core              tensorflow
README.md        bazel-genfiles         inception-export  tensorflow_serving
RELEASE.md       bazel-out              inception-v3      tf_models
[3]+  Aborted                 (core dumped) bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=inception-export &> inception_log
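The "[3]+ Aborted (core dumped)" line is bash job control reporting that the backgrounded server process died; because its output was redirected with &> inception_log, the reason only appears in that file (here, the fatal "F" line about the absolute path). If you launch the server from a script instead, one option is to start it with subprocess.Popen, wait a few seconds, and check poll(): a non-None return code means it has already exited. This is a minimal Python 3 sketch; start_and_check and the grace period are hypothetical choices, not part of TensorFlow Serving.

```python
import subprocess
import time


def start_and_check(cmd, log_path, grace_seconds=3.0):
    """Start cmd in the background with stdout and stderr sent to log_path.

    Returns (process, returncode). returncode is None if the process is
    still running after grace_seconds, otherwise its exit status
    (a negative value -N means it was killed by signal N, e.g. -6 for SIGABRT).
    """
    log = open(log_path, "w")
    proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
    log.close()  # the child keeps its own copy of the file descriptor
    deadline = time.time() + grace_seconds
    while time.time() < deadline:
        if proc.poll() is not None:
            break  # process already exited; no need to keep waiting
        time.sleep(0.1)
    return proc, proc.poll()
```

If the return code is not None, read log_path before trying the client.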

When I try to run the ModelServer with the new path and then run another command, I don't get the aborted message:
root@xxxxxxxxxx:/serving# bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=/tmp/inception-export &> inception_log &

[3] 2898

root@xxxxxxxxxx:/serving# ls
AUTHORS          WORKSPACE              bazel-serving     inception-v3-2016-03-01.tar.gz  tools
CONTRIBUTING.md  Xiang_Xiang_panda.jpg  bazel-testlogs    inception_log
LICENSE          bazel-bin              core              tensorflow
README.md        bazel-genfiles         inception-export  tensorflow_serving
RELEASE.md       bazel-out              inception-v3      tf_models

Not sure if this is relevant, but I wanted to mention it. Is there anything else I could try?

Here is the inception_log:

2017-09-06 05:04:22.197984: I tensorflow_serving/model_servers/main.cc:147] Building single TensorFlow model file config:  model_name: inception model_base_path: inception-export
2017-09-06 05:04:22.198218: F tensorflow_serving/model_servers/main.cc:410] Non-OK-status: ServerCore::Create(std::move(options), &core) status: Invalid argument: Expected model inception to have an absolute path; got base_path()=inception-export

Thanks again for your help!

arush1204 on 7 Sep 2017

Yes, run with the new path (--model_base_path=/tmp/inception-export). It looks like that didn't fail, right? Can you look inside the file inception_log to verify that there are no errors, and if it looks good, try running the client to issue a request to the server:

bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=Test.jpg

(or if that doesn't work, use the full absolute path to Test.jpg).
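Checking inception_log can also be scripted. This is a minimal Python 3 sketch, assuming the two log formats seen in this thread: TensorFlow lines of the form "date time: SEVERITY file] message" and gRPC glog-style lines such as "E0907 ...". It treats the server as up only if "Running ModelServer at" appears and no TensorFlow "F" (fatal) line does. The gRPC "grpc epoll fd: 3" line carries an E prefix, but in the successful log in this thread the server still went on to start, so the sketch reports E lines separately rather than counting them as failures. check_server_log is a hypothetical helper.

```python
import re

# TensorFlow Serving lines look like:
# "2017-09-07 01:56:10.095541: I tensorflow_serving/model_servers/main.cc:288] Running ..."
TF_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} [\d:.]+: ([IWEF]) ")
# gRPC core lines look like:
# "E0907 01:56:10.093972820    5837 ev_epoll1_linux.c:1051] ..."
GRPC_LINE = re.compile(r"^([IWEF])\d{4} ")


def check_server_log(text):
    """Return (started, fatal_lines, error_lines) for a ModelServer log."""
    fatal, errors = [], []
    for line in text.splitlines():
        tf = TF_LINE.match(line)
        grpc = GRPC_LINE.match(line)
        severity = tf.group(1) if tf else (grpc.group(1) if grpc else None)
        if severity == "F":
            fatal.append(line)
        elif severity == "E":
            errors.append(line)
    # Started means the listen message is present and nothing was fatal.
    started = "Running ModelServer at" in text and not fatal
    return started, fatal, errors
```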

kirilg on 7 Sep 2017

I ran with the new path (--model_base_path=/tmp/inception-export). Here is the inception_log:

2017-09-07 01:56:09.460856: I tensorflow_serving/model_servers/main.cc:147] Building single TensorFlow model file config:  model_name: inception model_base_path: /tmp/inception-export
2017-09-07 01:56:09.461123: I tensorflow_serving/model_servers/server_core.cc:434] Adding/updating models.
2017-09-07 01:56:09.461161: I tensorflow_serving/model_servers/server_core.cc:485]  (Re-)adding model: inception
2017-09-07 01:56:09.561783: I tensorflow_serving/core/basic_manager.cc:705] Successfully reserved resources to load servable {name: inception version: 1}
2017-09-07 01:56:09.561844: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: inception version: 1}
2017-09-07 01:56:09.561862: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: inception version: 1}
2017-09-07 01:56:09.561893: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /tmp/inception-export/1
2017-09-07 01:56:09.561916: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:236] Loading SavedModel from: /tmp/inception-export/1
2017-09-07 01:56:09.613371: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2017-09-07 01:56:09.727658: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:155] Restoring SavedModel bundle.
2017-09-07 01:56:10.027711: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running LegacyInitOp on SavedModel bundle.
2017-09-07 01:56:10.091405: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:284] Loading SavedModel: success. Took 529470 microseconds.
2017-09-07 01:56:10.091689: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: inception version: 1}
E0907 01:56:10.093972820    5837 ev_epoll1_linux.c:1051]     grpc epoll fd: 3
2017-09-07 01:56:10.095541: I tensorflow_serving/model_servers/main.cc:288] Running ModelServer at 0.0.0.0:9000 ...

I gave the full path to the image, but I still get the same error message:
root@xxxxxxxxxx:/serving# bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=/serving/panda.jpg

arush1204 on 7 Sep 2017

Seems the server started up correctly.

What's the error? "Connect Failed"?
Try running the client from the same terminal in which you ran the server. Also verify that /serving/panda.jpg exists by running ls /serving/panda.jpg.

kirilg on 7 Sep 2017

Thanks for your help @kirilg
Running the client from the same terminal as the server returns the expected response.
But, opening a new session and running the client results in the "Connect Failed" error as mentioned earlier.
The image exists and the path is correct.
Anything else I could try?

arush1204 on 7 Sep 2017

The issue is with the --server=localhost:9000 param. Unless the client runs inside the same Docker container as the server, 'localhost' doesn't refer to the server, so the connection fails. You'll need to contact the server at its actual address, or add a -p 9000:9000 argument to your docker run command to publish the container's port to the host. Another approach is to continue with the tutorial and see which IPs get used in the Kubernetes deployment.
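In the original setup, the second nvidia-docker run started a second container with its own network namespace, so localhost:9000 inside it had nothing listening. When it is unclear which address reaches the server from where the client runs, one option is to probe a few candidates in order. This is a minimal Python 3 sketch; first_reachable is a hypothetical helper, and the container IP in the example is an assumed placeholder, not an address from this thread.

```python
import socket


def first_reachable(candidates, timeout=1.0):
    """Return the first (host, port) in candidates that accepts a TCP connection, else None."""
    for host, port in candidates:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return host, port
        except OSError:
            continue  # refused, unreachable, or timed out; try the next candidate
    return None


if __name__ == "__main__":
    # Example candidates only: "localhost" works from inside the server's own
    # container, or from the host when the port is published with -p 9000:9000.
    # 172.17.0.2 is a hypothetical container IP on Docker's default bridge network.
    print(first_reachable([("localhost", 9000), ("172.17.0.2", 9000)]))
```

Whichever pair it returns is what to pass to the client as --server=host:port.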

kirilg on 7 Sep 2017

Thanks @kirilg
It works now!
Here's what I did:

Run docker: docker run -v /Users/<user>/images/samples:/serving/samples -p9000:9000 -it $USER/inception_serving
Test docker container on local machine:

# cd /serving
# bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=/tmp/inception-export

Open another terminal window:

$ docker ps
$ docker exec -t -i <container-name> /bin/bash

Run the python inception client:
# bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=/serving/panda.jpg

arush1204 on 7 Sep 2017

That's great! Glad it works now :)

kirilg on 7 Sep 2017

@kirilg Could you suggest an approach or documentation on how I could replace the pre-trained Inception model with an Inception model retrained via transfer learning?
I assume that I'd export the retrained model and specify its checkpoints, with something similar to this?
bazel-bin/tensorflow_serving/example/inception_saved_model --checkpoint_dir=inception-v3-retrained --output_dir=/tmp/inception_retrained-export

I'd appreciate your thoughts. Thanks again!

arush1204 on 7 Sep 2017

@kirilg @arush1204 Thanks, I ran into the same problem and solved it with your suggestions. Best regards.