Hi, I was following the steps for TensorFlow Serving as described in the guide.
I was able to complete all the steps successfully up to 'Start the Server'. When I run the command below to query the model, I get the following error. It happens when both the server and the client are on my local machine, and also when both are on the AWS instance. I also get it when I run the server on my AWS instance and the client on my local machine.
root@xxxxxxxxxxx:/serving# bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=Test.jpg
Traceback (most recent call last):
  File "/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 56, in <module>
    tf.app.run()
  File "/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 51, in main
    result = stub.Predict(request, 1000.0) # 10 secs timeout
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 309, in __call__
    self._request_serializer, self._response_deserializer)
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 195, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.UNAVAILABLE, details="Connect Failed")
I would appreciate thoughts and suggestions.
PS: I have tried increasing the timeout but that didn't help.
It seems the client failed to connect to the server. Did you run the server in a separate process (separate terminal tab)? Did it come up successfully? Can you include the command you used to run the server and the logs that were printed?
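"Connect Failed" means the client never reached anything at the given address. A quick way to confirm that from the client's own shell is a plain TCP connect, independent of gRPC. This is a minimal sketch using only Python's standard library; the helper name `port_is_open` is hypothetical and not part of TensorFlow Serving, and a True result only shows that something accepts TCP connections there, not that the ModelServer loaded the model.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper: a True result only means *something* accepts TCP
    connections there, not that a healthy ModelServer is behind it.
    """
    try:
        # Resolves the host name and performs the TCP handshake.
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Refused connection, timeout and unresolvable host all end up here
        # (socket.timeout and socket.gaierror are subclasses of socket.error).
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Same target the client used: --server=localhost:9000
    print("localhost:9000 reachable: %s" % port_is_open("localhost", 9000))
```

If this prints False in the shell where you run inception_client, the problem is the address or the server process, not the request or the timeout.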
@kirilg Thanks for your thoughts.
I did run the server in a separate session.
Here is how I started the server:
nvidia-docker run -v /home/ubuntu/rest/tf_serving:/tf_serving -it $USER/inception_serving
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=inception-export &> inception_log &
[1] 16
Afterwards, I open a new session:
nvidia-docker run -v /home/ubuntu/rest/tf_serving:/tf_serving -it $USER/inception_serving
Then I run the following in the new session to query the server: bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=./Test.jpg
Here's the inception_log:
2017-09-06 05:04:22.197984: I tensorflow_serving/model_servers/main.cc:147] Building single TensorFlow model file config: model_name: inception model_base_path: inception-export
2017-09-06 05:04:22.198218: F tensorflow_serving/model_servers/main.cc:410] Non-OK-status: ServerCore::Create(std::move(options), &core) status: Invalid argument: Expected model inception to have an absolute path; got base_path()=inception-export
Ah, ok, your ModelServer didn't actually start successfully because it hit a problem that was recently introduced. We'll need to update the tutorial documentation, but could you please first try the suggested fix?
1) Re-export the inception model, but this time update the command to:
bazel-bin/tensorflow_serving/example/inception_saved_model --checkpoint_dir=inception-v3 --output_dir=/tmp/inception-export
(note: only the output_dir changed)
2) Run the ModelServer using the new path:
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=/tmp/inception-export &> inception_log &
Please let me know if this works and I'll update the tutorial.
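The server rejected the relative path before loading anything ("Expected model inception to have an absolute path"). A small pre-flight check can catch this before starting the server. This is a hypothetical sketch, not TensorFlow Serving code: the function name `check_model_base_path` is made up, and the numeric-version-directory check assumes the layout the server logs later in this thread show (it loads from /tmp/inception-export/1).

```python
import os


def check_model_base_path(path):
    """Return a list of problems with a --model_base_path value (empty if none).

    Hypothetical pre-flight check, not part of TensorFlow Serving. It encodes two
    things seen in this thread: the server rejected the relative path
    "inception-export", and a successful load read a numeric version
    directory (/tmp/inception-export/1).
    """
    if not os.path.isabs(path):
        return ["not an absolute path: %r" % path]
    if not os.path.isdir(path):
        return ["directory does not exist: %s" % path]
    # Look for numeric version subdirectories such as "1".
    versions = [name for name in os.listdir(path)
                if name.isdigit() and os.path.isdir(os.path.join(path, name))]
    if not versions:
        return ["no numeric version subdirectory under %s" % path]
    return []


if __name__ == "__main__":
    for candidate in ("inception-export", "/tmp/inception-export"):
        print("%s: %s" % (candidate, check_model_base_path(candidate) or "looks OK"))
```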
Thanks @kirilg for the suggestion. Unfortunately, I get the same error.
I'm assuming the ModelServer didn't start.
When I run the ModelServer with the old path and then run another command in the same session, I get the following Aborted message at the end:
root@xxxxxxxxx:/serving# bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=inception-export &> inception_log &
[3] 2895
root@xxxxxxxxx:/serving# ls
AUTHORS WORKSPACE bazel-serving inception-v3-2016-03-01.tar.gz tools
CONTRIBUTING.md Xiang_Xiang_panda.jpg bazel-testlogs inception_log
LICENSE bazel-bin core tensorflow
README.md bazel-genfiles inception-export tensorflow_serving
RELEASE.md bazel-out inception-v3 tf_models
[3]+ Aborted (core dumped) bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=inception-export &> inception_log
When I try to run the ModelServer with the new path and then run another command, I don't get the aborted message:
root@xxxxxxxxxx:/serving# bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=/tmp/inception-export &> inception_log &
[3] 2898
root@xxxxxxxxxx:/serving# ls
AUTHORS WORKSPACE bazel-serving inception-v3-2016-03-01.tar.gz tools
CONTRIBUTING.md Xiang_Xiang_panda.jpg bazel-testlogs inception_log
LICENSE bazel-bin core tensorflow
README.md bazel-genfiles inception-export tensorflow_serving
RELEASE.md bazel-out inception-v3 tf_models
Not sure if this is relevant, but I wanted to mention it. Is there anything else I could try?
This is the inception_log:
2017-09-06 05:04:22.197984: I tensorflow_serving/model_servers/main.cc:147] Building single TensorFlow model file config: model_name: inception model_base_path: inception-export
2017-09-06 05:04:22.198218: F tensorflow_serving/model_servers/main.cc:410] Non-OK-status: ServerCore::Create(std::move(options), &core) status: Invalid argument: Expected model inception to have an absolute path; got base_path()=inception-export
Thanks again for your help!
Yes, run with the new path (--model_base_path=/tmp/inception-export). It looks like that didn't fail, right? Can you look inside the file inception_log to verify that there are no errors there? If it looks good, try running the client to issue a request to the server:
bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=Test.jpg
(or if that doesn't work, use the full absolute path to Test.jpg).
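Looking at inception_log by eye works fine; if you want to script that check, here is a minimal sketch. The function name is hypothetical, and it only recognizes the two line formats visible in the logs in this thread: TensorFlow-style lines whose severity letter follows the timestamp (F marked the fatal absolute-path error), and the "Running ModelServer at ..." line printed once the server is up. The gRPC-style "E0907 ... grpc epoll fd" line is deliberately ignored, since it also appeared in the run that worked.

```python
import re

# TensorFlow-style log prefix: "2017-09-07 01:56:09.460856: I ..." where the
# letter after the timestamp is the severity (I, W, E or F).
TF_LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} [\d:.]+: ([IWEF]) ")
SERVING_LINE = re.compile(r"Running ModelServer at (\S+)")


def summarize_server_log(lines):
    """Return (fatal_lines, serving_address) for a ModelServer log.

    serving_address is None if the "Running ModelServer at" line never appears.
    Only TensorFlow-style F (fatal) lines are collected; other prefixes such as
    the gRPC "E0907 ... grpc epoll fd" line are ignored.
    """
    fatal, address = [], None
    for line in lines:
        match = TF_LOG_LINE.match(line)
        if match and match.group(1) == "F":
            fatal.append(line.rstrip("\n"))
        serving = SERVING_LINE.search(line)
        if serving:
            address = serving.group(1)
    return fatal, address
```

For example, `summarize_server_log(open("inception_log"))` on the failing run should return one fatal line and no address, while the successful run should return no fatal lines and the address 0.0.0.0:9000.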
I ran with the new path (--model_base_path=/tmp/inception-export). Here is the inception_log:
2017-09-07 01:56:09.460856: I tensorflow_serving/model_servers/main.cc:147] Building single TensorFlow model file config: model_name: inception model_base_path: /tmp/inception-export
2017-09-07 01:56:09.461123: I tensorflow_serving/model_servers/server_core.cc:434] Adding/updating models.
2017-09-07 01:56:09.461161: I tensorflow_serving/model_servers/server_core.cc:485] (Re-)adding model: inception
2017-09-07 01:56:09.561783: I tensorflow_serving/core/basic_manager.cc:705] Successfully reserved resources to load servable {name: inception version: 1}
2017-09-07 01:56:09.561844: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: inception version: 1}
2017-09-07 01:56:09.561862: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: inception version: 1}
2017-09-07 01:56:09.561893: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /tmp/inception-export/1
2017-09-07 01:56:09.561916: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:236] Loading SavedModel from: /tmp/inception-export/1
2017-09-07 01:56:09.613371: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2017-09-07 01:56:09.727658: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:155] Restoring SavedModel bundle.
2017-09-07 01:56:10.027711: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running LegacyInitOp on SavedModel bundle.
2017-09-07 01:56:10.091405: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:284] Loading SavedModel: success. Took 529470 microseconds.
2017-09-07 01:56:10.091689: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: inception version: 1}
E0907 01:56:10.093972820 5837 ev_epoll1_linux.c:1051] grpc epoll fd: 3
2017-09-07 01:56:10.095541: I tensorflow_serving/model_servers/main.cc:288] Running ModelServer at 0.0.0.0:9000 ...
I gave the full path to the image. I still get the same error message.
root@xxxxxxxxxx:/serving# bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=/serving/panda.jpg
It seems the server started up correctly.
What's the error? "Connect Failed"?
Try running the client from the same terminal in which you ran the server. Also verify that /serving/panda.jpg exists by running ls /serving/panda.jpg.
Thanks for your help @kirilg
Running the client from the same terminal as the server returns the expected response.
But, opening a new session and running the client results in the "Connect Failed" error as mentioned earlier.
The image exists and the path is correct.
Anything else I could try?
The issue is with the --server=localhost:9000 param. Unless the client runs inside the same Docker container as the server, 'localhost' refers to the client's own container rather than the server's, so there is nothing listening there and the connect fails. You'll need to contact the server's actual address, or include a -p 9000:9000 argument in your docker run command to publish the container's port to the host. Another approach is to continue with the tutorial and see which IPs get used in the Kubernetes deployment.
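The -p fix maps a host port to the port the ModelServer listens on inside the container (the log shows "Running ModelServer at 0.0.0.0:9000"), so a client on the host can use --server=localhost:9000. As a sketch, here is a hypothetical helper that assembles the docker run argument list with the port published; it does not start Docker itself, and the image name in the example is a placeholder.

```python
def docker_run_args(image, container_port=9000, host_port=None, volumes=None,
                    runtime="docker"):
    """Build an argv list for `docker run` that publishes the serving port.

    Hypothetical helper. "-p HOST:CONTAINER" maps a host port to the container
    port the ModelServer listens on, so a client on the host can reach it via
    --server=localhost:HOST.
    """
    if host_port is None:
        host_port = container_port
    args = [runtime, "run", "-it"]
    # Each volume is mounted as "-v HOST_DIR:CONTAINER_DIR".
    for host_dir, container_dir in (volumes or {}).items():
        args += ["-v", "%s:%s" % (host_dir, container_dir)]
    args += ["-p", "%d:%d" % (host_port, container_port), image]
    return args
```

For example, `docker_run_args("inception_serving", volumes={"/home/ubuntu/rest/tf_serving": "/tf_serving"})` returns `["docker", "run", "-it", "-v", "/home/ubuntu/rest/tf_serving:/tf_serving", "-p", "9000:9000", "inception_serving"]`, which could be passed to `subprocess.call`. Pass `runtime="nvidia-docker"` to match the GPU setup above.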
Thanks @kirilg
It works now!
Here's what I did:
docker run -v /Users/<user>/images/samples:/serving/samples -p9000:9000 -it $USER/inception_serving
# cd /serving
# bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=/tmp/inception-export
$ docker ps
$ docker exec -t -i <container-name> /bin/bash
# bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=/serving/panda.jpg
That's great! Glad it works now :)
@kirilg Could you suggest an approach or documentation on how I could replace the pre-trained Inception model with an Inception model retrained via transfer learning?
I assume that I'd export the retrained model and specify checkpoints. Something similar to this?
bazel-bin/tensorflow_serving/example/inception_saved_model --checkpoint_dir=inception-v3-retrained --output_dir=/tmp/inception_retrained-export
I'd appreciate your thoughts. Thanks again!
@kirilg @arush1204 Thanks, I ran into the same problem and solved it with your suggestions. Best.