Hi,
Is there any way to ping TensorFlow Serving to check that the service is up and running?
That would be especially helpful in environments where each service is monitored via ping and health endpoints.
Thanks,
As far as I know the tensorflow_model_server does not support a health check RPC as in
https://github.com/grpc/grpc/blob/master/doc/health-checking.md
What I've used instead in my project are GetModelMetadataRequest calls to the servers to check for liveness and model presence.
Thanks, I eventually ended up doing the same, but it still doesn't feel like the correct approach :-)
I think they could implement some sort of ping/health endpoint.
Another scenario (which I'm facing) is using TF Serving behind ELB (Network Load Balancer to be precise) where you don't control what the health request looks like.
Would there be any interest in adding that feature?
Sounds like a reasonable feature to add. We would want to do it in a separate RPC service (not the model inference one). Contributions welcome :)
It does make a lot of sense. In my scenario, running in a k8s cluster, a /healthz endpoint helps a lot to keep the infrastructure monitored, rather than collecting errors from the clients.
I'd be interested to know how serving is used in production without some form of health checking. What do people rely on in production then? I like @bknl's approach as a temporary fallback.
If you know the list of model names, then it can be done using GetModelMetadata, except there is a bug in there: https://github.com/tensorflow/serving/issues/784
I have the same requirement to get model status. I found some new protos in tf-serving 1.5, such as get_model_status.proto and model_service.proto, but have no idea how to use them. Reading the Java code generated from these proto files, it does look like the separate RPC service mentioned by @chrisolston. I asked the same question on Stack Overflow: how-to-get-model-status. Could anyone explain it a bit?
Following the code example in _/serving/tensorflow_serving/model_servers/tensorflow_model_server_test.py_, you can use GetModelStatus in tf_serving version 1.5 to get the model version and model status (such as AVAILABLE, LOADING and END). AVAILABLE means the server is running and the model is ready to work.
Hi @Zhiqiang-Ye
I am trying to get all versions of a served model from the tensorflow-serving server, and @chrisolston suggested that I use the GetModelStatus API. However, I am hitting an import issue when importing the GetModelStatus call. Can you help me identify the right usage of the GetModelStatus API? #961
Many Thanks
@bknl @mazorigal Could you please let me know how to use GetModelMetadataRequest calls to the servers to check for liveness and model presence, as I am not able to use the GetModelStatus API (#961)? Thanks in advance.
Hi,
I am just using the GetModelMetadataRequest to get any 200 response back from TensorFlow Serving. I don't really care what the response content is. If TensorFlow Serving is down, you get a response that is not 200, which I then interpret as a failed health check.
Hope that helps :-)
@mazorigal @bknl Considering the RESTful API implementation in TFS >= 1.8.0, is there any way we can use the RESTful API to access GetModelMetadataRequest or GetModelStatus? Please let me know.
@ianchen06
It looks like a REST endpoint has now been added here https://github.com/tensorflow/serving/commit/00e459f1604c40c073cbb9cb92d72cb6a88be9cd to access GetModelStatus, which can reasonably be used for a server health check.
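For reference, here is a minimal sketch of polling that REST model status endpoint from Python using only the standard library. It assumes the REST API is reachable at a base URL such as `http://localhost:8501` and that the status response is JSON with a `model_version_status` list whose entries carry a `state` field. The function names, the base URL, and the "at least one version is AVAILABLE" rule are my own choices for illustration, not something the server prescribes.

```python
import json
import urllib.error
import urllib.request


def any_version_available(status_payload: dict) -> bool:
    """Return True if at least one listed model version reports state AVAILABLE."""
    versions = status_payload.get("model_version_status", [])
    return any(v.get("state") == "AVAILABLE" for v in versions)


def model_is_healthy(base_url: str, model_name: str, timeout: float = 2.0) -> bool:
    """Query <base_url>/v1/models/<model_name> and treat any failure as unhealthy.

    base_url and model_name are placeholders supplied by the caller.
    """
    url = f"{base_url}/v1/models/{model_name}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            return any_version_available(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or invalid JSON.
        return False
```

Something like `model_is_healthy("http://localhost:8501", "my_model")` could then back a liveness or readiness probe; `my_model` is a placeholder name. Note that this is still a per-model check, so you need to know the model names up front.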
Nevertheless, I wanted to jump in here to note that I would also be interested in a status REST endpoint that could be used as a generic health check for the server.
Agreed, we need a legit health check REST endpoint too.
You can easily provide a health endpoint as such: https://github.com/helmuthva/jetson/blob/master/workflow/deploy/tensorflow-serving/src/webservice/health/healthz.py
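In the same spirit (this is not the code from the linked file, just a rough sketch of the idea), a small sidecar can expose a plain `/healthz` endpoint that a load balancer or Kubernetes probe can hit with a simple GET. The `check` callable is a placeholder for whatever probe you trust, for example a model status or metadata request against the server. The function name `make_health_server` and port 8080 are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_health_server(check: Callable[[], bool],
                       host: str = "0.0.0.0",
                       port: int = 8080) -> ThreadingHTTPServer:
    """Build an HTTP server that answers GET /healthz with 200 or 503.

    `check` is a caller-supplied function returning True when the model
    server is considered healthy; an exception counts as unhealthy.
    """

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            try:
                healthy = check()
            except Exception:
                healthy = False
            body = b"ok\n" if healthy else b"unavailable\n"
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep probe traffic out of the logs

    return ThreadingHTTPServer((host, port), HealthHandler)
```

Calling `make_health_server(my_check).serve_forever()` next to the model server (with `my_check` being your own probe function) gives the infrastructure a fixed URL to poll, which helps when you cannot control what the health request looks like.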
/cc @quanjielin
It is important to have an appropriate health check endpoint so users can add a liveness check.
Is there any endpoint at all that can be used as a health check for the server itself, i.e. when serving multiple models?
Following. I'm having an issue where the time to fully stand up TensorFlow after a Docker container is started is sometimes longer than the load balancer's health check allows, causing a cycle of failovers. It would be nice to have something that responds quicker while the system starts up.
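One way to work around slow start-up, sketched here under my own assumptions, is to gate on readiness separately from liveness: poll a check until it passes or a start-up deadline expires, and only then let the instance receive health-checked traffic. This does not make the server start any faster. The function name `wait_until_ready` and the default timings are hypothetical, and `check` is whatever probe you already use.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool],
                     timeout_s: float = 120.0,
                     interval_s: float = 2.0) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Returns True as soon as the check passes, False if the deadline is
    reached first. Exceptions raised by `check` count as "not ready yet".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # server may not be accepting connections yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

An entrypoint script could call this before registering the instance with the load balancer, so that a slow model load is treated as "still starting" rather than "failed".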
For anyone interested, I made a small tensorflow serving health probe. It communicates with tensorflow serving over grpc, calling the ModelService.GetModelStatus() rpc, and ensures a model status of AVAILABLE. It can be used as a kubernetes exec probe, similar to grpc_health_probe.
tfs_model_status_probe - TensorFlow Serving Model Status Probe
https://github.com/codycollier/tfs-model-status-probe