Hello,
Is there any way to get metrics (response time, prediction time, number of requests...) on the server?
Thank you.
Currently we don't have any built-in solution for that. Contributions welcome :)
Perhaps a good way to structure it would be to subclass ServableStateMonitor with something that also maintains & exports statistics.
@FNegrello @chrisolston
How is this issue (or similar issue: #711) going?
If nobody is pushing commits here right now, I'd like to contribute.
I'd also like to ask about the metric definitions:
Predict service, and use LOG (for stdout) or syslog in this commit.
the Prometheus metrics format - and pull model - is a good place to start.
works well with Kubernetes, has C++ bindings, has a strong community, solid (Turing-complete) query language, and great integration with a newly-updated Grafana UI.
for now, i would start with the following:
model_name
model_version
any custom labels/tags the user passes in
request counts
error counts (for each type of error you see in the TF Serving code)
graph execution time (aka prediction time)
in the future, i'd like to see the following:
custom ops for timing specific parts of the graph (similar in concept to the batch/unbatch ops, except start_timer/end_timer)
individual request tracing like ZipKin or Jaeger
model ensemble statistics (which models were involved in the final prediction)
a/b test metrics
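As a rough illustration of the first list above (request counts, error counts and graph execution time, labelled by model_name and model_version), here is a minimal standard-library sketch that keeps counters in memory and renders them in the Prometheus text exposition format that a pull-based scraper would read. The `Registry` class and the `tfs_*` metric names are hypothetical and are not part of TF Serving.

```python
from collections import defaultdict


class Registry:
    """Hypothetical in-memory store of labelled counters (not a TF Serving API)."""

    def __init__(self):
        # Maps (metric_name, sorted label items) -> running total.
        self._counters = defaultdict(float)

    def inc(self, name, labels, amount=1.0):
        key = (name, tuple(sorted(labels.items())))
        self._counters[key] += amount

    def render(self):
        """Render all counters in the Prometheus text exposition format."""
        lines = []
        seen = set()
        for (name, labels), value in sorted(self._counters.items()):
            if name not in seen:
                lines.append(f"# TYPE {name} counter")
                seen.add(name)
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value:g}")
        return "\n".join(lines) + "\n"


registry = Registry()
labels = {"model_name": "model", "model_version": "1"}
registry.inc("tfs_requests_total", labels)
registry.inc("tfs_requests_total", labels)
registry.inc("tfs_errors_total", {**labels, "error": "INVALID_ARGUMENT"})
# Accumulated graph execution time in microseconds (made-up sample value).
registry.inc("tfs_graph_execution_usecs_total", labels, 1500)
print(registry.render())
```

In a pull model, the server would return the output of `render()` from an HTTP endpoint and Prometheus would scrape it on its own schedule.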
@cfregly Sure. In this case, which is the better approach for using Prometheus?
/metrics) for HTTP requests, such as: https://github.com/grpc-ecosystem/go-grpc-prometheus
I also just implemented a TensorFlow Serving exporter for Prometheus, besides server metrics.
https://github.com/ynqa/tf_serving_exporter
I would love to see Prometheus support here as well. Any traction on PR #800? I can see it got a little stale.
@thomasjungblut It's okay to move the repo into here.
@cfregly @ynqa is #800 compatible with prometheus?
@jlewi Yes. But the exporter is needed between prometheus server and tf serving.
Hi @chrisolston
Any update from tf-serving team since your comment in https://github.com/tensorflow/serving/pull/800#issuecomment-414407903?
@wydwww Updates about prometheus exporter are described on this release note: https://github.com/tensorflow/serving/blob/master/RELEASE.md#major-features-and-improvements
@ynqa Thanks for your reply! I checked this commit and set up the monitoring. I can see the following types of metrics by simply using curl or in Prometheus:
# TYPE :tensorflow:cc:saved_model:load_attempt_count counter
# TYPE :tensorflow:cc:saved_model:load_latency counter
# TYPE :tensorflow:contrib:session_bundle:load_attempt_count counter
# TYPE :tensorflow:contrib:session_bundle:load_latency counter
# TYPE :tensorflow:core:direct_session_runs counter
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter
Is there a way to get the metrics for prediction time of a request and the time when model_server received a request? I am trying to test in the lifecycle of one prediction request, how much time is spent on the network and how much on prediction by model_server.
@wydwww welcome. Now, I think there are some ways to collect your own custom metrics:
@ynqa Thanks.
Currently I use a program running on server to receive requests from client and send them to model server. This can remove time spent on internet.
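One way to put numbers on that approach, as a sketch: time the same predict call once from the client machine and once from a process running next to the model server, then treat the difference as approximate network time. The helper below only does the timing. `fake_request` is a stand-in for whatever actually sends the request, and the function names are made up for this example.

```python
import statistics
import time


def time_calls(send_request, repeats=5):
    """Call send_request() `repeats` times and return elapsed seconds per call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        send_request()
        durations.append(time.perf_counter() - start)
    return durations


def estimate_network_seconds(client_durations, server_side_durations):
    """Median round trip seen by the client minus the median seen next to the server."""
    return statistics.median(client_durations) - statistics.median(server_side_durations)


# Stand-in for a real request; replace with the actual call to the model server.
def fake_request():
    time.sleep(0.01)


local = time_calls(fake_request, repeats=3)
print(f"median seconds per call: {statistics.median(local):.4f}")
```

The estimate is only as good as the two measurements are comparable: same payload, same model, and enough repeats to smooth out jitter.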
I saw there was commit 021efbd3281aa815cab0b35eab6d6d25249c12d4, which exposes Prometheus metrics on /monitoring/prometheus/metrics.
However, I tested the tensorflow/serving Docker image with the tags nightly, latest and 1.12.0, and all of them returned 404 Not Found on /monitoring/prometheus/metrics, while /v1/models/model worked without problems.
Reproduce with:
docker run --rm -it -v `pwd`/models:/models -p 8501:8501 tensorflow/serving:1.12.0
@litaxc i think your issue is same as #1180 -- this is fixed in latest 1.13.0-rc1 release. can you please help test the published docker image? -- thanks!
@netfs I tried 1.13.0-rc1 but still got 404 Not Found on /monitoring/prometheus/metrics
I use the exported model from https://www.tensorflow.org/serving/serving_basic and still cannot get any prometheus metrics.
@netfs it works! thank you!
you need to pass --monitoring_config_file= to the model server to enable the Prometheus endpoint.
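For reference, a sketch of generating such a file from Python. The `prometheus_config` block with `enable` and `path` fields follows the TF Serving monitoring configuration format as I understand it, so check it against your server version. The helper names and any target file name are assumptions made for this example.

```python
from pathlib import Path


def render_monitoring_config(path="/monitoring/prometheus/metrics"):
    """Return a text-format monitoring config that enables the Prometheus endpoint."""
    return (
        "prometheus_config {\n"
        "  enable: true,\n"
        f'  path: "{path}"\n'
        "}\n"
    )


def write_monitoring_config(target, path="/monitoring/prometheus/metrics"):
    """Write the config to `target` (a file name chosen by the caller)."""
    target = Path(target)
    target.write_text(render_monitoring_config(path))
    return target


config_text = render_monitoring_config()
print(config_text)
```

The written file is what the --monitoring_config_file= flag mentioned above would point to.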
@netfs Is documentation about monitoring in progress? Or could I write it?
@ynqa I don't think anyone is actively working on monitoring docs, so please feel free to write one up.
Is there documentation about how to access the /monitoring/prometheus/metrics endpoint?
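Judging from the curl output posted elsewhere in this thread, the endpoint is served on the REST API port, so a plain HTTP GET is enough. A minimal standard-library sketch follows. The host and port are assumptions (8501 is the REST port used in this thread), and the endpoint only exists when the server was started with a monitoring config.

```python
from urllib.request import urlopen


def metrics_url(host="localhost", rest_port=8501,
                path="/monitoring/prometheus/metrics"):
    """Build the metrics URL; host and port are assumptions for this example."""
    return f"http://{host}:{rest_port}{path}"


def declared_metrics(text):
    """Map metric name -> type from the '# TYPE <name> <type>' lines of a scrape."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            result[parts[2]] = parts[3]
    return result


def fetch_metrics(url, timeout=5.0):
    """GET the endpoint and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def main():
    # Needs a running model server; raises URLError if none is reachable.
    body = fetch_metrics(metrics_url())
    for name, kind in declared_metrics(body).items():
        print(kind, name)
```

Calling `main()` against a running server lists every metric the server declares, whether or not it has samples yet.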
Are there any plans to add a metric that is like prediction time that has model_name as a label so we can see things like the average latency of a request for a given model?
Also, the metric :tensorflow:cc:saved_model:load_attempt_count has a label model_path. It seems like it would make more sense for this to be the name of the model and version rather than the path (or it could just have all of them).
Hi @ynqa, I'm working on a Python client based on your code to get serving metrics, and I'm having some difficulty making it work. My code looks like:
import grpc
from tensorflow_serving.apis import get_model_metrics_pb2
from tensorflow_serving.apis import model_service_pb2_grpc


def main():
    # Note: this must be the server's gRPC port (--port), not the REST API port.
    channel = grpc.insecure_channel("127.0.0.1:8501")
    stub = model_service_pb2_grpc.ModelServiceStub(channel)
    request = get_model_metrics_pb2.GetModelMetricsRequest()
    request.model_spec.name = '1'
    request.model_spec.signature_name = 'get_model_metrics'
    # Call timeout in seconds.
    response = stub.GetModelMetrics(request, timeout=100)
    if response.status.error_code == 0:
        print("Success")
        print(response)
    else:
        print("Fail!")
        print(response.status.error_code)
        print(response.status.error_message)


if __name__ == '__main__':
    # Call main() directly; tf.app.run() would pass argv to a main() that takes no arguments.
    main()
And when I try to run it, this message appears:
from tensorflow_serving.apis import get_model_metrics_pb2
ImportError: cannot import name 'get_model_metrics_pb2' from 'tensorflow_serving.apis'
I generated get_model_metrics_pb2 with protoc, so I assume the error is not there. From searching on Google and reading what people say, I'm sure it is a circular dependency failure, but I don't know how to fix it properly. Maybe you have an idea or know someone who is working on it. Thanks a lot!
Hi, @blester125 @wydwww @ynqa
Since these metrics
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter
are NOT supported by the official versions so far, what is the cheapest way to get them? Has there been any improvement yet? Looking forward to your reply :-)
By the way, the number of requests received per second and the time cost per request are what I want.
# TYPE :tensorflow:data:autotune counter
Hi @chrisolston, I'm wondering why these metrics are empty for now. Are they on the TODO list, or is it something else?
Thanks,
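On the requests-per-second question above: Prometheus counters only ever increase, so a per-second rate is derived from two scrapes rather than read directly. A minimal sketch of that calculation, assuming you have two samples of the same counter and the time between them. The sample numbers are made up, and which TF Serving counter best approximates "requests" is not settled in this thread.

```python
def per_second_rate(previous, current, seconds_between):
    """Average increase per second of a counter between two scrapes.

    If the counter went down, the server probably restarted and the counter
    reset to zero, so the current value is taken as the whole increase.
    """
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    increase = current - previous if current >= previous else current
    return increase / seconds_between


# Made-up samples of one counter taken 15 seconds apart.
print(per_second_rate(previous=10000, current=10450, seconds_between=15))
```

Prometheus itself does this kind of calculation with its rate() query function once the counter is being scraped.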
Echoing what PayneJoe said above:
I am running TF Serving as a Docker container and have exposed a REST endpoint with Prometheus metrics enabled.
I can see the following stats from the Prometheus endpoint:
# TYPE :tensorflow:core:graph_runs counter
:tensorflow:core:graph_runs{} 10000
# TYPE :tensorflow:data:autotune counter
# TYPE :tensorflow:data:bytes_read counter
# TYPE :tensorflow:data:elements counter
# TYPE :tensorflow:data:optimization counter
# TYPE :tensorflow:serving:model_warmup_latency histogram
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter
However, I can only see values for the :tensorflow:core:graph_runs counter; there are no values for :tensorflow:serving:request_example_count_total, :tensorflow:serving:request_log_count, etc.
Am I missing anything here?
I am trying to extract TensorFlow Serving metrics.
How can I see the number of requests, requests per second, and the time TFS needs to serve a request?
root@tf-ds-model-1-gbkpm:/# curl localhost:8501/monitoring/prometheus/metrics
# TYPE :tensorflow:cc:saved_model:load_attempt_count counter
:tensorflow:cc:saved_model:load_attempt_count{model_path="s3://ds_model/model1",status="success"} 1
# TYPE :tensorflow:cc:saved_model:load_latency counter
:tensorflow:cc:saved_model:load_latency{model_path="s3://ds_model/model1"} 708801
# TYPE :tensorflow:contrib:session_bundle:load_attempt_count counter
# TYPE :tensorflow:contrib:session_bundle:load_latency counter
# TYPE :tensorflow:core:direct_session_runs counter
:tensorflow:core:direct_session_runs{} 21558641
# TYPE :tensorflow:core:graph_run_time_usecs counter
:tensorflow:core:graph_run_time_usecs{} 168661266983
# TYPE :tensorflow:core:graph_runs counter
:tensorflow:core:graph_runs{} 21558640
# TYPE :tensorflow:serving:model_warmup_latency histogram
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter
how is it started:
tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=model --model_base_path=/models/model --model_config_file=/etc/tfserving/model/model.conf --monitoring_config_file=/etc/tfserving/monitoring/monitoring.conf
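The output above does allow one rough latency estimate: dividing :tensorflow:core:graph_run_time_usecs by :tensorflow:core:graph_runs gives the mean graph execution time per run since the server started. This is an interpretation based on the metric names (assumed to be microseconds and a run count), and it excludes request parsing, batching and network time. A sketch that parses the sample lines and does the division:

```python
import re

# Matches sample lines such as ':tensorflow:core:graph_runs{} 21558640'.
SAMPLE_RE = re.compile(
    r"^(?P<name>[^\s{#]+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$"
)


def parse_samples(text):
    """Sum sample values per metric name, skipping comment lines and ignoring labels."""
    totals = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            name = match.group("name")
            totals[name] = totals.get(name, 0.0) + float(match.group("value"))
    return totals


def mean_graph_run_usecs(totals):
    """Mean microseconds per graph run, or None if no runs were recorded."""
    runs = totals.get(":tensorflow:core:graph_runs", 0.0)
    if runs == 0:
        return None
    return totals.get(":tensorflow:core:graph_run_time_usecs", 0.0) / runs


# Values copied from the curl output above.
scrape = """\
# TYPE :tensorflow:core:graph_run_time_usecs counter
:tensorflow:core:graph_run_time_usecs{} 168661266983
# TYPE :tensorflow:core:graph_runs counter
:tensorflow:core:graph_runs{} 21558640
"""
print(mean_graph_run_usecs(parse_samples(scrape)))
```

For the numbers posted above this works out to roughly 7.8 ms per graph run, averaged over the whole lifetime of the process.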
Documentation on what these metrics represent and how to use them in production use cases would be very nice. Right now I'm trying to implement a scaling mechanism for our serving machines, and I can only deduce what the metrics are doing.
Is there such documentation somewhere? I can't find it.
Also, is there a way to monitor requests per second into the TensorFlow Serving container, how many requests got served, how many failed, etc.? I don't see it...