Serving: Export server metrics

Created on 2 Jun 2017 · 32 Comments · Source: tensorflow/serving

Hello,
Is there any way to get metrics (response time, prediction time, number of requests, ...) from the server?
Thank you.

needs prio contributions welcome

All 32 comments

Currently we don't have any built-in solution for that. Contributions welcome :)

Perhaps a good way to structure it would be to subclass ServableStateMonitor with something that also maintains & exports statistics.

@FNegrello @chrisolston
How is this issue (or the similar issue #711) going?

If nobody is pushing commits here at the moment, I'd like to contribute.

I'd also like to ask about the metric definitions:

  1. Which elements should be output? (e.g. model_name, model_version, and elapsed_time per prediction...)
  2. How should the metrics be output?

    • @FNegrello's case: output metrics on each call to the Predict service, using LOG (for stdout) or syslog, as in this commit.

    • Another suggestion: add a metrics service so that monitoring systems can pull them.

the Prometheus metrics format - and pull model - is a good place to start.

works well with kubernetes, has c++ bindings, has a strong community, solid (turing-complete) query language, and great integration with a newly-updated Grafana UI.

for now, i would start with the following:

model_name

model_version

any custom labels/tags the user passes in

request counts

error counts (for each type of error you see in the TF Serving code)

graph execution time (aka prediction time)
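As a rough illustration of the list above, here is a minimal sketch of a request counter labelled with model_name and model_version and rendered in the Prometheus text format. The class name, metric name and label values are hypothetical; this is not TF Serving code, only a picture of what the pulled output could look like.

```python
from collections import defaultdict


class LabeledCounter:
    """Hypothetical in-process counter keyed by its label values."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        # Build the key in a fixed label order so equal label sets collide.
        key = tuple(labels[n] for n in self.label_names)
        self._values[key] += amount

    def render(self):
        # One "# TYPE" header followed by one sample line per label set.
        lines = [f"# TYPE {self.name} counter"]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines)


# Hypothetical metric name and label values, for illustration only.
request_count = LabeledCounter("serving_request_count", ["model_name", "model_version"])
request_count.inc(model_name="mnist", model_version="1")
request_count.inc(model_name="mnist", model_version="1")
request_count.inc(model_name="mnist", model_version="2")
print(request_count.render())
```

Error counts and execution time would follow the same pattern, with an extra label for the error type or a histogram instead of a counter.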

in the future, i'd like to see the following:

custom ops for timing specific parts of the graph (similar in concept to the batch/unbatch ops, except start_timer/end_timer)

individual request tracing like ZipKin or Jaeger

model ensemble statistics (which models were involved in the final prediction)

a/b test metrics

@cfregly Sure. In that case, which is the better way to use Prometheus?

  1. Add a service here that outputs metrics.
  2. Provide an endpoint (e.g. /metrics) for HTTP requests, as in: https://github.com/grpc-ecosystem/go-grpc-prometheus

I also just implemented a TensorFlow Serving exporter for Prometheus, in addition to the server metrics.
https://github.com/ynqa/tf_serving_exporter

I would love to see Prometheus support here as well. Any traction on PR #800? It seems to have gone a little stale.

@thomasjungblut It's okay to move the repo into here.

@cfregly @ynqa is #800 compatible with prometheus?

@jlewi Yes. But an exporter is needed between the Prometheus server and TF Serving.

Hi @chrisolston
Any update from tf-serving team since your comment in https://github.com/tensorflow/serving/pull/800#issuecomment-414407903?

@wydwww Updates on the Prometheus exporter are described in these release notes: https://github.com/tensorflow/serving/blob/master/RELEASE.md#major-features-and-improvements

@ynqa Thanks for your reply! I checked this commit and set up the monitoring. I can see the following types of metrics, either simply with curl or in Prometheus:

# TYPE :tensorflow:cc:saved_model:load_attempt_count counter
# TYPE :tensorflow:cc:saved_model:load_latency counter
# TYPE :tensorflow:contrib:session_bundle:load_attempt_count counter
# TYPE :tensorflow:contrib:session_bundle:load_latency counter
# TYPE :tensorflow:core:direct_session_runs counter
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

Is there a way to get metrics for the prediction time of a request and the time at which model_server received the request? I am trying to measure, over the lifecycle of one prediction request, how much time is spent on the network and how much on prediction by model_server.

@wydwww You're welcome. For now, I think there are a couple of ways to collect your own custom metrics:

  1. On the client side, measure and export the metrics you define.
  2. Fork TF Serving and add an export mechanism to it.
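For the first option, client-side measurement could be sketched roughly as follows. The recorder class and the fake_predict stand-in are hypothetical; in a real client the body of the with block would be the actual Predict call. Note that a client-side timer includes network time, so it measures end-to-end latency rather than prediction time alone.

```python
import time
from contextlib import contextmanager


class LatencyRecorder:
    """Hypothetical helper that accumulates call count and total latency."""

    def __init__(self):
        self.count = 0
        self.total_seconds = 0.0

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the call even if it raised, so failures are still counted.
            self.count += 1
            self.total_seconds += time.perf_counter() - start

    def mean_seconds(self):
        return self.total_seconds / self.count if self.count else 0.0


def fake_predict():
    """Stand-in for the real gRPC/REST Predict call (assumption)."""
    time.sleep(0.01)


recorder = LatencyRecorder()
for _ in range(3):
    with recorder.measure():
        fake_predict()
print(f"{recorder.count} calls, mean {recorder.mean_seconds() * 1000:.1f} ms")
```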

@ynqa Thanks.
Currently I run a program on the server that receives requests from the client and forwards them to the model server. This removes the time spent on the network from the measurement.

I saw there was 021efbd3281aa815cab0b35eab6d6d25249c12d4, which exposes Prometheus metrics on /monitoring/prometheus/metrics.

However, I tested the tensorflow/serving Docker image with the tags nightly, latest and 1.12.0, and all of them returned 404 Not Found on /monitoring/prometheus/metrics, while /v1/models/model worked without problems.

Reproduce with:
docker run --rm -it -v `pwd`/models:/models -p 8501:8501 tensorflow/serving:1.12.0

@litaxc i think your issue is same as #1180 -- this is fixed in latest 1.13.0-rc1 release. can you please help test the published docker image? -- thanks!

@netfs I tried 1.13.0-rc1 but still got 404 Not Found on /monitoring/prometheus/metrics
I use the exported model from https://www.tensorflow.org/serving/serving_basic and still cannot get any prometheus metrics.

you need to pass --monitoring_config_file=<file> to the model server to enable prometheus endpoint. the file format looks like this (you can skip path if you want to use the default). see the unit test for more details.
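The config snippet from the comment above did not survive on this page. As a hedged sketch, the following writes a monitoring config in the text-proto form I believe the model server expects (a prometheus_config block with enable and an optional path); treat the field names as an assumption and check them against the unit test mentioned above. The file location is arbitrary.

```python
import tempfile
from pathlib import Path

# Assumed contents of the monitoring config file (text-format proto).
# "path" is optional; the value below is the endpoint discussed in this thread.
MONITORING_CONFIG = """\
prometheus_config {
  enable: true,
  path: "/monitoring/prometheus/metrics"
}
"""

# Arbitrary location chosen for this example.
config_path = Path(tempfile.gettempdir()) / "monitoring.config"
config_path.write_text(MONITORING_CONFIG)

# The flag to add to the existing tensorflow_model_server command line.
print(f"--monitoring_config_file={config_path}")
```

With the REST API enabled (--rest_api_port), the metrics should then be served on that port under the configured path.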

@netfs it works! thank you!

@netfs Is documentation on monitoring in progress? If not, could I write it?

@ynqa I don't think anyone is actively working on monitoring docs, so please feel free to write one up.

Is there documentation about how to access the /monitoring/prometheus/metrics endpoint?

#1348 is attempting to document monitoring. feel free to comment on that PR if you want to see additional information added.

Are there any plans to add a metric that is like prediction time that has model_name as a label so we can see things like the average latency of a request for a given model?

Also, the metric :tensorflow:cc:saved_model:load_attempt_count has a label model_path. It seems it would make more sense for this to be the name and version of the model rather than the path (or it could just have all of them).

Hi @ynqa, I'm working on a Python client based on your code to get serving metrics, and I'm having some difficulty making it work. My code looks like this:

import grpc

# get_model_metrics_pb2 was generated with protoc (see below).
from tensorflow_serving.apis import get_model_metrics_pb2
from tensorflow_serving.apis import model_service_pb2_grpc


def main():
    # This must be the model server's gRPC port, not its REST port.
    channel = grpc.insecure_channel("127.0.0.1:8501")
    stub = model_service_pb2_grpc.ModelServiceStub(channel)
    request = get_model_metrics_pb2.GetModelMetricsRequest()
    request.model_spec.name = '1'
    request.model_spec.signature_name = 'get_model_metrics'
    response = stub.GetModelMetrics(request, timeout=100)

    # The response handling belongs inside main(), where response is defined.
    if response.status.error_code == 0:
        print("Success")
        print(response)
    else:
        print("Fail!")
        print(response.status.error_code)
        print(response.status.error_message)


if __name__ == '__main__':
    # Call main() directly; tf.app.run() would pass argv to main(), which takes no arguments.
    main()

And when I try to run it, this message appears:

    from tensorflow_serving.apis import get_model_metrics_pb2
ImportError: cannot import name 'get_model_metrics_pb2' from 'tensorflow_serving.apis'

I generated get_model_metrics_pb2 with protoc, so I assume the error is not there. From searching on Google and reading what people say, I'm sure it is a circular dependency failure, but I don't know how to fix it properly. Maybe you have an idea, or know someone who is working on it. Thanks a lot!

Hi, @blester125 @wydwww @ynqa

Since these metrics

# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

are NOT supported by the official versions as of now, what is the cheapest way to get them? Has there been any improvement yet? Looking forward to your reply :-)

By the way, what I want is the number of requests received per second and the time cost per request.
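Whichever counter ends up carrying the request count, requests per second is normally derived from two scrapes of a monotonically increasing counter rather than exported directly. A minimal sketch, with made-up sample values:

```python
def per_second_rate(earlier, later):
    """Rate between two (timestamp_seconds, counter_value) samples."""
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if v1 < v0:
        # The counter went backwards, e.g. after a server restart:
        # treat the later value as counted from zero.
        v0 = 0
    return (v1 - v0) / (t1 - t0)


# Hypothetical scrapes taken 15 seconds apart.
first_scrape = (0.0, 1000)
second_scrape = (15.0, 1600)
print(per_second_rate(first_scrape, second_scrape))
```

This is the same idea that Prometheus applies with its rate() function over a time window.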

# TYPE :tensorflow:data:autotune counter
# TYPE :tensorflow:data:bytes_read counter
# TYPE :tensorflow:data:elements counter
# TYPE :tensorflow:data:optimization counter
# TYPE :tensorflow:serving:model_warmup_latency histogram
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

Hi @chrisolston, I'm wondering why these metrics are empty at the moment. Are they on the TODO list, or is it something else?

Thanks,

Echoing what PayneJoe said above.

I am running TF Serving as a Docker container and have exposed a REST endpoint with Prometheus metrics enabled.

I can see the following stats from the Prometheus endpoint:

# TYPE :tensorflow:core:graph_runs counter
:tensorflow:core:graph_runs{} 10000
# TYPE :tensorflow:data:autotune counter
# TYPE :tensorflow:data:bytes_read counter
# TYPE :tensorflow:data:elements counter
# TYPE :tensorflow:data:optimization counter
# TYPE :tensorflow:serving:model_warmup_latency histogram
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

However, I can only see values for the :tensorflow:core:graph_runs counter. There are no values for :tensorflow:serving:request_example_count_total, :tensorflow:serving:request_log_count, etc.
Am I missing anything here?

I am trying to extract TensorFlow Serving metrics.
How can I see the number of requests, the requests per second, and the time TFS needs to serve a request?

root@tf-ds-model-1-gbkpm:/# curl localhost:8501/monitoring/prometheus/metrics
# TYPE :tensorflow:cc:saved_model:load_attempt_count counter
:tensorflow:cc:saved_model:load_attempt_count{model_path="s3://ds_model/model1",status="success"} 1
# TYPE :tensorflow:cc:saved_model:load_latency counter
:tensorflow:cc:saved_model:load_latency{model_path="s3://ds_model/model1"} 708801
# TYPE :tensorflow:contrib:session_bundle:load_attempt_count counter
# TYPE :tensorflow:contrib:session_bundle:load_latency counter
# TYPE :tensorflow:core:direct_session_runs counter
:tensorflow:core:direct_session_runs{} 21558641
# TYPE :tensorflow:core:graph_run_time_usecs counter
:tensorflow:core:graph_run_time_usecs{} 168661266983
# TYPE :tensorflow:core:graph_runs counter
:tensorflow:core:graph_runs{} 21558640
# TYPE :tensorflow:serving:model_warmup_latency histogram
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

This is how it is started:

tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=model --model_base_path=/models/model --model_config_file=/etc/tfserving/model/model.conf --monitoring_config_file=/etc/tfserving/monitoring/monitoring.conf
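Until per-request metrics are populated, one rough approximation is to divide graph_run_time_usecs by graph_runs from the output above. The sketch below parses the scraped text and does that division. Reading these two counters as "total graph execution time" and "number of graph executions" is an assumption based on their names, and the result covers graph execution only, not end-to-end request latency. The simple regular expression also assumes label values contain no closing brace.

```python
import re

# Two samples copied from the curl output above.
SAMPLE = """\
# TYPE :tensorflow:core:graph_run_time_usecs counter
:tensorflow:core:graph_run_time_usecs{} 168661266983
# TYPE :tensorflow:core:graph_runs counter
:tensorflow:core:graph_runs{} 21558640
"""

# name, optional {labels}, then the sample value.
LINE = re.compile(r"^(?P<name>[^\s{]+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$")


def parse_samples(text):
    """Map (metric_name, raw_label_string) to the sample value as a float."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# TYPE" / "# HELP" comments
        match = LINE.match(line)
        if match:
            key = (match["name"], match["labels"] or "")
            values[key] = float(match["value"])
    return values


values = parse_samples(SAMPLE)
total_usecs = values[(":tensorflow:core:graph_run_time_usecs", "")]
runs = values[(":tensorflow:core:graph_runs", "")]
mean_ms = total_usecs / runs / 1000
print(f"mean graph run time: {mean_ms:.2f} ms")
```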

Documentation on what these metrics represent and how to use them in production use cases would be very nice. Right now I'm trying to implement a scaling mechanism for our serving machines, and I can only guess at what the metrics are doing.

Is there such documentation somewhere? I can't find it.

Also, is there a way to monitor requests per second into the TensorFlow Serving container, how many requests got served, how many failed, etc.? I don't see it...
