Serving: Export server metrics

Created on 2 Jun 2017 · 32 Comments · Source: tensorflow/serving

Hello,
Is there any way to get metrics (response time, prediction time, number of requests, ...) from the server?
Thank you.

needs prio contributions welcome

FNegrello

👍24

Most helpful comment

Documentation on what these metrics represent and how to use them in production use cases would be very nice. Right now I'm trying to implement a scaling mechanism for our serving machines, and I can only guess at what the metrics mean.

Is there such documentation somewhere? Because I can't find it.

dannysid on 8 Apr 2020

👍8

All 32 comments

Currently we don't have any built-in solution for that. Contributions welcome :)

Perhaps a good way to structure it would be to subclass ServableStateMonitor with something that also maintains & exports statistics.

chrisolston on 12 Jun 2017

👍1

@FNegrello @chrisolston
How is this issue (or the similar issue #711) going?

ynqa on 22 Feb 2018

If nobody is working on this right now, I'd like to contribute.

I'd also like to ask about the metric definitions:

What elements should be output? (e.g. model_name, model_version, and elapsed_time per prediction...)
How should the metrics be output?
- @FNegrello's case: output metrics on each call to the Predict service, using LOG (for stdout) or syslog, as in this commit.
- Another suggestion: add a metrics service so that monitoring systems can pull the metrics from it.

ynqa on 22 Feb 2018

the Prometheus metrics format - and pull model - is a good place to start.

works well with kubernetes, has c++ bindings, has a strong community, solid (turing-complete) query language, and great integration with a newly-updated Grafana UI.

for now, i would start with the following:

model_name

model_version

any custom labels/tags the user passes in

request counts

error counts (for each type of error you see in the TF Serving code)

graph execution time (aka prediction time)

in the future, i’d like to see the following:

custom ops for timing specific parts of the graph (similar in concept to the batch/unbatch ops, except start_timer/end_timer)

individual request tracing like ZipKin or Jaeger

model ensemble statistics (which models were involved in the final prediction)

a/b test metrics

cfregly on 22 Feb 2018

👍7
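
To make the list above concrete, here is a minimal sketch of how such counters could be rendered in the Prometheus text exposition format with model_name and model_version as labels. Everything in it is hypothetical: MetricsRegistry, the metric names (request_count, error_count, graph_execution_seconds_total) and the label values are invented for illustration and are not part of TF Serving.

```python
from collections import defaultdict


class MetricsRegistry:
    """Tiny in-memory counter registry (illustrative only, not a TF Serving API)."""

    def __init__(self):
        # Maps (metric name, sorted label pairs) to the accumulated value.
        self._counters = defaultdict(float)

    def inc(self, name, labels, amount=1.0):
        key = (name, tuple(sorted(labels.items())))
        self._counters[key] += amount

    def render(self):
        """Return all counters in the Prometheus text exposition format."""
        lines = []
        seen = set()
        for (name, labels), value in sorted(self._counters.items()):
            if name not in seen:
                lines.append(f"# TYPE {name} counter")
                seen.add(name)
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value:g}")
        return "\n".join(lines) + "\n"


# Hypothetical metric names and label values, chosen only for this example.
registry = MetricsRegistry()
labels = {"model_name": "demo", "model_version": "1"}
registry.inc("request_count", labels)
registry.inc("request_count", labels)
registry.inc("error_count", {**labels, "error": "INVALID_ARGUMENT"})
registry.inc("graph_execution_seconds_total", labels, 0.0125)
print(registry.render())
```

Because the model name and version are labels rather than part of the metric name, a monitoring system can aggregate or filter per model without any change on the serving side.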

> the Prometheus metrics format - and pull model - is a good place to start.

@cfregly Sure. In that case, which of these approaches is better for Prometheus?

- Add a service here for outputting metrics.
- Prepare an endpoint (e.g. /metrics) for HTTP requests, as in https://github.com/grpc-ecosystem/go-grpc-prometheus

ynqa on 22 Feb 2018

I also just implemented a TensorFlow Serving exporter for Prometheus, in addition to the server metrics.
https://github.com/ynqa/tf_serving_exporter

ynqa on 21 Apr 2018

I would love to see Prometheus support here as well. Any traction on PR #800? It seems to have gone a little stale.

thomasjungblut on 23 Apr 2018

@thomasjungblut It's okay to move the repo into here.

ynqa on 23 Apr 2018

@cfregly @ynqa is #800 compatible with prometheus?

jlewi on 20 Jun 2018

👍1

@jlewi Yes, but an exporter is needed between the Prometheus server and TF Serving.

ynqa on 25 Jun 2018

Hi @chrisolston
Any update from tf-serving team since your comment in https://github.com/tensorflow/serving/pull/800#issuecomment-414407903?

wydwww on 20 Nov 2018

@wydwww Updates about the Prometheus exporter are described in the release notes: https://github.com/tensorflow/serving/blob/master/RELEASE.md#major-features-and-improvements

ynqa on 22 Nov 2018

@ynqa Thanks for your reply! I checked this commit and set up the monitoring. I can see the following types of metrics with a simple curl or in Prometheus:

# TYPE :tensorflow:cc:saved_model:load_attempt_count counter
# TYPE :tensorflow:cc:saved_model:load_latency counter
# TYPE :tensorflow:contrib:session_bundle:load_attempt_count counter
# TYPE :tensorflow:contrib:session_bundle:load_latency counter
# TYPE :tensorflow:core:direct_session_runs counter
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

Is there a way to get metrics for the prediction time of a request and for the time at which model_server received the request? I am trying to measure, over the lifecycle of one prediction request, how much time is spent on the network and how much on prediction by model_server.

wydwww on 27 Nov 2018

@wydwww You're welcome. For now, I think there are a couple of ways to collect your own custom metrics:

- on the client side, measure and export the metrics you define
- fork TF Serving and add an exporting mechanism to it

ynqa on 11 Dec 2018
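
The first option, measuring on the client side, could look roughly like the sketch below. It only times a callable and summarizes the samples; fake_predict is a hypothetical stand-in for a real Predict request, and timed_call and summarize are names made up for this example. The measured value is the full round trip as seen by the client, so it includes network time as well as prediction time.

```python
import math
import statistics
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) as measured on the client."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def summarize(latencies):
    """Return count, mean and nearest-rank 95th percentile of the samples."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "p95": ordered[p95_index],
    }


def fake_predict(x):
    # Hypothetical stand-in for sending a Predict request to the model server.
    time.sleep(0.001)
    return x * 2


latencies = []
for i in range(20):
    _, elapsed = timed_call(fake_predict, i)
    latencies.append(elapsed)

summary = summarize(latencies)
print(summary)
```

Subtracting a server-side execution time from this round-trip time would give a rough estimate of the network share, assuming both are measured over the same set of requests.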

@ynqa Thanks.
Currently I use a program running on the server to receive requests from the client and forward them to the model server. This takes the time spent on the network out of the measurement.

wydwww on 11 Dec 2018

I saw commit 021efbd3281aa815cab0b35eab6d6d25249c12d4, which exposes Prometheus metrics on /monitoring/prometheus/metrics.

However, I tested the tensorflow/serving docker image with the tags nightly, latest and 1.12.0, and all of them returned 404 Not Found on /monitoring/prometheus/metrics, while /v1/models/model worked without problems.

Reproduce with:
docker run --rm -it -v `pwd`/models:/models -p 8501:8501 tensorflow/serving:1.12.0

litaxc on 23 Jan 2019

@litaxc i think your issue is same as #1180 -- this is fixed in latest 1.13.0-rc1 release. can you please help test the published docker image? -- thanks!

netfs on 14 Feb 2019

@netfs I tried 1.13.0-rc1 but still got 404 Not Found on /monitoring/prometheus/metrics
I use the exported model from https://www.tensorflow.org/serving/serving_basic and still cannot get any prometheus metrics.

litaxc on 15 Feb 2019

you need to pass --monitoring_config_file=<file> to the model server to enable prometheus endpoint. the file format looks like this (you can skip path if you want to use the default). see the unit test for more details.

netfs on 15 Feb 2019

👍2
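
For reference, here is a small sketch that writes such a monitoring config file and builds a matching server command line. The prometheus_config block with its enable and path fields reflects the format as I understand it from the TF Serving monitoring config, so treat it as an assumption and check it against the unit test mentioned above. The helper name monitoring_config and the temporary file location are hypothetical.

```python
import tempfile
from pathlib import Path


def monitoring_config(path="/monitoring/prometheus/metrics"):
    """Return the text of a monitoring config that enables the Prometheus endpoint.

    Assumed format: a prometheus_config block with `enable` and `path` fields.
    """
    return (
        "prometheus_config {\n"
        "  enable: true,\n"
        f'  path: "{path}"\n'
        "}\n"
    )


# Write the config to a temporary location (hypothetical path for this example).
config_path = Path(tempfile.mkdtemp()) / "monitoring.conf"
config_path.write_text(monitoring_config())

# Command line that points the model server at the config file.
command = [
    "tensorflow_model_server",
    "--rest_api_port=8501",
    f"--monitoring_config_file={config_path}",
]
print(" ".join(command))
```

With the server started this way, the metrics should be served on the REST port under the configured path.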

@netfs it works! thank you!

litaxc on 18 Feb 2019

> you need to pass --monitoring_config_file= to the model server to enable prometheus endpoint.

@netfs Is the monitoring documentation in progress? If not, could I write it?

ynqa on 24 Apr 2019

@ynqa I don't think anyone is actively working on monitoring docs, so please feel free to write one up.

misterpeddy on 29 Apr 2019

👍3

Is there documentation about how to access the /monitoring/prometheus/metrics endpoint?

zaktab on 23 May 2019

> Is there documentation about how to access the /monitoring/prometheus/metrics endpoint?

#1348 is attempting to document monitoring. feel free to comment on that PR if you want to see additional information added.

netfs on 23 May 2019

👍1

Are there any plans to add a metric like prediction time that has model_name as a label, so we can see things like the average latency of a request for a given model?

Also, the metric :tensorflow:cc:saved_model:load_attempt_count has a label model_path. It seems like it would make more sense for this to be the name and version of the model rather than the path (or it could just have all of them).

blester125 on 6 Jun 2019

Hi @ynqa, I'm working on a Python client based on your code to get serving metrics, and I'm having some difficulty making it work. My code looks like this:

import grpc

# get_model_metrics_pb2 is generated locally with protoc (see below).
from tensorflow_serving.apis import get_model_metrics_pb2
from tensorflow_serving.apis import model_service_pb2_grpc


def main():
    # This must be the model server's gRPC port; 8501 is commonly the REST port.
    channel = grpc.insecure_channel("127.0.0.1:8501")
    stub = model_service_pb2_grpc.ModelServiceStub(channel)
    request = get_model_metrics_pb2.GetModelMetricsRequest()
    request.model_spec.name = '1'
    request.model_spec.signature_name = 'get_model_metrics'
    # Call the RPC with a 100-second timeout.
    response = stub.GetModelMetrics(request, timeout=100)

    # Report the status returned by the server.
    if response.status.error_code == 0:
        print("Success")
        print(response)
    else:
        print("Fail!")
        print(response.status.error_code)
        print(response.status.error_message)


if __name__ == '__main__':
    main()

And when I try to run it, this message appears:

    from tensorflow_serving.apis import get_model_metrics_pb2
ImportError: cannot import name 'get_model_metrics_pb2' from 'tensorflow_serving.apis'

I generated get_model_metrics_pb2 with protoc, so I assume the error is not there. From searching on Google and reading what people say, I believe it is a circular dependency failure, but I don't know how to fix it properly. Maybe you have an idea or know someone who is working on it. Thanks a lot!

Davidbcn23 on 11 Jun 2019

Hi @blester125 @wydwww @ynqa,

Since these metrics

# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

are NOT supported by the official versions yet, what is the cheapest way to get them? Has there been any progress? Looking forward to your reply :-)

By the way, what I want is the number of requests received per second and the time spent per request.

PayneJoe on 24 Jun 2019

# TYPE :tensorflow:data:autotune counter
# TYPE :tensorflow:data:bytes_read counter
# TYPE :tensorflow:data:elements counter
# TYPE :tensorflow:data:optimization counter
# TYPE :tensorflow:serving:model_warmup_latency histogram
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

Hi @chrisolston, I'm wondering why these metrics are still empty. Are they on a TODO list, or is it something else?

Thanks,

PayneJoe on 24 Jun 2019

👀2

Echoing what PayneJoe said above.

I am running TF Serving as a docker container and have exposed a REST endpoint with Prometheus metrics enabled.

I can see the following stats from the Prometheus endpoint:

# TYPE :tensorflow:core:graph_runs counter
:tensorflow:core:graph_runs{} 10000
# TYPE :tensorflow:data:autotune counter
# TYPE :tensorflow:data:bytes_read counter
# TYPE :tensorflow:data:elements counter
# TYPE :tensorflow:data:optimization counter
# TYPE :tensorflow:serving:model_warmup_latency histogram
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

However, only :tensorflow:core:graph_runs has a value; there are no values for :tensorflow:serving:request_example_count_total, :tensorflow:serving:request_log_count, etc.
Am I missing anything here?

abhinavos7a on 28 Oct 2019

👀1

I am trying to extract TensorFlow Serving metrics.
How can I see the number of requests, the requests per second, and the time TFS needs to serve a request?

root@tf-ds-model-1-gbkpm:/# curl localhost:8501/monitoring/prometheus/metrics
# TYPE :tensorflow:cc:saved_model:load_attempt_count counter
:tensorflow:cc:saved_model:load_attempt_count{model_path="s3://ds_model/model1",status="success"} 1
# TYPE :tensorflow:cc:saved_model:load_latency counter
:tensorflow:cc:saved_model:load_latency{model_path="s3://ds_model/model1"} 708801
# TYPE :tensorflow:contrib:session_bundle:load_attempt_count counter
# TYPE :tensorflow:contrib:session_bundle:load_latency counter
# TYPE :tensorflow:core:direct_session_runs counter
:tensorflow:core:direct_session_runs{} 21558641
# TYPE :tensorflow:core:graph_run_time_usecs counter
:tensorflow:core:graph_run_time_usecs{} 168661266983
# TYPE :tensorflow:core:graph_runs counter
:tensorflow:core:graph_runs{} 21558640
# TYPE :tensorflow:serving:model_warmup_latency histogram
# TYPE :tensorflow:serving:request_example_count_total counter
# TYPE :tensorflow:serving:request_example_counts histogram
# TYPE :tensorflow:serving:request_log_count counter

This is how it is started:

tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=model --model_base_path=/models/model --model_config_file=/etc/tfserving/model/model.conf --monitoring_config_file=/etc/tfserving/monitoring/monitoring.conf

Arnold1 on 27 Feb 2020

👀2
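
One rough way to get numbers out of a scrape like the one above is to divide the cumulative graph run time by the number of graph runs, and to take the difference of the run counter between two scrapes. The sketch below does that with the values quoted above. It rests on an assumption about what the counters mean: a graph run is not necessarily one request (batching and warmup can differ), so these are estimates of graph execution, not of full request latency. The function names are made up for this example.

```python
def parse_samples(text):
    """Parse Prometheus text output into {metric_name: summed value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# TYPE" comments
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        samples[name] = samples.get(name, 0.0) + float(value)
    return samples


def mean_graph_run_usecs(samples):
    """Average microseconds per graph run since the server started."""
    runs = samples[":tensorflow:core:graph_runs"]
    if not runs:
        return 0.0
    return samples[":tensorflow:core:graph_run_time_usecs"] / runs


def runs_per_second(earlier, later, interval_seconds):
    """Graph runs per second between two scrapes taken interval_seconds apart."""
    key = ":tensorflow:core:graph_runs"
    return (later[key] - earlier[key]) / interval_seconds


# Values copied from the scrape quoted above.
SCRAPE = """\
# TYPE :tensorflow:core:graph_run_time_usecs counter
:tensorflow:core:graph_run_time_usecs{} 168661266983
# TYPE :tensorflow:core:graph_runs counter
:tensorflow:core:graph_runs{} 21558640
"""

samples = parse_samples(SCRAPE)
print(f"mean graph run time: {mean_graph_run_usecs(samples):.2f} usecs")
```

For these values the mean comes out at roughly 7.8 ms per graph run. In Prometheus itself the same idea is usually expressed with rate() over the counters instead of hand-written arithmetic.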

Is there such documentation somewhere? Because I can't find it.

dannysid on 8 Apr 2020

👍8

Also, is there a way to monitor requests per second into the TensorFlow Serving container, how many requests got served, how many failed, etc.? I don't see it...

Arnold1 on 14 Apr 2020
