Instead of reporting all metrics through the control plane, it would be ideal to expose metrics natively from the proxy's admin server via a Prometheus exporter endpoint.
Note that if TLS is added to the proxy's admin service (on which tap is exposed), we can likely serve the metrics from this same server rather than starting an additional HTTP server: as far as I can tell, Prometheus's collector can scrape metrics over H2 when the server supports TLS/ALPN.
The proxy should initially expose the following Prometheus metrics and labels:
inbound_request_total {dst_<X>} (counter)
inbound_request_duration_ms {dst_<X>} (counter)
inbound_response_total {dst_<X>, status_code, grpc_status_code} (counter)
inbound_response_latency_ms {dst_<X>, status_code, grpc_status_code, le} (histogram)
inbound_response_duration_ms {dst_<X>, status_code, grpc_status_code, le} (histogram)
outbound_request_total {authority, src_<X>, dst_<Y>} (counter)
outbound_request_duration_ms {authority, src_<X>, dst_<Y>} (counter)
outbound_response_total {authority, src_<X>, dst_<Y>, status_code, grpc_status_code} (counter)
outbound_response_latency_ms {authority, src_<X>, dst_<Y>, status_code, grpc_status_code, le} (histogram)
outbound_response_duration_ms {authority, src_<X>, dst_<Y>, status_code, grpc_status_code, le} (histogram)
<X> is taken from the CONDUIT_PROMETHEUS_LABELS env var. For example, if CONDUIT_PROMETHEUS_LABELS="deployment=foo" then the proxy should expose inbound_request_total {dst_deployment=foo} (depends on #426)

<Y> is taken from the Prometheus label data from the destination service. For example, if the proxy is sending a request to an address with the Prometheus label data of "job=bar" then the proxy should expose outbound_request_total {..., dst_job=bar} (depends on #429)

Open Questions:
[Edit: this comment is out of date and no longer reflects the current thinking. See description for up-to-date information]
The proxy should initially expose the following Prometheus metrics and labels:
inbound_request_total {http_method, target_deployment}
inbound_response_total {http_method, target_deployment, status_code, grpc_status_code}
inbound_response_latency_ms {http_method, target_deployment, status_code, grpc_status_code, le}
outbound_request_total {source_deployment, target_service, target_deployment}
outbound_response_total {source_deployment, target_service, target_deployment, status_code, grpc_status_code}
outbound_response_latency_ms {source_deployment, target_service, target_deployment, status_code, grpc_status_code, le}
target_deployment on inbound metrics depends on #426

target_service should be set to the normalized form of the :authority header (e.g. s.n.svc.cluster.local)

When the proxy container is shutting down, should it linger for a while to allow the last set of metrics to be retrieved by Prometheus? (I don't know, myself.)
@briansmith more generally, we probably want to honor some graceful draining period so that in-flight requests may complete while we stop accepting new ones. Aside from that, I don't think we need to do too much work to ensure that Prometheus has a 100% accurate view of the world (i.e. some requests may go unaccounted for, and that's ok).
I would like signoff here from @olix0r @siggy and @seanmonstar on the list of metrics and labels to expose. The list is given in the issue description.
le?
*_response_latency_ms is the latency of the service only. Should we have another metric that includes the time spent in the proxy as well?

le is "less than or equal to", specific to Prometheus histograms:
https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
Histograms have _bucket, _sum, and _count metrics; they look like this:
# HELP http_response_size_bytes A histogram of response sizes for requests.
# TYPE http_response_size_bytes histogram
http_response_size_bytes_bucket{le="100"} 14
http_response_size_bytes_bucket{le="200"} 15
http_response_size_bytes_bucket{le="300"} 48
http_response_size_bytes_bucket{le="400"} 1789
http_response_size_bytes_bucket{le="500"} 1789
http_response_size_bytes_bucket{le="1000"} 4087
http_response_size_bytes_bucket{le="2000"} 4089
http_response_size_bytes_bucket{le="3000"} 4102
http_response_size_bytes_bucket{le="4000"} 4119
http_response_size_bytes_bucket{le="5000"} 4135
http_response_size_bytes_bucket{le="10000"} 4376
http_response_size_bytes_bucket{le="20000"} 5864
http_response_size_bytes_bucket{le="30000"} 5864
http_response_size_bytes_bucket{le="40000"} 5864
http_response_size_bytes_bucket{le="50000"} 5864
http_response_size_bytes_bucket{le="1e+06"} 5864
http_response_size_bytes_bucket{le="2e+06"} 5864
http_response_size_bytes_bucket{le="3e+06"} 5864
http_response_size_bytes_bucket{le="4e+06"} 5864
http_response_size_bytes_bucket{le="5e+06"} 5864
http_response_size_bytes_bucket{le="+Inf"} 5864
http_response_size_bytes_sum 2.3006904e+07
http_response_size_bytes_count 5864
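
To make the le buckets concrete, here is a hedged PromQL sketch of how the proposed latency histogram could be queried (assuming the outbound_response_latency_ms name and a dst_deployment label as described above; exact names may change):

histogram_quantile(0.95, sum(rate(outbound_response_latency_ms_bucket[1m])) by (le, dst_deployment))

histogram_quantile interpolates within the le buckets, here estimating the 95th-percentile response latency per destination deployment.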
@seanmonstar
I assume *_response_latency_ms is the latency of the service only. Should we have another metric that includes the time spent in the proxy as well?
We probably want some sort of handletime_us histogram tracking time-in-proxy.
@adleong
target_service should be the normalized form of the :authority header
Can we just call this authority? It's not target-specific. And it's not necessarily a Service.
Is there an existing convention around source_ and target_ outside of Conduit?
What about using src_ and dst_? (we don't talk about targets anywhere, but we do talk about destinations).
Somewhat related to this ticket, TIL about Prometheus' labelmap config, which automates copying Kubernetes labels to Prometheus labels at collection time:
https://github.com/prometheus/prometheus/blob/release-2.1/documentation/examples/prometheus-kubernetes.yml#L6
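
For reference, a minimal sketch of what that looks like in a scrape config (the linked example has the full job definition; this assumes the standard kubernetes_sd pod metadata labels):

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy every Kubernetes pod label (__meta_kubernetes_pod_label_<name>)
      # onto the scraped series as a Prometheus label named <name>.
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)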
Some open questions for me:
1) What is the duration in metrics like inbound_request_duration_ms? Time spent in proxy? How would inbound_request_duration_ms differ from outbound_response_duration_ms?
2) Do we want to include request/response sizes?
3) #428 makes no mention of authority. How does it differ from target_deployment, and if we're not querying for it, why are we recording it?
4) Should we export health metrics as well? Memory usage, etc?
This is also unclear to me:
Should we ever reset counters?
I'm not that familiar with Prometheus. After it scrapes our /metrics endpoint, should we be resetting all counters? Is it assumed that the counts in the output are the counts accumulated between scrapes?
Prometheus assumes counters are cumulative and are not reset between scrapes; it derives per-interval rates at query time (e.g. with rate()). I'm also unclear when/why we would reset.
@siggy
What is the duration in metrics like inbound_request_duration_ms?
The amount of time from when the request starts until the request completes, i.e. effectively the time between when the HEADERS frame is sent and when the TRAILERS frame is sent.
Note that I don't think it's critical to expose this from the start, but it will be helpful for the sake of completeness.
How would inbound_request_duration_ms differ from outbound_response_duration_ms?
inbound_request_duration_ms measures the timing for request body streams for requests originating from outside the pod.
outbound_response_duration_ms measures the timing for response body streams for requests originating within the pod.
Note that inbound_request and outbound_response are totally unrelated -- inbound_ applies to streams originating outside the pod, and outbound_ applies to streams originating inside the pod.
Do we want to include request/response sizes?
Probably. Not critical to shipping this feature though.
Should we export health metrics as well? Memory usage, etc?
Yes, but let's not couple that with this.
#428 makes no mention of authority. How does it differ from target_deployment, and if we're not querying for it, why are we recording it?
It's equivalent to what we were calling target_service; see my comment above.
I suggest that the next step for this is to write the documentation for Proxy Metrics, submitted as a PR. It will help to consolidate the various parts of conversation in an authoritative doc, and also means we have to update that doc if the impl changes ;)
Question: why do we even need inbound_ and outbound_ prefixes? Can this simply be added as a label like orientation="inbound"? This would make it easy to count, for instance, the total number of requests through a proxy...
orientation could certainly be a label. One potential drawback of this is that it is harder to tell which combinations of labels will be exposed together. For example, the src_<X> label will not be populated on the incoming metrics because the proxy cannot tell which pod grouping a request came from.
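
To make the trade-off concrete: if there were a single request_total metric with an orientation label (hypothetical names, not what the description currently proposes), the total through a proxy and the per-direction breakdown would each be one query:

sum(rate(request_total[1m]))
sum(rate(request_total[1m])) by (orientation)

whereas with separate inbound_/outbound_ metric names the total requires adding two differently named series. The drawback noted above applies either way: labels such as src_<X> would simply be absent on inbound samples.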
I've opened issues #540, #541, #542, #543, and #544 to track several of the subtasks involved in this issue.
Closing in favor of individual issues.