Vector: Add an option to expose internal metrics via Prometheus to the Helm Chart

Created on 10 Sep 2020 · 7 Comments · Source: timberio/vector

A lot of apps that are deployed to Kubernetes integrate into the Prometheus metrics collection pipelines out of the box. This works through an automated discovery process performed by the Prometheus Operator, and all the app has to do is expose a Prometheus endpoint and set some annotations, opting in for collection.

Vector has internal metrics, and we could very easily expose them via a Prometheus endpoint. So, pretty much all we have to do is add a proper knob to our Helm chart. We could even enable it by default.

This should be a significant quality-of-life improvement for people already using Prometheus for metrics collection: without any extra effort, they'll get important data points about Vector's own operation into their monitoring/alerting system.

kubernetes feature

All 7 comments

Thinking about this as I was testing out the vector-agent and vector-aggregator charts, and will drop my first impressions here.

Prometheus Operator

Provides both a ServiceMonitor and a PodMonitor CRD to allow for the discovery of metric endpoints. PodMonitor could be worth looking at given the (current) lack of a Service resource targeting the vector-agent; theoretically it could be used to generate targets for both charts (assuming a shared selector across all the pods). I'm not certain whether a port needs to be specified on the target pods for these CRDs to function properly; that _might_ be a requirement for implementing Prometheus Operator support.
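For a sense of the shape, a PodMonitor covering both charts could look something like this (the name, label, and port name below are hypothetical, not anything the charts ship today):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vector                             # hypothetical name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: vector       # assumes a label shared by agent and aggregator pods
  podMetricsEndpoints:
    - port: metrics                        # must match a named containerPort on the pods
```

The `port` field referencing a named container port is exactly why the ports question above matters: without a named port on the pods, the PodMonitor has nothing to point at.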

Prometheus Helm

The non-operator Helm chart requires the annotations documented here to be applied to resources to generate the scrape targets. This should be supported today, but it might be nice "sugar" to have preconfigured annotations a user could toggle on, rather than having to write the config themselves.
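For reference, the conventional annotations that chart's default scrape config looks for are along these lines (the port value is a placeholder for whatever port the metrics endpoint listens on):

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"        # placeholder
    prometheus.io/path: "/metrics"
```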


I would definitely keep either route opt-in rather than on by default. Prometheus Operator is probably pretty common, but it wouldn't be a good experience to try to apply invalid resources (due to the CRD being absent).

I'm pretty sure we don't need to ship the ServiceMonitor/PodMonitor at all, and can just add the annotations as a starting point.
Technically, it should already be possible to add custom annotations, and thus have Vector scraped by Prometheus.

So, the question is: what do we want to configure for users out of the box, and how do we do that?

The trivial idea is to add another built-in source - like kubernetes_logs in the vector-agent chart - but with the internal_metrics type instead. In addition, we could add a built-in sink - prometheus_sink with the prometheus type - that would expose the metrics from internal_metrics. Of course, everything would be configurable - the source/sink names, parameters, sink inputs, etc. - but by default they'd be connected to each other out of the box.
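As the chart embeds Vector config in its values, the generated defaults might look roughly like this (the sink name, address, and port are placeholders, not the chart's actual defaults):

```yaml
# Collect Vector's own internal telemetry.
sources:
  internal_metrics:
    type: internal_metrics

# Expose it on a Prometheus scrape endpoint.
sinks:
  prometheus_sink:
    type: prometheus
    inputs: ["internal_metrics"]   # extra inputs could be appended via chart values
    address: "0.0.0.0:9090"        # placeholder port
```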

What do you think?

:thinking: I think we definitely want to support the Prometheus Operator CRDs out of the box - annotations alone wouldn't help anyone using the operator, only those who use the Helm chart. Of course, I don't have numbers on users of each group...

I've explored the design space enough. I see the following sequence of operations:

  1. Add ports to containers (without this, Prometheus can't scrape using kubernetes_sd_configs).

    This is to allow our users to configure it manually while we're working on the official support.

    See https://github.com/timberio/vector/issues/3808 and https://github.com/timberio/vector/pull/4835.

  2. Add a notion of built-in metrics to the Helm charts.

    The way this would work is:

    1. We'll add an internal_metrics source and a prometheus sink (pointing to that source by default; we'll allow passing extra inputs to the sink so other sources can be included).
    2. We'll add the relevant items to container ports (and to Services where applicable).
    3. We'll add either Pod annotations (for Prometheus-native scraping) or prometheus-operator-powered PodMonitors, based on the user's choice.

    All of this would be conditioned on a single opt-in value (metrics.enabled: true), off by default. Beyond that single opt-in, each of the configs will have good defaults (well, except for the annotations/PodMonitor pick, which will also be mandatory), but everything will be highly customizable - e.g. the PodMonitor will expose the relabelings configuration for customization.

    This is tracked by this issue.

  3. Add Grafana dashboards.

    See https://github.com/timberio/vector/issues/4838.
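
Taken together, the opt-in from step 2 might surface in values.yaml roughly like this (every key below is hypothetical, sketching the shape rather than any final API):

```yaml
metrics:
  enabled: true          # single opt-in switch; false by default
  # Exactly one discovery mechanism must be picked when enabled:
  podMonitor:
    enabled: true        # generate a prometheus-operator PodMonitor
    relabelings: []      # exposed for customization
  annotations:
    enabled: false       # or: plain prometheus.io/* pod annotations
```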


cc @eeyun for your input.

To explicitly add my thoughts here - I think @MOZGIII has the right order of operations. Add ports first to allow manual config. When it comes to the Prometheus CRDs vs. a simple scrapeable annotation, I'm not entirely certain, but my gut tells me we'll probably need to support both flows. I don't know how commonly the Prometheus operators are leveraged in the wild.

In my experience, anecdotal with no data, operators are way more common than the helm chart.

Also, as a least-effort option: if users can already provide arbitrary annotations, they could roll it themselves. If a lot of users end up doing that, they'd probably mention it, and building it in would make sense.
