Hi, colleagues!
What is missing?
At the moment there is no target sharding support in the Prometheus Operator.
It would be great to add it.
Why do we need it?
Currently we have thousands of targets in each Prometheus instance, and it seems we're reaching the performance limit of a single node.
Possible solutions that I see are:
1) Prometheus per namespace
2) Use sharding
Both solutions have their own advantages, but at the moment the sharding approach seems a bit better to me.
My proposal is to add a sharding attribute to the ServiceMonitor, e.g. `shard_by: <label>`. This label (or list of labels) would be used as the source label for sharding with `action: hashmod`. The modulus could be configured automatically based on the number of Prometheus instances.
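Purely for illustration, a rough sketch of what that might look like on a ServiceMonitor (note that `shard_by` does not exist today; the field name and placement are only my suggestion):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                 # hypothetical example
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
  # Proposed field (does not exist yet): the operator would generate
  # relabeling rules with action: hashmod on this label, deriving the
  # modulus from the number of Prometheus instances.
  shard_by: __address__
```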
What do you think?
If you need a solution quickly, you can already use additional relabeling rules on your ServiceMonitor via the `hashmod` action and create multiple ServiceMonitors, one per "shard". Your use case makes a lot of sense. I'd like to think it through a little further and arrive at a solution that would eventually allow us to autoscale sharding based on metric ingestion (I'm thinking of a general-purpose way, where a Prometheus object would become a shard, and maybe a ShardedPrometheus object that orchestrates these and can be autoscaled via the HPA). What I'm saying is, maybe the sharding decision should ultimately be configured in the Prometheus object instead of the ServiceMonitor (where it's already possible today, albeit a little manual).
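For anyone who wants to try that workaround right away, here is a minimal sketch of one shard's ServiceMonitor (the `my-app` name, the `metrics` port, and the choice of 3 shards are just assumptions for illustration). Each shard gets its own ServiceMonitor that keeps only the targets whose hashed `__address__` falls into its bucket:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-shard-0        # one ServiceMonitor per shard (0, 1, 2, ...)
  labels:
    shard: "0"                # used by the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app             # hypothetical target Service
  endpoints:
    - port: metrics
      relabelings:
        # Hash the target address into one of 3 buckets.
        - sourceLabels: [__address__]
          modulus: 3
          targetLabel: __tmp_hash
          action: hashmod
        # Keep only the targets that fall into this shard's bucket.
        - sourceLabels: [__tmp_hash]
          regex: "0"
          action: keep
```

The Prometheus object for shard 0 would then select this ServiceMonitor via `serviceMonitorSelector` on the `shard: "0"` label, and you repeat the pair for each shard, changing only the bucket `regex` and the label value.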