VictoriaMetrics: Add data deduplication for HA Prometheus pairs based on a `--query.replica-label` arg, similar to Thanos Query

Created on 1 Jul 2019 · 20 Comments · Source: VictoriaMetrics/VictoriaMetrics

Hi,

VM seems great, and we'd like to replace our Thanos setup (for 6 K8s clusters) with it. We understand how to label series with different labels to separate per-cluster data, but I still struggle with data deduplication. To drastically reduce infrastructure costs, we run multiple Prometheus instances on preemptible hosts. With Thanos Query's dedup capabilities this works fine: we don't need to perform any extra actions to get each metric once from all Prometheus instances for a given cluster. In addition, dedup handles partial responses for us.

Is there any chance dedup can be included in VM in the foreseeable future, or is this feature irrelevant for this case?

enhancement question

Most helpful comment

Just one thing to add:
If you're using Prometheus-operator in HA mode, you have to set prometheus.prometheusSpec.replicaExternalLabelNameClear: true for VictoriaMetrics' deduplication to work. Otherwise the operator adds an external label prometheus_replica (e.g. prometheus_replica="prometheus-kps-prometheus-0") to each replica, so the requirement _"Note that these Prometheus instances must have identical external_labels section in their configs, so they write data to the same time series."_ (Deduplication in VM) is not satisfied.

///

And for those who were a bit confused like me while configuring deduplication in a VM cluster: dedup.minScrapeInterval has to be set in two places, on vmselect (=1ms, for deduplication at query level) and on vmstorage (something lower than your scrape interval, maybe 5s, for deduplication at storage level). The documentation is not very clear about that.
I hope I'm not giving wrong advice here :sweat_smile:
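
To illustrate why the extra replica label matters, here is a minimal sketch of how series identity depends on the full label set. It is a simplified model, not VictoriaMetrics code: `series_key` and `distinct_series` are hypothetical helpers, and the `cluster` label and the second replica name are made up for the example.

```python
def series_key(metric_name, labels, external_labels):
    """Build a hashable identity for a series from its name and all labels."""
    merged = {**labels, **external_labels}
    return (metric_name, tuple(sorted(merged.items())))


def distinct_series(external_label_sets):
    """Count how many distinct series the same metric produces across replicas."""
    keys = {series_key("up", {"job": "node"}, ext) for ext in external_label_sets}
    return len(keys)


# Hypothetical external_labels of two operator-managed replicas.
with_replica_label = [
    {"cluster": "prod", "prometheus_replica": "prometheus-kps-prometheus-0"},
    {"cluster": "prod", "prometheus_replica": "prometheus-kps-prometheus-1"},
]
# The same replicas once the replica label has been cleared.
cleared_replica_label = [{"cluster": "prod"}, {"cluster": "prod"}]

print(distinct_series(with_replica_label))     # two separate series, nothing to dedup
print(distinct_series(cleared_replica_label))  # one series written by both replicas
```

With the replica label present, both replicas write to different series, so sample-level deduplication inside one series has nothing to merge.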

All 20 comments

Thanos deduplication requires setting a distinct label value in the external_labels config for each replica in a Prometheus HA pair - see these docs. This label must be passed to the --query.replica-label arg when starting the Query component in order to enable proper deduplication.
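
As a rough sketch of the idea behind replica-label deduplication: series that differ only in the replica label are treated as one series and their samples are merged. This is a simplified model for illustration, not the algorithm Thanos Query actually uses; `dedup_by_replica_label`, the `replica` label name, and the sample data are all assumptions.

```python
from collections import defaultdict


def dedup_by_replica_label(series_list, replica_label):
    """Merge series that differ only in `replica_label`.

    Each series is (labels_dict, samples), where samples is a list of
    (timestamp, value) pairs. For each timestamp the first value seen wins.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        # Series identity is the label set without the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


# Replica "a" was preempted before t=30; replica "b" started after t=0.
replica_a = ({"__name__": "up", "replica": "a"}, [(0, 1.0), (15, 1.0)])
replica_b = ({"__name__": "up", "replica": "b"}, [(15, 1.0), (30, 1.0)])

result = dedup_by_replica_label([replica_a, replica_b], "replica")
print(result)
```

The merged series covers all three timestamps even though neither replica saw all of them, which is the behaviour the issue asks for with preemptible hosts.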

It is easy to create similar deduplication in VictoriaMetrics using the following steps:

  • start multiple VictoriaMetrics instances (or clusters) in different datacenters (availability zones)
  • configure replica1 of each Prometheus HA pair to write data to the first VictoriaMetrics, and replica2 to write data to the second VictoriaMetrics. Replicas should have identical labels in the external_labels section. There is no need to set distinct label values for each replica as in the Thanos case.
  • start Promxy in front of VictoriaMetrics instances.
  • send queries from Grafana to Promxy. It should handle data deduplication and merging.

Read more about high availability setups here.

I do understand this approach, but it is the opposite of what I'm trying to achieve here. I have HA Prometheus instances in the same zone, and I want a single big VM instance/cluster as the data aggregation layer for a number of K8s clusters, because VM has great performance and scales vertically very well. In this case I don't see issues in providing different labels for members of the HA pair, and I'd like some way to dedup this data based on those labels. In addition, if Promxy is used, I guess I lose the cool PromQL extension features that VM offers.

I don't see issues in providing different labels for members of HA pair and I'd like some way to dedup this data based on those labels

I see. Then VictoriaMetrics must support a --query.replica-label config arg like the Thanos Query component. Converting this issue into the corresponding feature request.

I guess I lose the cool PromQL extension features that VM offers.

Unfortunately yes :(
@jacksontj , Promxy author, is looking into Extended PromQL integration with Promxy - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/56 . So, probably, Promxy will gain Extended PromQL in the future.

In the meantime you can run only a single Prometheus instance per K8s cluster, so that it replicates new data to VictoriaMetrics in real time. When the Prometheus host is preempted, you may lose data for the few scrape intervals required to spin up a new host with a Prometheus instance in the K8s cluster. This is a much smaller potential data loss compared to the Thanos case, where up to 2 hours of recent data may be lost on a host preemption event. See this article for more details.

FTR you can achieve this sort of label-based dedupe using promxy's relabel_config -- I've seen people use this to shard prometheus hosts and merge the data together. The docstring there explains the behavior, if you have any questions feel free to let me know :)

FTR you can achieve this sort of label-based dedupe using promxy's [relabel_config]

I have been looking for this since yesterday and couldn't find how to achieve it. Thx! Still, I think that having this kind of dedup in VictoriaMetrics would be beneficial, as I suppose that, given the VM architecture, it can be done without any noticeable degradation in query performance.

relabel_config is not what we need.
It just mutates labels without any kind of deduplication. Manipulating labels this way is similar to querying {replica=~"replica-(1|2)"}.
What if two Prometheus instances remote_write the same metrics to a victoriametrics-server with the same labels? Does VictoriaMetrics return the latest value or both? And if one of the Prometheus instances goes down, do the metrics simply become half as frequent?

What if two Prometheus instances remote_write the same metrics to a victoriametrics-server with the same labels? Does VictoriaMetrics return the latest value or both? And if one of the Prometheus instances goes down, do the metrics simply become half as frequent?

VictoriaMetrics stores all the data points for the same time series received from any clients. It doesn't do any deduplication on the received data. This may be OK in certain cases, but usually this hurts compression ratio and may break certain queries involving such functions as rate, count_over_time, sum_over_time, etc.
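
A small sketch of why undeduplicated samples skew such functions. The two helpers below are simplified stand-ins for count_over_time and sum_over_time (a plain half-open time window, no PromQL engine), and the sample data with a one-second offset between replicas is an assumption made for illustration.

```python
def count_over_time(samples, start, end):
    """Count samples with start < timestamp <= end."""
    return sum(1 for ts, _ in samples if start < ts <= end)


def sum_over_time(samples, start, end):
    """Sum values of samples with start < timestamp <= end."""
    return sum(value for ts, value in samples if start < ts <= end)


# One replica scraping every 15 seconds.
single = [(15, 2.0), (30, 2.0), (45, 2.0), (60, 2.0)]
# A second replica writes the same series, scraping one second later.
doubled = sorted(single + [(ts + 1, value) for ts, value in single])

print(count_over_time(single, 0, 65), count_over_time(doubled, 0, 65))
print(sum_over_time(single, 0, 65), sum_over_time(doubled, 0, 65))
```

Both results double once the second replica writes into the same series, although the underlying metric has not changed.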

relabel_config is not what we need.
It just mutates labels without any kind of deduplication. Manipulating labels this way is similar to querying {replica=~"replica-(1|2)"}.

This is actually only partially accurate. In the case of promxy it mutates the labels of the timeseries, but the deduplication within promxy is actually done entirely by the labels (some more details in https://github.com/jacksontj/promxy/blob/master/pkg/servergroup/config.go#L80) -- I've seen at least a few installs successfully use the relabel_config to do custom data deduplication (when you can't just use the servergroups).

Related feature request in Promxy - https://github.com/jacksontj/promxy/issues/258 .

VictoriaMetrics stores all the data points for the same time series received from any clients. It doesn't do any deduplication on the received data. This may be OK in certain cases, but usually this hurts compression ratio and may break certain queries involving such functions as rate, count_over_time, sum_over_time, etc.

What happens if the timestamp and value for a time series are exactly the same, because backfilling was done from slightly overlapping snapshots? Will it then store the exact same value twice and will it change anything with the metrics? (At least it shouldn't hurt compression ratio to store exactly the same values in case it does that.)

What happens if the timestamp and value for a time series are exactly the same, because backfilling was done from slightly overlapping snapshots? Will it then store the exact same value twice and will it change anything with the metrics?

Samples with identical values and timestamps are saved twice in the db. This changes results for functions such as count_over_time and sum_over_time, since the corresponding samples are counted multiple times. This may also worsen the compression ratio. For instance, if we have the following series:
10 20 30 40 50 60 70 80 90
Then delta coding converts it to 10 10 10 10 10 10 10 10 10, and a second delta pass converts that to 10 0 0 0 0 0 0 0 0, which can be compressed to a very compact form like ten plus eight zeroes.
If we double each value in the series:
10 10 20 20 30 30 40 40 50 50 60 60 70 70 80 80 90 90
Then delta coding converts it into 10 0 10 0 10 0 10 0 10 ..., which never settles into a run of zeroes and therefore compresses worse than the original sequence.
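
The arithmetic can be checked with a small sketch. It assumes plain delta coding that keeps the first value and then stores successive differences; `delta_encode` is a hypothetical helper, not VictoriaMetrics code. Applying it once to the original series gives nine tens, applying it a second time gives ten followed by eight zeroes, and the doubled series gives an alternating 10 0 pattern after one pass.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


original = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# Every sample stored twice, as with undeduplicated identical writes.
doubled = [v for v in original for _ in range(2)]

first_pass = delta_encode(original)      # constant run of tens
second_pass = delta_encode(first_pass)   # ten followed by zeroes
doubled_pass = delta_encode(doubled)     # alternating tens and zeroes

print(first_pass)
print(second_pass)
print(doubled_pass)
```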

@valyala Thank you very much. Are there any plans to dedupe inside VM in the future?

Yes, there are such plans - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/86 . There are also plans to add -minScrapeInterval command-line flag, which would delete duplicate data points in time series if the distance between these data points is less than -minScrapeInterval.
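
The planned behaviour described above can be sketched as follows. This is a simplified model of "drop data points that are closer than the interval to the previous kept point"; `dedup_min_interval` is a hypothetical helper, the timestamps (in milliseconds) are made up, and the flag as eventually released may select samples differently.

```python
def dedup_min_interval(samples, min_interval):
    """Drop samples closer than `min_interval` to the previously kept sample."""
    kept = []
    for ts, value in sorted(samples):
        if not kept or ts - kept[-1][0] >= min_interval:
            kept.append((ts, value))
    return kept


# Two HA replicas scrape the same target every 15s, 200ms apart.
replica_1 = [(0, 1.0), (15_000, 2.0), (30_000, 3.0)]
replica_2 = [(200, 1.0), (15_200, 2.0), (30_200, 3.0)]

deduped = dedup_min_interval(replica_1 + replica_2, min_interval=5_000)
print(deduped)
```

With an interval below the scrape interval but above the offset between replicas, one sample per scrape survives; if one replica goes down, the other replica's samples are kept unchanged.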

FYI, the -dedup.minScrapeInterval flag has been added in VictoriaMetrics v1.33.0. See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#deduplication for more details.

FYI, the -dedup.minScrapeInterval flag has been added in VictoriaMetrics v1.33.0. See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#deduplication for more details.

@valyala Will this work for all variants of ingesting data into VM or just Prometheus remote write?

@valyala And what about existing data inside VictoriaMetrics? Will this option apply for that?

@mback2k , thanks for the good questions! See answers below:

Will this work for all variants of ingesting data into VM or just Prometheus remote write?

The deduplication is applied to all the data ingested into VictoriaMetrics. It isn't limited to Prometheus data.

And what about existing data inside VictoriaMetrics? Will this option apply for that?

The deduplication is applied to all the existing data and to new data. There are two levels of de-duplication:

  • Gradual de-duplication during background merging of smaller parts into bigger parts. Read here about parts. This de-duplication is aimed at reducing the occupied storage space. Note that bigger and older parts are merged less frequently than smaller and newer parts, so historical data can be left duplicated. This can be addressed in the future.
  • Final de-duplication at query time. All the queried data points are finally de-duplicated during queries.

Closing this issue, since de-duplication is already supported by VictoriaMetrics - see https://victoriametrics.github.io/#deduplication .
