VictoriaMetrics: Different results for the same query in Prometheus and VictoriaMetrics

Created on 15 Nov 2019 · 10 Comments · Source: VictoriaMetrics/VictoriaMetrics

Describe the bug
I use the same query against Prometheus and VictoriaMetrics in Grafana, but get different graphs (see screenshots):

  • the graph in Prometheus is higher (value = 2) but shorter (two intervals: 15:50:45 - 15:51:30 and 15:51:45 - 15:52:30);
  • the graph in VictoriaMetrics is lower (value = 1) but longer (one interval: 15:50:45 - 15:53:00).

To Reproduce
Run the same query against Prometheus and VictoriaMetrics.

Expected behavior
The graphs should be the same.

Screenshots
Graph for Prometheus:

prometheus

Graph for VictoriaMetrics:

victoriaMetric

Version
victoria-metrics-20190822-120009-tags-v1.26.0-0-g1272e407

Additional context

bug

All 10 comments

This issue may be related to the fact that Prometheus and VictoriaMetrics calculate increase differently:

  • VictoriaMetrics always returns the difference between the last and the first raw data points on the interval given in square brackets - [1m] in your case.
  • Prometheus calculates an extrapolated value for increase - see https://github.com/prometheus/prometheus/issues/3746 .

IMHO, VictoriaMetrics' calculations are better than Prometheus' in this case, since they never return floating-point values for integer counter increase.
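
The difference between the two approaches can be illustrated with a small Python sketch. It is a simplified model, not the actual code of either system: `increase_last_minus_first` follows the "last minus first raw point" description above, while `increase_scaled_to_window` is a hypothetical stand-in that merely stretches the observed difference to the full window. Prometheus applies additional limits to its extrapolation that are not modelled here, and counter resets are ignored. The sample data and function names are made up for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def in_window(samples: List[Sample], start: float, end: float) -> List[Sample]:
    """Return the samples (assumed sorted by time) that fall inside [start, end]."""
    return [(t, v) for t, v in samples if start <= t <= end]


def increase_last_minus_first(samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Difference between the last and the first raw points in the window."""
    window = in_window(samples, start, end)
    if len(window) < 2:
        return None
    return window[-1][1] - window[0][1]


def increase_scaled_to_window(samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Simplified extrapolation: scale the raw difference to the whole window."""
    window = in_window(samples, start, end)
    if len(window) < 2:
        return None
    covered = window[-1][0] - window[0][0]  # time actually covered by samples
    if covered <= 0:
        return None
    raw = window[-1][1] - window[0][1]
    return raw * (end - start) / covered


# Hypothetical counter scraped every 15s that grows by exactly 1 inside a 60s window.
samples = [(0.0, 5.0), (15.0, 5.0), (30.0, 6.0), (45.0, 6.0)]
print(increase_last_minus_first(samples, 0.0, 60.0))  # integer-valued result
print(increase_scaled_to_window(samples, 0.0, 60.0))  # fractional result
```

With this data the samples cover only 45 of the 60 seconds, so the scaled variant returns a fractional value for a counter that grew by exactly 1, while the last-minus-first variant returns the raw integer difference.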

In order to prove the assumption, could you post graphs for the following query on both Prometheus and VictoriaMetrics on the same time range as graphs above?

ssw_log:counter:by_instance_level_source{job="ssw-log", level="E", instance="ss5-ss-prod-3:3903"}

This will allow us to manually calculate the increase values according to the aforementioned algorithms for VictoriaMetrics and Prometheus.

@a-illiushchenia , are there any updates?

Yes, and it is very interesting.
I took another interval:

  1. sum(ssw_log:counter:by_instance_level_source{job="ssw-log", level="E", instance="ss5-ss-prod-1:3903"})

We have an increase of 1 point, and it is displayed correctly in both Prometheus and VictoriaMetrics:

Prometheus:

Prom_1

Prom_2

VictoriaMetrics:

VictMetr_1

VictMetr_2

  2. sum by (instance)(increase(ssw_log:counter:by_instance_level_source{job="ssw-log", level="E"}[1m]))

Prometheus:

Prom_3

VictoriaMetrics:

VictMetr_3

Hi @a-illiushchenia !
Which instance has that 2 increase on Prometheus pic? Could you pls compare sum and increase for that particular instance in both Prom and VM?

We found the Prometheus implementations of increase and delta pretty useless for our data, as the extrapolation performed in these functions does not deliver correct results. Usually, actual increases in the metric are ignored.

This is why we replaced our queries

sum(idelta(my_metrics{instance=~"$instance"}[$__interval])) by (something)

with:

sum(my_metrics{instance=~"$instance"} - my_metrics{instance=~"$instance"} offset $__interval >= 0) by (something)

This (although it requires double lookups of the series) delivers correct results and does not miss increases of a time series.
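
What the replacement query computes at each evaluation point can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the series is a plain dict keyed by step-aligned timestamps, whereas a real PromQL selector picks the most recent sample within a lookback window. The function name and the data are hypothetical.

```python
from typing import Dict, Optional


def offset_difference(series: Dict[int, float], t: int, interval: int) -> Optional[float]:
    """Mimic `m - m offset interval >= 0` for one evaluation time t.

    Returns None where PromQL would return no sample: either side is missing,
    or the difference is negative (for example after a counter reset).
    """
    now = series.get(t)
    before = series.get(t - interval)
    if now is None or before is None:
        return None
    diff = now - before
    return diff if diff >= 0 else None


# Hypothetical counter sampled every 60s; it resets between t=120 and t=180.
series = {0: 10.0, 60: 12.0, 120: 12.0, 180: 3.0}
results = {t: offset_difference(series, t, 60) for t in sorted(series)}
print(results)
```

The `>= 0` filter drops the point right after the reset instead of reporting a negative increase, so the growth that happened across the reset is not counted at that point.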

@a-illiushchenia , the graphs show that Prometheus returns an incorrect +2 increase for the actual +1 increase of the given time series. The increase also lasts for 30 seconds, while it should last for 1 minute according to the [1m] time window passed to the increase() function. VictoriaMetrics' graphs look correct.

@lammel , great solution! Note that the query can be improved with WITH templates and the remove_resets() function from Extended PromQL in the following ways:
1) Mention my_metrics{instance=~"$instance"} only once.
2) Remove possible counter resets.

The resulting query would look like:

with (
    q = remove_resets(my_metrics{instance=~"$instance"})
)
sum(q - q offset $__interval) by (something)
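
The idea behind combining reset removal with the offset difference can be sketched in Python. This is an assumption about the behaviour suggested by the function's name, not a copy of VictoriaMetrics' implementation: whenever the counter drops, the last value seen before the drop is carried forward so that the adjusted series never decreases. The helper `step_deltas` and the sample data are hypothetical.

```python
from typing import List, Optional


def remove_resets(values: List[float]) -> List[float]:
    """Return a non-decreasing copy of a counter series (assumed semantics)."""
    adjusted: List[float] = []
    carried = 0.0  # sum of the values reached just before each reset
    previous: Optional[float] = None
    for value in values:
        if previous is not None and value < previous:
            carried += previous  # counter restarted: keep what it had reached
        adjusted.append(value + carried)
        previous = value
    return adjusted


def step_deltas(values: List[float], steps_back: int = 1) -> List[float]:
    """Difference between each point and the point `steps_back` steps earlier."""
    return [values[i] - values[i - steps_back] for i in range(steps_back, len(values))]


raw = [1.0, 2.0, 3.0, 0.0, 1.0, 2.0]  # hypothetical counter with one reset
cleaned = remove_resets(raw)
deltas = step_deltas(cleaned)
print(cleaned)
print(deltas)
```

Because the adjusted series never decreases, the differences are never negative, which is why the `>= 0` filter from the earlier query is no longer needed.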

@valyala , the extended promql looks very polished. We will look into VM soon with the prometheus remote write setup.

Do the increase/delta functions work correctly without extrapolation in VictoriaMetrics (can I assume no increase is missed)?
From your update in Prometheus issue #3806, I assume that the VM delta function can be used safely and is efficient and correct.

Do the increase/delta functions work correctly without extrapolation in VictoriaMetrics (can I assume no increase is missed)?

Yes, both functions in VictoriaMetrics should return the exact increase / delta on the time window given in square brackets. If the time window is omitted, it defaults to the step value, i.e. the interval between two adjacent points on the graph.

As for Extended PromQL, it is great, but unfortunately it cannot be used with Promxy yet, since Promxy understands only standard PromQL :( There are plans to fix this in the future - see this issue for details.

@a-illiushchenia , I'm going to close this issue as working as intended. Feel free to reopen it or add details if you feel that VictoriaMetrics has issues with PromQL results.
