Victoriametrics: inf datapoint causes other values to be truncated to integer

Created on 8 Sep 2020 · 13Comments · Source: VictoriaMetrics/VictoriaMetrics

Describe the bug

These points are a regression vs. Prometheus:

if VM encounters an inf value in a metric, neighboring values will be truncated to an integer (presumably due to the compression algorithm).
VM records +/-inf in the storage with the corresponding max or min integer

Especially (1) is a critical bug which has caused severe precision loss in our data corpus.

To Reproduce

Write these metric points:

foo 1.5
foo 2.33

Confirm raw points with a query:
http://localhost:8428/api/v1/query_range?query=foo[2h]&start=2020-09-08T03:00:00Z

{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"foo"},"values":[[1599540900,"1.5"],[1599541200,"2.33"]]}]}}

Write inf value to series:

foo inf

Note that inf is represented by max integer, and that previous datapoints have been truncated to integer:

{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"foo"},"values":[[1599540900,"1"],[1599541200,"2"],[1599541500,"2"],[1599541800,"2"],[1599542100,"9223372036854776000"]]}]}}

Expected behavior

inf should be preserved as inf in the storage and queries
inf shouldn't affect dynamic range of compression

In Prometheus case, the inf datapoint seems to become NaN, so it isn't graphed.

Perhaps the compression algorithm should have some sanity bounds on minimum precision.

prometheus does not treat inf / NaN specially: https://www.robustperception.io/get-thee-to-a-nannary

Version

victoria-metrics-20200815-125320-tags-v1.40.0-0-ged00eb3f3

bug

Source

belm0

Most helpful comment

Agreed that NaN and Inf values could be useful in certain cases. Unfortunately it may be very hard to add proper support for these values without significant changes in data encoding layer. This is unfortunate trade-off between encoding efficiency and supporting corner case values.
Let's start with documenting this behavior - see 0bccb58e802298348ac179dd055ff5288b76557c .
Let's keep this issue open in the hope a solution could be found eventually for proper handling of corner-case values at data encoding layer.

valyala on 15 Sep 2020

👍2

All 13 comments

@belm0 , could you try VictoriaMetrics from the latest commit in master branch? ( currently 62fde80490af6d4ab4acee7ce75e098334f37ae3 ). It should ignore newly ingested inf values. This should prevent from precision loss for already stored data after writing inf values.

See build instructions.

valyala on 11 Sep 2020

FYI, the commit mentioned above has been included in v1.41.0.

valyala on 12 Sep 2020

It should ignore newly ingested inf values

@valyala respectfully, I disagree with VM ignoring Inf and NaN.

These values are useful for representing exceptional states within a series, and are supported in Prometheus as first class values for TSDB storage and query operators. They are standardized sentinel values that propagate through math operators in important ways that ad-hoc sentinels cannot (like using 9,999,999 to mean "big number", or -1 to mean "error").

I understand that a Grafana graph does not display these values, but that doesn't imply they are useless. Simple examples:

if a sensor is malfunctioning transiently it may report NaN, and we can graph the count or rate of NaN values in the series. And malfunctioning is distinct from the host device not being online for scraping, so NaN should not be conflated with the absence of a sample in the TSDB.
a time-of-flight sensor may actually report Inf if there is no reflection. We don't want to represent this case as "no data" because that would drop important information, nor as an arbitrarily large number, because that would skew a naive statistics query in subtle ways. (In contrast, the output of a query operating on Inf values will be obviously wrong and prompt the author to correctly cover the Inf case.)

VM is advertised as working with existing Prometheus data and queries. That should apply beyond the superficial. There are Prometheus users employing NaN and Inf, and it will be a sore spot for their migrations to VM.

belm0 on 12 Sep 2020

valyala on 15 Sep 2020

👍2

It looks like the solution for proper storing of Inf values has been discovered and implemented in the commit 26115891dba851ff8c98a222ef528b56a116ab20 . Now Inf values shouldn't lead to precision loss for nearby samples in the same time series.

valyala on 18 Sep 2020

This is unfortunate trade-off between encoding efficiency and supporting corner case values.

It's more fair to say that some users may not currently use Inf or NaN. But for those who use it, I don't think it's considered a corner case (see examples I gave). Prometheus limits values to numbers only, so these values can be significant for conveying more information in a type-safe way that propagates across operators.

It looks like the solution for proper storing of Inf values has been discovered and implemented in the commit 2611589

I couldn't tell from the commit. Will it be possible to query on Inf, like count(foo == Inf)?

belm0 on 20 Sep 2020

I couldn't tell from the commit. Will it be possible to query on Inf, like count(foo == Inf)?

It seems to be working in the same way as in Prometheus. But, to be honest, I'm surprised with this, since, according to math rules, the result of inf == inf is undetermined.

valyala on 21 Sep 2020

I'm surprised with this, since, according to math rules, the result of inf == inf is undetermined

Are you confusing Inf and NaN? Inf is a valid value to use in equality and mathematical operations. NaN is not because, by definition, it's not a number.

$ python
>>> from math import inf, nan, exp
>>> inf == inf
True
>>> abs(-inf) == inf
True
>>> 1/inf
0.0
>>> exp(inf)
inf
>>> nan == nan
False
>>> exp(nan)
nan

belm0 on 22 Sep 2020

FYI, VictoriaMetrics should properly store inf values starting from v1.41.1. Unfortunately this wasn't mentioned in release notes (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/789 ).

valyala on 29 Sep 2020

Are you confusing Inf and NaN? Inf is a valid value to use in equality and mathematical operations. NaN is not because, by definition, it's not a number.

yes, thanks for the pointer. I confused infinity with their applicability in limit calculations.

valyala on 29 Sep 2020

It looks like the solution for proper storing of Inf values has been discovered and implemented in the commit 2611589 . Now Inf values shouldn't lead to precision loss for nearby samples in the same time series.

The fix actually made things worse: NaN rather than truncation #805

Unit testing seems to be inadequate or at the wrong level?

belm0 on 3 Oct 2020