Thanos: query: Staleness problem

Created on 14 May 2020  ·  22 Comments  ·  Source: thanos-io/thanos

Thanos, Prometheus and Golang version used:
thanos: v0.12.0

Object Storage Provider:
private CEPH (S3)

What happened:
Look at the end_input time and the resolution:
Screenshot 2020-05-14 at 16.34.52
Screenshot 2020-05-14 at 16.35.02
The staleness handling in the Prometheus library drops some of the points returned from the Thanos stores.

What you expected to happen:
Return all data from the store for any time range.

How to reproduce it (as minimally and precisely as possible):
See the screenshots.

Full logs to relevant components:

Anything else we need to know:
I think we have a few ways to resolve the problem:

  1. Update the Prometheus library and set the LookbackDelta parameter to more than 5 min (needs checking).
  2. Update the querier to move or duplicate points to the needed timestamps (interpolate data for PromQL).
  3. Update the Prometheus library to return all points from the stores.

All 22 comments

What is the ceph version?

Ceph does not matter, because the Thanos stores return all the data to the querier; the points are only marked as stale inside the Prometheus library.

Nice, thanks for this. Funnily enough we just talked about this exact problem with @juliusv (:

We need a different lookbackDelta for each resolution I think, right? @juliusv

At a minimum, it would be good to add the --query.lookback-delta that we have in Prometheus to Thanos as well. However, since it's a global setting, it would apply to all time series, even the ones that are scraped at intervals <5m. Normally you wouldn't want to set this lookback delta higher than needed for everything, as that will result in old samples being returned for quite long (although explicit staleness markers already help with that).
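To make the effect of the lookback delta concrete, here is a minimal Go sketch of how an instant selection might pick the latest sample inside a lookback window. The `sample` type and the `latestWithin` and `countVisible` helpers are hypothetical names for illustration, not the Prometheus implementation, and the exact window boundary handling is an assumption. It shows that points spaced 1h apart are visible at only a few 1m evaluation steps with a 5m lookback.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair; timestamps are in milliseconds.
type sample struct {
	t int64
	v float64
}

// latestWithin returns the most recent sample whose timestamp lies in
// (evalT-lookback, evalT]. samples must be sorted by timestamp.
func latestWithin(samples []sample, evalT, lookback int64) (sample, bool) {
	for i := len(samples) - 1; i >= 0; i-- {
		s := samples[i]
		if s.t > evalT {
			continue // sample is in the future relative to this step
		}
		if s.t > evalT-lookback {
			return s, true
		}
		return sample{}, false // newest candidate is already too old
	}
	return sample{}, false
}

// countVisible counts the evaluation steps in [start, end] that see a sample.
func countVisible(samples []sample, start, end, step, lookback int64) int {
	n := 0
	for t := start; t <= end; t += step {
		if _, ok := latestWithin(samples, t, lookback); ok {
			n++
		}
	}
	return n
}

func main() {
	const minute = int64(60000)
	// Three points at 1h resolution, evaluated at a 1m step over 2h.
	data := []sample{{0, 1}, {60 * minute, 2}, {120 * minute, 3}}
	fmt.Println("5m lookback:", countVisible(data, 0, 120*minute, minute, 5*minute))
	fmt.Println("1h lookback:", countVisible(data, 0, 120*minute, minute, 60*minute))
}
```

With the larger window every step sees a point, which is also why a global setting that large would keep returning old samples for series scraped at short intervals.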

I think that we need a dynamic lookback-delta that depends on the resolution, for example resolution/2.
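A resolution-dependent lookback could be sketched roughly as below. This is an illustration only: `lookbackFor` and the constants are hypothetical names, and instead of the resolution/2 suggested above it takes the larger of the 5m default and the resolution, on the assumption that consecutive downsampled points should never fall outside the window.

```go
package main

import (
	"fmt"
	"time"
)

// defaultLookback mirrors the 5m default discussed in this thread.
const defaultLookback = 5 * time.Minute

// The three resolutions Thanos serves: raw, 5m and 1h.
const (
	resRaw = time.Duration(0)
	res5m  = 5 * time.Minute
	res1h  = time.Hour
)

// lookbackFor is a hypothetical helper: it never goes below the default
// and widens the window to the resolution for coarser data.
func lookbackFor(resolution time.Duration) time.Duration {
	if resolution > defaultLookback {
		return resolution
	}
	return defaultLookback
}

func main() {
	for _, r := range []time.Duration{resRaw, res5m, res1h} {
		fmt.Printf("resolution=%v lookback=%v\n", r, lookbackFor(r))
	}
}
```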

Well. The main problem is that we can use different resolutions in a single PromQL eval (:

So it can be [1h of raw data, 2w of 1h resolution, and 5h of 5m resolution] combined.

So I think we might need to think of something in PromQL itself. @brian-brazil do you know how hard that would be?

Also we can temporarily add lookback delta per query as well :thinking:

Varying resolution within one query is unlikely to work. What I'd do is present something to PromQL that looks real, built from the downsampled data - e.g. here you might provide interpolated samples every 1m.

It kinda depends on what the query is though.
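The idea of handing PromQL interpolated samples at a fixed interval could look roughly like the Go sketch below. The `sample` type and the `expand` function are hypothetical, and linear interpolation is just one possible choice; as noted above, the right fill strategy depends on the query.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair; timestamps are in milliseconds.
type sample struct {
	t int64
	v float64
}

// expand linearly interpolates between consecutive input samples and emits
// one point every interval milliseconds, keeping the final input sample.
// Input must be sorted by timestamp.
func expand(in []sample, interval int64) []sample {
	if len(in) == 0 || interval <= 0 {
		return nil
	}
	var out []sample
	for i := 0; i < len(in)-1; i++ {
		a, b := in[i], in[i+1]
		for t := a.t; t < b.t; t += interval {
			frac := float64(t-a.t) / float64(b.t-a.t)
			out = append(out, sample{t: t, v: a.v + frac*(b.v-a.v)})
		}
	}
	return append(out, in[len(in)-1])
}

func main() {
	const minute = int64(60000)
	// Two downsampled points 5m apart, expanded to a 1m "fake" interval.
	for _, s := range expand([]sample{{0, 0}, {5 * minute, 10}}, minute) {
		fmt.Printf("t=%dms v=%.1f\n", s.t, s.v)
	}
}
```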

@bwplotka @brian-brazil
Can we choose a solution as soon as possible? I'm working on this problem now and can implement either solution...

Can you elaborate more @brian-brazil? So essentially, for each downsampled series, you would actually expand it to have samples every 1m, a fake interval? :thinking:

What would be the corner cases? Why does it depend on the query?

Alternatively we could have 3 PromQL engines in the Querier and choose which to use based on the returned data. Then we can evaluate for the given periods and concatenate the results. However, for large steps and intervals, it would most likely be bad....

So essentially, for each downsampled series, you would actually expand it to have samples every 1m, a fake interval

Yes, something like that.

Why does it depend on the query?

For e.g. sum_over_time you need different data than count_over_time to produce the desired result.
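A small Go sketch may help show why the two functions need different data. Thanos downsampled chunks keep several aggregates per window (count, sum, min, max and counter); the `aggr` struct and the helper functions below are hypothetical simplifications, not the Thanos types.

```go
package main

import (
	"fmt"
	"math"
)

// aggr is a hypothetical downsampled point: one window of raw samples
// summarised by several aggregates.
type aggr struct {
	count, sum, min, max float64
}

// downsample collapses the raw values of one window into a single aggr.
func downsample(raw []float64) aggr {
	a := aggr{min: math.Inf(1), max: math.Inf(-1)}
	for _, v := range raw {
		a.count++
		a.sum += v
		a.min = math.Min(a.min, v)
		a.max = math.Max(a.max, v)
	}
	return a
}

// sumOverTime must read the sum aggregate of each window.
func sumOverTime(points []aggr) float64 {
	total := 0.0
	for _, p := range points {
		total += p.sum
	}
	return total
}

// countOverTime must read the count aggregate, not the number of points.
func countOverTime(points []aggr) float64 {
	total := 0.0
	for _, p := range points {
		total += p.count
	}
	return total
}

func main() {
	points := []aggr{
		downsample([]float64{1, 2, 3}),
		downsample([]float64{4, 5}),
	}
	fmt.Println("sum_over_time:", sumOverTime(points))
	fmt.Println("count_over_time:", countOverTime(points))
}
```

Feeding a single interpolated value stream to both functions would give the wrong answer for at least one of them, which is the corner case raised here.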

@IKSIN, looks like we could try that in querier.go

Ok! I'll try to do it )

I am pretty sure we need a special iterator for downsampled chunks.

For e.g. sum_over_time you need different data than count_over_time to produce the desired result.

This is already well handled.

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

Closing for now as promised, let us know if you need this to be reopened! 🤗

We are still working on this, let's reopen.

BTW, do you know we can now configure the staleness lookback delta?

However we might want to adjust it for different resolutions indeed

@bwplotka As I remember, the staleness lookback delta is not something new. Or was it changed recently somehow?

We just allow users to configure it on the Querier via a flag, that's it.

BTW, do you know we can now configure the staleness lookback delta?

However we might want to adjust it for different resolutions indeed

Well, here's my attempt at it: https://github.com/thanos-io/thanos/pull/3277
