InfluxDB: `GROUP BY time(x)` includes partial intervals

Created on 4 Apr 2017 · 11 Comments · Source: influxdata/influxdb

Bug report

__System info:__ InfluxDB version 1.2.1

__Steps to reproduce:__

1. Insert data at an interval of 1m, but start 30s after a full minute. Example: insert data at 00:00:30, 00:01:30, 00:02:30, …
2. Run `SELECT mean(value) FROM data WHERE time <= now() GROUP BY time(1m)`

__Expected behavior:__
I would expect InfluxDB to only group time ranges that fit fully into the queried time range (in this case, time ranges that have already passed).

Example:

The last time covered by the query (now()) is 00:01:01
The last query result covers the range 00:00:00 - 00:01:00
_(I'm unsure if the time field of the result should be 00:00:00 or 00:01:00)_

__Actual behavior:__
InfluxDB also groups time ranges that extend beyond the queried time range. In this example, as soon as 00:01:00 passes, it will return a group for 00:01:00 - 00:02:00. Because data is only inserted at 00:01:30, the last result will be null until 00:01:30 passes, even though data is inserted at the same interval as the one used in the query.

__Additional info:__
Related issues: #3926 #4282 #4038 #8010

1.x

jomo

👍15

Most helpful comment

The big problem with this is when you're doing a count or sum without an end time constraint (or with just `time <= now()`). The last period will always be too low: it slowly rises to the expected value over the time interval, and then drops back to 0 right after the next round time boundary has passed. Is there an easy way to simply exclude the last group if it's not complete (maybe only when using an open-ended query)?

onlynone on 1 Nov 2017

👍18

All 11 comments

Whenever you have a query with a time range that includes a partial interval, it will still include the partial interval, but won't include data that doesn't fit inside of the time constraints.

So in your example, if you query `SELECT mean(value) FROM cpu WHERE time >= '2000-01-01T00:00:00Z' AND time < '2000-01-01T00:02:30Z' GROUP BY time(1m)` you will get the 00:02:00 - 00:03:00 interval at the end, but it will only contain data between 00:02:00 and 00:02:29.999999999.

So I think this is expected behavior. Does that make sense?
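To illustrate the behavior described here, the following is a minimal Python sketch of how the intervals line up. It is a simplified model, not InfluxDB's implementation: it assumes whole-second intervals aligned to multiples of the interval since the Unix epoch, and the function name `buckets` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

def buckets(start, end, interval):
    """List (bucket_start, bucket_end, is_partial) for the half-open range [start, end).

    Simplified model: bucket boundaries are multiples of `interval`
    counted from the Unix epoch, and `interval` is a whole number of seconds.
    """
    step = int(interval.total_seconds())
    # Round the query start down to the nearest bucket boundary.
    t = int(start.timestamp()) // step * step
    out = []
    while t < end.timestamp():
        b_start = datetime.fromtimestamp(t, timezone.utc)
        b_end = b_start + interval
        # A bucket is partial if the query range cuts it off on either side.
        partial = b_start < start or b_end > end
        out.append((b_start, b_end, partial))
        t += step
    return out

start = datetime(2000, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2000, 1, 1, 0, 2, 30, tzinfo=timezone.utc)
for b_start, b_end, partial in buckets(start, end, timedelta(minutes=1)):
    print(b_start.time(), "-", b_end.time(), "partial" if partial else "complete")
```

For the query range above, the model yields three buckets, and only the last one (00:02:00 - 00:03:00) is cut off by the end of the range.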

jsternberg on 4 Apr 2017

In my opinion including a partial interval feels wrong. It becomes quite troublesome when combining this with fill(0), which is what I'm doing in Grafana (I want the absence of data for longer than the specified interval to be treated as 0).
The result is that the end of the graph drops to 0 and then suddenly jumps to its correct value when data is available.

I would only expect a null value when a complete interval passes with no data included.

Edit: made a gif to show what I mean
[GIF: influxdb interval grafana]

jomo on 4 Apr 2017

👍5

So how do you cut off the last incomplete interval? How did you solve this, @jomo?

gelinger777 on 10 Jul 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 8 Oct 2019

I guess this is still an issue?

jomo on 8 Oct 2019

Yes, it is...

crab86 on 8 Oct 2019

oh please +1, just started using influx this week, and this has been driving me crazy...

craigyk on 11 Nov 2019

👍1

Having exactly the same behaviour... any update on this?

mmihalev on 21 Dec 2019

👍3

I had this problem, and was about to report when I found this gh issue. I'm not sure if it's a feature or a bug?
My workaround was to use calculated absolute timestamps in the SELECT query and 'snap' the absolute timestamp to the interval specified in GROUP BY.

It was a two-week-long headache to track down exactly why my data was being corrupted.
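A rough sketch of what such a workaround could look like in Python, assuming the query is built client-side. The helper names `snap_down` and `build_query`, the `lookback` parameter, and the `data`/`value` names (taken from the original report) are illustrative, not the commenter's actual code.

```python
from datetime import datetime, timedelta, timezone

def snap_down(ts, interval):
    """Round an aware datetime down to a multiple of `interval` since the Unix epoch."""
    step = int(interval.total_seconds())
    return datetime.fromtimestamp(int(ts.timestamp()) // step * step, timezone.utc)

def build_query(now, interval, lookback):
    """Build an InfluxQL query whose end time is the last completed interval boundary."""
    end = snap_down(now, interval)   # exclusive upper bound, so no partial last group
    start = end - lookback           # assumes lookback is a multiple of interval
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        "SELECT mean(value) FROM data "
        f"WHERE time >= '{start.strftime(fmt)}' AND time < '{end.strftime(fmt)}' "
        f"GROUP BY time({int(interval.total_seconds())}s)"
    )

now = datetime(2017, 4, 4, 0, 1, 1, tzinfo=timezone.utc)
print(build_query(now, timedelta(minutes=1), timedelta(minutes=5)))
```

With `now` at 00:01:01, the upper bound becomes 00:01:00, so the 00:01:00 - 00:02:00 group never appears in the result. The trade-off is that the newest data is not shown until its interval has completed.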

azidyn on 1 Mar 2020

This is an issue for pretty much every aggregated graph I create in Grafana. If you do sum or mean or some other aggregator and group by a time range, then the last time range will always show an oddly low, high or otherwise incorrect value. I imagine every single user of influx/grafana encounters this exact issue. It would be great to have an "exclude incomplete group" or "exclude most recent group" option. The other possibility would be to have an option to group by intervals NOT aligned to, say, the hour. So if I group by hour and the current time is 3:15, then the group will be 2:15-3:15 instead of 2:00-3:00 and 3:00-4:00.
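For the second idea, InfluxQL's `GROUP BY time(interval, offset)` form shifts the group boundaries by an offset, which might get close to this. The sketch below is an assumption about how one could derive that offset client-side, not a confirmed fix from this thread; the function name `group_by_clause` is hypothetical, and sub-second precision is ignored.

```python
from datetime import datetime, timedelta, timezone

def group_by_clause(now, interval):
    """Return a GROUP BY clause whose boundaries line up with `now`.

    The offset is how far `now` lies past the last epoch-aligned boundary,
    so shifted boundaries fall on now, now - interval, now - 2*interval, ...
    """
    step = int(interval.total_seconds())
    offset = int(now.timestamp()) % step
    return f"GROUP BY time({step}s, {offset}s)"

now = datetime(2020, 3, 1, 3, 15, 0, tzinfo=timezone.utc)
print(group_by_clause(now, timedelta(hours=1)))
```

At 3:15 with a one-hour interval the offset is 15 minutes, so the most recent complete group would be 2:15-3:15. The query would likely still need an exclusive upper bound at the same instant (for example an absolute `time < '…'` timestamp) to avoid a new partial group starting at 3:15.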