Influxdb: Add options to group by time to set where the time windows occur

Created on 3 Mar 2015 · 16Comments · Source: influxdata/influxdb

Currently when you do a query and GROUP BY time(1h) it will bucket everything modulo the interval, directly on the hour markers. Users should be able to specify that the buckets occur either at the current time, or some offset, or the time of the first or last data point in a query, like so:

SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, now())

-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, last())

-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h and time < now() - 1h
GROUP BY time(10m, now() - 1h)

-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, first())

These options to the group by time will only be valid for live queries. If someone is creating a continuous query we'll have to validate that they haven't put in these extra options.

RFC arequeries

Source

pauldix

👍2

Most helpful comment

I have the same request. In my use case, I'm trying to aggregate by week using group by time(1w) but the intervals always start on Thursdays. Why thursdays? Ideally I want to measure weeks naturally from Sunday to Saturday (or in some cases I might want Monday to Sunday).

heythisisnate on 19 Nov 2015

👍4

All 16 comments

Wouldn't something similar also be a good feature for continuous queries? The use case I have in mind is aggregating data from different users and doing this starting from 7am (but their timezone!) every 24 hours.

gergia on 4 Mar 2015

@gergia that would be much trickier to implement. The design of continuous queries won't support more than maybe a few hundred. Ideally they are able to downsample many thousands of series. Not sure how that would be done if each user has a timezone. Maybe one for each timezone you have?

pauldix on 5 Mar 2015

This could be a decent intermediate solution for #388 as well, without having to deal with the mess of timezones (let the higher level application manage that). I am in the same boat as a number of the other users that I need to run hourly, daily, weekly, and monthly aggregates (differences and means) in the timezone that my server is located in. I think this would be a hugely beneficial addition.

jnutzmann on 10 Mar 2015

this feature is essential for streaming machine learning. Systems like in finance, i.e. hedge funds, need to be able to train their AI with every possible window of time in data history, and with any group size, not just at preconceived markers.

pablo-tech on 6 Nov 2015

heythisisnate on 19 Nov 2015

👍4

royalaid on 1 Feb 2016

@pauldix would it be good to just have the buckets start at the start time? It will end up making the output of queries using now() act more sane. Currently, I can't know how many results this query will return because of now().

SELECT mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)

This should always return 10 rows, but it will often return 11 because now() isn't exactly on a time interval. We would then also need an alternate syntax that allows truncating now() to the correct precision though in order to allow for the old behavior.

jsternberg on 6 Apr 2016

@jsternberg is this not already implemented? I like your idea in https://github.com/influxdata/influxdb/issues/1837#issuecomment-206525778 but I think we should close this issue and open a new one for controlling bucket numbers explicitly.

beckettsean on 27 May 2016

This is already implemented. It got lost in the shuffle of things.

jsternberg on 14 Jul 2016

I use the query "select ... from ... where ... group by time(30m) fill(0)" and the db returns nothing because there is no data . Can I get a series of '0' data when there is no data ?

Reventon1993 on 26 Aug 2016

@Reventon1993 currently that is not possible. Follow https://github.com/influxdata/influxdb/issues/6967 for progress on returning fill() even if there are no results.

beckettsean on 26 Aug 2016

when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here. I am using 1.0.2 version

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)
when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here.

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)

hariteritreee on 7 Nov 2016

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)

hariteritreee on 7 Nov 2016

@hariteritreee please use [email protected] for questions. GitHub issues are for bugs or feature requests. The answer to your question is to use the GROUP BY time offset: http://docs.influxdata.com/influxdb/v1.0/query_language/data_exploration/#advanced-group-by-time-syntax

beckettsean on 7 Nov 2016

@hariteritreee please use [email protected] for questions. GitHub issues are for bugs or feature requests. The answer to your question is to use the GROUP BY time offset: http://docs.influxdata.com/influxdb/v1.0/query_language/data_exploration/#advanced-group-by-time-syntax

this googlegroups.com account is not working beck. do you have any other email ?