Currently when you do a query and GROUP BY time(1h) it will bucket everything modulo the interval, directly on the hour markers. Users should be able to specify that the buckets occur either at the current time, or some offset, or the time of the first or last data point in a query, like so:
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, now())
-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, last())
-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h and time < now() - 1h
GROUP BY time(10m, now() - 1h)
-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, first())
These options to the group by time will only be valid for live queries. If someone is creating a continuous query we'll have to validate that they haven't put in these extra options.
Wouldn't something similar also be a good feature for continuous queries? The use case I have in mind is aggregating data from different users and doing this starting from 7am (but their timezone!) every 24 hours.
@gergia that would be much trickier to implement. The design of continuous queries won't support more than maybe a few hundred. Ideally they are able to downsample many thousands of series. Not sure how that would be done if each user has a timezone. Maybe one for each timezone you have?
This could be a decent intermediate solution for #388 as well, without having to deal with the mess of timezones (let the higher level application manage that). I am in the same boat as a number of the other users that I need to run hourly, daily, weekly, and monthly aggregates (differences and means) in the timezone that my server is located in. I think this would be a hugely beneficial addition.
this feature is essential for streaming machine learning. Systems like in finance, i.e. hedge funds, need to be able to train their AI with every possible window of time in data history, and with any group size, not just at preconceived markers.
I have the same request. In my use case, I'm trying to aggregate by week using group by time(1w) but the intervals always start on Thursdays. Why thursdays? Ideally I want to measure weeks naturally from Sunday to Saturday (or in some cases I might want Monday to Sunday).
+1
@pauldix would it be good to just have the buckets start at the start time? It will end up making the output of queries using now() act more sane. Currently, I can't know how many results this query will return because of now().
SELECT mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)
This should always return 10 rows, but it will often return 11 because now() isn't exactly on a time interval. We would then also need an alternate syntax that allows truncating now() to the correct precision though in order to allow for the old behavior.
@jsternberg is this not already implemented? I like your idea in https://github.com/influxdata/influxdb/issues/1837#issuecomment-206525778 but I think we should close this issue and open a new one for controlling bucket numbers explicitly.
This is already implemented. It got lost in the shuffle of things.
I use the query "select ... from ... where ... group by time(30m) fill(0)" and the db returns nothing because there is no data . Can I get a series of '0' data when there is no data ?
@Reventon1993 currently that is not possible. Follow https://github.com/influxdata/influxdb/issues/6967 for progress on returning fill() even if there are no results.
when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here. I am using 1.0.2 version
SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)
when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here.
SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)
when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here.
SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)
@hariteritreee please use [email protected] for questions. GitHub issues are for bugs or feature requests. The answer to your question is to use the GROUP BY time offset: http://docs.influxdata.com/influxdb/v1.0/query_language/data_exploration/#advanced-group-by-time-syntax
@hariteritreee please use [email protected] for questions. GitHub issues are for bugs or feature requests. The answer to your question is to use the GROUP BY time offset: http://docs.influxdata.com/influxdb/v1.0/query_language/data_exploration/#advanced-group-by-time-syntax
this googlegroups.com account is not working beck. do you have any other email ?
is this no longer supported?
GROUP BY time(10m, last())
I get
time dimension offset must be duration or now()
Most helpful comment
I have the same request. In my use case, I'm trying to aggregate by week using
group by time(1w)but the intervals always start on Thursdays. Why thursdays? Ideally I want to measure weeks naturally from Sunday to Saturday (or in some cases I might want Monday to Sunday).