Extending the discussion started in #2071 and re-starting the old discussion from #388, we should support time zones on a per-query basis.
A quick example might look like:
select mean(value) from cpu
where time >= today()
group by time(10m)
time_zone(PST)
Or you could also do time_zone(+8) or time_zone(-2).
Any other suggestions?
Doing a simple +/- is not sufficient for full timezone support because of daylight saving time (PST vs PDT depending on the time of year).
For example, if you have a GROUP BY time(1d), the size of the day/bucket can vary depending on the time of year. This makes it extremely messy to implement with the current method. That said, it is exactly what I need for my application.
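To make that concrete, here is a minimal Go sketch (illustration only, not InfluxDB code; it just uses the standard time package with America/Los_Angeles as an example zone) showing that a 1d bucket in a DST-observing zone is sometimes 25 hours long:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Load the DST rules for US Pacific time from the tz database.
    loc, err := time.LoadLocation("America/Los_Angeles")
    if err != nil {
        panic(err)
    }
    // Walk three consecutive local days around the November 2015 fall-back transition.
    start := time.Date(2015, time.October, 31, 0, 0, 0, 0, loc)
    for i := 0; i < 3; i++ {
        dayStart := start.AddDate(0, 0, i)
        dayEnd := start.AddDate(0, 0, i+1)
        fmt.Printf("%s lasts %v\n", dayStart.Format("2006-01-02"), dayEnd.Sub(dayStart))
    }
    // Prints 24h0m0s, 25h0m0s (clocks fall back on 2015-11-01), 24h0m0s.
}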
Right now, I am planning on implementing an index on top of influx for this purpose, but would love to not have to do that.
IMO there should be both options.
A query-wide parameter covers most of the cases, but a function that just returns the offset for a supplied timezone (which would be necessary for a per-query operator anyway) is much more convenient than doing that client-side.
Regarding "any other suggestions": can support be provided for falling-edge vs. rising-edge groupings?
If time_zone takes a "number", it should also accept a four-digit string, since some timezones have a :30 offset (e.g. +0530).
Not sure if you're still debating format, but I'd recommend supporting the Olson / tz database. See http://joda-time.sourceforge.net/timezones.html. It supports named timezones, as well as UTC offsets (which are confusingly backwards, i.e. Etc/GMT+5 is actually GMT-5). This is the closest thing to a standard I've ever seen.
+1 for the Olson tz database
... with timezone('America/Phoenix')
Why not add a timezone setting to the config file?
+1
+1
another +1 for the Olson db rather than literal offsets or any other named time zones
Though I should also point out that timezones aren't the only problem. If I do a group by time(30m), I might want my groups to run from :15 to :45, so that the "middle" of the grouping is on the hour and half hour, when I expect my peak load due to user traffic patterns. And that has nothing to do with timezone - it's more "offset the grouping by a certain amount".
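For what it's worth, the bucket-offset half of this was later addressed independently of timezones: InfluxQL's GROUP BY time() accepts an optional second offset argument. A hedged example, reusing the cpu measurement from the opening post:

select mean(value) from cpu
where time >= now() - 1d
group by time(30m, 15m)

Here the 15m offset shifts every bucket boundary, so the groups run from :15 to :45 and :45 to :15 rather than :00 to :30.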
select mean(value) from cpu
where time >= today()
group by time(10m)
time_zone(+8)
This may be the best option.
I would use a configurable default timezone + query param to override it
I'm eagerly awaiting this feature.
Without this feature, I cannot use group by time(1d).
Making the timezone configurable (in influxdb.conf) might be easier.
Can this feature be added in the 0.9.2 release first?
+1
We are fetching sums and grouping by 1 day and can not get accurate stats because of this timezone issue.
+1 for setting default timezone in config file.
Currently, I'm using a quick workaround which consists of using a cron script or program to retrieve the average of the day (executed at 00:01 AM, for example) for the desired series from InfluxDB, using the correct start and end timestamps in the WHERE clause of the query.
After that, insert the data in a new series with the timestamp of the beginning of the day and you'll be able to retrieve the correct data in your queries.
I really want to know which version will include this feature. Is there a PR proposed already?
There is no PR for this yet, and the work is currently unscheduled.
+1
I'm surprised such a simple and basic feature is unscheduled :( So far grouping by 1d is useless for me :(
...or at least a timezone setting in a config file. In my case it's always the same timezone.
@hotsnow77 we will happily accept a PR for this feature, but I don't think you'll find it simple. It is definitely a highly desired addition but likely won't be tackled until clustering and other stability/availability concerns are considered mature.
This is pointless. If the devs are busy with other features that they consider more important why bother complaining? Either get together some golang developers who can send a patch or let them work on their priorities without pointless complaints that will only slow them down because they are nice and they'll find the time to answer politely.
We want this feature as bad as you so don't slow them down with useless comments.
PS: this is an open source project, "without warranty of any kind"
+1 for per-query timezone
Just bumped into this as well when trying to chart solar gains per day; they were all offset vs UTC instead of what I wanted.
Looking forward to getting native support for this in InfluxDB. I fixed it in Grafana for my use case :)
jnutzmann's comments about timezones and daylight saving time are spot on.
Just configuring the service to aggregate under a specific timezone is weak. The US & Canada fall into at least 6 different timezones, and if your app is global then obviously this just isn't sufficient if accuracy is important.
Also, a lot of people in this thread are referring to this affecting only group by day queries, but it also affects group by hour, as there are timezones like Newfoundland which are UTC−03:30.
My personal opinion is that support for this should only take timezone names, so that devs can't screw up daylight saving time. Dates are more complicated than most realize.
+1!
My suggestion is to use names (like in PHP). A lot of countries change their time in summer. For example, in Europe there are countries that are +2 in summer and +1 in winter.
select mean(value) from cpu
where time >= today()
group by time(1d)
time_zone("Europe/Madrid")
The hard part here (implementation wise) will probably be that timezones change (or something, I'm not familiar with the process).
moment-timezone for example is updated periodically (https://github.com/moment/moment-timezone/tree/develop/data/meta)
There seem to be 6+ releases just for this year: https://www.iana.org/time-zones
@Zolmeister that's actually pretty easy. Most distributions keep the tzdata in a separate package that stays up to date, so you just read in the files when you need them. Worst case, you have to restart Influx when they get updated. I'm not a Go developer, but https://codereview.appspot.com/153720043 seems to imply that Go as a language already picks up the tzdata packages as they are updated.
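As a rough illustration of "read in the files when you need them" in Go (a sketch with a hypothetical mustLoadZone helper; it assumes the host has a tzdata package installed):

import "time"

// mustLoadZone is a hypothetical helper. time.LoadLocation reads the host's
// tz database (e.g. /usr/share/zoneinfo on Linux, or the ZONEINFO path if set),
// so OS-level tzdata updates are picked up without rebuilding the binary.
func mustLoadZone(name string) *time.Location {
    loc, err := time.LoadLocation(name)
    if err != nil {
        return time.UTC // unknown zone name: fall back to UTC
    }
    return loc
}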
Also, a lot of people in this thread are referring to this affecting only group by day queries, but it also affects group by hour, as there are timezones like Newfoundland which are UTC−03:30.
And one other quirk is the fact that a day does not always start at midnight. In many production facilities which run 24 hours a day, reporting is associated with a "mill day" which changes when the morning shift comes to work. In the paper mill where I work, that is 6 am, so ALL daily reporting is associated with this "mill day" from 6 am to 6 am.
Another bit of complexity, tottote - and I agree, I've seen the requirement for the same and would love support for this. However this risks becoming so complex it doesn't get done, so I would like to point out too that I think one big area of complexity which might be avoided is support for timezones by name which have daylight savings. While I'm sure there are plenty of people who could +1 that they live in such an area and "need that feature", in reality, most organisations collecting any serious amount of time series data over geographically diverse areas have found that support for daylight savings time is unworkable. When clocks step forward you have a gap, but when they revert, you have duplicate data. In reality, most (look to big utility companies collecting metering data across timezones) avoid this by using a time offset that _doesn't_ change over the year (despite spanning 5 states and three timezones, the Australian East coast utility grid uses GMT+10 across the board). If I was to make a suggestion, it would be to shoot for support for fixed (non-DST) timezones first and then see how many people are left clamouring to see their data with annual gaps and overlaps (I suspect not that many).
If I was to make a suggestion, it would be to shoot for support for fixed (non-DST) timezones first and then see how many people are left clamouring to see their data with annual gaps and overlaps (I suspect not that many).
I don't think the main problem with DST is those gaps and overlaps on the two days per year when DST starts or ends. The main problem is the fact that you have one hour time offset in your aggregated data for a whole DST period. But that's just my point of view, others may have other preferences.
DST is a problem, in every aspect of life. If I ever get elected to King Of The World, my first task will be to criminalize DST. :-) Until then, we just have to deal with it. The fixed offset for handling "mill day" is pretty easy compared to the complexity of handling the DST.
Fortunately, most programming environments (even the standard C library) have working API for converting between local time and UTC, which can even handle the DST properly.
+1
Fortunately, most programming environments (even the standard C library) have working API for converting between local time and UTC, which can even handle the DST properly.
This.
If someone can point to a pure Go library that solves all of this I'm sure we would take a look.
I'm no Go person, but https://golang.org/pkg/time/ references timezones.
Continue saving datetimes in the database in UTC by convention (this might seem unnecessary but is important).
Then use a few lines of code to convert from UTC to any other timezone before doing the GROUP BY.
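The original four-line snippet isn't included above, but a minimal Go sketch of that UTC-to-local conversion (assuming timestamps are stored in UTC and the target zone is an IANA name; utcToLocal is a hypothetical helper) might look like this:

import "time"

// utcToLocal re-expresses a UTC timestamp in the given IANA zone so that a
// subsequent group-by-day works on local calendar days.
func utcToLocal(t time.Time, zone string) (time.Time, error) {
    loc, err := time.LoadLocation(zone) // e.g. "Europe/Madrid"
    if err != nil {
        return t, err
    }
    return t.In(loc), nil
}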
:+1:
This is really important. I thought I'd provide some sample use cases that are impossible to solve without timezone support.
Daily Counters
Lots of IoT systems have some sort of daily counters or summations. In this example, a solar PV system controller sums up the daily production of electricity, resetting at local midnight.
Being in California, UTC midnight becomes 16:00 local time. Because of this, simple calculations like production by day/week/month are impossible, as max(value) grouped by 1d is wrong:

Total Counters
In many cases IoT sensor systems have total summation counters, in this example a smart grid meter. Daily/weekly/monthly consumption or production can be calculated, but will never match up to numbers from other systems due to the same 16:00h day break.

Another example of daily counters
In this case, output from a rainfall sensor that reports daily rainfall and resets at midnight local time. This makes it impossible to calculate the sum of rainfall for the past week/month/year, as the 16:00h day break double-counts rainfall across two days in Influx.

Agree that this is super important. The current workaround is to just work with hourly data and handle all the timezone stuff outside of influxDB. Correct?
@savraj correct, all timezone calculations currently need to be handled outside of InfluxDB.
+1
+1
+1!
Another clarification: you can write a time to influx db that has an offset (for example: '2016-04-09T16:34:25.663409-04:00' -- the -04:00 is currently US/Eastern time) but whenever you read from influx db, you get UTC back. It's almost better that it is consistent like this -- as timezones get complex quickly.
@savraj You can write a time like '2016-04-09T16:34:25.663409-04:00' because it's a past time, but you cannot write a time like '2016-04-09T16:34:25.663409+08:00' (if the current time is 2016-04-09T16:34:25.663409), because it is a future time.
@thetcc actually it's the opposite.
Time(+0000) = 2016-04-09T16:34:25.663409
Time(+0400) = 2016-04-09T20:34:25.663409
Time(-0400) = 2016-04-09T12:34:25.663409
So if it's the same clock time in different zones:
Time(-0400) = 2016-04-09T16:34:25.663409
Time(+0000) = 2016-04-09T20:34:25.663409
This means that if the clock time is the same as the current time but you set the timezone to a negative offset from UTC, then you're working in future time.
Likewise:
Time(+0400) = 2016-04-09T16:34:25.663409
Time(+0000) = 2016-04-09T12:34:25.663409
A positive offset from UTC will be in the past (if the clock time is the same).
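The same relationship, spelled out with Go's time parser (a standalone illustration, not InfluxDB behaviour):

package main

import (
    "fmt"
    "time"
)

func main() {
    // The same wall-clock time written with different offsets names different instants.
    t1, _ := time.Parse(time.RFC3339Nano, "2016-04-09T16:34:25.663409-04:00")
    t2, _ := time.Parse(time.RFC3339Nano, "2016-04-09T16:34:25.663409+04:00")
    fmt.Println(t1.UTC()) // 2016-04-09 20:34:25.663409 +0000 UTC (later instant)
    fmt.Println(t2.UTC()) // 2016-04-09 12:34:25.663409 +0000 UTC (earlier instant)
}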
In 0.12, I cannot use this function.
Can I modify the source code to change the timezone?
@thetcc There's nothing in InfluxDB that prevents timestamps in the future, that's fully supported. They aren't returned by default when querying, however. The default upper bound on time for queries is now(), so if you write points in the future you must explicitly set the upper bound to a time further in the future. http://docs.influxdata.com/influxdb/v0.12/troubleshooting/frequently_encountered_issues/#querying-after-now
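For example, to see points written with future timestamps, the upper bound has to be widened explicitly (a sketch reusing the cpu measurement from earlier in the thread):

select * from cpu where time <= now() + 1d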
If anyone's interested, here's what I do in Go to save the time in whatever timezone you want. There are probably better ways to do this, so I'm open to ideas for improvement.
func timeInLocation(t time.Time, loc string) (time.Time, int, error) {
    l, err := time.LoadLocation(loc)
    if err != nil {
        return time.Time{}, 0, err
    }
    // Zone() reports the offset from UTC, in seconds, at this instant in the target location.
    _, offset := t.In(l).Zone()
    // Shift the timestamp by that offset so its UTC fields read as local wall-clock time.
    return t.Add(time.Duration(offset) * time.Second), offset, nil
}
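A hypothetical usage of that helper, shifting a UTC point into Pacific wall-clock time before bucketing (assumes the fmt, log, and time imports):

shifted, offset, err := timeInLocation(time.Now().UTC(), "America/Los_Angeles")
if err != nil {
    log.Fatal(err)
}
// offset is -28800 (PST) or -25200 (PDT) depending on the date.
fmt.Println(shifted, offset)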
Isn't the original idea of this topic (and its ancestor ticket) for the user to control what keywords such as GROUP BY DAY mean (as in, group data in such a way that DAY matches the timezone of the server etc. in question)?
I am fine submitting data as utc and it being stored inside influxdb as utc.
Best practice for saving times in a database is saving in UTC. Clients can later convert to the local TZ. As consense writes, the original topic addressed GROUP BY DAY queries. This should not be more difficult than something like:
select mean(value) from cpu group by time(1d) time_zone(CET)
And in the backend:
t := ...                          // time read from the DB, stored in UTC
tz, _ := time.LoadLocation("CET") // get the TZ identifier from the SELECT clause
t = t.In(tz)                      // convert the time before grouping
After this, do the normal GROUP BY.
This does seem to add a constant offset to the comparison so it does not handle summer/winter time changes in the dataset, right?
Yeah, not sure you can close out a timezone requirement with a time offset solution. Sorry guys. Only partly resolved.
https://github.com/influxdata/influxdb/pull/6504#issuecomment-216334289
@giko45 if you can explain a bit what you mean, we could add something more. What kind of output are you expecting? Times all get represented as UTC so the main concern is having the client interpreting the result. Even when you cross timezone boundaries due to going from something like PDT to PST, you can just do this:
SELECT value FROM cpu WHERE time > '2016-11-05T00:00:00Z-07:00' AND time <= '2016-11-07T00:00:00Z-08:00'
These times will work properly and only select data within those time periods. It's just the output will always be in UTC. If you have an idea for something that can't be done with the currently existing tools, can you open an issue describing your use case?
SELECT SUM(value) FROM energy WHERE time>'2016-02-05T00:00:00Z-07:00' AND time<'2016-07-05T00:00:00Z-06:00' GROUP BY time(1d);
The offset bucket you added now allows me to account for the fact that I want my buckets at 06:00Z, but it does _not_ account for the fact that in the specified interval for this query, one of the days has a bucket that is 23 hours, and that the UTC time of the boundaries between buckets will change from 07:00Z to 06:00Z on that day. This is _not_ just a presentation problem, because the bucket size needs to dynamically adjust based on when the DST transition is. Also note that in the query as written right now, there's no way to know _when_ the DST transition is. The offsets -0700 and -0600 don't tell you that - only the actual timezone ('America/Denver') has enough information to know when DST transitions are.
Also note that while the offset solution is enough to help half-hour timezones (like you mentioned - India) get by (ignoring DST problems), if you provide a way for a timezone to be provided, you shouldn't also need to specify an offset, since you can infer the offsets from the time zone (i.e. current offset % 60 will tell you if it's a 30 minute, or possibly even 15/45, or :04 -- I'm pretty sure one of those zones exist). In fact, I would expect something like "SELECT MEAN(value) FROM series GROUP BY time(1h) AT TIME ZONE 'Asia/Kolkata'" to automatically infer an offset of 30m from UTC times for the grouping.
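A sketch of how that sub-hour remainder could be derived from the zone itself rather than being supplied by hand (Go, with a hypothetical groupOffset helper):

import "time"

// groupOffset returns the zone's UTC offset modulo one hour at the given
// instant, e.g. 30m for Asia/Kolkata (+05:30) and 0 for whole-hour zones.
// Note: zones west of UTC (e.g. America/St_Johns at -03:30) yield a negative remainder.
func groupOffset(zone string, at time.Time) (time.Duration, error) {
    loc, err := time.LoadLocation(zone)
    if err != nil {
        return 0, err
    }
    _, secs := at.In(loc).Zone() // offset from UTC in seconds at that instant
    return (time.Duration(secs) * time.Second) % time.Hour, nil
}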
I've created #6541 and #6542 for tracking those two use cases. This implementation should work for the use case described in this ticket and will be a good jumping off point for implementing full timezone support.
Thank you for your feedback.
A syntax like #6541 or @ccutrer's:
SELECT mean(value) FROM cpu WHERE time > now() - 7d GROUP BY time(1d) TZ("America/New York")
would be my use case too.
However, implementation requires the following (a rough sketch follows the list):
1: Times in the database are in UTC.
2: Before the grouping, convert all times into the required TZ.
3: Do the grouping (since "time" is now a local time, the grouping will handle DST; yes, this implies one day per year with 23h and one with 25h, but that's OK).
4: The last step I am not totally sure about: should the result be converted back to UTC? In the case of continuous queries, yes - you don't want to save local times in your DB. In the case of client queries, probably yes too: only return UTC times to the client. This might seem a bit strange, since you will get daily aggregated values at an offset in UTC, but it's the client's responsibility to present this correctly.
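A rough Go sketch of steps 2-4 (a hypothetical helper, not actual InfluxDB code): convert the UTC timestamp into the target zone, truncate to the local calendar day, and report the bucket boundary back in UTC.

import "time"

// localDayBucketUTC maps a UTC timestamp to the start of its local calendar day
// in loc, returned as a UTC instant. Because truncation happens in local time,
// DST transitions naturally produce 23h and 25h buckets.
func localDayBucketUTC(t time.Time, loc *time.Location) time.Time {
    lt := t.In(loc)                                                         // step 2: convert to the required TZ
    dayStart := time.Date(lt.Year(), lt.Month(), lt.Day(), 0, 0, 0, 0, loc) // step 3: local midnight for that day
    return dayStart.UTC()                                                   // step 4: express the boundary in UTC
}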
+1
It would be nice if the time_zone option were available.
@mattbasta assuming 8h could be augmented with minute based offset, there's still the issue of DST which can't be dealt with like this :(
@mattbasta all good and well to hold that opinion, however there are ample use cases, and it's a fairly standard requirement for a database dealing with datetimes to be capable of dealing with the real world and its complexities. IMHO this is a long-outstanding issue that needs resolving by the Influx core team, and I know I am not alone. Until that happens, should we implement Influx to meet our tsdb needs, we will be limited to aggregating at a half-hour level in the database and then completing any aggregation in the application. This puts it at a disadvantage when compared with other options.
If one saves data in the database in UTC, then daily aggregates can simply be done by converting to the right TZ before aggregation; this will handle DST too.
Aggregation should be something like this:
select mean(value) from cpu group by time(1d,"CET")
And in the backend (I am not a Go programmer):
t := ...                          // time read from the DB, stored in UTC
tz, _ := time.LoadLocation("CET") // get the TZ identifier from the SQL SELECT clause
t = t.In(tz)                      // convert the time before grouping
...                               // after this, do the normal GROUP BY
Normal select queries will return data in UTC (as the data is saved in UTC), but clients can convert to the local TZ (which is good practice anyway, since clients might be in different TZs).
@jufemaiz Does any other tsdb handle that better ? Genuinely curious, haven't played with other ones much
@mattbasta PST and PDT are _not_ time zones. They are simply defined offsets (-8 and -7). Pacific Time is a time zone, and includes the information about when it changes offsets (DST), as well as nicknames for those offsets. And it's not hard for software to support it. There is a well maintained database of time zone information that software can use to know what the DST rules are for any given time zone in any given year (within practicality): https://en.wikipedia.org/wiki/Tz_database. Usually software will support the tzinfo data files, and rely on the operating system to keep them up to date.
I deleted my comment. I don't want to be involved in this discussion. Feel free to argue all you want.
Influx should store data with a UTC timestamp. Formatting timestamps should be a display detail.
The complete solution to timezone handling is to provide a timezone formatting mechanism. It should be possible to just use something like this with time: format(time,'Central Standard Time'). There can not be any solution based on using "server time" or "client time" etc. Queries simply need to be able to format time. Any query inferring time, such as group by and others, should be using the timezone of the locale that influx can detect at the server to shift time relative to UTC.
@greggwon you're correct except for the rather large use case of aggregation methods by calendar days (weeks, months, years etc), then InfluxDB needs to be able to manage timezone offsets. In this case it is not merely a display detail but a functional requirement :)
+1
+1
Aggregation becomes a "display detail", because you are now visualizing. The functional definition as shown in the sample query is not my point. My point is that there should be a common time reference that is automatically known for all data. Data should not float around in varied Timezones and then have to be coerced to another in random fashion. Instead, there should be a single base timezone reference and then "visualization" can specify what timezone is important.
In the sample query, the 'time' comparison to "today()" also implies a visualization (extraction of the data for use) and so today() should also be offset by a timezone offset. today() needs the parameter, not a new function which will randomly interject problems to all other 'time' values in an expression. Each use of a time function, should by default use the server timezone offset but allow a parameter which specifies the timezone offset applicable.
Yes, that becomes more places that the offset must be specified, but it allows the developer/application the explicit control needed to make sure that we don't suddenly have offset math strewn about the query with various +21600000, and -21600000 etc.
I will be locking this issue. Litigating the result of a closed issue that was opened two years ago and resolved six months ago is unfair to the people subscribed to the issue and I think it is unfair to forcibly involve others in the discussion.
With that said, I am happy to share the rationale for why the issue was resolved in the way it was resolved. But, the proper location for that question is on our community forums and not the issue tracker. @greggwon if you want to continue discussion, please create an account on that website and post a new topic. I will keep an eye out for the next day to see if there is any discussion about time zones there.
If you have issues with the way it was resolved that pertain to a valid use case of time zone math, please open an issue. I do not want to fail to address valid use cases since I want the database to be as useful to as many people as possible. If there are problems with the existing implementation, I want to address them so that everyone benefits from a better database.
One clarification that might be needed for most people (since I have seen this confusion before), the TZ parameter _does not_ affect any time values in the actual query. It affects time boundaries. The time 2000-01-01T00:00:00Z already has a time listed. It's UTC and the time zone parameter cannot and will not affect that.
Time zones are a disabled feature by default. If you don't include the TZ() parameter, no time zone math will be done. And the previous group by offset feature introduced in 1.1 or 1.2 is still there for other use cases.
I hope that clarification helps and look forward to your continued engagement with the InfluxData community.
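For reference, a hedged example of that parameter as it shipped in InfluxQL 1.x (to the best of my understanding the clause is written as tz() after GROUP BY, with a single-quoted IANA zone name):

SELECT mean("value") FROM "cpu" WHERE time >= '2017-01-01T00:00:00Z' AND time < '2017-02-01T00:00:00Z' GROUP BY time(1d) tz('America/Chicago')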