Proposal:
When executing fill(previous), the query should always have a value for previous, even if there is no point with that field in the query time range.
Current behavior:
> select * from fp
name: fp
--------
time value
2016-06-20T16:09:13Z 10
2016-06-20T16:19:13Z 100
> select max(value) from fp where time > now() - 20m group by time(5m) fill(previous)
name: fp
--------
time max
2016-06-20T16:05:00Z 10
2016-06-20T16:10:00Z 10
2016-06-20T16:15:00Z 100
2016-06-20T16:20:00Z 100
2016-06-20T16:25:00Z 100
> select max(value) from fp where time > now() - 18m group by time(5m) fill(previous)
name: fp
--------
time max
2016-06-20T16:10:00Z
2016-06-20T16:15:00Z 100
2016-06-20T16:20:00Z 100
2016-06-20T16:25:00Z 100
Note the null value for the 16:10-16:15 bucket, despite there being a point at 16:09 with a value.
Desired behavior:
> select * from fp
name: fp
--------
time value
2016-06-20T16:09:13Z 10
2016-06-20T16:19:13Z 100
> select max(value) from fp where time > now() - 20m group by time(5m) fill(previous)
name: fp
--------
time max
2016-06-20T16:05:00Z 10
2016-06-20T16:10:00Z 10
2016-06-20T16:15:00Z 100
2016-06-20T16:20:00Z 100
2016-06-20T16:25:00Z 100
> select max(value) from fp where time > now() - 18m group by time(5m) fill(previous)
name: fp
--------
time max
2016-06-20T16:10:00Z 10
2016-06-20T16:15:00Z 100
2016-06-20T16:20:00Z 100
2016-06-20T16:25:00Z 100
Use case:
Currently customers have to know when the last value was recorded in order to make sure that point is included in the time range. For irregular series that's a significant burden. If the system can always find the most recent value regardless of the lower time bound, then many state change queries become useful.
Are there any good work-arounds for this right now? I'm collecting sparse data and trying to graph it using fill(previous), which results in the first several values being null because the previous value falls outside the desired query time range.
The only thing I can think to do right now is to execute a second query to get last(field) with the time ending at the start time of the above query, then use that result to fill in the null values.
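With the fp measurement from the report above, that workaround might look like this (a sketch; the 18m bound mirrors the example's query window):
> select last(value) from fp where time <= now() - 18m
> select max(value) from fp where time > now() - 18m group by time(5m) fill(previous)
The first statement supplies the seed value for any leading null buckets returned by the second.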
@jwheeler-gs that's the best workaround for now
I have a crude solution to this in the branch for #5943. The proper solution to this requires #5943 to be implemented, but the crude solution will be good enough for 1.0 if I can't get the proper solution working in time.
what happens if fill(previous) is used but there was no data in any interval? Presumably we could pull the tag set for the previous point and use that? Specifically wondering about https://github.com/influxdata/influxdb/issues/6967
I... have no idea. I will have to get back to you on that.
If there was no previous data, wouldn't it make sense to just return null until the first value is encountered? That would be the same as the current behavior but only in the case where there are no previous values.
@jwheeler-gs yes, that is the current behavior and would be for this too. The issue is what happens when there is no data in that interval at all. Right now, fill will not fill a series that doesn't exist in the interval. @beckettsean was asking what would happen if fill(previous) was used and there was no data in the interval, but there was data in the past at some point for that series.
Aaah, I get it now. That's actually going to be a possible case for my use as well. I'd expect (hope for) fill to return that previous value for the entire interval. But then again, I'd also expect it to not return any data past the current point in time.
I'm using fill to provide data directly to a chart which can be adjusted to show any time window, past, present, or even future (where future data is filled in using prediction data from another dataset). What I really want is to see the previous value up to some specified point (the present timestamp) and then no data past that. This is probably a bit much to ask of influx and is outside the scope of what it really needs to provide. That can still be handled easily enough on the receiving end by nulling out all values past the current timestamp.
Would it be possible to add fill(previous) for non-grouped queries? Something like
select GPS.latitude, GPS.longitude, Weather.temperature from car where time >=... and time <=... fill(previous)
Weather.temperature is written to the DB sparsely, and the reader must assume that as long as no new temperature is reported, the previous value holds.
I know I can do
select GPS.latitude, GPS.longitude, Weather.temperature from car where time >=... and time <=... group by time(1s) fill(previous)
but that slows down the request considerably, and I get loads of empty lines for times in the time range where there are no measurements.
@retorquere the issue with having fill(previous) with no GROUP BY time() clause is that the system doesn't know when to fill. Should it return one point per nanosecond? Per second? A group by interval is needed to create a regular time series, so that the "missing" points are clear.
@beckettsean I will defer to your superior knowledge on the matter of course, but conceptually, I'd figure it would return exactly the same points as a regular non-grouped select, just with the nulls filled in by the value in that column from one of the rows already selected.
@retorquere InfluxDB does not store nulls. There are no nulls returned in a non-grouped query.
If I submit this however:
curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=mydb" --data-urlencode "q=select GPS.latitude, Weather.temperature from car where time >= '2015-10-13T14:16:13Z' limit 10"
I get this (where I'd love for there to be a way to have those nulls be replaced by 5)
{
  "results": [
    {
      "series": [
        {
          "name": "car",
          "columns": ["time", "GPS.latitude", "Weather.temperature"],
          "values": [
            ["2015-10-13T14:16:14Z", 51.9893696, 5],
            ["2015-10-13T14:16:15Z", 51.9893696, null],
            ["2015-10-13T14:16:16Z", 51.9893696, null],
            ["2015-10-13T14:16:17Z", 51.9893696, null],
            ["2015-10-13T14:16:18Z", 51.9893696, null],
            ["2015-10-13T14:16:19Z", 51.9893696, null],
            ["2015-10-13T14:16:20Z", 51.9893696, null],
            ["2015-10-13T14:16:21Z", 51.9893696, null],
            ["2015-10-13T14:16:22Z", 51.9893696, null],
            ["2015-10-13T14:16:23Z", 51.9893696, 5]
          ]
        }
      ]
    }
  ]
}
That's an interesting use case, where one field is more densely populated than another. It might make sense to have fill(previous) in that case. Can you open a feature request (https://github.com/influxdata/influxdb/issues/new) describing that use case?
+1 for this feature.
+1 for this feature!
Yes please! This could be really useful!
Any updates on this? Or does anyone know a workaround?
I currently need this functionality.
+1 for this feature. Matter of fact, I thought "fill(previous)" and "fill(linear)" would do the job already.
+1 this would be very convenient if you have sensors that report only on change
+1 very much needed
It doesn't make sense to return nulls with fill(previous) until it finds data within the first GROUP BY interval.
Super useful for 'Change Of Value' systems. If the state rarely changes and one looks at the query at any given time, odds are there are no data points within that query. Kinda cripples plotting this stuff.
open since 20 Jun 2016 for such a ridiculously basic feature when talking about time series...
Any time series database should have this; it's really a basic concept.
Really a core function for any time series data logging. Should be implemented as soon as possible.
+1
+1
This is really needed; it's a little sad that there isn't even a proper workaround.
Is there a possibility to put a bounty on this?
I think there are 2 cases to consider:
1- Logging is done at regular time intervals and the query covers >1 of these intervals.
2- Logging is done at irregular time intervals.
In case #1, if data is missing then the datasource was unavailable. Queries for that time period should probably return null or something similar.
In case #2, if data is missing you cannot know what is going on: either the datasource was unavailable or there simply wasn't any data produced in the requested period of time.
I think the system behaviour for case #2 should be well defined before solving this problem. How can you be sure whether the datasource was unavailable or not? Do we need a datasource health indicator (and to link every series to a datasource)?
I am logging data "on change", so this is a BIG issue for me. I'm directly interacting with InfluxDB, so I have some room for workarounds.
Possible workarounds (I can think of):
1- Insert a value into the database just after the starting point of the query, and delete it after the query is done. (Will this impact database performance in the long run because of the repeated inserts/deletes?)
2- Shift the start time of the query backwards to a point where data is available, then delete the extra samples (new start up to the original start) from the JSON result. (I personally make sure that every day at 0:00 ALL series are logged once; see the sketch after this list.)
3- Find and read the previous sample and use fill(<that value>).
4- Like 3, but manipulate the JSON result manually to emulate the fill.
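A sketch of workaround 2, using the fp example from the top of the thread (the 25h shift is an assumption: just wide enough to guarantee the daily 0:00 log falls inside the window):
> select max(value) from fp where time > now() - 25h group by time(5m) fill(previous)
Afterwards, discard every bucket older than the original lower bound from the JSON result.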
What do you use as a workaround?
When you are directly interacting with the database, some of the above workarounds might work, but when your experience with the database is through another layer (such as Grafana), some of the workarounds you mention make no sense at all.
@yellowpattern: I agree. I am directly interacting with the database; I will include that in my original post.
The idea behind my post is to find out what people are using as a workaround, cuz I'm really struggling with this "bug".
@Kortenbach we are using approach (4), which allows us to execute the last query (for finding the last value before the start of the query) and the fill query at the same time.
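For approach (4), both statements can also go out in a single HTTP request, since the /query endpoint accepts semicolon-separated statements. A sketch with the fp example from the top of the thread (database name is hypothetical):
curl -G 'http://localhost:8086/query' --data-urlencode "db=mydb" --data-urlencode "q=select last(value) from fp where time <= now() - 18m; select max(value) from fp where time > now() - 18m group by time(5m) fill(previous)"
The client then stitches the first statement's result into the leading nulls of the second.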
@macrosak Thank you for your reply. Good to hear that people are actually implementing workarounds!
I'm currently working on approach (2). I have a checkbox that can switch the workaround on or off...
I hope someone comes up with a permanent solution soon, cuz this doesn't feel right.
I expected this to be the default behavior for fill(previous) and fill(linear). I'm querying a dozen fields where only changes are stored, the workarounds really aren't ideal, it would be great to see this feature implemented.
Where I get bitten by this bug most is when I use a query that has "difference()" or "derivative()" in the query.
For queries beyond a certain time span, rather than getting a delta that is ~1k, I get a first delta that is ~14 million (or rather, the query uses "0" from outside the time window, takes the difference between that and the first value in the time window, and gets 14,000,000).
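As a sketch, the failing query shape looks like this (measurement and field names are hypothetical):
> select difference(last(total)) from net_bytes where time > now() - 6h group by time(10m)
When the real previous point sits outside the 6h window, the first bucket's delta is computed without it, producing the inflated spike described above.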
I'm getting so pissed off with this that I'm thinking of dropping influxdb for something else.
In reference to an answer I posted on #6967 - I think the proposed improvement to the fill() function could solve this issue with an API like:
fill(<fill_option>[,<missing_option>])
Where <missing_option> is one of:
- 'all' - fills values for all tags that ever existed (as described by @AndreCAndersen in #6967)
- time.Duration - similar to 'all' but only includes tags that have been created within some time interval. So if you passed 24h here, it would only fill values for tags that existed within the last 24 hours.
So with fill(previous, 'all') - it would grab the last value for all tags (even if they are outside the query window).
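Under that hypothetical API, the 18m query from the original report could be written as:
> select max(value) from fp where time > now() - 18m group by time(5m) fill(previous, 'all')
and the 16:10 bucket would be seeded with 10, the value written at 16:09, just before the window.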
+1
+1
Hello,
I have the impression that this will not be fixed... although it seems really basic.
But is there a workaround?
Greetings,
Hendrik
Has there been any update to this? I am trying to display data that only gets written when it changes. Some of the values can go 12 hours without a change, so I am trying to avoid forcing it to write every minute or something since that would increase the volume of data recorded by 700 times.
I had to make another query to get the last value.
+1
What workarounds are folks using, out of interest?
We created a wrapper for the series class that solves a couple of quirks in InfluxDB. It fixes the problematic last_last_last_last_measurement naming that arises with continuous queries, and it combines the results of the original query and a last-value query, taking the last point when the series is missing data. The extra queries of course have performance implications; on top of that, we already have performance issues with a single-node installation (writing 1k metrics per second).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
+1
I still feel that this is a bug that should be fixed. Why the wontfix label?
wontfix is just automatically applied when stalebot wakes up. Although it seems correct, descriptively, that it is applied? The issue has stood open for over 3 years with no activity on the Influx side.
It kind of makes you wonder what sort of priority InfluxDB actually gives to fixing significant bugs.
I would like to know when this will be fixed. Anyone from influx care to join?
Some communication either way from the team would be great. Maybe there's some technical reason that makes this impossible? @adamperlin - How did your investigations go? Any news for us? It seems as if there is continued interest in this feature.
Please, everybody, be kind to each other :-)
Thank you to everyone involved! I'm also interested in this feature. I believe the team knows by now that this is a wanted feature, there is no need to stress them about it.
If anybody wants to solve this, creating a pull request is probably the best way to make this move forward. Apart from that, stressing the devs with messages like some of the ones I've read will just make them resent end users and make them reluctant to open source their work. Criticizing people for not being fast enough is not being kind.
Have a good day, y'all.
I also needed this to work, but have since moved to another solution. I will continue to monitor this issue in the hope it is implemented at some point.
The interesting thing is that OPC HDA - an ancient, now-outdated historian product from several decades ago - recognised that this feature was required and that you need to interpolate data at the boundaries of your query.
With so many people asking for this fundamental feature, I'm a little surprised it hasn't been officially addressed. I know the solution isn't trivial, but it's something that is rather obviously needed.
Is it being implemented?
I'm afraid not...
We switched to Warp10, which does this out-of-the-box.
I wouldn't be opposed to moving from InfluxDB to Warp10. Is it supported by Grafana or any of the other dashboard UIs? Is there a tool to migrate data from InfluxDB to Warp10?
We switched to Warp10, which does this out-of-the-box.
Thanks for the suggestion. This seems like a good alternative to InfluxDB.
A regular time-series database should support this feature out of the box. Any workaround just adds latency to the response, which is not acceptable.
Waiting for this too.... +1
+1
+1
+1.
Is there any workaround for this right now? I am trying to display sensor values in Grafana that only get updated whenever they change.
Not that I've seen. I switched products - missing something this vital made me feel that InfluxDB isn't worth sticking with. Shame, since performance is quite nice. If they ever finally get it working, I'd love to return.
You can try switching to polled recording, or recording at fixed intervals, but even then you'll get issues when requesting at times between points - and recording by exception (which you are doing) is FAR more efficient.
I'm currently using Warp10. Recording time is good, and query time is great with small numbers of channels but falls off a bit with large numbers of channels queried at once.
The query language is interesting - extremely flexible and somewhat frustrating at the same time. You need to use a specific version of the FETCH WarpScript query to get the bounding values, but it works nicely.
NOTE: No matter which historian you decide on, if you are recording by exception and have discrete values (on/off, or off/starting/running/stopping, for example), you should record the old value a very small time step (i.e. one second) before the new value. This will prevent Grafana from showing it as a smooth line and ensure it looks like a step.
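In InfluxDB terms, that trick amounts to two line-protocol writes one second apart (measurement, tag, and timestamps are hypothetical):
pump_state,unit=7 state=0 1586470799000000000
pump_state,unit=7 state=1 1586470800000000000
The repeated old value pins the plotted line flat until the actual transition.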
I've been waiting for this as well, and now I'm fighting an issue where database compression on a Raspberry Pi 3B+ consumes all system resources. I was already looking at other database options because of this issue; that was the last straw. I'm currently looking at the migration path to Timescale. Looks promising: they have a plugin for Telegraf, and a data migration script to get your data out of Influx and into Timescale.
Thank you guys, really helpful suggestions, I will look into TimescaleDB and Warp10.
+1
Ouch, what a fail. I've just selected InfluxDB specifically for this scenario, and I thought it was supported out of the box based on this blog post: https://www.influxdata.com/blog/working-with-irregular-time-series/
any news on this?
I would really like this as well
This is such an essential part of creating queries and time overviews.
I would advise anyone waiting for this to choose another product. This issue has been open for over 4 years now without a single reaction from the InfluxDB team. Not even a single word. It's not going to happen...
+1
Has there been any progress on this issue?
I have a scenario like this.
I want to average the points of two (or more) symbols every 20 seconds (or any other interval). Below are the points stored in a single measurement, grouped by symbol. The second query is the one I came up with to calculate the mean (sum() gives me the same issue as well).
select last(bs) from table where symbol=<1> or symbol=<2> and time>=1592833320000ms and time<=1592833440000ms group by symbol, time(20s)
name: table
tags: symbol=<1>
time last
2020-06-22T13:42:00Z 84
2020-06-22T13:42:20Z 94
2020-06-22T13:42:40Z 95
2020-06-22T13:43:00Z 87
2020-06-22T13:43:20Z 76
2020-06-22T13:43:40Z 77
2020-06-22T13:44:00Z
name: table
tags: symbol=<2>
time last
2020-06-22T13:42:00Z 288
2020-06-22T13:42:20Z 286
2020-06-22T13:42:40Z 282
2020-06-22T13:43:00Z
2020-06-22T13:43:20Z 312
2020-06-22T13:43:40Z 330
2020-06-22T13:44:00Z
Check out how 13:43:00Z has no value: fill(previous) would work if this were a single-symbol query and would copy over the 282 from above. But when I group by both symbols, the value will just be 0 for this timestamp and the mean value will go way down at this time.
select mean(lbs) from (select last(bs) as lbs from table where symbol=<1> or symbol=<2> and time>=1592833320000ms and time<=1592833440000ms group by symbol, time(20s)) group by time(20s) limit 10
name: table
time mean
2020-06-22T13:42:00Z 186
2020-06-22T13:42:20Z 190
2020-06-22T13:42:40Z 188.5
2020-06-22T13:43:00Z 87
2020-06-22T13:43:20Z 194
2020-06-22T13:43:40Z 203.5
2020-06-22T13:44:00Z
Having fill(previous) in the inner query does not work. The question is: how can I bring the last value of the missing groups into the mean calculation?
I even tried using Flux with the following query, but still see the same issue.
from(bucket: "db/rp")
  |> range($range)
  |> filter(fn: (r) => r._measurement == "table" and (r._field == "") and r.symbol =~ /^[[Symbol]]$/)
  |> aggregateWindow(every: 1m, createEmpty: true, fn: sum)
  |> fill(usePrevious: true)
  |> group(columns: ["_time", "_field"])
  |> sum()
  |> group(columns: ["_measurement", "_field"])
Let me know if there is any workaround for this issue. Thanks!
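One possible workaround sketch for the InfluxQL version above: widen the inner query's lower time bound so each symbol's previous point lands inside it, keep fill(previous) in the inner query, and trim back to the display window in the outer one. The 10-minute widening here is an arbitrary assumption, and parentheses are added around the or because and binds tighter:
select mean(lbs) from (select last(bs) as lbs from table where (symbol=<1> or symbol=<2>) and time>=1592832720000ms and time<=1592833440000ms group by symbol, time(20s) fill(previous)) where time>=1592833320000ms and time<=1592833440000ms group by time(20s)
This still fails whenever a symbol's last point is older than the widened range, which is exactly what this issue is about.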
+1
I'd like this feature too.
+1 for this feature
+1 for this feature
+1
+1
This is really a must-have feature when working with irregular time series!
+1 since aggregations on irregular time series across different tag sets depend on the most recent values
@jwheeler-gs Sorry for the ping after so much time, but is there any chance you could give us more details on your workaround of using two queries?
+1
+1