Influx: 1.4.2
OS: Ubuntu 16.04
Calculation 1: Lower time boundary in the gap, Upper time boundary covering at least 30m of the second burst
> select integral("value", 1h) from "manor/Active_Power" where time >= '2017-08-03T17:30:00Z' and time <= '2017-08-04T13:00:00Z' group by time(30m)
name: manor/Active_Power
time integral
---- --------
2017-08-04T12:00:00Z 2.05983977713282
2017-08-04T12:30:00Z 4.022977703630225
Calculation 2: The exact same query, but with the lower time boundary set to include datapoints from the first burst.
> select integral("value", 1h) from "manor/Active_Power" where time >= '2017-08-03T17:00:00Z' and time <= '2017-08-04T13:00:00Z' group by time(30m)
name: manor/Active_Power
time integral
---- --------
2017-08-03T17:00:00Z 12.733459071307028
2017-08-04T12:00:00Z 350.9098028283047
2017-08-04T12:30:00Z 4.022977703630225
__Expected behavior:__ [What you expected to happen]
In both calculations the integral for the '2017-08-04T12:00:00Z' timestamp should be 2.05983977713282.
__Actual behavior:__ [What actually happened]
In the second calculation, the integral for that timestamp is actually abnormally large: 350.9098028283047
__Data Examples:__
2017-08-03T17:00:00.031Z 21.137407302856445
2017-08-03T17:00:01.032Z 21.18560028076172
2017-08-03T17:00:02.029Z 21.14714813232422
2017-08-03T17:00:03.035Z 21.119062423706055
2017-08-03T17:00:04.037Z 21.091886520385742
2017-08-03T17:00:05.051Z 21.07956886291504
2017-08-03T17:00:06.051Z 21.055408477783203
2017-08-03T17:00:07.045Z 20.883914947509766
2017-08-03T17:00:08.054Z 20.61969757080078
2017-08-03T17:00:09.05Z 20.399765014648438
2017-08-03T17:00:10.058Z 20.319677352905273
2017-08-03T17:00:11.057Z 20.36859130859375
2017-08-03T17:00:12.052Z 20.513774871826172
2017-08-03T17:00:13.056Z 20.661264419555664
2017-08-03T17:00:14.055Z 20.60524559020996
2017-08-03T17:00:15.056Z 20.38081169128418
2017-08-03T17:00:16.071Z 19.88274383544922
2017-08-03T17:00:17.067Z 19.169958114624023
2017-08-03T17:00:18.065Z 18.343095779418945
2017-08-03T17:00:19.068Z 17.661298751831055
2017-08-03T17:00:20.068Z 17.340673446655273
2017-08-03T17:00:21.075Z 17.11452293395996
2017-08-03T17:00:22.08Z 16.758804321289062
2017-08-03T17:00:23.085Z 16.28508758544922
2017-08-03T17:00:24.081Z 15.913329124450684
2017-08-03T17:00:25.086Z 15.690470695495605
2017-08-03T17:00:26.091Z 15.656637191772461
2017-08-03T17:00:27.098Z 15.70372486114502
2017-08-03T17:00:28.097Z 15.655599594116211
2017-08-03T17:00:29.098Z 15.670706748962402
I can confirm this bug is still present, I'm calculating the power generated by a solar plant that obviously leaves a gap each night and the first calculation of every day fails. Changing the fill option doesn't affect the calculation.
Here's an image in grafana of the strange behaviour.
Influx: 1.6.1
OS: Ubuntu 16.04

same here, real issue for us as we normally get sensor blackouts (_machine maintenances, etc.._).
Anyone has a better solution then adhoc fixes? @perchten @artumino, because repairing fields based on threshold via kapacitor stream is ugly (_setting the threshold is a hack..._).
Thanks.
I'm afraid we're just ignoring those datapoints and pretending that time bucket is part of the outage... manually. Not a happy hack at all.
Mathematically the number actually makes sense, if the integral calculation for that bucket is based not on the actual time boundaries for the bucket, but the whole time since the last filled bucket (i.e. including the outage time).
I can't remember exactly how we confirmed this, and I'm just saying it from memory right now. But if anyone from Influx gets round to looking at this (please) then this might be a fruitful initial line of investigation.
There must be quite a lot of people affected by this without even realising...
There must be quite a lot of people affected by this without even realising...
Not sure so many use this... Integral is missing from kapacitor, and "common" issues like the above, or ORDER BY time DESC are broken.
Just reading through the source here, I couldn't catch the bug...
Can acknowledge, still there in 1.6.4 :( I have some missing data points due to a bug in my script, and the integral is wrong for the first hour (or whatever interval you use) after the data was written to influx again.
Correct (as it should be, if I move the time range)
SELECT INTEGRAL("p_ac", 1s)/1000/3600 FROM "pv" WHERE time >= '2018-10-19T11:00:00Z' AND time <= '2018-10-19T13:00:00Z' GROUP BY time(1h) fill(0)
2018-10-19T11:00:00Z 1.9801416266698937
2018-10-19T12:00:00Z 2.7342758853887283
Wrong:
SELECT INTEGRAL("p_ac", 1s)/1000/3600 FROM "pv" WHERE time >= '2018-10-18T19:00:00Z' AND time <= '2018-10-19T13:00:00Z' GROUP BY time(1h) fill(0)
2018-10-18T19:00:00Z 0.002189419077859756
2018-10-19T11:00:00Z 24.368232593120933
2018-10-19T12:00:00Z 2.7342758853887283

@rbetts @jsternberg Maybe you can have a quick look/test and approve PR, that'd be great!
@oori does my change make sense for you?
Query after fix:
SELECT INTEGRAL("p_ac", 1s)/1000/3600 FROM "pv" WHERE time >= '2018-10-18T19:00:00Z' AND time <= '2018-10-19T13:00:00Z' GROUP BY time(1h) fill(0)
1539889200000000000 0.002189419077859756
1539946800000000000 1.9801416266698937
1539950400000000000 1.9084904447847266
Note that the second value also changed because of missing data!
Still present in 1.7.5.
I have done some analysis on this, mainly in conjunction with CQ's as that's where you typically calculate energy consumption with an integral. My first attempts are here https://community.influxdata.com/t/application-note-using-integral-in-continuous-queries/9265/2
There is a structural problem with using integral on a GROUP BY basis. At every start of the query, the area from the start of the window until the first point is not taken into account. Not so much of a problem with normal queries, but if you insert this into a Simple Continuous Query, you're missing out data at the start of each interval. You can move to the advanced syntax and play with RESAMPLE and FOR, but it simply pushes the problem back in time.
Something similar happens in the case above (@perchten, @artumino), where there's no clean start of a new window in the case of missing data and a measurement from a window long ago is taken and it generates a large spike - which is clearly visible.
The missed output at the start of each query (in a CQ) however is less visible and I wouldn't be surprised if it went unnoticed for a lot of people.
To better cooperate with the CQ query mechanism, I would only generate a reduced output point if the algorithm is sure all required inputs are present. In the case of an integral in a window, it means that the values at the start and and the end of each window needs to be defined and thus that the query needs to include points before and after the window. In a CQ with advanced syntax, you're then able to enlarge the query size so the window of interest is satisfied and the integral can be calculated.
So I came up with this: https://github.com/Tiger-Power/influxdb/commit/4a92b7bcd9da10cc8a1ae0dbd093fee769e584d7
It's only for the Float implementation for now (why there's so much code duplication for each type is probably another discussion?).
Not sure if it's in line with the influx philosophy and it somehow breaks current implementations. But it only avoids outputting results which are wrong anyway.
It also keeps the current implementation for simple integral operations without the GROUP BY statement.
@roberthartung I don't think your solution is mathematically correct. Although it removes the spurious outcome reported in this bug, it also results in the area before the first point in each window being lost which can be significant if you're not drastically downsampling.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Why was this issue marked as wontfix - integral aggregation is quite important for certain types of time series data (e.g. to compute energy from power measurements, etc.)?
This issue should definitely be addressed.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.
I agree that it should be fixed. It is affecting all our calculations and fill does not seem to have any effect.
Still affecting us (OP) too, and we never managed to find a workaround.
I contend it isn't a feature request, it's a bug. And it's still live, whatever the bots might say about this thread's activity.
Most helpful comment
Why was this issue marked as wontfix - integral aggregation is quite important for certain types of time series data (e.g. to compute energy from power measurements, etc.)?
This issue should definitely be addressed.