Timescaledb: gap filling: maximum length

Created on 19 Aug 2019 · 8Comments · Source: timescale/timescaledb

Does the gap filling functions have option to limit the size of gaps being filled by setting maximum length of gap in number of time steps?

Gap Fill community-request

Source

zinji

👍1

Most helpful comment

Any update on this? This would be a really useful feature to have.

moodgorning on 21 Mar 2020

👍2

All 8 comments

@zinji it currently does not but that's an interesting idea, whats the use case you had in mind for this functionality?

svenklemm on 21 Aug 2019

@svenklemm this functionality is highly useful when computing stats on the timeseries data such as mean daily or monthly temperature; dality rainfall totals etc.
usecase: From hourly rainfall totals, i want to compute daily rainfall totals to 9:00 am using gap filling to fill any gaps less that 4hrs and any gaps greater than 4hrs are left null.

zinji on 22 Aug 2019

I concur, we have a very similar use case at hand that would benefit very much from this. We have sensor readings that usually come in every 15 minutes. So we are ok with interpolating 1-2 missing values, but don't want to go further than that. Anything more would be treated as a gap in the data, i.e. NULL

moodgorning on 24 Aug 2019

I think I managed to achieve this through the following query:
`CREATE TABLE timestamps (
ts timestamp,
value numeric
);

INSERT INTO timestamps VALUES
('2015-05-01 12:00:00.0',1),
('2015-05-01 12:15:00.0',2),
('2015-05-01 12:30:00.0',3),
('2015-05-01 12:45:00.0',4),
('2015-05-01 13:00:00.0',5),
('2015-05-01 13:15:00.0',6),
('2015-05-01 13:30:00.0',7),
('2015-05-01 13:45:00.0',8),
('2015-05-01 14:00:00.0',9),
('2015-05-01 14:15:00.0',10),
('2015-05-01 15:00:00.0',7),
('2015-05-01 15:15:00.0',8),
('2015-05-01 15:30:00.0',9),
('2015-05-01 15:45:00.0',10);

WITH interpolated AS (
WITH deltas AS (
SELECT ts,
value,
extract(epoch from ts::timestamp) - lag(extract(epoch from ts), -1) OVER (ORDER BY ts) delta
FROM raw_measurements
ORDER BY ts
)
SELECT time_bucket_gapfill(
'5 minutes', ts,
start => '2015-05-01 11:53:00',
finish => '2015-05-01 17:00:00') AS minutes_5,
interpolate(avg(value)) AS avg_val,
locf(last(delta,ts)) AS locf_delta,
last(delta,ts) AS deltaActual
FROM deltas
GROUP BY minutes_5
)
SELECT * from interpolated WHERE interpolated.locf_delta >= -%s OR NOT deltaActual IS NULL;`
Not the most elegant way and I'm sure theres lots of room for improvement, but the general idea is to use interpolate over the whole dataset and generate a new column with the time to the next measurement which can then be used to drop all interpolations that were over a gap more than x seconds (900 in this example). This will not fill the gap with NULL values, but you could probably achieve that by running another time_bucket_gapfill over the current result.