Does the gap filling functions have option to limit the size of gaps being filled by setting maximum length of gap in number of time steps?
@zinji it currently does not but that's an interesting idea, whats the use case you had in mind for this functionality?
@svenklemm this functionality is highly useful when computing stats on the timeseries data such as mean daily or monthly temperature; dality rainfall totals etc.
usecase: From hourly rainfall totals, i want to compute daily rainfall totals to 9:00 am using gap filling to fill any gaps less that 4hrs and any gaps greater than 4hrs are left null.
I concur, we have a very similar use case at hand that would benefit very much from this. We have sensor readings that usually come in every 15 minutes. So we are ok with interpolating 1-2 missing values, but don't want to go further than that. Anything more would be treated as a gap in the data, i.e. NULL
I think I managed to achieve this through the following query:
`CREATE TABLE timestamps (
ts timestamp,
value numeric
);
INSERT INTO timestamps VALUES
('2015-05-01 12:00:00.0',1),
('2015-05-01 12:15:00.0',2),
('2015-05-01 12:30:00.0',3),
('2015-05-01 12:45:00.0',4),
('2015-05-01 13:00:00.0',5),
('2015-05-01 13:15:00.0',6),
('2015-05-01 13:30:00.0',7),
('2015-05-01 13:45:00.0',8),
('2015-05-01 14:00:00.0',9),
('2015-05-01 14:15:00.0',10),
('2015-05-01 15:00:00.0',7),
('2015-05-01 15:15:00.0',8),
('2015-05-01 15:30:00.0',9),
('2015-05-01 15:45:00.0',10);
WITH interpolated AS (
WITH deltas AS (
SELECT ts,
value,
extract(epoch from ts::timestamp) - lag(extract(epoch from ts), -1) OVER (ORDER BY ts) delta
FROM raw_measurements
ORDER BY ts
)
SELECT time_bucket_gapfill(
'5 minutes', ts,
start => '2015-05-01 11:53:00',
finish => '2015-05-01 17:00:00') AS minutes_5,
interpolate(avg(value)) AS avg_val,
locf(last(delta,ts)) AS locf_delta,
last(delta,ts) AS deltaActual
FROM deltas
GROUP BY minutes_5
)
SELECT * from interpolated WHERE interpolated.locf_delta >= -%s OR NOT deltaActual IS NULL;`
Not the most elegant way and I'm sure theres lots of room for improvement, but the general idea is to use interpolate over the whole dataset and generate a new column with the time to the next measurement which can then be used to drop all interpolations that were over a gap more than x seconds (900 in this example). This will not fill the gap with NULL values, but you could probably achieve that by running another time_bucket_gapfill over the current result.
@zinji ok so you would want gapfill to still produce the row with a timestamp but the locf/interpolate column would have NULL values
@svenklemm Yes, the query output should output the applicable timesteps with null values depending on the time bucket used.
Any update on this? This would be a really useful feature to have.
I'm looking for this as well 馃ズ
Most helpful comment
Any update on this? This would be a really useful feature to have.