Timescaledb: continuous aggregate failed to run

Created on 9 Jan 2020 · 8Comments · Source: timescale/timescaledb

Relevant system information:

OS: (Alpine 8.3.0) 8.3.0, 64-bit
PostgreSQL version: PostgreSQL 11.5
TimescaleDB version:1.5.1
Installation method: "using Docker", image: timescale/timescaledb:latest-pg11

Describe the bug
A clear and concise description of what the bug is.

To Reproduce

CREATE TABLE conditions (
 time        TIMESTAMPTZ       NOT NULL,
 location    TEXT              NOT NULL,
 temperature DOUBLE PRECISION  NULL,
 humidity    DOUBLE PRECISION  NULL
);

SELECT create_hypertable('conditions', 'time', chunk_time_interval => INTERVAL '1 hour');

INSERT INTO conditions(time, location, temperature, humidity)
  VALUES (NOW(), 'office', 70.0, 50.0);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES (NOW(), 'office', 70.0, 50.0);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES (NOW(), 'office', 70.0, 50.0);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES (NOW(), 'office', 70.0, 50.0);

create view cagg_conditions_view with
(timescaledb.continuous, timescaledb.refresh_interval = '5m')
AS
select location,time_bucket('30m', time) as half_hour,avg(temperature) AS avg_temp,avg(humidity) AS avg_humi from conditions group by  location ,half_hour;

REFRESH MATERIALIZED VIEW cagg_conditions_view;

Expected behavior

cagg_conditions_view can have aggregated data

Actual behavior

my_cagg=# REFRESH MATERIALIZED VIEW cagg_conditions_view;
INFO:  new materialization range not found for public.conditions (time column time): not enough new data past completion threshold (1578542400000000)
INFO:  materializing continuous aggregate public.cagg_conditions_view: no new range to materialize
INFO:  materializing continuous aggregate public.cagg_conditions_view: no new range to materialize or invalidations found, exiting early
REFRESH MATERIALIZED VIEW
my_cagg=# select * from cagg_conditions_view ;
 location | half_hour | avg_temp | avg_humi
----------+-----------+----------+----------
(0 rows)

Screenshots

conttimescaledb_information.continuous_aggregate_stats
Additional context
Add any other context about the problem here.

question

Source

jianhaiqing

Most helpful comment

no matter what time is it currently, only the time of the latest record inserted, isn't it ?

Yes, it uses the maximum time value in the data. This might be changed in future, to use now instead.

k-rus on 9 Jan 2020

🎉1 👍1

All 8 comments

Thank you for the detailed description.
As I see you don't provide refresh_lag parameters, which is twice the bucket width by default, see the doc of creating the view. Thus the refresh doesn't find the data to materialise, since the inserted data is within the lag range. See the doc of refresh.
Can you insert data that they are older than the lag, or change the refresh_lag? (note that setting refresh_lag too small or negative might affect performance during high load.)
Let us know if you are able to get it to work.

k-rus on 9 Jan 2020

👍1

Good shot. I get it. (I get a timer to generate the data for 2 hours, then i figure out the usage of refresh_lag and bucket_width).
Can you explain why we need timescaledb.max_interval_per_job parameter ? when should i take care of this parameter?
And after refresh_interval, what timescaledb would do, as refresh_interval is smaller than the bucket_width, would it aggregate the refresh_interval data? where would it save the intermediate result ?

jianhaiqing on 9 Jan 2020

Can you explain why we need timescaledb.max_interval_per_job parameter ? when should i take care of this parameter?

This parameter is used to manage how much data will be materialised, since if materialisation goes for entire data set, it might take too much resources and affect production. So it is good to set this parameter to be not too big and not too small (if small, then it might not catch up with materialising data). The parameter might be better explained in this tutorial.
Let me know if you have more questions, or I can close the issue.

k-rus on 9 Jan 2020

👍1

One more question:

Why the data from from 15:30 ~ 16:49 didn't be aggregated.
select * from cagg_conditions_view order by half_hour desc limit 10;
raw data & aggregated view
continuous stat

what'more, i run cagg manually and i get the follow logs

my_cagg=# REFRESH MATERIALIZED VIEW cagg_conditions_view;
INFO:  new materialization range not found for public.conditions (time column time): not enough new data past completion threshold (1578556800000000) 【2020-01-09 16:00:00】
INFO:  materializing continuous aggregate public.cagg_conditions_view: no new range to materialize
INFO:  materializing continuous aggregate public.cagg_conditions_view: no new range to materialize or invalidations found, exiting early
REFRESH MATERIALIZED VIEW

jianhaiqing on 9 Jan 2020

I would pay attention that the data needs to lag behind by bucket_width + refresh_lag, which will be 1 hour in your case if I don't mistake.

k-rus on 9 Jan 2020

yes, bucket_width + refresh_lag=1h
data from from 15:30 ~ 16:49 didn't be aggregated. So non-aggregated data will more than the value(1hour), and no matter what time is it currently, only the time of the latest record inserted, isn't it ?

jianhaiqing on 9 Jan 2020