I would like to zoom in Grafana from several months timespans to seconds and back with data sampled several times per second.
Given a table with raw data I can create one or several rollup tables to speed things up, but the query should fetch data from the correct table based on the timespan.
Grafana provides variable $__interval_ms that we can use in the query:
...
FROM (
SELECT ...
FROM data_raw
WHERE $__interval_ms < use_rollup_threshold
UNION
SELECT ...
FROM data_rolledup
WHERE $__interval_ms >= use_rollup_threshold
)
This works, but grows unwieldy.
How about infastructure that would allow you to specify a rollup table family:
table_name| bucket_size
=======================
foo | NULL
foo_1min | 60000
foo_1hour | 3600000
foo_1day | 86400000
foo_1week | 604800000
and a way to target it in a query
...
FROM
rollup_table('foo', $__interval_ms)
and it would automatically select the appropriate table to query.
See also #350.
@tkurki I agree that this type of functionality would be particularly useful, and we hope to offer it sometime in the next several releases. Thanks!
Hi,
Any progress on this?
It is possible to use a Grafana query variable, refreshed On Time Range Change, that calls a PL/pgSQL function to automatically determine the best aggregate view to query. This seems like a fairly effective workaround for this issue (and is IMHO better than alternative suggested in both this blog post and this video using UNION ALL) but still has at least one drawback (increased latency) compared to how @tkurki framed the feature request.
Inspiration for this workaround was found in the comments on a Grafana issue relating to InfluxDB https://github.com/grafana/grafana/issues/4262
To avoid maintaining a mapping from bucket sizes to view names, this examples code encodes the aggregate view's bucket sizes in the name. All aggregate views are postfixed with an underscore, a quantity (a number) and a multiplier (the letters: s, m, h, d) representing the bucket width. A view with rides aggregated in 1 hour buckets (hourly) would be called _rides_1h_. A view with rides aggragated every 10 minutes would be called _rides_10m_. The postfix should basically agree with the bucket_width argument to the time_bucket function.
The naming scheme can be altered by modifying the regular expressions in the example code.
The function get_aggregate_view_for_interval_ms can be called from Grafana to determine the correct view name for a given interval (i.e. the interval defined in the Grafana variable $__interval_ms).
Example:
​ SELECT get_aggregate_view_for_interval_ms('rides', $__interval_ms);
Arguments:
name the base name of the aggregate viewsinterval_ms the desired bucket width in ms, e.g. Grafana's $__interval_ms variableCREATE OR REPLACE FUNCTION public.get_aggregate_view_for_interval_ms(name information_schema.sql_identifier, interval_ms bigint)
RETURNS information_schema.sql_identifier
LANGUAGE plpgsql
STABLE
AS $function$
DECLARE
result character varying := (SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
AND table_type = 'VIEW'
AND table_name ~ (name || '_\d+[smhd]')
AND interval_ms >= get_bucket_width_ms(table_name)
ORDER BY get_bucket_width_ms(table_name) DESC
LIMIT 1);
BEGIN
-- If a aggragate view was found that has an appropriate
-- bucket width for this interval, return it
IF result IS NOT NULL THEN
RETURN result;
END IF;
-- Otherwise return the view with the smallest bucket width
RETURN (SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
AND table_type = 'VIEW'
AND table_name ~ (name || '_\d+[smhd]')
ORDER BY get_bucket_width_ms(table_name)
LIMIT 1);
END; $function$;
get_bucket_width_ms is used by the above function to extract the bucket width (in milliseconds) from a view name.
CREATE OR REPLACE FUNCTION public.get_bucket_width_ms(name information_schema.sql_identifier)
RETURNS bigint
LANGUAGE plpgsql
IMMUTABLE
AS $function$
DECLARE
quantity integer := SUBSTRING(name FROM '(\d+)[smhd]$');
unit char := SUBSTRING(name FROM '\d+([smhd])$');
mult bigint := CASE
WHEN unit = 's' THEN 1
WHEN unit = 'm' THEN 1 * 60
WHEN unit = 'h' THEN 1 * 60 * 60
WHEN unit = 'd' THEN 1 * 60 * 60 * 24
END;
BEGIN
RETURN quantity * mult * 1000;
END; $function$;
Assuming you have the ride data from the tutorial, create the same views as in the blog post (but using the naming scheme described above):
CREATE VIEW rides_1d
WITH (timescaledb.continuous, timescaledb.refresh_interval = '1 day')
AS
SELECT time_bucket('1 day', pickup_datetime), COUNT(*) AS ride_count
FROM rides
GROUP BY 1;
CREATE VIEW rides_1h
WITH (timescaledb.continuous, timescaledb.refresh_interval = '1 hour')
AS
SELECT time_bucket('1 hour', pickup_datetime), COUNT(*) AS ride_count
FROM rides
GROUP BY 1;
CREATE VIEW rides_10m
WITH (timescaledb.continuous, timescaledb.refresh_interval = '10 minutes')
AS
SELECT time_bucket('10 minutes', pickup_datetime), COUNT(*) AS ride_count
FROM rides
GROUP BY 1;
The get_aggregate_view_for_interval_ms can now be used to select the view name most appropriate for a given interval. E.g.: with an interval of one minute, 59 minutes, one hour, two hours, and one day:
nyc_data=# SELECT get_aggregate_view_for_interval_ms('rides', 1 * 60 * 1000);
-[ RECORD 1 ]----------------------+----------
get_aggregate_view_for_interval_ms | rides_10m
nyc_data=# SELECT get_aggregate_view_for_interval_ms('rides', 59 * 60 * 1000);
-[ RECORD 1 ]----------------------+----------
get_aggregate_view_for_interval_ms | rides_10m
nyc_data=# SELECT get_aggregate_view_for_interval_ms('rides', 60 * 60 * 1000);
-[ RECORD 1 ]-----------------+---------
aggregate_view_for_resolution | rides_1h
nyc_data=# SELECT get_aggregate_view_for_interval_ms('rides', 2 * 60 * 60 * 1000);
-[ RECORD 1 ]-----------------+---------
aggregate_view_for_resolution | rides_1h
nyc_data=# SELECT get_aggregate_view_for_interval_ms('rides', 24 * 60 * 60 * 1000);
-[ RECORD 1 ]-----------------+---------
aggregate_view_for_resolution | rides_1d
To use the function to select the correct view in Grafana:
Create the above functions
Ensure that the continuous aggregates are named as above and that the user Grafana is using has at least the SELECT privilege on the continuous aggregates, e.g.:
GRANT SELECT ON rides_1d TO grafana;
GRANT SELECT ON rides_1h TO grafana;
GRANT SELECT ON rides_10m TO grafana;
(Views that the user does not have permissions to access will not be returned by get_aggregate_view_for_interval_ms)
Then, to make a template variable to use in queries:
rides_view_nameSELECT get_aggregate_view_for_interval_ms('rides', $__interval_ms);Finally, update the panel to use the new template variable:
Edit the panel
Use the template variable (e.g. $rides_view_name) in the From field in the query builder, or edit the SQL and use the template variable in the raw SQL, e.g.:
SELECT
time_bucket AS "time",
ride_count AS "ride_count"
FROM $rides_view_name
WHERE
$__timeFilter(time_bucket)
ORDER BY 1
Note: for the example 'rides' dataset there really isn't enough data for the rides_1d aggregate view to be very useful given the intervals that Grafana (by default) requests. The rides_1d is only used when zoomed out to a time range of around 5 years or more.
The major issue with this approach is that it introduces some latency for each template variable set up to refresh _On Time Range Change_ (as these are performed before any queries for actual data are issued). This problem will be compounded for dashboards that use many different continuous aggregates and therefore require many template variables that refresh _On Time Range Change_ (Grafana might parallelise these requests, but I haven't checked if this is the case).
When using the query builder, if you change the From field in an existing panel to use the template variable, the query builder resets (even though the value of the template variable is the same as the view that is being replaced).
The example code here assumes that continuous aggregates at all resolutions contain the full dataset. If data is dropped out of (for example) high resolution continuous aggregates then the get_aggregate_view_for_interval_ms would need to grow the ability to ignore views that don't contain data for the currently viewed time period.
Most helpful comment
TLDR
It is possible to use a Grafana query variable, refreshed On Time Range Change, that calls a PL/pgSQL function to automatically determine the best aggregate view to query. This seems like a fairly effective workaround for this issue (and is IMHO better than alternative suggested in both this blog post and this video using
UNION ALL) but still has at least one drawback (increased latency) compared to how @tkurki framed the feature request.Inspiration for this workaround was found in the comments on a Grafana issue relating to InfluxDB https://github.com/grafana/grafana/issues/4262
View and Table Names
To avoid maintaining a mapping from bucket sizes to view names, this examples code encodes the aggregate view's bucket sizes in the name. All aggregate views are postfixed with an underscore, a quantity (a number) and a multiplier (the letters: s, m, h, d) representing the bucket width. A view with rides aggregated in 1 hour buckets (hourly) would be called _rides_1h_. A view with rides aggragated every 10 minutes would be called _rides_10m_. The postfix should basically agree with the
bucket_widthargument to thetime_bucketfunction.The naming scheme can be altered by modifying the regular expressions in the example code.
PL/pgSQL functions
The function
get_aggregate_view_for_interval_mscan be called from Grafana to determine the correct view name for a given interval (i.e. the interval defined in the Grafana variable$__interval_ms).Example:
​
SELECT get_aggregate_view_for_interval_ms('rides', $__interval_ms);Arguments:
namethe base name of the aggregate viewsinterval_msthe desired bucket width in ms, e.g. Grafana's$__interval_msvariableget_bucket_width_msis used by the above function to extract the bucket width (in milliseconds) from a view name.Example
Assuming you have the ride data from the tutorial, create the same views as in the blog post (but using the naming scheme described above):
The
get_aggregate_view_for_interval_mscan now be used to select the view name most appropriate for a given interval. E.g.: with anintervalof one minute, 59 minutes, one hour, two hours, and one day:Grafana
To use the function to select the correct view in Grafana:
Create the above functions
Ensure that the continuous aggregates are named as above and that the user Grafana is using has at least the SELECT privilege on the continuous aggregates, e.g.:
(Views that the user does not have permissions to access will not be returned by
get_aggregate_view_for_interval_ms)Then, to make a template variable to use in queries:
rides_view_nameSELECT get_aggregate_view_for_interval_ms('rides', $__interval_ms);Finally, update the panel to use the new template variable:
Edit the panel
Use the template variable (e.g.
$rides_view_name) in the From field in the query builder, or edit the SQL and use the template variable in the raw SQL, e.g.:Note: for the example 'rides' dataset there really isn't enough data for the rides_1d aggregate view to be very useful given the intervals that Grafana (by default) requests. The rides_1d is only used when zoomed out to a time range of around 5 years or more.
Drawbacks
The major issue with this approach is that it introduces some latency for each template variable set up to refresh _On Time Range Change_ (as these are performed before any queries for actual data are issued). This problem will be compounded for dashboards that use many different continuous aggregates and therefore require many template variables that refresh _On Time Range Change_ (Grafana might parallelise these requests, but I haven't checked if this is the case).
When using the query builder, if you change the From field in an existing panel to use the template variable, the query builder resets (even though the value of the template variable is the same as the view that is being replaced).
The example code here assumes that continuous aggregates at all resolutions contain the full dataset. If data is dropped out of (for example) high resolution continuous aggregates then the
get_aggregate_view_for_interval_mswould need to grow the ability to ignore views that don't contain data for the currently viewed time period.