Hi,
Thanks for your cool project.
I need to do some grouping in a query. The docs suggest using a view (https://docs.hasura.io/1.0/graphql/manual/queries/derived-data.html).
My problem is that I want to filter the rows before they are grouped (using a WHERE).
For example, imagine you have a table of appointments. I want to filter them based on their date (e.g. events after November) and group them by their category.
If I use a view, the grouping happens first and I cannot access the date column afterwards; thus, I cannot filter.
For the moment, I solved this by writing my own function, but then I lose much of the power and simplicity that comes with Hasura (for example, I have to implement the filters myself).
Did I miss something? I cannot stop thinking there must be a more elegant solution I missed.
There are many ways to achieve this via pure SQL views. Please see GROUP BY, HAVING, or subqueries in any SQL reference :)
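As a sketch of what the subquery suggestion could look like with the appointments example (the view name here is made up, and note the filter value is still baked into the view definition):

```sql
-- Filter first in a derived table, then group only the filtered rows.
CREATE VIEW recent_appointments_by_category AS
SELECT category, COUNT(*) AS count
FROM (
  SELECT category
  FROM appointments
  WHERE date > '2017-01-01'   -- filter runs before the grouping
) AS filtered
GROUP BY category;
```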
Hi @tirumaraiselvan, thanks for the answer.
I know how HAVING and subqueries work, but how do you suggest using them?
I will explain my case in more detail.
Following the example from before: imagine you have a table of appointments. I want to filter them based on their date (e.g. events after November) and group them by their category.
```sql
CREATE VIEW appointments_by_category AS
SELECT category, COUNT(*) AS count
FROM appointments
GROUP BY category;
```
Calling it from the Hasura GraphQL endpoint:
```graphql
query MyQuery {
  appointments_by_category {
    category
    count
  }
}
```
But if I wanted to filter by date:
```graphql
query MyQuery {
  appointments_by_category(where: {date: {_gt: "2017-01-01"}}) {
    category
    count
  }
}
```
This will not work since the field "date" is not available in the results.
I could create the view like this:
```sql
CREATE VIEW appointments_by_category AS
SELECT category, COUNT(*) AS count
FROM appointments
WHERE date > '2017-01-01'
GROUP BY category;
```
But I want the filter to be dynamic. Obviously, I don't want to always filter by the same date.
Will a view like the below work?
```sql
CREATE VIEW appointments_by_category_and_date AS
SELECT category, date, COUNT(*) AS count
FROM appointments
GROUP BY category, date;
```
Nope :(
The results are different. I can have lots of groups if I group by date; even after filtering, they would need to be grouped again.
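For completeness, that second grouping is straightforward in plain SQL over the `appointments_by_category_and_date` view (filter by date first, then collapse back to one row per category); the catch is that Hasura's generated query API doesn't expose this re-aggregation. A sketch:

```sql
-- Re-group the per-(category, date) counts into per-category totals
SELECT category, SUM(count) AS count
FROM appointments_by_category_and_date
WHERE date > '2017-01-01'
GROUP BY category;
```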
@hector Right. The only solution I can imagine with "no code" is:
```sql
CREATE VIEW appointments_by_category_and_date AS
SELECT category, date, COUNT(*) AS count
FROM appointments
GROUP BY category, date;
```
And then run an auto-generated aggregate query on it:
```graphql
query {
  appointments_by_category_and_date_aggregate(
    where: {
      category: {_eq: "ABC"},
      date: {_gt: "2017-01-01"}
    }
  ) {
    aggregate {
      count
    }
  }
}
```
Mmm I see your point.
The thing is that when querying, you want the count of every category group; with your solution you only get the count of a single group.
I guess there is no way to do it without functions.
any solution?
@joni7777 if you have the same problem that I do, you will need to implement a SQL function that does the grouping for you. Check these docs: https://hasura.io/docs/1.0/graphql/manual/schema/custom-functions.html#custom-sql-functions
I can imagine Hasura might implement a solution for this but I understand it is not trivial. For starters, the type of data returned when grouping changes for every possible combination.
@hector Yeah, I saw that option and I did try it, but when you want to group the data and summarise it, AND filter by timestamp for specific subsets of the data, it's not possible because the grouped rows don't have the timestamp.
Do you have any idea?
For now I'm grouping in JS instead 😂
Example:

```text
user | views | timestamp
1    | 100   | "2020-03-01 ...."
1    | 200   | "2020-03-02 ...."
1    | 300   | "2020-03-03 ...."
2    | 100   | "2020-03-01 ...."
2    | 250   | "2020-03-02 ...."
3    | 300   | "2020-03-03 ...."
```

Results wanted for timestamp >= "2020-03-01 ...." AND <= "2020-03-02 ....":

```text
1 | 300
2 | 350
```
You need to build a SQL function which accepts the parameters you need. In your case I think it would be the dates, right?
Something like this:
```sql
-- Postgres has no "datetime" type, so the parameters use timestamptz
CREATE FUNCTION users_grouped_function(start_datetime timestamptz, end_datetime timestamptz)
RETURNS SETOF users_grouped AS $$
  SELECT users.id, sum(users.views)
  FROM users
  WHERE start_datetime <= users.timestamp AND end_datetime >= users.timestamp
  GROUP BY users.id;
$$ LANGUAGE sql STABLE;
```
I wrote this on the fly so there might be errors.
An important thing to note is that the return type has to be a table. This is one of the constraints of Hasura (https://hasura.io/docs/1.0/graphql/manual/schema/custom-functions.html#supported-sql-functions).
So you will need to create an empty table (I called it users_grouped in the example) with columns that have the types you want to return in your select.
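That companion table could look something like this (a sketch; the column types must match what the function's SELECT returns, assuming `views` is an integer column, so `sum(views)` yields `bigint`):

```sql
-- Empty table that only exists to give the function a return type;
-- track it in Hasura, then track users_grouped_function against it
CREATE TABLE users_grouped (
  id integer,   -- matches users.id
  views bigint  -- matches sum(users.views)
);
```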
Yeah, just saw it and used it; it's a good enough solution, but I really wish there were built-in group-by aggregations and joins with remote schemas 🙌🏻
Our app is heavy on data visualization, so grouping and filtering are super important, and JavaScript filtering is definitely not an option. We made a Postgres function that takes arguments to adjust the `GROUP BY` and `WHERE`, plus a matching Hasura-tracked table that has the same datatype columns as the output of the function. The extra special bit is that the `where` function argument is a JSON type that is interpreted with the same shape as the Hasura `where` in GraphQL queries. This allows us to use all the same front-end tooling for data filtering without redoing all the `where` generation logic.
With this setup we can also still use all of Hasura's relationship logic. Does Hasura rock or what??
To find all the work log entries that employees did on a job for a particular customer, grouped by whatever we like, and to graph the results without JS madness, we can do this:
```graphql
query {
  grouped_work_entry_aggregate(
    args: {
      where: {userId: {_in: ["id1", "id2"]}}
      groupByDay: false
      groupByUser: true
      groupByCustomer: false
    }
    order_by: {day: desc}
  ) {
    aggregate {
      count
    }
    nodes {
      duration
      avgDuration
      user { displayName }
      day
      customer {
        name
      }
    }
  }
}
```
SQL function:
Here you can see the burrowing into the JSON `where` argument to find values in a structure identical to Hasura's `where`.
```sql
DROP FUNCTION IF EXISTS public.grouped_work_entry;
CREATE FUNCTION public.grouped_work_entry(
  "where" json default null,
  "groupByDay" boolean default false,
  "groupByWeek" boolean default false,
  "groupByMonth" boolean default false,
  "groupByJob" boolean default false,
  "groupByUser" boolean default false,
  "groupByCustomer" boolean default false
)
-- Must return a Hasura-tracked table of matching shape/type
RETURNS SETOF work_entry_generic_group AS $$
SELECT
  -- Optional grouping setup
  -- Sets a field to NULL if the grouping flag arg was not set
  CASE WHEN "groupByCustomer" THEN "job"."customerId" ELSE null END as "gCustomer",
  CASE WHEN "groupByDay" THEN date_trunc('day', "work_entry"."start") ELSE null END as "gDay",
  CASE WHEN "groupByWeek" THEN date_trunc('week', "work_entry"."start") ELSE null END as "gWeek",
  CASE WHEN "groupByMonth" THEN date_trunc('month', "work_entry"."start") ELSE null END as "gMonth",
  CASE WHEN "groupByUser" THEN "work_entry"."userId" ELSE null END as "gUser",
  CASE WHEN "groupByJob" THEN "work_entry"."jobId" ELSE null END as "gJob",
  -- Counts
  COUNT(DISTINCT "work_entry"."userId"),
  COUNT(DISTINCT "work_entry"."id"),
  COUNT(DISTINCT "work_entry"."jobId"),
  COUNT(DISTINCT "job"."customerId"),
  -- Work entry duration stats
  SUM("work_entry"."duration"),
  AVG("work_entry"."duration"),
  MIN("work_entry"."duration"),
  MAX("work_entry"."duration"),
  -- More work entry time stats
  MIN("work_entry"."start"),
  MAX("work_entry"."end"),
  MIN("work_entry"."startTime"),
  MAX("work_entry"."endTime")
FROM "work_entry"
INNER JOIN "job" ON "work_entry"."jobId" = "job"."id"
WHERE
  -- Customer
  ("where"#>>'{job,customerId,_eq}' IS NULL OR ("job"."customerId" = "where"#>>'{job,customerId,_eq}'))
  AND ("where"#>>'{job,customerId,_in}' IS NULL OR ((("where"#>'{job,customerId,_in}')::jsonb) ? "job"."customerId"))
  -- User
  AND ("where"#>>'{userId,_eq}' IS NULL OR ("work_entry"."userId" = "where"#>>'{userId,_eq}'))
  AND ("where"#>>'{userId,_in}' IS NULL OR ((("where"#>'{userId,_in}')::jsonb) ? "work_entry"."userId"))
  -- Job
  AND ("where"#>>'{jobId,_eq}' IS NULL OR ("work_entry"."jobId" = "where"#>>'{jobId,_eq}'))
  AND ("where"#>>'{jobId,_in}' IS NULL OR ((("where"#>'{jobId,_in}')::jsonb) ? "work_entry"."jobId"))
  -- Times
  AND ("where"#>>'{start,_lte}' IS NULL OR ("work_entry"."start" <= ("where"#>>'{start,_lte}')::timestamp with time zone))
  AND ("where"#>>'{start,_gte}' IS NULL OR ("work_entry"."start" >= ("where"#>>'{start,_gte}')::timestamp with time zone))
GROUP BY
  "gDay",
  "gWeek",
  "gMonth",
  "gJob",
  "gCustomer",
  "gUser"
$$ LANGUAGE sql STABLE;
```
And the matching Hasura-tracked table:
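(The screenshot of the table definition is not reproduced here. A rough sketch of what the tracked table would have to look like, with column names inferred from the GraphQL query above and types guessed from the function's SELECT list; treat every name and type below as an assumption:)

```sql
-- Hypothetical shape of the tracked return table; the column order
-- and types must line up with the function's SELECT list.
CREATE TABLE work_entry_generic_group (
  "customerId"    text,         -- gCustomer
  "day"           timestamptz,  -- gDay
  "week"          timestamptz,  -- gWeek
  "month"         timestamptz,  -- gMonth
  "userId"        text,         -- gUser
  "jobId"         text,         -- gJob
  "userCount"     bigint,
  "entryCount"    bigint,
  "jobCount"      bigint,
  "customerCount" bigint,
  "duration"      interval,     -- SUM of work_entry.duration
  "avgDuration"   interval,
  "minDuration"   interval,
  "maxDuration"   interval,
  "minStart"      timestamptz,
  "maxEnd"        timestamptz,
  "minStartTime"  time,
  "maxEndTime"    time
);
```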

@nolandg I like your solution and I made something very similar :)
Still, I believe this ability to have "group bys" in Hasura is very valuable and would be much better than the custom function solution.
@nolandg How "generic" do you think that custom function is?
I'm wondering whether it would be possible to write a JS/TS script that uses introspection to auto-generate at least mostly functional copies of these group by functions per-table. Seems like it could be neat and fairly useful.
You think it might map cleanly from table + relationship definition to SQL function templates?
@GavinRay97 The function I showed above is not generic at all but yes, if we had more tables we needed that grouping on, we would definitely make some sort of code generator to produce the SQL.
I think if you defined a limited set of first-level fields/values/aggregations you wanted, it would not be hard to map them cleanly. Making it completely generic and covering all cases would produce a huge function of questionable performance and utility, I fear. What might work is describing the shape of aggregations and conditions you want and then making a parser to produce that function.
You would also need to handle the Hasura side and make a generator to create (and run?) the migrations required to create the Hasura-tracked table that mirrors the function--the function's output table type--with all the relations you want on it. Again, it's infeasible to make it completely generic and cover all cases, especially when it comes to relationships. You'd need to describe the limits you want.
In a perfect world, Hasura would do all this for us :-)
I would also love to have this feature built in. Adding functions to the database is not ideal, but is a workaround we will use for now.
Just to give another use case which would greatly benefit from server side group by: charts. Charts usually require aggregations.
Taking https://github.com/hasura/graphql2chartjs (https://www.youtube.com/watch?v=153iv1-qFuc) as an example, a bar chart showing how many albums an artist has. A view is used to count the nb of albums per artist.
If we go a bit beyond this simple example and imagine a more complex page, we would like to filter artists by time/genre/etc.
I understand that arbitrary group by will make hasura query performance highly unpredictable, but for me the discussion should be limited to medium datasets, < 1mil rows. For these Postgres can do arbitrary filtering and grouping in 1-2 seconds. Current solutions:
For extremely large data sets even filtering needs to be done on indexed fields and responsive aggregate queries usually requires pre-aggregation, which actually reduces the problem to medium datasets.
Also subscriptions with grouping might be postponed IMHO, if problematic to implement in the first iteration.
How does this work? You didn't select the id column of the matched table in your code example.
It would be helpful if you could post a gist with the complete code, then.
@revskill10 Not sure I understand what you mean by "select the column id". This thread is talking about grouping, and the code I posted groups rows by certain fields. What id would you select? You would need to use an aggregation function on the id column of the source table, things like sum or avg, which don't make any sense for id columns. The rows returned by databases from GROUP BY and similar operations cannot be identified by the original row ids, since they are grouped and aggregated.
If you need to uniquely identify the returned rows, then you can generate an id for each row in the above SQL. If non-deterministic ids are ok for your application, you can simply use a SQL UUID function. If you need deterministic ids for the grouped rows, then you could create a hash of your determining columns right in the SQL above. We eventually required deterministic ids because we started using subscriptions on this grouped table: if subscriptions don't return deterministic ids, it wreaks havoc with your Apollo cache and the UI continually refreshes. This is what we are doing now for generating the id column:
```sql
CONCAT(
  min("work_entry"."id"::text), '_',
  min("work_entry"."userId"), '_',
  min("work_entry"."jobId"), '_',
  min("work_entry"."taskId"), '_',
  min("work_entry"."start"), '_',
  CASE WHEN "groupByDay" THEN date_trunc('day', "work_entry"."start")::date ELSE null END, '_',
  CASE WHEN "groupByWeek" THEN date_trunc('week', "work_entry"."start")::date ELSE null END, '_',
  CASE WHEN "groupByMonth" THEN date_trunc('month', "work_entry"."start")::date ELSE null END
),
```
It combines all the fields our application deems as determining the identity of a unique grouped row.
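If the concatenated string gets unwieldy, the same determining fields can be collapsed into a fixed-length value with Postgres's built-in `md5` function (a sketch over the same fields as above, aliased as `id` for illustration):

```sql
-- Hash the determining fields into a fixed-length, deterministic id
md5(CONCAT(
  min("work_entry"."id"::text), '_',
  min("work_entry"."userId"), '_',
  min("work_entry"."jobId"), '_',
  min("work_entry"."taskId"), '_',
  min("work_entry"."start"), '_',
  CASE WHEN "groupByDay" THEN date_trunc('day', "work_entry"."start")::date ELSE null END
)) AS "id",
```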