Prisma1: Group By and Aggregated Values

Created on 21 Nov 2017 · 43 Comments · Source: prisma/prisma1

#70 was a wide-ranging discussion of how to support GroupBy and aggregations in a type-safe GraphQL API. This issue takes the learnings from previous discussions and provides a final API proposal.

Throughout this proposal the examples will be based on this data schema:

type User {
  id: ID! @unique
  name: String!
  age: Int!
  salaryBracket: String!
  city: String!
}

Note: According to #353 we will introduce a new API version that combines the capabilities of the Simple and Relay API. The API is not final yet, but there will be a relay-style connection field for all relations, providing us a convenient place to introduce aggregation fields.

Retrieving all users who live in Aarhus:

{
  allUsersConnection(where: {city: "Aarhus"}) {
    edges {
      node { id, name }
    }
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    edges: [
      { node: { id: "1", name: "Søren" } },
      { node: { id: "2", name: "Tim" } }
    ]
  }
}

Aggregations

Aggregate functions

avg
median
max
min
count
sum
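
As a rough illustration of what each function computes, here is a minimal Python sketch over the ages from the example data. It uses the Python standard library rather than Prisma; the `ages` list and the `aggregates` dictionary are names invented for this example and are not part of the proposed API.

```python
from statistics import mean, median

# Ages of the three users in the example data (Søren, Tim, Nilan).
ages = [23, 43, 99]

# One entry per aggregate function listed above.
aggregates = {
    "avg": mean(ages),
    "median": median(ages),
    "max": max(ages),
    "min": min(ages),
    "count": len(ages),
    "sum": sum(ages),
}

for name, value in aggregates.items():
    print(name, value)
```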

API

Getting the average age of people living in Aarhus is accomplished like this in SQL:

SELECT AVG(age) FROM User WHERE city = 'Aarhus'

With Prisma it would look like this:

{
  allUsersConnection(where: {city: "Aarhus"}) {
    aggregate {
      avg {
        age
      }
    }
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    aggregate: {
      avg: {
        age: 33
      }
    }
  }
}
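
To make the expected result easy to check, the same filter-then-average step can be emulated in plain Python. This is only a sketch of the semantics, not how Prisma would execute the query. The helper `average_age` is hypothetical, and returning `None` for an empty match is an assumption, since the proposal does not say what an aggregate over zero rows yields.

```python
from statistics import mean

# Example data from the proposal.
users = [
    {"id": "1", "name": "Søren", "age": 23, "salaryBracket": "0-5", "city": "Aarhus"},
    {"id": "2", "name": "Tim", "age": 43, "salaryBracket": "50-80", "city": "Aarhus"},
    {"id": "3", "name": "Nilan", "age": 99, "salaryBracket": "0-5", "city": "Magdeburg"},
]


def average_age(rows, city):
    """Emulate `where: {city: ...}` followed by `aggregate { avg { age } }`."""
    ages = [row["age"] for row in rows if row["city"] == city]
    # Assumption: an empty selection yields None instead of raising an error.
    return mean(ages) if ages else None


# Shape the result like the proposed response.
result = {
    "allUsersConnection": {
        "aggregate": {"avg": {"age": average_age(users, "Aarhus")}}
    }
}
print(result)
```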

Limiting the scope of aggregations

The normal where, skip, first and orderBy arguments can be used to limit the scope of data included in the aggregations:

{
  allUsersConnection(where: {city: "Aarhus"}, first: 5, orderBy: AGE_DESC) {
    aggregate {
      avg {
        age
      }
    }
  }
}

This will return the average age of the 5 oldest people in Aarhus.

See example return value

Data:

[
  {id: "1", name: "Søren", age: 99, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 99, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Aarhus"},
  {id: "4", name: "Johannes", age: 99, salaryBracket: "0-5", city: "Aarhus"},
  {id: "5", name: "Mathias", age: 99, salaryBracket: "50-80", city: "Aarhus"},
  {id: "6", name: "Marcus", age: 5, salaryBracket: "0-5", city: "Aarhus"}
]

Return value:

{
  allUsersConnection: {
    aggregate: {
      avg: {
        age: 99
      }
    }
  }
}
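
The order in which the arguments apply matters here: the rows are filtered, then ordered, then cut to the first N, and only then aggregated. The following Python sketch spells out that assumed order. The function `scoped_average` and its parameters are hypothetical names for this illustration, not part of the proposed API.

```python
from statistics import mean

# Example data from the proposal, reduced to the fields used here.
users = [
    {"name": "Søren", "age": 99, "city": "Aarhus"},
    {"name": "Tim", "age": 99, "city": "Aarhus"},
    {"name": "Nilan", "age": 99, "city": "Aarhus"},
    {"name": "Johannes", "age": 99, "city": "Aarhus"},
    {"name": "Mathias", "age": 99, "city": "Aarhus"},
    {"name": "Marcus", "age": 5, "city": "Aarhus"},
]


def scoped_average(rows, city, first):
    """Average age over the `first` oldest users in `city`."""
    matching = [row for row in rows if row["city"] == city]  # where
    ordered = sorted(matching, key=lambda row: row["age"], reverse=True)  # orderBy: AGE_DESC
    window = ordered[:first]  # first
    return mean([row["age"] for row in window])  # aggregate { avg { age } }


print(scoped_average(users, "Aarhus", 5))
```

With `first` set to 5, Marcus falls outside the window, so his age does not pull the average down.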

Larger example

Combining aggregations and data retrieval:

{
  allUsersConnection(where: {city: "Aarhus"}) {
    aggregate {
      avg {
        age
      }
      max {
        age
      }
    }
    edges {
      node { name, age }
    }
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    aggregate: {
      avg: {
        age: 33
      }
      max: {
        age: 43
      }
    }
    edges: [
      { node: { name: "Søren", age: 23 } },
      { node: { name: "Tim", age: 43 } }
    ] 
  }
}

Group

In relational databases, GROUP BY is most often used together with aggregation functions, like this: SELECT city, AVG(age) FROM User GROUP BY city

Because GraphQL returns tree structured data, it is quite compelling to use groupBy without aggregation functions:

{
  allUsersConnection {
    groupBy {
      city {
        key
        connection {
          edges {
            node { id, name }
          }
        }
      }
    }    
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    groupBy: {
      city: [
        {
          key: "Aarhus"
          connection: {
            edges: [
              { node: { id: "1", name: "Søren" } },
              { node: { id: "2", name: "Tim" } }
            ]
          }
        },
        {
          key: "Magdeburg"
          connection: {
            edges: [
              { node: { id: "3", name: "Nilan" } }
            ]
          }
        }
      ]
    }    
  }
}

Or even in multiple levels:

{
  allUsersConnection {
    groupBy {
      city {
        key
        connection {
          groupBy {
            salaryBracket {
              key
              connection {
                edges {
                  node { id, name }
                }
              }
            }
          }
        }
      }
    }    
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"},
  {id: "4", name: "Dom", age: 99, salaryBracket: "50-80", city: "Aarhus"}
]

Return value:

{
  allUsersConnection: {
    groupBy: {
      city: [
        {
          key: "Aarhus"
          connection: {
            groupBy: {
              salaryBracket: [
                {
                  key: "0-5"
                  connection: {
                    edges: [
                      { node: { id: "1", name: "Søren" } }
                    ]
                  }
                },
                {
                  key: "50-80"
                  connection: {
                    edges: [
                      { node: { id: "2", name: "Tim" } },
                      { node: { id: "4", name: "Dom" } }
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          key: "Magdeburg"
          connection: {
            groupBy: {
              salaryBracket: [
                {
                  key: "0-5"
                  connection: {
                    edges: [
                      { node: { id: "3", name: "Nilan" } }
                    ]
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
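
Multi-level grouping is naturally recursive: each group holds a connection, and that connection can be grouped again. The Python sketch below builds the same tree from the example data. The function `group_by` is a hypothetical helper, and listing groups in order of first appearance is an assumption, since the proposal does not define group ordering.

```python
from collections import defaultdict

# Example data from the proposal, reduced to the fields used here.
users = [
    {"id": "1", "name": "Søren", "salaryBracket": "0-5", "city": "Aarhus"},
    {"id": "2", "name": "Tim", "salaryBracket": "50-80", "city": "Aarhus"},
    {"id": "3", "name": "Nilan", "salaryBracket": "0-5", "city": "Magdeburg"},
    {"id": "4", "name": "Dom", "salaryBracket": "50-80", "city": "Aarhus"},
]


def group_by(rows, fields):
    """Group rows by fields[0], then recursively by the remaining fields."""
    if not fields:
        # Innermost level: return the nodes themselves.
        return {"edges": [{"node": {"id": r["id"], "name": r["name"]}} for r in rows]}
    field, rest = fields[0], fields[1:]
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[field]].append(row)
    # Assumption: groups appear in the order their key is first seen.
    groups = [
        {"key": key, "connection": group_by(members, rest)}
        for key, members in buckets.items()
    ]
    return {"groupBy": {field: groups}}


tree = {"allUsersConnection": group_by(users, ["city", "salaryBracket"])}
print(tree)
```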

Combining groupBy and aggregations

The following query groups by city and returns, for each city, the two oldest users, the average age of those two users, and the average age of everyone in the city:

{
  allUsersConnection {
    groupBy {
      city {
        key
        firstTwo: connection(first: 2, orderBy: AGE_DESC) {
          edges {
            node { name }
          }
          aggregate {
            avg {
              age
            }
          }
        }
        allInCity: connection {
          aggregate {
            avg {
              age
            }
          }
        }
      }
    }    
  }
}

See example return value

Data:

[
  {id: "1", name: "Emanuel", age: 11, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "3", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "4", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    groupBy: {
      city: [
        {
          key: "Aarhus"
          firstTwo: {
            edges: [
              { node: { name: "Tim" } },
              { node: { name: "Søren" } }
            ]
            aggregate: {
              avg: {
                age: 33
              }
            }
          }
          allInCity: {
            aggregate: {
              avg: {
                age: 25.666
              }
            }
          }
        },
        {
          key: "Magdeburg"
          firstTwo: {
            edges: [
              { node: { name: "Nilan" } }
            ]
            aggregate: {
              avg: {
                age: 99
              }
            }
          }
          allInCity: {
            aggregate: {
              avg: {
                age: 99
              }
            }
          }
        }
      ]
    }    
  }
}
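
The numbers in this return value can be reproduced with a short Python sketch that groups by city and computes both averages per group. It only emulates the intended semantics on the example data. The function `city_report` and the keys `names` and `avg_age` are hypothetical and simplified compared with the nested response shape above.

```python
from collections import defaultdict
from statistics import mean

# Example data from the proposal, reduced to the fields used here.
users = [
    {"name": "Emanuel", "age": 11, "city": "Aarhus"},
    {"name": "Søren", "age": 23, "city": "Aarhus"},
    {"name": "Tim", "age": 43, "city": "Aarhus"},
    {"name": "Nilan", "age": 99, "city": "Magdeburg"},
]


def city_report(rows, first):
    """Per city: the `first` oldest users, their average age, and the overall average."""
    by_city = defaultdict(list)
    for row in rows:
        by_city[row["city"]].append(row)
    report = []
    for city, members in by_city.items():
        # connection(first: ..., orderBy: AGE_DESC)
        oldest = sorted(members, key=lambda r: r["age"], reverse=True)[:first]
        report.append({
            "key": city,
            "firstTwo": {
                "names": [r["name"] for r in oldest],
                "avg_age": mean([r["age"] for r in oldest]),
            },
            "allInCity": {"avg_age": mean([r["age"] for r in members])},
        })
    return report


report = city_report(users, first=2)
for group in report:
    print(group)
```

A city with fewer users than `first` simply returns all of them, which is why Magdeburg lists only Nilan.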

Limitations

Both groupBy and aggregations are on single fields only. You can filter the data that goes into the aggregation, but there is no way to use expressions as keys in a group by query.

rf1-draft

sorenbs

👍98 ❤37 🎉30

All 43 comments

Hello Soren,
currently contemplating over your proposal. Could you please add the underlying schema as well? It's probably trivial, but I would like to rule out mistakes on my end.

ejoebstl on 21 Nov 2017

👍2

For the multiple level group, can you please add example data (ungrouped as well as grouped)? I can't quite grasp the concept of multi-level groups.

ejoebstl on 21 Nov 2017

@ejoebstl I have added example responses to all queries. This should make the proposed dynamics very clear :-) Looking forward to your feedback.

The multi-level groups are really very simple: they exploit the fact that we have a wonderful tree structure to place data into. The more interesting question is whether this is useful or not.

sorenbs on 21 Nov 2017

👍1

It's an excellent idea to allow grouping without aggregation by exploiting the tree structure. That's a main limitation of SQL.

The feature itself is very useful. Until now, when you wanted to group data, you needed to come up with either a relation or do it in your application. Grouping and aggregation is not only incredibly useful for building powerful frontends (think of a search feature for thousands of nodes, where you can filter by fields), but also decreases overhead in the backend by a lot. Even if I just want to gather some statistics about my data using the playground, this makes everything easier.

Some considerations:

Right now it's not possible to use a combination of multiple fields in a groupBy, correct?
Is it possible to use an aggregation inside a filter? Use case for your example: select all users with more than medium age.
I'd suggest to also add a count_distinct aggregation to count all distinct values of a field.
Will this proposal also work for the Simple API, or is the Simple API a thing of the past anyway?

I'm quite sure the proposal is a good way though. The few points above can most likely be added afterwards without any complication.

ejoebstl on 22 Nov 2017

Right now it's not possible to use a combination of multiple fields in a groupBy, correct?

Correct. It's also not possible to use an arbitrary expression. I think this ability might be worth giving up in trade for a simple type-safe API

Is it possible to use an aggregation inside a filter? Use case for your example: select all users with more than medium age.

See proposal #1279

I'd suggest to also add a count_distinct aggregation to count all distinct values of a field.

Great idea!

Will this proposal also work for the Simple API, or is the Simple API a thing of the past anyway?

In the future there will be only a single API flavour as described in #353

sorenbs on 22 Nov 2017

There is no example for a count aggregation, I'm guessing it looks like this:

{
  postsConnection {
    aggregate {
      count
    }
  }
}

Please confirm or correct!

nikolasburk on 11 Dec 2017

Is it possible to order by an aggregated value? I'm trying to do something like:
Course
-- Episodes
---- Views
Views model

{
  date: DateTime! @unique 
  views: Int!
}

I want to query the top Courses ordered by daily / weekly / ... views. It would sum all episodes' views between two dates and order by that sum.

kieusonlam on 16 Dec 2017

👍6

Why was this issue moved to the graphcool-framework repo?

I thought that Group By and Aggregated Values would be implemented in Prisma.

The Prisma documentation links to this issue

jvbianchi on 26 Jan 2018

@jvbianchi

As far as I know, Graphcool Framework is a GraphQL backend solution. A lot of people, like me, are still using it.

Prisma is not a replacement. It is an open-source GraphQL query engine that can connect to many different databases, not just Graphcool Framework. It's a standalone version of Graphcool 1.0, and the two will go separate ways from now on.

You can read it here: https://www.graph.cool/forum/t/graphcool-framework-and-prisma/2237

I'm still waiting for them to add this feature, because I think I'll stick with Graphcool Framework. :)

Everyone can correct me if I'm wrong.

kieusonlam on 26 Jan 2018

@kieusonlam Ok, but that doesn't explain why this feature will not be implemented in Prisma as well.

The count aggregate function has already been implemented, so why not the others too?

jvbianchi on 26 Jan 2018

@jvbianchi It already has this feature. You can check the example here: https://github.com/graphcool/graphql-server-example
The topHomes query has numRatings, which is defined in
https://github.com/graphcool/graphql-server-example/blob/master/src/resolvers/Home.ts

kieusonlam on 26 Jan 2018

@kieusonlam That is what I just said. count has been implemented.

But avg, median, max, min, sum and group by have not.

Do you have an example with any of these other aggregate functions?

jvbianchi on 26 Jan 2018

@jvbianchi Hmm, yup, that's my bad. It's still missing avg, median, max, min, sum. We may wait for graphcool team to have the right answer.

kieusonlam on 26 Jan 2018

😄3

My bad 🙂 I'll open this issue again, thanks for the heads up @jvbianchi.

@kieusonlam, the plan is to eventually bring back the evolution of the Prisma API into the Graphcool Framework.

marktani on 26 Jan 2018

👍4 😄1

Just to confirm: we cannot currently access these query filter options or fields, right? I can’t seem to find connections or use the where clause in the playground, which means it is impossible to do this sort of complex query on counts of edges, for example, right?

magus on 28 Jan 2018

Might be completely off topic here, but you might want to look at how OData has implemented aggregation as it's rather flexible and covers a lot of complex use cases. You can read the specification here.

I've also written a JSON query object to generate the OData query string, as it can be rather difficult to build it out using just string building (especially with nesting). This might be useful as inspiration for how GraphQL might support this - https://github.com/techniq/odata-query#transforms

techniq on 28 Jan 2018

One of the more complex patterns is applying a filter before aggregation and another after. I have an example of this in my README (you can also look at the various tests of the project as well).

techniq on 28 Jan 2018

Is there any progress on this?
Could the ability to perform raw SQL queries through Prisma be added, so we don't have to wait for features like these to be implemented? I think there will always be an edge case where the CRUD API falls short, and it would be good to have an escape hatch for those cases, guaranteeing that the decision to use Prisma scales to a complex project.

marcosfede on 26 Feb 2018

👍14

Sure, that's a great point. You are connecting your database to Prisma, so you can also send raw queries there 🙂

marktani on 3 Mar 2018

Is this proposal on the roadmap?

danielrasmuson on 19 Mar 2018

@danielrasmuson we are currently putting together a public roadmap for the next 6-12 months. It is safe to say that this feature will be on the roadmap as it is very highly requested :-)

sorenbs on 20 Mar 2018

👍21

Looking forward to this as we're currently in need of this and have run into this limitation multiple times with both graphcool and prisma over the last few months. Let me know if there's anything I can help with this @sorenbs

Nedomas on 28 Mar 2018

👍8

Any idea where this is on the roadmap? Highly needed 👍

MJones180 on 31 May 2018

👍6

@marktani, where can I learn more about this statement?:

Sure, that's a great point. You are connecting your database to Prisma, so you can also send raw queries there 🙂

sakhmedbayev on 5 Jun 2018

👍2

shameless bump: begging for this feature ;)

arnabkd on 22 Jun 2018

👍39

Any update for this feature?

oae on 24 Jul 2018

👍10

Going to bump as well. Not having this feature == lots more work and poor client performance. :)

gentle-noah on 26 Jul 2018

👍9

Will it be possible to use aggregates in filter query?
For example, to get active users by number of commits they made:

query activeUsers {
  users(where: {
      commits: {
        date_gte: "THIS_MONTH_DATE",
        aggregate: {
          count_gte: 5
        }
      }
    }) {
     email
   }
}

kirgene on 26 Aug 2018

👍5

@sorenbs @schickling This feature is planned for Q3 in 2018. Only 1 month till the end of Q3. Any progress? Will aggregate functions be implemented at once or one by one? I really need avg for my project.

FluorescentHallucinogen on 1 Sep 2018

👍23

Q3 2018 is over. Any news?

FluorescentHallucinogen on 16 Oct 2018

👍28

I need max. Is there any way I can get this functionality?

ellipticaldoor on 23 Oct 2018

@sorenbs Any news on this? Can the Roadmap label be updated if it's planned for later?

kevinmarrec on 30 Oct 2018

👍4

Q3 has been over for a while and still no response as to the current status of this. An update would be nice 👍

MJones180 on 3 Nov 2018

👍8

Also looking for an update on the status of this.

stephen-bunn on 6 Nov 2018

👍6

This continues to be an important feature for us. I'll update this issue when we have a concrete timeframe. See also this explanation for why we were unable to ship this feature in Q3 as planned.

@FluorescentHallucinogen - we will likely implement a large chunk of this feature in one go as each individual aggregation is comparatively little work.

sorenbs on 14 Nov 2018

👍11 ❤7 👀3

Any ETA on this? Very much needed 🙏🏼

joshhopkins on 25 Jan 2019

👍16

Waiting for this one to drop, there will be a big use in our project in production!

impowski on 5 Feb 2019

At least implement sum :)

cihadturhan on 8 Mar 2019

👍16 👀3

any eta?(

terion-name on 12 Mar 2019

👍10

Bump. :)

par6n on 20 May 2019

@sorenbs Prisma 2 was released, congrats 🎉
But what about this issue, do you have any rough estimations?

EddiG on 10 Jul 2019

👍9 👀1

Will it be possible to use aggregates in filter query?
For example, to get active users by number of commits they made:
query activeUsers {
  users(where: {
      commits: {
        date_gte: "THIS_MONTH_DATE",
        aggregate: {
          count_gte: 5
        }
      }
    }) {
     email
   }
}

I want to echo @kirgene's question. I want to be able to do something similar, but it looks like there is no way to do this.