Diesel: `GROUP BY` support

Created on 13 Feb 2016  路  16Comments  路  Source: diesel-rs/diesel

Please correct me if I'm misled, but it seems that Diesel currently doesn't support GROUP BY statements. #89 sounds like we can already call .group_by as of today, but there's actually no such method. Am I correct?

Another issue I encountered was that I can't write donate::table.select((donate::name, diesel::expression::sum(donate::amount))). It type-errors:

src/eventer.rs:189:31: 189:37 error: the trait `diesel::expression::SelectableExpression<eventer::donate::table, _>` is not implemented for the type `(eventer::donate::columns::name, diesel::expression::functions::aggregate_folding::Sum<eventer::donate::columns::amount>)` [E0277]

So I ended up working around the two issues above by using sql, such like:

donate::table.select((donate::name, sql::<Text>("sum(amount) AS sum")))
             .filter(sql("TRUE GROUP BY name"))
             .order(sql::<Text>("sum").desc())
             .load(&*db).unwrap();

One more thing: I had to use sql::<Text>("sum") because without ::<Text>, type inference fails. Though Text isn't exactly sane in this case, I'd say...

Any ideas? Thanks in advance for your help!

accepted enhancement

Most helpful comment

Does this mean it's not possible to group by a non-fkey column?
E.g. select version, count(*) from clients group by version order by version

All 16 comments

You are correct, we are missing support for group_by (and more generally, we are also missing the ability to select _multiple_ aggregate expressions even if it's valid. Ref #3). This is something I'd like to add support for in the future. We haven't gotten to it yet, as the features that we're targeting for 1.0 are features that are likely to require breaking changes to our existing API (0.6 will bring a reworking of associations, and is currently planned to be the last release before 1.0).

For the most part, your workaround appears to be the way to go about doing this in the short term. The sql function will _never_ infer the type, and will always need it to be specified. Text isn't correct here though, BigInteger is.

In the short term, I think it's fine for us to add a group_by function (not public API, as it will change when we properly support it) that takes a column. Because .filter(sql("TRUE GROUP BY ...")) is silly and you shouldn't have to do that.

@barosl I've pushed up 6df20de which adds a group_by method. This is not part of the public API, and has only the most basic of tests for it. This method will almost certainly have breaking changes when we properly support the group by clause publicly, but I wanted to make sure you had a reasonable workaround. With this, your workaround can become:

donate::table.select((donate::name, sql::<BigInteger>("sum(amount) AS sum")))
             .group_by(donate::name)
             .order(sql::<BigInteger>("sum").desc())
             .load(&*db).unwrap();

which is a little more sane. I'm not going to ship a new version for this change, as it doesn't affect the public API. That means you'll have to point at git for now, but next time I have a bug fix to use as an excuse I'll ship 0.5.2 which will include that change.

This "fix" does not close this issue, nor does it constitute proper group by support, it's simply a crappy workaround that is slightly less crapy than filter(sql("TRUE GROUP BY ..."))

Note for anybody reading this because they're looking for issues to work on:

Before we can add proper support for group by, we almost certainly need to resolve #3.

Ah, thanks for the explanation!

I think I can live with .filter(sql("TRUE GROUP BY ...")) for now, because although it's a bit silly as you said, there seems to be no other "better" workarounds out there. I was a bit worried if there were better alternatives, but given your explanation, I'm fine with using the best workaround. I can perfectly wait until you guys introduce a true group_by function.

The reason why I used .order(sql::<Text>("sum")) was that whatever I put there, it compiled and worked. So I put the one that looked the most "generic" (Text). But thinking again, it seems that the reason why it compiled was that .order() requires its parameter to be orderable, and Text happens to be SqlOrd. Is my assumption correct?

because although it's a bit silly as you said, there seems to be no other "better" workarounds out there

I just pushed a better workaround. :P Still not great, but less silly at least.

and Text happens to be SqlOrd. Is my assumption correct?

SqlOrd actually is only for functions like max. .order doesn't care about the type that you pass it. My comment about the type was for your select clause, not your order clause.

I just pushed a better workaround.

Oh, I thought the added group_by was for internal use because you said it was not part of the public API. By "public API" did you mean the method was #[doc(hidden)]?

My comment about the type was for your select clause, not your order clause.

Oops, it seems that the pasted code above is different from my production code. I actually did use BigInteger correctly in the select clause (it doesn't even compile if I use other types, actually), so what caused me confused is what to use in the order clause. I guess, even though .order doesn't care about types particularly, using BigInteger in there too is "correct". Am I right?

Oh, I thought the added group_by was for internal use because you said it was not part of the public API. By "public API" did you mean the method was #[doc(hidden)]?

Correct

I guess, even though .order doesn't care about types particularly, using BigInteger in there too is "correct"

Also correct

Awesome! Thanks for the explanation.

Does this mean it's not possible to group by a non-fkey column?
E.g. select version, count(*) from clients group by version order by version

Thanks, @barosl for the workaround. And just adding I can't wait for the group_by support! :tada:

How about the HAVING clause? It looks like it still requires the hacky filter(sql("TRUE GROUP BY ... HAVING ...")).

Things left to do after merging #2251:

Needed for 2.0:

  • [x] Figure out support for group by in BoxableExpression
  • [x] Figure out support for group by in BoxedSelectStatement
  • [x] Expose GroupByDsl via public API

After 2.0/open for contributions:

  • [x] Support for group by clauses with more than one column
  • [ ] HAVING clauses

So ... how will the transition to 2.0 play out?

@kivo360 From the perspective of the diesel team the transition is as simple as: We release 2.0 as soon as it is finished. There will be a changelog stating what has changed in an incompatible way compared to the last 1.x release and everything else is up to the user.

I see there's a grouped_by feature added now, but it seems like it loads all the data from the DB. I want to select one column from a parent table along with the number of records in a child table belonging to each record in the parent table. Is there a way to do just that, and not load the content of all the other columns of the child table (they're big)?

@yujiri8 Our issue tracker is not a place to ask questions. This is especially true for existing tracking issues for new features. Please use our forum or the gitter channel for such questions.

Was this page helpful?
0 / 5 - 0 ratings