Ecto: Group by fragment incorrectly raises grouping_error column must appear in the GROUP BY clause

Created on 10 Nov 2019 · 18Comments · Source: elixir-ecto/ecto

Environment

Elixir version (elixir -v): 1.9.2 (compiled with Erlang/OTP 22)
Database and version (PostgreSQL 9.4, MongoDB 3.2, etc.): PostgreSQL 11.5_1
Ecto version (mix deps): 3.2.5
Database adapter and version (mix deps): postgrex 0.15.1
Operating system: MacOS Mojave

from m in Message,
  group_by: fragment("(CASE WHEN recipient_id = ? THEN sender_id ELSE recipient_id END)", ^account_id),
  select: {
    fragment("(CASE WHEN recipient_id = ? THEN sender_id ELSE recipient_id END)", ^account_id)
  }

This query when run through Repo.all returns the following error

[debug] QUERY ERROR source="messages" db=0.0ms queue=0.6ms
SELECT (CASE WHEN recipient_id = $1 THEN sender_id ELSE recipient_id END) FROM "messages" AS m0 GROUP BY (CASE WHEN recipient_id = $2 THEN sender_id ELSE recipient_id END) [123, 123]
** (Postgrex.Error) ERROR 42803 (grouping_error) column "m0.recipient_id" must appear in the GROUP BY clause or be used in an aggregate function

    query: SELECT (CASE WHEN recipient_id = $1 THEN sender_id ELSE recipient_id END) FROM "messages" AS m0 GROUP BY (CASE WHEN recipient_id = $2 THEN sender_id ELSE recipient_id END)
    (ecto_sql) lib/ecto/adapters/sql.ex:629: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql) lib/ecto/adapters/sql.ex:562: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:177: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3

But, if I run the generated query SELECT (CASE WHEN recipient_id = 123 THEN sender_id ELSE recipient_id END) FROM "messages" AS m0 GROUP BY (CASE WHEN recipient_id = 123 THEN sender_id ELSE recipient_id END) directly in Postgres I get results without any issues.

Interpolating fields provides same error.

Expected behavior

Return results from db

Source

ZombieHarvester

👍1

All 18 comments

Right. Postgrex doesn't emit the error when you have = 123 because the ID is hardcoded. I am not sure how to solve this elegantly. Ideas are welcome.

josevalim on 10 Nov 2019

True, if I hardcode the value in the Ecto query then it works

This one works
group_by: fragment("(CASE WHEN recipient_id = 123 THEN sender_id ELSE recipient_id END)")

This one raises
group_by: fragment("(CASE WHEN recipient_id = ? THEN sender_id ELSE recipient_id END)", ^account_id)

ZombieHarvester on 10 Nov 2019

Right. Postgrex doesn't emit the error when you have = 123 because the ID is hardcoded. I am not sure how to solve this elegantly. Ideas are welcome.

Sorry, but I don't get why you are talking about hard-coding. From what I can see a fragment call is simply incorrect.

[debug] QUERY ERROR source="messages" db=0.0ms queue=0.6ms
SELECT (CASE WHEN recipient_id = $1 THEN sender_id ELSE recipient_id END) FROM "messages" AS m0 GROUP BY (CASE WHEN recipient_id = $2 THEN sender_id ELSE recipient_id END) [123, 123]
** (Postgrex.Error) ERROR 42803 (grouping_error) column "m0.recipient_id" must appear in the GROUP BY clause or be used in an aggregate function

    query: SELECT (CASE WHEN recipient_id = $1 THEN sender_id ELSE recipient_id END) FROM "messages" AS m0 GROUP BY (CASE WHEN recipient_id = $2 THEN sender_id ELSE recipient_id END)
    (ecto_sql) lib/ecto/adapters/sql.ex:629: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql) lib/ecto/adapters/sql.ex:562: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:177: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3

@ZombieHarvester As error says m0.recipient_id is not in group by clause or in an aggregate function. If you read SQL code again you would see clearly that instead of m0.recipient_id there is only recipient_id and PostgreSQL fails, because recipient_id is not exactly as same as m0.recipient_id. In theory for one from clause (with also no joins) it's ok, but again in theory. However that's not a rule PostgreSQL is following, but edge-case of another rule which PostgreSQL is using …

Typically everything queryable (like: from or join clauses) if is named then it should be used with it's name. If you don't want to call m0.something then just do not name it m0. That's said ecto automatically generates names, so it's not possible to just not use naming. However Ecto.Query.Api supports the opposite way.

I do not have a local repo to check it fast, so please keep in mind that I'm writing it from memory and I could make a mistake. In short ecto instead of support not naming should allow to use names dynamically and that's what field/2 is supposed to do, right?

from m in Message,
  group_by: fragment("(CASE WHEN recipient_id = ? THEN sender_id ELSE recipient_id END)", ^account_id),
  select: {
    fragment("(CASE WHEN recipient_id = ? THEN sender_id ELSE recipient_id END)", ^account_id)
  }

Following what I said above this should be changed to:

from m in Message,
  group_by: fragment("(CASE WHEN ? = ? THEN ? ELSE ? END)", field(m, :recipient_id), ^account_id, field(m, :sender_id), field(m, :recipient_id)),
  select: {
    fragment("(CASE WHEN ? = ? THEN ? ELSE ? END)", field(m, :recipient_id), ^account_id, field(m, :sender_id), field(m, :recipient_id))
  }

Please correct me in case I got something wrong.

Hope that helped. Happy coding! :smile:

Eiji7 on 10 Nov 2019

@Eiji7 I just tried it, but it seems to be producing the same result

from m in Message,
  group_by: fragment("(CASE WHEN ? = ? THEN ? ELSE ? END)", field(m, :recipient_id), ^account_id, field(m, :sender_id), field(m, :recipient_id)),
  select: {
    fragment("(CASE WHEN ? = ? THEN ? ELSE ? END)", field(m, :recipient_id), ^account_id, field(m, :sender_id), field(m, :recipient_id))
  }

** (Postgrex.Error) ERROR 42803 (grouping_error) column "m0.recipient_id" must appear in the GROUP BY clause or be used in an aggregate function

    query: SELECT (CASE WHEN m0."recipient_id" = $1 THEN m0."sender_id" ELSE m0."recipient_id" END) FROM "messages" AS m0 GROUP BY (CASE WHEN m0."recipient_id" = $2 THEN m0."sender_id" ELSE m0."recipient_id" END)
    (ecto_sql) lib/ecto/adapters/sql.ex:629: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql) lib/ecto/adapters/sql.ex:562: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:177: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3

ZombieHarvester on 10 Nov 2019

@Eiji7
group_by: fragment("(CASE WHEN recipient_id = 123 THEN sender_id ELSE recipient_id END)")
When i hardcoded the variable value then the query worked as expected. That makes me think that the problem is not in the table alias

ZombieHarvester on 10 Nov 2019

@ZombieHarvester Don't worry as it (generated SQL code) looks much better (easier to maintain i.e. enhance in future) now. Anyway it would help (still work) if you would make a more complex queries using field/2 instead of raw names.

Also I can see something else, but before I would describe I want to give a simplest possible example in plain SQL:

# this select works:
select (case when value = 1 then 1 else 0 end) as result from unnest(array[1, 2, 3]) value group by (case when value = 1 then 1 else 0 end);
# this select does not work:
select (case when value = 1 then 1 else 0 end) as result from unnest(array[1, 2, 3]) value group by (case when value = 2 then 1 else 0 end);

What's interesting here is an error message:

ERROR: column "value.value" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select (case when value = 1 then 1 else 0 end) as result fro...

So this is not because of hard-coded values, but more that select does not (strict!) match group by.

In your query this is in your select:

(CASE WHEN m0."recipient_id" = $1 THEN m0."sender_id" ELSE m0."recipient_id" END)

and this is in your group by:

(CASE WHEN m0."recipient_id" = $2 THEN m0."sender_id" ELSE m0."recipient_id" END)

The only difference here is $1 vs $2, so in my theory this is a root cause of the problem and I would like to confirm it now.

A dollar sign ($) followed by digits is used to represent a positional parameter in the body of a function definition or a prepared statement. In other contexts the dollar sign may be part of an identifier or a dollar-quoted string constant.

I'm not good in database theory, so please correct me if I'm wrong. From what I can see our use case here is representation of a positional parameter for prepared statement, right? Looks like that's because PostgreSQL is failing.

The possible solution here is to wrap a case into a subquery which in SQL would look like:

select value.result from (select (case when inner_value = 1 then inner_value else inner_value end) as result from unnest(array[1, 2, 3]) inner_value) value group by value.result;

with this instead of 2 different parts we have only one. For sure that's not a solution for my example (different return in case clauses), because it would work only for exactly same result, but it's enough for use case specified in this issue and example before was to reproduce a general problem i.e. confirm that naming is not a root cause and simply value could break group by validation check.

In ecto it would work like:

query = from m in Message,
  select: %{result: fragment("(case when ? = ? then ? else ? end)", field(m, :recipient_id), ^account_id, field(m, :sender_id), field(m, :recipient_id))}

from(m in subquery(query), select: m.result, order_by: m.result)

Again, I wrote it from memory and simply could make a mistake or just a typo.

@ZombieHarvester Could you please confirm my theory by trying proposed subquery/1 call?

@josevalim If what I wrote would be correct then would comparing group_by to select be possible and simple at ecto/postgrex part? If that would simplify problem then we could raise an error here. What do you think about it?

Eiji7 on 11 Nov 2019

There is not a check we can do, as it would require parsing SQL.

José Valimwww.plataformatec.com.br
http://www.plataformatec.com.br/Founder and Director of R&D

josevalim on 11 Nov 2019

There is not a check we can do, as it would require parsing SQL.

@josevalim Hmm … Is that really so? Looks like that ecto itself can't do anything, but what about doing that on adapter level?

Here are few fragments of Ecto.Adapters.Postgres.Connection code:

# …
select = select(query, select_distinct, sources)
# …
group_by = group_by(query, sources)
# …
defp select(%{select: %{fields: fields}} = query, select_distinct, sources) do
  ["SELECT", select_distinct, ?\s | select_fields(fields, sources, query)]
end
# …
defp group_by(%{group_bys: []}, _sources), do: []
defp group_by(%{group_bys: group_bys} = query, sources) do
  [" GROUP BY " |
   intersperse_map(group_bys, ", ", fn
     %QueryExpr{expr: expr} ->
       intersperse_map(expr, ", ", &expr(&1, sources, query))
   end)]
end
# …

Source: https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex

If I'm not wrong then if we could do something like:

# 1. return something like:
{full_select, part_select} = select(query, select_distinct, sources)
# where part_select is list of things to select without not needed parts like: " SELECT "
# 2. pass it to group_by:
group_by = group_by(query, sources, part_select)
# 3. add `not in` check on `expr` (as we would work on list of things to select)
[
  " GROUP BY " |
  intersperse_map(group_bys, ", ", fn %QueryExpr{expr: expr} ->
    result = intersperse_map(expr, ", ", &expr(&1, sources, query))
    if not result in part_select, do: raise("oops"), else: result
  end)
]

Of course I was quickly looking at code and I could get something wrong … What do you think about it?

Eiji7 on 11 Nov 2019

Since they allow fragments, how would you know that fragment("a + b") and fragment("a+b") are the same? You would have to parse it. Also, I don't think it is our job do repeat the validation that already happens in the database.

The error message from the DB is clear here, the question is how we are going to allow users to express the query above correctly within Ecto (if there is a way to do so). At least, group_by can always be done on Elixir land after you get DB results.

josevalim on 11 Nov 2019

Since they allow fragments, how would you know that fragment("a + b") and fragment("a+b") are the same? You would have to parse it.

@josevalim Ah right, I did not think about it. Honestly if there would be somewhere a good and fast beautifier library for formatting SQL code then it could be considered in "worst case" (assuming we want to cover 100% cases and don't leave this standalone).

Anyway we can match on exactly same expressions and emit only warning without raising any error. Of course that have it's downside which you found already i.e. without formatting SQL we would have a warning for a correct query. However well explained warning would be worth even in such case. Of course still not amazing idea, but it's something and does not require a lot of work.

Also, I don't think it is our job do repeat the validation that already happens in the database.

That's correct in typical case. However I believe that the highest priority is providing a good error or warning message. Current one is definitely not clear at least for me as I "wasted" some time to figure out why error is saying that x field does not appears in group by when it really was, but in another "form". That would be really helpful if we would have a warning which gives a hint that such query could fail describing why (something like why matching is bad inside if/else blocks).

The error message from the DB is clear here, (…)

Sorry, but I really don't get how:

** (Postgrex.Error) ERROR 42803 (grouping_error) column "m0.recipient_id" must appear in the GROUP BY clause or be used in an aggregate function

for such query:

from m in Message,
  group_by: fragment("(CASE WHEN ? = ? THEN ? ELSE ? END)", field(m, :recipient_id), ^account_id, field(m, :sender_id), field(m, :recipient_id)),
  select: {
    fragment("(CASE WHEN ? = ? THEN ? ELSE ? END)", field(m, :recipient_id), ^account_id, field(m, :sender_id), field(m, :recipient_id))
}

is clear in any way. Please notice that error is saying about field standalone which in fact appears inside group_by and then this is really confusing. In fact PostgreSQL complains more about $1 and $2 parameters rather than any field as raw value of those parameters works as expected, so the problem is really not about if x field is in group_by query part or not. After finding what happen I would remember that probably even for years, but I don't see a typical newbie would know what's happen or would follow this issue regardless how old it would be.

If it would complain about whole case part then it would be definitely more readable as developer would know that field matches, but something "else" still did not and therefore developer would look what did not matched. Again look at my first comment in which I recommended to use field/2. Only after that I was trying to look something around parameters or whole case statement.

the question is how we are going to allow users to express the query above correctly within Ecto (if there is a way to do so). At least, group_by can always be done on Elixir land after you get DB results.

In some cases hard, not worth or even impossible/have no sense. Please keep in mind a scenarios like streaming results from database. That's said I don't have any other idea that checking if every expression in group_by is equal to any expression in select and giving a warning.

Eiji7 on 11 Nov 2019

Honestly if there would be somewhere a good and fast beautifier library for formatting SQL code then it could be considered in "worst case" (assuming we want to cover 100% cases and don't leave this standalone).

Which is a huge effort in itself and it would definitely make things slower too.

That would be really helpful if we would have a warning which gives a hint that such query could fail describing why (something like why matching is bad inside if/else blocks).

IMO, that's the database job. There is no way we are absorbing this complexity.

Maybe I have stared at those messages too long, so they are clear to me, but if the error messages are not clear, then that's a database bug to be fixed, not ours to work around it.

josevalim on 11 Nov 2019

That's said I don't have any other idea that checking if every expression in group_by is equal to any expression in select and giving a warning.

Other ideas are to allow you to refer to selected fields on group_by or allow you to refer to previously bound parameters. So you don't have $1 and $2 but rather $1 on both. Solving these are likely simpler than any solution that would require us to do any SQL parsing/beautifying/etc, the difficulty is rather on design more than implementation.

josevalim on 11 Nov 2019

👍1

@josevalim That would be nice to pass only unique values and I think that I already have few cases in which I would use it already for myself!

However this does not solves all cases where a "simple and stupid" warning would cover. Of course it would complain also about SQL format, but it would not force to use "the only one right format", but rather force to only keep just one SQL format which is really simple to solve and does not complicates implementation on developer side.

Eiji7 on 11 Nov 2019

Another solution could be to generate the same param whenever we encounter the same variable. So even using something like fragment("? + ?", ^a, ^a) would generate $1 + $1.

michalmuskala on 2 Dec 2019

👍1

I have earlier pointed to that problem in on the mailing list:

https://groups.google.com/forum/#!topic/elixir-ecto/L1J9cWatvdI
https://groups.google.com/forum/#!topic/elixir-ecto/MLOdpAlPLVg

Both solutions (squashing parameters and "meta table") are something that would solve that problem. Especially that part with meta table would make this query much simpler to read:

from m in Message,
  group_by: self.ids,
  select: %{
    ids: fragment("(CASE WHEN ? = ? THEN ? ELSE ? END)", field(m, :recipient_id), ^account_id, field(m, :sender_id), field(m, :recipient_id))
}

However the problem will persist when if we would like to return raw column without naming it, but then maybe we could use self or self[1] directly for that? I do not know, but any API for referencing query-selected columns would be nice.

EDIT:

Additionally named fragment parameters would be also nice feature which would simplify such queries a little.

hauleth on 2 Dec 2019

Hey! just ran into this issue, and just wondering if there was a solution or workaround that anyone has come up with?

In my case the problem arises with:

      |> select([is: is], fragment("date_trunc(?, ?)", ^bucket, is.inserted_at))
      |> group_by([is: is], fragment("date_trunc(?, ?)", ^bucket, is.inserted_at))

If i remove the bucket and make it 'day' it seems to work

Harrisonl on 28 Aug 2020

@Harrisonl Hi! Maybe this could work?

|> select([is: is], fragment("date_trunc(?, ?)", ^bucket, is.inserted_at))
|> group_by([_], fragment("1")

syfgkjasdkn on 13 Oct 2020

@syfgkjasdkn it will work, however this is quite error prone, as in general case (select_merge) we cannot assume ordering of the parameters in SELECT. It can be hacked with:

|> select([is: is], fragment("date_trunc(?, ?) AS inserted_at", ^bucket, is.inserted_at))
|> group_by([_], fragment("inserted_at"))

But I think that API like:

|> select([is: is], fragment("date_trunc(?, ?)", ^bucket, is.inserted_at))
|> group_by([_], self) # or `this`, I used `self/0` as it is meaningless in the `Ecto.Query.API`

Would be much clearer.

hauleth on 14 Oct 2020

👍1

Was this page helpful?

0 / 5 - 0 ratings