When using distinct and order_by fields listed in DISTINCT part
correctly show up in ORDER BY clause
Post |> distinct(:created_at) |> order_by(desc: :id)
# SELECT DISTINCT ON (p0."created_at") p0."id", p0."created_at"
# FROM "posts" AS p0
# ORDER BY p0."created_at", p0."id" DESC
However, when field is used both in distinct and order_by it gets added
twice to ORDER BY clause, which makes it impossible to alter the ordering
(in this example to change default ASC to DESC)
Post |> distinct(:created_at) |> order_by(desc: :created_at, desc: :id)
# SELECT DISTINCT ON (p0."created_at") p0."id", p0."created_at"
# FROM "posts" AS p0
# ORDER BY p0."created_at", p0."created_at" DESC, p0."id
The expected behaviour would be to overwrite order placed by distinct
with one set by order_by
# ...
# ORDER BY p0."created_at" DESC, p0."id
Adding order_by on default when using distinct is a very bad idea. There is no possible way to change the order to descending. Maybe add it ONLY when there is no other order by. I also see no point in adding it automatically since postgresql will provide a proper message for this error. Better be explicit and avoid any magic, especially when you can't overwrite it.
Pull request that changed it:
https://github.com/elixir-ecto/ecto/pull/504
Yes, I agree. We can change this for 2.1.
Alternatively we should support :asc and :desc inside distinct, since the fields given to distinct need to be the first in order_by anyway.
If you want to make it even smarter, I'd suggest something like this:
# no order
Post |> distinct(:id, :created_at)
# => DISTINCT ON (id, created_at) ...
# existing field, different sort
Post |> distinct(:id, :created_at) |> order(desc: :id)
# => DISTINCT ON (id, created_at) ... ORDER BY id DESC, created_at
# additional field
Post |> distinct(:id, :created_at) |> order(desc: :name)
# => DISTINCT ON (id, created_at) ... ORDER BY id, created_at, name DESC
# existing field, requires reordering
Post |> distinct(:id, :created_at) |> order(desc: :created_at)
# => DISTINCT ON (created_at, id) ... ORDER BY created_at DESC, id
Not sure though if the last one isn't too much magic.
The problem is that the order in order by matters (although not in distinct). So our options is to make it behave exactly as in order_by, keeping distinct with higher priority, or force users to do it manually.
@josevalim the fields given to distinct don't need to be in order by. It's the opposite, fields used in order by need to be in distinct.
This is a correct query, there is no need to order by ID:
SELECT DISTINCT ON (p.id, p.created_at) p.* FROM "posts" AS p ORDER BY p.created_at
also @teamon order doesn't matter in distinct so your last example doesn't need to switch created_at and id
btw. thanks for your great work on Elixir and the whole ecosystem, we are migrating whole company from Ruby/Rails and loving it! :)
Fields used in DISTINCT ON do need to be in ORDER BY, it's specified in the postgres docs:
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
Most helpful comment
Yes, I agree. We can change this for 2.1.