Ecto: Can't overwrite ORDER when using distinct

Created on 5 Sep 2016  路  7Comments  路  Source: elixir-ecto/ecto

Environment

  • Elixir version (elixir -v): Elixir 1.3.2
  • Database and version: PostgreSQL 9.5.4
  • Ecto version (mix deps): 2.0.3 and 042aa92c62883572917477b2acaad3297590775c (master)
  • Database adapter and version (mix deps): postgrex 0.11.2
  • Operating system: Mac OS X 10.11.5

    Current behavior

When using distinct and order_by fields listed in DISTINCT part
correctly show up in ORDER BY clause

Post |> distinct(:created_at) |> order_by(desc: :id)
# SELECT DISTINCT ON (p0."created_at") p0."id", p0."created_at" 
# FROM "posts" AS p0 
# ORDER BY p0."created_at", p0."id" DESC

However, when field is used both in distinct and order_by it gets added
twice to ORDER BY clause, which makes it impossible to alter the ordering
(in this example to change default ASC to DESC)

Post |> distinct(:created_at) |> order_by(desc: :created_at, desc: :id)
# SELECT DISTINCT ON (p0."created_at") p0."id", p0."created_at" 
# FROM "posts" AS p0 
# ORDER BY p0."created_at", p0."created_at" DESC, p0."id

Expected behavior

The expected behaviour would be to overwrite order placed by distinct
with one set by order_by

# ...
# ORDER BY p0."created_at" DESC, p0."id
Enhancement Intermediate

Most helpful comment

Yes, I agree. We can change this for 2.1.

All 7 comments

Adding order_by on default when using distinct is a very bad idea. There is no possible way to change the order to descending. Maybe add it ONLY when there is no other order by. I also see no point in adding it automatically since postgresql will provide a proper message for this error. Better be explicit and avoid any magic, especially when you can't overwrite it.

Pull request that changed it:
https://github.com/elixir-ecto/ecto/pull/504

Yes, I agree. We can change this for 2.1.

Alternatively we should support :asc and :desc inside distinct, since the fields given to distinct need to be the first in order_by anyway.

If you want to make it even smarter, I'd suggest something like this:

# no order
Post |> distinct(:id, :created_at)
# => DISTINCT ON (id, created_at) ...

# existing field, different sort
Post |> distinct(:id, :created_at) |> order(desc: :id) 
# => DISTINCT ON (id, created_at) ... ORDER BY id DESC, created_at

# additional field
Post |> distinct(:id, :created_at) |> order(desc: :name) 
# => DISTINCT ON (id, created_at) ... ORDER BY id, created_at, name DESC

# existing field, requires reordering
Post |> distinct(:id, :created_at) |> order(desc: :created_at) 
# => DISTINCT ON (created_at, id) ... ORDER BY created_at DESC, id

Not sure though if the last one isn't too much magic.

The problem is that the order in order by matters (although not in distinct). So our options is to make it behave exactly as in order_by, keeping distinct with higher priority, or force users to do it manually.

@josevalim the fields given to distinct don't need to be in order by. It's the opposite, fields used in order by need to be in distinct.

This is a correct query, there is no need to order by ID:
SELECT DISTINCT ON (p.id, p.created_at) p.* FROM "posts" AS p ORDER BY p.created_at

also @teamon order doesn't matter in distinct so your last example doesn't need to switch created_at and id

btw. thanks for your great work on Elixir and the whole ecosystem, we are migrating whole company from Ruby/Rails and loving it! :)

Fields used in DISTINCT ON do need to be in ORDER BY, it's specified in the postgres docs:

The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).

Was this page helpful?
0 / 5 - 0 ratings

Related issues

alaadahmed picture alaadahmed  路  4Comments

fuelen picture fuelen  路  3Comments

nathanjohnson320 picture nathanjohnson320  路  4Comments

atsheehan picture atsheehan  路  4Comments

brandonparsons picture brandonparsons  路  3Comments