Ecto: Ecto 3: union/union_all + order_by gives "(undefined_table)" Postgres error

Created on 18 Nov 2018  路  18Comments  路  Source: elixir-ecto/ecto

Environment

  • Elixir version (elixir -v): 1.7.3
  • Database and version (PostgreSQL 9.4, MongoDB 3.2, etc.): psql (PostgreSQL) 10.6 (Ubuntu 10.6-0ubuntu0.18.04.1)
  • Ecto version (mix deps):
    {:ecto_sql, "~> 3.0.1"}
    {:ecto, "~> 3.0.2"},
  • Database adapter and version (mix deps):
    {:postgrex, ">= 0.0.0"}
  • Operating system:
    Ubuntu 18.04

Current behavior

I'm using Ecto 3's union_all in order to union and paginate multiple tables, sorted by timestamps, with a single query. It works quite well (thanks for this feature!) until I add the order_by clause:

def trades_query() do
    Trades.Trade
  end

  def orders_query() do
    Orders.Order
  end

  def big_select() do
    trades_select =
      trades_query()
      |> select([t], %{
        tag: "trade",
        timestamp: t.inserted_at,
        data: fragment("row_to_json(?)", t)
      })

    orders_select =
      orders_query()
      |> select([o], %{
        tag: "order",
        timestamp: o.inserted_at,
        data: fragment("row_to_json(?)", o)
      })

    trades_select
    |> union_all(^orders_select)
    # This breaks, as does order_by(:inserted_at)
    |> order_by(:timestamp)
    |> limit(50)
    |> offset(30)
  end

It gives the following error

** (Postgrex.Error) ERROR 42P01 (undefined_table) missing FROM-clause entry for table "t0"

    query: SELECT t0."inserted_at", row_to_json(t0) FROM "trades" AS t0 UNION ALL (SELECT o0."inserted_at", row_to_json(o0) FROM "orders" AS o0) ORDER BY t0."timestamp" LIMIT 50 OFFSET 30
    (ecto_sql) lib/ecto/adapters/sql.ex:590: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql) lib/ecto/adapters/sql.ex:523: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:147: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:18: Ecto.Repo.Queryable.all/3

I asked on the slack channel and @michalmuskala suggested me order_by(fragment("inserted_at")) as a workaround, and right now that's enough for my needs, but...

Expected behavior

I expected to be able to order by one of the selected fields. Since the union docs talk about order_by, I think this might be the expected behavior.

Chore Advanced

Most helpful comment

@eumend Would turning the union query into a subquery work for you? Like:

union_query = union_all(trades_select, ^orders_select)

from(u in subquery(union_query), select: [u.id, u.other_fields], order_by: u.inserted_at) |> Repo.all()

All 18 comments

So the thing to keep in mind is that, according to SQL, the order_by applies to the whole union and not a particular union. And because Ecto can only refer to named fields, using order_by with unions in Ecto are unlikely to ever work. We will make sure to document this behaviour more clearly but right it is a limitation of how unions work. Maybe you could try to use fragments but I am not sure how you would refer to the result of the union as Ecto does not generate named fields in the select (like foo AS bar).

Using fragments work since you can refer to the overall union results as "unqualified" - order_by(fragment("inserted_at")) does what's expected.

@michalmuskala I will add that note too.

@eumend Would turning the union query into a subquery work for you? Like:

union_query = union_all(trades_select, ^orders_select)

from(u in subquery(union_query), select: [u.id, u.other_fields], order_by: u.inserted_at) |> Repo.all()

@qcam that's beautiful.

Would it be a good idea to have some assertions in the query planner to ensure combinations and order_by (and maybe others) are mutually exclusive? 馃

I assume we may have valid cases, such as the fragment Micha艂 showed, or
random sorting. so I would prefer to focus on the docs

I believe we should be able to achieve everything using a subquery. But I'm not 100% sure of performance implications of wrapping a union in a subquery.

@qcam I can't test it right now but that looks perfect! If it works, as it probably does, it will likely allow me to use ecto_scrivener which is what I'd rather do. Thanks!

Would it be a good idea to have some assertions in the query planner to ensure combinations and order_by (and maybe others) are mutually exclusive? 馃

@qcam, now that I thought more about it, I think we can check that if there is any combination, we could check that none of the fields in order_by use the binding. For instance, we cannot have p.foo or anything of sorts.

We could make a very simple check that traverses the order_by fields looking for top level bindings:

Enum.find(order_bys, &{_dir, %{expr: {{:., _, _}, _, [{:&, _, [_]}, _]}}})

And raise a proper error message with the suggestions here. Thoughts?

That sounds great! I can give it a try if you haven鈥檛 started. The only thing I wonder is that shall we make that check at when the query is built in compile time or when the query is planned?

When the query is planned. Or to be more precise, when the query is normalized inside the planner (the result of normalization is cached).

But, yes, please go ahead!

@qcam Your suggestion worked, though I did have to use one last hack. You see, for some reason the tag field I declared with a string, disappears form the subquery, so when I do:

trades_select =
      trades_query()
      |> select([t], %{
        id: t.id,
        timestamp: t.inserted_at,
        data: fragment("row_to_json(?)", t),
        tag: "trade"
      })

    orders_select =
      orders_query()
      |> select([o], %{
        id: o.id,
        timestamp: o.inserted_at,
        data: fragment("row_to_json(?)", o),
        tag: "order"
      })

    union_query =
      trades_select
      |> union_all(^orders_select)

    from(u in subquery(union_query),
      select: %{
        id: u.id,
        timestamp: u.timestamp,
        data: u.data,
        tag: u.tag
      }
    )
|> Repo.all()

I get

** (Postgrex.Error) ERROR 42703 (undefined_column) column s0.tag does not exist

    query: SELECT s0."id", s0."timestamp", s0."data", s0."tag" FROM (SELECT t0."id" AS "id", t0."inserted_at" AS "timestamp", row_to_json(t0) AS "data" FROM "trades" AS t0 UNION ALL (SELECT o0."id", o0."inserted_at", row_to_json(o0) FROM "orders" AS o0)) AS s0
    (ecto_sql) lib/ecto/adapters/sql.ex:590: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql) lib/ecto/adapters/sql.ex:523: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:147: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:18: Ecto.Repo.Queryable.all/3

BUT If i do

trades_select =
      trades_query()
      |> select([t], %{
        id: t.id,
        timestamp: t.inserted_at,
        data: fragment("row_to_json(?)", t),
        tag: fragment("'trade'")
      })

    orders_select =
      orders_query()
      |> select([o], %{
        id: o.id,
        timestamp: o.inserted_at,
        data: fragment("row_to_json(?)", o),
        tag: fragment("'order'")
      })

Then the query works fine:

[debug] QUERY OK db=75.6ms
SELECT s0."id", s0."timestamp", s0."data", s0."tag" FROM (SELECT t0."id" AS "id", t0."inserted_at" AS "timestamp", row_to_json(t0) AS "data", 'trade' AS "tag" FROM "trades" AS t0 UNION ALL (SELECT o0."id", o0."inserted_at", row_to_json(o0), 'order' FROM "orders" AS o0)) AS s0 []

And yes, it works with order_by, scriverer's paginate etc.

About the tag, is this supposed to work this way? I mean, if I skip the order_by/pagination then the tag string works fine. What I get is that's added after the sql if I declare it as a string?

@eumend in your first query, can you not declare the select at all? Or do you have to declare a select?

I think this will be broken with unions anyway as we will likely tag everything as trade and nothing as order. When we have a union, we need to send all literals to the database. The same thing for subqueries. I will open up a separate issue.

@josevalim Yes that's the thing, I need every table tagged and for that I need the select in each one. Or some sort of identifier after the query is done I guess.

Closing in favor of #2828.

Was this page helpful?
0 / 5 - 0 ratings