Ecto: Support named joins

Created on 22 Jan 2018  Â·  58Comments  Â·  Source: elixir-ecto/ecto

Tasks

  • [x] Allow multiple options to be given to join/5
  • [x] Support the :as construct in queries
  • [x] Include :as on inspected queries
  • [x] Update the documentation
  • [x] Add support for :as in from too

The problem

A commonly asked feature about Ecto is to support named joins. The idea is that, if you need to compose a query by parts, you may want to refer to previous joins. The only way to do so in Ecto today is via positional bindings:

  query = Post

  # Filter by the join
  query = from p in query,
            join: c in Comment, where: c.post_id == p.id

  # Extend the query
  query = from [p, c] in query,
            select: {p.title, c.body}

There is also the ... identifier that helps select the last bindings:

  # Extend the query
  query = from [p, ..., c] in query,
            select: {p.title, c.body}

However, those approaches are still restrictive, as they require you to track when a given join was added.

For those reasons, having a mechanism that refers to a previously established join would be very useful.

Solution

There were previous attempts to address this issue but all of them attempted to conflate the positional binding with the named join. This approach can be very error prone because in Ecto the name of the binding has no effect on the generated query. So if bindings suddenly had to be unique, many queries would fail.

Another approach would be to identify the joins by their schema but that is also limited as you would be unable to join the same schema/table more than once, which may be desired from time to time.

So it is clear that the only approach to solve this is by introducing an explicit naming mechanism. We propose to do so via the :as option in queries:

  query = Post

  # Filter by the join
  query = from p in query,
            join: c in Comment, as: :comments, where: c.post_id == p.id

  # Extend the query
  query = from [p, comments: c] in query,
            select: {p.title, c.body}

If the :as key is present, Ecto will do the following:

  1. Has another join be specified with the :as key? If so, we raise
  2. Otherwise we store it for further lookups

We could even leverage this solution to include the :as names in the resulting SQL query, as it may be handy in certain situations.

Feedback

Your turn. :)

Advanced Discussion

Most helpful comment

@AndrewDryga we would appreciate a PR that adds has_named_binding?(query, key). :)

All 58 comments

First some glee. Oh My Word This Would Simplify SOOO Much Of My Code And Shorten My Factorial-Growing Compile Times Whoooo!!!

Now that that is out of the way. ^.^

I'm not sure why the :as needs to be in the join query, instead of this:

query = from p in query,
            join: c in Comment, as: :comments, where: c.post_id == p.id

Why not do this:

query = from p in query,
            join: [comments: c] in Comment, where: c.post_id == p.id

This way the syntax more matches the use query and it seems more obvious as well since the 'name' of the thing can be inferred as it's unique 'key'. :-)

@OvermindDL1 I thought about it but my reservation is that join expects a single binding and not a list of bindings. With your proposal we would mix the consumption of bindings with their definition.

@OvermindDL1 I thought about it but my reservation is that join expects a single binding and not a list of bindings. With your proposal we would mix the consumption of bindings with their definition.

True a list implies more, but it matches the later query. What about instead a 2-tuple?

query = from p in query,
            join: {:comments, c} in Comment, where: c.post_id == p.id

That way it still matches the key/binding pair later.

That works!

Another reason to use :as though is that we can use it in the generated SQL
too. It may be handy in some situations.

If we tie it to the binding, it is a bit awkward to also make it part of

the query, because those seemingly have no relation.

José Valimwww.plataformatec.com.br
http://www.plataformatec.com.br/Founder and Director of R&D

Another reason to use :as though is that we can use it in the generated SQL
too. It may be handy in some situations.

You can with the tuple as well. I know as is more SQL'y, but it just doesn't seem to fit and makes the query syntax seem a lot more noisy to me, and less elixir'y/erlang'y. The 'key' in it seems related though to me?

This feature would be incredible. We hit the limits of ... when trying to generate queries with nested joins and this would solve it.

Can the name be dynamic / interpolated?

query = 
  from [p, ^som_dyamic_name: c] in query,
  select: {p.title, c.body}

This is a great step towards making join queries more composable, and fixes one of my biggest gripes with Ecto!

However, the proposed solution still doesn't fix the issue completely. I still can't write a single function that can accept a joined or non-joined query. Also, it still makes dealing with multiple joins of the same table difficult because the alias has to be hardcoded in the function (unless it can be interpolated like @mbuhot suggested).

For example, this function is not reusable:

def active(query) do
  from [accounts: a] in query,
    where: a.active_until > datetime_add(fragment("NOW()"), -5, "day")
end

# this would work
query = from t in Thing, join: a in assoc(t, :account), as: :accounts
Account.active(query)

# but this (as far as I understand) would not
query = from a in Account
Account.active(query)

# neither would this
query =
  from t in Thing,
    join: a in assoc(t, :account), as: :accounts,
    join: a2 in assoc(t, :other_account), as: :accounts2
Account.active(query) # i want to filter on :accounts2

You can declare the same join twice as long as it has the same name and it
will be added only once, this allows operations to be idempotent.

We can support dynamic aliases too but our policy regarding those is to

always start with the strict one and added the interpolation when necessary.

José Valimwww.plataformatec.com.br
http://www.plataformatec.com.br/Founder and Director of R&D

You can declare the same join twice as long as it has the same name and it
will be added only once, this allows operations to be idempotent.

Hmm, how will that handle conflicting on: ... options? 'and' them together or first-only or last-only or so?

I'm not big on dynamic aliases personally, though having a function to detect what names already exist on the query so we could branch would be very nice. :-)

I think the current plan is to raise when the on clauses are conflicting.

It would be handy to have also additional join-like function that would work similarly to select_merge, it would check if given named join is present and adds one only if it isn't already present, otherwise it will add it. This would help with composing queries.

@hauleth that's exactly what we are discussing above. :) If the as option is given more than once, it is idempotent as long as you have the same :on.

Oh, yeah, I was mislead by @michalmuskala comment, sorry for the fuss.

Will there be some official way to introspect what join 'names' have already been added to the query, or will we need to manually introspect the internals of the query to figure it out? Basically a function like [:some_name, :another_name] = get_join_names(some_query) (and maybe also a helper like is_joined?(some_query, :some_name) or so)? Although perhaps returning a set (map with true values, or maybe something useful in the values?) would make matching in a case a lot easier. :-)

@hauleth no problem.

Let's get the feature in its basic form in, then we can discuss all possible additions and their use cases.

I was experimenting with adding support for idempotent named joins but hit some issues along the way. Most chief among them is how to reliably determine whether two join expressions are equal. The query expressions in on seem to be non-trivial to compare when the expressions inside contain positional references to bindings. Comparison, at least regarding my current knowledge of Ecto internals and Elixir metaprogramming, would either require expanding those expressions or some other way of altering them - which may carry certain computational cost. If I'm missing some well known way of making such comparison, please let me know.

Another problem I've started to consider is whether the behavior (joins being idempotent) doesn't bring an unwanted element of surprise to the user of the library. Perhaps that's not as big of an issue as I imagine it to be, but in situation where somebody adds a (named) join for the second time and, for some reason, tries to refer to it by its absolute position, may be caught by surprise that the binding was not added. Again, whether that's a big issue that join behaves somewhat differently in that particular case, I don't know.

For reference, I have a WIP branch with some groundwork done for comparing joins (although mind you, as it currently stands, it's a non-working, messy draft): https://github.com/zoldar/ecto/commit/f532d6ecf74b6195ec7c65cf1f07deb3febf8122

If nothing changes, I will try to come up with a way to make the comparison at least work in the coming days - reliably, if possible. If anyone has any ideas or opinions on the problems/questions I've raised here, I'd be happy to hear them.

@zoldar if you have the exact same join, with the exact same on and as, then odds are that you want them to be the same anyway.

Regarding the check, the bindings have to be the same, otherwise they are separate joins, no? I would try implement this by traversing the on_expr (i.e. %JoinExpr{}.on.expr) at compile time and by setting the metadata node in the AST to an empty list, effectively removing contextual or line/file information. You should be able to do this by using Macro.prewalk/2. The key you are going to use to check if the join is the same will be {join_source, :erlang.phash2(prewalked_join_on_expr)}.

Thoughts?

@josevalim

Ok, so, given a following basic example:

"posts"
|> join(:inner, [], b1 in "blogs", on: b1.valid, as: :blogs)
|> join(:inner, [], b2 in "blogs", on: b2.valid, as: :blogs)

I'm getting following on expressions:

{{:., [line: 263], [{:b1, [line: 263], nil}, :valid]}, [line: 263], []}

{{:., [line: 264], [{:b2, [line: 264], nil}, :valid]}, [line: 264], []}

which later become:

{{:., [], [{:&, [], [1]}, :valid]}, [], []}

{{:., [], [{:&, [], [2]}, :valid]}, [], []}

dropping metadata from AST alone seems to not be enough. However, it probably could indeed be possible to traverse the expression and "unify" all occurrences of positional arguments referring to the joined association. I'll look into that in the coming days.

@zoldar You are right. We could use the escaped on but that complicates the code too. I think it is probably best to give up on implementing this feature implicitly. Maybe we would need to find a way to override those as more explicitly or, as @OvermindDL1 said, find a way to check if an as already exists or not. I will remove this item from the todo list. Just documentation and supporting :as in from left. Thank you for exploring those alternatives!

_/me is very of the thought that a name should only exist once, if it is attempted to be used again then it should raise an exception saying as much while showing the existing and the attempted join informations..._

(EDIT: Really though, multiple usages of the same name just screams BUG to me)

Honestly all I'd want is just named joins (if the name is attempted to be used again then hard error), and a function or documented field on the query to get the list of used names (to save me from having to carry them around manually, which I would inevitably have to do in every case that I use named queries). :-)

@josevalim Just wanted to make sure, for avoidance of any doubt:

Add support for :as in from too

is that about supporting :as in bindings list, as in:

from [p, comments: c] in posts_with_comments, ...

(which is already supported - Ecto.Query.Builder.From.build/2 runs bindings through Builder.escape_binding/3; it's just a matter of adding tests and documenting that)?

or is it about aliasing the source of the query, like:

from p in Posts, as: :posts, ...

?

The second form would probably be useful in the context of generating the query by adapter and using the provided aliases instead of generated ones.

@zoldar the second one!

Note that today we keep from in the AST as a simple tuple but I think we need to introduce a FromExpr so we can also store :as inside of it.

Given this in itself is a big task, I would recommend to first send a PR that introduces FromExpr and start to consistently use it. It will require changes to the builder, planner and the adapters.

@zoldar another use case is for naming from is because when composing queries you may not want to know if :comments is a from or join. :)

@zoldar ping! Let me know if you have any questions or if you need any help. :)

@josevalim Hey. I'll probably be pushing a PR with my attempt at from expression refactor some time this week(end), along with questions.

@zoldar with your PR, we can now support :as in from. After we do it and add documentation, this should be fully complete. I am excited!

I have the as in from support ready to push for review. I'll do that
later today.

sob., 7.04.2018, 13:43 uĆŒytkownik JosĂ© Valim notifications@github.com
napisaƂ:

@zoldar https://github.com/zoldar with your PR, we can now support :as
in from. After we do it and add documentation, this should be fully
complete. I am excited!

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/elixir-ecto/ecto/issues/2389#issuecomment-379463479,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAj6P2YpWWiVFoWs1-wtXVBtl0R81AuVks5tmKZWgaJpZM4Rnuo9
.

@zoldar :heart: :green_heart: :blue_heart: :yellow_heart: :purple_heart:

@josevalim I think it would be good to expose a way to check if a given name is not already used for bindings - as it was mentioned in https://github.com/elixir-ecto/ecto/issues/2389#issuecomment-369054839 . Although it's probably not that hard to do (with, say, Map.has_key?(query.aliases, :foo)), it would be handy to have such function in the API - something like Ecto.Query.has_binding?/2. This would be helpful, for instance, in cases when a query is built from filters which can be applied multiple times and only the first call of each should add relevant joins.

WDYT? \cc @OvermindDL1

@zoldar I think that is definitely an important piece to have, since we don't want them relying on an internal of the query. I could write this utility myself, but one thing I know I'll need for my API query builder is also "give me an unused alias".

Sounds good to me! Although I would wait until we have a concrete use case

and move forward. :)

José Valimwww.plataformatec.com.br
http://www.plataformatec.com.br/Founder and Director of R&D

@zoldar that's exactly what happens when you add the join with same source and on and same alias multiple times - it will be added just once. I'm not sure I see what would be an advantage of checking it manually.

@michalmuskala Hmm, as far as the implementation for named bindings goes, it will explicitly crash, unless I misunderstood? https://github.com/elixir-ecto/ecto/blob/ce87525ab0d88746a231bb2a06019b585c8be00a/test/ecto/query_test.exs#L244

@zachdaniel Sure, go for it if you like.

but one thing I know I'll need for my API query builder is also "give me an unused alias".

what do you mean by that?

It will raise if you try to join two different things using the same alias. If you join the same thing twice with the same alias, the join will be deduplicated for you - that's the major feature of named joins.

When we developed the feature we agreed that we wouldn’t allow the same
binding twice even for the same source because it is not really practical.
In most cases to reconstruct the binding you need previous associations or

on clauses.

José Valimwww.plataformatec.com.br
http://www.plataformatec.com.br/Founder and Director of R&D

I would say for now the best we can do is to have people try this out and
give feedback on what they are trying to build. Then we will figure which

new APIs to add and what to make more flexible.

José Valimwww.plataformatec.com.br
http://www.plataformatec.com.br/Founder and Director of R&D

@zoldar so in my API, you can provide filters on joined values, and you can also include related entities. The current implementation has to do some very hacky things to accomplish this, so lets not dive into that too much. But what I'd need is something equivalent of "give me an alias that is not in use in this query already", so that I can tie it to a specific reference to an association in our request.

Like I said though, it wouldn't bother me if no one wants this, because its easy enough to generate aliases until I get one that is not in use, but I would rather use the function you talked about to check if an alias is in use. so like

...building query

# encounter a filter/include request on an association
alias = Ecto.Query.new_alias(query)
association = util_to_figure_out_assoc # my own thing
filter = util_to_figure_out_dynamic_filter # my own thing

new_query =
  from row in query,
    join: associated in assoc(row, ^association),
    as: ^alias,
    on: ^filter

or something along those lines.

@josevalim is right - it's best to wait until people start trying that out in practice. I actually had a real life use case on my mind but I'll hold off with proposing anything until/if I actually try to use that feature on an actual living code.

Sorry for stirring the pond in a closed issue on Sunday 🙈

@zoldar as soon as I feel like I have the tools to replace my existing code w/ named bindings (there are a few things I need, and this is one of them) then I will be trying it out in practice.

But we can totally discuss this in a different format later this week :D I'll just see if I can try the refactor that I have in mind w/ ecto master and see what things I run into, and then report back.

The names fully work At compile time, so we can’t return a dynamic name to
use (yet). If we approach the problem as “we need dynamic names”, then the
only solution we will find is “dynamic names”. But if we ask “how to best

compose this query”, we may find more answers.

José Valimwww.plataformatec.com.br
http://www.plataformatec.com.br/Founder and Director of R&D

@josevalim that is very true :D My use case is that I'm accepting a (very graphql-ish) query from user input, something like

%{
  upvotes: %{greater_than: 1},
  any_of: [
    %{comments: %{author_id: 1}},
    %{comments: %{upvotes: %{less_than: 1}}}
  ]
}

We already support a significant set of filters and boolean combinators like that, but its done w/ quite a bit of metaprogramming that I'd like to make unnecessary. I am already tying specific association joins that I require to binding indices externally to the query, and then building query expressions manually. I was hoping to replace that with named bindings, but that would require me to be able to specify a binding at runtime while constructing a query.

@zachdaniel in this case I would try to pre-process the data and move the any_of inside each association.

The other idea is to have something like has_named_binding? that @zoldar mentioned and make sure the bindings is introduced at some point. So you would have functions like:

def ensure_comments(query) do
  if has_named_binding?(query, :comments) do
    query
  else
    join(query, [p], c in assoc(p, :comments), as: :comments)
  end
end

We have something like 60 resources most of which have upwards of 10 relationships (and some many more) so having to have a handwritten (or even compile time generated) list of functions that can join + name that association isn't really feasible. I'll write something up that explains our use case more in depth later, since we're looking to open source our framework in the next 6 months to a year as time allows to clean it up and extract the parts that are specific to us.

This would be helpful, for instance, in cases when a query is built from filters which can be applied multiple times and only the first call of each should add relevant joins.

WDYT? \cc @OvermindDL1

Please!

that's exactly what happens when you add the join with same source and on and same alias multiple times - it will be added just once. I'm not sure I see what would be an advantage of checking it manually.

I'll ever only add it once, and honestly if it was added multiple times with the same name (even if an identical query) I really say it should just error out as that really is an odd logic issue there.

However, checking if the name exists is highly valuable as an earlier part could have joined it with a name with a special unique set of dynamic bindings, and later something uses that join (perhaps further joins).

It will raise if you try to join two different things using the same alias. If you join the same thing twice with the same alias, the join will be deduplicated for you - that's the major feature of named joins.

Uh, this sounds like a major logic bug just waiting to happen...

@josevalim is right - it's best to wait until people start trying that out in practice. I actually had a real life use case on my mind but I'll hold off with proposing anything until/if I actually try to use that feature on an actual living code.

I have lots of real-life use-cases that I've spoken of on the forums and so forth, I can detail some of them here if curious? My use-cases don't _need_ a test for if a query has a binding contained within it, I could carry that information in a tuple or map or so, but that is painful to be carrying another thing along with the query as then I'd have to thread that through everywhere, have a final thing that pulls it out, etc... etc... It is just such a major code simplification to just let me query the names already in it, especially as I don't have to worry about it getting out of sync or anything of the sort.

My use case is that I'm accepting a (very graphql-ish) query from user input, something like

Ah, that's a good use-case, my uses are very similar on the front-side as well, however I have other reasons needed for named joins and that is because I have to heavily optimize a very very slow oracle server link and need to minimize the number of queries (which means utterly ginormous queries) while optimizing them as much as possible. Joining anywhere from 8 to 22 tables is entirely common.

Folks, this issue is closed and the basic feature is in. If you feel you need more functionality, please provide a complete proposal. You can use this topic as an example. Please provide a snippet with how the code looks today, the problems the old code has, and then describe the solution, with advantages, disadvantages and the updated code samples. Thank you.

@josevalim if I understand @OvermindDL1 correctly (and if so, I completely agree) is that we want to reduce functionality proposed there, not add to it. The part we are afraid of is:

If you join the same thing twice with the same alias, the join will be deduplicated for you

So if I understand correctly if I now do this:

query1 = from foo in Foo, join: bar in Bar, on: bar.foo_id == foo.id
query2 = from foo in query1, join: bar in Bar, on: bar.foo_id == foo.id

it will silently pass. However I (and I think @OvermindDL1 also) think that this is dangerous behaviour and it should fail the same as when used like:

query1 = from foo in Foo, join: bar in Bar, on: bar.foo_id == foo.id
query2 = from foo in query1, join: bar in Bar, on: bar.foo_id2 == foo.id

It should provide much less confusion IMHO and also allow us to add such behaviour later if we find it usable. Other way around it would be infeasible to do so (I mean to remove such feature).

EDIT: To join and merge I would propose join_merge macro that would work like currently proposed doubled named join. I think it would be less confusing to the people.

The behavior you mention was discarded along the way and it has not been
implemented. :)

Btw the goal was never to ignore it implicitly, only when named. But it was

not implemented either way.

José Valimwww.plataformatec.com.br
http://www.plataformatec.com.br/Founder and Director of R&D

Is this feature available now in latest release? (coz I dont see in docs of version 2.2.10 which was released on Apr 8)

No the feature is only in master. You can click the commit and it generally

shows the branches and tags it is available.

José Valimwww.plataformatec.com.br
http://www.plataformatec.com.br/Founder and Director of R&D

Sure. Thanks for the quick response.

After we added named joins there is no way to ensure named join exists. Here is an example where it's useful:

  def with_preloaded_account(queryable) do
    if alias_defined?(queryable, :account) do
      queryable
    else
      queryable
      |> join(:left, [assignment], account in assoc(assignment, :account), as: :account)
      |> preload([assignment, account: account], account: account)
    end
  end

  def alias_defined?(queryable, alias_name) when is_atom(queryable) do
    queryable
    |> Ecto.Queryable.to_query()
    |> alias_defined?(alias_name)
  end

  def alias_defined?(queryable, alias_name) do
    Map.has_key?(queryable.aliases, alias_name)
  end

This way you have more flexibility and can do cool stuff like this which is used in domain context:

  def with_preloads(queryable, preload: []) do
    queryable
  end

  def with_preloads(queryable, preload: preload) when is_atom(preload) do
    with_preloads(queryable, preload: List.wrap(preload))
  end

  def with_preloads(queryable, preload: [:user | rest_preloads]) do
    queryable
    |> with_preloaded_user()
    |> with_preloads(preload: rest_preloads)
  end

  def with_preloads(queryable, preload: [:account | rest_preloads]) do
    queryable
    |> with_preloaded_account()
    |> with_preloads(preload: rest_preloads)
  end

  def with_preloads(queryable, []) do
    queryable
  end

I saw in multiple projects that by having preloads option in domain context you can save a lot of duplicated code and do not execute separate queries.

So should we add this to Ecto.Query?

@AndrewDryga if I read your request correctly it is the same as there
https://github.com/elixir-ecto/ecto/issues/2389#issuecomment-379582851

@hauleth yeah, did not saw that one. This is exactly what we ended up doing.

@AndrewDryga we would appreciate a PR that adds has_named_binding?(query, key). :)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

alaadahmed picture alaadahmed  Â·  4Comments

nathanjohnson320 picture nathanjohnson320  Â·  4Comments

shahryarjb picture shahryarjb  Â·  3Comments

a12e picture a12e  Â·  4Comments

stavro picture stavro  Â·  4Comments