Ecto: Joins in queries are not respecting a schema's prefix, fix listed

Created on 9 Feb 2017  路  17Comments  路  Source: elixir-ecto/ecto

Precheck

As troubleshot here: https://elixirforum.com/t/ecto-cross-schema-join/3611/4

Environment

  • Elixir version (elixir -v):
  • Database and version (PostgreSQL 9.4, MongoDB 3.2, etc.):
  • Ecto version (mix deps):
  • Database adapter and version (mix deps):
  • Operating system:
$ elixir -v
Erlang/OTP 19 [erts-8.2] [64-bit] [smp:8:8] [async-threads:10]

Elixir 1.4.0

$ psql --version
PostgreSQL 9.5.3, compiled by Visual C++ build 1800, 64-bit

$ mix deps | grep ' ecto '
* ecto 2.1.3 (Hex package) (mix)

$ mix deps | grep ' postgrex '
* postgrex 0.13.0 (Hex package) (mix)

$ uname -a
MINGW64_NT-10.0 <User> 2.5.0(0.295/5/3) 2016-03-31 18:47 x86_64 Msys

Current behavior

Right now if you have, say, two tables, that have two different @schema_prefix "blah" set, they are not honored in the PostgreSQL system (unsure about others) because of the create_names/4 function at:
https://github.com/elixir-ecto/ecto/blob/master/lib/ecto/adapters/postgres/connection.ex#L570-L572

          {table, schema} ->
            name = [String.first(table) | Integer.to_string(pos)]
            {quote_table(prefix, table), name, schema}

Specifically that case branch is entirely ignoring the schema's @schema_prefix and using the global one, regardless of how entirely wrong it may be considering it is set to the value of the one in the first schema.

Expected behavior

For joins to not get their prefix corrupted and to actually work so the PostgreSQL connection does not throw an error back.

Simple fix

A simple fix that works in my quite large codebase is to just add a single line at the start of that case branch, to make the above code become this:

          {table, schema} ->
            prefix = schema.__schema__(:prefix)
            name = [String.first(table) | Integer.to_string(pos)]
            {quote_table(prefix, table), name, schema}

However, this breaks the ability to override the prefix globally, which honestly I prefer, however a more proper fix (and why this is not a PR when I already have the code ready, though I can PR this if you prefer...) would be to eradicate the :prefix option for Repo.all and its kin and instead come up with some way to override the prefix on a per-schema basis, of which two methods come immediately to mind.

Assuming you have a schema called Post and one called Author and each have a different @schema_prefix set, and if you have this query:

query = from p in Post,
  join: a in Author, on: p.author_id = a.id

Method 1

I prefer this best.

You could override it like this:

query = from p in Post,
  join: a in {"newprefix", Author}, on: p.author_id = a.id

Or some variation of 'tagging' the module some-how.

Method 2

You could override it by passing a list of schema's to the action:

Repo.all(query, prefix: [Author: "newprefix", Post: "yetanotherprefix"])

This gives the ability to override it at the call site instead of the generation site, however it would not allow you to, for example, have 2 Post schema's in the same query in two different prefix's (not a case I've ever needed but it is supported by SQL and I've seen the occasional odd system do it).

Note

Either method could still be used with a :prefix key option on the action that could override them 'all' if that is really really wanted for whatever reason, but regardless, the default should be to obey the settings of the schema itself.

Most helpful comment

Please revisit this, I've faced this issue when trying to migrate and old rails app to phoenix.

All 17 comments

The prefix feature is meant primarily as a way to support multi-tenancy in ecto, and thus providing isolation between different prefixes is highly desired. It is fully intentional that a prefix is global for a query.

Changing how prefixes work, especially the fact that a prefix of the primary source of the query is considered a prefix for the whole query is definitely not a backwards compatible change and would need to be scheduled for Ecto 3.0 if we decide to go with that.

The {"source", Schema} notation is not appropriate for specifying prefix, as it's already used in other places for specifying an alternative database source (table) name for the schema.

Extending the prefix option to be more extensible (by accepting a map) might be an acceptable solution, but a good API needs to be designed.

Yeah I was not sure about a replacement API, hence why there is no PR as of yet, but the simple fix is enough if anyone else needs to join across schemas so that alone may be useful of an immediate use for people in the future until the issue is resolved. :-)

If curious about my use-case:
We have PostgreSQL with the different applications in different schema's, however we are making a server that can access all the data via a web interface, and to prevent duplication of data we are joining across schemas, which works fine with no issue, just it is impossible to do in Ecto's queries it seems because of this issue, thankfully it was a simple work-around since we do not ever override schema's, they are set-in-stone so to say.

The simple fix you provided is not backwards compatible, so unfortunately, however I may feel about it, it's not acceptable currently.

If you have tables in different schemas, but they don't change: an acceptable solution might be to set a proper search_path, which would allow you accessing tables from different schemas in the cross-schema application.

Changing how prefixes work, especially the fact that a prefix of the primary source of the query is considered a prefix for the whole query is definitely not a backwards compatible change and would need to be scheduled for Ecto 3.0 if we decide to go with that.

It could be if an option was given to specify that on the Repo or so via the original use Ecto.Repo, otp_app: :my_server, say, something like use Ecto.Repo, otp_app: :my_server, multi_prefix: true or so. That would then be backwards compatible.

My main curiosity, it makes sense for the :prefix key on something like Repo.all to override the default nil prefix on a schema, but is there ever a reason for it to override a @schema_prefix that is not nil because if a schema has a prefix set like that then it should be well known that it only works in that context. Perhaps that might be a better change?

The simple fix you provided is not backwards compatible, so unfortunately, however I may feel about it, it's not acceptable currently.

Oh it absolute is not and should not be taken as stated in the first message. ^.^
A proper API should be developed for sure.

If you have tables in different schemas, but they don't change: an acceptable solution might be to set a proper search_path, which would allow you accessing tables from different schemas in the cross-schema application.

If I could I would, but I do not have the access to do that, big government managed databases take years to make changes especially with FERPA and HIPAA requirements both in the mix... >.>

My main curiosity, it makes sense for the :prefix key on something like Repo.all to override the default nil prefix on a schema, but is there ever a reason for it to override a @schema_prefix that is not nil because if a schema has a prefix set like that then it should be well known that it only works in that context.

The rule with prefixes is that it goes from generic to localized:

global > schema > :prefix option

It would be inconsistent to not allow the developer to override at the callsite something that has been set in a module.

Extending the prefix option to be more extensible (by accepting a map) might be an acceptable solution, but a good API needs to be designed.

Yes, that's likely the best way to go. Until then, you have to run separate queries or raw SQL, since cross-prefix queries are forbidden pretty much by design.

It would be inconsistent to not allow the developer to override at the callsite something that has been set in a module.

Very true, but that is not what is happening here. I am not specifying a :prefix key on the all call and yet the joins are still being overridden by the 'first listed' schema's prefix, whatever the first may be. For example, swapping the aboves:

Repo.all(from(p in Post,
  join: a in Author, on: p.author_id = a.id))

To become:

Repo.all(from(a in Author,
  join: p in Post, on: p.author_id = a.id))

Also swaps the prefix's used, which is entirely an unexpected outcome. However I would expect:

Repo.all(from(a in Author,
  join: p in Post, on: p.author_id = a.id), prefix: "blah")

To override both regardless.

Yes, that's likely the best way to go. Until then, you have to run separate queries or raw SQL, since cross-prefix queries are forbidden pretty much by design.

What was the reasoning behind this design as I've not found it documented anywhere yet? It makes entirely valid (and often used, at least both at my current job and at my last job in two entirely different sectors, one government, the other 'big business') queries categorically impossible.

What was the reasoning behind this design as I've not found it documented anywhere yet?

It was the use case we designed the feature for. We would need to extend the feature to support other use cases.

It was the use case we designed the feature for. We would need to extend the feature to support other use cases.

Ah, so just a case of 'Its not been needed yet'. :-)

I am still curious, if a :prefix key is not defined, why do the join prefix's get ignored though? That still seems like a major and surprising bug...

Because in multi-tenancy you don't want to perform queries across tenants. It is a documented behaviour and not a bug (albeit I agree 100% it is a limitation).

I'm curious, would anyone really do multi-tenancy with Postgres Schemas instead of via Postgres Users? That seems... exceptionally odd and highly unsafe as it could potentially give one tenant view over all tenant's potentially private information if they found any kind of exploit.

Just to weigh in on this, I'm currently also working on an application where I will need to be able to query across schemas. My use case is a multi-tenant CMS of sorts, however the separation of data is not at tenant boundaries, but rather at site boundaries. One organization may own many sites, and conversely multiple organizations/users within them could have permissions to access/edit a single site. A single user may also belong to multiple organizations.

Therefore my organizations, users, and other similar concerns are in the public schema while each separate site has a schema of its own (as the common case is end-users viewing sites, which greatly benefits from this).

As regards this feature鈥攖his is perhaps not as clean, but here are another two ideas for this syntax:

Option 3

query = from p in Post,
  join: a in {Author, prefix: "newprefix"}, on: p.author_id = a.id

I can immediately see the potential disadvantage here, though I'm not familiar enough with the performance of Erlang guards to evaluate how bad it would be: namely, in the logic which currently may accept a 2-tuple for the use-case already mentioned, one would have to check if the second element is a keyword list or not.

Option 4

query = from p in Post,
  join: a in {:prefixed, "newprefix", Author}, on: p.author_id = a.id

This might be a clean solution as it would be very easy to pattern-match on, and though it is lengthy, as noted this is a rare-enough use case that I think it would be ok. The Author in the example could then be another tuple in the case where we are pulling from another table (the current use for tuples as data sources).

I'd be happy to work on a proposal/PR for this if it seems reasonable.

Can always have a prefix call as well, kind of like:

query = from p in Post,
  join: a in prefix("newprefix", Author), on: p.author_id = a.id

As that would fit the style of the rest of the query helpers as well.

As long the default is for the @schema_prefix "blah" not to be ignored (unless a prefix: key is set on the repo function, which should rightly override it) that would be best. :-)

Out of all the proposals here, I like the additional prefix/2 syntax the most appealing (I was actually intending to propose something similar).

The only issue is that the right side of in in from p in Post is evaluated outside of the query syntax context, so it wouldn't be possible to do from p in prefixed("prefix", Post). That's the same issue we're facing with using fragment as the main query source.

@OvermindDL1 unfortunately changing how @schema_prefix works is not an option until Ecto 3.0

@OvermindDL1 unfortunately changing how @schema_prefix works is not an option until Ecto 3.0

Hmm, you have any other work-around for this? I have no control over the schema layout in this database and I really do not fancy messing with raw SQL (much as I know it, it is still not safe).

@michalmuskala we could probably introduce a helper prefix/2 macro which would expand to the syntax in my Option 4.

Please revisit this, I've faced this issue when trying to migrate and old rails app to phoenix.

Was this page helpful?
0 / 5 - 0 ratings