Efcore: Query: Translate IQueryable.Concat/Union/Intersect/Except/etc. to server

Created on 19 Oct 2016  路  49Comments  路  Source: dotnet/efcore

EF version:1.0.1.0

``` C#
var q1 = from m in db.Table1
select new
{
name = m.Name,
};
var q2 = from m in db.Table2
select new
{
name = m.Name,
};
q1 = q1.Concat(q2);
var result = q1.ToArray();

I watch the sql string by ILogger, the sql string is not union two table, the dbconnection will excute two dbcommand for each table like this:
command 1:

``` SQL
SELECT "m"."Name"
FROM "Table1" AS "m"

command 2:

SELECT "m"."Name"
FROM "Table2" AS "m"

at EF 6.1 version, it's make a sql string like this:

(SELECT "m"."Name"
FROM "Table1" AS "m") union
(SELECT "m"."Name"
FROM "Table2" AS "m")
closed-fixed punted-for-2.1 type-enhancement

Most helpful comment

c# var q1 = from m in db.Table1 select new { name = m.Name, }; var q2 = from m in db.Table2 select new { name = m.Name, }; q1 = q1.Concat(q2); var result = q1.OrderBy(m=>m.Name).Skip(1000).Take(10).ToArray();

If i use the ef core like that,it also make two sql string , excute two dbcommand,and concatenate the results in memory, and then excute "OrderBy(m=>m.Name).Skip(1000).Take(10)",
I think it's bad performance

All 49 comments

@simpleway2016 In general the LINQ implementation of EF Core only attempts to translate specific patterns to equivalent SQL expressions for relational databases.

In the particular case of Queryable.Concat() an appropriate translation could be two SQL queries with UNION ALL in between but instead we concatenate the results in memory.

Changing this to producing a single SQL query on the server isn't currently a top priority because we believe that for many common scenarios (e.g. for the sample queries you posted) there isn't a significant difference.

Could you elaborate on what scenario you think performing the union on the server would be more useful for you?

c# var q1 = from m in db.Table1 select new { name = m.Name, }; var q2 = from m in db.Table2 select new { name = m.Name, }; q1 = q1.Concat(q2); var result = q1.OrderBy(m=>m.Name).Skip(1000).Take(10).ToArray();

If i use the ef core like that,it also make two sql string , excute two dbcommand,and concatenate the results in memory, and then excute "OrderBy(m=>m.Name).Skip(1000).Take(10)",
I think it's bad performance

I would really like to see this feature too. As @simpleway2016 said above, paging results of a union in the database is much more efficient.

It would have a significant performance difference if there is a large amount of latency between the client and the server (such as being in a different timezone).

Consider a banking application where it needs to display a number of types of transactions, each of which has its own specific properties but also shares some common data with other types, in one page, with paging and sorting. This is a perfect case where translating the concat query into one union all statement would be needed.

Please put this feature on a higher priority ,We really need this feature, it should have much higher priority than Lazy loading, Imagine there are 100,000 rows of data and we only need top 10

@ajcvickers what's the state of this ?

@John0King It's still scheduled for 2.1, but as far as I am aware @smitpatel has not implemented it yet. (It's always possible it won't make 2.1 given we don't know the full schedule yet.)

Prompt to me, where I can read all supported things for EF Core? (e.t. Concat....)

@ajcvickers EF Core 2.1 Missing this feature again ? :disappointed:

I would be keen to see except implemented too. Any updates on the progress?

Enumerable | T-SQL
--------------|-------
Concat | UNION ALL
Except | EXCEPT
Intersect | INTERSECT
Union | UNION

Just a possible workaround for @John0King example

Imagine there are 100,000 rows of data and we only need top 10

You can Select top 10 from the first table, and Select top 10 from the second table, and then
union them in memory and show first 10.

It's a side note, we also suffer from inability of EF Core to implement server side Union.

@michael-freidgeim-webjet I already use this for some case, but the really problem is paging .

Waiting for this, to know If I'll have t么 change a lot of code in a migration to Core. It would be nice to still can do Union on the SQL Server

Let's wait like "Homunculus loxodontus"

Voting up for the in-server UNION.
Imagine you have to union 5 similar queries, each produces 10001 rows: 10000 rows are same for each, and 1 is different.
If UNION runs on server - 10005 rows are transferred to a client.
If UNION runs locally - 50005 rows are transferred to a client.
Makes a difference.

Hearted this feature, seems like basic functionality.
I've searched my ass off for this extra SELECT query returning 8000 rows while I instructed a top 100!

context.Table1.Select(...)
.Except(context.Table2.Where(...).Select(...))
.OrderBy(...).Skip(...).Take(100).ToList();

Huge performance hit!
Makes set operators completely useless for any serious project.

Server side UNION:
Linq2Sql - yes
Entity Framework - yes
EFCore 1.0 - no
EFCore 2.0 - no
EFCore 2.1 - no?
Is there any workarounds or this leaves us in the struggle with outdated technologies?

Unfortunately, the lack of this functionality has made moving to EFCore a non-starter, so still anxiously awaiting it's implementation.

Whilst this would be nice to have, I use the Execute methods of the database when things aren't available. No reason not to use what is a much better library than previous version.

@Allann Suppose a lot of DTOs combined in a various ways for union projections, where execution partly occurs on client side, partly on DBs. With Execute|FromSql it's impossible.

@Allann, I even build SPs or UDFs, but this is an additional level of code, which should not be there for simple cases. Also dynamically generated conditions are hard to write without an O/RM. Yes, we were doing it in a pre-O/RM era, but why go back in past?

Any news on this ?

@John0King This issue is in the Backlog milestone. This means that it is not going to happen for the 3.0 release. We will re-assess the backlog following the 3.0 release and consider this item at that time. However, keep in mind that there are many other high priority features with which it will be competing for resources.

For who are thinking in migrate from EF6 to EF Core, this is a big pain head because the behavior expect from Union/Concat is very different from EF6 that uses UNION ALL sintaxe, what can cause performance issues.
I'm being forced to use pure SQL to make work what i need, besides i have a performance issue if i use Union/Concat like is sugested above. :(

Hello, I just wanted to weight in with my comment: We were planning to migrate our applications that are still using Linq2SQL, and this is a no-go right now because of limitations like this one. The performance drop would be too important and using plain SQL is not really an option ...

This is a huge downside to EF core right now for me. It's something I just expected out of the box - I didn't want to have to plan my queries around EF Core's limitations

As mentioned before its kinda basic functionality and extremely important for ANY production application working with relatively big datasets. So can we atleast be sure it will be in 3.0? And what are the very estimates on 3.0 release date? I've found only that 2.2 will come Q4 2018, so we atleast need some dates have in mind. Thank you.

@ajcvickers any news on this ? I see u put a label on this. that does the consider-for-next-release mean ?

@John0King It means we're going to consider it for 3.0 as part of a re-evaluation of that milestone. (Currently the milestone is over-booked, so in general things are more likely to be removed than added, but as we have gained more understanding around things like the evolution of the query pipeline, it has made more important some features that were not planned for 3.0.)

In fact, this is giving a huge headache for many of us.

Organizations with databases containing many tables with similar or matching data patterns, or for whom the number of concurrent database connections is a concern, are really going to need this feature. Organizations for whom both concerns are relevant will need to be careful about adopting EF Core until this feature is integrated. I am hoping this feature is re-evaluated into the 3.0 release, or into a very near-future release. Not having this feature isn't the end of the world (not for me, at least), but it will definitely make things harder for some organizations, especially ones that are strapped for cash (like government organizations).

@ajcvickers any news on this? when are you going to release version 3.0, approximate date?

Thanks

@mmikirtumov The plan is to release EF Core 3.0 later this year at the same time as .NET Core 3.0.

@ajcvickers Does this feature on board for sure this time?

Any ETA? It's one of the main blockers for my project to migrate from EF to EF Core

Thank you

@John0King Nothing is for sure until it ships. In this case, it's in the plan to fix it for 3.0, but that could change depending on what happens between now and release time.

This would be incredibly helpful for us as well. Would love to see it in 3.0

This is a must-have feature for a lot of cases. Right now the only way is to create a raw sql query.

Using the new DbQuery feature and some SQL made this irrelevant for me, in-fact we even replaced a few EF generated SQL with some tuned hand-cut SQL to improved performance and result sets. I can highly recommend that approach for others wanting this type of functionality.

If it is still relevant, here is the solution: use Dapper.

The DbQuery feature (and dapper) both work well if you have a static query that you need to run that needs to be filtered/paged, but if your query requires dynamic changes, then those don't work, or quickly become a maintenance problem. After all, you can always just run your query using a DbCommand/DbDataReader, and instantiate and populate your own classes, but that assumes that the person who creates the SQL is both good at it, and every person who maintains it after that is also good at it, and quite likely, you also get tied to a specific database vendor instead of staying database agnostic. Which, really is the whole point of an ORM in the first place, is it not?

Here are various notes on database-specific details for set operations (will update as I come across others).

Database documentation

SqlServer: UNION, INTERSECT/EXCEPT
Sqlite: https://www.sqlite.org/lang_select.html (Compound Select Statements)
PostgreSQL: https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION
MySQL: https://dev.mysql.com/doc/refman/8.0/en/union.html

Notes

  • All databases seem to support OrderBy, Take and Skip on the set operation, without needing a subquery pushdown (e.g. SELECT 1 AS x UNION SELECT 2 ORDER BY x LIMIT 1 OFFSET 1).
  • PostgreSQL and MySQL allow OrderBy in operands, but require them to be surrounded by parentheses (e.g. (SELECT 1 AS x ORDER BY x) UNION (SELECT 2 AS x ORDER BY x)). SqlServer and Sqlite do not support this.

    • This means that one can have OrderBy/Take/Skip both in the operands and on the set operation itself. Therefore, the set operation behaves more like an independent expression, and less like a clause of SelectExpression.

  • PostgreSQL, MySQL and SqlServer allow Take in operands, but not Sqlite (e.g. SELECT TOP(1) 1 AS x UNION SELECT 2).
  • SqlServer and PostgreSQL assign higher precedence to INTERSECT than to UNION/EXCEPT; this means we need to use parentheses (i.e. a.Union(b).Intersect(c) needs to be rendered as (a UNION b) INTERSECT c). The current logic is to surround with parentheses all set operations which are contained in other set operations, and which aren't of the same type (e.g. UNION/INTERSECT).
  • Sqlite does not support parentheses on set operation operands at all (e.g. (a) UNION (b)). However, it evaluates left-to-right, so rendering LINQ expression in-order into SQL seems like it would work - as long as we never generate parentheses.
  • MySQL supports UNION, but not INTERSECT or EXCEPT.
  • All databases support UNION ALL (which is Linq Concat). PostgreSQL also supports INTERSECT ALL and EXCEPT ALL (no Distinct).

@roji please reverify that SQL Server does not support take/order by within subset operations. I believe it should, but not if you only specify order by without a top/take.

I believe:
(SELECT TOP 1 1 as x) UNION (SELECT TOP 1 2)
and
(SELECT TOP 1 1 as x ORDER BY 1) UNION (SELECT TOP 1 2)
should both work, while
(SELECT 1 as x ORDER BY 1) UNION (SELECT TOP 1 2)
makes very little sense because the union doesn鈥檛 guarantee order, so having the order by in the first subset operation is useless.

@kingmotley trying to run your 2nd query ((SELECT TOP 1 1 as x ORDER BY 1) UNION (SELECT TOP 1 2)) errors with "Incorrect syntax near the keyword 'ORDER'". (MSSQL 2017). TOP is indeed allowed in set operands (as I wrote above).

Are you seeing a different behavior?

Sorry, I didn't have access to an instance of Sql Server when I originally posted, but was pretty certain that you could do two unions of a limited set. I assumed (wrongly) that since views have the requirement that they can only have a TOP clause when there is also an ORDER BY, then the 2nd query would work. But in practice, using a view like that in a query (and then inlining it) would look more like the following, which does work:

SELECT * FROM (SELECT TOP 1 1 as x ORDER BY 1) t1 UNION SELECT * FROM (SELECT TOP 1 2 as x ORDER BY 1) t2

Guys, you should try https://github.com/tuespetre/Impatient

That's good suggestion for Sql Server, but we use MySQL

Was this page helpful?
0 / 5 - 0 ratings