Efcore: Query with GroupBy or GroupJoin throws exception

Created on 9 Aug 2019  路  42Comments  路  Source: dotnet/efcore

If you run into issue with GroupBy/GroupJoin and arrive here, then please take time to read https://github.com/dotnet/efcore/issues/17068#issuecomment-586464350 fully to direct your feedback in specific issue for your specific scenario.


Read our documentation about this here


Group By

Linq GroupBy operator allows to create IGrouping<TKey, TElement> where TKey & TElement could be arbitrary types. Further, IGrouping implements IEnumerable<TElement> which means you can compose over using any queryable operator after grouping.
In contrast SQL GROUP BY keyword is very restrictive. In SQL, you can only use scalar values in GROUP BY. Most database also allows only referencing grouping key columns or aggregate applied over any other column. Hence selecting a column which is not in key or does not have aggregate applied is not possible.
Due to this mismatch, not all linq GroupBy can be translated to SQL GROUP BY. In EF Core 3.0 we have support for SQL GROUP BY but not for all linq GroupBy. There are few bugs/limitation in this translation for which we have various issues filed, mainly for scenarios where Distinct or a predicate is applied before aggregate operation.
Another GroupBy scenario which can be translated to server is selecting first element of each group which is being tracked by #13805
Most other linq GroupBy cannot be evaluated on server and must be evaluated on client side. This issue tracks about allowing EF Core to do this client side grouping implicitly. Absence of this feature means users would have to manually call AsEnumerable before GroupBy to explicitly falling into client side grouping.

GroupJoin

Linq GroupJoin is a queryable operator which does not have equivalent SQL keyword. The biggest use of GroupJoin for an ORM is to generate Left join on server side as mentioned here. Customers.GroupJoin(Orders,...) would create IEnumerable<Order> for each Customer. As with GroupBy, you can compose over this in Linq but there is no translation. Again, GroupJoin requires client side grouping to generate correct result. This issue also tracks adding implicit support for it.


In recent discussion it came up that, allowing this may be pit of failure (with escape hatch at the bottom :trollface: ).

closed-not-needed punted-for-3.0 type-enhancement

Most helpful comment

@smitpatel

Regarding GroupJoin:

I'm quite surprised. Never thought the GroupJoin key selectors would be the reason for not supporting it. But you do support Join which should have exactly the same issues you've mentioned with nulls in keys. You do even support the "official" LINQ left outer join pattern which contains GroupJoin.

To recap, from the 3 possible join based LINQ patterns

(1) JOIN

from a in As
join b in Bs on some_key(a) equals some_key(b)
select new { a, b }

(2) LEFT JOIN

from a in As
join b in Bs on some_key(a) equals some_key(b) into a_Bs
from b in a_Bs.DefaultIfEmpty()
select new { a, b }

(3)

from a in As
join b in Bs on some_key(a) equals some_key(b) into a_Bs
select new { a, a_Bs }

for some reason you don't want to support the third one.

All they have the same potential key comparison issue. And all they can be rewritten to use correlated subquery rather than join. I guess you see the point.

But please consider supporting the last one rather than removing the first two :-)


Regarding GroupBy:

You definitely need to add support or provide alternative solution for the common "N items per group" problem. You've added a very nice row_number over support which solves it for correlated subqueries, but people often need that after GroupBy, especially the top 1 (last) item of group. See for instance the following SO post Problem with EF OrderBy after migration to .net core 3.1


Regarding client evaluation:

Switching to explicit client evaluation of code which used 2.x Async API is not easy - see the following SO post Converting EF Core queries from 2.2 to 3.0 - async await

I think EF Core should probably provide its own way to switch to client evaluation (other than AsEnumerable() or AsAsyncEnumerable()) which keeps the result IQueryable>T>, but somehow "switches" the context from there. Just a thought. AsEnumerable() is good and was what we are suggesting EF6 people who use synchronous LINQ, but nowadays many people use "modern" frameworks and Async APIs. You gave them async APIs for Queryable and you can't simple take it back without providing a good alternative (I mean, you can of course, but that's not good).


Currently I'm the biggest EF Core supporter on SO and trust me, some 3.x decisions like this really created a lot of problems/frustration for people coming from EF6 or EF Core 2.x, and for sure are not good for EF Core future. People are start seeking/promoting alternative frameworks, which at the end won't be good for anyone who invested time/efforts there.

All 42 comments

https://github.com/aspnet/EntityFrameworkCore/issues/12560#issuecomment-525084333 is a good example of why it's dangerous to implement implicit client evaluation of GroupBy. Requiring users to do explicit client evaluation with AsEnumerable forces them to think and understand the distinctions and perf impact, etc.

Reiterating previous thoughts, the more I think of it, the more I think we should aim to avoid implicit client evaluation for any scenario where client evaluation is (significantly) more expensive than server evaluation, when possible.

In retrospect it actually seems like quite a mistake for LINQ to have GroupBy returning groupings rather than a reduced/aggregated scalar, given that it tends to follow SQL in operator naming and the general prevalence of SQL.

Linking all other bugs from backlog which requires client side GroupBy. If we decide to fix this, then please consider all the scenarios in linked closed bugs.

If we decide to close this, also close related issues which are blocked on this one. And also clean up tests in codebase.

~This may have some value in doing because of #17944~
Work-around is simpler than thought due to just doing collection projection.

I have updated the title & description of this issue so it becomes easier to find and we can point users to this when they run into issues with GroupBy/GroupJoin.

Is this article related to the topic?
If so, maybe it makes sense to clean up and stop confusing people

Closing this as by-design because there is no reasonable or valuable translation to the server we can do in these cases.

What's the difference between collection navigation property and GroupJoin? Why using the former is translated and the latter is not ("by design"?!)

And yet LINQ to SQL has no problem translating cases LINQ to EF Core 3 fails miserably at.

@NetMage - Feel free to use LINQ to SQL for scenarios where you find it useful.

Just to add some context to people hitting this and to add to what @smitpatel and @ajcvickers wrote above.

When there's a way to fully translate a LINQ construct to SQL, we definitely do so (or at least strive to do so, leaving issues open to track). However, in many cases that is simply not possible, as @smitpatel explained at the top of this issue. If anyone thinks we should be able to translate a LINQ construct to SQL but do not, please open a new issue with the LINQ construct and the proposed SQL translation you'd like to see.

In cases where we can't translate to SQL, since version 3.0 we only perform implicit client evaluation in very restricted cases - when we can be (almost) sure that there will be no big performance degradation due to unintentionally fetching huge amounts of data from the server. We are careful with this since it has caused problems in the past, and we want to keep the line dividing server (SQL) evaluation and client evaluation as crystal-clear as possible. However, if what you're looking for is client evaluation, simply add AsEnumerable at any point in your LINQ query, and anything coming after that will be performed client-side. This allows you to use any complex construct - GroupJoin, GroupBy, anything - as long as you're aware you'll be pulling all the necessary data back from the database.

This issue tracks about allowing EF Core to do this client side grouping implicitly. Absence of this feature means users would have to manually call AsEnumerable before GroupBy to explicitly falling into client side grouping.

@ajcvickers would like to re-discuss this

@smitpatel Is LINQ to SQL a supported feature? Is it under active development? Would you recommend it for new projects? Or was that statement like recommending SilverLight or LightSwitch?

To be clear, I am all for the removal of automatic client side processing in EF Core 3.x and the requirement for manually requesting it when desired with AsEnumerable, but this is the second time the latest version of EF has lost the capability to translate queries to SQL that were present in much older technologies. I like the suggestion to open specific translation issues and will work on that.

[...] the latest version of EF has lost the capability to translate queries to SQL that were present in much older technologies.

I understand (and even identify) with the frustration around this. It's worth reminding that EF Core is a separate product from both LINQ to SQL and Entity Framework 6, and we generally don't commit to supporting everything that was supported in these products; this frees EF Core to do things right and make, rather than being bound by legacy features etc. This is of course not good news for someone looking to migrate from LINQ to SQL to EF Core, but unfortunately that's the way things are.

As I wrote above, where there's a reasonable, performant SQL translation for a LINQ construct, we in general strive to implement it regardless of whether they were supported or not in older technologies (provided the effort involved justifies the value, depending on competing priorities, etc. etc.). On the other hand, for implicit/automatic client evaluation of GroupBy, a conscious, reasoned decision was made to not support that, because we believe that requiring explicit client evaluation would be better for our users - even if LINQ to SQL did do this.

However, we'll be rediscussing the GroupBy decision specifically (which is why this issue was reopened).

It was working in aspnetcore 2.2. Now it stopped working. And 3.1x is a a LTS, making us either have to downgrade to 2.1, or force a lot of code change in 3.1. This is a decision that not well made. Please at least give an option to change it to old behavior.

@xqiu unless I'm mistaken, this was client-evaluating in 2.2, not translating to SQL. You can achieve that exact same behavior with 3.x by putting AsEnumerable before your GroupBy - that will trigger client evaluation (see this breaking change note). We no longer do this implicitly in 3.x in order to protect people from unknowingly pulling in large amounts of data from the database because your specific GroupBy can't be translated to SQL.

We all agree that removing the client evaluation is a good thing in general.

The problem though is that there is no easy way to switch the query context. AsEnumerable() changes the type to IEnumerable<T>, hence the final materialization Async calls are no more available. ToList() / await ToListAsync() immediately executes the query at that point. And AsAsyncEnumerable() requires installing additional package for further LINQ support.

But let me get back to my previous comment. Let forget about GroupBy for now and discuss the GroupJoin. While it has no full server side support, the translation and cooperative client/server evaluation is supported for queries using collection navigation properties. e.g. query like

var query = db.Set<Foo>()
    .Select(foo => new
    {
        foo.Id,
        foo.Name,
        Bars = foo.Bars
            .Select(bar => new
            {
                bar.Id,
                bar.Name
            }).ToList()        
    };

is supported, but the equivalent

var query = db.Set<Foo>()
    .GroupJoin(db.Set<Bar>(), foo => foo.Id, bar => bar.FooId, (foo, fooBars) => new
    {
        foo.Id,
        foo.Name,
        Bars = fooBars
            .Select(bar => new
            {
                bar.Id,
                bar.Name
            }).ToList()        
    };

is not.

This makes no sense to me. While in StackOverflow we always tell people to use navigation properties, many times they just don't want to add navigation properties to their models (for some unknown reason), or just can't if there is no FK relationship between the joining tables. The fact that it is supported for navigation properties means that it should be supported for GroupJoins.

If this scenario is not targeted by this issue and the current inability to translate such queries is just a current limitation that will be addressed in the future, that's fine. Just wanted to make sure it is not considered "by design" and "won't fix".

Perhaps what would help is AsEnumerableAsync, though I am not sure what it would return...

@powermetal63 here's one good justification (at least IMHO) for why the 1st is supported and the 2nd isn't... The main point of removing implicit client evaluation between 2.2 and 3.0 is to make it extremely clear/explicit which parts of the query are executed on the server, and which on the client. Our current logic is extremely simple: everything except the top-level select must always be server-evaluatable, or we throw. If we start to add other exceptions - such as GroupJoin - we start muddying things up again, and users have to start remembering what's implicitly client-evaluatable, and what's not. In other words, a user could write a GroupJoin - expecting it to be server-translated and efficient - not knowing that in fact all data is being brought back from the server, etc. (this is important especially if we continue to compose operators over the GroupJoin results). With the current situation there's just one, crystal-clear case where we implicitly client-evaluate, and that's an advantage IMHO.

And as always, it's extremely easy to insert an AsEnumerable before the GroupJoin to explicitly trigger client evaluation, so implicit evaluation of GroupJoin seems to me to have very limited value. Having said that, this is only my opinion and the team does intend to rediscuss our strategy with regards to client evaluation and possibly relax our current restrictions.

@NetMage AsEnumerableAsync is provided by EF Core, returning an IAsyncEnumerable. You can use foreach to iterate over that, or compose over it with async LINQ operators from the System.Linq.Async package.

@roji The only difference between the two queries I see is that for the first one somewhere in the pipeline the translator seems to be replacing (expanding?) collection navigation property with GroupJoin. And I believe the supported query does not bring all the data to client - there are no Includes for related data, the projection should be happening server side and somehow the resulting query shape is materialized on the client. So IMHO it works exactly as one would expect.

@roji I was suggesting that AsEnumerableAsync would be different from AsAsyncEnumerable which you linked to. AsEnumerableAsync would convert an IQueryable to an IEnumerable like AsEnumerable does but in an asynchronous fashion, allowing for continuing a LINQ query on the client side without giving up async execution of the query.

Without a built-in translation for GroupJoin, I don't see how any placing of AsEnumerable in a query can be as efficient as LINQ to SQL / EF 2.2 was which placed a single narrowed query to the SQL engine, even though returning multiple copies of the left hand data (and doing an unnecessary count), it at least narrowed the (usually larger) right hand side from the entire table. The query would have to be completely (manually) transformed to do an Any/EXISTS condition on the right hand side of the GroupJoin.

Here is an extension method expressing a translation for GroupJoin that minimizes the data returned, uses two SQL queries but trades server and client time for minimizing network data. It effectively runs the left query twice on the SQL Engine, so if that is very expensive it may not be worth it, but on a simple test database it is as fast as EF 2.2. This breaks the clear client/server line, but I don't think a proper fix for that could be done with just an extension method.

(I have a new version that handles complex types by converting to memberwise equality which EF Core doesn't yet handle, but it is a bit longer.)

public static class IQueryableExt {
    public static IEnumerable<TRes> GroupJoinAsEnum<TLeft,TRight,TKey, TRes>(
        this IQueryable<TLeft> qleft,
        IQueryable<TRight> qright,
        Expression<Func<TLeft,TKey>> leftKeyFn,
        Expression<Func<TRight,TKey>> rightkeyFn,
        Func<TLeft,IEnumerable<TRight>,TRes> resFn
       ) {
        var rightParm = rightkeyFn.Parameters[0];
        var rightBody = rightkeyFn.Body;
        var leftParm = leftKeyFn.Parameters[0];
        var leftBody = leftKeyFn.Body;
        Expression<Func<TRight,bool>> whereTemplate = r => qleft.Any(l => true);
        var whereTemplateBody = (MethodCallExpression)whereTemplate.Body;
        var anyBody = Expression.MakeBinary(ExpressionType.Equal, leftBody, rightBody);
        var anyLambda = Expression.Lambda(anyBody, leftParm);
        var whereBody = Expression.Call(null, whereTemplateBody.Method, qleft.Expression, anyLambda);
        var whereLambda = Expression.Lambda<Func<TRight,bool>>(whereBody, rightParm);

        var left = qleft.AsEnumerable();
        var right = qright.Where(whereLambda).AsEnumerable();

        return left.GroupJoin(right, leftKeyFn.Compile(), rightkeyFn.Compile(), resFn);
    }
}

I had to revert one day worth of work because of this. I don't think all client evaluations are equally wrong. 2.X correctly translated group joins into left joins and did all the grunt work (adding server side ORDER BY and then iterating and grouping the already sorted rows). Is there any chance you will add this functionality back? It doesn't have to be implicit, if there's an API that enables this kind of in-built client evaluation, that should be fine, something like this:

db.ProductCategories
    .GroupJoin(db.Products ...)
    .SelectMany(...)
    .WithClientEvaluation()

WithClientEvaluation would explicitly enable client evaluation for (and only for) the query it was called on, ie. further chaining client-evaluated steps without calling AsEnumerable would still throw when enumerated.

@powermetal63 for the discussion on whether to translate certain variants of GroupJoin, or whether to implicitly client-evaluate it, we should probably wait for @smitpatel to weigh in (he's on leave) and then discuss as a team. Again, my personal mental model/opinion is that:

  • Anything which we can translate to SQL, and where there's value to doing so (over client evaluation), we should strive to do so.
  • We should avoid implicit client evaluation in cases where we'd be bringing more data from the server than suggested
  • Even for cases where we wouldn't be bringing more data (e.g. at least some cases of GroupJoin), I still think there's value in restricting implicit client evaluation to top-level Select only, simply to make it extremely clear what gets implicitly client-evaluated and what doesn't.
  • Implicit client evaluation in general has little value simply because users can always do explicit client evaluation (AsEnumerable/AsAsyncEnumerable), so keeping the implicit client evaluation rules simple overrides it for me.

@NetMage

AsEnumerableAsync would convert an IQueryable to an IEnumerable like AsEnumerable does but in an asynchronous fashion, allowing for continuing a LINQ query on the client side without giving up async executio

That doesn't really make sense to me - AsEnumerable conceptually expresses that the IQueryable should be evaluated, but doesn't actually evaluate it and fetch the results, that's exactly the difference between AsEnumerable and ToList. For example, if you have an IQueryable and then add AsEnumerable to that, but don't actually iterate over it (with foreach, ToList or whatever), nothing will actually be done. So I'm not sure what it would mean for AsEnumerable to do this in an "asynchronous fashion".

Here is an extension method expressing a translation for GroupJoin that minimizes the data returned, uses two SQL queries but trades server and client time for minimizing network data. It effectively runs the left query twice on the SQL Engine, so if that is very expensive it may not be worth it, but on a simple test database it is as fast as EF 2.2. This breaks the clear client/server line, but I don't think a proper fix for that could be done with just an extension method.

As you write, this is risky as a general translation because of the double evaluation of the query - we generally don't do that. Users can still perform things like this if they want to, but we shouldn't do it implicitly for them.

I'm sorry to spam everybody, but what is the work around here other than doing a AsEnumerable?
A simple

SELECT Name, COUNT(*) FROM CitiesInTheWorld GROUP BY Name

I can't run a simple

dbContext.CitiesInTheWorld.GroupBy(m => m.Name).ToDictionaryAsync(m => m.Key, m => m.Count());

And I cannot run a RAW sql because in EF Core, I have to start from an object and return all its property. The only solution I have is this?

context.Database.GetDbConnection().CreateCommand()
// ... and plenty more code
// https://stackoverflow.com/a/46013305/734742

Am I wrong?

@jsgoupil your question is a duplicate of #16730. You need to express the aggregate operation in a Select:

c# dbContext.CitiesInTheWorld .GroupBy(m => m.Name) .Select(g => new { g.Key, Count = g.Count() } .ToDictionaryAsync(x => x.Key, x => x.Count);

See the docs on GroupBy for more information and #16730. This is a good example of why implicit client evaluation for GroupBy would be a big pit of failure.

Hitting lots of errors with this after upgrading from 2.2 to 3.1. Reverting the changes until this is sorted out. In some of our scenarios we have collections filtered and having the ToDictionary run is not an issue at all. Results from some of these are also cached so impact would be very minimal. We constantly use application insights dependency data to view the results of our queries and determine impact.

Really wish the feedback from this issue was considered more to help people in my situation have a migration path:
https://github.com/dotnet/efcore/issues/14935

Would have been able to upgrade to 3.1, toggle to allow client evaluation, then over time we could toggle this and address some of the breaking unit tests, toggle back to release, rinse repeat until we have all instances addressed. We have a a fairly large project with multiple exceptions that the team could have worked on over time.

Is the only option forward to wait until we have bandwidth to fix several queries through the entire application? Really hope that isn't the case :(

@powermetal63 - Queries described in your comment https://github.com/dotnet/efcore/issues/17068#issuecomment-575257343are not equivalent.
```C#
var query = db.Set()
.Select(foo => new
{
foo.Id,
foo.Name,
Bars = foo.Bars
.Select(bar => new
{
bar.Id,
bar.Name
}).ToList()
});

Gets rewritten inside EF Core as
```C#
var query = db.Set<Foo>()
                .Select(foo => new
                {
                    foo.Id,
                    foo.Name,
                    Bars = db.Set<Bar>()
                        .Where(bar => foo.Id != null && foo.Id == bar.FooId)
                        .Select(bar => new
                        {
                            bar.Id,
                            bar.Name
                        }).ToList()
                });

No, EF Core does not introduce a GroupJoin here.

The reason your query differs is,
If Bar.FooId is nullable (that means child can exist without a parent), then if whatever reason your first sequence of Foo gets null key value (may be it came from right side of a left join, there are various scenario where it could be possible), then both gives different results. The expansion EF Core does make sure that you are taking only Bars for given foo and not matching orphan bars just because you got null foo.

The difference won't be visible when using int properties as key since GroupJoin lookup filter out null keys (hence they won't match each other), but the moment you get composite keys, new {Id = null} == new {Id = null} is true in C#, then your results are incorrect.

Following code demonstrate the difference. (Seems bit contrived but this is enumerable behavior, what EF Core strives to achieve)
```C#
using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp2
{
class Program
{
static void Main(string[] args)
{
var foo = new Foo
{
Id = 1,
Name = "FirstFoo",
Bars = new List
{
new Bar
{
Id = 1,
FooId = 1,
Name = "FirstBar"
},
new Bar
{
Id = 2,
FooId = 1,
Name = "SecondBar"
}
}
};

        var foo3 = new Foo
        {
            Id = 3,
            Name = "ThirdFoo",
            Bars = new List<Bar>()
        };

        var bar = new Bar { Id = 3, Name = "ThirdBar", FooId = null };

        var listOfFoos = new List<Foo> { foo, foo3 }.Select(f => new Foo
        {
            Id = f.Bars.Count > 0 ? f.Id : null,
            Name = f.Name,
            Bars = f.Bars
        }).ToList();
        var listOfBars = foo.Bars.Concat(new[] { bar }).ToList();

        var query1 = listOfFoos
                .Select(foo => new
                {
                    foo.Id,
                    foo.Name,
                    Bars = foo.Bars
                        .Select(bar => new
                        {
                            bar.Id,
                            bar.Name
                        }).ToList()
                }).ToList();

        var query2 = listOfFoos
            .GroupJoin(listOfBars, foo => new { foo.Id }, bar => new { Id = bar.FooId }, (foo, fooBars) => new
            {
                foo.Id,
                foo.Name,
                Bars = fooBars
                     .Select(bar => new
                     {
                         bar.Id,
                         bar.Name
                     }).ToList()
            }).ToList();

        Console.WriteLine(query1[1].Bars.Count);
        Console.WriteLine(query2[1].Bars.Count);
    }
}

public class Foo
{
    public int? Id { get; set; }
    public string Name { get; set; }
    public List<Bar> Bars { get; set; }
}

public class Bar
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int? FooId { get; set; }
}

}
```

You can certainly can say, do it when non-composite keys, when keys are not nullable when coming from first sequence but I would just point to what EF Core rewrites internally. That is most optimal way of doing if you don't want to have navigation properties in your model.
Rest of points @roji has summarized well. The clear line is important. LINQ is hard to understand. People still struggle to understand when the group by clause does not translate when it does not have aggregate operation applied. I personally don't see value of writing long piece of code in EF Core to support a very small subset which can be written multiple different ways without creating ambiguity for others.

@smitpatel Is LINQ to SQL a supported feature? Is it under active development? Would you recommend it for new projects? Or was that statement like recommending SilverLight or LightSwitch?

@NetMage - You said it translates in LINQ to SQL but not in EF Core. If LINQ to SQL works for you, feel free to use it.

@smitpatel

Regarding GroupJoin:

I'm quite surprised. Never thought the GroupJoin key selectors would be the reason for not supporting it. But you do support Join which should have exactly the same issues you've mentioned with nulls in keys. You do even support the "official" LINQ left outer join pattern which contains GroupJoin.

To recap, from the 3 possible join based LINQ patterns

(1) JOIN

from a in As
join b in Bs on some_key(a) equals some_key(b)
select new { a, b }

(2) LEFT JOIN

from a in As
join b in Bs on some_key(a) equals some_key(b) into a_Bs
from b in a_Bs.DefaultIfEmpty()
select new { a, b }

(3)

from a in As
join b in Bs on some_key(a) equals some_key(b) into a_Bs
select new { a, a_Bs }

for some reason you don't want to support the third one.

All they have the same potential key comparison issue. And all they can be rewritten to use correlated subquery rather than join. I guess you see the point.

But please consider supporting the last one rather than removing the first two :-)


Regarding GroupBy:

You definitely need to add support or provide alternative solution for the common "N items per group" problem. You've added a very nice row_number over support which solves it for correlated subqueries, but people often need that after GroupBy, especially the top 1 (last) item of group. See for instance the following SO post Problem with EF OrderBy after migration to .net core 3.1


Regarding client evaluation:

Switching to explicit client evaluation of code which used 2.x Async API is not easy - see the following SO post Converting EF Core queries from 2.2 to 3.0 - async await

I think EF Core should probably provide its own way to switch to client evaluation (other than AsEnumerable() or AsAsyncEnumerable()) which keeps the result IQueryable>T>, but somehow "switches" the context from there. Just a thought. AsEnumerable() is good and was what we are suggesting EF6 people who use synchronous LINQ, but nowadays many people use "modern" frameworks and Async APIs. You gave them async APIs for Queryable and you can't simple take it back without providing a good alternative (I mean, you can of course, but that's not good).


Currently I'm the biggest EF Core supporter on SO and trust me, some 3.x decisions like this really created a lot of problems/frustration for people coming from EF6 or EF Core 2.x, and for sure are not good for EF Core future. People are start seeking/promoting alternative frameworks, which at the end won't be good for anyone who invested time/efforts there.

for some reason you don't want to support the third one.

Yes, that reason is client eval as @roji has already mentioned. The way GroupJoin is different from Join/LeftJoin is cardinality. That is the reason there is no SQL equivalent of GroupJoin. It is true that Join/LeftJoin can re-written as correlated subquery, but does that help in SQL translation? If EF Core did not have navigations then we wouldn't have this issue of how to expand item. The reason we chose to correlated subquery for collection navigation rather than a GroupJoin is it's equivalency to SQL in many scenarios. e.g.
```C#
db.Customers.Select(c => c.Orders.OrderByDescending(o => o.OrderDate).Select(o => o.OrderID).FirstOrDefault());

If above query gets re-written into correlated subquery then it becomes,
```C#
db.Customers.Select(c => db.OrdersWhere(o => c.CustomerID == o.CustomerID).OrderByDescending(o => o.OrderDate).Select(o => o.OrderID).FirstOrDefault());

Which translates to SQL as

Select (Select Top(1) o.OrderID from Orders as o where o.CustomerID = c.CustomerID Order by o.OrderDate)
From Customers as c

Now if we converted the collection navigation to GroupJoin then arriving at above translation is fairly complex.
I hope that explains why correlated subquery is preferred over GroupJoin for collection navigation.
Apart from above, no matter how you would convert query in LINQ - Join/correlation, at the end it would be Join in SQL (except for some cases of outer/cross apply or scalar subquery) because SQL cannot support it.
To elaborate more on point @roji noted above, why we don't translate GroupJoin is difficulty of understanding when it would fetch more data than needed and explaining that to Customer. Further, since it needs to do client eval to compute groups it complicates the matter.
The 3rd query you posted is fairly straight forward GroupJoin which would evaluate on client just fine. The moment you start composing over the grouping, there is potential danger that client eval is going to fetch more than from server. Further, suppose above GroupJoin worked but then you put it in the middle of the query and now your query does not work anymore. (let me add more about this in GroupBy below).
As OP of this issue, I filed this issue for the sole purpose of doing naked GroupJoin/GroupBy support which doesn't bring any additional data (exactly your 3rd query). It is nice to have for EF to evaluate since putting AsEnumerable does not simply work for GroupJoin like other operators. But we as a team arrived at the conclusion that, while we can make it work, it would just increase ambiguity for Customers.

GroupBy
Sadly, GroupBy does have SQL representation in limited scenario which means we had to implement it at the same time, we have to explain to customers what works and what does not. While many smart customers understand what GroupBy correspond to SQL GROUP BY, (EF Core added support for SQL GROUP BY only), many fails to see it and files issue why the unsupported query does not work. Even worse, the previous client eval behavior was making it even bigger perf disaster.
Few of the GroupBy issue I am aware of that we are not supporting but we plan to add support are

  • SQL Group by with join composition (#10012)
  • GroupBy-Where-Aggregate or GroupBy-Aggregate which takes predicate (#11711/#18836)
  • GroupBy - First of each group/top N of each group (#12088/#13805)
  • GroupBy-Distinct-Aggregate (#17761)
  • GroupBy entityType (#17653)

We have laid ground work for GroupBy following select top(n) of group by introducing Row number expression but to support each translation takes time in designing/implementing/testing. We have some of above translation scheduled for next release, some may have to wait.
Once all above translations are done, probably, GroupBy will face lesser issue of client eval since most other ones are clear case when there is no SQL equivalent. Further GroupBy can use AsEnumerable operator easily.

Regarding client eval in async form - IX-Async would be way to go. As a framework, EF Core does not provide enumerable implementation, same way we would like to do the same for async code path too. It would be decision of corefx team if they want to put async enumerable methods beside sync ones. We are actively pushing for it for easy of using library for our customers, till then using external lib is way to go.

For comparison of EF6 vs EF Core, let us know what specific queries EF6 translated which EF Core is not translating. While we have taken liberty to explicitly not support few things from EF6 which were problematic, we still try to achieve parity for most part so that customers have ease of migration. I wouldn't say that query pipeline in EF Core is on par with EF6. It has some new features which EF6 did not have, while it lacks behind in certain parts. We have tracking issues for things we are aware lacking and we want to do in EF Core already. Help with finding any missing translation which is not being tracked would useful.

@smitpatel

Regarding GroupJoin:

I'm confused - sometimes you say key matching is the reason for not supporting, another time the client evaluation.

How exactly you expand the navigations is irrelevant, I used them just as example. Lets forget it and concentrate on GroupJoin and "client evaluation".

So what exactly is "client evaluation" in this concrete example:

public class Foo
{
    public int Id { get; set; }
    public int? SomeKey { get; set; }
}

public class Bar
{
    public int Id { get; set; }
    public int? SomeKey { get; set; }
}

and the two queries:

var supported = db.Set<Foo>()
    .Select(foo => new { foo, bars = db.Set<Bar>().Where(bar => foo.SomeKey == bar.SomeKey) })
    .Select(r => new
    {
        r.foo.Id,
        Bars = r.bars.ToList()
    })
    .ToList();

var unsupported = db.Set<Foo>()
    .GroupJoin(db.Set<Bar>(), foo => foo.SomeKey, bar => bar.SomeKey, (foo, bars) => new { foo, bars = bars })
    .Select(r => new
    {
        r.foo.Id,
        Bars = r.bars.ToList()
    })
    .ToList();

And here are my points:

(1) The final projection (hence the shape of the result set) is one and the same.
(2) Both have no direct SQL equivalent producing the desired result set shape
(3) Hence there must be some sort of "client evaluation" involved, no?
(4) The first one translates to

  SELECT [f].[Id], [b].[Id], [b].[SomeKey]
  FROM [Foo] AS [f]
  LEFT JOIN [Bar] AS [b] ON [f].[SomeKey] = [b].[SomeKey]
  ORDER BY [f].[Id], [b].[Id]

executes successfully and produces the desired materialized shape.

The second fails with "client evaluation" exception. Why?

I would expected both to produce one and the same translation and one and the same materialized result set shape, or both fail because of "client evaluation". The null comparison logic is irrelevant, both GroupJoin and Join LINQ methods are shortcuts for correlated subqueries limited to equality checks. So, again, why is the second LINQ pattern not supported, and more importantly, why you insist of never supporting it?

It's important, because even it's just a shortcut, it's much easier to use in LINQ query syntax, and also it would allow us (customers) to write custom extension methods for providing GroupBy support of a things you don't - like the aforementioned top N items per group, if we could generically replace the unsupported

baseQuery.GroupBy(keySelector, (key, grouping) => grouping.OrderBy(something).First())

with

baseQuery.Select(keySelector).Distinct()
     .GroupJoin(baseQuery, key => key, keySelector, (key, grouping) => grouping.OrderBy(something).First())

I'm confused - sometimes you say key matching is the reason for not supporting, another time the client evaluation.

Key matching is not the reason. Key matching was just a pointer that both queries are not always equivalent. i.e. limited case. As I have mentioned in my post, main reason is client eval and difficulty of explaining to customer when client eval of GroupJoin won't happen because it fetches more data from server. Also in first running query, since it is in last select, it is client eval'ed. For GroupJoin case, it is not easy to explain when GroupJoin can be last and when it can appear in middle.
This exactly coincide with the scenario you mentioned. While naked GroupJoin can do client eval because it does not fetch additional data, in your case, you are composing over grouping so it cannot be translated to SQL easily either. What that means implies is, while both query you posted (GroupBy & GroupJoin) can produce structurally equivalent results we need to translate both of them in some way in SQL and as you know GroupBy one is in being tracked. GroupJoin query would need to do something similar too.
There are 2 main themes involved.

  • Naked GroupBy/GroupJoin where the grouping is not being composed over and it is last operator. EF Core can do client evaluation for customers since the amount of data required from server is all consumed. This is what this issue was tracking when I filed it.
  • Supporting composition over grouping coming out of result of GroupJoin. This is something I am not sure how to judge value of it. Especially for the case you mention, we can spend time in implementing all missing GroupBy translations or we split time in writing some GroupBy translations & some GroupJoin ones. If your GroupBy query is prominent way of writing the query vs GroupJoin variation and if EF Core added support for GroupBy-Group-FirstOrDefault, would we really have strong need to support similar construct in GroupJoin?

@smitpatel
Assuming that I (as not native English speaker) am not explaining it correctly.

All I wanted to say regarding GroupJoin is that in the remaining part of the LINQ query, bars should be treated equally regardless of whether it is a result of

db.Set<Foo>().Select(foo => new { foo, bars = db.Set<Bar>().Where(bar => foo.SomeKey == bar.SomeKey) })

or

db.Set<Foo>().GroupJoin(db.Set<Bar>(), foo => foo.SomeKey, bar => bar.SomeKey, (foo, bars) => new { foo, bars = bars })

Same for bars here:

from foo in db.Set<Foo>()
let bars = db.Set<Bars>().Where(bar => foo.SomeKey == bar.SomeKey)

and

from foo in db.Set<Foo>()
join bar in db.Set<Bars>() on foo.SomeKey equals bar.SomeKey into bars

@powermetal63 - Is there any additional value to support different LINQ query which translates to same thing at the expense of translating lesser things in general? My point is, when you can write query in form 1, why write in second form. Plus confusion about 2nd form, is when it works and when it does not. (Form 1 does not face that issue)

@smitpatel (and @roji with thumbs up):

The value is exactly the same as for

  • Any(predicate) vs Where(predicate).Any()
  • Sum(expr) vs Select(expr).Sum()
  • Many others similar to the above
  • ALL OTHER LINQ join constructs, especially the left outer join pattern

i.e.

from foo in db.Set<Foo>()
join bar in db.Set<Bar>() on foo.SomeKey equals bar.SomeKey into bars
from bar in bars.DefaultIfEmpty()

vs

from foo in db.Set<Foo>()
from bar in db.Set<Bar>().Where(bar => foo.SomeKey == bar.SomeKey).DefaultIfEmpty()

which you already DO support, so I guess you saw a value in them, even though they fall into the same category as the one in discussion.

I wouldn't say I saw value in supporting both but implementing both was relatively easy. Especially earlier ones you mentioned is just matter of 1 if condition and do under the hood same thing, Translate Where/Select and then apply aggregate.
The last pattern is bit more involved then in identifying but then again once identified, we just call LeftJoin anyway.

The patterns you are proposing, are there so easily interchange-able? May be in some cases. Though from my understanding of LINQ the cost to implement is not going to be that cheap as the case you mentioned. I personally have no preference in supporting one pattern or all. At the end of the day it is trade-off, if more than one pattern to do the same thing should support or more translation added. And where each customer lies on that spectrum of trade-off is different. As this issue is open, we as a team are going to discussing it again.

@powermetal63 from my point of view (which I think @smitpatel shares), we do indeed implement interchangeable LINQ constructs in general, when there are no other factors in play. I do think there's value in doing so (some constructs are easier for some people to reason in, etc.).

In this very specific case, though, there's an additional important factor - the rules on what can get client evaluated and what cannot. We could client-evaluate this specific GroupJoin scenario (which is interchangeable with top-level Select which we do client-evaluate), but at the cost of blurring the general distinction between what gets client-evaluated and what not: at the moment only top-level Select gets client-evaluated, and that's simple and clear; if we start allowing some types of GroupJoin, that gets blurred and complicated. So the question here (as I understand it) really is about consistency and simplicity of client evaluation rules rather than interchangeability of LINQ constructs.

But as @smitpatel we'll be revisiting this again as a team.

Note from team meeting: @smitpatel will discuss with team in a design meeting.

We discussed this issue in a team meeting in detailed and I have read all above comments thoroughly. Following are our conclusions
GroupBy operator
As listed in this issue earlier following are missing translations in GroupBy operator which we will add in future and we will not client eval any of those patterns.

  • GroupBy-aggregate operator with another join (#10012)
  • Grouping-Where-Aggregate or Grouping-Select-Aggregate (#11711/ #18836)
  • Grouping-FirstOrDefault over the group (#12088)
  • Ability to select top N of each group (#13805)
  • Grouping-Distinct-Aggregate (#17376)
  • GroupBy entity type (#17653)
  • Filed #19929 to track adding client eval of GroupBy operator when it is final query operator.

GroupJoin operator
GroupJoin is mainly used to generate LeftJoin and there is no SQL equivalent. Hence we will not translate any composition over GroupJoin.

  • Filed #19930 to client eval GroupJoin when it is final query operator and grouping is not being composed over.
    In all other cases, GroupJoin is not going to supported.

GroupJoin <-> GroupBy equivalence
Due to high cost of pattern matching GroupJoin operator, we are not going to add any translation in GroupJoin even if equivalent GroupBy query would be translated. Customers would be required use GroupBy syntax as that is closest representation of what is the SQL.

AsAsyncEnumerable usage
As our exception message says, when client evaluating use AsAsyncEnumerable operator. We understand that after getting IAsyncEnumerable, composing over is not easy. Since Enumerable operators which works on IEnumerable are defined in .NET runtime, we believe that most beneficial thing for all the customers of .NET would be to have equivalent operators which works on IAsyncEnumerable in .NET runtime only. If you need async enumerable operators in your query, please file an issue in https://github.com/dotnet/runtime In the meantime, you can System.Interactive.Async - an external library, which allows enumerable like operators over IAsyncEnumerable for composition. Since this issue is much broader than EF Core and there is a reasonable open source work-around, we are not making any changes to EF Core regarding this.

WithClientEvaluation operator

  • We don't see any value in such operator. EF Core is not going to do any client eval except in top level projection as described in documentation. In order to evaluated further operators on client, use appropriate operator as indicated by message. We understand it is a big breaking changes for some apps but for the best performance it is something which needs to be done at some point. Hence we made the breaking change in 3.x release.

With all of above, I believe that it addresses all the points raised in discussion here.
If you run into issue with GroupBy/GroupJoin and arrive here, then please take time to read this comment fully to direct your feedback in specific issue for your specific scenario.

Closing this issue as non-needed since sub-issues are created to track individual translation patterns.

Was this page helpful?
0 / 5 - 0 ratings