Efcore: Relational: Support translating GroupBy() to SQL

Created on 6 Jun 2015  ยท  100Comments  ยท  Source: dotnet/efcore

_Updated as of 3/14/2018 based on implementation introduced_

LINQ's GroupBy() operators can sometimes be translated to SQL's GROUP BY clauses, in particular when aggregate functions are applied in the projection.

Scope for 2.1 release

Our current intention is for the scope of the work in 2.1 to improve how LINQ's GroupBy is evaluated in this particular scenario:

Grouping on a simple expression that references a single column or multiple columns (using an anonymous type) of the query root and then projecting any aggregate operation from (Count, Sum, Average, Min, Max) and/or any individual property that is part of the grouping key (this will be translated to GROUP BY in SQL)

What is supported in 2.1.0-preview2

Apart from what is supported in 2.1.0-preview1 (details below), we have also added some more patterns

  • Grouping by constant/variables
  • Grouping by properties of reference navigations
  • Ordering or filtering on grouping key or aggregate
  • Aggregate being DTO/nominal type

Examples
```C#
// Grouping by constant or a variable from closure
db.Orders.GroupBy(o => 2).Select(g => g.Count());
var a = 5;
db.Orders.GroupBy(o => a).Select(g => g.Count());

// Grouping by scalar properties from reference navigations
db.Orders.GroupBy(o => o.Customer.City).Select(g => g.Count());

// Ordering by Key/Aggregate after GroupBy
db.Orders.GroupBy(o => o.CustomerID).OrderBy(o => o.Count()).Select(g => new { g.Key, Count = g.Count() });

// Filtering on Key/Aggregate after GroupBy (Translates to Having clause in SQL)
db.Orders.GroupBy(o => o.CustomerID).Where(o => o.Count() > 0).Select(g => new { g.Key, Count = g.Count() });

// Projecting aggregate into nominal type
db.Orders.GroupBy(o => o.CustomerID).Select(g => new CustomerCountInfo { g.Key, Count = g.Count() });

And a few bugfixes - #11218 #11157 #11176

## What is supported in 2.1.0-preview1

``` csharp

// Grouping by single column projecting aggregate or key
db.Orders.GroupBy(o => o.CustomerId).Select(g => g.Count());
db.Orders.GroupBy(o => o.CustomerId).Select(g => new { CustomerId = g.Key, Count = g.Count() });

// Grouping by multiple columns (using anonymous type) projecting aggregate or key or nested key
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate }).Select(g => g.Sum(o => o.Cost));
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate })
  .Select(g => new { CustomerId = g.Key.CustomerId, Sum = g.Sum(o => o.Cost) });
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate })
  .Select(g => new { Key = g.Key, Sum = g.Sum(o => o.Cost) }); // Where Key will be anonymous object

// Grouping after complex query root
(from o in db.Orders.Where(o => o.OrderID < 10400).OrderBy(o => o.OrderDate).Take(100)
    join c in db.Customers.Where(c => c.CustomerID != "DRACD" && c.CustomerID != "FOLKO").OrderBy(c => c.City).Skip(10).Take(50)
        on o.CustomerID equals c.CustomerID
    group o by c.CustomerID)
.Select(g => new { g.Key, Count = g.Average(o => o.OrderID) });

db.Orders.OrderBy(o => o.OrderID)
    .Skip(80)
    .Take(500)
    .GroupBy(o => o.CustomerID)
    .Select(g => g.Max(o => o.OrderID));

// All above examples have group of entity types after GroupBy

// Selecting Group of anonymous types containing multiple columns
db.Orders.GroupBy(o => o.CustomerId, new {o.OrderDate, o.Price}).Select(g => g.Sum(t => t.Price));

Scenarios that we are not planning to improve in the 2.1 release

~1. Grouping on constants~ (available in 2.1.0-preview2)

  1. Grouping on an entity (e.g. a reference navigation property)
  2. Projecting non-aggregate scalar subqueries after grouping, e.g. FirstOrDefault()
  3. Making groups of multiple entityTypes using anonymous types.
  4. Using Key/Aggregate values after GroupBy in joins (#10012)

All the scenarios above present different variations depending on what happens after the GroupBy, e.g. is there an aggregate function or not, is the key mentioned in the projection or not, etc. These scenarios will still result in client evaluation.

We would appreciate if customers that care about EF Core supporting any of those scenarios that are scoped out from 2.1 to create individual issues for them, up-vote them, and keep the discussion there.

closed-fixed type-enhancement

Most helpful comment

Two years and almost two versions later this basic function has not yet been implemented nor is it in the priorities. The EntityFramework Core should still be in beta and not going to version 2.0

All 100 comments

I have these classes

``` C#
public class Blog
{
public int BlogId { get; set; }
public string Url { get; set; }

public List<Post> Posts { get; set; }

}

public class Post
{
public int PostId { get; set; }
public string Title { get; set; }
public string Content { get; set; }

public int BlogId { get; set; }
public Blog Blog { get; set; }

}

If I use this code

``` C#
var query1 = from p in context.Posts
             where p.BlogId == 1
             group p by p.Title into g
             select new
             {
                 Title = g.Key,
                 Count = g.Count()
             };

EF7 generates this SQL

SELECT [p].[PostId], [p].[BlogId], [p].[Content], [p].[Title]
FROM [Post] AS [p]
WHERE [p].[BlogId] = 1

without grouping and projection and groups data on the client side

If I use Group by with Inner Join with this code

``` C#
var query = from p in context.Posts
join b in context.Blogs on p.BlogId equals b.BlogId
where p.BlogId == 1
group p by p.Title into g
select new
{
Title = g.Key,
Count = g.Count()
};

I get this error

{"Unable to cast object of type 'Microsoft.Data.Entity.Query.QueryResultScope1[<>f__AnonymousType12[EF7Test.Post,EF7Test.Blog]]' to type 'Microsoft.Data.Entity.Query.QueryResultScope`1[EF7Test.Post]'."}
```

EF 7.0.0-beta6-13735

Just another test case:

public class Categoria
{
    public int Id { get; set; }
    public int? CategoriaId { get; set; }
    public string Nombre { get; set; }
}

var q = from c in dbContext.Categorias
                join cc in dbContext.Categorias on c.CategoriaId equals cc.Id
                group c by c.CategoriaId into g
                select new { Id = g.Key.Value, total = g.Count() };

Throws the following exception:

{"Unable to cast object of type 'Microsoft.Data.Entity.Query.QueryResultScope`1[<>f__AnonymousType3`2[nyp.DataModels.Categoria,nyp.DataModels.Categoria]]' to type 'Microsoft.Data.Entity.Query.QueryResultScope`1[nyp.DataModels.Categoria]'."}

This is for a recursive tree-like structure. Here I'm just trying to count children.

@anpete should this go to backlog for now?

@rowanmiller Sounds good.

Why it's go to backlog?!
It's critical for for information and reports systems.
For example, I have 2 million records and I need to show sales by customer. Grouping in client side, is impossible!

View/ Indexed view?

Or a raw SQL query. It's not that we aren't going to support it, just that we don't have time to get this translating (and stabilize it) before the 7.0.0 release. This one will be one of the first things we tackle after that.

What would be the recommended approach using SQL (FromSql)?
The problem I see is that we need an entity to materialize the records into, and I think it would need to be mapped.

@rjperes that would require a feature we are planning to add in 1.1.0: the ability to boostrap queries using arbitrary types: #1862.

@rowanmiller I was hoping this was a 1.1 milestone, but I don't see this in 1.2 either. Any idea when this will get pulled off the backlog? I know we have work-arounds in the mean time.

@chris-oswald not yet, we are planning for the next release this week.

It's good to at least see it is listed as item #2 in the Critical O/RM features section.

Even if all situations cannot be accounted for - I hope we are going to at least get the simplest cases covered.

And to be fair @chris-oswald there is literally nothing in the current 1.2 feature list - you scared me for a moment!

Heads up: I have made a lot of headway in this area. Expect a PR soon ๐Ÿ˜ธ

Checking in:

I've been using almost all of my free time to work on this and I'm just fixing some remaining failing tests. I've learned a lot about the codebase during this time and as I've worked on things, I've both inadvertently and purposefully introduced a number of other optimizations like LEFT JOIN LATERAL/OUTER APPLY, more compact SQL queries as a result of improving the 'query source requires materialization heuristics', and more. I just don't want anyone to think maybe I fell off of working on this ๐Ÿ˜‰

Derek @tuespetre I work for a data warehouse company and we do tons for query performance tuning. If you ever want any help in query optimization from a SQL Server perspective please reach out to my Git account. Happy to conference call.

@Derek Excited :-) Since you mention LEFT OUTER - can you check on this
issue at some point:
http://stackoverflow.com/questions/40544642/efcore-returning-too-many-columns-for-a-simple-left-outer-join
I never got around to formally reporting it.

In summary of the issue : LEFT OUTER with a simple Select projection would
end up returning all columns from the DB instead of just the ones in the
projection.

Thanks!

On Mon, Jan 23, 2017 at 9:13 AM, Derek Gray notifications@github.com
wrote:

Checking in:

I've been using almost all of my free time to work on this and I'm just
fixing some remaining failing tests. I've learned a lot about the codebase
during this time and as I've work on things, I've both inadvertently and
purposefully introduced a number of other optimizations like LEFT JOIN
LATERAL/OUTER APPLY, more compact SQL queries as a result of improving
the 'query source requires materialization heuristics', and more. I just
don't want anyone to think maybe I fell off of working on this ๐Ÿ˜‰

โ€”
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/aspnet/EntityFramework/issues/2341#issuecomment-274551867,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ADfKVnucT5t8ePDk0Ph5aJLDca-6tehOks5rVN_NgaJpZM4E6HAK
.

@simeyla the changes I've been making will address many such scenarios.

Okay... I have all bajillion (as a compliment) tests passing, now I'm going to rebase onto the last month or so of commits that have been merged to dev since I started before submitting a PR.

@tuespetre, A lot of success! Many guys are waiting for it.

7543

@tuespetre Excellent, very exciting! Can you elaborate on Complex GroupBy keys still needing to be covered?

Does this mean the first will happen on the database, but the second will still happen in code after selecting the necessary records?

from p in context.Employee
group p by p.Title into g
select new
{
    g.Key,
    count = g.Count()
};
from p in context.Employee
join t in context.Title on p.TitleId equals t.Id
group new { p, t } into new { p.Age, t.Title } into g
select new
{
    g.Key.Age,
    g.Key.Title,
    count = g.Count()
};

@langdonx something like that, I've already got the right GROUP BY statement generating for that case, but I think the projection currently won't work. I'm pretty sure of what needs to be done, I just need to go back in and write the tests and so forth now (simple keys, single-level complex keys, multi-level complex keys, keys with functions, keys with binary expressions, keys with coalesce, etc., etc.)

Hi @tuespetre
Eagerly awaiting the #7543 pull request being merged! Kudos to you for all your work.

Please can I check if this scenario will be included in the SQL translation improvements, so I don't create a duplicate issue:

```C#
var grandTotalsData = await dbContext.Expenses.GroupBy(e => 0).Select(g => new
{
Count = g.Count(),
GrossAmountTotal = g.Sum(e => e.GrossAmount),
GrossAmountAverage = g.Average(e => e.GrossAmount),
}).SingleAsync();

The ```GroupBy(e => 0)``` expression uses a constant to indicate that I want the entire table in one group.
I'm hoping this will generate a single query with multiple aggregates:

```SQL
select count(*) as Count, sum(GrossAmount) as GrossAmountTotal, avg(GrossAmount) as GrossAmountAverage
from Expenses

At the moment it generates warnings that GroupBy, Count, Sum, Average and Single cannot be translated. I've also seen the same warning about DefaultIfEmpty.

When used in isolation, Count and Sum are translated to SQL, but not in the context of a GroupBy like this.
Average, however, doesn't appear to translate at all.

Thanks

@danielgreen you can see some of the tests and sample generated SQL here:

https://github.com/tuespetre/EntityFramework/blob/52561bdb44a26e70b1b939b33af0157275b3ebdc/test/Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests/QuerySqlServerTest.cs#L6890

I'm currently working to break down #7543 as much as possible per @smitpatel / @maumar's request.

@langdonx we'll see how this one goes: #7697

```
query = query.Where(e => e.Goodscode.Contains(filter) || e.GoodsName.Contains(filter));
return query.Select(e => new GoodsDTO
{
Name = e.GoodsName,
Code = e.Goodscode,
T3Code = e.T3Code,
StockId = e.StockId
}).Take(100).ToList().Distinct(new GoodsDTOComparer()).Take(20);
//why we do like up: because EF7 dosent support Distinct and GroupBy yet (12-03-2017)
//why i use DotNetCore for a reporting system? :(

The support of GroupBy translation to SQL is not planned for the release 2.0.0?

Hi guys, I am working on a project which translates almost all EF lambda expressions into SQL to avoid in-memory execution. Maybe you guys can have a look and see if it is helpful.

Here is the link to the project:
https://github.com/ethanli83/EFSqlTranslator

You can also try it out here:
http://linqrunner.daydreamer.io/

It would be great to understand when this will be coming as it is one of our largest issues with moving to EF Core for a large part of our stack.

Huge issue, that i expected to see in build...

@smitpatel Any chance of seeing this in EFCore 2.0?

@andymac4182 - This is a decent amount of work to implement in terms of how the current query infrastructure is. Sadly, as Roadmap suggests, it will not be in 2.0. There is just not enough time to add this feature and stabilize it.

Does that mean we will possibly see this in 2.x? or will it be a 3.0 release sort of time frame? Considering how much we use EF this is a large blocker to us moving to Core.

Also that Roadmap really confuses me since you are releasing features not part of the "Critical" list before completing what you yourselves have deemed as critical.

Two years and almost two versions later this basic function has not yet been implemented nor is it in the priorities. The EntityFramework Core should still be in beta and not going to version 2.0

Very hard to believe this isn't a higher priority. Hoping someone finds the time to grab this one and get it done. Looking through this thread, looks like a good attempt was made, but the PR ended up being too broad and wasn't pulled in. Even just some level of support for simple grouped queries and common aggregate functions would be better than nothing. There shouldn't be any big inventions here, these translations have already been done by several open source ORMs.

@rweads0520 it is kind of hard to fit it into the existing codebase. I think I hate Relinq, to be honest. I gave up and started a side project to prove to myself that I could translate all the LINQ operators including hardcore stuff like GroupBy, Zip, and so forth using only expression visitors and expressions, and I'm certain took me less time than it did to do the work I did for those PRs.

@tuespetre - Which is kind of a shame, because as I understand it, the whole point of EFCore was originally to make it easier to do stuff that the EF6 codebase made difficult. And it sounds like it's actually worse in some scenarios than EF6.

Given that critical stuff like this still isn't supported, is there any reason to move to EFCore from EF6? I totally get that not every EF6 scenario will be supported by EFCore, but this is, well, kind of core, you know? (Not blaming you, obviously - just wondering what the rest of the EFCore team has to say about how this feature has gotten dropped.)

@tuespetre Don't give up on your PR I feel you were so close!

Mind-boggling that they got out of beta without a working Group By functionality. Every time I tangle with EF I end up lowering my expectations. I keep thinking I know how limited it is and then it gets worse.
Personally I think this kind of limitation ends up putting the whole Core project into Beta status. It's interesting, but if you don't have a solid data access technology you can use, it's ultimately pointless.

Just spent a week converting a project to .net Core as an experiment. Looks like that won't be going any further for a while.

@tuespetre is the side project you mentioned publicly available?

We use EF6 (and OData) extensively in our projects and it's worked out great.. Some of these responses are a bit much

Similar to @Ben-Pattinson, with the release of everything-2.0 I've been looking into getting our libraries converted over to .Net Standard 2.0 which includes converting over to this. Not having GroupBy is obviously going to be a blocker, but if it does end up in 2.1 as specified then the timing might work out great.

@rowanmiller who is accountable for following up on whether this gets into 2.0 or some release at some point? Just hoping that a more definitive answer can come sooner than later. I get that this had to be tabled for the very first release, but it is two years later now... But I do understand that there are always lots of constraints. I'm just asking for some kind of realistic plan to be made and then disseminated to us all. Thanks!

@eriksendc If you take a look at the roadmap you will see that this is planned for 2.1

@ErikEJ Oh thanks and so sorry for missing that. Super great! :) ๐Ÿ‘

Out of curiosity (as I'm playing with query building in TypeScript) - could @divega explain what he means by

LINQ's GroupBy() operators can sometimes be translated to SQL's GROUP BY clauses...

What kind of GroupBy() can't be translated to an SQL GROUP BY?

@jtheisen really weird ones; the hardest thing is usually not the grouping key, but translating things that are done with the grouping afterwards. Things like doing a SelectMany from the groupings, taking the first item from each grouping, querying navigation properties off of the grouping's contents, using navigation properties within aggregations, and so forth.

@tuespetre Hmm, I would think that in those cases there's always some naive approach like this:

SELECT
    (SELECT SUM(s.value1) FROM source WHERE s.key = k.key) sum, -- aggregation
FROM (SELECT DISTINCT s.key key FROM source s) k
OUTER APPLY (SELECT TOP 1 * FROM source s WHERE s.key = k.key ORDER BY ...) fr -- first row

Maybe you have to do more than one of those and join the results together.

Of course it's a bad query and I see that making good queries may be a lot harder though.

But if you have something more concrete that is really difficult to express in SQL at all, even with sub-optimal queries, I'd really love to see it.

@jtheisen a good example would be trying to pull a value out of a grouping that comes from a joined subquery. I mean, you won't run into it very often if at all, but it has to be considered.

But then we were talking about GroupBy to GROUP BY, not OUTER APPLY and ahem FOR JSON :trollface:

@jtheisen - The simplest example of non-translatable group by would be this
var ordersForCustomer = db.Orders.GroupBy(o => o.CustomerId)
Here you are grouping orders by customerId and getting IGrouping<string,Order>. There is no way to represent a group of orders in SQL in single row.

The SQL GROUP BY syntax has restriction in what can appear in projection. It is restricted to, either column has to be specified in GROUP BY or it has aggregate operation applied on any column. In its simplest (and straight-forward) form, all GroupBy queries which has aggregate operator applied on grouping can be translated to GROUP BY in SQL.

@tuespetre Can you write the linq for that one down? Not trying to be annoying, just genuinely interested to learn about a problem if it indeed exists.

@smitpatel That's a different problem altogether that has nothing to do with grouping. Just take this query using a navigational property:

from o in Orders
select { Order = o, Invoices = o.Invoices }

Here EF has can't select the orders in one row each either. All ORMs flatten the query and re-create the graph structure after the load.

@jtheisen still, that is exactly what I meant: LINQ's GroupBy and SQL's GROUP BY are not the same thing, and only a subset of the queries in which you use GroupBy in LINQ can be translated to a query using GROUP BY in the SQL.

The examples provided by @smitpatel and @tuespetre don't fall into that bucket.

@divega Well, yes, they are not the same thing. You can't always translate a GroupBy with just a GROUP BY.

Under that light I read your OP as asking whether EF Core should support GroupBys only in those cases where just the simple GROUP BY in the translation was enough.

I didn't read it that way because that would mean to aim really low compared to EF6.

@jtheisen fair point.

@smitpatel could you help us (as part of the work you are doing on this) come up with a list of the different scenarios in which GroupBy can be translated to something that has advantages over client evaluation, so that we can make an explicit decision on priorities?

I think it would be fine to track them all here, although creating separate issues would make things easier if we end up assigning very different priorities to them.

Hi @divega

Here is one scenarios:

var ordersForCustomer = db.Orders
    .GroupBy(o => o.CustomerId)
    .Select(g => new {
        CustomerId = g.Key,
        Count = g.Sum(o => o.TotalCost)
    );

This query gives you total cost of orders made by each customer. If you do this in memory using Linq, it will need to load all the data into memory and then aggregate. Translating it in SQL and execute it will be much faster as there is much less data to load, and db indices will be leveraged too.

@ethanli83 thanks. That one is actually well understood. We are talking about the ones that require APPLY, e.g. because something that isnโ€™t an aggregate function is projected, and potentially others.

@divega I think one really important group to fully support would be those created by data sources for data grids. DevExpress and Telerik have data grids you can bind to IQueryables that also create GroupBys and/or distinct for their requests in three cases:

  • When the user as grouped the result (data grids can be configured to allow that).
  • To populate filter selection choosers.
  • To get total summaries.

Those aren't requiring APPLY though, so that should be an easy and very important subset.

hi @divega, so maybe something like this?

// Group on an enity rather than columns, and then access column on the group key.
db.Posts
    .GroupBy(p => new { Blog = p.Blog })
    .Select(g => new { UserId = g.Key.Blog.User.UserId })

or this?

// Group on aggregation
db.Blogs
    .Where(b => b.Url != null)
    .GroupBy(b => new { Cnt = b.Posts.Count(), Avg = b.Posts.Average(p => p.LikeCount) })
    .Select(x => new { Cnt = x.Key.Cnt, Avg = x.Key.Avg, CommentCount = x.Sum(b => b.CommentCount) })

After that I believe getting the first item of something comes up frequently:

from o in orders select new { Order = o, LatestInvoice = o.Invoices.OrderBy(...).FirstOrDefault() }

That requires APPLY.

@jtheisen stepping back a little, the translation you previously proposed to one of @tuespetreโ€™s examples uses DISTINCT + OUTER APPLY, and given that DISTINCT can be considered just a syntax shortcut for a GROUP BY, it does belong to the GROUP BY + APPLY โ€œclassโ€ i was referring to.

Re GroupBy() queries generated by grids, can you provide examples and explain how they can benefit from processing on the server other than generating ORDER BY in the translation?

@divega My OUTER APPLY query was as if it came from something like

from o in orders select new {
    Order = o,
    NumberOfInvoices = o.Invoices.Count(),
    LatestInvoice = o.Invoices.OrderByDescending(...).FirstOrDefault()
}

If it's invoices and orders it probably doesn't matter much because orders don't have many invoices.

But it would matter if the associated collection can contain, on average, many items for the reason @ethanli83 stated in his second last comment.

APPLYs may not be the most pressing issue of all though - although I would say that if EF Core doesn't aim to do that eventually, that would be a bit sad. (Btw, I put a question on SO about what the eventual aim is of the EF Core team in terms of query translation sophistication - it would be awesome if the right person could say something to that.)

Re the kind of queries that grids generate: from reading one of the issues linked here, it seems the produce separate queries with aggregates for summaries. This should fall into simple GROUP BY + aggregate class.

@divega The second question regarding how server processing benefits was again addressed by @ethanli83 comment I referred to in my last comment - and in that case it's really an important issue.

I have a number of grids in LOB applications that if they'd do grouping client-side, they'd probably choke.

@divega Absolutely. It's the simpler case.

Thanks @jtheisen for mentioning data grids.
In DevExtreme we use this pattern for building dynamic queries:

c# data .Where(...) // optionally filter .GroupBy(...) // group by a number of keys .OrderBy(...).ThenBy(...) // order by keys .Select(/* keys and aggregates */)
One special case we use extensively is grouping by the empty key (new { }) to fetch multiple aggregates for the whole dataset in one go. Some time ago @jtheisen reported the issue https://github.com/aspnet/EntityFramework6/issues/34 which occurs when the empty key is used. Please consider it in EF Core.

@divega

Re GroupBy() queries generated by grids, can you provide examples and explain how they can benefit from processing on the server other than generating ORDER BY in the translation?

Those are cases when the grid is used as a simple analytic tool. An end user applies filtering and grouping and then explores groups+aggregates without preloading individual data items.

It would be great to support the case with grouping by a constant. It's similar to the case that was mentioned by @AlekseyMartynov. We also use such a query to fetch multiple aggregates. The query is dynamically generated, so it would be hard to change it to something else.

var query = context.Orders.Where(o => (o.OrderId > 0))
.GroupBy(Param_0 => 0)
.Select(Param_1 => new {
P0 = Param_1.Key,
P1 = Param_1.Count(),
P2 = Param_1.Sum(elem => elem.Freight),
P3 = Param_1.Max(elem => elem.ShipName),
P4 = Param_1.Min(elem => elem.ShipAddress)
});

Also, I would like to note that there were problems with such a query via EntityFramework 6 with third party providers like MySQL and Oracle.

@DunkanDX & @AlekseyMartynov,

Does it makes more sense to support translating things like Math.Sum rather than using GroupBy this way? Something like:

var query = context.Orders.Where(o => (o.OrderId > 0))
    .Select(o => new {
        Sum = Math.Sum(e => e.Freight),
        Max = Math.Max(e => e.ShipName),
        Min = Math.Min(e => e.ShipAddress)
});

Which can simply be translated into:

SELECT sum(o.Freight) Sum, max(o.ShipName), min(o.ShipAddress)
FROM Orders o
WHERE o.OrderId > 10

@ethanli83
The same query with grouping by constant works well against many kinds of Linq-sources including EntityFramework 4, 5 and 6.
As far as I understand your query will work only against EFCore. It won't work even for Linq To Object.
Thus, in my opinion, it better to support common scenario.

@DunkanDX, you are right. Also, I think we do not want thing that does not work as normal linq. thx

In past few days, there is discussion going on around what queries fall into the bucket of GROUP BY. Since Linq allows a large variety of combination of operators, the amount of various queries with GroupBy operator is unbounded. As a long term goal of EF Core, we would want to translate queries to server as much as possible unless there is bad perf impact. But that is eventual goal. Trying to translate an unbounded set of queries to server in single release is kinda unrealistic goal.

If we just discuss about GroupBy case, linq GroupBy & SQL GROUP BY are not same thing. While GroupBy operator would allow a lot of different operations regarding, grouping key and element selector, GROUP BY clause in SQL is much more restricted in terms of what you can exactly represent. As various post in above discussion said, at times GroupBy would translate to OUTER APPLY too. Same way, we could find various translation of GroupBy in some different shape of SQL. But covering all such translations in 2.1 release timeframe is hard to fit in goal and it may not complete.

I am removing this from 2.1.0 milestone till we figure out what are exact scenario(s) we are considering we are considering for 2.1 release. Covering all variety of GroupBy is unreachable goal.

Is it possible to support, at least, some the translations made by EF6? I know this is kind of vague but, IMO, it is better to have some basic support to Group By now than have a complete and full featured Group By later. Other improvements could be introduced later.

Having this example:

            var query =
                from table in _context.Table
                group table by table.IdParent into tables
                select new
                {
                    Id = tables.Key,
                    Count = tables.Count()
                };

            var result = query.ToList();

Right now, this basic Group By is translated in something like select * from table.
I am not following the development of entity framework, and i really have no idea what is the complexity of doing this, but i think this kind of translations should be supported as soon as possible.

Hope you consider, at least, having a basic support for the 2.1 release.

Thanks.

@smitpatel I fear that from a management perspective removing this from the 2.1.0 milestone for the reasons you give just kicks the can down the road. Someone needs to dig in and refine the scenarios now so that it _does_ get done in 2.1.0.

@brunoalreis "Is it possible to support, at least, the translations made by EF6?"

EF6 translates nearly everything group-by-wise.

The way I understand it is that this is that EF Core likely never will be as good as EF6, let alone in 2.1.0.

@smitpatel to provide a brief summary of what we will address in 2.1. We re-assess post 2.1 what further cases to consider and how to handle them.

Maybe by the TENTH anniversary of Linq2Sql?

https://msdn.microsoft.com/en-us/library/cc161164.aspx

http://www.c-sharpcorner.com/blogs/group-by-in-linq-to-sql1

On Tue, Sep 5, 2017 at 3:28 PM, Smit Patel notifications@github.com wrote:

In past few days, there is discussion going on around what queries fall
into the bucket of GROUP BY. Since Linq allows a large variety of
combination of operators, the amount of various queries with GroupBy
operator is unbounded. As a long term goal of EF Core, we would want to
translate queries to server as much as possible unless there is bad perf
impact. But that is eventual goal. Trying to translate an unbounded set of
queries to server in single release is kinda unrealistic goal.

If we just discuss about GroupBy case, linq GroupBy & SQL GROUP BY are
not same thing. While GroupBy operator would allow a lot of different
operations regarding, grouping key and element selector, GROUP BY clause
in SQL is much more restricted in terms of what you can exactly represent.
As various post in above discussion said, at times GroupBy would translate
to OUTER APPLY too. Same way, we could find various translation of GroupBy
in some different shape of SQL. But covering all such translations in 2.1
release timeframe is hard to fit in goal and it may not complete.

I am removing this from 2.1.0 milestone till we figure out what are exact
scenario(s) we are considering we are considering for 2.1 release. Covering
all variety of GroupBy is unreachable goal.

โ€”
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/aspnet/EntityFrameworkCore/issues/2341#issuecomment-327321545,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ADfKVpJnFbMaX60cHimxamacVLXScDekks5sfcsigaJpZM4E6HAK
.

I wanted to thank some of the recent feedback on this issue and also take the chance to clarify what is really going on, given that thread has turned into some speculation on our triage process, like what it means to remove the milestone of an issue, or whether EF Core's GroupBy support is ever going to better than other O/RMs, etc.

First of all, I would like everyone to understand that we consider that improving the support of GroupBy means a few different things, i.e. there are different scenarios for which the new translation would need to work quite differently, and we are effectively assigning different priorities to those individual scenarios. Some of them will be supported in 2.1, some of them may be supported later.

I have updated the original text of this issue to reflect exactly we are planning to support in 2.1, and to enumerate the scenarios that we have decided to scope out of 2.1, and that we are possibly going to address later, depending on feedback.

I would like to stress that customer input is one of the most important things we consider when deciding what to do in each release, but prioritizing and scoping is a big part of what all software projects need to do every day in order to make progress.

We try to prioritize things that we believe can help the largest number of customers the most, but any time we decide to implement a specific feature in a given release, we are usually postponing other features that could also help.

We just happen not to think that improving the GroupBy support in all of the possible scenarios (i.e. have a superset of the capabilities of other products, including EF6) is a more important goal that every other feature or bug in our backlog.

Rather than repeating how the process of selecting what features go into a release here again, I would like to invite everyone to read a page we put together recently, precisely on this subject:

https://github.com/aspnet/EntityFrameworkCore/wiki/release-planning-process

@divega Thanks so much for clarifying. When @smitpatel said "I am removing this from 2.1.0 milestone till we figure out what are exact scenario(s) we are considering" that was very worrisome. It seemed like any / all GroupBy functionality was going to be off the table for 2.1. I've been a participant in many teams using different release planning processes and it just sounded dubious (to me, at least). Thanks to you and all who participated in determining what we'll be looking forward to.

@eriksendc Understood. For future reference, in our regular triage meeting the main query we look at is "issues with no milestone".

Interesting feedback from @FransBouma: DevExpress' grid creates groupby predicates on constant.

Indeed, I searched our forums for the issue, and it's from 2008, but I doubt they've changed it. Their LinqServerModeDataSource control can be bound to an IQueryable and be used in grids (winforms, webforms etc.) and they generate grouping on constant and add the g.Key to the projection. The control bound to the grid allows the user to group, sort and filter on the datasource and the control then builds a linq query with these elements. If no grouping is done, it groups on a constant, and this will then fail :)

As this is something you'd never expect to happen, and I do recall the issue quite clearly (as it was a problem to fix it properly back then), I thought I'd mention it.

@bricelam @FransBouma - I have filed #9969 to track it.

@divega Can you add to the scope of work throwing an exception if grouping on a constant, where the exception links back to #9969 ?

@jzabroski - At present grouping on constant does not throw. It would evaluate it on client. Change it to throw an exception is breaking change.

Hi. Have we had any supports about this?

This seems still not suppprted... I'm currently working on a project and facing the performance issue as it doesn't translate to select count(*), xxxx group by xxxx , it actually fetches all the data from DB and does group by locally which causes the performance issue, coz I have got a million records...

var counts = (from t in _dbContext.VoteTickets
join m in _dbContext.VoteItemData
on t.VoteItemId equals m.Id
where m.City == city
group t by m.District into grp
select new
{
District = grp.Key,
Count = grp.Count()
})
.ToArray();

I think you misunderstood me. I was advocating that you define the behavior
via unit test, and explained that many tools expect this behavior, and by
not supporting it, upgrading EF6 to EFCore will bring unexpected surprises.

On Thu, Oct 12, 2017, 6:07 PM Smit Patel notifications@github.com wrote:

@jzabroski https://github.com/jzabroski - At present grouping on
constant does not throw. It would evaluate it on client. Change it to throw
an exception is breaking change.

โ€”
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/aspnet/EntityFrameworkCore/issues/2341#issuecomment-336288838,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAbT_baFfDgkb25tNZXnm1L3dQL5A73Gks5sro2ygaJpZM4E6HAK
.

upgrading EF6 to EFCore will bring unexpected surprises.
EF6 to EF Core is not in-place upgrade. EF Core is rewrite of EF6 and in many cases have behavioral differences. Any application expecting the group by constant to work same as EF6 is gonna fail regardless of it client eval or throw. By doing client eval, there is higher probability of external tools may just work.

@mchenx I'm affraid you won't get any joy with EF Core anytime soon. I recommend doing the simple stuff in EF Core and anything complicated in dapper or similar. It really is WAY easier than even full EF. Yes there's hard coded strings, but with string interpolation, nameof and a data layer, you can really make it robust.
Give it a shot, you'll be surprised.

@Ben-Pattinson, maybe you will be interested in this project I worked on EFSqlTranslator. It translates EFCore lambda into string and executes it on Dapper.

@ethanli83, very nice! I must say, initially I was rather sceptical, having been badly burnt by EF's horrific SQL generation. Assuming you haven't cherry picked your examples of SQL generation, they are nice and readable, and moreover, sane. Very un-EF like. You'll get into trouble for stuff like that, the Union will be along in a moment to give you a good talking to :) Comprehensive examples too, keep up the good work!

Thank you @Ben-Pattinson, I will take a look at Union!~

@Ben-Pattinson SQL generation is something we have tried really hard to improve in EF Core. Do you have some examples of where you feel we can improve?

@anpete (sorry for the rambling reply)
I probably should have been more specific, I was referring to full framework's EF sql. I have spent literally days trying to get full EF to output sql that did what I meant. Trying to decipher the generated sql was often a nightmare. Getting the same operation was minutes in dapper. This is why I settled on a hybrid approach. Use EF for CRUD and simple selects, dapper for anything complex. When I started porting to core, I was pleased to see the SQL is indeed much more readable, which is a great improvement.... however imho the lack of group-by support makes it a bit of a joke.
The silent nature in which the group-by executes client-side is particularly nasty for the unwary. It reinforces my position of allways checking the SQL EF outputs - which the new generation does really help with.
I can work round this new limitation by simply extending my dapper use, but I'm sure you can see the problems anyone not on a hybrid approach will have.

I appreciate the problems you have when you need to support the same linq syntax for many different types of query, but when personal projects are running rings around your official data layer implementation, something isn't right.

Personally, all I ever wanted was to write sql in my data layer and have it compile time validated against my data layer. It's so much easier and flexible than translating between unrelated syntaxes. I've got that now with dapper, nameof, string interpolation and t4 templates. It would have been nice to get slicker intellisense support, but it's pretty usable.

@anpete I am not @ben-pattinson but if you could describe what you do
differently in generating SQL, I am a rare combination of SQL Server expert
(going back to 6.5 days of Ron Soukup) and C# engineer who has built web
apps, desktop apps, as well as a proprietary ORM-like data layer.

Years ago, I explained to Diego that EF6 and lower was fundamentally flawed
in how it generated queries, and we (the company I was working for) even
logged a bug and workaround for the most fundamental flaw in the query
generator: the dreaded duplicate left join bug where the "forest" of
possible left join possibilities does not reduce to a single, unique tree.

Cheers,
John

On Mon, Feb 26, 2018, 7:55 PM Andrew Peters notifications@github.com
wrote:

@Ben-Pattinson https://github.com/ben-pattinson SQL generation is
something we have tried really hard to improve in EF Core. Do you have some
examples of where you feel we can improve?

โ€”
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/aspnet/EntityFrameworkCore/issues/2341#issuecomment-368707133,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAbT_dlkAEnarCjYtZo70QO0s2HNroufks5tY1KFgaJpZM4E6HAK
.

Smit Patel , your hardline stance indicates a severe code smell. You are
stating a policy, whereas I'm talking about the basic mechanism. If you're
hardwiring sequences of collaborations between your Domain Object Hydration
Engine/Object Materializer and your Expression Transformation (Query
Transpiler) Engine, then you are likely going to regret it and never even
realize somebody told you you were coding it wrong on February 27th, 2018.

Decouple, decouple, decouple. Let your customers choose.

On Tue, Feb 20, 2018, 6:45 PM Smit Patel notifications@github.com wrote:

upgrading EF6 to EFCore will bring unexpected surprises.

EF6 to EF Core is not in-place upgrade. EF Core is rewrite of EF6 and in
many cases have behavioral differences. Any application expecting the group
by constant to work same as EF6 is gonna fail regardless of it client eval
or throw. By doing client eval, there is higher probability of external
tools may just work.

โ€”
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/aspnet/EntityFrameworkCore/issues/2341#issuecomment-367161488,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAbT_bhKWA6HTDS4xfFb0VLWGRTm-NUkks5tW1kLgaJpZM4E6HAK
.

@Ben-Pattinson Thanks for the feedback and glad to hear that you have come up with a solution that works for you.

A couple of responses to the points you make: LINQ GroupBy is very different from SQL GROUP BY and so we need client eval here for many cases anyway. It was a shame we weren't able to get to GroupBy with aggregates translation until 2.1, but it is almost here. One reason for this prioritization was that we have the FromSql API, which gives you a Dapper-like experience for creating queries with raw SQL. Have you tried it? If so, it would be great if you could let us know why it is not working for you - It could be because we didn't have Query Types, which are also landing in 2.1, and make the FromSql experience much better because you no longer need an entity result type.

@anpete It's really great to finally see this working :-)

But oddly enough I still receive the error message telling me it cannot translate to GROUP BY, even though the generated SQL right underneath most certainly uses GROUP BY! I assume this must be a known issue?

Entity Framework Core 2.1.0-preview1-28290 initialized ......

warn: Microsoft.EntityFrameworkCore.Query[20500]
The LINQ expression 'GroupBy(new <>f__AnonymousType1`1(OrderDateDt = [x].OrderDateDt), [x])' could not be translated and will be evaluated locally.

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
Executed DbCommand (33ms) [Parameters=[], CommandType='Text', CommandTimeout='30']

      SELECT [x].[OrderDateDt], COUNT(*)
      FROM [Order] AS [x]
      WHERE [x].[CompletedOrderId] IS NOT NULL AND ([x].[OrderDateDt] > DATEADD(day, -30E0, GETDATE()))
      GROUP BY [x].[OrderDateDt]

This is from this source C#

        var groupedOrders = _context.Order
                    .Where(x => x.CompletedOrderId != null)
                    .Where(x => x.OrderDateDt > DateTime.Now.AddDays(-30))
                    .GroupBy(x => new { x.OrderDateDt })
                    .Select(x => new
                    {
                        Date = x.Key.OrderDateDt,
                        Count = x.Count()
                    })
                    .ToList();

It returns instantly and definitely works as expected so I'm happy for now - just wanted to point this out.

@simeyla - Thanks for info. I have filed #11157 to track the issue and fix it.

For preview2 version, there are more patterns now being translated to server (including group by constant). I have updated first post to capture the details.

Please let me know if I can solve the following in 2.1.1:

I need to take the latest item from each group, but I receive warnings saying the query cannot be translated. Is there a workaround and is it going to be possible in the foreseeable future?

Example:

    public class Author
    {
        public int AuthorId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    public class Book
    {
        public int BookId { get; set; }
        public int AuthorId { get; set; }
        public string Name { get; set; }
        public DateTime CreatedAt { get; set; }

        public virtual Author Author { get; set; }
    }

    public class EFCoreDemoContext : DbContext
    {
        public static readonly LoggerFactory MyLoggerFactory
            = new LoggerFactory(new[] { new ConsoleLoggerProvider((_, __) => true, true) });

        public DbSet<Author> Authors { get; set; }
        public DbSet<Book> Books { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseLoggerFactory(MyLoggerFactory).UseSqlServer("Server=(localdb)\\MSSQLLocalDB;Database=Test;Trusted_Connection=True");
        }
    }

    static void Main(string[] args)
    {
        using (var context = new EFCoreDemoContext())
        {
            context.Database.EnsureCreated();

            // works as expected
            var test1 = context.Books.GroupBy(x => x.AuthorId).Select(x => x.Count()).ToList();

            // warn: The LINQ expression 'GroupBy([x].AuthorId, [x])' could not be translated and will be evaluated locally.
            // warn: The LINQ expression 'orderby [b].CreatedAt asc' could not be translated and will be evaluated locally.
            // warn: The LINQ expression 'Last()' could not be translated and will be evaluated locally.
            var test2 = context.Books.GroupBy(x => x.AuthorId).Select(x => x.OrderBy(b => b.CreatedAt).Last()).ToList();
        }

        Console.ReadKey();
    }
Was this page helpful?
0 / 5 - 0 ratings