Efcore: Query: Translate GroupBy constant value with aggregate

Created on 4 Oct 2017  路  15Comments  路  Source: dotnet/efcore

Based on https://github.com/aspnet/EntityFrameworkCore/issues/2341#issuecomment-326918994 & https://github.com/aspnet/EntityFrameworkCore/issues/2341#issuecomment-334268880

```C#
var query = context.Orders.Where(o => (o.OrderId > 0))
.GroupBy(Param_0 => 0)
.Select(Param_1 => new {
P0 = Param_1.Key,
P1 = Param_1.Count(),
P2 = Param_1.Sum(elem => elem.Freight),
P3 = Param_1.Max(elem => elem.ShipName),
P4 = Param_1.Min(elem => elem.ShipAddress)
});

Queries like above cannot be translated to relational group by since group by constant is invalid SQL
```SQL
select CustomerID, count(*) from orders
group by 0;

throws

Msg 164, Level 15, State 1, Line 2
Each GROUP BY expression must contain at least one column that is not an outer reference.

Since the purpose of such queries is just to get full aggregate of whole table rather than group-wise, we can perhaps rewrite QueryModel to simplify the task beforehand so our translation pipeline can work seamlessly.

As indicated by @bricelam , some of the frameworks are generating such queries (he can fill in more details), hence we should separately prioritize that work, hence filing this issue.

closed-fixed type-enhancement

Most helpful comment

We translate
```C#
context.Orders.GroupBy(o => 2).Select(
g =>
new
{
Sum = g.Sum(o => o.OrderID),
Min = g.Min(o => o.OrderID),
g.Key,
Max = g.Max(o => o.OrderID),
Avg = g.Average(o => o.OrderID)
})

```SQL
SELECT SUM([t].[OrderID]) AS [Sum], MIN([t].[OrderID]) AS [Min], [t].[Key], MAX([t].[OrderID]) AS [Max], AVG(CAST([t].[OrderID] AS float)) AS [Avg]
FROM (
    SELECT [o].*, 2 AS [Key]
    FROM [Orders] AS [o]
) AS [t]
GROUP BY [t].[Key]

Same happens when grouping is by a variable from closure.

All 15 comments

I would expect that the given linq query:

var query = context.Orders.Where(o => (o.OrderId > 0))
.GroupBy(Param_0 => 0)
.Select(Param_1 => new {
P0 = Param_1.Key,
P1 = Param_1.Count(),
P2 = Param_1.Sum(elem => elem.Freight),
P3 = Param_1.Max(elem => elem.ShipName),
P4 = Param_1.Min(elem => elem.ShipAddress)
});

would group by the value '0' iff the database can do that and not by the first column as suggested by the author. So the linq query given could translate to the following sql when using postgres as the database:

WITH TMP AS (
   SELECT * FROM (VALUES (0)) AS t (key)
)
SELECT t1.KEY, COUNT(*), SUM(t2.Freight), MAX(t2.ShipName), MIN(t2.ShipAddress) FROM TMP t1
INNER JOIN "TABLE" AS t2 ON 1 = 1
GROUP BY t1.KEY

The proposal is rather a hack and thus in my opinion this enhancement should be closed as invalid since a working implementation of the linq query could be:

var query = context.Orders.Where(o => (o.OrderId > 0))
.GroupBy(Param_0 => Param_0.CustomerId)
.Select(Param_1 => new {
P0 = Param_1.Key,
P1 = Param_1.Count(),
P2 = Param_1.Sum(elem => elem.Freight),
P3 = Param_1.Max(elem => elem.ShipName),
P4 = Param_1.Min(elem => elem.ShipAddress)
});

@dpsenner - GROUP BY 0 is not supported by every database. Creating temporary table and joining with it, may not be supported by all database and that is unnecessary calculation. Regardless, for a given query, if EF is going to give data return in the form user wanted, how EF arrives at the result should be irrelevant to user as long as it is evaluated on server side.

The second linq query posted already translate correctly to server.

So we both agree that this issue is invalid because linq does not support column references by ordinal (i.e. .GroupBy(Param_0 => 0)) but rather expects explicit columns (i.e. .GroupBy(Param_0 => Param_0.CustomerId)) instead?

would group by the value '0' iff the database can do that and not by the first column as suggested by the author.

The Linq & SQL query in first post are not same. The SQL is not the translation of the Linq, it is just an example of that group by 0 is invalid SQL.

GroupBy(c => c.CustomerID) in linq translates to GROUP BY c.CustomerID in database which is direct mapping. Same cannot be applied if linq has GroupBy(c => 0) because of database limitation.

GroupBy constant is perfectly valid linq query and it is being generated by various frameworks already.
```C#

var a = new[] { 1, 2, 3 }.AsQueryable().GroupBy(t => 0).ToList();
a
List>(1) { Lookup.Grouping { 1, 2, 3 } }
```

This issue specifically to translate GroupBy constant in LINQ to something which is valid in database and gives same results.

Has .OrderBy any problem? I'm asking this because ORDER BY (SQL) is simpler than GROUP BY and may be problematic too.

@asfernandes - OrderBy(c => 0) (which is like no-op on linq) gets translated to ORDER BY (SELECT 1) on server.

Ok, so it does not treat 0 as an position. So translating to GROUP BY, a GROUP BY (SELECT 1) (or something simple as GROUP BY CAST(1 as INT) is a valid expression but should not be valid with everything.

It should not be valid to do a equivalent to SELECT FIELD FROM TABLE GROUP BY CAST(1 as INT) but SELECT 1 FROM TABLE GROUP BY CAST(1 as INT) or SELECT COUNT(FIELD) FROM TABLE GROUP BY CAST(1 as INT) are valid expressions.

I'm not understanding why this should be complicate in LINQ.

Group by constant works in EF 6 by using constant as the key column, similar to:

   0 as K0,
   COUNT(Field) AS P0
FROM Table
GROUP BY K0

This seems like the logical translation of group by constant anyway.

Pulling this back into 2.1 milestone. I got it working.

We translate
```C#
context.Orders.GroupBy(o => 2).Select(
g =>
new
{
Sum = g.Sum(o => o.OrderID),
Min = g.Min(o => o.OrderID),
g.Key,
Max = g.Max(o => o.OrderID),
Avg = g.Average(o => o.OrderID)
})

```SQL
SELECT SUM([t].[OrderID]) AS [Sum], MIN([t].[OrderID]) AS [Min], [t].[Key], MAX([t].[OrderID]) AS [Max], AVG(CAST([t].[OrderID] AS float)) AS [Avg]
FROM (
    SELECT [o].*, 2 AS [Key]
    FROM [Orders] AS [o]
) AS [t]
GROUP BY [t].[Key]

Same happens when grouping is by a variable from closure.

We are experiencing this issue using EF Core 2.1:

var conversationTime = await _myContext.Conversations
    .GroupBy(c => 1)
    .Select(g =>
        new BotStatistic
        {
            Name = "ConversationTime",
            AverageTime = g.Average(c => EF.Functions.DateDiffMillisecond(c.StartDate, c.EndDate) / 1000.0),
            ShortestTime = g.Min(c => EF.Functions.DateDiffMillisecond(c.StartDate, c.EndDate) / 1000.0),
            LongestTime = g.Max(c => EF.Functions.DateDiffMillisecond(c.StartDate, c.EndDate) / 1000.0)
        }
    ).SingleAsync();

Using SQL Server 2012.

StartDate and EndDate are datetime columns.

The intention is to return one result containing aggregate date, which is why GroupBy(c => 1) is used.
I have not been able to find any other way to return one result, and this works in-memory, but the intention is to run this as an SQL query.

According to information i've found in some issues this should be possible since 2.1, so is there a bug here or are we doing something wrong? Or is there another way to return one result?

@SamVanheer - That is #11976 Currently, we still do streaming group by for such queries.

Thanks for the quick reply, we'll use separate queries for the time being until this can be done directly.

@SamVanheer - You can rewrite your query to use element selector and get it to server eval
```C#
var conversationTime = db.Conversations
.GroupBy(c => 1,
e =>
EF.Functions.DateDiffMillisecond(e.StartDate, e.EndDate) / 1000.0)
.Select(g =>
new
{
Name = "ConversationTime",
AverageTime = g.Average(c => c),
ShortestTime = g.Min(c => c),
LongestTime = g.Max(c => c)
}).ToList();

```SQL
      SELECT AVG(CAST([t].[c] AS float)) AS [AverageTime], MIN([t].[c]) AS [ShortestTime], MAX([t].[c]) AS [LongestTime]
      FROM (
          SELECT DATEDIFF(MILLISECOND, [c].[StartDate], [c].[EndDate]) / 1000E0 AS [c], 1 AS [Key]
          FROM [Conversations] AS [c]
      ) AS [t]
      GROUP BY [t].[Key]

Thanks for the help, this works.

Here's the query we're using, slightly modified:

var conversationTime = await _myContext.Conversations
                .GroupBy(c => 1, e => EF.Functions.DateDiffMillisecond(e.StartDate, e.EndDate) / 1000.0)
                .Select(g =>
                    new BotStatistic
                    {
                        Name = "ConversationTime",
                        AverageTime = g.Average(c =>c),
                        ShortestTime = g.Min(c => c),
                        LongestTime = g.Max(c => c)
                    })
                .SingleAsync();
Was this page helpful?
0 / 5 - 0 ratings