Efcore: Query: Selecting scalar properties alongwith entity fails to lift and causes N+1 eval

Created on 8 Mar 2018  路  10Comments  路  Source: dotnet/efcore

//With classic Parent-Children (like in issue #10472)
This works well (within single select):

ctx.Parents.Select(p => new
    {
        ParentId = p.Id,
        ChildrenCount = p.Children.Count(),
        ChildrenSumXX = p.Children.Sum(c=>c.XX)
    }).ToList()

But this works as 2N+1 queries:

ctx.Parents.Select(p => new
    {
        Parent = p,
        ChildrenCount = p.Children.Count(),
        ChildrenSumXX = p.Children.Sum(c=>c.XX)
    }).ToList()

That's a bug? Or what's the problem, I cannot see any difference...

That's in 2.1.0-preview1

closed-fixed type-enhancement

Most helpful comment

It looks like just a technical imperfections at first, but I think it should be fixed with a high priority, because these queries are commonly used here and there.
In order to materialize it easily, query should look like just this

     [anonymType1.p].[id] AS [anonymType1.p.id], 
     [anonymType1.p].[someproperty1] AS [anonymType1.p.someproperty1], 
     [anonymType1.p].[someproperty2] AS [anonymType1.p.someproperty2], 

      (
          SELECT COUNT(*)
          FROM [children] AS [c]
          WHERE [anonymType1.p].[id] = [c].[parentid]
      ) AS [anonymType1.cnt], 
      (
          SELECT SUM([c0].[XX])
          FROM [children] AS [c0]
          WHERE [anonymType1.p].[id] = [c0].[parentid]
      ) AS [anonymType1.sum]
      FROM [parents] AS [anonymType1.p]

All 10 comments

The difference is
in first query, individual property (-ies) are being selected from entity rather than whole entity as in second query. Selecting whole entity requires us to use materialization code path.
At present materialization is tied closely with shaper and there would be one shaper for each query source, we cannot do materialization and some other scalar value processing both on single query source (in this case p). Future enhancement.

As a work-around you can construct object Parent manually by selecting individual properties and avoid materialization code path.

It looks like just a technical imperfections at first, but I think it should be fixed with a high priority, because these queries are commonly used here and there.
In order to materialize it easily, query should look like just this

     [anonymType1.p].[id] AS [anonymType1.p.id], 
     [anonymType1.p].[someproperty1] AS [anonymType1.p.someproperty1], 
     [anonymType1.p].[someproperty2] AS [anonymType1.p.someproperty2], 

      (
          SELECT COUNT(*)
          FROM [children] AS [c]
          WHERE [anonymType1.p].[id] = [c].[parentid]
      ) AS [anonymType1.cnt], 
      (
          SELECT SUM([c0].[XX])
          FROM [children] AS [c0]
          WHERE [anonymType1.p].[id] = [c0].[parentid]
      ) AS [anonymType1.sum]
      FROM [parents] AS [anonymType1.p]

@smitpatel Seems like the title should be improved.

Can this be added to the 2.1 milestone?

Clearing up the milestone to discuss taking these scenarios into account in the query design for 3.0. We are already overbooked for 3.0 so we may not be able to adress it, however I would like if possible that the design doesn't prevent us from being able to lift in this case.

I had similar issue with following query (MIN was unexpectedly submited to SQL server as separate query):

var query = from r in insuredRisks
                        where r.PolicyId == insuredRisk.PolicyId &&
                              r.VersionNo == insuredRisk.PolicyVersionNo &&
                              r.InsuredRiskNo == insuredRisk.InsuredRiskNo
                        select new InsuredRiskData
                                   {
                                       BasicInsuredRiskData = new BasicInsuredRiskData
                                                                  {
                                                                      InsuredRisk = r,
                                                                      InceptionDate = insuredRisks
                                                                      .Where(ir => ir.PolicyId == insuredRisk.PolicyId &&
                                                                          ir.InsuredRiskNo == insuredRisk.InsuredRiskNo &&
                                                                          ir.CreationCommissionLineNo ==                                                                           r.InsuredRisk.CreationCommissionLineNo &&
                                                                          ir.RecodVersionState != "D" && ir.RecodVersionState != "X")
                                                                      .Select(ir => ir.ValidFrom)
                                                                      .Min()
                                                                  }
                                   };

and as a workaround I wrote it this way:

var query = from r in insuredRisks
                        from ri in insuredRisks.Where(ri => ri.PolicyId == r.PolicyId &&
                                                            ri.InsuredRiskNo == r.InsuredRiskNo &&
                                                            ri.CreationCommissionLineNo == r.CreationCommissionLineNo &&
                                                            ri.RecodVersionState != PolicyVersionRecordState.SoftDeleted &&
                                                            ri.RecodVersionState != PolicyVersionRecordState.Cancelled)
                                               .GroupBy(ri => ri.PolicyId, ri => ri.ValidFrom)
                                               .Select(ri => new
                                                                 {
                                                                     InceptionDate = ri.Min()
                                                                 })
                        where r.PolicyId == insuredRisk.PolicyId &&
                              r.VersionNo == insuredRisk.PolicyVersionNo &&
                              r.InsuredRiskNo == insuredRisk.InsuredRiskNo
                        select new InsuredRiskData
                                   {
                                       BasicInsuredRiskData = new BasicInsuredRiskData
                                                                  {
                                                                      InsuredRisk = r,
                                                                      InceptionDate = ri.InceptionDate
                                                                  }
                                   };

After translation to SQL, it generated for that InceptionDate column nice subquery in select clause with MIN statement, just like one would do when writing SQL directly..

I just ran into this problem, and read that in EF Core 3.0 this might get addressed but seems hard.
I believe my case is much simpler than the ones above so you might want to take a look, hopefully this one is easier to fix.

This generates N + 1 queries
```C#
from item in Items
select new
{
Owned = OwnedItems.Any(own => own.UserId == userId && own.ItemId == item.Id),
Item = item
}

This one generates only 1 query
```C#
from item in Items
select new
{
   Owned = OwnedItems.Any(own => own.UserId == userId && own.ItemId == item.Id),
   Item = new Item { Id = item.Id, Name = item.Name, ... Every single property listed one by one... }
}

I understand that the first version needs to use the 'materialization code path' so it's different, but i can manually materialize the item by typing a lot.

Some helpful people lead me here from
https://stackoverflow.com/questions/53746633/entity-framework-core-linq-trouble-creating-select-case-exists-queries

Not sure if I have the same issue or if I'm doing something wrong, see code example below, might be isolated to the Max operator;

var query = DatabaseContext.Parents
    .Select(parent => 
        new ParentModel
        { 
            Name = parent.Name, 
            DobOfEldest = parent.Children.Max(child => child.DateOfBirth)
        });

This is generating multiple queries to get the Max(child.DateOfBirth)

Another example from me.

Query definition:

post: A board post
post.User: Poster
UserPhoto: user has photos where only one of them is uniquely IsPrimary = true
UserLike: did the viewing user like this photo?
PostPhotos: Post might have multiple attached photos (like a Facebook post's photos)

```
.Select(post => new
{
post,
post.User,
UserPhoto = post.User.Photos.Where(ph => ph.IsPrimary).Take(1).SingleOrDefault(),
UserLike = post.Likes.Any(likes => likes.UserId == forUserId),
PostPhotos = post.PostPhotos.Take(4).Select(pp => pp.Photo),
}).ToList();

### Issue Explanation
Main query generated:

SELECT p..., "p.User"...
FROM "Posts" AS p
INNER JOIN "Users" AS "p.User" ON p."UserId" = "p.User"."Id"
WHERE p."Id" = $1
ORDER BY p."Id"
LIMIT $2

So Entity Framework understands the `user` and fetches it directly in 1 query.
Every single other join is a separate query.
if we want to get 10 posts, 1 query for all the posts and users, 10 * 3 = 30 for the joins, our infamous N+1 problem here. (_I also tested if it is 10 posts are we having only 1+3=4 queries in total or 1 + 3 * 10, and unfortunately it is the latter._)

### Expected:
EF Core could have joined UserPhotos and UserLikes by understanding the relation is a `one-to-zero or one` so they can be achieved with a LEFT JOIN each.

```post: A board post
post.User: one-to-one defined in the entity
UserPhoto: one-to-zero or one could be inferred from query
UserLike: one-to-zero or one could be inferred from query
PostPhotos: one-to-many

PostPhotos is a one-to-many so having separate queries for that is okay, though I do think with some aggregation functions it might be possible to also get them but I do understand the complexity and vendor difference.

Further technical details

EF Core: 3.0.0-preview3.19153.1
Provider: Npgsql 3.0.0-preview3
OS: (Development machine, MacOS)
IDE: Rider 2019.1

But I do want to emphasize that the join functionality actually works, what's missing is understanding a join in the query is one to zero or one at runtime. If it is defined in the entity the results are expected.

So I can actually provide the Primary User Photo Id in the entity and refer to the Photo table as one to one for that property, that should work, but that would mean changing the database design to make EF work which would result in redundant data that needs to be kept updated.

This has been done.

Was this page helpful?
0 / 5 - 0 ratings