I have two queries (full code at bottom) which I believe essentially achieve the same thing but produce different SQL queries. This is probably fair enough given the expressions are written differently, but I was wondering if this is the sort of thing where you would hope to make an optimisation in the way expression is evaluated which could reduce these to the same queries/SQL statements.
```C#
_ = context.ChildEntities
.Where(x => x.ParentEntity.Code == "X")
.Select(x => new {x.Code, ParentCode = x.ParentEntity.Code})
.ToList();
_ = context.ParentEntities
.Where(x => x.Code == "X")
.SelectMany(x => x.ChildEntities)
.Select(x => new {x.Code, ParentCode = x.ParentEntity.Code})
.ToList();
Produces:
```SQL
SELECT "c"."Code", "p"."Code" AS "ParentCode"
FROM "ChildEntities" AS "c"
LEFT JOIN "ParentEntities" AS "p" ON "c"."ParentEntityId" = "p"."Id"
WHERE "p"."Code" = 'X'
SELECT "c"."Code", "p0"."Code" AS "ParentCode"
FROM "ParentEntities" AS "p"
INNER JOIN "ChildEntities" AS "c" ON "p"."Id" = "c"."ParentEntityId"
LEFT JOIN "ParentEntities" AS "p0" ON "c"."ParentEntityId" = "p0"."Id"
WHERE "p"."Code" = 'X'
I would also be interested to know if there is a genuine perf difference between the two. I am assuming the second is worse but perhaps the database (SQLite in this case) can make that optimisation in its query plan on the DB.
Just for some context about why the LINQ query might be written the second way:
I have a generic helper method for executing a very common type of query in the app: querying for a list of items owned by a particular entity. Being able to take a separate Where expression for parent entity and then a SelectMany expression for the collection (and then a DTO type for the AutoMapper projection, simplified as a Select in the above) allows the helper to also query if the parent entity exists if the collection is empty. This helps distinguish between "X has no Y" and "X doesn't exist." But perhaps the helper method is trying to do too much if it produces a poorer query for the collection as a result.
Full Code
```C#
public class Program
{
public static void Main(string[] args)
{
using var connection = new SqliteConnection("Data Source=:memory:");
connection.Open();
var loggerFactory = new LoggerFactory();
loggerFactory.AddProvider(new DebugLoggerProvider());
var options = new DbContextOptionsBuilder()
.UseSqlite(connection)
.UseLoggerFactory(loggerFactory)
.Options;
using var context = new MyDbContext(options);
context.Database.EnsureCreated();
_ = context.ChildEntities
.Where(x => x.ParentEntity.Code == "X")
.Select(x => new {x.Code, ParentCode = x.ParentEntity.Code})
.ToList();
_ = context.ParentEntities
.Where(x => x.Code == "X")
.SelectMany(x => x.ChildEntities)
.Select(x => new {x.Code, ParentCode = x.ParentEntity.Code})
.ToList();
}
}
public class MyDbContext : DbContext
{
public DbSet
public DbSet
public MyDbContext(DbContextOptions options) : base(options)
{
}
}
public class ParentEntity
{
public int Id { get; set; }
public string Code { get; set; }
public ICollection
}
public class ChildEntity
{
public int Id { get; set; }
public string Code { get; set; }
public ParentEntity ParentEntity { get; set; }
}
```
EF Core version: 3.1.8
Database provider: SQLite
Target framework: 3.1
Operating system: Windows 10
IDE: Visual Studio 2019
That looks like SQLite, not SQL Server..
My bad, I am certain I noticed it on SQL Server but then used SQLite for reproduction code in this issue and muddled my description as a result. I'll adjust the text, as I can only be certain this is relevant for SQLite until next week.
Both queries are different from LINQ perspective hence it generates different SQL. Since 2nd query has to link back to the table, it will likely to be slower perf. In above case, it is going from A to B via SelectMany then referencing back to A so in a way it could be utilized but SelectMany is giving us IQueryable<ChildEntity> at which point, expression tree does not have detailed about how the queryable was created. That means the information that it came from ParentEntity is lost and we have to generate a join to get parent data. Expression tree works this way since the IQueryable<ChildEntity> could be formed in any way as both of your queries are forming it.
As an avoidance to the join back in second query, you can preserve original ParentEntity reference and reuse it.
_ = context.ParentEntities
.Where(x => x.Code == "X")
.SelectMany(x => x.ChildEntities, (p, c) => new {p, c})
.Select(x => new {x.c.Code, ParentCode = x.p.Code})
.ToList();
The above query would generate SQL closer to SQL 1 in your example as join back to parent is not required anymore.
While in LINQ when applying SelectMany, we could manually keep the prior elements around, though it has few drawbacks
Hence, we relying on user to use appropriate SelectMany to preserve data needed.
allows the helper to also query if the parent entity exists if the collection is empty. This helps distinguish between "X has no Y" and "X doesn't exist."
While there is no code about helper so did not fully understand what helper is doing but "X has no Y" and "X doesn't exist" both cases generates 0 results in the iterator after SelectMany. So I am not sure you can identify the separation with above query. If you can describe more what you are trying to achieve and share some code, may be we can help out.
@smitpatel Thank you for looking into this and your very detailed response. I'm happy to accept that fundamentally the LINQ differs, so the SQL does too.
While there is no code about helper so did not fully understand what helper is doing but "X has no Y" and "X doesn't exist" both cases generates 0 results in the iterator after SelectMany. So I am not sure you can identify the separation with above query. If you can describe more what you are trying to achieve and share some code, may be we can help out.
Here is a full demo of what I mean. It just comes down to hoping to make a neat method that can execute two queries without the developer having to repeat themselves too much (e.g. with two similar predicates). It seems however it's potentially at the expense of performance. I honestly don't expect you to look into this as how to approach this is definitely issue for me and not EF support, I'm just providing this for completeness :)
```C#
public class Program
{
public static void Main(string[] args)
{
var mapperConfig = new MapperConfiguration(x => x.CreateMap
var loggerFactory = new LoggerFactory();
loggerFactory.AddProvider(new DebugLoggerProvider());
using var connection = new SqliteConnection("Data Source=:memory:");
connection.Open();
var options = new DbContextOptionsBuilder()
.UseSqlite(connection)
.UseLoggerFactory(loggerFactory)
.Options;
using var context = new MyDbContext(options, mapperConfig);
context.Database.EnsureCreated();
// V1
//-----------------------------------
var collectionWrapper1 =
context.GetChildCollection<ChildEntity, ChildDto>(
x => x.ParentEntity.Code == "X");
// Have to do further check with similar predicate to see if parent exists, for appropriate API response
if (collectionWrapper1.Collection.Count == 0 && !context.ParentEntities.Any(x => x.Code == "X"))
{
// return NotFound();
}
// return Ok(collectionWrapper1);
// V2
//-----------------------------------
// Null means no parent
var collectionWrapper2 =
context.GetChildCollection<ParentEntity, ChildEntity, ChildDto>(
x => x.Code == "X",
x => x.ChildEntities);
// Return API response
// return collectionWrapper2 == null ? NotFound() : Ok(collectionWrapper2);
//-----------------------------------
}
}
public class MyDbContext : DbContext
{
private readonly MapperConfiguration _mapperConfig;
public DbSet<ParentEntity> ParentEntities { get; set; }
public DbSet<ChildEntity> ChildEntities { get; set; }
public MyDbContext(DbContextOptions options, MapperConfiguration mapperConfig) : base(options)
{
_mapperConfig = mapperConfig;
}
public CollectionWrapper<TChildDto> GetChildCollection<TChild, TChildDto>(
Expression<Func<TChild, bool>> childPredicate)
where TChild : class
{
var children = Set<TChild>()
.Where(childPredicate)
.ProjectTo<TChildDto>(_mapperConfig)
.ToList();
// Do a few other common tasks worth wrapping up into a method
// ...
return new CollectionWrapper<TChildDto> {Collection = children};
}
public CollectionWrapper<TChildDto> GetChildCollection<TParent, TChild, TChildDto>(
Expression<Func<TParent, bool>> parentPredicate,
Expression<Func<TParent, IEnumerable<TChild>>> childSelector)
where TParent : class
{
var children = Set<TParent>()
.Where(parentPredicate)
.SelectMany(childSelector)
.ProjectTo<TChildDto>(_mapperConfig)
.ToList();
// If empty, check if parent exists (afterwards, so 'hot path' where parent exists only has one DB call)
if (children.Count == 0 && !Set<TParent>().Any(parentPredicate))
{
return null;
}
// Do a few other common tasks worth wrapping up into a method
// ...
return new CollectionWrapper<TChildDto> {Collection = children};
}
}
public class ParentEntity
{
public int Id { get; set; }
public string Code { get; set; }
public ICollection
}
public class ChildEntity
{
public int Id { get; set; }
public string Code { get; set; }
public ParentEntity ParentEntity { get; set; }
}
public class CollectionWrapper
{
public ICollection
}
public class ChildDto
{
public string ParentEntityCode { get; set; }
public string Code { get; set; }
}
```
Fundamentally, both queries - 1 fetching the children, 1 checking parent existence are to different tables. It is usually faster to query the table directly for the task to avoid generating joins. This perf issue is further exacerbated by the projection which contains call to a navigation. Perhaps a faster approach yet a generic function would be the one which takes 2 lambdas - predicate for child/parent table both and compute results as in example 1 but does it inside a function.