Efcore: EF Core 5 Daily build - Not enough information to uniquely identify outer element in correlated collection scenario. This can happen when trying to correlate on keyless entity or when using 'Distinct' or 'GroupBy' operations without projecting all of the key columns

Created on 15 Sep 2020  路  8Comments  路  Source: dotnet/efcore


The Include in the following:

context.Communities
    .Join(context.Communities.Include(x => x.Members),
        x => x.Id,
        y => y.Id,
        (x, y) => y)
    .Distinct()
    .ToList();

Causes:

System.InvalidOperationException: Not enough information to uniquely identify outer element in correlated collection scenario. This can happen when trying to correlate on keyless entity or when using 'Distinct' or 'GroupBy' operations without projecting all of the key columns.
   at Microsoft.EntityFrameworkCore.Query.SqlExpressions.SelectExpression.ApplyCollectionJoin(Int32 collectionIndex, Int32 collectionId, Expression innerShaper, INavigationBase navigation, Type elementType, Boolean splitQuery)
   at Microsoft.EntityFrameworkCore.Query.Internal.CollectionJoinApplyingExpressionVisitor.VisitExtension(Expression extensionExpression)
   at System.Linq.Expressions.Expression.Accept(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
   at Microsoft.EntityFrameworkCore.Query.IncludeExpression.VisitChildren(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.VisitExtension(Expression node)
   at Microsoft.EntityFrameworkCore.Query.Internal.CollectionJoinApplyingExpressionVisitor.VisitExtension(Expression extensionExpression)
   at System.Linq.Expressions.Expression.Accept(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
   at Microsoft.EntityFrameworkCore.Query.Internal.CollectionJoinApplyingExpressionVisitor.VisitExtension(Expression extensionExpression)
   at System.Linq.Expressions.Expression.Accept(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
   at Microsoft.EntityFrameworkCore.Query.RelationalQueryTranslationPostprocessor.Process(Expression query)
   at Microsoft.EntityFrameworkCore.Sqlite.Query.Internal.SqliteQueryTranslationPostprocessor.Process(Expression query)
   at Microsoft.EntityFrameworkCore.Query.QueryCompilationContext.CreateQueryExecutor[TResult](Expression query)
   at Microsoft.EntityFrameworkCore.Storage.Database.CompileQuery[TResult](Expression query, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.CompileQueryCore[TResult](IDatabase database, Expression query, IModel model, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.<>c__DisplayClass9_0`1.<Execute>b__0()
   at Microsoft.EntityFrameworkCore.Query.Internal.CompiledQueryCache.GetOrAddQuery[TResult](Object cacheKey, Func`1 compiler)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.Execute[TResult](Expression query)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryProvider.Execute[TResult](Expression expression)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1.GetEnumerator()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

Steps to reproduce

namespace EFTest
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.EntityFrameworkCore;

    public class CommunitiesContext : DbContext
    {
        public DbSet<Community> Communities { get; set; }

        public DbSet<Person> People { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder options)
            => options.UseSqlite("Data Source=test.db");
    }

    public class Community
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public ICollection<Person> Members { get; } = new List<Person>();
    }

    public class Person
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public ICollection<Community> Memberships { get; } = new List<Community>();
    }

    class Program
    {
        static void Main(string[] args)
        {
            using var context = new CommunitiesContext();
            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();

            var person = new Person { Id = 1, Name = "Geoff Smith" };
            context.People.Add(person);

            var community = new Community { Id = 1, Name = ".Net Foundation" };
            community.Members.Add(person);
            context.Communities.Add(community);

            context.SaveChanges();

            // Working
            var test = context.Communities
                .Join(context.Communities,
                    x => x.Id,
                    y => y.Id,
                    (x, y) => y)
                .Distinct()
                .ToList();

            Console.WriteLine(test.First().Id);

            // Not Working
            var test2 = context.Communities
                .Join(context.Communities.Include(x => x.Members),
                    x => x.Id,
                    y => y.Id,
                    (x, y) => y)
                .Distinct()
                .ToList();

            Console.WriteLine(test2.First().Id);
        }
    }
}

Further technical details

EF Core version: 6.0.0-alpha.1.20462.1
Database provider: Microsoft.EntityFrameworkCore.Sqlite
Target framework: .NET Core 5.0
Operating system: Windows 10
IDE: Visual Sutdio Code 1.49.0

closed-by-design customer-reported

Most helpful comment

@kccsf so you have just one entity on the left side of join, and you are joining with the table based on not key but some other property, like so? :

entities.Where(e => e.Id == myId).Join(entities.Include(...), o => o.Type, i => i.Type, (o, i) => i).Distinct()

This type of query would have worked correctly in 3.1.3 and it is a breaking change. On the other hand, you shouldn't need a Distinct there, since the entities returned from the join are guaranteed to be unique.

If the left side contains more that one result, and therefore right side could contain duplicates, we hit the problem that @smitpatel pointed out, and we don't have a good way of distinguishing first case from the second one.

However, you can rewrite this in a way that avoids the join completely and is still a single query:

entities.Where(e => e.Type == entities.Where(ee => ee.Id == myId).Select(ee => ee.Type).FirstOrDefault()).Include(...)

This only works if the inner Where clause Where(ee => ee.Id == myId) returns just one result.
If the Where clause returns more than one result you can use contains instead:

entities.Where(e => entities.Where(ee => ee.SomeProperty == myProperty).Select(ee => ee.Type).Contains(e.Type)).Include(...)

All 8 comments

this is the query we would have created if the exception wasn't thrown:

SELECT [t].[Id], [t].[Name], [t0].[MembersId], [t0].[MembershipsId], [t0].[Id], [t0].[Name]
FROM (
    SELECT DISTINCT [c0].[Id], [c0].[Name]
    FROM [Communities] AS [c]
    INNER JOIN [Communities] AS [c0] ON [c].[Id] = [c0].[Id]
) AS [t]
LEFT JOIN (
    SELECT [c1].[MembersId], [c1].[MembershipsId], [p].[Id], [p].[Name]
    FROM [CommunityPerson] AS [c1]
    INNER JOIN [People] AS [p] ON [c1].[MembersId] = [p].[Id]
) AS [t0] ON [t].[Id] = [t0].[MembershipsId]
ORDER BY [t0].[MembersId], [t0].[MembershipsId], [t0].[Id]

and this is the query that we create when Distinct() is replaced with Take() (so that pushdown happens)

SELECT [t].[Id], [t].[Name], [t].[Id0], [t0].[MembersId], [t0].[MembershipsId], [t0].[Id], [t0].[Name]
FROM (
    SELECT TOP(@__p_0) [c0].[Id], [c0].[Name], [c].[Id] AS [Id0]
    FROM [Communities] AS [c]
    INNER JOIN [Communities] AS [c0] ON [c].[Id] = [c0].[Id]
) AS [t]
LEFT JOIN (
    SELECT [c1].[MembersId], [c1].[MembershipsId], [p].[Id], [p].[Name]
    FROM [CommunityPerson] AS [c1]
    INNER JOIN [People] AS [p] ON [c1].[MembersId] = [p].[Id]
) AS [t0] ON [t].[Id] = [t0].[MembershipsId]
ORDER BY [t].[Id0], [t].[Id], [t0].[MembersId], [t0].[MembershipsId], [t0].[Id]

In the FROM clause, in order to correctly bucket the results we also project c.Id (from the table that is not present in the final result). We then use it in the final ORDER BY clause. However, in case of DISTINCT we can't add any new columns to the projection since that would change the result of distinct operation. Hence the exception is thrown.

This "worked" (didn't throw and returned results...) in 3.1.3 (albeit with a manual join entity). Was the previous behaviour leading to incorrect results (if so should it be mentioned in the breaking changes)?

@kccsf Can you post the code for 3.1 so that we can investigate?

@ajcvickers below works in 3.1.3. NB the error has changed in latest daily

namespace EFTest
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.EntityFrameworkCore;

    public class CommunitiesContext : DbContext
    {
        public DbSet<Community> Communities { get; set; }

        public DbSet<Person> People { get; set; }

        public DbSet<CommunityMember> CommunityMembers { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder options)
            => options.UseSqlite("Data Source=test.db");

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            base.OnModelCreating(modelBuilder);
            modelBuilder.Entity<CommunityMember>().HasKey(x => new { x.PersonId, x.CommunityId });
        }
    }

    public class Community
    {
        public int Id { get; set; }
        public string Name { get; set; }

        public ICollection<CommunityMember> Members { get; set; }
    }

    public class Person
    {
        public int Id { get; set; }
        public string Name { get; set; }

        public ICollection<CommunityMember> Communities { get; set; }
    }

    public class CommunityMember
    {
        public int PersonId { get; set; }
        public Person Person { get; set; }
        public int CommunityId { get; set; }
        public Community Community { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            using var context = new CommunitiesContext();
            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();

            context.People.Add(new Person { Id = 1, Name = "Geoff Smith" });
            context.Communities.Add(new Community { Id = 1, Name = ".Net Foundation" });
            context.CommunityMembers.Add(new CommunityMember() { CommunityId = 1, PersonId = 1 });
            context.SaveChanges();

            var test = context.Communities
                .Join(context.Communities.Include(x => x.Members).ThenInclude(x => x.Person),
                    x => x.Id,
                    y => y.Id,
                    (x, y) => y)
                .Distinct()
                .ToList();

            Console.WriteLine(test.First()?.Members.First()?.Person.Name);
        }
    }
}

@kccsf this particular example is fine because you are joining entity with itself, so the number of rows coming from the FROM clause is the same as if the INNER JOIN wasn't there. Is there a specific reason you are building query this way?

context.Communities.Include(x => x.Members).Distinct().ToList();

works correctly.

However, perhaps there is something smart we can do with joins here. @smitpatel thoughts?

The column which is missing is coming from left side of the join c.Id, we need to fetch that since that really determines number of row. If the inner returns rows with same id for different outer ids then they still need to go to different buckets. I don't think we have any easy way to know that inner source element's identifiers won't be repeated for outer source.

@maumar Scenario: I have an id of an entity to hand and wish to retrieve all entities from the same table that share the same 'type' as that entity.

I've split it into two queries, but figured best to still log as worked fine in 3.1.3

@kccsf so you have just one entity on the left side of join, and you are joining with the table based on not key but some other property, like so? :

entities.Where(e => e.Id == myId).Join(entities.Include(...), o => o.Type, i => i.Type, (o, i) => i).Distinct()

This type of query would have worked correctly in 3.1.3 and it is a breaking change. On the other hand, you shouldn't need a Distinct there, since the entities returned from the join are guaranteed to be unique.

If the left side contains more that one result, and therefore right side could contain duplicates, we hit the problem that @smitpatel pointed out, and we don't have a good way of distinguishing first case from the second one.

However, you can rewrite this in a way that avoids the join completely and is still a single query:

entities.Where(e => e.Type == entities.Where(ee => ee.Id == myId).Select(ee => ee.Type).FirstOrDefault()).Include(...)

This only works if the inner Where clause Where(ee => ee.Id == myId) returns just one result.
If the Where clause returns more than one result you can use contains instead:

entities.Where(e => entities.Where(ee => ee.SomeProperty == myProperty).Select(ee => ee.Type).Contains(e.Type)).Include(...)
Was this page helpful?
0 / 5 - 0 ratings