I discovered that (at least for queries to mysql) a where clause on the discriminator is formed, even if all derived classes are selected. This slows down remarkably even if you add an index on the discriminator field when millions of rows are in the table.
There are no other derived classes than union, person and organisation, but still ef core forms this where clause:
WHERE o.Discriminator IN ('Union', 'Person', 'Organisation')
Just skip the where Discriminator in ('class1','class2',...) when all classes are selected.
This is by design. We add predicate to make sure that we only select records which correspond to one of the types we know. Your table may have additional values in discriminator column which we don't know about and it can throw runtime exception. It is similar to having non-mapped columns and EF Core selecting only columns we know explicitly.
We could allow to mark the discriminator as 'complete' and make this perf improvement.
@ajcvickers
continuing from https://github.com/dotnet/efcore/issues/19803#issuecomment-582499923
A query can use only properties defined in the base class, but still return instances of multiple different types in the hierarchy. So, "involves properties of the base class only" does not mean the same as "involves any type in the hierarchy".
You're right, I did mean "involves any/every type in the hierarchy".
But just to be sure I got it right...
Let's assume a program involving Students and Teachers
I have a User class and Student and Teacher classes derive from it, discriminated by an enum property called Type which lives inside the User class. There are 3 corresponding DbSets.
Now, there can be 2 scenarios depending on the use-case
A. There are only Students and Teachers
B. There are Students, there are Teachers, and there are other "normal" Users who have no extra properties.
In both scenarios, the enum will have to have 3 values, or EF core will not allow us to it as a discriminator.
In scenario A, only 2 of those 3 enum values will actually be present in the data
In scenario B, all 3 enum values will be present in the data
In scenario A, when using dbcontext.Users... the developer wants to query any/every User. Being a Student/Teacher is irrelevant.
In scenario B, ambiguity comes in. When using dbcontext.Users... the developer may want either of 2 things
What I am looking for is better performance in Scenario A by eliminating the extra where clause.
I imagine global query filters can be used for scenario B.2, but even there, I think EF Core would attach an extra where clause, although I have never tried it, so that's just an assumption.
Personally, I never had experience with a use case like Scenario B, even if I did, I'd probably avoid it by using a 4th enum value for "normal" Users, and thus convert it to Scenario A. Which is why I like the idea put forward in https://github.com/dotnet/efcore/issues/18106#issuecomment-536170875, because that would allow the developer to chose what's best for them.
Clearing the milestone here to discuss with the team. Making this the default would fix the simple raw SQL issue for composing over TPH tables: #18232
Filed #20192 for default decision.
Most helpful comment
We could allow to mark the discriminator as 'complete' and make this perf improvement.