When loading related one-to-many entities, EF adds ORDER BY clauses to make sure all related entities for a given entity are grouped together.
However, we can remove the last ORDER BY clause as it is unnecessary and causes more work (see #19571).
FYI I have a repro case where I get a timeout (> 40s) retrieving a single line (amongst 10) solely due to this (removing the ORDER BY causes the query to fetch data instantly).
I have a few owned entities which have owned collections.
We encountered the same issue here, we're using Polemo MySQL provider, in our case, the auto generated ORDER BY of Include() forces MySQL to use temp table which had significant performance impact, the query usually take less than 1ms to finish without ORDER BY and it took 14s when ORDER BY applied, and we also caught several exceptions from MySQL reporting 'Can't change size of file (OS errno 28 - No space left on device)', turns out MySQL tries to allocate space for temp tables and the file exceeds 16MB which is the default settings on our MySQL server.
Please do take this seriously as ORDER BY is performed differently on different SQL products, we had to split the LINQ query into three separate queries to workaround it, but I'm not sure how this will affect other queries in the system, it's basically impossible to reproduce this in the testing environment without extensive load tests as the performance impact is closely related to the complexity of the query as well as the result set data volume. Hopefully the fix comes sooner.
SELECT `t`.`Id`, `t`.`ApprovedTime`, `t`.`Approver`, `t`.`ApproverComments`, `t`.`AutoCreatePO`, `t`.`CancelReason`, `t`.`ChargedAmount`, `t`.`CreatedTime`, `t`.`Creator`, `t`.`CustomerId`, `t`.`DeliveryDate`, `t`.`ImageFileInfo`, `t`.`IsInternalOrder`, `t`.`LastUpdatedTime`, `t`.`Notes`, `t`.`OrderNumber`, `t`.`OrderType`, `t`.`PackagingMethod`, `t`.`Priority`, `t`.`ProjectName`, `t`.`SalesOrderCompleted`, `t`.`SalesRep`, `t`.`Status`, `t`.`TenantId`, `t`.`TotalAmount`, `t`.`UniqueId`, `t1`.`Id`, `t1`.`Area`, `t1`.`AreaName`, `t1`.`ClientSalesOrderNumber`, `t1`.`ClientWorkOrderNumber`, `t1`.`CreatedTime`, `t1`.`Creator`, `t1`.`CuttingMachineNumber`, `t1`.`CuttingOperator`, `t1`.`ImageFileInfo`, `t1`.`InternalWorkOrderNumber`, `t1`.`LastUpdatedTime`, `t1`.`Length`, `t1`.`OrderPlacingDate`, `t1`.`PackingCaseId`, `t1`.`Position`, `t1`.`Price`, `t1`.`ProcessingDate`, `t1`.`SalesOrderId`, `t1`.`Status`, `t1`.`StockInOperator`, `t1`.`StockInTime`, `t1`.`StockOutOperator`, `t1`.`StockOutTime`, `t1`.`StockingAreaId`, `t1`.`TenantId`, `t1`.`TileNumber`, `t1`.`Width`, `t3`.`Id`, `t3`.`CaseNumber`, `t3`.`CreatedTime`, `t3`.`Creator`, `t3`.`LastUpdatedTime`, `t3`.`SalesOrderId`, `t3`.`Status`, `t3`.`StockInOperator`, `t3`.`StockInTime`, `t3`.`StockOutOperator`, `t3`.`StockOutTime`, `t3`.`StockingAreaId`, `t3`.`TenantId`, `t3`.`UniqueId`
FROM (
SELECT `s`.`Id`, `s`.`ApprovedTime`, `s`.`Approver`, `s`.`ApproverComments`, `s`.`AutoCreatePO`, `s`.`CancelReason`, `s`.`ChargedAmount`, `s`.`CreatedTime`, `s`.`Creator`, `s`.`CustomerId`, `s`.`DeliveryDate`, `s`.`ImageFileInfo`, `s`.`IsInternalOrder`, `s`.`LastUpdatedTime`, `s`.`Notes`, `s`.`OrderNumber`, `s`.`OrderType`, `s`.`PackagingMethod`, `s`.`Priority`, `s`.`ProjectName`, `s`.`SalesOrderCompleted`, `s`.`SalesRep`, `s`.`Status`, `s`.`TenantId`, `s`.`TotalAmount`, `s`.`UniqueId`
FROM `SalesOrders` AS `s`
WHERE (`s`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420') AND (`s`.`Id` = 1620)
LIMIT 2
) AS `t`
LEFT JOIN (
SELECT `t0`.`Id`, `t0`.`Area`, `t0`.`AreaName`, `t0`.`ClientSalesOrderNumber`, `t0`.`ClientWorkOrderNumber`, `t0`.`CreatedTime`, `t0`.`Creator`, `t0`.`CuttingMachineNumber`, `t0`.`CuttingOperator`, `t0`.`ImageFileInfo`, `t0`.`InternalWorkOrderNumber`, `t0`.`LastUpdatedTime`, `t0`.`Length`, `t0`.`OrderPlacingDate`, `t0`.`PackingCaseId`, `t0`.`Position`, `t0`.`Price`, `t0`.`ProcessingDate`, `t0`.`SalesOrderId`, `t0`.`Status`, `t0`.`StockInOperator`, `t0`.`StockInTime`, `t0`.`StockOutOperator`, `t0`.`StockOutTime`, `t0`.`StockingAreaId`, `t0`.`TenantId`, `t0`.`TileNumber`, `t0`.`Width`
FROM `Tiles` AS `t0`
WHERE `t0`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420'
) AS `t1` ON `t`.`Id` = `t1`.`SalesOrderId`
LEFT JOIN (
SELECT `t2`.`Id`, `t2`.`CaseNumber`, `t2`.`CreatedTime`, `t2`.`Creator`, `t2`.`LastUpdatedTime`, `t2`.`SalesOrderId`, `t2`.`Status`, `t2`.`StockInOperator`, `t2`.`StockInTime`, `t2`.`StockOutOperator`, `t2`.`StockOutTime`, `t2`.`StockingAreaId`, `t2`.`TenantId`, `t2`.`UniqueId`
FROM `TilePackingCases` AS `t2`
WHERE `t2`.`TenantId` = '57555980-54df-49ac-b8d5-701090c85420'
) AS `t3` ON `t`.`Id` = `t3`.`SalesOrderId`
ORDER BY `t`.`Id`, `t1`.`Id`, `t3`.`Id`


@LiangZugeng @PaulARoy thanks for posting here - note that this is only about removing the last ORDER BY; previous ones are necessary in order to properly materialize results.
What could be helpful for us to evaluate the priority of this, is a runnable code sample which clearly shows that the last ORDER BY has a significant impact on perf.
I'm sorry but it's been 2 month, we fixed it and switched to postgres in between.
The best way to reach it was to stack owned entities / owned collections.
The last order by was the only cause of perf downgrade.
I will spend some time to write some code to reproduce the perf degrade issue this weekend, I will be targeting MySQL as it鈥檚 where we found the issue. Will post a Github repo here later for this.
What is the status about this issue?
@grietine the issue is in our backlog, which means it won't be part of the upcoming 5.0 release. We'll probably consider it for 6.0.
Here is a sample project that can show the last order reducing performance and drastically increasing IO. I tried to come up with a slightly realistic example but ended up just throwing some stuff on there to get the row count and size higher. Since we use PaaS SQL Server instances our IO is more tightly limited than my local computer so IO intensive operations like the last Order By hurts performance there more than locally.
With the ORDER BY

Without the ORDER BY

Full SQL
exec sp_executesql N'SELECT [t1].[Id], [t1].[ConstructionType], [t1].[ExteriorDescription], [t1].[GeneralDescription], [t1].[InteriorDescription], [t1].[Name], [t1].[SalesDescription], [t1].[YearBuilt], [t1].[Id0], [t1].[Supervisor_Name], [t1].[Supervisor_Title], [b5].[BuildingId], [b5].[Id], [b5].[Number], [b6].[SupervisorBuildingId], [b6].[Id], [b6].[Number], [b7].[SupervisorBuildingId], [b7].[Id], [b7].[Number], [b7].[Type], [t3].[BuildingId], [t3].[Id], [t3].[Notes], [t3].[Number], [t3].[UnitBuildingId], [t3].[UnitId], [t3].[Id0], [t3].[Name], [t3].[Notes0], [t3].[ResidentUnitBuildingId], [t3].[ResidentUnitId], [t3].[ResidentId], [t3].[Id00], [t3].[Number0], [t3].[ResidentUnitBuildingId0], [t3].[ResidentUnitId0], [t3].[ResidentId0], [t3].[Id1], [t3].[ExtraDeductible], [t3].[ResidentUnitBuildingId1], [t3].[ResidentUnitId1], [t3].[ResidentId1], [t3].[Id2], [t3].[Number00], [t3].[Type]
FROM (
SELECT TOP(1) [b].[Id], [b].[ConstructionType], [b].[ExteriorDescription], [b].[GeneralDescription], [b].[InteriorDescription], [b].[Name], [b].[SalesDescription], [b].[YearBuilt], [t0].[Id] AS [Id0], [t0].[Supervisor_Name], [t0].[Supervisor_Title]
FROM [Building] AS [b]
LEFT JOIN (
SELECT [b0].[Id], [b0].[Supervisor_Name], [b0].[Supervisor_Title]
FROM [Building] AS [b0]
WHERE [b0].[Supervisor_Title] IS NOT NULL OR [b0].[Supervisor_Name] IS NOT NULL
UNION
SELECT [b1].[Id], [b1].[Supervisor_Name], [b1].[Supervisor_Title]
FROM [Building] AS [b1]
INNER JOIN [Building_ParkingSpaces1] AS [b2] ON [b1].[Id] = [b2].[SupervisorBuildingId]
UNION
SELECT [b3].[Id], [b3].[Supervisor_Name], [b3].[Supervisor_Title]
FROM [Building] AS [b3]
INNER JOIN [Building_PhoneNumbers] AS [b4] ON [b3].[Id] = [b4].[SupervisorBuildingId]
) AS [t0] ON [b].[Id] = [t0].[Id]
WHERE [b].[Id] = @__p_0
) AS [t1]
LEFT JOIN [Building_ParkingSpaces] AS [b5] ON [t1].[Id] = [b5].[BuildingId]
LEFT JOIN [Building_ParkingSpaces1] AS [b6] ON [t1].[Id0] = [b6].[SupervisorBuildingId]
LEFT JOIN [Building_PhoneNumbers] AS [b7] ON [t1].[Id0] = [b7].[SupervisorBuildingId]
LEFT JOIN (
SELECT [u].[BuildingId], [u].[Id], [u].[Notes], [u].[Number], [t2].[UnitBuildingId], [t2].[UnitId], [t2].[Id] AS [Id0], [t2].[Name], [t2].[Notes] AS [Notes0], [t2].[ResidentUnitBuildingId], [t2].[ResidentUnitId], [t2].[ResidentId], [t2].[Id0] AS [Id00], [t2].[Number] AS [Number0], [t2].[ResidentUnitBuildingId0], [t2].[ResidentUnitId0], [t2].[ResidentId0], [t2].[Id1], [t2].[ExtraDeductible], [t2].[ResidentUnitBuildingId1], [t2].[ResidentUnitId1], [t2].[ResidentId1], [t2].[Id2], [t2].[Number0] AS [Number00], [t2].[Type]
FROM [Unit] AS [u]
LEFT JOIN (
SELECT [r].[UnitBuildingId], [r].[UnitId], [r].[Id], [r].[Name], [r].[Notes], [r0].[ResidentUnitBuildingId], [r0].[ResidentUnitId], [r0].[ResidentId], [r0].[Id] AS [Id0], [r0].[Number], [p].[ResidentUnitBuildingId] AS [ResidentUnitBuildingId0], [p].[ResidentUnitId] AS [ResidentUnitId0], [p].[ResidentId] AS [ResidentId0], [p].[Id] AS [Id1], [p].[ExtraDeductible], [r1].[ResidentUnitBuildingId] AS [ResidentUnitBuildingId1], [r1].[ResidentUnitId] AS [ResidentUnitId1], [r1].[ResidentId] AS [ResidentId1], [r1].[Id] AS [Id2], [r1].[Number] AS [Number0], [r1].[Type]
FROM [Resident] AS [r]
LEFT JOIN [Resident_ParkingSpaces] AS [r0] ON (([r].[UnitBuildingId] = [r0].[ResidentUnitBuildingId]) AND ([r].[UnitId] = [r0].[ResidentUnitId])) AND ([r].[Id] = [r0].[ResidentId])
LEFT JOIN [Pet] AS [p] ON (([r].[UnitBuildingId] = [p].[ResidentUnitBuildingId]) AND ([r].[UnitId] = [p].[ResidentUnitId])) AND ([r].[Id] = [p].[ResidentId])
LEFT JOIN [Resident_PhoneNumbers] AS [r1] ON (([r].[UnitBuildingId] = [r1].[ResidentUnitBuildingId]) AND ([r].[UnitId] = [r1].[ResidentUnitId])) AND ([r].[Id] = [r1].[ResidentId])
) AS [t2] ON ([u].[BuildingId] = [t2].[UnitBuildingId]) AND ([u].[Id] = [t2].[UnitId])
) AS [t3] ON [t1].[Id] = [t3].[BuildingId]
ORDER BY [t1].[Id], [b5].[BuildingId], [b5].[Id], [b6].[SupervisorBuildingId], [b6].[Id], [b7].[SupervisorBuildingId], [b7].[Id], [t3].[BuildingId], [t3].[Id], [t3].[UnitBuildingId], [t3].[UnitId], [t3].[Id0], [t3].[ResidentUnitBuildingId], [t3].[ResidentUnitId], [t3].[ResidentId], [t3].[Id00], [t3].[ResidentUnitBuildingId0], [t3].[ResidentUnitId0], [t3].[ResidentId0], [t3].[Id1], [t3].[ResidentUnitBuildingId1], [t3].[ResidentUnitId1], [t3].[ResidentId1], [t3].[Id2]',N'@__p_0 int',@__p_0=1
Thanks @dandenton, it's great to have some hard data on this!
@dandenton just to make sure we're discussing the same thing... Can you please confirm that your comparison checks when removing only the last term in the ORDER BY clause (i.e. [t3].[Id2]), rather than the entire ORDER BY clause? This issue is only about the former option, which is something we can actually do - for the latter see #19571.
@roji Nope, I didn't notice https://github.com/dotnet/efcore/issues/20076 and thought this was to remove the full thing. Removing only the last column from the ORDER BY still results in slightly less CPU/IO but not nearly as much as removing the entire thing.
Including [t3].[Id2]

Excluding [t3].[Id2]

Thanks for confirming.
Our team is using Pomelo and MySql and this is causing real performance issues.
Are there any workarounds for the time being beyond just loading the related entities and join them manually?
@rabberbock please note that this issue on only about removing the last ORDER BY when joining; if you're concerned about non-last ORDER BY clauses, see #19571 (and also consider trying out split queries).
If you're convinced you're seeing a perf issue because of the last ORDER BY, it would be useful to have a code sample that shows this - it would help us bump up the priority of this issue.
@roji Thanks for the pointers! I can try to put something together. The gist of it is as follows:
The execution plan with the last ORDER BY is the following:

Without the last ORDER BY it's:

So with the last ORDER BY it uses tmp table, which in our case really degrades performance.
Without the last ORDER BY it runs in miliseconds, with the ORDER BY the same query takes a minute.
Here is a link to the MySql docs that explain when a tmp table is used. https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html.
What would you like to see in a code sample beyond the above execution plans?
Thanks for this info @rabberbock! I think the above (and the link to the MySQL docs) should be sufficient IMHO.
Most helpful comment
@roji Thanks for the pointers! I can try to put something together. The gist of it is as follows:
The execution plan with the last ORDER BY is the following:
Without the last ORDER BY it's:
So with the last ORDER BY it uses tmp table, which in our case really degrades performance.
Without the last ORDER BY it runs in miliseconds, with the ORDER BY the same query takes a minute.
Here is a link to the MySql docs that explain when a tmp table is used. https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html.
What would you like to see in a code sample beyond the above execution plans?