The following code
Code 1:
var RssNewsQuery = _repositoryContext.Article.Take(5).OrderByDescending(article => article.PublishDate);
var allArticleIds = RssNewsQuery.Distinct().ToList();
return Ok(allArticleIds);
Generated SQL 1:
SELECT DISTINCT `t`.`Id`, `t`.`CreatedBy`, `t`.`CreatedDate`, `t`.`Description`, `t`.`ImageURL`, `t`.`IsDeleted`, `t`.`Link`, `t`.`ModifiedDate`, `t`.`PublishDate`, `t`.`RSSSourceId`, `t`.`Title`, `t`.`UpdatedBy`, `t`.`UpdatedDate`
FROM (
SELECT `a`.`Id`, `a`.`CreatedBy`, `a`.`CreatedDate`, `a`.`Description`, `a`.`ImageURL`, `a`.`IsDeleted`, `a`.`Link`, `a`.`ModifiedDate`, `a`.`PublishDate`, `a`.`RSSSourceId`, `a`.`Title`, `a`.`UpdatedBy`, `a`.`UpdatedDate`
FROM `Article` AS `a`
LIMIT @__p_0
) AS `t`
Code 2 (Take(2) and OrderByDescending replaced with each other):
var RssNewsQuery = _repositoryContext.Article.OrderByDescending(article => article.PublishDate).Take(5);
var allArticleIds = RssNewsQuery.Distinct().ToList();
return Ok(allArticleIds);
Generated SQL 2:
SELECT DISTINCT `t`.`Id`, `t`.`CreatedBy`, `t`.`CreatedDate`, `t`.`Description`, `t`.`ImageURL`, `t`.`IsDeleted`, `t`.`Link`, `t`.`ModifiedDate`, `t`.`PublishDate`, `t`.`RSSSourceId`, `t`.`Title`, `t`.`UpdatedBy`, `t`.`UpdatedDate`
FROM (
SELECT `a`.`Id`, `a`.`CreatedBy`, `a`.`CreatedDate`, `a`.`Description`, `a`.`ImageURL`, `a`.`IsDeleted`, `a`.`Link`, `a`.`ModifiedDate`, `a`.`PublishDate`, `a`.`RSSSourceId`, `a`.`Title`, `a`.`UpdatedBy`, `a`.`UpdatedDate`
FROM `Article` AS `a`
ORDER BY `a`.`PublishDate` DESC
LIMIT @__p_0
) AS `t`
As can be seen on SQL1 there is not even an "ORDER BY" clause. Based on the result I get, the ordering seems to be done in client side. But I couldn't find any warning doing so. As per the changelog of EFCore 3.x LINQ queries are no longer evaluated on the client
Note: I don't even know why there is an inner sql in the first place.
MySQL version: 5.7
Operating system: Windows 10
Pomelo.EntityFrameworkCore.MySql version: 3.1.1
Microsoft.AspNetCore.App version: 3.1.4
See upstream issue at https://github.com/dotnet/efcore/issues/23267.
See upstream issue at dotnet/efcore#23267.
The contract of Distinct, does not guarantee the ordering hence once Distinct is applied the inner order by is gone. If ordering is important after applying Distinct then please add OrderBy operation after Distinct.
As pointed out by @smitpatel in the upstream issue linked by @mguinness, you need to apply the OrderByDescending() call _after_ the Destinct() call, because DISTINCT does not explicitly preserve the previous order.
@kocburak Depending on what you want the result to be, one of the following two queries should be what you are looking for:
```c#
var topFiveArticles = _repositoryContext.Article
.Distinct()
.Take(5)
.OrderByDescending(article => article.PublishDate)
.ToList();
var topFiveOrFewerArticles = _repositoryContext.Article
.Take(5)
.Distinct()
.OrderByDescending(article => article.PublishDate)
.ToList();
```
On side-note, there is no ordering done on the client, it is just whatever implicit ordering database gave to us. That ordering could change based on database and its implementation of Distinct operation.
@smitpatel so what you are saying the following two codes are not different from each other from LINQ perspective ? This seems a little bit odd to me.
_repositoryContext.Article.Take(5).OrderByDescending(article => article.PublishDate).Distinct().ToList();
_repositoryContext.Article.Take(5).Distinct().ToList()
@kocburak Yes, that is correct. From the Remarks section of Enumerable.Distinct Method:
[...] The Distinct
(IEnumerable method returns an unordered sequence that contains no duplicate values. It uses the default equality comparer, Default, to compare values. [...])
It is true though, that using LINQ to Objects, while it is not guaranteed that the sequence will be in order, it is in practice when you order it first and then call Distinct().
I have not done real research on how other databases handle a combination of DISTINCT and ORDER BY, but a quick search found these two StackOverflow questions that show, that database handling of DISTINCT and ORDER BY is more complex than it looks like:
So considering all of that, it makes sense to me that Distinct() does not preserve any ordering.
From a practical point of view, you usually order the data as the last operation (unless you use Skip() and/or Take()).
Also, since DISTINCT is basically just a fancy GROUP BY, and GROUP BY is handled before ORDER BY, you want to put your Distinct() call before the OrderBy() call.
Thank you very much everyone for taking time to help. I thought this was some kind of bug but since the behavior is design intent, issue can be closed.
Most helpful comment
@kocburak Yes, that is correct. From the
Remarkssection of Enumerable.Distinct Method:It is true though, that using LINQ to Objects, while it is not guaranteed that the sequence will be in order, it is in practice when you order it first and then call
Distinct().I have not done real research on how other databases handle a combination of
DISTINCTandORDER BY, but a quick search found these two StackOverflow questions that show, that database handling ofDISTINCTandORDER BYis more complex than it looks like:So considering all of that, it makes sense to me that
Distinct()does not preserve any ordering.From a practical point of view, you usually order the data as the last operation (unless you use
Skip()and/orTake()).Also, since
DISTINCTis basically just a fancyGROUP BY, andGROUP BYis handled beforeORDER BY, you want to put yourDistinct()call before theOrderBy()call.