I noticed that Distinct has no overload that accepts a func; we always have to create a new LambdaComparer and put the func inside the comparer. Is it possible to have that abstracted and use it like:
Distinct((a, b) => a.Stuff == b.Stuff && a.Stuff1 == b.Stuff1)
Does this even make sense?
If so, I think I have a simple solution for this.
To be efficient, wouldn't it need two funcs, one for equality and one for hash code?
What would the hash code do? I don't really know how efficient it would be.
Please give me some guidance; I would gladly contribute to this.
Distinct works by building a hash set (https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Set.cs) so that for each new item it sees, it can hash it and determine relatively quickly whether it's already seen that item. That requires a hash code, and that's why IEqualityComparer has both a comparison and hash code method. Without that, Distinct would end up being O(N^2), having to compare every new item individually against every previous item yielded to determine whether it was the same.
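To make the cost difference concrete, here is a minimal sketch; it is not the actual System.Linq code. The names DistinctSketches, WithHashing, and WithEqualityOnly are hypothetical. The first method relies on a hash set and an IEqualityComparer, while the second only has an equality delegate and therefore has to scan every item yielded so far.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration; not the System.Linq implementation.
public static class DistinctSketches
{
    // Hash-based: each item is hashed, so a membership check is O(1) on average.
    public static IEnumerable<T> WithHashing<T>(IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        var seen = new HashSet<T>(comparer);
        foreach (T item in source)
        {
            // Add returns false when an equal item is already in the set.
            if (seen.Add(item))
                yield return item;
        }
    }

    // Equality-only: every new item is compared against every item
    // yielded so far, which is O(N^2) comparisons in the worst case.
    public static IEnumerable<T> WithEqualityOnly<T>(IEnumerable<T> source, Func<T, T, bool> equals)
    {
        var yielded = new List<T>();
        foreach (T item in source)
        {
            bool duplicate = false;
            foreach (T previous in yielded)
            {
                if (equals(previous, item))
                {
                    duplicate = true;
                    break;
                }
            }

            if (!duplicate)
            {
                yielded.Add(item);
                yield return item;
            }
        }
    }
}
```

Both methods return the same items in this sketch; only the amount of work per item differs.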
So the overload would need to take two delegates, and the call site would end up being quite complicated. I don't think this is something we should add.
If we wanted to do something, I'd prefer to see us add the equivalent of Comparer.Create for EqualityComparer, with EqualityComparer.Create taking the same two delegates. That could then be used with Distinct, but also elsewhere.
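As a rough idea of what such a factory could hand back, here is a sketch of a comparer built from the two delegates. DelegateEqualityComparer is a hypothetical name chosen for illustration, not a framework type, and the shape of a real EqualityComparer.Create might differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the type name is an assumption, not a framework API.
public sealed class DelegateEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _getHashCode;

    public DelegateEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        _equals = equals ?? throw new ArgumentNullException(nameof(equals));
        _getHashCode = getHashCode ?? throw new ArgumentNullException(nameof(getHashCode));
    }

    // Items that compare equal must also produce the same hash code,
    // otherwise hash-based callers such as Distinct will miss duplicates.
    public bool Equals(T x, T y) => _equals(x, y);

    public int GetHashCode(T obj) => _getHashCode(obj);
}
```

With it, the call from the original question would look roughly like Distinct(new DelegateEqualityComparer<Item>((a, b) => a.Stuff == b.Stuff && a.Stuff1 == b.Stuff1, a => (a.Stuff, a.Stuff1).GetHashCode())), where Item stands in for whatever the element type is. That shows how bulky the two-delegate call site gets.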
I think the right way to improve this is to add DistinctBy (already mentioned in comments of https://github.com/dotnet/corefx/issues/14119).
With that and C# 7.0 tuples, you could write:
DistinctBy(a => (a.Stuff, a.Stuff1))
This is simpler to use than the proposed overload of Distinct and doesn't have any issues with GetHashCode (because the implementation can call GetHashCode on the tuple).
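A minimal sketch of how such a DistinctBy could work, assuming it keeps a hash set of the keys seen so far. The class name DistinctBySketch is hypothetical, and the method is written as a plain static method rather than an extension method so that it cannot clash with a DistinctBy your framework version may already provide.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the suggested operator; not a framework implementation.
public static class DistinctBySketch
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterator(source, keySelector);
    }

    private static IEnumerable<TSource> Iterator<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // Uses the default equality comparer for TKey. For a value tuple key,
        // Equals and GetHashCode are derived from the tuple's elements.
        var seenKeys = new HashSet<TKey>();
        foreach (TSource item in source)
        {
            // Yield only the first item seen for each key.
            if (seenKeys.Add(keySelector(item)))
                yield return item;
        }
    }
}
```

The caller never writes a hash function: DistinctBySketch.DistinctBy(items, a => (a.Stuff, a.Stuff1)) gets equality and hashing from the tuple key.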
I like @svick's idea a lot. How hard would it be?
There should be all kinds of XxxBy methods (DistinctBy, ExceptBy, MaxBy, ...). I have needed them many times and therefore implemented them.
I have seen some of them proposed in a scattered way. I would be willing to open a ticket with a formal API proposal that centralizes all of the XxxBy proposals. Is that a good idea?
.DistinctBy(a => (a.Stuff, a.Stuff1))
is the equivalent of:
.GroupBy(a => (a.Stuff, a.Stuff1)).Select(group => group.First())
Or by merging the Select into the GroupBy:
.GroupBy(a => (a.Stuff, a.Stuff1), (key, group) => group.First())
But with performance opportunities that these equivalents do not have.
Same situation with MaxBy. .MaxBy(a => (a.Stuff, a.Stuff1))
is the equivalent of the poorly-performing:
.OrderByDescending(a => (a.Stuff, a.Stuff1)).First()
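To illustrate the performance opportunity, here is a sketch of a single-pass MaxBy that avoids sorting the whole sequence. MaxBySketch is a hypothetical name, and throwing on an empty sequence is an assumption made to mirror what First() does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; one pass over the source instead of a full sort.
public static class MaxBySketch
{
    public static TSource MaxBy<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

        IComparer<TKey> comparer = Comparer<TKey>.Default;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            // Assumption: an empty sequence throws, as First() would.
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements");

            TSource best = e.Current;
            TKey bestKey = keySelector(best);
            while (e.MoveNext())
            {
                TKey key = keySelector(e.Current);
                // Strictly greater, so the first of several equal maxima wins.
                if (comparer.Compare(key, bestKey) > 0)
                {
                    best = e.Current;
                    bestKey = key;
                }
            }
            return best;
        }
    }
}
```

This does O(N) key comparisons and keeps only one candidate in memory, whereas the OrderByDescending version buffers and sorts every element before returning the first.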
Any news on this? Will it make it into a release?
I recently learned that Distinct has a documented guarantee that values will be returned in the original order in which they are seen, but GroupBy documentation explicitly does not guarantee that groups will be returned in the original order in which each key was first seen.
Edit: Actually it's the other way around
@jnm2 Where did you see that? The documentation for GroupBy<TSource,TKey>(IEnumerable<TSource>, Func<TSource,TKey>) says:
The IGrouping<TKey,TElement> objects are yielded in an order based on the order of the elements in source that produced the first key of each IGrouping<TKey,TElement>. Elements in a grouping are yielded in the order they appear in source.
So, at least this overload does guarantee the order. (Though not all overloads have this note.)
Oh no, I had them exactly backwards! GroupBy preserves ordering and Distinct does not.
So, something like this?
public static class Enumerable
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
}
FWIW F# core has a similar method.
This request is already covered by #27687. Closing this one.