Runtime: Feature Request: Overload for Distinct to receive a func.

Created on 18 Oct 2018 · 14 Comments · Source: dotnet/runtime

I noticed that Distinct has no overload that accepts a func. We always have to create a new lambda comparer and put the func inside the comparer. Is it possible to have that abstracted away and use it like this:
Distinct((a, b) => a.Stuff == b.Stuff && a.Stuff1 == b.Stuff1)
Does this even make sense?
If so, I think I have a simple solution for this.

api-suggestion area-System.Linq

RubenMateus

All 14 comments

To be efficient, wouldn't it need two funcs, one for equality and one for hash code?

stephentoub on 18 Oct 2018

What would the hash code do? I don't really know how efficient it would be.
Please give me some guidance; I would gladly contribute to this.

RubenMateus on 18 Oct 2018

Distinct works by building a hash set (https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Set.cs) so that for each new item it sees, it can hash it and determine relatively quickly whether it's already seen that item. That requires a hash code, and that's why IEqualityComparer has both a comparison and hash code method. Without that, Distinct would end up being O(N^2), having to compare every new item individually against every previous item yielded to determine whether it was the same.

So the overload would need to take two delegates, and the call site would end up being quite complicated. I don't think this is something we should add.

If we wanted to do something, I'd prefer to see us add the equivalent of Comparer<T>.Create for EqualityComparer<T>, with EqualityComparer<T>.Create taking the same two delegates. That could then be used with Distinct, but also elsewhere.

stephentoub on 18 Oct 2018

👍1
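
To make the two-delegate point concrete, here is a minimal sketch of a comparer built from an equality func and a hash func, then passed to the existing Distinct overload. DelegateEqualityComparer<T>, Item, and DelegateComparerDemo are hypothetical names invented for this illustration; they are not an API proposed or shipped in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: wraps an equality delegate and a hash delegate
// in an IEqualityComparer<T> so it can be handed to Distinct.
public sealed class DelegateEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _getHashCode;

    public DelegateEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        _equals = equals ?? throw new ArgumentNullException(nameof(equals));
        _getHashCode = getHashCode ?? throw new ArgumentNullException(nameof(getHashCode));
    }

    public bool Equals(T x, T y) => _equals(x, y);

    public int GetHashCode(T obj) => _getHashCode(obj);
}

// Hypothetical item type with the two properties used in the thread.
public sealed class Item
{
    public int Stuff { get; set; }
    public string Stuff1 { get; set; }
    public string Name { get; set; }
}

public static class DelegateComparerDemo
{
    public static List<string> DistinctNames()
    {
        var items = new[]
        {
            new Item { Stuff = 1, Stuff1 = "a", Name = "first" },
            new Item { Stuff = 1, Stuff1 = "a", Name = "second" },
            new Item { Stuff = 2, Stuff1 = "b", Name = "third" },
        };

        // Both delegates must agree: items that compare equal
        // must also produce the same hash code.
        var comparer = new DelegateEqualityComparer<Item>(
            (a, b) => a.Stuff == b.Stuff && a.Stuff1 == b.Stuff1,
            a => (a.Stuff, a.Stuff1).GetHashCode());

        return items.Distinct(comparer).Select(i => i.Name).ToList();
    }
}
```

The call site has to spell out the same pair of properties twice, once per delegate, which is the complexity the comment above objects to.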

I think the right way to improve this is to add DistinctBy (already mentioned in comments of https://github.com/dotnet/corefx/issues/14119).

With that and C# 7.0 tuples, you could write:

DistinctBy(a => (a.Stuff, a.Stuff1))

This is simpler to use than the proposed overload of Distinct and doesn't have any issues with GetHashCode (because the implementation can call GetHashCode on the tuple).

svick on 18 Oct 2018

👍6
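
For illustration, a key-based distinct operator can be sketched in a few lines on top of HashSet<TKey>. This is only a sketch under stated assumptions, not the implementation that was proposed or merged. The method is named DistinctByKey (a hypothetical name) to avoid clashing with any DistinctBy already in scope, and KeyedDistinctExtensions and KeyedDistinctDemo are likewise made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedDistinctExtensions
{
    // Hypothetical operator: yields the first element seen for each key.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterator(source, keySelector);
    }

    private static IEnumerable<TSource> Iterator<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // The set hashes each key with the key type's default comparer,
        // so a tuple key needs no hand-written GetHashCode.
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // Add returns false when the key was already present.
            if (seen.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}

public static class KeyedDistinctDemo
{
    public static List<string> DistinctNames()
    {
        var items = new[]
        {
            (Stuff: 1, Stuff1: "a", Name: "first"),
            (Stuff: 1, Stuff1: "a", Name: "second"),
            (Stuff: 2, Stuff1: "b", Name: "third"),
        };

        return items
            .DistinctByKey(a => (a.Stuff, a.Stuff1))
            .Select(a => a.Name)
            .ToList();
    }
}
```

Because the key is a value tuple, equality and hashing both come from the tuple itself, so the caller writes the property list only once.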

I like @svick's idea a lot. How hard would it be?

RubenMateus on 18 Oct 2018

There should be all kinds of XxxBy methods (DistinctBy, ExceptBy, MaxBy, ...). I have needed them many times and therefore implemented them.

I have seen some of them proposed in scattered places. I would be willing to open a ticket with a formal API proposal that centralizes all of the XxxBy proposals. Is that a good idea?

GSPP on 19 Oct 2018

👍3

.DistinctBy(a => (a.Stuff, a.Stuff1)) is the equivalent of:

.GroupBy(a => (a.Stuff, a.Stuff1)).Select(group => group.First())

Or by merging the Select into the GroupBy:

.GroupBy(a => (a.Stuff, a.Stuff1), (key, group) => group.First())

But with performance opportunities that these equivalents do not have.

jnm2 on 20 Oct 2018
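
As a quick check of the two GroupBy forms above, the following sketch runs both against a small made-up data set. The GroupByDistinctDemo class and its sample items are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByDistinctDemo
{
    private static readonly (int Stuff, string Stuff1, string Name)[] Items =
    {
        (1, "a", "first"),
        (1, "a", "second"),
        (2, "b", "third"),
    };

    // GroupBy followed by Select: take the first element of each group.
    public static List<string> ViaSelect() =>
        Items.GroupBy(a => (a.Stuff, a.Stuff1))
             .Select(group => group.First())
             .Select(a => a.Name)
             .ToList();

    // Same result using the GroupBy overload that takes a result selector.
    public static List<string> ViaResultSelector() =>
        Items.GroupBy(a => (a.Stuff, a.Stuff1), (key, group) => group.First())
             .Select(a => a.Name)
             .ToList();
}
```

Both methods return the names "first" and "third". Each form groups the source by key before taking one element per group, which is work that a dedicated key-based distinct operator would not need to do.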

Same situation with MaxBy. .MaxBy(a => (a.Stuff, a.Stuff1)) is the equivalent of the poorly-performing:

.OrderByDescending(a => (a.Stuff, a.Stuff1)).First()

jnm2 on 20 Oct 2018
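
To show why sorting is more work than the problem needs, here is a hedged sketch of a single-pass maximum-by-key next to the OrderByDescending form. MaxByKey, KeyedMaxExtensions, and KeyedMaxDemo are hypothetical names for this example, and the sketch assumes the key type is comparable through Comparer<TKey>.Default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedMaxExtensions
{
    // Hypothetical operator: one pass over the source, no sorting.
    public static TSource MaxByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

        var comparer = Comparer<TKey>.Default;
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
            {
                throw new InvalidOperationException("Sequence contains no elements");
            }

            var best = e.Current;
            var bestKey = keySelector(best);
            while (e.MoveNext())
            {
                var key = keySelector(e.Current);
                // Strictly greater, so the first maximal element wins ties.
                if (comparer.Compare(key, bestKey) > 0)
                {
                    best = e.Current;
                    bestKey = key;
                }
            }
            return best;
        }
    }
}

public static class KeyedMaxDemo
{
    private static readonly (int Stuff, string Stuff1, string Name)[] Items =
    {
        (1, "a", "x"),
        (3, "b", "y"),
        (2, "c", "z"),
    };

    public static string SinglePass() =>
        Items.MaxByKey(a => (a.Stuff, a.Stuff1)).Name;

    public static string ViaSort() =>
        Items.OrderByDescending(a => (a.Stuff, a.Stuff1)).First().Name;
}
```

Both methods pick the item named "y". The single-pass version calls the key selector once per element and keeps only the current best, while the sorting version orders the whole sequence just to read its first element.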

Any news on this? Will it make it into a release?

RubenMateus on 21 Jul 2020

I recently learned that Distinct has a documented guarantee that values will be returned in the original order in which they are seen, but GroupBy documentation explicitly does not guarantee that groups will be returned in the original order in which each key was first seen.

Edit: Actually it's the other way around

jnm2 on 22 Jul 2020

@jnm2 Where did you see that? The documentation for GroupBy<TSource,TKey>(IEnumerable<TSource>, Func<TSource,TKey>) says:

The IGrouping<TKey,TElement> objects are yielded in an order based on the order of the elements in source that produced the first key of each IGrouping<TKey,TElement>. Elements in a grouping are yielded in the order they appear in source.

So, at least this overload does guarantee the order. (Though not all overloads have this note.)

svick on 22 Jul 2020

Oh no, I had them exactly backwards! GroupBy preserves ordering and Distinct does not.

jnm2 on 22 Jul 2020

So, something like this?

public static class Enumerable
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
}

FWIW F# core has a similar method.