I propose to add LINQ methods of the pattern *By
. For example:
IEnumerable<TSource> DistinctBy<TSource, TKey>(
IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> comparer = null)
This method would behave like Distinct
except that equality is determined based on the key provided by keySelector
. The key could be any value including anonymous types and value tuples.
A motivating case could be this:
IEnumerable<Order> allOrders = GetTransactions();
IEnumerable<int> completedOrderIDs = GetCompletedOrderIDs();
IEnumerable<Order> remainingOrders =
allOrders.ExceptBy(completedOrderIDs, order => order.ID, orderID => orderID);
Logic like that is reasonably common in business logic code. It is not easy to implement without ExceptBy
. In particular, the following is undesirable because it leads to quadratic cost and repeated enumeration:
IEnumerable<Order> remainingOrders = allOrders.Where(o => !completedOrderIDs.Contains(o.ID));
In the past I have had a need for *By
methods many times so I have written them myself. A web search reveals great interest. I believe there is a strong case for adding methods like this.
There has been interest on this issue tracker as well:
Here is the API propsal. All of these methods come with an overload with and without comparer
. I kept the existing naming conventions and argument ordering.
IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer)
IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst)
IEnumerable<TSource1> ExceptBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond)
IEnumerable<TSource1> ExceptBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond, IEqualityComparer<TKey> comparer)
IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst)
IEnumerable<TSource1> IntersectBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond)
IEnumerable<TSource1> IntersectBy<TSource1, TSource2, TKey>(this IEnumerable<TSource1> first, IEnumerable<TSource2> second, Func<TSource1, TKey> keySelectorFirst, Func<TSource2, TKey> keySelectorSecond, IEqualityComparer<TKey> comparer)
IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector)
IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer)
TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer)
TResult Max<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TResult defaultValue)
TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer)
TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TSource defaultValue)
TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer)
TResult Min<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TResult defaultValue)
TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer)
TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult> comparer, TSource defaultValue)
Further notes:
DistinctBy
: The order of the output elements should be documented to be the same order as the input. Any reasonable implementation that comes to mind does it this way. For compatibility reasons this could not ever be changed anyway after the first version ships. Distinct
should be similarly documented if not already done. Distinct
already behaves this way.ExceptBy
and IntersectBy
the same is true for the first input. Only items from the first input are ever returned and their order can be kept. Except
does the same thing today.UnionBy
I'm not sure about the order.MinBy
/MaxBy
it should be documented that the first element in the sequence with the minimum/maximum key is returned.MinBy
/MaxBy
can take a defaultValue
which is used in case the sequence is empty. Overloads without default value throw in that case.Min
/Max
to bring them on par with the functionality added by MinBy
/MaxBy
(comparer
and defaultValue
).ExceptBy
and IntersectBy
can be different. This is because we only ever return items of the first sequence. We need two key selectors in that case. There's a simpler overload that has only one element type as well.Not sure why dotnet/runtime#14753 was closed way back when.
It is crazy that LINQ doesn't have these methods built in. I think it's been rare in the 10 years since LINQ's introduction that I've ever had a project where I don't end up writing one or more of these extension methods again and again myself.
I think the argument (mentioned in the other thread) of, "oh well you can do these things by creating a custom comparer" doesn't hold water. Writing a custom comparer is very heavyweight. And, in practice, I can probably count on one hand the number of times I've seen developers actually go and write their own comparers. Instead they usually a) use home-grown extension methods that do these things or b) fall back to writing a foreach loop.
One of the main points of LINQ is brevity- writing what you want to do, not all the plumbing code required to do it. Forcing users to write a separate class each time they want to slice their enumerable a different way is the absolute antithesis of this.
To implement these methods, I'd imagine it could just use something like Jon Skeet's ProjectionEqualityComparer and call the existing non-*By methods. (Probably deserves a separate issue, but that class should be exposed publicly in the framework proper as well)
@maryamariyan
The original issue is five years old. Seven versions of .NET Core have been released since then.
These methods are extremely convenient (unlike the overloads with IEqualityComparer) and are easy to implement.
In an effort to unify my own By extension methods (with the hope that those would someday would become part of the runtime) I came looking for a reference implementation that I could grab.
As far as I can tell the extrema-finding functions from both MoreLINQ and Ix.NET cause the evaluation of the selector function for single-element collections.
The decision to skip it (when possible) does come with the side effects of "hiding" a possible exception occurring in it, but at the same time- there is no "valid selector guarantee" with the empty collection either- so to me it would make sense to avoid the possibly lengthy selector evaluation (when possible)- but since the normal Enumerable.Min(source, selector)
does it as well- might be the dominating factor in the final decision...
Anyway, can anybody point me to an existing implementation of the proposed API - looking through these 100+ pages of search results is quite painful (in more than one ways)- and I see no way of filtering out the "ByTest.cs" files...
/cc: @eiriktsarpalis @adamsitnik
@terrajobst What needs to happen to get these API suggestions reviewed and get any official feedback from the team? I saw you mention in the standup yesterday that you guys have a clear plate now... :)
@terrajobst What needs to happen to get these API suggestions reviewed and get any official feedback from the team? I saw you mention in the standup yesterday that you guys have a clear plate now... :)
The API process is documented. Each area has an owner and it's their responsibility to work with the issue opener to get the issue either into the state api-ready-for-review or closed.
In this case, that's @eiriktsarpalis and @adamsitnik.
A few remarks:
MinBy
and MaxBy
. They are useful methods without good alternatives at the moment. I'm not sure if we should add the defaultValue
overloads.DistinctBy
is useful. It should also come with an IQueryable implementation.ExceptBy
solve? You assert that a naive workaround would take quadratic time, however most instances I've seen would simply use completedOrderIDs.ToHashSet();
outside of the main pipeline which is effectively equivalent.ExceptBy
overloads that select keys for the second
enumerable? Using your example it could be easily rewritten in terms of the first oveload: allOrders.ExceptBy(completedOrderIDs.Select(orderID => orderID), order => order.ID);
UnionBy
correctly, is it basically using keySelector
to define custom a equality semantics for TSource
? How is that different from passing a custom IComparer
to the existing Union
method? Why should it even have a different name?IntersectBy
signatures. Am I right to assume they're just mistakenly copied from the ExceptBy
APIs?@eiriktsarpalis Is there a reason you guys haven't reviewed this (or scheduled it for review) yet? I realize it doesn't follow the template but it seems pretty clear what it's asking and obviously you guys are aware of the proposal.
@eiriktsarpalis
How is that different from passing a custom
IComparer
to the existing Union method?
It's not different in terms of behavior (I assume you meant IEqualityComparer), and the same is true of all those proposed methods (ExceptBy, UnionBy, IntersectBy, DistinctBy). But writing a custom equality comparer every time you want to perform one of those operations based on a given property is time-consuming, and the logic of the comparison isn't immediately visible without looking at the implementation. The *By overloads make the code terser and more readable.
Is there a reason you guys haven't reviewed this (or scheduled it for review) yet? I realize it doesn't follow the template but it seems pretty clear what it's asking and obviously you guys are aware of the proposal.
Couple of reasons. The size of the backlog is significant so it takes us time before we can get to triage everything. I also need to be confident that any proposal has a strong chance of getting approved before I mark it as ready for review: this includes gathering evidence that the addition provides value to users, poking holes in the design, and finalizing the API shape. Following the template is part of that, but that's merely a convention meant to speed up API review that frankly takes a tiny fraction of the whole effort.
It's not different in terms of behavior (I assume you meant IEqualityComparer), and the same is true of all those proposed methods (ExceptBy, UnionBy, IntersectBy, DistinctBy). But writing a custom equality comparer every time you want to perform one of those operations based on a given property is time-consuming, and the logic of the comparison isn't immediately visible without looking at the implementation. The *By overloads make the code terser and more readable.
I suppose my original question is should they contain the By
suffix at all, given that they could just be thought of as overloads that accept a key selector instead of a comparer. While most current LINQ methods accepting a keySelector
do have a By
suffix, none of them have equivalent methods that accept IEqualityComparer
instead.
Most helpful comment
Not sure why dotnet/runtime#14753 was closed way back when.
It is crazy that LINQ doesn't have these methods built in. I think it's been rare in the 10 years since LINQ's introduction that I've ever had a project where I don't end up writing one or more of these extension methods again and again myself.
I think the argument (mentioned in the other thread) of, "oh well you can do these things by creating a custom comparer" doesn't hold water. Writing a custom comparer is very heavyweight. And, in practice, I can probably count on one hand the number of times I've seen developers actually go and write their own comparers. Instead they usually a) use home-grown extension methods that do these things or b) fall back to writing a foreach loop.
One of the main points of LINQ is brevity- writing what you want to do, not all the plumbing code required to do it. Forcing users to write a separate class each time they want to slice their enumerable a different way is the absolute antithesis of this.
To implement these methods, I'd imagine it could just use something like Jon Skeet's ProjectionEqualityComparer and call the existing non-*By methods. (Probably deserves a separate issue, but that class should be exposed publicly in the framework proper as well)