https://github.com/dotnet/corefx/issues/13842 introduces TakeLast
and SkipLast
which existed in Ix.NET (System.Interactive
, which is part of the Reactive Extensions project) for a long time. This may be a good time to review other operators we got in Ix.NET and see whether they're worth adding to .NET proper.
I'm pasting a short selection (using the Ix names) below:
No overloads of these operators exist with an IComparer<TSource>
. The existing overload uses Comparer<TSource>.Default
to get the comparer. It seems an omission that we don't provide an option to pass in a custom comparer.
public static TSource Min<TSource>(this IEnumerable<TSource> source, IComparer<TSource> comparer);
public static TSource Max<TSource>(this IEnumerable<TSource> source, IComparer<TSource> comparer);
Similar to MinimalBy
and MaximalBy
in Mathematica. The current overloads with a Func<TSource, TResult>
are merely shortcuts for xs.Select(f).Max()
rather than providing a means to get the list of values that have the lowest projected value (e.g. the list of people with the lowest value for Age
).
public static IList<TSource> MinBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
public static IList<TSource> MinBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer);
public static IList<TSource> MaxBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
public static IList<TSource> MaxBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer);
Just like Task<Task<T>>
can be unwrapped into Task<T>
or AggregateException
can be flattened. Obviously, this is just a macro for xss.SelectMany(xs => xs)
, but possibly more descriptive and having better performance.
public static IEnumerable<TSource> Concat<TSource>(this IEnumerable<IEnumerable<TSource>> sources);
public static IEnumerable<TSource> Concat<TSource>(params IEnumerable<TSource>[] sources);
I noticed we've added Prepend
. Ix provides StartWith
which is basically the same but takes a params
array with elements to prepend. (Note it comes from the dual of StartWith
on IObservable<T>
where we can start an event sequence with one or more initial values provided from an array.) It could be done quite easily today using new[] { ... }.Concat(xs)
but may be deemed a natural extension to Prepend
.
public static IEnumerable<TSource> StartWith<TSource>(this IEnumerable<TSource> source, params TSource[] values);
Totally analogous signatures to Aggregate
but yields every intermediate computed result. As usual, it comes from the duality with Rx where we deal with streams of events (that may never terminate). This said, it has value on a lazy enumerable sequence, e.g. when processing large logs and providing all intermediate aggregation values as the analysis is going on. Aggregate
is pretty much Scan
followed by Last
, only optimized.
public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(this IEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> accumulator);
public static IEnumerable<TSource> Scan<TSource>(this IEnumerable<TSource> source, Func<TSource, TSource, TSource> accumulator);
There are a bunch more that are likely a bit more esoteric and of lesser value, but nonetheless worth having a look at. One family of them deals with sharing side-effects of enumeration in a less aggressive way than the traditional ToArray
to ToList
approach; see things like Share
, Memoize
, and Publish
.
It's been a while since I've worked on / looked at this code, so some design decisions may have eroded from my memory, but I'll do my best recollecting these if questions come up.
+1
Related: MinBy
and MaxBy
were part of https://github.com/dotnet/corefx/issues/2146, which was closed.
Also relevant: https://github.com/morelinq/morelinq. DistinctBy
sounds like a good idea, for example.
@bartdesmet Nice proposal. I have a few comments, though:
MinBy
/ MaxBy
should return ILists, though, seems very un-Linqish (IEnumerables would be better).Concat (or Flatten or Unwrap)
Obviously, this is just a macro for xss.SelectMany(xs => xs), but possibly more descriptive and having better performance.
SelectMany
is only run once per subcollection, not per item, so it does not have too much overhead. Also, I have recently made changes that speed it up significantly: https://github.com/dotnet/corefx/pull/13942StartWith (Prepend with multiple values)
I think PrependMany
would be a better name, and if we add this we should also add EndWith
/ AppendMany
for symmetry.
We may want to hold on this until there is support for params IEnumerable<T>
in C#. That way, with the following signature:
IEnumerable<T> PrependMany<T>(this IEnumerable<T> source, params IEnumerable<T> prependees)
{
}
var onetwothree = Enumerable.Range(3, 1).PrependMany(1, 2);
the compiler can cache the [1, 2]
collection in a static field. If we had the signature as params T[]
, a defensive copy would have to be allocated each time.
Scan (aggregate yielding intermediate results)
@jamesqo
I don't think
MinBy
/MaxBy
should return ILists, though, seems very un-Linqish (IEnumerables would be better).
I would expect them to return a single value (for example, MoreLinq's version does that). But if it's going to return a sequence, a list does make sense to me, since it can't really be lazy (though I would prefer IReadOnlyList<T>
).
Should it return a sequence containing all the values with the minimum/maximum key? Not sure. I think getting just a single value is the common case, and returning a sequence would make that common case more complex and also less efficient.
On the other hand, knowing if there's a tie (and what values are tied) sounds useful too. Maybe both versions of the method should exist? Though in that case I'm not sure about naming (maybe AllMinBy
?).
I'm neither firmly for or against most of these, but a MinBy or MaxBy that returns more than one value is horrible, at least under that name. If you've more than one minima or maxima you either need more information that just the value (which that doesn't seem to offer to give) or you need to rethink your comparison criteria.
@JonHanna When you add a selector it becomes feasible that multiple elements may be mapped to the same maxima. For example, if you wanted to find the oldest people in a List<Person>
:
struct Person { string Name; int Age; }
var people = new List<Person>
{
new Person { Name = "John", Age = 25 },
new Person { Name = "Tommy", Age = 80 },
new Person { Name = "Mary", Age = 80 }
};
var oldest = people.MaxBy(p => p.Age);
If we took away the selector then returning an IEnumerable wouldn't make sense. That said, it might still be confusing/inconvenient that MaxBy
returns a list when Max
returns a scalar. Maybe we should rename it to MaximaBy
.
Oh, I get that it's possible to have more than one, but I don't think "MaxBy" is a good match for that, and often nor is just the value; one would often want to know indices too.
I'll refrain from debating names here and provide some clarifications on the semantic decisions we made in Ix. (See Wadler's law.)
MinBy
and MaxBy
in Ix.NET initially returned one element (first match) but we changed it based on feedback from users, especially folks familiar with Mathematica etc. It's quite hard to get the list of minima and maxima from existing operators (short of using Aggregate
with a List<T>
seed value and all the min/max logic embedded in its selector functions, or using a much more expensive OrderBy
), but the opposite is easier (get one from many, albeit at the cost of having an allocation).
These observations / discussions lead me to conclude both cases (return a list a la Mathematica or a single element a la Scala) have a use. Also, getting the index is easy by composition with the overload of Select
that provides access to the element's index.
The choice of IList<T>
predates the existence of IReadOnlyList<T>
. We explicitly chose a return type that reflects the eager nature of the operator. In fact, LINQ has places where the type promises too much, e.g. GroupBy
not returning lazy enumerables (though that's totally possible but not trivial; we have a few such implementations laying around here) and thus not working on infinite streams (even if you just want a few groups and a few elements per group). A similar design decision was made in Rx for the return type of the Buffer
operator being IObservable<IList<T>>
, so Ix reflected that as part of its symmetry.
With regards to IE<T> Concat(IE<IE<T>>)
aka Flatten
, there may be very little performance to be gained over SelectMany
with the identity function, so it's more a matter of expectation as to whether one finds such an operator when needed or expected. Things like Task<Task<T>>
and AggregateException
have similar notions of flattening. For example, Scala has a flatten function too.
@bartdesmet Regarding Flatten
, I've changed my mind over the past week. Common patterns like
List<List<T>> listOfLists = ...
T[] flat = listOfLists.SelectMany(l => l).ToArray();
are currently very hard to optimize because we do not know l => l
will always result in an ICollection, and we may not be able to get the count of all the collections cheaply. Therefore, we have to do dynamic resizing during ToArray
.
With Flatten
, however, we can check if the source implements IEnumerable<ICollection<TSource>>
and if so just sum up all of the counts of the collections and preallocate an array. So I guess it would be a good idea to add such a function.
I'd expect MinBy
to return a single thing and MinsBy
to return zero or more things.
What's the status of this issue? It's still labeled api-needs-work
, but there seems to be an agreement on the shape of MaxBy
/MinBy
, at least. Maybe the issue should be split into separate issues for each method (or each method family) to ease the process of refining the API.
IMO MaxBy
and MinBy
are sorely needed. They address a very common need, and the current alternatives are horrible:
OrderByDescending(...).First()
is O(n log n)
when MaxBy
could be O(n)
Aggregate
can be leveraged for this, but it's unwieldy and hard to read.DistinctBy
, and possibly IntersectBy
, UnionBy
, ExceptBy
and SequenceEqualBy
would also be useful.
@thomaslevesque
SequenceEqualBy
That seems unnecessary since the return value doesn't need the type of the sequence before applying the selector. Just do source.Select(...).SequenceEqual(other.Select(...))
The other set operations you mention seem like they would be good additions, though.
That seems unnecessary since the return value doesn't need the type of the sequence before applying the selector. Just do source.Select(...).SequenceEqual(other.Select(...)) The other set operations you mention seem like they would be good additions, though.
It's true that SequenceEqualBy
can be achieved with existing methods, but it's less intuitive. Also, a specific implementation could be slightly faster, since it wouldn't have the overhead of the Select
iterator.
@thomaslevesque
OrderByDescending(...).First()
isO(n log n)
whenMaxBy
could beO(n)
Not anymore, see https://github.com/dotnet/corefx/pull/2401.
@svick nice!
With regards to IE
Concat(IE >) aka Flatten, there may be very little performance to be gained over SelectMany with the identity function, so it's more a matter of expectation as to whether one finds such an operator when needed or expected. Things like Task > and AggregateException have similar notions of flattening. For example, Scala has a flatten function too.
This would be great! Just as long as it is not called Merge
....
Regarding MinBy
and MaxBy
, I definitely think a single value should be returned. While multiple results are of course a possibility, I think that the By
suffix indicates a desired behavior in the vein of DistinctBy
, ExceptBy
, and IntersectBy
. In my mode of reasoning, which often involves fuzzy logic, The keySelector
strongly suggests that you do not care which one is returned, so long as it is maximal or minimal. These operations tend to be inherently reductive (in a highly useful way).
LINQ is all about productivity. Why not make a productive library even more productive?
My vote is then yeah.
So nice suggestion! I want to use Ix.NET methods without Introducing Library via nuget.
But it is so hard to introduce all Ix.Net methods at the same time to dotnet core.How about introduce and implement them individually?
In this issue, we discuss about Scan
methods.
https://github.com/dotnet/corefx/issues/31348
without Introducing Library via nuget.
Why?
without Introducing Library via nuget.
Why?
https://github.com/dotnet/corefx/issues/14119#issuecomment-459973238
Because I want to use Ix.NET methods as language builtin Collection extension methods.
And I want to make many C#er can use Ix.NET methods.
I think, there is so big difference between
Many C#er study and can use LINQ to Objects (language builtin Collection extension methods).
But Libary Collection extension methods are not same. Some people like and use Ix.NET, MoreLinq and others, but the other people don't use them and don't know them.
About LINQ to Objects, code reading, code review and communication are so easy because many people can use them.
But about Library Collection extension methods, they are not easy. Because not all people know them.
There are so many useful methods in Ix.NET. They make so big productivity.
If Ix.NET methods are added in .NET core in the future, I think that many C#er will be able to use them. And code reading, code review and communication about them will become so easy. I think this introduce more productivity for C#er and C# community.
Some people say that "You can create your own Collection extension methods". Many people create their own Collection extension methods individually. Sometime they are so useful, but sometime they are confused. Because they are not same implementation and specification.
If C#(.NET Core) provides Ix.NET methods as language builtin methods, we can use as common implementation and specification methods. It is so useful and clear.
Because I want to use Ix.NET methods as language builtin Collection extension methods. And I want to make many C#er know and can use Ix.NET methods.
Thanks. I read all of those as, paraphrasing: "because I don't want to use another library". To me that's not a good enough reason to add such methods. The core frameworks simply cannot scale to add every possible extension method someone might want. For the .NET ecosystem to thrive, developers need to be comfortable using libraries from NuGet and the broader ecosystem.
Thank you!
For the .NET ecosystem to thrive, developers need to be comfortable using libraries from NuGet and the broader ecosystem.
Yes! I understand and agree above.
Thanks. I read all of those as, paraphrasinAnd I want to make many C#er can use Ix.NET methods.
I'm so sorry. My main suggestion and opiniton are that "I want to make many C#er can use Ix.NET methods."
But I understand next important point at the same time.
The core frameworks simply cannot scale to add every possible extension method someone might want.
By the way, Ix.NET TakeLast
and SkipLast
methods were introduced in dotnet/runtime#19431. This means that "If the method is useful and need, it will be added as .NET core builtin method.", isn't it? So I will suggest issues that introduce Ix.NET methods indivisually.
So I submited new issue IsEmpty
Ix.NET method in https://github.com/dotnet/corefx/issues/35054
And I take part in the issue about Scan
like method : https://github.com/dotnet/corefx/issues/31348
So I will not introduce all Ix.NET methods. But I will introduce methods that they are useful and need for .NET core from Ix.NET.
This may be a good time to review other operators we got in Ix.NET and see whether they're worth adding to .NET proper.
I'd like to question the premise here. In Ix there is an existing popular library, available from NuGet, for free, that already enables any developer using .NET to get this functionality. What's the benefit of copying that functionality into Enumerable? I would much rather spend such efforts removing whatever roadblocks exist for folks to partake in the NuGet ecosystem.
I'm totally for growing the ecosystem and ironing out any wrinkles that may hinder adoption of libraries through NuGet.
One of the motivating issues for opening this was the conflict introduced by TakeLast and SkipLast in Ix where the same extension method ended up being provided in .NET, thus having to add #if in Ix to create different build flavors to resolve the ambiguity for users of those operators (while keeping downlevel support). This is the main drawback of multiple libraries providing the same extension methods and may be a topic to consider in the design of "extension everything".
From the list above, the one inconsistency I see in System.Linq is the lack of a comparer on Min and Max, while other operators do provide a customization for comparers. All other proposals do not intersect with existing operators and can be punted to Ix.
As for Ix, given the recent work on System.Linq.Async over there, I think it's totally fair to continue going down the road of contributing such libraries through NuGet and focus on discoverability.
Related to #27687
As already mentioned adding more methods to System.Linq comes at a cost, so we would need to see strong evidence for the utility of any new method on a case by case basis. As it stands this issue is more of a wish list of assorted LINQ methods, making it more difficult to evaluate during API review. I would recommend closing it and creating separate issues for each proposal, so that we can have a more focused debate.
A few thoughts on the individual proposals:
MaxBy
and MinBy
are genuinely useful additions, that cannot be easily expressed with the existing API. Already covered by #27687.Concat
seems like a useful addition. Would you be able to create a dedicated issue for it?StartWith
since it's really just a slightly different Prepend
.Scan
seems pretty niche. While I can see why it might make sense in Rx-like flows, I don't think it would see much use in the context of LINQ. IMHO Aggregate
is pretty niche as well and should best be avoided.
Most helpful comment
Thanks. I read all of those as, paraphrasing: "because I don't want to use another library". To me that's not a good enough reason to add such methods. The core frameworks simply cannot scale to add every possible extension method someone might want. For the .NET ecosystem to thrive, developers need to be comfortable using libraries from NuGet and the broader ecosystem.