Runtime: Proposal: Introduce an extension method 'AsList()' in the 'IEnumerable/IEnumerable<T>'.

Created on 9 Oct 2020  路  21Comments  路  Source: dotnet/runtime

Background and Motivation

The ToList() method is always creating an instance of List<T> object in which on some cases, not really necessary. If the current instance of the IEnumerable is already a List<T>, then, the creation is not needed, but instead, simply cast it back to that specific type.

Sample simple implementation can be found here.

Most developers used the ToList() without needing the back-reference to the original object (and/or IEnumerable), specially if the items where created via yielded operation. However, without knowing behind the scene, it adds an overhead in every call. You will notice the differences if you are processing huge items within the IEnumerable and/or within the iteration.

Proposed API

public static List<T> AsList<T>(this IEnumerable<T> value)

It is up to you whether to use an IList<T> or List<T>.

Usage Examples

Let us say, the method GetPeople() returns a yielded IEnumerable of type Person class.

var people = GetPeople(1000).AsList();

Risks

No breaking changes on the existing behavior of the runtime. In addition to that, by having this method, there will be big performance and memory-efficiency benefits to most use-cases.

Please note that the ToList() will still be present, it will works alongside with this new proposed AsList() method. But of course, the case is different.

api-suggestion area-System.Linq

Most helpful comment

I don't think we would be able to take this proposal on account of the following reasons (already sketched above):

  • Chained LINQ operations hardly ever return lists, so this would almost always resort to copying.
  • Potentially gives write access to a collection that shouldn't be modified by the caller.
  • Can have unpredictable performance depending on the runtime type of the IEnumerable.
  • The AsList name does not clearly communicate intention, and as @Clockwork-Muse mentioned it would almost certainly lead to confusion with ToList.
  • It is trivial to implement as an extension method, for users that need it and know where to use it.

All 21 comments

Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley
See info in area-owners.md if you want to be subscribed.

This assumes that there isn't a chain of linq operations, which is rarely going to be the case. The moment you have _anything_, you need a new collection. Certain operations you won't even know the item count until you going to process it. Something like Select changes the item types, so can't be put back into the original array anyways.

Additionally, the output of linq operations are, from the outer collection standpoint, independent; if you add an item to the output of ToList(), then the item wasn't added to the original collection. This proposal would change that, which would be surprising.

Let us say, the method GetPeople() returns a yielded IEnumerable of type Person class.

If you mean GetPeople() is doing something like:

IEnumerable<Person> GetPeople(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return new Person();
    }
}

... there's no backing array/list.


Note that there are optimizations for the case where there aren't any operations - it'll return a duplicate list, but it creates it at the proper size and does a single copy.

I only place the yielded GetPeople() on the sample calls, but not really on the actual use-case it is being solved. Think of it this way. Let us say, such method return the List<T> of type IEnumerable<T>.

public IEnumerable<Person> GetPeople(int count)
{
    var list = new List<Person>();
    for (int i = 0; i < count; i++)
    {
        list.Add(new Person());
    }
    return list;
}

Most developers does not care that calling the ToList() outside the boundary of this method is an additional overhead.

var list = GetPeople(1000).ToList(); // Creates additional LIst<T>

And on this case, what I am trying to eliminate is to remove the separation of the original reference compared to the new reference, therefore, the code below returns the same instance.

var list = GetPeople(1000).AsList().AsList().AsList(); // Same instance

I would humbly say, maybe, 80-90% of your code or cases is ignoring the original list. So the case that is being solved by this is quite high in my opinion.

The request is not for the chain-of-calls, identical to what have the ToList() is doing, so whatever it is, it will create or not-create the List<T> if necessary.

Think of it as a method alternative to ToList() but is somehow more useful to the people caring on performance and efficiencies, specifically those OSS maintainers.

Most developers does not care that calling the ToList() outside the boundary of this method is an additional overhead.

... because most developers are probably just iterating through the results (via foreach). If they need a list they probably already _have_ one, so they'd just add the results to their existing collection.

And on this case, what I am trying to eliminate is to remove the separation of the original reference compared to the new reference, therefore, the code below returns the same instance.

And like I said before, none of the current linq extension methods return an unmodified reference to the source list, so this would be surprising.

Think of it as a method alternative to ToList() but is somehow more useful to the people caring on performance and efficiencies, specifically those OSS maintainers.

If the maintainers really want more performance, they should really bug whoever maintains GetPeople() to return a list (or take a mutable collection), not return IEnumerable<T>. For one thing, you don't have any guarantee that the method will always return a list as the concrete type; attempting to use this method _specifically_ for performance improvements isn't guaranteed to work, you'd be relying on undocumented behavior (which is a good way to get yourself burned) - especially since this would be per-call behavior.

If they need a list they probably already have one, so they'd just add the results to their existing collection.

Probably? Let us not assume that it happens all the time. Most of the cases, the developers is calling the ToList() to build the list if they already have it as a collection from the source (aka IEnumerable<T>).

One sample is, querying the data from the DB or from the API that returns a referenced of the List<T> but of type IEnumerable<T> as polymorphed.

In OOP POV, IEnumerable<T> is the best area to encapsulate and polymorph things, not through the List<T>. Herewith, I am saying, polymorphing to the lowest ref possible the wider range you can polymorph.

And, calling the ToList() would remove that reference, in which I guess completely make no sense if you are really to get those reference.

You have the point that we should bug down how maintains that GetPeople() method, but IMO, that is not the most of all the cases.

I dislike (at least at the sample implementation) that it isn't known beforehand whether it allocates or not.
For ToList it's know to allocate (and create a "copy").
To take this into account, it should be TryAsList and returning false if it isn't a List<T>. But then there is no benefit over a manual check
```c#
IEnumerable source = ...

if (source is List list)
{
// ...
}
```

It is known to allocate it. The case is to create a type of List<T> if it is not of that type. As mentioned, most users are creating a list from the source itself by calling the ToList() method, if it is not explicitly instantiated through new List<T>(). It forces allocating it to a variable and 'test' casting is not necessarry for the upcoming usage.

With TryAsList(), it has no help at all as you try to counter act on the else condition to force the creation, hence you know that the source is not of type List<T>, it seems to just an additional code line with no purpose IMO.

It is known to allocate it.

Not in the sample implementation.
If it's already an list, then it is only casted without any further allocation.

just an additional code line with no purpose IMO.

Exactely. This sample is to show that I don't see any use for such an extension, as a mojor point (does it allocate or not) is not obvious.

@gfoidl - now I got confused. I am referring to ToList() method that is always allocating, however, you're on the AsList() method on my own library implementation, I tried reading back again the conversation to realign the context.

Anyway, you are correct that on the AsList(), it is not known beforehand and that is the problem that I am solving here. This proposed method will help us remove the extra allocation of creating a new List<T> object as the developers (most likely, not doing a code like if (source is List<T>) { /* do something*/ } most of the time) but rather than, simply calling the ToList() method, as part of laziness (me as well).

Exactely. This sample is to show that I don't see any use for such an extension, as a mojor point (does it allocate or not) is not obvious.

I did explain the benefits above, and hope you get the point of where should you use the extension on its purpose. Try use my sample and call it in the iteration or a list and/or source with million items inside it, and compare/see the spikes of the memory allocations when being executed together with ToList().

Why does my library and Dapper have it? That's because it is important.

Furthermore, a different story, OfType<T>() and ToArray() can also be challenged, but I will bring that in a different thread.

@mikependon: sorry if I've confused you, this is not my intention.

My whole point is: if I use the extension AsList I want to know whether it will allocate or not.

it will create or not-create the List<T> if necessary.

So the "when" is hidden. If I have to know that it won't allocate if it's already a list, then I can just do if (source is List<T> list) and I'm done.

caring on performance and efficiencies,

Especially on those areas I have to know when an allocation happens. If this is hidden by this API, then I don't see the point for this API.

No worries 馃槅. Anyway, with the code like below, you force the consumers to implement the false condition in a separate manner.

var people = GetPeople(1000); // Returns an IEnumerable<T> of type List<T>
if (source is List<T> list) // Additional coding which is not needed, IMO
{
    Process(list);
}
else
{
    Process(people); // Additional coding which is not needed, IMO
}

This is where the AsList() becomes so useful, in which most case (I would say 80% to 90% of the scenarios) people are not checking the source collection (in this case the people variable). They're just using the ToList() right away. See below:

var people = GetPeople(1000).AsList();
// Make some changes to the `people`
Process(people); // Polymorphing (IEnumerable<T>)

The code is cleaner, neat and is easy to maintain, not to mention, performant and efficient.

If you always would like to expect an allocation, then use the ToList(), I am not proposing to remove that馃榾. Hope this helps explain!

The code is cleaner, neat and is easy to maintain, not to mention, performant and efficient.

Except for maybe the " cleaner, neat" part, this is all false. The moment the internal implementation changes the rest of the line is invalidated.

If you always would like to expect an allocation, then use the ToList(), I am not proposing to remove that馃榾. Hope this helps explain!

If I don't want any extra allocations (like, any at all - in many cases it's not going to be enough of an issue), I'm going to be yelling at whoever maintains GetPeople to take a mutable collection. Because that list is going to be created at the start of my program - GetPeople creating a new list on each call would be its own problem. If you're worried about allocations of intermediate collections, you're probably not going to be using linq in the first place (since they're rather common).

Except for maybe the " cleaner, neat" part, this is all false. The moment the internal implementation changes the rest of the line is invalidated.

I cannot understand what internal implementation you are referring here. Please elaborate. This request will not trigger a change on the existing capability.

It is also good if you can justify the "false" statement on behalf of the performant, efficient and easy to maintain comments I made. I already explain the benefits on the above comments.

I think you missed the point that GetPeople could be an output or a resultset of an external library embedded on your solution, and that you do not have controls there.

@mikependon -

The problem I'm referring to is when the internal implementation of GetPeople changes, or even if it doesn't always return a list.

The signature states IEnumerable, you have no guarantee that it ever returns a list in the first place. The "false" claim is due to the fact that you have no way to guarantee whether any particular call to the method will gain the benefit, which makes it unusable from a performance tuning point of view. Attempting to _regain_ the possible benefit (by, for example, using a cached external list) will make it more complicated than if you'd just assumed it wasn't a list to begin with. Taking a dependency on the internal behavior of another method is always risky, and prone to being broken.

Cool, got it! From my stand point, whatever is the underlying type, for as long it has inheritted the type IEnumerable, then the ToList() method can be called. Now, it is unfortunate in most cases that if the source itself is a type of List<T>, the method ToList() is still returning a new instance of List<T>, breaking the chain of references. Well, that is OK, for as long you as the caller is aware of it.

Now, with AsList<T> method, it behaves like ToList<T> but just checks the instance equality prior the actual creation. This is to ensure that the back reference is not broken. And again, it is a new use-case that ToList<T> is not solving.

Why ToList<T> only? It is the most used method from the Linq.

Referring to the performance, now I understand why we have deviations here. I am only referring to the fact if you compare only ToList<T> and AsList<T> method, head-to-head. I am not referring to the source collection implementation as you referring, which in my case we have no control on that. It is the boundary between the .NET Engine/Runtime and the user of the .NET, someone needs to limit the access/standard as difference users may write different code for the same intention, again we have no control on that area. Therefore, if we eliminate that area and if we are to compare these methods, no doubt the AsList<T> will ahead from the ToList<T>. So, let me know if you need some samples.

Imho, I think such proposals come out of a bad design where you basically use List everywhere but abstract it out using IEnumerable everywhere. From perf point of view it's a no anyway. Method you propose simply contributes to this bad design. IEnumerable is designed to be readonly and now you're trying to find a way to mutate the contents in (1/x cases, it's unpredictable) and on top of that get it to bcl. Maybe your use case is strictly faster iteration, but it has another effects and I don't think it belongs to bcl. Also it opens paths to questions like why AsList and not AsArray, AsHashSet, AsWhatever?

IEnumerable is designed to be readonly and now you're trying to find a way to mutate the contents in (1/x cases, it's unpredictable) and on top of that get it to bcl.

@En3Tho - then, please justify then why the ToList() method exists.

AsList and not AsArray, AsHashSet, AsWhatever

Already explained above, I will take the other As convention-based method on the other thread.

[EDIT]:

Maybe your use case is strictly faster iteration, but it has another effects and I don't think it belongs to bcl.

Again, please justify. What are the effects if this is a new method and not affecting the others?

@mikependon

then, please justify then why the ToList() method exists.

As a convenience to get a concrete/mutable collection when you have a chain of linq operations, as opposed to calling something like new List<Person>(someEnumerable); - it also enables optimizations in certain cases. The expectation is that it always allocates, because in most cases there isn't going to be a raw collection immediately behind it.

Again, please justify. What are the effects if this is a new method and not affecting the others?

It's not going to affect the behavior or implementation of any of the others. The effects it's going to have include:

  • Cause confusion as to the difference between ToList and AsList (especially as in the majority case it still has to allocate a new list regardless)
  • Cause people to have false expectations about performance gains when they use it on a method that is not returning a list, or only sometimes returns a list.
  • Cause issues if the cast list wasn't intended to be mutable. IEnumerable is inherently readonly, so I should be able to share it with multiple callers. Mutating the list can potentially invalidate something that was supposed to be an invariant.

This _feels like_ you're attempting to get a "global solution for a local problem" - that there's some method you're calling that is returning a list as an IEnumerable, and you want to try to remove that particular allocation.

I don't think we would be able to take this proposal on account of the following reasons (already sketched above):

  • Chained LINQ operations hardly ever return lists, so this would almost always resort to copying.
  • Potentially gives write access to a collection that shouldn't be modified by the caller.
  • Can have unpredictable performance depending on the runtime type of the IEnumerable.
  • The AsList name does not clearly communicate intention, and as @Clockwork-Muse mentioned it would almost certainly lead to confusion with ToList.
  • It is trivial to implement as an extension method, for users that need it and know where to use it.

Thanks gents for the clarity. I just closed this ticket now.

Thank you for taking the time to contribute the proposal.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jkotas picture jkotas  路  3Comments

GitAntoinee picture GitAntoinee  路  3Comments

yahorsi picture yahorsi  路  3Comments

v0l picture v0l  路  3Comments

nalywa picture nalywa  路  3Comments