Runtime: LINQ FindIndex extension method

Created on 13 Oct 2019 · 12 Comments · Source: dotnet/runtime

I keep finding myself writing a predicate-based FindIndex LINQ expression for IEnumerable<T> to find the index of the first item in a collection that satisfies a criterion.

Currently, this can be achieved with the following combo, which is not ideal and not very readable at first glance:
``` C#
source
    .Select((item, index) => new { item, index })
    .FirstOrDefault(x => predicate(x.item))?.index ?? -1;
```

Let's add new LINQ extensions for this purpose that work exactly like `First`, except that they return the index.

A reference implementation:

``` C#
public static int FindIndex<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    int i = 0;
    foreach (var item in source)
    {
        if (predicate(item))
            return i;
        i++;
    }
    return -1;
}

public static int FindIndex<T>(this IEnumerable<T> source, Func<T, bool> predicate, int start)
{
    // Search the tail, then shift the result back into the original index space.
    var index = source.Skip(start).FindIndex(predicate);
    return index == -1 ? -1 : index + start;
}
```
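For readers who want to try the proposal, here is a minimal self-contained sketch of the first overload. The class names `EnumerableIndexExtensions` and `FindIndexDemo` and the argument checks are assumptions added for illustration; they are not part of the proposal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container class; extension methods must live in a static class.
public static class EnumerableIndexExtensions
{
    // Returns the zero-based index of the first element matching the predicate, or -1.
    public static int FindIndex<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        int i = 0;
        foreach (var item in source)
        {
            if (predicate(item))
                return i;
            i++;
        }
        return -1;
    }
}

public static class FindIndexDemo
{
    public static void Main()
    {
        IEnumerable<string> words = new[] { "alpha", "beta", "gamma" };
        Console.WriteLine(words.FindIndex(w => w == "gamma"));   // 2
        Console.WriteLine(words.FindIndex(w => w.Length > 10));  // -1
    }
}
```

Note that on a variable typed as `List<T>`, the existing instance method `List<T>.FindIndex` takes precedence over an extension method with the same name, so the extension only comes into play for other `IEnumerable<T>` sources.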

api-suggestion area-System.Linq


RezaJooyandeh

👍1

Most helpful comment

@Reza1024

> You might deal with a huge collection

You're right that you don't want to convert a huge enumerable to a list. But you also don't want to re-enumerate it, you want to retrieve all the information you need on the first pass.

> or even a db

If you're dealing with a database, you almost always have some kind of key and I can't imagine why you'd want to use index instead of that key.

> IndexOf(predicate)

I didn't notice this before, but IndexOf searches for a T, not for items matching a predicate. List<T> has a method that does this, called FindIndex. If that's the method that you actually want, I think you should make that clearer by using that name, instead of IndexOf.

> counting how many objects are smaller than the one that satisfies the predicate

I think the proposed approach is harder to understand and harder to translate to a query language (usually SQL) than what is currently possible.

Consider:

``` C#
players.Count(p => p.Score < currentPlayer.Score)
players.OrderBy(p => p.Score).IndexOf(p => p.Id == currentPlayer.Id)
```

The former can be trivially translated to SQL, the latter would require a much more complex query. And I'd say the latter is also harder to understand.
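To make the comparison concrete, here is a small in-memory sketch of both approaches. The `Player` type, the sample scores, and the method names are hypothetical, and `ToList()` plus `List<T>.FindIndex` stands in for the proposed predicate-based lookup, which LINQ does not have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type standing in for the elements of `players`.
public sealed class Player
{
    public int Id { get; set; }
    public int Score { get; set; }
}

public static class RankDemo
{
    // Rank as "how many players score strictly lower": one aggregate, no sort.
    public static int RankByCount(IEnumerable<Player> players, Player current) =>
        players.Count(p => p.Score < current.Score);

    // Rank as "position after sorting". ToList() is only here to reach
    // List<T>.FindIndex, which stands in for the proposed method.
    public static int RankBySort(IEnumerable<Player> players, Player current) =>
        players.OrderBy(p => p.Score).ToList().FindIndex(p => p.Id == current.Id);

    public static void Main()
    {
        var players = new[]
        {
            new Player { Id = 1, Score = 70 },
            new Player { Id = 2, Score = 40 },
            new Player { Id = 3, Score = 90 },
            new Player { Id = 4, Score = 55 },
        };
        var currentPlayer = players[0];

        Console.WriteLine(RankByCount(players, currentPlayer)); // 2
        Console.WriteLine(RankBySort(players, currentPlayer));  // 2
    }
}
```

The two results agree here because all scores are distinct. With tied scores they can differ: the count gives the first position of the tie group, while the sorted position depends on where the player lands within it.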

svick on 23 Feb 2020

👍2

All 12 comments

Alternative current combo:

``` C#
int i = -1;
var index = source.FirstOrDefault(item =>
    {
        i++;
        return predicate(item);
    }) == default(someType) ? -1 : i;
```

... which at least has the benefit of not needing the tuple allocations, although it's no safer if the collection naturally contains the type's default value (although variations for that exist as well).
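The default-value caveat is easy to reproduce. The sketch below specialises the counter-based workaround to `int`; the names `DefaultPitfallDemo` and `IndexViaCounter` are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DefaultPitfallDemo
{
    // The counter-based workaround, specialised to int for illustration.
    public static int IndexViaCounter(IEnumerable<int> source, Func<int, bool> predicate)
    {
        int i = -1;
        int match = source.FirstOrDefault(item =>
        {
            i++;
            return predicate(item);
        });
        // Ambiguous: 0 means either "nothing matched" or "the match itself is 0".
        return match == default(int) ? -1 : i;
    }

    public static void Main()
    {
        var numbers = new[] { 3, 0, 5 };
        Console.WriteLine(IndexViaCounter(numbers, n => n == 5)); // 2
        Console.WriteLine(IndexViaCounter(numbers, n => n == 0)); // -1, although 0 sits at index 1
    }
}
```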

> public static int IndexOf<T>(this IEnumerable<T> source, Func<T, bool> predicate, int start)

This feels a little concerning, however, since it implies you plan on repeated iteration over the same source. The general recommendation is to iterate over a given IEnumerable only _once_ and to capture it in a List or similar if you need more than that, and for a List a method for that already exists.

Clockwork-Muse on 13 Oct 2019

👍2

That would work, but as you said, only if default does not belong to the collection. That is why I added an anonymous object wrapper in my suggested implementation, so that FirstOrDefault returning null guarantees that the item does not exist.

RezaJooyandeh on 15 Oct 2019

I'm not in favor of a version that takes an offset. It feels very prone to misuse.

scalablecory on 15 Oct 2019

👍1

I have rarely felt the need for the overload that takes an offset, so I agree, @scalablecory. I only suggested it because most similar APIs have an offset version. The version without an offset is the important one, and it is the one I keep needing.

RezaJooyandeh on 22 Oct 2019

What's the use case for this? I personally always use IList<T> when I need to work with indexes in order to prevent double iteration of the IEnumerable<T> as @Clockwork-Muse already mentioned.

I know it's a recommended practice to return IEnumerable<T> from functions, but it is also a recommended practice to return a ReadOnlyCollection<T>.

thargol1 on 19 Feb 2020

👍1

The use case is when you do not have an IList<T>, but just an IEnumerable<T>. If the object were an IList<T>, we could have used IndexOf. One could convert an enumerable to a list and then use IndexOf, but that would be an unnecessary memory allocation, hence the need for this extension. @thargol1, you could see this as a parallel to the First or FirstOrDefault extension methods, with the difference that we care about the index instead of the actual object.

RezaJooyandeh on 20 Feb 2020

@Reza1024 I think the question is: what are you going to do with that index?

If you have an IList<T>, you're probably going to use the index in a subsequent call to the indexer or RemoveAt. But if you have an IEnumerable<T>, you can't use those. You could use it with ElementAt, but that would require enumerating multiple times, and you probably don't want to do that.
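The double enumeration described here can be observed directly. In this sketch the `Passes` counter and the `Numbers` iterator are hypothetical helpers, and a hand-written loop stands in for the proposed FindIndex.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DoubleEnumerationDemo
{
    // Number of times the lazy sequence was started from the beginning.
    public static int Passes;

    // An iterator whose body runs again for every new enumeration.
    public static IEnumerable<int> Numbers()
    {
        Passes++;
        yield return 10;
        yield return 20;
        yield return 30;
    }

    public static (int Index, int Value, int Passes) Run()
    {
        Passes = 0;
        IEnumerable<int> source = Numbers();

        // First pass: find the index by hand (stand-in for the proposed FindIndex).
        int index = 0;
        foreach (var n in source)
        {
            if (n == 20) break;
            index++;
        }

        // Second pass: ElementAt has to walk the sequence again from the start.
        int value = source.ElementAt(index);
        return (index, value, Passes);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"{result.Index} {result.Value} {result.Passes}"); // 1 20 2
    }
}
```

For a sequence backed by a file, a network call, or a query, that second pass repeats the underlying work.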

svick on 20 Feb 2020

@svick @thargol1
IReadOnlyList<T> doesn't have IndexOf().

Tyrrrz on 20 Feb 2020

@svick Sometimes the goal is not necessarily using the element. It could be just reporting the placement. Sometimes we know the item exists, we just need to know where it is.

Also, re-enumerating is not always bad. It depends on what you are trying to optimize for. In many scenarios it could be done differently, or the source could be converted to a list/array for performance reasons, but that is not always true. You might deal with a huge collection or even a db, and it might be better at times to do multiple enumerations instead of loading everything.

Another scenario might be having a list and wanting to see, based on a certain ordering, what the index of an item is, without actually rearranging the items:

`collection.OrderBy(ordering).IndexOf(predicate);`

It could become even more interesting when you think about IQueryable objects and the fact that they could be smarter about optimizing the LINQ operations. For example, one could implement `queryable.OrderBy(ordering).IndexOf(predicate)` as counting how many objects are smaller than the one that satisfies the predicate instead of doing a sort and then looking for the index.

RezaJooyandeh on 23 Feb 2020

@svick Good suggestions. I changed it to FindIndex so it would be more clear.

RezaJooyandeh on 28 May 2020

As has already been mentioned, this addition would encourage double enumeration for arbitrary IEnumerables. In general then I think the Select/FirstOrDefault workaround that you originally posted is preferable, since it encourages flowing all required data in a single go.

Another issue I have with indices in LINQ is that they don't compose with chained enumerables. In most cases they are best avoided. For the few corner cases, the Select and Where index overloads probably work well enough. For everything else you're probably better off writing a foreach loop or rolling your own extension method.
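As one possible reading of "flowing all required data in a single go", the sketch below uses the indexed `Select` overload to return the item and its index together from one pass. The helper name `FirstWithIndex` is hypothetical, and the nullable tuple is just one way to signal "not found" without relying on a default value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SinglePassDemo
{
    // Returns the first matching item together with its index, or null if nothing matches.
    public static (T Item, int Index)? FirstWithIndex<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        // The two-parameter lambda selects the Select overload that supplies the index.
        foreach (var pair in source.Select((item, index) => (Item: item, Index: index)))
        {
            if (predicate(pair.Item))
                return pair;
        }
        return null;
    }

    public static void Main()
    {
        var words = new[] { "alpha", "beta", "gamma" };

        var hit = FirstWithIndex(words, w => w.Length == 4);
        Console.WriteLine(hit.HasValue ? $"{hit.Value.Item} at {hit.Value.Index}" : "not found"); // beta at 1

        var miss = FirstWithIndex(words, w => w.Length > 10);
        Console.WriteLine(miss.HasValue ? "found" : "not found"); // not found
    }
}
```

Because the caller receives the element along with its position, there is no need for a follow-up `ElementAt` call and therefore no second enumeration.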

I'm going to close this issue. Feel free to reopen if you would like to continue this conversation.