Runtime: Add 'split' support for ReadOnlySpan<char> similar to string

Created on 23 Jan 2018  Â·  89Comments  Â·  Source: dotnet/runtime

Latest Update:

New proposal to re-review: Enumerator returns Range instead of ReadOnlySpan<T>
```C#
partial class MemoryExtensions
{
public static SpanSplitEnumerator Split(this ReadOnlySpan span, T separator,
StringSplitOptions options = StringSplitOptions.None) where T : IEquatable {}

public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
{
    public SpanSplitEnumerator<T> GetEnumerator() { return this;  }
    public bool MoveNext();
    public Range Current { get; }
}

}

**Previously approved API Proposal**
```C#
public static SpanSplitEnumerator<T> Split(this ReadOnlySpan<T> span, T seperator,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
{
    public SpanSplitEnumerator GetEnumerator() { return this;  }
    public bool MoveNext();
    public ReadOnlySpan<T> Current { get; }
}

Split off from https://github.com/dotnet/corefx/issues/21395#issuecomment-359342832

_From @Joe4evr on January 21, 2018 23:16_

Can I throw in another suggestion? I'd really like to see some ability to split a ReadOnlySpan<char>. Obviously, you can't return a collection of Spans directly, but isn't that what Memory<T> is for?

// equivalent to the overloads of 'String.Split()'
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator, int count);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] seperator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] seperator, StringSplitOptions options);

> The reason for choosing IReadOnlyList<T> is to make the resulting collection indexable, just like string[]. It would be nice if the implementation is ImmutableArray<T>, but I'm not sure if that's a concern for distribution and such.

_From @ahsonkhan on January 22, 2018 01:36_

@Joe4evr, out of curiosity, do you have a scenario atm where these APIs would be useful? If so, can you please show the code sample?

> I would replace the char[] overloads with ReadOnlySpan\

_From @Joe4evr on January 22, 2018 08:35_

My scenario is taking a relatively big string of user input and then parsing that to populate a more complex object. So rather than take the whole string at once, I'd like to parse it in pieces at a time. It'd be pretty nice if this can be facilitated by the Span/Memory<T> APIs so that this code won't have to allocate an extra 30-40 tiny strings whenever it runs.

Admittedly, I only started on this particular case earlier today, mostly to experiment and find out how much I could get out of the Span APIs at this time.

> Maybe it was a bit naive of me to expect a collection like I did, but I'd at least like to see some API to deal with this scenario a little easier, because I'll probably not be the only one looking to split a span up into smaller chunks like this.

_From @stephentoub on January 22, 2018 08:41_

Splitting support would be good, but I don't think it would look like the proposed methods; as @ahsonkhan points out, that would result in a lot of allocation (including needing to copy the whole input string to the heap, since you can't store the span into a returned interface implementation).

I would instead expect a design more like an iterator implemented as a ref struct, e.g.
C# public ref struct CharSpanSplitter { public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options); public bool TryMoveNext(out ReadOnlySpan<char> result); }

cc @KrzysztofCwalina, @stephentoub, @Joe4evr

api-needs-work area-System.Memory utf8-impact

Most helpful comment

A slightly different take on this. Considering that Range is now available, and the non-zero cost of Slice on potentially-many results, what if the enumerable returned Ranges instead of sliced RoS. In other words, a list of delineations. The consumer could then decide when/if to Slice(range)

foreach (var range in span.Split(','))
{
    var x = span.Slice(range);
}

Perhaps that's too abstract for a public api.

All 89 comments

Presumed usage would be like this?

var splitter = span.Split(',', StringSplitOptions.RemoveEmptyEntries);
while (splitter.TryMoveNext(out var slice))
{
    //....
}

Yeah, looks good.

Would it be possible to implement the enumerable pattern, even if it's not possible to implement IEnumerable<T>?

I think the interface would look something like:

```c#
public static SpanSplitEnumerable Split(this ReadOnlySpan span, char seperator);

public ref struct SpanSplitEnumerable
{
public SpanSplitEnumerator GetEnumerator();
}

public ref struct SpanSplitEnumerator
{
public bool MoveNext();
public ReadOnlySpan Current { get; }
}

That way, the usage could be:

```c#
foreach (var slice in span.Split(','))
{
    …
}

If indexing is important, something like ref struct SpanSplitList would probably work, but I think it couldn't guarantee zero allocations.

Sorry, didn't see this issue before posting https://github.com/dotnet/corefx/issues/21395#issuecomment-359802015.

As mentioned, I'd love to see something like Google Guava's CharMatcher for this. It could be useful for other scenarios as well.

It would be great if the split worked also with Span and returned ReadOnlyMemory>, ReadOnlySpan>

It would be great if the split worked also with Span and returned ReadOnlyMemory>, ReadOnlySpan>

Clarifying the comment so the generic types show up:
It would be great if the split worked also with Span<byte> and returned ReadOnlyMemory<ReadOnlyMemory<T>>, ReadOnlySpan<ReadOnlyMemory<T>>

Simple implementation for SpanSplitEnumerable<T> Split<T>(this ReadOnlySpan<T> span, T separator) interface where the return type is foreachable. Unit tests included.

https://gist.github.com/LordJZ/92b7decebe52178a445a0b82f63e585a

The sentinel is a "clever trick" to avoid having a boolean field, can be replaced with a backing field.

We needed features like this on top of ASP.NET's StringTokenizer type in the aspnet/WebHooks repo. See that repo's TrimmingTokenizer and have seen some interest (aspnet/WebHooks#324) in making that more broadly available.

It would be great to point customers to something in the BCL instead of (say) adding TrimmingTokenizer features to StringTokenizer. What is the expected timeframe for the "Future" milestone that would include a dotnet/corefx#26528 fix?

/cc @davidfowl

@ahsonkhan if this proposal makes sense to you, perhaps you could help shepherd it to review, as it seems there would be volunteers to implement it?

cc @JeremyKuhne who has been doing thinking about low allocation string operations.

StringBuilder added an enumerator for the next version that we should consider when deciding on a pattern for enumerating spans. https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Text/StringBuilder.cs#L587

if this proposal makes sense to you, perhaps you could help shepherd it to review, as it seems there would be volunteers to implement it?

Let me make sure the API shape is clear. Incorporating the recent feedback (to get foreach support, make it generic), here is the API:

```C#
public static SpanSplitEnumerable Split(this ReadOnlySpan span, T seperator,
StringSplitOptions options = StringSplitOptions.None) where T : IEquatable {}

public ref struct SpanSplitEnumerable where T : IEquatable
{
public SpanSplitEnumerator GetEnumerator();
}

public ref struct SpanSplitEnumerator where T : IEquatable
{
public bool MoveNext();
public ReadOnlySpan Current { get; }
}
```

@KrzysztofCwalina this is the enumerating spans issue we discussed yesterday.

Does StringSplitOptions really make sense for a generic T?

Does StringSplitOptions really make sense for a generic T?

I suppose a SpanSplitOptions which would mimic StringSplitOptions makes more sense here.

Should the enumerator and the enumerable be combined? i.e. what's the reason to have two types?

@KrzysztofCwalina I think it makes the API cleaner and easier to understand. The foreach pattern requires an enumerable and an enumerator, and that's exactly what the proposed API provides.

It there any advantage in combining them, apart from decreasing the size of the API surface?

Video

We don't think we want these APIs:

  1. They allocate
  2. They aren't as convenient as String.Split

If you care about allocations, then you want a different API. And if you don't care about allocations, well, then you can just use String.Split.

So what API would we like to see? It could be a struct-based enumerator that allows the consumer to foreach the individual spans without allocations. Alternatively (or additionally) we could have a method that allows the consumer to pass in a buffer with the locations, which, for the most part, could be stackallocated.

We don't think we want these APIs:

1. They allocate

I readily concur with this. If I don't care about allocations, I'll simply be using string.Split. If I care enough to want to avoid allocating substrings, then obviously I would like to avoid _any_ allocations whatsoever.

Suggest ReadOnlySpan<T>.Split returns ReadOnlySpan<ReadOnlySpan<T>> on the stack so there is no more allocation.

If I want to use foreach, which allocations, then I can still enumerate over the ReadOnlySpan<ReadOnlySpan<T>>. Otherwise, I'll use a simple for loop to have zero allocation.

Can you use ReadOnlySpan<T> as a generic argument?

@schungx

Suggest ReadOnlySpan<T>.Split returns ReadOnlySpan<ReadOnlySpan<T>> on the stack so there is no more allocation.

Split can't return a Span that was stack allocated by itself. It could take that Span as an additional parameter, but then you have to somehow handle the case where that Span is not big enough. I believe this is what @terrajobst meant by "we could have a method that allows the consumer to pass in a buffer with the locations".

@terrajobst, the API that @svick proposed above does not seem to allocate. Am I missing something? Or are you saying that we want both convenience and no allocations?

@svick, I wonder how important it is for this enumerator to be easy to understand. The classic enumerator pattern has two types for many reasons that are less and less applicable once you deal with by ref structs that cannot implement enumerator interfaces and get copied when passed around. I do agree we should discuss pros and cons and maybe indeed it's better to have two types, but I wanted to open the discussion as it seems such an overkill to add two by ref types just so we can split.

@KrzysztofCwalina I agree that my argument of "this version of the API is slightly easier to understand" is fairly weak in this case. But in my opinion, the argument of "this version of the API is slightly smaller" is even weaker, which is why I prefer to have two types.

But ultimately it doesn't make that much of a difference, as long as I'll be able to have foreach (var slice in span.Split(',')) without allocations, I'll be happy.

@svick I'm suspect a normal foreach will always allocate because the current state must be kept somewhere, even if you somehow come up with value-based iterators...

And what is the _purpose_ of this again? That's to prevent allocations in parsing code in tight loops, where all we're doing is manipulating streams of text without ever changing them.

@schungx There's nothing really new to come up with. For example, foreach over a List<T> already does not allocate on the heap, because the current state is kept in the enumerator struct on the stack. A very similar approach can be used here.

  1. They allocate

It could be a struct-based enumerator that allows the consumer to foreach the individual spans without allocations.

Should the enumerator and the enumerable be combined? i.e. what's the reason to have two types?

The APIs, as suggested here, don't allocate and are essentially what was recommended. Maybe it was missed since it wasn't at the top. Will copy it over to the top post.

Making it into a single type:
```C#
public static SpanSplitEnumerator Split(this ReadOnlySpan span, T seperator,
StringSplitOptions options = StringSplitOptions.None) where T : IEquatable {}

public ref struct SpanSplitEnumerator where T : IEquatable
{
public SpanSplitEnumerator GetEnumerator() { return this; }
public bool MoveNext();
public ReadOnlySpan Current { get; }
}
```

  • We don't think it should be constrained to just char, so leaving it as T is fine.
  • While we don't want to expose an overload that works on Span<T>, we should make sure we can

    • We should rename SpanSplitEnumerator to ReadOnlySpanSplitEnumerator to

  • The enumerator should probably live on the same type as the method, which would be MemoryExtensions
  • Using StringSplitOptions might feel odd, but (1) we've never extended it and (2) and the operations are about how the split is performed, which applies to this API to
  • If we ever need to expose char[] (matches any) we'd add a new method called SplitAny so that we can also have the equivalent of string (matches the sequence)
  • We might want to provide a similar method for memories, but let's wait until that's required

Are we going to have an overload that takes ReadOnlySpan as the delimiter as well?

I tried implementing the APIs above, and additionally a two more split methods in my personal repository. The types and methods themselves can be found in ConsoleApp4 (lol) folder, and the tests can be found in SplitTests folder.

public static ReadOnlySpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T delimiter,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}
public static ReadOnlySpanSplitBySequenceEnumerator<T> Split<T>(this ReadOnlySpan<T> span,
    ReadOnlySpan<T> delimiter, StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}
public static ReadOnlySpanSplitByAnyEnumerator<T> SplitByAny<T>(this ReadOnlySpan<T> span,
    ReadOnlySpan<T> delimiters, StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

The first one replaces string.Split(char), the second one replaces string.Split(string), and the third replaces string.Split(char[]). The last two are not really included in the original discussion (and would take a bit of time to go through approval process even if it does), but I personally think it is worth including to have the complementary alternatives as well.

It's not a great code perhaps, but I'd love to open a PR for this issue (with the code above). Although the last two methods are not approved, at least I can open one for the regular T version.

A slightly different take on this. Considering that Range is now available, and the non-zero cost of Slice on potentially-many results, what if the enumerable returned Ranges instead of sliced RoS. In other words, a list of delineations. The consumer could then decide when/if to Slice(range)

foreach (var range in span.Split(','))
{
    var x = span.Slice(range);
}

Perhaps that's too abstract for a public api.

Did some experimenting with this as well.
Not sure if it's an optimal solution though.
https://github.com/bbartels/coreclr/blob/master/src/System.Private.CoreLib/shared/System/MemoryExtensions.Split.cs

@grant-d I like that API

That seems reasonable to me.

Only problem is that it would conflict with the approved api, as only the return type differs.

Correct; it would need to go back into api triage

Returning ranges has some real advantages, namely we can operate on a ReadOnlySpan<T> but then apply the results to other types, in particular Span<T>, Memory<T>, and ReadOnlyMemory<T>, e.g.
C# foreach (Range r in memory.Span.Split(':')) Process(memory[r]);

Correct; it would need to go back into api triage

To clarify, can you share the proposed range based API shape and then we can reconcile it/review it?
I can update the original post if you provide the precise shape, and we can re-review.

I suppose it's just substituting Range for ReadOnlySpan\ I don't think overloading Slice would work as stephen suggested (Probably just a typo). A ReadOnlySpan.Slice call would be ambiguous, as it already has an overload that takes an int.

So something like:

public static SpanSplitEnumerator<T> Split(this ReadOnlySpan<T> span, T seperator,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
{
    public SpanSplitEnumerator<T> GetEnumerator() { return this;  }
    public bool MoveNext();
    public Range Current { get; }
}

I don't think overloading Slice would work as stephen suggested (Probably just a typo)

I didn't suggest overloading Slice. If a type has a Slice(int, int) method, the C# 8 compiler now supports passing a Range as an indexer on the type and the compiler expands it into a call to the Slice(int, int) method.

I didn't suggest overloading Slice.

memory.Span.Slice(':')

I believe this is what's causing the confusion.

Returning ranges has some real advantages, namely we can operate on a ReadOnlySpan<T> but then apply the results to other types, in particular Span<T>, Memory<T>, and ReadOnlyMemory<T>, e.g.

foreach (Range r in memory.Span.Slice(':')) Process(memory[r]);

I meant in the foreach, you used memory.Span.Slice(':').

Ah, sorry, I meant Split, not Slice.

Maybe it would make sense to add another enumerator, which would allow for splitting by a sequence.
Just like string.Split("??") is possible.

public static SpanSplitEnumerator<T> Split(this ReadOnlySpan<T> span, T seperator,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

public static SpanSplitSequenceEnumerator<T> Split(this ReadOnlySpan<T> span, ReadOnlySpan<T> seperators,
    StringSplitOptions options = StringSplitOptions.None) where T : IEquatable<T> {}

public ref struct SpanSplitEnumerator<T> where T : IEquatable<T>
{
    public SpanSplitEnumerator<T> GetEnumerator() { return this;  }
    public bool MoveNext();
    public Range Current { get; }
}

public ref struct SpanSplitSequenceEnumerator<T> where T : IEquatable<T>
{
    public SpanSplitSequenceEnumerator<T> GetEnumerator() { return this;  }
    public bool MoveNext();
    public Range Current { get; }
}

Could also be merged into one enumerator that just holds an ReadOnlySpan for separators. Would be a little more space efficient when T is a large struct (> 12 bytes on 64 bit systems), but also a little less space efficient on smaller structures (which I reckon is the most common use-case with char). Maybe something to debate in triage.

A split api that returns ranges is pure genius @grant-d. That would be a really nice api to aim for.

Sad to see we haven't made progress on this!

Can I suggest allowing to pass a delegate bool IsSeparatorPredicate(ReadOnlySpan<T> span)?

This will enable splitting on surrogate pairs etc.

@scalablecory, how about Rune? (and ReadOnlySpan of Rune)

Rune doesn't solve grapheme clusters, and dealing with normalization would become unwieldy. A predicate would handle all cases.

Isn't one of the main points within the Span ecosystem to be allocation free? I feel like bringing in a delegate would just go against that mentality. Wouldn't allowing to pass a ReadOnlySpan as a separator solve the graphmere cluster problem just as well, without an allocation?

Can I suggest allowing to pass a delegate bool IsSeparatorPredicate(ReadOnlySpan span)?

And turn split into an O(N^2) algorithm?

I think this should be kept relatively simple, making the common case easy.

Can I suggest allowing to pass a delegate bool IsSeparatorPredicate(ReadOnlySpan span)?

And turn split into an O(N^2) algorithm?

I think this should be kept relatively simple, making the common case easy.

I don't see how this would be any different than String.Split(string[])

I don't see how this would be any different than String.Split(string[])

Because string.Split(string[]) takes an array of the exact strings to use, which means a) the implementation knows exactly how many characters need to be compared, and b) regardless string matching algorithms exist to find substrings in O(N+M) time rather than the naive O(N*M) time. Your proposed IsSeparatePredicate that takes a span would presumably need to be invoked at every character and for every length after that character until the end of the string to determine whether it represents a separator.

I don't see how this would be any different than String.Split(string[])

Because string.Split(string[]) takes an array of the exact strings to use, which means a) the implementation knows exactly how many characters need to be compared, and b) regardless string matching algorithms exist to find substrings in O(N+M) time rather than the naive O(N*M) time. Your proposed IsSeparatePredicate that takes a span would presumably need to be invoked at every character and for every length after that character until the end of the string to determine whether it represents a separator.

Oh, I see the confusion; my fault. I should not make suggestions right before going to sleep!

Imagine something more like: bool IsSeparatorPredicate(ReadOnlySpan span, out int length), which would be called only once per character. it gets passed the full remaining span, and returns the length of the separator to skip in the out param.

Imagine something more like: bool IsSeparatorPredicate(ReadOnlySpan span, out int length), which would be called only once per character. it gets passed the full remaining span, and returns the length of the separator to skip in the out param.

Don't think I would be opposed to this idea, but I definitely think there should be a separate (simpler) type that would just handle the cases of splitting on a single character or splitting on a sequence.

there should be a separate (simpler) type

And cheaper. Adding O(N) delegate invocations in an API whose purpose is to reduce overheads is not the right choice. We should start with the 99% case, and only add additional complexity/cost when and where there's a real need.

Agreed, not proposing to get rid of the simple overload :)

Video

Based on the conversation, we believe we would prefer something more scoped:

```C#
partial class MemoryExtensions
{
public static SpanSplitEnumerator Split(this ReadOnlySpan span);
public static SpanSplitEnumerator Split(this ReadOnlySpan span, char separator);
public static SpanSplitEnumerator Split(this ReadOnlySpan span, string separator);

// This API seems possible, but we can defer until we see scenarios
// public static SpanSplitEnumerator<char> SplitAny(this ReadOnlySpan<char> span, ReadOnlySpan<char> separators);

// This API seems possible, but we should defer until we see scenarios. It wouldn't be good make this work for UTF8, for example.
// public static SpanSplitEnumerator<T> Split<T>(this ReadOnlySpan<T> span, T separator);

}
```

The enumerator returns ranges so you cans split a memory via this API and use the span to slice the memory and to unify span and read-only spans. It's unclear whether we need different enumerator types for different Split behaviors, but we don't expect people to exchange those, which is why they are currently proposed to be nested types too.

I should have time in the next couple of weeks to tackle this

Having this for bytes would be useful as well.....

Unless I am missing something, with the currently proposed SpanSplitEnumerator:

public static SpanSplitEnumerator<char> Split(this ReadOnlySpan<char> span, string separator);

would not work with the other overloads. At least not without an allocation.

If the SpanSplitEnumerator holds a single T as a separator, one could not split on string, and if the type holds a ReadOnlySpan\ Which in this case afaik would only work by allocating an array. A stack allocation wouldn't be possible, as the Enumerator is returned from the Split method.

One solution could be to have a separate type for splitting on sequences.

Threw together a quick implementation of this to showcase this issue:
https://gist.github.com/bbartels/87c7daae28d4905c60ae77724a401b20

@bbartels we also decided that the enumerator type may differ depending on overload as needed.

@davidfowl we're hoping to defer anything regarding UTF-8 until we iron out Char8, so we don't have duplicate byte and Char8 APIs.

If the SpanSplitEnumerator holds a single T as a separator, one could not split on string, and if the type holds a ReadOnlySpan then you would somehow need to pass the single char separator to the type.

It could hold both and just use one of them, with a bool or enum value dictating which (or neither) to use.

Not really sure what the corefx team would value more, fewer types overall and less code duplication or a slimmer memory footprint on each individual type.
For now I went with two different types based on splitting on a single length separator and a multi length one.
But could change it to what @stephentoub suggested if deemed better!

Added some tests to verify correctness of my proposed implementation. (Tests were adapted from the official string.split tests)
https://gist.github.com/bbartels/87c7daae28d4905c60ae77724a401b20

May I suggest another overload for enumerating backwards through the sequence?

Just came across a scenario today where it would have required me to check equality of a splitted sequence starting at the last entry. This way I wouldn't have to store the result somewhere before I iterate through them in reverse.
Something like:

partial class MemoryExtensions
{
    public static SpanSplitEnumerator<char> Split(this ReadOnlySpan<char> span, bool reverse = false);
    public static SpanSplitEnumerator<char> Split(this ReadOnlySpan<char> span, char separator, bool reverse = false);
    public static SpanSplitEnumerator<char> Split(this ReadOnlySpan<char> span, string separator, bool reverse = false);
}

Or you could add a Reverse method to SpanSplitEnumerator<char> that returns another enumerator that goes in reverse?

I like the idea, but I feel like it might be a little ambiguous. As in, does it return a completely new enumerator that has no previous state, or assuming you've already called MoveNext() a couple times does the returned enumerator preserve that state and just goes backwards from there.

@bbartels That could be solved by splitting SpanSplitEnumerator into enumerable and enumerator: only the enumerable would have Reverse(), so there is no confusion about what calling Reverse() in the middle of enumerating does, because it's not possible. (Note that I've unsuccessfully argued for that split before, for a different reason.)

I'd actually like a Reverse() method on the SpanSplitEnumerator that doesn't return anything, just flips a switch to change direction. This way you could use it as a lookahead: If a certain condition is true for the current element, you could go back and do some extra processing on the previous.

Would 'require' the aforementioned overloads for the Split methods, as you'd otherwise have to manually get the enumerator and call Reverse before you could foreach over it.

@bbartels you can do lookahead and reset with current spec already; just save a copy of the enumerator and restore if you need to back up. granted, you can't easily go back a variable number of times...

@grant-d I assigned this to you. thanks!

_[Edited]_
Question about desired behavior.
Compare the existing string idiom (assume we _keep_ empty entries):

// 012345
  "|ab|c|".Split('|'); // string[4] { "", "ab", "c", --> "" <-- }

Note the final "" on the _rhs_ boundary.

If we use a comparable Range enumerator instead (per this issue), then the results would _ostensibly_ be:

Range(0, 0) // <empty> (ok)
Range(1, 3) // ab (ok)
Range(4, 5) // c (ok)
Range(6, 6) // <empty> (oops!)

The final value is questionable since start=6 would be IndexOutOfRange.
So we have a different semantic to the existing idiom.

  • Do we emit a final value regardless, even though it represents an invalid index?
  • Or do we ignore it.
  • (My vote) Or maybe it should be Range(5, 5) which sorta kinda parallels Range(0, 0). Though it breaks the pattern of next.start = previous.end + 1, eg (1, _3_) -> (_4_, 5)

@terrajobst?

I think Range(6,6) is more appropriate. There shouldn't be any IndexOutOfRangeException, because anything using the range shouldn't read >= the exclusive end index.

c# "|ab|c|"[6..6].ToString() == ""

Though in the case of Range(6, 6), I think the first element should then be Range(-1, -1) to retain semantic equivalence.

I am busy implementing and noticed there's this overload:
Split(this ReadOnlySpan<char> span, string separator);,
but no corresponding overload with:
Split(this ReadOnlySpan<char> span, string separator, StringComparison comparisonType);

Is that a spec miss, should I add it?
Or was it intentional, in which case is the behavior Ordinal?

Please un-assign me, I am not going to get to this any time soon. Thanks

Done. Thanks.

I already have a working implementation that conforms with https://github.com/dotnet/corefx/issues/26528#issuecomment-523152939: https://gist.github.com/bbartels/87c7daae28d4905c60ae77724a401b20.
I could submit a PR once it is migrated to dotnet/runtime?

@bbartels please do :)

I would love to see these extensions for ReadOnlySpan with any generic equtable type, ignore-empty and definitely for ReadOnlySequence - in my experience when working with pipes sequences are more useful than spans.

@bbartels , dotnet/runtime is open for business :)

@danmosemsft I really should have commented here, I already submitted a PR a couple weeks ago!
See https://github.com/dotnet/runtime/pull/295

Could use some feedback over there regarding the bullet points I noted!

Ran into a good use case for this just today. Would love to see this finished up.

One minor nitpick though: doesn't having a single type acting as both enumerable and enumerator create a less-than-obvious inconsistency with the rest of .NET?

var split = "a,b,c".Split(',');

foreach (var letter in split) 
{
    // Hit three times
} 

foreach (var letter in split) 
{
    // Hit three times 
}

var splitSpan = "a,b,c".AsSpan().Split(',');

foreach (var letter in splitSpan) 
{
    // Hit three times
} 

foreach (var letter in splitSpan) 
{
    // Never hit because splitSpan was already consumed by first foreach
}

@jonathanmarston

One minor nitpick though: doesn't having a single type acting as both enumerable and enumerator create a less-than-obvious inconsistency with the rest of .NET?

The compiler creates a copy of the struct enumerator before enumerating (SharpLab Example).

The compiler creates a copy of the struct enumerator before enumerating
Ah yes. I see it now. Thanks!

is this already available? 🥺

@wstaelens Still waiting for the PR (https://github.com/dotnet/runtime/pull/295) to be reviewed/approved, but feel free to copy the implementation into your codebase if you need it early.

Reopening since we'll need to send this through review again.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jzabroski picture jzabroski  Â·  3Comments

GitAntoinee picture GitAntoinee  Â·  3Comments

yahorsi picture yahorsi  Â·  3Comments

jamesqo picture jamesqo  Â·  3Comments

btecu picture btecu  Â·  3Comments