As we look to support more efficient parsing/formatting/usage of ReadOnlySpan<char>
as slices of System.Strings, we're planning to add a variety of APIs across corefx that operate with ReadOnlySpan<char>
(https://github.com/dotnet/corefx/issues/21281). But for such APIs to be truly useful, and for ReadOnlySpan<char>
to be generally helpful as a string-like type, we need a set of extension methods on ReadOnlySpan<char>
that mirror the corresponding operations on string, e.g. Equals with various kinds of string comparisons, Trim, Replace, etc. We need to define, implement, test, and ship a core set of these (more can of course be added in the future), e.g.
Edit by @ahsonkhan - Updated APIs:
```C#
// All these do ordinal comparisons, and hence do not rely on StringComparison, and can live in System.Memory.dll
public static class MemoryExtensions
{
// If we decide to add overloads to Trim in the future that are non-ordinal and take StringComparison
// (similar to https://github.com/dotnet/corefx/issues/1244), they will be .NET Core specific.
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static bool IsWhiteSpace(this ReadOnlySpan<char> span);
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);
// Does exactly what string does today, i.e. ordinal, case-sensitive, culture-insensitive comparison
public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten);
// To me, these are complementary to the Trim APIs and hence we should add them.
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
}
// Live in CoreLib and only available on .NET Core, exposed from System.Memory.dll
// Atm, this class in corelib is called 'Span' and contains the .NET Core specific implementation of the extension methods
public static class MemoryExtensions
{
public static bool Equals(this ReadOnlySpan
public static int Compare(this ReadOnlySpan
public static bool Contains(this ReadOnlySpan
public static bool StartsWith(this ReadOnlySpan
public static bool EndsWith(this ReadOnlySpan
public static bool Replace(this ReadOnlySpan
public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}
<details>
<summary>Original Proposal</summary>
```C#
public static class SpanExtensions
{
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
public static bool Equals(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static int Compare(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count); // this will need to allocate if any chars are removed from the middle
...
}
I'm particularly interested in IndexOf(char)
and LastIndexOf(char)
so far.
Some of these extensions may also be useful for types other than char:
```c#
public static bool Equals
where T : struct, IEquatable
public static bool StartsWith(this ReadOnlySpan
where T : struct, IEquatable
// and overloads with IEqualityComparer
```
I'm particularly interested in
IndexOf(char)
andLastIndexOf(char)
so far.
We have IndexOf
Some of these extensions may also be useful for types other than char:
Equals
StartsWith
We have StartsWith already. Also, is the semantics of Equals the same as SequenceEqual? If so, that exists as well.
https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L137
What about adding these as well?
C#
public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue); // this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
Updated proposal here: https://github.com/dotnet/corefx/issues/21395#issuecomment-357410425
```C#
public static class MemoryExtensions
{
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static bool Equals(this ReadOnlySpan
public static int Compare(this ReadOnlySpan
// this will need to allocate if any chars are removed from the middle
public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count);
public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
// this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue);
public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
}
```
Are these APIs going to be present in both the OOB and inbox version of System.Memory? What is the OOB implementation going to look like?
The ToUpper and ToLower APIs are also allocating. Should we rather change the signatures to be non-allocating?
Are these APIs going to be present in both the OOB and inbox version of System.Memory? What is the OOB implementation going to look like?
I was thinking, similar to APIs like StartsWith, SequenceEqual, etc. (https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/MemoryExtensions.cs#L281), that there will only be an OOB version. What would be the value of having inbox versions as well?
The ToUpper and ToLower APIs are also allocating. Should we rather change the signatures to be non-allocating?
Yes. I will work on the API signatures (adding destination span that is passed in as well), to try to get them to be non-allocating.
The problem with ToLower, ToUpper and other similar globalization APIs is that it is impossible to implement them efficiently in the OOB version. The implementation would always have to allocate the strings from the Span first. It make them kind of pointless to have in the OOB System.Memory. Have you though about having these APIs in System.Runtime or similar contract instead?
Wouldn't work for string (as immuatable); but for regular Span<char>
?
public static void ToUpper(this Span<char> span);
public static void ToLower(this Span<char> span);
@benaadams
string.ToUpper
depends on the current culture and there's no one-to-one mapping between characters. For example the one-character, lowercase ß
in German becomes the two-character SS
(groß
=> GROSS
)
any thoughts on supporting running Regex on ReadOnlySpan
I think APIs that don't specify otherwise (including the ToUpper/Lower) should be culture invariant. If We want to provide culture aware APIs, they should take an explicit CultureInfo argument.
@rafael-aero There's a separate issue for that: https://github.com/dotnet/corefx/issues/24145.
FWIW, @Tragetaschen
For example the one-character, lowercase ß in German becomes the two-character SS (groß => GROSS)
This is not correct anymore, there is a ẞ in Unicode for years now, U+1E9E. It's part of german Orthography since July 2017. https://en.wikipedia.org/wiki/Capital_ẞ
I Imagine there are still other examples though.
It would be great if we could also switch
on Spans of char
as if they were strings.
@SomeAnon42 Special-casing ReadOnlySpan<char>
the way string
is special-cased would be a language-level change. (https://github.com/dotnet/csharplang)
@Tragetaschen @benaadams CoreCLR already operates under the assumption that the case change produces the same length string. https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Globalization/TextInfo.Windows.cs#L26-L76
This may be a bug. If not, it would be consistent for span to do the same thing.
Updated proposal here: https://github.com/dotnet/corefx/issues/21395#issuecomment-357410425
Added:
```C#
public static class MemoryExtensions
{
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static bool Equals(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool Contains(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static int Compare(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
// this will need to allocate if any chars are removed from the middle
public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count);
public static ReadOnlySpan<char> Remove(this ReadOnlySpan<char> source, int startIndex, int count);
// non-allocating alternative:
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, Span<char> result);
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> result);
public static bool StartsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool EndsWith(this ReadOnlySpan<char> source, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> value);
// this will need to allocate if the length of the chars being removed is not equal to the length of the chars being added
public static ReadOnlySpan<char> Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue);
// non-allocating alternative:
public static bool Replace(this Span<char> oldValue, ReadOnlySpan<char> newValue, Span<char> result);
public static ReadOnlySpan<char> ToUpper(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToUpperInvariant(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLower(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> ToLowerInvariant(this ReadOnlySpan<char> span);
// these will allocate
public static ReadOnlySpan<char> PadLeft(this ReadOnlySpan<char> span, int totalWidth);
public static ReadOnlySpan<char> PadLeft(this ReadOnlySpan<char> span, int paddingChar);
public static ReadOnlySpan<char> PadRight(this ReadOnlySpan<char> span, int totalWidth);
public static ReadOnlySpan<char> PadRight(this ReadOnlySpan<char> span, int paddingChar);
// non-allocating alternative
public static bool PadLeft(this ReadOnlySpan<char> span, int totalWidth, Span<char> result);
public static bool PadLeft(this ReadOnlySpan<char> span, int paddingChar, Span<char> result);
public static bool PadRight(this ReadOnlySpan<char> span, int totalWidth, Span<char> result);
public static bool PadRight(this ReadOnlySpan<char> span, int paddingChar, Span<char> result);
}
```
Are the signatures of Replace methods right? They seem to be missing an argument. Also, why is the allocating variant of Replace returning ReadOnlySpan? Shouldn't it rather return writeable span?
We don't think this is ready yet. @ahsonkhan, please redesign the APIs to avoid allocations by returning new spans. Instead, they should follow our TryXxx
pattern. But we have to keep in mind that many times folks will start with ReadOnlySpan<char>
from a string
where in-place isn't possible. It would help to have some sample code that shows how these APIs are used.
Update:
Most of the APIs that are designed to avoid allocations do not need to return bytesWritten since that will be known to the user before the call. For example, we know that Remove will write exactly source.Length - count
chars to the destination span. The exception to this is Replace, which is why it needs bytesWritten.
```C#
public static class MemoryExtensions
{
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) { throw null; }
public static bool IsWhiteSpace(this ReadOnlySpan<char> span) { throw null; }
// APIs designed to avoid allocations:
// do we need bytesWritten? It should be source.Length - count on success
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination) { throw null; }
public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten) { throw null; }
// do we need bytesWritten? It should be source.Length on success
public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination) { throw null; }
// do we need bytesWritten? It should be totalWidth on success or source.Length if totalWidth < source.Length
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination) { throw null; }
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination) { throw null; }
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination) { throw null; }
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, char paddingChar, Span<char> destination) { throw null; }
}
## Sample Usage
**Approach 1 to use PadLeft:**
```C#
ReadOnlySpan<char> span; // input
int totalWidth = 50; // known value, assume >= span.Length
Span<char> destination = new char[totalWidth]; //if set to less than totalWidth, PadLeft will return false
if (span.PadLeft(totalWidth, destination))
{
// destination has some pad characters (if there is extra space) followed by the entire input
// input is at the end of the destination buffer
}
else
{
// resize destination to be >= totalWidth
var result = span.PadLeft(totalWidth, destination);
Debug.Assert(result);
}
// only if destination.Length > totalWidth, it will be necessary to slice to get the precise segment.
destination = destination.Slice(0, totalWidth);
Approach 2 to use PadLeft:
```C#
ReadOnlySpan
int totalWidth = x; // unknown value, x >= 0, could be less than span.Length
Span
if (source.Length > totalWidth) return; // user can choose to do nothing, or call PadLeft which will just do a copy
if (destination.Length < totalWidth) destination = new char[totalWidth];
span.PadLeft(totalWidth, destination); // no need to slice, the destination is of the right size.
**Sample usage for Replace:**
```C#
string testStr = "abcdefghibclmbcp";
ReadOnlySpan<char> source = testStr.AsReadOnlySpan();
Span<char> destination = new char[source.Length];
// oldValue.Length >= newValue.Length
source.Replace("a".AsReadOnlySpan(), "z".AsReadOnlySpan(), destination, out int bytesWritten)
// not necessary here
destination = destination.Slice(0, bytesWritten);
// oldValue.Length < newValue.Length
while (!source.Replace("z".AsReadOnlySpan(), "ab".AsReadOnlySpan(), destination, out int bytesWritten))
{
// enlarge destination
// maximum size required: string.Length + (string.Length/oldValue.Length * (newValue.Length - oldValue.Length))
// maximum size required: 16 + (16/1 * (2 - 1)) = 16 + 16 = 32
// growing the destination to 32 will gaurantee success in this case
}
destination = destination.Slice(0, bytesWritten);
1) Should these methods be generic? If not, should we provide byte specific overloads?
2) When should we provide the bytesWritten overload? The APIs are currently designed to only provide it if it cannot be calculated, or already known, by the caller (in constant time) before the method is called.
```c#
// Approved, but needs to go somewhere else due to globalization
public static class MemoryExtensions
{
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static int Compare(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool IsWhiteSpace(this ReadOnlySpan<char> span);
}
// Needs more work
public static class MemoryExtensions
{
// APIs designed to avoid allocations:
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);
public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten);
// do we need bytesWritten? It should be source.Length on success
public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}
// We probably don't want these
public static class MemoryExtensions
{
public static bool PadLeft(this ReadOnlySpan
public static bool PadLeft(this ReadOnlySpan
public static bool PadRight(this ReadOnlySpan
public static bool PadRight(this ReadOnlySpan
}
```
We probably don't want these
PadLeft
Time for a NuGet span left-pad? Will become the most popular library and the world will come to depend on it - mwhahahaha!
needs to go somewhere else due to globalization
What about the Ordinal versions?
public static ReadOnlySpan<T> Trim(this ReadOnlySpan<T> span, T trimElement);
public static ReadOnlySpan<T> Trim(this ReadOnlySpan<T> span, ReadOnlySpan<T> trimElements);
// etc
// Needs more work
I understand that we need to consider StringComparison overload for Replace, but what additional work/design decisions are there for the Remove API? It doesn't have any concerns regarding comparison/globalization/etc.
```C#
// All these do ordinal comparisons, and hence do not rely on StringComparison, and can live in System.Memory.dll
public static class MemoryExtensions
{
// If we decide to add overloads to Trim in the future that are non-ordinal and take StringComparison
// (similar to https://github.com/dotnet/corefx/issues/1244), they will be .NET Core specific.
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static ReadOnlySpan
public static bool IsWhiteSpace(this ReadOnlySpan<char> span);
public static bool Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);
// Does exactly what string does today, i.e. ordinal, case-sensitive, culture-insensitive comparison
public static bool Replace(this ReadOnlySpan<char> source, ReadOnlySpan<char> oldValue, ReadOnlySpan<char> newValue, Span<char> destination, out int bytesWritten);
// To me, these are complementary to the Trim APIs and hence we should add them.
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static bool PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static bool PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
}
// Live in CoreLib and only available on .NET Core, exposed from System.Memory.dll
// Atm, this class in corelib is called 'Span' and contains the .NET Core specific implementation of the extension methods
public static class MemoryExtensions
{
public static bool Equals(this ReadOnlySpan
public static int Compare(this ReadOnlySpan
public static bool Contains(this ReadOnlySpan
public static bool StartsWith(this ReadOnlySpan
public static bool EndsWith(this ReadOnlySpan
public static bool Replace(this ReadOnlySpan
public static bool ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture = CultureInfo.CurrentCulture);
public static bool ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static bool ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture = CultureInfo.CurrentCulture);
public static bool ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}
```
Can I throw in another suggestion? I'd really like to see some ability to split a ReadOnlySpan<char>
. Obviously, you can't return a collection of Span
s directly, but isn't that what Memory<T>
is for?
// equivalent to the overloads of 'String.Split()'
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator, int count);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, char[] seperator, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] seperator, int count, StringSplitOptions options);
public static IReadOnlyList<ReadOnlyMemory<char>> Split(this ReadOnlySpan<char> span, string[] seperator, StringSplitOptions options);
The reason for choosing IReadOnlyList<T>
is to make the resulting collection indexable, just like string[]
. It would be nice if the implementation is ImmutableArray<T>
, but I'm not sure if that's a concern for distribution and such.
@Joe4evr, out of curiosity, do you have a scenario atm where these APIs would be useful? If so, can you please show the code sample?
I would replace the char[] overloads with ReadOnlySpan\
My scenario is taking a relatively big string of user input and then parsing that to populate a more complex object. So rather than take the whole string at once, I'd like to parse it in pieces at a time. It'd be pretty nice if this can be facilitated by the Span
/Memory<T>
APIs so that this code won't have to allocate an extra 30-40 tiny strings whenever it runs.
Admittedly, I only started on this particular case earlier today, mostly to experiment and find out how much I could get out of the Span APIs at this time.
Maybe it was a bit naive of me to expect a collection like I did, but I'd at least like to see some API to deal with this scenario a little easier, because I'll probably not be the only one looking to split a span up into smaller chunks like this.
Splitting support would be good, but I don't think it would look like the proposed methods; as @ahsonkhan points out, that would result in a lot of allocation (including needing to copy the whole input string to the heap, since you can't store the span into a returned interface implementation).
I would instead expect a design more like an iterator implemented as a ref struct, e.g.
C#
public ref struct CharSpanSplitter
{
public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options);
public bool TryMoveNext(out ReadOnlySpan<char> result);
}
Can I throw in another suggestion? I'd really like to see some ability to split a
ReadOnlySpan<char>
.
I would instead expect a design more like an iterator implemented as a ref struct
@Joe4evr, would it be fine for me to split the conversation related to the split API into a new issue, especially since it would require additional design?
Edit: Created https://github.com/dotnet/corefx/issues/26528
bool Equals(this ReadOnlySpan
span, ReadOnlySpan value, StringComparison comparison);
StringComparison
is just a conveniece wrapper over System.Globalization.CompareInfo. Do you also need new overloads that take Span on System.Globalization.CompareInfo
to make this work? Same for all other stuff that deals with globalization.
public static bool ToUpper(this ReadOnlySpan
source, Span destination, CultureInfo culture = CultureInfo.CurrentCulture);
CultureInfo.CurrentCulture
cannot use a property as the default value. Default value has to be a constant.
@stephentoub Fair enough. Glad I at least got the ball rolling on the matter.
CultureInfo.CurrentCulture
cannot use a property as the default value. Default value has to be a constant.
Added the proper overload without optional arguments (updated the first post).
StringComparison
is just a conveniece wrapper over System.Globalization.CompareInfo.
How so? StringComparison is just an enum with culture specific options and intersects with some of CompareOptions.
Do you also need new overloads that take Span on
System.Globalization.CompareInfo
to make this work?
~I don't know what you mean by this. Are you asking if we would need additional overloads that take CompareInfo instead of StringComparison as arguments?~
Edit: I think I get what you mean. We may need to expose the compare APIs on CompareInfo that take Span, publicly. This is used by Equals/Compare (and can probably remain internal):
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Globalization/CompareInfo.cs#L393
We would need to add additional span-based overloads for IsPrefix/IsSuffix/IndexOf.
This is used by Equals/Compare (and can probably remain internal
It would be odd for the overloads to remain internal. We have made the Span overloads public in all other similar cases - Stream, BinaryReader/Writer, ... .
I would instead expect a design more like an iterator implemented as a ref struct
@stephentoub @Joe4evr Why stop with a "string splitter"? I'd love to see something like Google Guava's CharMatcher
, which could be much more useful (might be a case for a separate proposal, though.)
StringComparison
or CultureInfo
implicit. We might get feedback, but we think forcing the developer to make a decision is less prone to errors than selecting a default.Replace
as we don't think the API interaction / pattern works out. It need more thought/design.public static class MemoryExtensions
{
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> Trim(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, char trimChar);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> trimChars);
public static bool IsWhiteSpace(this ReadOnlySpan<char> span);
public static void Remove(this ReadOnlySpan<char> source, int startIndex, int count, Span<char> destination);
public static void PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static void PadLeft(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
public static void PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination);
public static void PadRight(this ReadOnlySpan<char> source, int totalWidth, Span<char> destination, char paddingChar);
// Those need access to globalization APIs. We'll also expose them from
// the .NET Framework OOB (slow span). They will try to extract the string
// from the underlying span (because slow span stores it) -- or -- allocate
// a new string. This avoids bifurcating the API surface.
public static bool Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static int CompareTo(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison);
public static void ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static void ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static void ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static void ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
}
one last note regarding
```C#
public static int IndexOf(this ReadOnlySpan
I want to have this API out the matched length of the searched characters as it is not always equal to the length of the value span.
Example: imagine the source span have the characters ```A ̀ ``` and we are searching for ```Á``` with culture aware operation. IndexOf will return the index of the match and will return the matched length is 2 and not just 1.
so I suggest the API will be
```C#
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison, out int matchedLength);
The benefit having the outmatched length is:
Given that the proposed API can result in correctness issues, I would agree that we need the out int matchedLength
parameter for the non-ordinal IndexOf.
Change:
```C#
public static int IndexOf(this ReadOnlySpan
StringComparison comparison);
To:
```C#
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value,
StringComparison comparison, out int matchedLength);
I want to have this API out the matched length of the searched characters as it is not always equal to the length of the value span.
Then maybe we want a Ordinal overload without this out parameter?
Example: imagine the source span have the characters A ̀ and we are searching for Á with culture aware operation. IndexOf will return the index of the match and will return the matched length is 2 and not just 1.
Could we add the example to docs (remarks) of the method?
Then maybe we want an Ordinal overload without this out parameter?
I think matchedLength would be useful even in the ordinal case too. if you write some code to do a repeated search inside the span would just work for any case (either ordinal or non-ordinal). if we have the ordinal overload, the user has to use Span.Length to perform the subsequent operation.
I don't have a strong feeling though about that :-)
here is some code sample for IndexOf
```C#
// string has A character followed by the COMBINING GRAVE ACCENT ̀ which is equivalent to À (0x00C0)
string source = "A\u0300 Some Other text A\u0300";
string value = "\u00C0"; // À
ReadOnlySpan<char> sourceSpan = source.AsReadOnlySpan();
ReadOnlySpan<char> valueSpan = value.AsReadOnlySpan();
while (true)
{
int index = sourceSpan.IndexOf(valueSpan, StringComparison.InvariantCulture, out int matachedLength);
if (index < 0 || index + matachedLength >= sourceSpan.Length)
{
break; // done searching
}
sourceSpan = sourceSpan.Slice(index + matachedLength); // Slice for next search
}
```
Then maybe we want a Ordinal overload without this out parameter?
We already have that, essentially, with IndexOf\
C#
public static int IndexOf<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>
We already have that, essentially, with IndexOf
Ah, makes sense.
StringComparison
parameters should be comparisonType
, not comparison
, to be consistent with the existing naming used everywhere else.source
while others use span
as the name of the this
parameter. It'd be nice to be consistent with the naming.Some extension methods use
source
while others usespan
as the name of thethis
parameter. It'd be nice to be consistent with the naming.
The APIs where this
is source
have another span parameter as destination
. The other ones don't. That is the reason for the inconsistency. If we feel strongly about the parameter renaming, I can do it in all methods. We should probably address this together with other inconsistencies like https://github.com/dotnet/corefx/issues/26894 (and https://github.com/dotnet/corefx/pull/26694#discussion_r165419775).
Based on the discussion here, we want to make an adjustment to the following APIs to return an int (number of characters written to the destination). Also, instead of throwing when the destination is too small, we would return -1 isntead.
Changing return type from void to int.
C#
public static int ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static int ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);
Done
Thanks! Nice addition that nicely rounds up what can be done with Spans.
Most helpful comment
Splitting support would be good, but I don't think it would look like the proposed methods; as @ahsonkhan points out, that would result in a lot of allocation (including needing to copy the whole input string to the heap, since you can't store the span into a returned interface implementation).
I would instead expect a design more like an iterator implemented as a ref struct, e.g.
C# public ref struct CharSpanSplitter { public CharSpanSplitter(ReadOnlySpan<char> value, char separator, StringSplitOptions options); public bool TryMoveNext(out ReadOnlySpan<char> result); }