C#
namespace System.Globalization
{
public partial class CompareInfo : IDeserializationCallback
{
public int IndexOf(ReadOnlySpan<char> source, ReadOnlySpan<char> value, CompareOptions options = CompareOptions.None);
public int LastIndexOf(ReadOnlySpan<char> source, ReadOnlySpan<char> value, CompareOptions options = CompareOptions.None);
public int Compare(ReadOnlySpan<char> source, ReadOnlySpan<char> value, CompareOptions options = CompareOptions.None);
public bool IsSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None);
public bool IsPrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None);
public static bool IsSortable(ReadOnlySpan<char> text);
}
}
While doing some Span optimizations in Regex I noticed that the current comparison extension methods on ROS don't support comparison with a specific culture: MemoryExtensions.CompareTo(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType).
With string I can do that in different ways either by string.Compare(string, string, CultureInfo, CompareOptions); or by StringComparer.Create(CultureInfo, bool ignoreCase).Compare(string, string).
cc @tarekgh @ahsonkhan
Is there a specific reason that we excluded a CultureInfo overload in the extension method?
I don't think there was any specific reason to not expose this one. you cannot use the string for this comparison or your code is already using spans?
by the way, I believe we didn't expose any span API that takes CompareOptions either. If we are going to expose API takes culture, then we have to use CompareOptions that time.
I don't think there was any specific reason to not expose this one.
The current situation isn't ideal as I don't see a way to compare two strings with a specific culture and CompareOptions. In my case I need to allocate the string from the Span to do the comparison.
you cannot use the string for this comparison or your code is already using spans?
Correct, I'm switching to Span to reduce allocation.
by the way, I believe we didn't expose any span API that takes CompareOptions either. If we are going to expose API takes culture, then we have to use CompareOptions that time.
Agreed.
@ahsonkhan do you know any other workaround from the span side?
@ViktorHofer do you have the original strings that you constructed the span from in the first place? if so, I believe you can use these strings without the allocation. otherwise, I would say just file API request issue but I am not sure if we still open to introduce more new APIs in 2.1.
Not really in this layer where I'm operating. I will file an issue as I believe others will hit that also.
Reused this item for the proposal.
Should we also consider providing the other overloads on string like the ones that take bool and CultureInfo without CompareOptions:
C#
public static int Compare(string strA, string strB, bool ignoreCase)
public static int Compare(string strA, string strB, bool ignoreCase, CultureInfo culture)
What about the other APIs like Start/EndsWith? Consider adding those to the proposal as well.
do you know any other workaround from the span side?
There isn't really a workaround for this that doesn't end up re-allocating the string.
Should we also consider providing the other overloads on string like the ones that take bool and CultureInfo without CompareOptions:
I believe just using bool ignoreCase is legacy and not needed if you have a CompareOptions enum. @tarekgh can correct me here.
I don't think we need the APIs takes the bool ignore cases. but we may need to have IndexOf/LastIndexOf overloads that take culture and CompareOptions
but we may need to have IndexOf/LastIndexOf overloads that take culture and CompareOptions
We currently don't have such for string. So this will be entirely new APIs?
We currently don't have such for string. So this will be entirely new APIs?
We have these exposed from CompareInfo class.
What about the other APIs like Start/EndsWith? Consider adding those to the proposal as well.
@tarekgh should we also add these? I see that these only have bool ignoreCase, CultureInfo culture overloads on string. Should we mimic them or can we also make use of CompareOptions here?
should we also add these? I see that these only have bool ignoreCase, CultureInfo culture overloads on string. Should we mimic them or can we also make use of CompareOptions here?
The apis should look like:
C#
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, CultureInfo culture, CompareOptions options);
public static int LastIndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, CultureInfo culture, CompareOptions options);
public static bool IsPrefix(String, String, CultureInfo culture, CompareOptions options);
public static bool IsSuffix(String, String, CultureInfo culture, CompareOptions options);
The other option is to add these extension methods to CompareInfo (instead of the Span) and at that time we'll not pass CultureInfo.
I prefer the CompareInfo option because this will be consistent with the strings APIs.
Also, I prefer to expose the proposed CompareTo from CompareInfo too with the name "Compare"
It would be great if we could finish the shape of this api-suggestion and send it to the review meeting. @tarekgh or @ahsonkhan could you drive that effort?
@ahsonkhan let me know if you'll not be able to do it.
It would be great if we could finish the shape of this api-suggestion and send it to the review meeting.
let me know if you'll not be able to do it.
If it is not urgent, I will be able to do it after ZBB. Otherwise, I would appreciate the help, @tarekgh.
I think this can wait.
@karelz this one is in the backlog now for quite some time. can we please prioritize it for the next meeting?
@ViktorHofer, it needs to be marked api-ready-for-review first. @tarekgh, aside from additional overloads that you mentioned here, does the originally proposed API look fine to you (CompareTo overload only)? I am going to flip the label
oh right, I missed that. thanks!
I prefer the proposal to expose the new APIs from CompareInfo (I mean IndexOf, LastIndexOf, IsSuffix, IsPrefix, and Compare). I mean the APIs which takes CompareCoptions. the reason is we'll not need to pass CultureInfo to the APIs and we'll be consistent with the other APIs takes string parameters. if I get some time I can try to get the proposal but if you have cycles you may do it.
@ViktorHofer @ahsonkhan
Here is what I think we should expose. if you agree, please go ahead and update the proposal with it. or let me know what you think.
C#
namespace System.Globalization
{
public partial class CompareInfo : IDeserializationCallback
{
public int IndexOf(ReadOnlySpan<char> source, ReadOnlySpan<char> value, CompareOptions options);
public int LastIndexOf(ReadOnlySpan<char> source, ReadOnlySpan<char> value, CompareOptions options);
public int Compare(ReadOnlySpan<char> source, ReadOnlySpan<char> value, CompareOptions options);
}
}
These apis seem reasonable to me. Thanks Tarek!
Here is what I think we should expose.
Looking at the existing APIs on CompareInfo, should we also add IsSuffix, IsPrefix? What about overloads that don't take CompareOption at all?
Looking at the existing APIs on CompareInfo, should we also add IsSuffix, IsPrefix? What about overloads that don't take CompareOption at all?
IsSuffix, IsPrefix can be useful to add. I don't think the overload that doesn't take CompareOption is useful as still can pass None value. I don't have strong feeling about it though
I added IsPrefix & IsSuffix to the proposal as I don't see any reason not to expose them - they are already implemented.
What about overloads that don't take CompareOption at all?
If you are just talking about IsPrefix and IsSuffix than I don't have a strong opinion either. For convenience we could add a default value options = CompareOptions.None to the API shape.
If you are just talking about IsPrefix and IsSuffix than I don't have a strong opinion either.
I was talking about Compare method.
Ok. FWIW we should discuss for all the proposed APIs if a default CompareOptions value makes sense here.
What about bool IsSortable(ReadOnlySpan<char> text)? It's the only method left in CompareInfo that we haven't considered yet.
IsSortable is nice to have but I don't consider it must to have there.
Ok I added it to the proposal. I also made the CompareOptions overload optional. I saw other APIs that we released with 2.1 where we used default values so I believe this should be fine. We can discuss these in the review meeting.
The scenario is reasonable.
@GrabYourPitchforks brought up that we should also add an overload for GetHashCode().
It's worth pointing out CompareInfo can be derived from and the existing overloads are virtual. Presumably, these new APIs wouldn't normally delegate to them (because we want to avoid allocating a string in the first place), which now means that if a custom derivative exists and overrides the other methods, these methods would do the wrong thing.
However, the type doesn't expose a public constructor so in practice nobody can derive from it. And it doesn't seem our implementation does either.
Options:
Ignore. We make these methods non-virtual and don't delegate to the existing methods.
Handle. We make these methods virtual and have the implementation do a type check. If the instance is CompareInfo exactly, do the cheap thing. Otherwise, create strings and dispatch to the existing virtual methods. Implementers would have to override them to not pay the allocation cost.
We're leaning towards option (1) as we don't believe deriving from CompareInfo to be a thing.
In particular, we need a new overload int GetHashCode(ReadOnlySpan<char> value, CompareOptions options = default).
I'm tracking the GetHashCode overload at https://github.com/dotnet/corefx/issues/31302.
@ViktorHofer is there any push to have these APIs in 3.0?
talked offline with @ViktorHofer and moving this issue to future release as it is not urgent to 3.0
linking related issue https://github.com/dotnet/corefx/issues/33543
I just hit this when trying to avoid string allocations in SqlClient. Every time you ExecuteReader() it'll allocate strings for the metadata column names and I can't avoid it because I can't use a CompareInfo.
This shouldn't be too hard to implement. The apis are already approved. I gave it a try once but had to stop because of other prios.
If you feel strongly about it you could give it a try.
@tarekgh @danmosemsft what do you think. Is it already too late for this or is there still some time?
Generally I'd like community work to be able to continue but this is code is fiddly and perf sensitive so I'd be inclined to suggest it wait until master becomes 5.0. No reason not to work on a PR though, we'd very much welcome that and we can certainly look at it.
I would certainly love to see this happen.
@Wraith2 which API you want to use? I am asking to know if we can provide this API now and provide the rest later.
GetHashCode(ROS<char>) which is under https://github.com/dotnet/corefx/issues/31942 and from this proposal Compare(ROS<char>,ROS<char>,CompareInfo). With those I can have a char[] source wrapped in a struct that I can use as a dictionary key in the ordinal case and then fall back to the Compare if the ordinal lookup fails and I need to compare based on the collation reported by the database.
I keep trying to optimize various bits of code to not allocate strings unless they're needed and I keep hitting the hashcode wall. This time I went past that using an internal class StringOrCharArray from coreclr but then I hit this problem. Spans are nice for new code but working them into legacy code is awkward due to incomplete api coverage like this.
@Wraith2 for GetHashCode(ROS
C#
public partial class CompareInfo : System.Runtime.Serialization.IDeserializationCallback
{
public int GetHashCode(System.ReadOnlySpan<char> source, System.Globalization.CompareOptions options) { throw null; }
}
For the Compare(...) one, does your scenario need to pass different CompareOptions? or you always call with same options?
The specific use case i'm looking at is here. It builds a dictionary using ordinal hashcodes and tries a dictionary lookup. If it fails it then does a linear OrdinalCaseInsensitive search which I can replicate. If that fails it grabs the current culture invariant CompareInfo object and some default CompareOptions values and uses them together to do another linear search, this part I can't replicate.
So the first phase I could really do with the ability to just do ROS<char>.GetHashCode() but as I said if you implement your own hashcode you can get around that and I nicked some coreclr code that does that. The third part I can't replicate and I'm going to need to fall back to allocating strings.
To ensure I am fully understanding you:
Are you saying, trying to implement your
```C#
public int IndexOf(string fieldName)
but instead of taking string, it take ROS<char)?
Also, you are saying, if you get the CompareInfo method
```C#
public int Compare(ReadOnlySpan<char> source, ReadOnlySpan<char> value, CompareOptions options = CompareOptions.None);
You will be able to implement the whole scenario without any allocation.
Am I understanding correctly?
Without allocating strings, yes I think so. Still working through an isolated partial implementation.
Can I check that the intention is for this to get into 5.0, with no workaround for libraries targeting older versions of .NET? NodaTime 3.0 is aiming to target netstandard2.0, using the System.Memory package for span integration. Am I right to not expect any support for this Compare(ReadOnlySpan<char>, ReadOnlySpan<char>, CompareOptions) in future versions of that package, basically?
@jskeet So far this is marked for netcoreapp 5.0 and not netstandard 2.0. Unfortunately, the only workaround for this has to allocate strings.
@terrajobst is there a way we can add these new APIs to netsatndard 2.0?
@terrajobst is there a way we can add these new APIs to netsatndard 2.0?
No. .NET Standard 2.0 is set in stone.
It can't be added to .NET Standard 2.0... but it could be added to the System.Memory nuget package. That's what I was hoping for :)
System.Memory is not a nuget package anymore. It was folded into runtime (aka inboxed). It cannot evolve.
Also, adding this to System.Memory would mean using extension methods for this that is not the natural shape for these APIs and does not follow API design guidelines.
Missed a few overloads on CompareInfo earlier:
namespace System.Globalization
{
public sealed partial class CompareInfo : System.Runtime.Serialization.IDeserializationCallback
{
/* EXISTING APIS (partial list) */
public int Compare(string? string1, string? string2, System.Globalization.CompareOptions options) { throw null; }
public System.Globalization.SortKey GetSortKey(string source, System.Globalization.CompareOptions options) { throw null; }
public int IndexOf(string source, char value, System.Globalization.CompareOptions options) { throw null; }
public int IndexOf(string source, string value, System.Globalization.CompareOptions options) { throw null; }
public bool IsPrefix(string source, string prefix, System.Globalization.CompareOptions options) { throw null; }
public static bool IsSortable(char ch) { throw null; }
public static bool IsSortable(string text) { throw null; }
public bool IsSuffix(string source, string suffix, System.Globalization.CompareOptions options) { throw null; }
public int LastIndexOf(string source, char value, System.Globalization.CompareOptions options) { throw null; }
public int LastIndexOf(string source, string value, System.Globalization.CompareOptions options) { throw null; }
/* ALREADY APPROVED APIS */
public int Compare(System.ReadOnlySpan<char> string1, System.ReadOnlySpan<char> string2, System.Globalization.CompareOptions options = System.Globalization.CompareOptions.None) { throw null; }
public int IndexOf(System.ReadOnlySpan<char> source, System.ReadOnlySpan<char> value, System.Globalization.CompareOptions options = System.Globalization.CompareOptions.None) { throw null; }
public bool IsPrefix(System.ReadOnlySpan<char> source, System.ReadOnlySpan<char> prefix, System.Globalization.CompareOptions options = System.Globalization.CompareOptions.None) { throw null; }
public static bool IsSortable(System.ReadOnlySpan<char> text) { throw null; }
public bool IsSuffix(System.ReadOnlySpan<char> source, System.ReadOnlySpan<char> suffix, System.Globalization.CompareOptions options = System.Globalization.CompareOptions.None) { throw null; }
public int LastIndexOf(System.ReadOnlySpan<char> source, System.ReadOnlySpan<char> value, System.Globalization.CompareOptions options = System.Globalization.CompareOptions.None) { throw null; }
/* NEW PROPOSED APIS */
public int GetSortKey(System.ReadOnlySpan<char> source, System.Span<byte> destination, System.Globalization.CompareOptions options = System.Globalization.CompareOptions.None) { throw null; }
public int GetSortKeyLength(System.ReadOnlySpan<char> source, System.Globalization.CompareOptions options = System.Globalization.CompareOptions.None) { throw null; }
public int IndexOf(System.ReadOnlySpan<char> source, System.Text.Rune value, System.Globalization.CompareOptions options = System.Globalization.CompareOptions.None) { throw null; }
public static bool IsSortable(System.Text.Rune value) { throw null; }
public int LastIndexOf(System.ReadOnlySpan<char> source, System.Text.Rune value, System.Globalization.CompareOptions options = System.Globalization.CompareOptions.None) { throw null; }
}
}
Marked as blocking because it's blocking PR https://github.com/dotnet/runtime/pull/1514/.
Approved over email. Feedback was to rename the destination parameter of GetSortKey to _destination_ instead of _sortKey_. I've updated the proposal to reflect this.