Hi team!
Within the Microsoft Docs there are several locations mentioning case-sensitive and culture-sensitive string comparison. However, I could not find a valid source (within Microsoft Docs or externally) explaining the culture-specific sorting rules. On this page, there is the example of "goodbye" versus "Goodbye" with the result that "goodbye" precedes "Goodbye". This seems to imply that "g" < "G".
The same applies to https://docs.microsoft.com/en-gb/dotnet/api/system.string.compare?view=netframework-4.7.1, where lowercase "i" is compared to uppercase "I". Here again, it is implied that lowercase letters precede uppercase letters. But I could not find a source explicitly stating the (case-sensitive) sorting rules for the various cultures.
If there is a reference within Microsoft Docs that I missed, then it would be great to make it more prominent. If the sorting rules used by the .NET Framework are based on an external document, then please add a reference to it in the Microsoft Docs.
Kind regards,
David
Thanks for bringing this to our attention, @david-haemmerle.
Can you look at this article, and let me know if it has the information you need: https://docs.microsoft.com/dotnet/csharp/how-to/compare-strings
If it does, we'll fix this issue by linking from String.CompareTo. If not, we'll add the needed information in addition to adding the needed links.
Hi @BillWagner
Thank you for your quick reply. Unfortunately, the link does not contain the info I'm looking for. Maybe my inquiry was not clear enough; I'll try rephrasing it.
Basically, I'd like to have a reference for the comparison/sorting rules of the available cultures. For example, I'd like to know if "a" < "A" or vice versa in en-US _before_ writing the code and actually performing the comparison (so that one can predict the outcome of the comparison). Microsoft Docs contain some examples like "goodbye" vs "Goodbye" or "i" vs "I", but no reference to the comprehensive rules.
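To make the "a" versus "A" question concrete, here is a minimal sketch that contrasts a culture-sensitive comparison with an ordinal one. The variable names are made up for illustration. Going by the "goodbye"/"Goodbye" example quoted above, the en-US result should be negative (lowercase first), but that depends on the platform's collation data, so the sketch prints the sign rather than assuming it. The ordinal result is always positive, because it compares UTF-16 code units and 'a' (97) is greater than 'A' (65).

```csharp
using System;
using System.Globalization;

// Illustrative sample strings mirroring the "a" vs "A" question.
string lowerA = "a";
string upperA = "A";

// Culture-sensitive, case-sensitive comparison using en-US rules.
// The outcome comes from the platform's collation data (NLS or ICU).
CultureInfo enUs = CultureInfo.GetCultureInfo("en-US");
int cultureResult = string.Compare(lowerA, upperA, enUs, CompareOptions.None);

// Ordinal comparison: compares raw UTF-16 code units, no culture involved.
int ordinalResult = string.CompareOrdinal(lowerA, upperA);

// Print only the sign: negative means the first string sorts first.
Console.WriteLine($"en-US culture-sensitive: {Math.Sign(cultureResult)}");
Console.WriteLine($"ordinal:                 {Math.Sign(ordinalResult)}");
```

This only shows the outcome after the fact. It does not replace a written reference for the rules, which is what I am asking for.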
Thanks for the clarifications, @david-haemmerle. That helps a lot.
@tarekgh may have some thoughts about this.
For myself, I'm not sure, @david-haemmerle, that publishing a reference to the comparison rules is practical. Aside from the fact that culture-specific sorting and comparison rules are subject to change, the rules would also have to cover normalizations, comparison of single characters with graphemes, etc. It would need to do this for the entire Unicode character range (including the Unicode supplemental planes) for each predefined culture.
That said, I do seem to recall seeing a rather detailed document on culture-sensitive character weights in Windows (which the .NET Framework and .NET Core running on Windows rely on) a number of years ago. A quick search, however, has failed to find it. @ShawnSteele may remember the document (or realize that I've misremembered the whole thing).
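As a small illustration of the normalization point above, here is a sketch (with made-up variable names) comparing a precomposed "é" against "e" followed by a combining acute accent. The two strings are not equal under an ordinal comparison. A culture-sensitive comparison is expected to treat them as equal because they are canonically equivalent, though that depends on the platform's collation data, so the sketch prints the result rather than assuming it.

```csharp
using System;

// Precomposed "é" (U+00E9) vs. "e" + combining acute accent (U+0301).
string composed = "\u00E9";
string decomposed = "e\u0301";

// Ordinal equality compares UTF-16 code units, so these differ.
bool ordinalEqual = string.Equals(composed, decomposed, StringComparison.Ordinal);

// Linguistic comparison using the invariant culture's rules.
int linguistic = string.Compare(composed, decomposed, StringComparison.InvariantCulture);

Console.WriteLine($"ordinal equal:     {ordinalEqual}");
Console.WriteLine($"linguistic result: {linguistic}");
```

A published rule set would have to spell out cases like this for every culture, which is part of why I doubt it is practical.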
@david-haemmerle, please check the following link. You can download the complete Sorting Weight Tables there, which contain the full information: https://www.microsoft.com/en-us/download/details.aspx?id=10921
Also, here is a link to related material, in case you are interested: https://msdn.microsoft.com/en-us/library/cc249013.aspx.
Thanks for the link to the downloadable Sorting Weight Tables, @tarekgh. That's what I was thinking of. We should add the link to the appropriate places in the documentation. I'll add that to our schedule for the upcoming sprint. And thanks for raising this issue, @david-haemmerle.
Thank you all very much for your input! @tarekgh The Sorting Weight Tables are what I was looking for. 👍 And I have to say, I really like and appreciate this way of communicating with the teams at Microsoft. Keep up the great work.