Runtime: Improving the developer experience with regard to default string globalization

Created on 28 Oct 2020  路  30Comments  路  Source: dotnet/runtime

Improving the developer experience with regard to default string globalization

Note: This is a companion document to https://github.com/dotnet/docs/issues/21249, which discusses the behaviors as they currently exist. It's assumed the reader is generally familiar with that document.

Summary

Since .NET 5 standardized on using ICU instead of NLS for globalization across all supported platforms (see breaking change notice), we've received a few reports of string and globalization APIs not behaving as expected.

These reports generally fall into one of two buckets:

  1. The developer wasn't intending to make their code globalization-aware, and the switch to ICU exposed an unintentional dependency in the developer's code which led to an unwanted behavioral change. See: https://github.com/dotnet/runtime/issues/43736, https://github.com/dotnet/runtime/issues/42234, https://github.com/dotnet/runtime/issues/40922, https://github.com/dotnet/runtime/issues/40258, https://github.com/dotnet/runtime/issues/36177, https://github.com/dotnet/runtime/issues/33997, https://github.com/dotnet/runtime/issues/43802, https://github.com/dotnet/runtime/issues/36891, https://github.com/dotnet/runtime/issues/43772, https://github.com/dotnet/runtime/issues/44687.

  2. The developer intended to use a globalization feature, and the switch from NLS to ICU introduced an unexpected behavior. See: https://github.com/dotnet/runtime/issues/43795, https://github.com/dotnet/runtime/issues/39523.

For scenarios which fall into the second bucket, the runtime offers a compat switch to restore the old behavior. The remainder of this document will focus on the first bucket. This bucket is where most of the reports seem to fall.

To address these, we plan a three-pronged approach: improve documentation in this area, audit existing tutorials and code samples, and change the new project experience to reduce the "pit of failure" surface for .NET developers. __We are soliciting the community's feedback on all of these. Please use this issue to discuss.__

Improving documentation regarding string APIs

There are currently breaking change and compatibility noticed posted at the following locations:

We are additionally tracking through https://github.com/dotnet/docs/issues/21249 improvements to the string docs all-up, including recommendations for which Roslyn analyzer rules to enable and updating the string docs to include a table of the default globalization mode for each of the APIs.

This work alone won't make the experience better, but it should help make information more complete and accessible to developers who are searching for it. This _does not_ solve the problem of "How would somebody know to seek out this information in the first place?" - later sections of this proposal should address those.

Reviewing samples and tutorials for proper string handling patterns

We should review samples and tutorials to ensure they're not ingraining incorrect code patterns in our audience's minds. This is potentially a very large undertaking due to .NET samples being scattered across many different sites, some of which haven't been updated in over a decade.

At the very least, the samples that accompany the API documentation should be clarified so that they avoid performing linguistic operations when ordinal operations were likely more appropriate. A _not at all exhaustive_ list is provided below.

A simple search for these patterns will likely produce many false positives. We also shouldn't assume that every such instance of a culture-aware comparison is incorrect. More on this later.

Dev experience changes to reduce the "pit of failure"

The document https://github.com/dotnet/docs/issues/21249 suggests that developers manually enable the Roslyn analyzer rules __CA1307__ and __CA1309__ in their code bases. That rule will flag calls to string.IndexOf(string) and other culture-aware APIs, requesting that the developer explicitly pass StringComparison.CurrentCulture to indicate "yes, I really did intend for this to be culture-aware."

This helps, but it requires an active gesture on the part of the developer. Ideally we would instead alert developers to potential problems (or even fix these problems automatically!) without requiring the developer to have first sought help.

There are some various options we can take here, each with their own pros and cons. I'll describe some potential paths in a section below.

A bit of historical context

When .NET Framework was introduced two decades ago (!!!), the killer app was creating rich UI-based applications. .NET Framework introduced WinForms as the successor to VB6's rapid application development model. It also introduced WebForms as a way to create web-based GUIs with similar fidelity to native WinForms apps. End users interface directly with these app models, which led to rich localization and globalization support being weaved throughout these app models from a very early stage.

Part of this early work involved ensuring that string instances could unambiguously hold data in any supported language. Historically this had been accomplished by storing the string as a sequence of 8-bit C-style chars (LPSTR), leaving their intepretation up to the active Windows code pages. .NET uses UTF-16 for its string representation, removing the reliance on code pages.

At the same time, since user interaction was such a crucial component of early .NET applications, it was important that applications behave according to the user's expectations. This is especially pronounced in applications that perform searching and collation, such as a personnel system which lists all employees' names alphabetically. The end user might expect ordering to be performed according to the conventions of U.S. English, or of Hungarian, or of Turkish, or of another language, depending on how they've configured their system. (The rules for performing Hungarian or Turkish collation are non-trivial.)

To support these scenarios, the .NET Framework APIs which search for one substring within another string or which compare two strings use _the thread's current culture_ (StringComparison.CurrentCulture) by default. This includes APIs like string.IndexOf(string), string.CompareTo(string), and similar. Contrarily, .NET Framework APIs which search for individual chars within a string use _ordinal_ (StringComparison.Ordinal) searching by default. This includes APIs like string.IndexOf(char) and string.StartsWith(char).

string.Contains is the exception to this rule. It was introduced in .NET Framework 2.0 - _after_ the other string APIs - and does not follow the same convention. For string.Contains, both the string-based and the char-based overloads use _ordinal_ behavior by default.

An important aspect of globalized behavior is that it's _not stable_ across platforms. Language itself is fluid, and conventions change. The globalization data that ships with the operating system encompasses not just language conventions, but also geopolitical concerns such as the default currency symbol, and the OS regularly updates this data. While these updates are not intended to be breaking, they make no guarantee of behavioral compatibility.

This globalized-by-default behavior might be appropriate for UI-based applications where an end user is interacting directly with the app, it's often _not_ appropriate for all other scenarios. Web and backend services usually need to process data in a manner that remains consistent across runs and is not influenced by any linguistic conventions. Command-line tools similarly should usually exhibit consistent behavior regardless of the language of the user who launched the tool. Even within a GUI app running on a user's local machine, any underlying business logic should usually run uninfluenced by the user's culture settings.

Now that .NET has adopted Span<T> as a first-class citizen (and ReadOnlySpan<char> as the convention for a cheap string slice), there are also consistency issues to deal with. All Span<T>-based extension methods (including extension methods that operate on ReadOnlySpan<char>) are _ordinal_ by default, unless an explicit StringComparison has been provided. As developers begin using span-based code more frequently, the risk of mixing and matching linguistic and non-linguistic operations on the same text increases.

string str = GetString();
bool b1 = str.StartsWith("Hello"); // uses 'CurrentCulture' by default

ReadOnlySpan<char> span = str.AsSpan();
bool b2 = span.StartsWith("Hello"); // uses 'Ordinal' by default

This mismatch of expectations could cause developers to introduce latent bugs into their code bases.

Possible paths forward

Option A: Enable Roslyn analyzer warnings by default

As mentioned earlier, the Roslyn analyzer rules __CA1307__ and __CA1309__ are intended to alert developers when they're invoking an string-based API that uses linguistic behavior by default. We can go further and enable these rules by default in applications targeting .NET 6+, producing compiler warnings when these patterns are observed. We can also mark APIs like string.IndexOf(string) as [EditorBrowsable(Never)], effectively hiding them from Intellisense and guiding developers toward the StringComparison-consuming overloads.

The developer would see the warnings both within the Visual Studio IDE and on the console during compilation.

string str = GetString();
if (str.StartsWith("Hello")) // This line produces warning CA1307.
{
    /* do something */
}

if (str.StartsWith("Hello", StringComparison.CurrentCulture)) // Explicit comparison specified, no warning produced.
{
    /* do something */
}

__Pros:__ The developer is alerted to the problem early, potentially before they even observe the problem in production.

__Cons:__ This may introduce noise in code bases where the developer truly did intend to call globalization-aware APIs, including within enterprise code bases which have been brought forward across several .NET Framework versions. It also risks introducing a very steep learning curve for new .NET developers, who are now confronted with globalization-related issues while still within the first few minutes of writing their first "Hello, world!" application.

_Alternative proposal:_ Enable these rules in all application types except WinForms and WPF. This assumes that calls to methods like string.IndexOf(string) _where the user intended the default globalization behavior_ are very rare outside of WinForms and WPF projects.

Option B: Provide a compatibilty switch to change the string defaults

Under this proposal, we provide a switch which forces _all_ string APIs to default to Ordinal unless an explicit StringComparison has been provided. This encompasses string.IndexOf(string), string.Compare, and similar APIs. Globalization-specific APIs like System.Globalization.CompareInfo would be unaffected by this switch.

This switch would be application-wide, just like the existing globalization switches. There would be no facility for individual libraries to control this behavior. Library developers would still need to call APIs which take a StringComparison parameter if they want a strong guarantee on what behavior they'll get. (Library devs may want to enable the Roslyn analyzer rules to help flag non-compliant call sites.)

This switch would default to Ordinal and would be distinct from the _UseNls_ switch that .NET 5 already provides. The two switches could be toggled independently. Defaulting string APIs to Ordinal matches how strings behave in other languages like C/C++, Java, Python, Rust, and others. Interestingly, Silverlight 2 and 3 also shipped with "string defaults to Ordinal" behavior, but this was later reverted with Silverlight 4. This switch would also mean that string.ToUpper and string.ToLower become equivalent to string.ToUpperInvariant and string.ToLowerInvariant.

Underlying this proposal is an assumption that stringy operations should be ordinal _unless the call site explicitly requests otherwise_. This makes writing globalization-friendly code a deliberate action rather than an automatic behavior. WinForms UI controls like list boxes could still behave in a manner appropriate for their own scenarios.

__Pros:__ Provides uniformity across the API surface. Also provides significant performance increases since ordinal operations are considerably faster than linguistic operations.

__Cons:__ This could be a substantial breaking change, especially for large applications which can't audit every line of code within third-party dependencies. It also deviates from documented defaults. This could cause confusion if somebody is following an older tutorial or if somebody really did intend to invoke a linguistic operation. We'd also need to figure out how it affects globalization-aware APIs like int.ToString and int.Parse.

Option C: Shenanigans with reference assemblies

This is akin to Option B above but is intended to be less breaking to the .NET ecosystem. Here, we introduce no globalization switch, and we don't change any existing runtime behavior. Instead, we make two changes to .NET 6's reference assemblies.

  1. Remove string API overloads that don't take StringComparison.
  2. Change existing string API overloads which take StringComparison to default these parameters to Ordinal.

Consider overloads of string.StartsWith. Here is how the overloads currently appear in the reference assemblies and how they would appear after this proposal.

//
// .NET 5 reference assemblies
//
public sealed class string
{
    public bool StartsWith(char value);
    public bool StartsWith(string value);
    public bool StartsWith(string value, bool ignoreCase, CultureInfo? culture);
    public bool StartsWith(string value, StringComparison comparisonType);
}

//
// .NET 6 proposed reference assemblies
//
public sealed class string
{
    public bool StartsWith(char value);
    // public bool StartsWith(string value); // (REMOVED)
    public bool StartsWith(string value, bool ignoreCase); // (ADDED, to accelerate OrdinalIgnoreCase scenarios)
    public bool StartsWith(string value, bool ignoreCase, CultureInfo? culture);
    public bool StartsWith(string value, StringComparison comparisonType = StringComparison.Ordinal); // default value added
}

The end effect of this is that if a call site reads someString.StartsWith("Hello"), the .NET 6 compiler will no longer bind the call site to string.StartsWith(string). It will instead be bound to string.StartsWith(string, StringComparison) with the value _Ordinal_ burned in at the call site. Existing assemblies which were compiled against .NET 5 or earlier will continue to call the original method, which still exists within the runtime and still has its old behavior.

__Pros:__ Provides uniformity across the API surface, while retaining behavioral compatibility for assemblies which don't target .NET 6.

__Cons:__ This feels like an abuse of the reference assembly system. It also means that if you're inspecting code, you need to know its target framework (by cracking open the .csproj!) to deduce what the runtime behavior will be. There may also be issues with dynamic compilation and other scenarios where the runtime assemblies are used directly instead of using reference assemblies.

Option D: Revert back to NLS by default when running on Windows

We flip the switches so that ICU is no longer the default globalization stack when .NET apps run on Windows. This does not back out the "ICU everywhere" feature; apps running on Windows can still opt-in to using ICU if desired.

This needn't be exclusive of other options. For example, this can be undertaken jointly with obsoleting APIs which are culture-aware by default. The goal of this proposal is to act as a compat shim rather than to address any latent bugs which might exist in today's callers.

__Pros:__ .NET Framework and .NET Core applications which were built and tested on Windows will continue to work the same way when ported to .NET on Windows.

__Cons:__ Like .NET Core, .NET applications will behave differently across different OSes. Without compile-time alerts, it does not prevent new incorrect call sites from being introduced into the wider .NET ecosystem.

Option E: Do nothing

We take no proactive measures regarding the developer experience. All of our efforts are focused solely on documentation, samples, and similar developer education. Basically, leave the world as it exists today in .NET 5.

__Pros:__ We understand the world as it exists today. Once developers observe a misbehavior in their applications, they can consult our documentation or third-party channels like StackOverflow to self-assist.

__Cons:__ It leaves the "pit of failure" fairly wide and relies on developers to experience a problem before seeking assistance. This potentially leads to the continued introduction of fragile code into the wider .NET ecosystem.

Follow-up work

If we could answer the following questions, that might help inform our decision on what path to take. This issue does not propose a way to discover the answers to these questions.

  • How often are developers writing UI-layer code vs. business logic or other non-UI code? How can we detect this layering even within a single project?

  • What percentage of calls to string.IndexOf(string) would _in practice_ return different results if we were to flip the default from CurrentCulture to Ordinal?

  • Do we need to address APIs like int.Parse at the same time? Example: Does a proposed 'ordinal by default' switch also mean that int.Parse and decimal.ToString are invariant by default?

  • What other options are missing from the above list?

Design Discussion area-System.Runtime breaking-change

Most helpful comment

I do not have a clear preference regarding the options yet, but I can look at it from an end user perspective. I do think that the direction should definitely be towards making the behaviour of the default overloads of Contains() and IndexOf() more alike and not further apart. Yes there has always been an inconsistency, but I fear that increasing this inconsistency will lead to a less developer friendly environment. In an ideal scenario, the calls to IndexOf() should always return higher than -1, when Contains() returns true.

I'm not familiar with how other languages deal with this, but that's just common sense talking. Especially beginning developers will likely check string content through Contains() before retrieving the subsequent index of found string. Obviously this is not necessary if you just check for the value -1 instead using Contains() beforehand but it is likely to happen (I've seen it when training new developers). Increasing the difference between the two calls will make for bugs which aren't obvious by looking at the code.

Using default compiler warnings to solve this issue seems less friendly. I feel like making applications globalization aware is a choice a developer has to make at some point in time, but it's certainly not required for all applications. I agree with the point that adding default warnings for this, will burden beginning developers with something they have zero knowledge about. Obviously learning about this is good but if you introduce it too early, it'll be more of an annoyance than anything else. Adding the warnings won't ensure that the developer learns proper handling either, he/she might just glance at MSDN and just pick an option and paste that single option in every call it's needed without thinking much of it. _Just to be done with it._

Having the behaviour Ordinal by default (which would be my preference), will guarantee consistent behaviour across platforms. If a developer starts to make his/her application culture aware, obviously at that point enforcing explicit StringComparison argument is pretty much necessary. It's not practical to rely on default overloads in that case, in part because it makes code reviews harder (you have to keep the defaults in mind every time you see a call). So at that point I feel the warnings are less of a burden and actually prove valuable. That how it's been and it seems logical to continue that approach (no default warnings for this, manually opt-in to them). The company I work is busy with making a large application globalization aware and when that direction became clear we just enabled those warnings. There were tons of them to handle but in the end the result was very clear and explicit.

All 30 comments

Not sure if this is identical to option D, but to start with:

  • Make ICU opt-in instead of opt-out for .NET 5. Use the next year to educate people, allowing them time to fix their stuff. The can opt-in to the new behavior in order to fix their applications on their own schedule. In combination with some of the above proposals.
    Make ICU opt-out for .NET 6 (if that's even going to be an option, I guess).
    (edit - I think this is basically option D.)

Related:
Parsing a number under invariant culture info is a bit slower and more complex in code. (querying NumberFormatInfo) This is highly unexpected, as converting between numbers and strings are likely a hot path in any type of application.

There must be a path for number that doesn't touch globalization totally. NumberFormatInfo.Invariantainfo is still costful.

The challenge I see with A,B,C is that different people will have equally valid opinions on what the default behavior should be when the methods that don't take StringComparison are used. One of the camps is going to be unhappy, either way.

For option B, it isn't 100% clear to me - are you saying that the default OOTB setting will be Ordinal (ie change in behavior) and that people who want current behavior will have to opt-out? Or that people would have to opt-in.
For something like Option B, if it's opt-in how will this be discoverable for the average Jane/Joe in order to get good % coverage of people who migrate.

For some reason, I kind of like the idea of Option B over Option C. In my mind it's basically like saying "here's a switch that controls the default value of StringComparison if it's unspecified, choose what makes sense for your project type".

I strongly preference Option A.

To address these concerns:

Con: This may introduce noise in code bases where the developer truly did intend to call globalization-aware APIs, including within enterprise code bases which have been brought forward across several .NET Framework versions.

If the intent was truly to call globalization-aware API's I don't think that ask is too steep to explicitly declare your intention, even when pulling code forward.

We have made gigantic leaps forward previously with breaking API in other projects. One such example that comes to mind is when NUnit3 broke several NUnit2 behaviors. We used the excellent work from @wachulski and this Roslyn Analyzer/CodeFix Project https://github.com/nunit/nunit.analyzers. While the transition was not pleasant it was made possible through the hard work of the open source community and Roslyn's rewrite abilities.

I cannot stress how critical it is to have the code fix offer to correct this, this turns an impossible task into a much more palatable one.

Con: It also risks introducing a very steep learning curve for new .NET developers, who are now confronted with globalization-related issues while still within the first few minutes of writing their first "Hello, world!" application.

While I agree that it is not pleasant to encounter this out of the gate, this might be a great learning opportunity for developers on additional considerations when programming cross platform. There are plenty of practices that are sub-optimal when learning to program in a new language, throwing an additional warning at that time I think is a "good thing" as it enforces "good" coding habits out of the gate.

One example of such practice that comes to mind is that you can find hundreds of thousands of examples out on the internet of people performing ad-hoc Sql using SqlCommand(string) performing string concatenation as opposed to using any of the widely available SQL Injection Sanitation libraries or SqlPreparedStatements. This problem is not unique to C# (in fact I would argue it is much more prevalent on the PHP side). This doesn't mean that out of the box the Compiler/IDE/Analyzer set shouldn't alert you to this practice being considered bad, it is just another place where a learning experience can occur. The onerous is then put forth on the developer to make a decision on if they will change their code or if they will simply mark it as ignored.

I would say that this:

We can also mark APIs like string.IndexOf(string) as [EditorBrowsable(Never)], effectively hiding them from Intellisense and guiding developers toward the StringComparison-consuming overloads.

Might be a bridge too far, however I am willing to accept it for now, this is ideal because copy and paste programmers from something like StackOverflow will continue to work, albeit throw a warning that can then be addressed.

Parsing a number under invariant culture info is a bit slower and more complex in code.

Presumably if we had an API whose behavior was known to be invariant (see Utf8Parser for an example), we'd hard-code knowledge of the expected syntax and wouldn't consult globalization tables at all. For example, we wouldn't need to look up the invariant culture's negative number sign or thousands separator. We'd just hard-code the literals '-' and ','.

Another concern: how about other entries that cannot pass the choice, namely interfaces IComparable<T>?
It seems that current IComparable<T> implementation of string defaults to culture aware path. That means using string for key in ordered collection, changing the thread's culture later will break the collection. In this case, changing the default behavior is actually a "fix".

Generally speaking, string globalization problem has a very wide impact, and very long history. It worth a specially designed mechanism.

There is no happy path here. The root of the problem is that developers want to think of a string as a char[]. Despite underlying implementation details those aren't the same. In fact, even a char isn't always a character (since some might require four bytes to represent). Strings are really an array of graphemes, but perhaps that ship has sailed. If you have a single grapheme then a method like ToUpper() makes sense. If all you have is a UTF16 (char) then ToUpper() _might_ make sense, or not. Same with IndexOf(), etc. Strings should probably be defined along the lines of String<"en-US", UTF16, CompareOptions.None> (Options F?), but I'm guessing that ship has also sailed. If the string carried along the necessary information then all string functions could deal in a language appropriate way and if you wanted ordinal behavior you would cast or copy the data into an array.

Sadly, "you don't always get what you want ...", or so the Rolling Stones tell me. I'd prefer consistence and the move to ICU has already introduced breaking changes. For that reason I would choose Option B. Let everything be ordinal unless explicit parameters indicate otherwise. Make it a global option, and generate warning when possible indicating possible bad assumptions in the original code. As noted this would also make it easier for people moving from other languages to C#.

Can you make every method in String/char Span class default to 1 single mode which then could be switched by runtime config?

Like for UI people will set/or maybe projects will be created with this setting by default { "DefaultStringBehaviour": CurrentCulture } and for backend there will be { "DefaultStringBehaviour": Ordinal }? Setting will exist in appsettings or runtimeconfig, could be set from IDE.

Then explicitly set behaviour when calling methods will always be preserved, but default could be changed by users depending on their needs.

Furthermore A Roslyn Analyzer could read appsettings and give warnings based on current default behavior mode if it sees that comparison won't ever work in selected mode (for const strings for example)

Can you make every method in String/char Span class default to 1 single mode which then could be switched by runtime config?

Would "every method" include Equals(String) and op_Equality(String, String), so that the runtime config would be able to make them use the CurrentCulture mode? If so, I think developers would expect C# switch on String to do the same, but the C# compiler would then have to generate two versions of each such statement. The JIT compiler could delete one of them, though.

My biggest concern is that str.contains(something) and -1 != str.indexOf(something) always produce the same result, even if that result changes. Making those behave differently is likely to break more programs more dramatically than a subtle behavior change in what they both match.

My biggest concern is that str.contains(something) and -1 != str.indexOf(something) always produce the same result, even if that result changes. Making those behave differently is likely to break more programs more dramatically than a subtle behavior change in what they both match

Just to make sure it's clear, this is already not the case; they already may behave differently (which is part of the problem).

"Just to make sure it's clear, this is already not the case; they already may behave differently (which is part of the problem)."

Ah, that's useful information!

  • Under what circumstances (any concrete example will do) would they behave differently from one another before .NET Core 3.1 and the changes to ICU that it carried with it (at least on some platforms)?
  • If the default used for indexOf(mySubstring) is a better choice, why isn't this same default equally appropriate for contains(mySubstring), which performs no more than a portion of the same operation?

For ensuing discussion, it would help to understand this.

@lupestro

Under what circumstances (any concrete example will do) would they behave differently from one another before .NET Core 3.1 and the changes to ICU that it carried with it (at least on some platforms)?

Several of them are linked in the above document in the first few lines of the issue:

The developer wasn't intending to make their code globalization-aware, and the switch to ICU exposed an unintentional dependency in the developer's code which led to an unwanted behavioral change. See: #43736, #42234, #40922, #40258, #36177, #33997, #43802, #36891, #43772.

I can speak to this one specifically since I reported it: https://github.com/dotnet/runtime/issues/43802

Based on comments and my understanding when the switch to ICU occurs the behavior that is currently experienced in Linux and MacOS will then be experienced in Windows. The jury is out on what the expected behavior was. I can say for that specific example our use case was an internal tool designed to sort Solution Files in a deterministic manner.

After upgrading to .NET 5 on Windows (where all of our developers are) this would result in several hundred solution files now sorting in a different deterministic manner, causing several developers to question why there are changes in portions of the monolithic code base they did not touch.

TODAY they would have encountered this as well, IF they had been developing on alternative platforms (Linux or MacOS), however we've been getting away "for free" for a long time due to our heavy investments in Windows.

Under what circumstances (any concrete example will do) would they behave differently from one another before .NET Core 3.1 and the changes to ICU that it carried with it (at least on some platforms)?

```C#
using System;
using System.Globalization;

class Program
{
static void Main()
{
CultureInfo.CurrentCulture = new CultureInfo("de-DE");
Console.WriteLine("ss".IndexOf("脽") != -1); // culture-sensitive
Console.WriteLine("ss".Contains("脽")); // culture-insensitive
}
}
```

If the default used for indexOf(mySubstring) is a better choice, why isn't this same default equally appropriate for contains(mySubstring), which performs no more than a portion of the same operation?

From my perspective with 20/20 hindsight, the inconsistency is a mistake of history; Contains(string) was introduced after IndexOf(string), and I don't know whether it using ordinal instead of being culture-aware as is IndexOf(string)/StartsWith(string)/EndsWith(string) was a deliberate difference (e.g. "We don't like that IndexOf(string) is culture-aware, so let's make new things we add be ordinal instead") or an accident.

@GrabYourPitchforks , to clarify option B: are you saying that the default OOTB value for this proposed setting would be Ordinal (ie change in behavior) and that people who want current behavior will have to opt-out?
It would be good to clarify this in the original text, because I think I know the answer based on the phrasing but I'm not sure.
Thanks again for the energy you've put into these proposals! And improving the docs/messaging/visibility.

@ericsampson The assumption is Ordinal by default. The text implies this but doesn't state it outright. I'll clarify in the next rev.

I for one like the sound of Option B, if the decision was made that it's time to do something significant to make things better in the long run, since this ICU change is already going to break some eggs :)

Levi, if that was the chosen path, would you envision two separate switches like this (naming/polarity TBD) having defaults as shown:

System.Globalization.UseNls = false;
System.Globalization.OrdinalStringComparison = true;

@ericsampson I've clarified the Option B proposal above. Your assumption re: the default behaviors is correct.

I think I would prefer option B coupled with a correction to the historical inconsistency around contains() - its behavior should track the behavior of indexOf, regardless of which behavior is chosen.

I do not have a clear preference regarding the options yet, but I can look at it from an end user perspective. I do think that the direction should definitely be towards making the behaviour of the default overloads of Contains() and IndexOf() more alike and not further apart. Yes there has always been an inconsistency, but I fear that increasing this inconsistency will lead to a less developer friendly environment. In an ideal scenario, the calls to IndexOf() should always return higher than -1, when Contains() returns true.

I'm not familiar with how other languages deal with this, but that's just common sense talking. Especially beginning developers will likely check string content through Contains() before retrieving the subsequent index of found string. Obviously this is not necessary if you just check for the value -1 instead using Contains() beforehand but it is likely to happen (I've seen it when training new developers). Increasing the difference between the two calls will make for bugs which aren't obvious by looking at the code.

Using default compiler warnings to solve this issue seems less friendly. I feel like making applications globalization aware is a choice a developer has to make at some point in time, but it's certainly not required for all applications. I agree with the point that adding default warnings for this, will burden beginning developers with something they have zero knowledge about. Obviously learning about this is good but if you introduce it too early, it'll be more of an annoyance than anything else. Adding the warnings won't ensure that the developer learns proper handling either, he/she might just glance at MSDN and just pick an option and paste that single option in every call it's needed without thinking much of it. _Just to be done with it._

Having the behaviour Ordinal by default (which would be my preference), will guarantee consistent behaviour across platforms. If a developer starts to make his/her application culture aware, obviously at that point enforcing explicit StringComparison argument is pretty much necessary. It's not practical to rely on default overloads in that case, in part because it makes code reviews harder (you have to keep the defaults in mind every time you see a call). So at that point I feel the warnings are less of a burden and actually prove valuable. That how it's been and it seems logical to continue that approach (no default warnings for this, manually opt-in to them). The company I work is busy with making a large application globalization aware and when that direction became clear we just enabled those warnings. There were tons of them to handle but in the end the result was very clear and explicit.

In my experience, most of the time, embeded globalization of basic string methods seems to be a _premature optimization_: we will do it if we really intent to; but for now, please just run the code as it appears. Just as @pyrocumulus points out. If we have to append StringComparison argument everytime to make it _correct_, will just frustrate many developers. And the code will read tedious. We average devs expect IndexOf(string) and Contains(string) to be semantically exchangable.

What should happen is, if behavior is different and may change then its ambitious code, this should be a build error, and dev should be forced to switch to non-ambitious . I am not a fan of building code, which complies successfully but its actually not what i thought it was, i would rather it hard error and give suggestions...for replacing. ambitious code should just be removed... force this as steps for the migration path update to .net 5, not doing this will just create more confusion.... i see no other option which does not create frustration.

@GrabYourPitchforks , given that 5.0 is coming out very soon, which of options A,B,C,D are going to be implemented for 5.0 GA? Thanks : )

@ericsampson This issue is tracking potential Future (.NET 6) work. For .NET 5, we improved the docs, and there's a global runtime switch to re-enable the legacy Windows behavior. That switch has no effect on non-Windows platforms.

My two cents:

Option A: This option seems good. The con that concerns me most is the learning curve issue, but I feel this would be largely mitigated if [EditorBrowsable(Never)] were employed. Ultimately this would be a bit of a pain point, but one I feel would be worth it in the long run.

Option B: I feel like this compatibility switch will break more code than it will fix. Library authors will suddenly be forced to always be explicit with no guarantees to the defaults, and will have to go through all their existing code to ensure this. Even application authors when turning this switch on may have to go through large amounts of existing code to look for callsites that may have _intentionally_ been using APIs that are culture-sensitive by default. Ordinal behaviour by default would have been great if it were like that from the beginning, but unfortunately it wasn't.

Option C: While this is certainly an interesting option that seems good in theory, it feels like a hack, and I'm sure it will cause all sorts of obscure issues that are hard to foresee. Plus, similar to the issues above with B, this could cause previously-correct code to break once retargeted and recompiled.

Option D: Certainly not in favour of this. The most important thing to me is to keep consistent behaviour across OSes. Depending on how easy it is to opt-in to preferring ICU (ideally there would be an MSBuild property), this might not be so bad. But ideally all developers, and especially library authors, should be pushed towards ensuring compatibility with ICU.

Option E: Honestly not a bad option.

@GrabYourPitchforks oh. I'm a little bit bummed then that ICU was kept opt-out vs opt-in for 5, which would have eased the transition.

An ordinal comparison mode ("option B") would be convenient even if it's disabled by default. As the OP points out, there are applications which simply don't care about globalization, such as company-internal tools. Executing System.Globalization.CultureInfo.CurrentCulture = System.Globalization.CultureInfo.InvariantCulture; some time in early startup is useful for them, but an "ordinal culture" would be even more convenient in applications where most string processing is intended to use ordinal comparisons.

An ordinal comparison mode ("option B") would be convenient even if it's disabled by default. As the OP points out, there are applications which simply don't care about globalization, such as company-internal tools. Executing System.Globalization.CultureInfo.CurrentCulture = System.Globalization.CultureInfo.InvariantCulture; some time in early startup is useful for them, but an "ordinal culture" would be even more convenient in applications where most string processing is intended to use ordinal comparisons.

I like the idea of an "ordinal culture", just a really barebones way of working on strings.

@ericsampson

* Make ICU opt-in instead of opt-out for .NET 5. Use the next year to educate people, allowing them time to fix their stuff.

This will not work, because people will usually only listen to education when they run into problems.

Another thing about option D is that it essentially amounts to damage control so delayed that it becomes unnecessary.

Presumably the motivation behind reverting the default on Windows is to improve compatibility with pre-.NET 5 software. But the damage to that software is done. .NET 5 is out. Over the next year, people are going to have to fix this software, or find ways to work around the compatibility issues. After this, the benefits of reverting the default to NFS diminish significantly.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bencz picture bencz  路  3Comments

btecu picture btecu  路  3Comments

jkotas picture jkotas  路  3Comments

v0l picture v0l  路  3Comments

omariom picture omariom  路  3Comments