using System;

namespace wtfdot
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Threading.Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("nb-NO");
            int.Parse("-1");
        }
    }
}
produces
Unhandled Exception: System.FormatException: Input string was not in a correct format.
   at System.Number.StringToNumber(ReadOnlySpan`1 str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
   at System.Number.ParseInt32(ReadOnlySpan`1 s, NumberStyles style, NumberFormatInfo info)
   at System.Int32.Parse(String s)
dotnet --version
2.2.103
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
The negative sign is \u2212 for some reason, and they probably get this info from the current OS.
Is it possible to parse both - and \u2212 as a minus sign? I'm pretty sure Norwegians use "-" from keyboard
The negative sign is \u2212 for some reason, and they probably get this info from the current OS.
Yes, we check for NumberFormatInfo.NegativeSign.
I'm pretty sure Norwegians use "-" from keyboard
At least from looking at our docs, it does look like the Norwegian keyboard layout uses U+002D (Hyphen-Minus): https://docs.microsoft.com/en-us/globalization/windows-keyboard-layouts#N
It seems we are operating by design - we expect to be running int.Parse in the same culture that the input was formatted in.
We don't in general know all the values that NumberFormatInfo.NegativeSign may take today and in the future on various cultures so it seems like we could not reasonably be more tolerant.
@veonua is it possible to set the thread culture to match the culture the number was formatted in? If it might have either nb-NO or (say) en-US negative signs, you might have to use "TryParse" and fall back from one to the other if the first fails.
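The fallback suggested above could look roughly like the following sketch. The `FallbackParser` class and its `TryParseInt` method are hypothetical names made up for illustration; only `int.TryParse`, `NumberStyles` and `IFormatProvider` are real framework APIs. It tries each provider in order and stops at the first one that accepts the input.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of .NET): tries several cultures in turn.
public static class FallbackParser
{
    // Returns true and sets value as soon as one provider parses the text.
    public static bool TryParseInt(string text, out int value, params IFormatProvider[] providers)
    {
        foreach (var provider in providers)
        {
            if (int.TryParse(text, NumberStyles.Integer, provider, out value))
                return true;
        }

        // No provider accepted the input.
        value = 0;
        return false;
    }
}
```

A call such as `FallbackParser.TryParseInt(input, out var n, CultureInfo.CurrentCulture, CultureInfo.InvariantCulture)` would then accept whichever negative sign either culture defines. The order of providers matters when the same text is valid in more than one culture.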
@danmosemsft If what 99 % of users of a culture type can't be parsed correctly, then I think it makes int.Parse nigh useless for that culture and I would consider that a bug.
The right fix might be to change CLDR data for the Norwegian culture, but I think the .Net team is in a better position to attempt to make that change, than a random person.
Yes, I think we should confirm that the actual behavior here matches what the docs are saying (i.e. that the Norwegian keyboard uses U+002D (Hyphen-Minus), but that the culture info uses U+2212 (Minus Sign)). If that is the case, I think it is a usability bug.
So basically ALL client apps _must_ have two (or maybe more) TryParse calls. Do you think application developers have more knowledge and skills to handle situations like this?
This either should be fixed in the framework or must be written in every book and documentation.
My expectation is that the Framework would isolate me from platform & culture specific issues.
On Tue, Jan 22, 2019, 20:59 Dan Moseley wrote:
It seems we are operating by design - we expect to be running int.Parse in the same culture that the input was formatted in.
We don't in general know all the values that NumberFormatInfo.NegativeSign may take today and in the future on various cultures so it seems like we could not reasonably be more tolerant.
@veonua is it possible to set the thread culture to match the culture the number was formatted in? If it might have either nb-NO or (say) en-US negative signs, you might have to use "TryParse" and fall back from one to the other if the first fails.
My expectation that Framework would isolate me from platform & culture specific issues
That's what CultureInfo.InvariantCulture is for. Pass that to your Parse call as the provider, or set it as the current culture, and regardless of locale you'll get the same parsing behavior.
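As a minimal sketch of passing the invariant culture as the provider: the `InvariantParsing` wrapper below is a hypothetical name used only for illustration, while the `int.Parse(string, NumberStyles, IFormatProvider)` overload is the real API being shown.

```csharp
using System;
using System.Globalization;

// Hypothetical wrapper (not part of .NET) around the provider overload.
public static class InvariantParsing
{
    // Parses with the invariant culture, ignoring the thread's current culture.
    public static int ParseInvariant(string text) =>
        int.Parse(text, NumberStyles.Integer, CultureInfo.InvariantCulture);
}
```

With this, `InvariantParsing.ParseInvariant("-1")` accepts U+002D regardless of what `CultureInfo.CurrentCulture` is set to. It is only suitable for input known to be in invariant format, not for text typed by users in their own locale.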
The right fix might be to change CLDR data for the Norwegian culture
Yes. I don't believe we should be second-guessing the data from ICU / the OS. If there's an issue with that data, we should work with the provider of it to fix it.
cc: @tarekgh, @krwq
That's what CultureInfo.InvariantCulture is for. Pass that to your Parse call as the provider, or set it as the current culture, and regardless of locale you'll get the same parsing behavior.
So I have to keep in mind which culture I should use for every parse? Even if my whole application uses nb-NO, int parsing must use CultureInfo.InvariantCulture?
and minimal code I have to provide is
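using System;
using System.Globalization;

var str = Console.ReadLine();
int i = 0;
// First attempt: the current culture (e.g. nb-NO, whose NegativeSign is \u2212 here).
if (int.TryParse(str, out i)) {
    Console.WriteLine("it is " + i);
}
// Fallback: the invariant culture, which accepts "-" (U+002D).
else if (int.TryParse(str, NumberStyles.Any, CultureInfo.InvariantCulture, out i)) {
    Console.WriteLine("You are Norwegian!");
}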
There are many differences (not just NegativeSign) between Windows and Linux, see other issues here and in related .NET Core repos.
This prints nothing on Windows but prints a lot of text on Linux:
foreach (var c in CultureInfo.GetCultures(CultureTypes.AllCultures)) {
    if (c.NumberFormat.NegativeSign != "-")
        Console.WriteLine(c);
}
@0xd4d
I had a look at the strings used for NegativeSign on my Ubuntu 18.04. What I found:
| string | example culture |
|---|---|
| U+002D HYPHEN-MINUS | en English |
| U+061C ARABIC LETTER MARK, U+002D HYPHEN-MINUS | ar Arabic |
| U+200E LEFT-TO-RIGHT MARK, U+002D HYPHEN-MINUS | he Hebrew |
| U+200E LEFT-TO-RIGHT MARK, U+002D HYPHEN-MINUS, U+200E LEFT-TO-RIGHT MARK | ps Pashto |
| U+200F RIGHT-TO-LEFT MARK, U+002D HYPHEN-MINUS | ckb Central Kurdish |
| U+2212 MINUS SIGN | nb Norwegian Bokmål |
| U+200E LEFT-TO-RIGHT MARK, U+2212 MINUS SIGN | fa Persian |
So, all cultures use either U+002D HYPHEN-MINUS or U+2212 MINUS SIGN, though some surround them with additional marks, to make them display properly in that language. I haven't tested what effect those have on int.Parse.
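Given that every variant in the table reduces to one of two sign characters plus directional marks, one possible application-side workaround is to normalize the input before parsing. The sketch below is an assumption for illustration, not something proposed in this thread and not how int.Parse behaves: `SignNormalizer`, `NormalizeSign` and `TryParseLenient` are hypothetical names, and the set of stripped characters is just the marks listed above.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper (not part of .NET): accepts either sign character.
public static class SignNormalizer
{
    // Drops the directional marks from the table and maps U+2212 to U+002D.
    public static string NormalizeSign(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '\u200E': // LEFT-TO-RIGHT MARK
                case '\u200F': // RIGHT-TO-LEFT MARK
                case '\u061C': // ARABIC LETTER MARK
                    continue;  // skip the mark, move to the next character
                case '\u2212': // MINUS SIGN
                    sb.Append('-');
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    // Parses with the culture's other settings but with "-" as the negative sign.
    public static bool TryParseLenient(string text, CultureInfo culture, out int value)
    {
        // Clone() returns a writable copy, so the culture itself is not modified.
        var format = (NumberFormatInfo)culture.NumberFormat.Clone();
        format.NegativeSign = "-";
        return int.TryParse(NormalizeSign(text), NumberStyles.Integer, format, out value);
    }
}
```

The trade-off is that this second-guesses the culture data for the sign only, while still using the culture's other number settings.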
@veonua to summarize this thread:
by that, I am closing this issue but feel free to reply with any more questions if you have any. Thanks.
@tarekgh
If you think this is wrong, you can raise your concern to CLDR to fix that.
Like I said above, I think this is a bug in .Net. And I think fixing those should be a responsibility of the .Net team, even if the ultimate source of the bug is in a third-party dependency.
if you don't have control over the source of the string you are parsing, then Invariant will still be the best guess to use. You cannot just magically make any number formatted with some culture can be parsed with other culture.
The invariant culture is not a good option for parsing strings from users. And it seems like you're saying that it's fine if using the actual culture also doesn't work correctly for that. This is not about having a string that uses one culture and parsing it with another culture. It's about parsing strings from Norwegian users using the Norwegian culture not working.
In my opinion, int.Parse should correctly parse what a regular user of a given culture is likely to type. It's not good enough if it's only supposed to parse the result of int.ToString().
@svick, I believe @tarekgh's point was that this is not something that .NET should fix or can reasonably work around.
The issue (if there is one) is likely in the CLDR metadata published by the Unicode Consortium and a bug needs to be filed there: http://cldr.unicode.org/index/bug-reports.
For reference, the latest locale data:
@svick do you agree, that we cannot generally parse numbers correctly without knowing the culture? For example, 100,123 is a much smaller number in fr-FR than in en-US?
In which case I think it comes down to a possible bug in the culture data, which is coming from CLDR. We do not want to get in the business of defining our own culture data as it is complex and ever changing.
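To make the 100,123 example concrete, here is a small sketch. `CultureAwareParsing.ParseWith` is a hypothetical wrapper named for illustration; `decimal.Parse` with `NumberStyles.Number` is the real API. In fr-FR the comma is the decimal separator, while in en-US it is the group separator.

```csharp
using System;
using System.Globalization;

// Hypothetical wrapper (not part of .NET) that makes the provider explicit.
public static class CultureAwareParsing
{
    // Parses text as a decimal using the separators defined by the provider.
    public static decimal ParseWith(string text, IFormatProvider provider) =>
        decimal.Parse(text, NumberStyles.Number, provider);
}

// Usage (assumes culture data for fr-FR and en-US is available on the machine):
//   CultureAwareParsing.ParseWith("100,123", CultureInfo.GetCultureInfo("fr-FR"));  // 100.123
//   CultureAwareParsing.ParseWith("100,123", CultureInfo.GetCultureInfo("en-US"));  // 100123
```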
@danmosemsft
do you agree, that we cannot generally parse numbers correctly without knowing the culture? For example, 100,123 is a much smaller number in fr-FR than in en-US?
Of course.
In which case I think it comes down to a possible bug in the culture data, which is coming from CLDR. We do not want to get in the business of defining our own culture data as it is complex and ever changing.
I understand that. My problem is that what I see here is:
This is not a bug in our code, so we're going to close this issue and we will leave it to someone else to fix CLDR.
The attitude I would like to see is:
This problem is affecting our customers, so we will keep this issue open and we will work with the maintainers of CLDR on resolving it.
@svick - ah, I see. We ask folks to report directly to CLDR because we do not have expertise in the culture in which the issue is being reported, and therefore only make things less efficient trying to be in the "middle" of any discussion.
I believe CLDR is the closest to a standard across Windows and Unix - it is not something niche that .NET chose to depend on. If the issue was something specific to .NET's use of ICU/CLDR then we probably would want to be involved - that's not the case for the choice of negative sign in nb-NO.
@tarekgh is that a reasonable summary?
It also looks like this might be an issue with the CLDR metadata included with Ubuntu 18.04.
On Windows 10 (Build 17763), the program given in the OP succeeds.
On Ubuntu 18.04, the program fails with a FormatException.
@danmosemsft yes this is a good summary.
To be more helpful, here is the link where you can report any issue to CLDR: http://cldr.unicode.org/index/bug-reports
Also I want to mention, we have fixed different parsing issues for the nb-NO culture before. So, we really care about the customers when we have more control over the issue. @svick sorry if I left any bad impression when I closed the issue, but what @danmosemsft mentioned explains why I closed the issue.
@tannergooding Windows is trying to get as close to CLDR as it can, but some differences are still to be expected (some of them intentional). I still think it would be best for this problem to be fixed in CLDR (if it is really considered a problem for that culture). Anyway, there is a lot of collaboration between CLDR and Windows, and it is going in a very good direction.