Machinelearning: Poor accuracy with non-US regional settings

Created on 3 Mar 2020 · 6 comments · Source: dotnet/machinelearning

System information

  • OS: Windows 10
  • .NET Core version: 2.2

Issue

I get poor accuracy when running the training code on a system with non-US regional settings. The problem is the number format: after changing the decimal symbol from ',' to '.', everything works fine.
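
One plausible mechanism (an assumption; the issue does not say where the parsing happens) is that numbers written with '.' as the decimal symbol are parsed with a culture whose decimal symbol is ','. The sketch below uses only `System.Globalization`; `commaDecimal` is a hand-built stand-in for a culture such as de-DE, so the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

// Hand-built number format that mimics a comma-decimal culture (e.g. de-DE):
// ',' separates decimals and '.' groups thousands.
var commaDecimal = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
commaDecimal.NumberDecimalSeparator = ",";
commaDecimal.NumberGroupSeparator = ".";

string fromFile = "3.5"; // value written with '.' as the decimal symbol

// Culture-sensitive parse: double.Parse(string, IFormatProvider) uses
// NumberStyles.Float | NumberStyles.AllowThousands, so '.' is treated as a
// thousands separator and silently skipped.
double wrong = double.Parse(fromFile, commaDecimal);

// Culture-independent parse: the invariant culture always uses '.'.
double right = double.Parse(fromFile, CultureInfo.InvariantCulture);

Console.WriteLine($"comma-decimal culture: {wrong.ToString(CultureInfo.InvariantCulture)}"); // 35
Console.WriteLine($"invariant culture:     {right.ToString(CultureInfo.InvariantCulture)}"); // 3.5
```

No exception is thrown, which is why this kind of bug tends to show up only as degraded accuracy rather than as an error.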

Accuracy with ',' as the decimal symbol: (screenshot not preserved)

Accuracy with '.' as the decimal symbol: (screenshot not preserved)

Is there a way to control localization in ML.NET?

Thanks
Damir
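
Until the library handles this itself, a common general .NET workaround (not an ML.NET-specific API, and not confirmed in this thread to fix the accuracy drop) is to switch the process to the invariant culture before any data is loaded or any model is trained. `CultureInfo.CurrentCulture` affects the current thread and flows to tasks started from it; `CultureInfo.DefaultThreadCurrentCulture` covers threads that have no culture set explicitly.

```csharp
using System;
using System.Globalization;
using System.Threading.Tasks;

// Force culture-sensitive APIs to use '.' as the decimal symbol,
// regardless of the machine's regional settings.
CultureInfo.DefaultThreadCurrentCulture = CultureInfo.InvariantCulture;   // threads without an explicit culture
CultureInfo.DefaultThreadCurrentUICulture = CultureInfo.InvariantCulture;
CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;                // this thread and its tasks
CultureInfo.CurrentUICulture = CultureInfo.InvariantCulture;

// Calls that omit an IFormatProvider now behave the same on every machine.
double onMainThread = double.Parse("3.5");

// Work scheduled on the thread pool also sees the invariant culture.
double onWorker = Task.Run(() => double.Parse("3.5")).Result;

Console.WriteLine($"{onMainThread} {onWorker}"); // 3.5 3.5

// ... create the MLContext, load data and train here ...
```

This only helps if the culture-sensitive code runs after the switch, so it belongs at the very start of the program.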

Labels: P1, bug


All 6 comments

Hi Damir,

You bring up a very good edge case we shouldn't forget: in many regions outside the US, the decimal point is written as a comma. For example, 3.5 is written as 3.5 in the US and 3,5 in much of Europe, and 3.5 * 10^5 is written as 350,000 in the US and 350.000 in much of Europe. We will look into this, thank you!
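
The separator swap described above can be reproduced with standard .NET formatting. To keep the sketch independent of installed culture data, `us` and `eu` are hand-built `NumberFormatInfo` objects that copy only the two separators of en-US and of a typical European culture such as de-DE (an illustrative simplification; real cultures differ in more than this).

```csharp
using System;
using System.Globalization;

// Hand-built formats that differ only in their two separators.
var us = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
us.NumberDecimalSeparator = ".";
us.NumberGroupSeparator = ",";

var eu = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
eu.NumberDecimalSeparator = ",";
eu.NumberGroupSeparator = ".";

// "N1": one decimal digit; "N0": no decimals, digits grouped in threes.
string usSmall = 3.5.ToString("N1", us);       // "3.5"
string euSmall = 3.5.ToString("N1", eu);       // "3,5"
string usLarge = 350000.0.ToString("N0", us);  // "350,000"
string euLarge = 350000.0.ToString("N0", eu);  // "350.000"

// The same text means different numbers: read with the US format,
// the European "350.000" becomes 350.
double misread = double.Parse(euLarge, us);

Console.WriteLine($"{usSmall} | {euSmall} | {usLarge} | {euLarge} | misread as {misread.ToString(CultureInfo.InvariantCulture)}");
```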

Edit: Another good edge case is the negative (-) sign. In some cultures (e.g. Faroese in Denmark), -1 is written as −1, where the minus sign is the Unicode character U+2212 (decimal 8722), which lies outside the ASCII table, rather than the usual hyphen-minus (ASCII 45). In addition, some cultures, including some right-to-left locales such as Arabic and Hebrew, may place the negative sign after the value (e.g. 3-, 4.05-).
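
Both cases can be checked with the standard parsing APIs. In the sketch below, `minusFormat` is a hand-built `NumberFormatInfo` whose `NegativeSign` is U+2212; it stands in for a culture that uses that sign and is an assumption for illustration, not data read from the Faroese culture.

```csharp
using System;
using System.Globalization;

char unicodeMinus = '\u2212'; // MINUS SIGN, outside ASCII
char hyphenMinus = '-';       // HYPHEN-MINUS, the ASCII default
Console.WriteLine($"{(int)unicodeMinus} {(int)hyphenMinus}"); // 8722 45

string value = "\u22121"; // "−1" written with U+2212

// The invariant culture recognises only '-' as the negative sign.
bool invariantOk = double.TryParse(value, NumberStyles.Float,
    CultureInfo.InvariantCulture, out _);

// A format whose negative sign is U+2212 accepts it.
var minusFormat = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
minusFormat.NegativeSign = "\u2212";
bool customOk = double.TryParse(value, NumberStyles.Float, minusFormat, out double parsed);

// Trailing signs ("3-", "4.05-") are rejected unless AllowTrailingSign is set.
double trailing = double.Parse("4.05-",
    NumberStyles.Float | NumberStyles.AllowTrailingSign, CultureInfo.InvariantCulture);

Console.WriteLine($"invariant accepts U+2212: {invariantOk}, custom accepts: {customOk}");
Console.WriteLine($"{parsed.ToString(CultureInfo.InvariantCulture)} {trailing.ToString(CultureInfo.InvariantCulture)}");
```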

We may want to setup a CI leg which tests another culture. Perhaps "de-DE" or "ru-RU".
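
A test in such a leg could temporarily switch `CultureInfo.CurrentCulture` and check that results do not change. `RunUnderCulture` below is a hypothetical helper, not an existing ML.NET or test-framework API. A real CI leg would more likely use `new CultureInfo("de-DE")` or `"ru-RU"`; a cloned invariant culture with ',' as the decimal symbol is used here so the sketch also runs where culture data is unavailable.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: run `body` with CurrentCulture set to `culture`,
// then restore the previous culture even if `body` throws.
static T RunUnderCulture<T>(CultureInfo culture, Func<T> body)
{
    CultureInfo previous = CultureInfo.CurrentCulture;
    CultureInfo.CurrentCulture = culture;
    try { return body(); }
    finally { CultureInfo.CurrentCulture = previous; }
}

// Stand-in for de-DE / ru-RU: invariant culture with ',' as the decimal symbol.
var commaCulture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
commaCulture.NumberFormat.NumberDecimalSeparator = ",";
commaCulture.NumberFormat.NumberGroupSeparator = ".";

// Code under test: one culture-sensitive parse, one culture-independent parse.
Func<double> sensitive = () => double.Parse("1.5");
Func<double> independent = () => double.Parse("1.5", CultureInfo.InvariantCulture);

double baseline = RunUnderCulture(CultureInfo.InvariantCulture, independent);
bool sensitiveStable = RunUnderCulture(commaCulture, sensitive) == baseline;     // false: reads 15
bool independentStable = RunUnderCulture(commaCulture, independent) == baseline; // true

Console.WriteLine($"culture-sensitive code stable:   {sensitiveStable}");
Console.WriteLine($"culture-independent code stable: {independentStable}");
```

A failing comparison like `sensitiveStable` is exactly the signal such a CI leg would be looking for.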

How about this or similar?

MLContext ctx = new MLContext(.., culture: CultureInfo.InvariantCulture);
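
The suggestion amounts to letting the caller choose the culture once and having every parse use it explicitly, instead of relying on `CultureInfo.CurrentCulture`. A `culture:` parameter on `MLContext` is only a proposal here, not an existing ML.NET API, so the sketch below shows the same idea with a hypothetical local function `ParseFeatureRow` that defaults to the invariant culture.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical parser: the culture is an explicit input, defaulting to invariant,
// so the machine's regional settings never leak into the parsed values.
static float[] ParseFeatureRow(string line, char separator, CultureInfo? culture = null)
{
    culture ??= CultureInfo.InvariantCulture;
    return line.Split(separator)
               .Select(field => float.Parse(field.Trim(), NumberStyles.Float, culture))
               .ToArray();
}

// Data written with '.' decimals and ',' between columns.
float[] row = ParseFeatureRow("5.1, 3.5, -1.4", ',');

// Data exported from a comma-decimal locale, here with ';' between columns.
var commaDecimal = (CultureInfo)CultureInfo.InvariantCulture.Clone();
commaDecimal.NumberFormat.NumberDecimalSeparator = ",";
float[] rowComma = ParseFeatureRow("5,1; 3,5; -1,4", ';', commaDecimal);

Console.WriteLine(string.Join(" ", row.Select(v => v.ToString(CultureInfo.InvariantCulture))));
// 5.1 3.5 -1.4
```

Defaulting to the invariant culture keeps results reproducible across machines, while still letting users opt in to a specific culture when their files were produced with one.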

Here is a .NET Fiddle that demonstrates some of these cultural differences in .NET Core 3.1 (major thanks to @justinormont for the idea and the initial implementation): https://dotnetfiddle.net/LtAtoi

Hey @ddobric, can you provide the sample code and data you used to obtain the difference in your results above? It would help us test the validity of a PR that fixes this issue. Thanks!

Hi @ddobric, I'm reaching out again to ask if you can provide sample code and data with which you've encountered this issue. Thanks.
