Msbuild: Cultures aliased by ICU cannot be used for resource localization on non-Windows environments

Created on 31 Oct 2018  路  10Comments  路  Source: dotnet/msbuild

_From @CodingDinosaur on October 12, 2018 3:30_

When building or running under .NET Core on a Unix-based environment, certain cultures cannot be utilized for resource localization, such as getting localized strings. This impacts both the build process (e.g., identifying and processing resource files) and the lookup of resources at runtime. These cultures can be used as expected when building or running under Windows.

The affected cultures are those which are aliased by ICU -- that is, to save on DB space for certain cases, ICU defines some locales as an "alias" of another. There are 42 locale aliases in ICU 57, of those two of the most common are zh-TW and zh-CN. For a full list of affected locales, see: ICU Aliased Locale List - CultureIssueDemonstration Readme.

This platform-inconsistent behavior when trying to localize certain resources, necessitates using special code and workarounds both at build and deploy time when developing cross-platform applications.

See a demo of this issue in CodingDinosaur/CultureIssueDemonstration

Symptoms

  • Resource files for the affected locales will not be generated into resource assemblies if building in a Unix-based environment.

    • For example: MyStrings.zh-TW.resx will not have a resource assembly created if the build occurs on a Unix-based environment, but will if built under Windows

  • Resources for the affected locales will not be utilized even if the resource files are present, falling back to the default resources.

    • For example: Consider MyStrings.resx and MyStrings.zh-TW.resx. If a custom build step is used to generate a copy the resource assembly for MyStrings.zh-TW.resx to workaround the first issue, requesting a resource utilizing culture zh-TW will still return the resource from the default MyStrings.resx resources.

  • Affected cultures are missing when running under Unix-based environments and calling CultureInfo.GetCultures
  • Some culture data for affected cultures is platform inconsistent, notably the parent locale

Expected behavior

  • Primarily, that whatever the "correct" behaviors for these locales are (as it relates to resources and CultureInfo) they be consistent between Windows and Unix-like environments.
  • Secondarily, that aliased locales would "just work" in .NET Core for resource localization - e.g., that we could have resources defined as zh-TW, just as we can on Windows today, and properly retrieve the expected resources.

Brief Analysis

Most of the above symptoms boil down to uloc_getAvailable in ICU's C API.

For example, the zh-TW resource files do not get copied during build, because during the task SplitResourcesByCulture, the culture is validated against a cache based ultimately on CultureInfo.GetCultures, which in turn, on Unix, ultimately relies on ICU. A diagnostic MSBuild log shows why the file is missing:

Removed Item(s): 
  _MixedResourceWithNoCulture=
    Resources/MyNetCoreProject.MyResources.zh-TW.resx
        OriginalItemSpec=Resources/MyNetCoreProject.MyResources.zh-TW.resx
        TargetPath=Resources/MyNetCoreProject.MyResources.zh-TW.resx
        WithCulture=false

From which we can follow back to the offending path:

Microsoft/msbuild/src/Tasks/Microsoft.Common.CurrentVersion.targets - SplitResourceByCulture ->
Microsoft.Build.Tasks.AssignCulture.Execute ->
Microsoft.Build.Tasks.Culture.GetItemCultureInfo ->
Microsoft.Build.Tasks.CultureInfoCache.IsValidCultureString ->
Microsoft.Build.Shared.AssemblyUtilities.GetAllCultures ->
CultureInfo.GetCultures ->
CultureData.GetCultures ->
CultureData(Unix).EnumCultures ->
System.Globalization.Native/locale.cpp:GlobalizationNative_GetLocales
https://github.com/dotnet/coreclr/blob/8ba838fb54d6c07271d026b2d77bedcb9e2a786a/src/corefx/System.Globalization.Native/locale.cpp#L162-L171

ICU does not return aliases when getting a list of locales -- whether with uloc_getAvailable or Locale::getAvailableLocales (and uloc_countAvailable does not include them in its count).

That ICU does not return the aliases in this manner appears to be intentional, both based on the numerous references to a lack of alias mapping in the uloc documentation, and the following bug:

https://unicode-org.atlassian.net/browse/ICU-4309

uloc_getAvailable returns sr_YU, even though it is an %%ALIAS locale. None of the other %%ALIAS locales are returned.

TracBot made changes - 01/Jul/18 1:59 PM
Resolution Fixed [ 10004 ]
Status Done [ 10002 ] Done [ 10002 ]

ICU-4309 was fixed via: https://github.com/unicode-org/icu/commit/ab68bb319601bc467784dcbdcc6d52131a2863d2
Which seems to further indicate that ICU not returning aliases when calling uloc_getAvailable is intentional.

In-Depth Analysis

A full analysis can be seen in the test repo README: CodingDinosaur/CultureIssueDemonstration

Test Repos

I have two test repos that help demonstrate this issue:

CultureIssueDemonstration

  • https://github.com/CodingDinosaur/CultureIssueDemonstration
  • Demonstrates the symptoms described at both build-time and run-time
  • Running the provided test scripts will allow for running the test code under your current platform, or on Linux via Docker. Thus is is recommended to run under Windows with Docker installed to compare the results under both Windows and Linux.
  • Contains a README file that goes into more detail on the issue and the apparent cause

CultureIcuTest

_Copied from original issue: dotnet/coreclr#20388_

Most helpful comment

Is there any update to this issue, or at least a workaround?

My understanding is, that it's currently not possible to have an ASP.NET Core application localized with zh- cultures on Linux, which seems like a pretty common use case.

All 10 comments

Thanks @CodingDinosaur for reporting the issue and listing the details. this is very helpful. we'll take a look.

@cdmihai it is wrong to have msbuild depends on only the list returned from CultureInfo.GetCultures. this was ok before Windows 10 and Linux support but now it is not valid approach.

https://github.com/Microsoft/msbuild/blob/e70a3159d64f9ed6ec3b60253ef863fa883a99b1/src/Shared/AssemblyUtilities.cs#L105

for example, in Windows 10, if you try to create any culture which Windows doesn't have data for, the operation still succeed and the culture can be created as long as the culture name conform to BCP-47 spec.
I understand may be msbuild doing that for perf reason which can be kept but will need to add extra case when failing finding any culture in the list, try to call CultureInfo.GetCultureInfo and find out if can create the culture.

We'll try to look how we can enhance the support for aliased culture as this issue suggested but whatever we do here, msbuild will need to do something more. do you want me open a new issue in msbuild to track that?

_From @cdmihai on October 15, 2018 18:23_

@tarekgh would this be a suitable issue? https://github.com/Microsoft/msbuild/issues/1454

Regarding removing valid locale checks, the biggest issue with this is that it would be a breaking change for MSBuild. The msbuild repo build logic itself has the assumption that non-existing locales are rejected, so a lot of strings are put in Microsoft.shared.resx. If we remove the locale check in the SplitResourceByCulture, then the repo fails building with some invalid locale error (fuzzy memory from ~2 years ago). I have no data on this, but this could also break a lot of other existing repos.

FYI @rainersigwald

Regarding removing valid locale checks, the biggest issue with this is that it would be a breaking change for MSBuild. The msbuild repo build logic itself has the assumption that non-existing locales are rejected, so a lot of strings are put in Microsoft.shared.resx. If we remove the locale check in the SplitResourceByCulture, then the repo fails building with some invalid locale error (fuzzy memory from ~2 years ago). I have no data on this, but this could also break a lot of other existing repos.

I am not sure I understand the breaking scenario here. The scenario you are describing can occur today. for example build some project with some culture introduced in Windows 10 and then run on down-level platform don't have this culture.

_From @cdmihai on October 15, 2018 20:40_

True, but that's the class of valid locales introduced in different windows versions. The new breaking scenario is for the class of always invalid locales, that were never meant to be locales, and users expect msbuild to not treat them as locales.

The new breaking scenario is for the class of always invalid locales, that were never meant to be locales, and users expect msbuild to not treat them as locales.

I am not sure I agree with that. If the OS/.Net can create a culture for these, then those should be a valid locales to use. Why you think msbuild should reject such cultures?

_From @cdmihai on October 15, 2018 23:16_

Personally I agree that msbuild should not care and just do what the OS does, but I fear there are actual customers who depend on this behaviour, and changing this might break them. But this is just gut feeling based on the fact that the msbuild repo itself is doing it, and I don't have actual data on it. Alternatively we can only enable it in .net core msbuild, and then customers will have to opt-in to the break by transitioning to .net core. But it's not nice to diverge behaviour.

The only breaking scenario I can think of is when we allow creating resources with a culture not returned by CultureInfo.GetCultures and then move this resources to other machine which cannot understand the used culture. This scenario can happen today anyway. do you have any breaking scenario you can think of?

Microsoft/msbuild#1454 is specific to custom cultures. I would suggest updating it to include the other supported system cultures which is not enumerated by CultureInfo.GetCultures

After looking at this issue, it looks ICU not enumerating the aliased cultures for good reasons. I believe the framework should follow that too and not enumerate such aliased cultures. The framework still can create such aliased culture if anyone want to use them. e.g.:

C# new CultureInfo("zh-TW");

will work fine.

Considering that, I believe the resource issue should be fixed from msbuild side.
msbuild should not depend on the enumerated list only not because of aliased cultures only but also for supporting the behavior of Windows 10 which can create any culture as long as the used name is conforming to the BCP-47 specs.

I am going to move this issue to msbuild repo.

@CodingDinosaur thanks again for reporting this issue.

Is there any update to this issue, or at least a workaround?

My understanding is, that it's currently not possible to have an ASP.NET Core application localized with zh- cultures on Linux, which seems like a pretty common use case.

Was this page helpful?
0 / 5 - 0 ratings