Node: Intl.NumberFormat changed behavior (possibly after ICU 58 -> 59 bump)

Created on 6 Sep 2017  Â·  22Comments  Â·  Source: nodejs/node

  • Version:
    v8.4.0
  • Platform:
    Linux 8a6b069d5b19 4.10.0-22-generic #24~16.04.1-Ubuntu SMP Tue May 23 17:03:51 UTC 2017 x86_64 GNU/Linux
  • Subsystem:
    Ubuntu

We're using Jest on Travis for testing and discovered that after Node upgraded from v7.9.0 to v8.4.0 some tests failed. That's okay I thought, but. Finally, I found this odd behavior:

$ node -v
v7.9.0
$ node
> Intl.NumberFormat('en', {style: 'currency', currency: 'USD', minimumFractionDigits: 0}).format(9);            
'$9'
> Intl.NumberFormat('ru', {style: 'currency', currency: 'USD', minimumFractionDigits: 0}).format(9);            
'$9' 

----

$ node -v
v8.4.0
$ node
> Intl.NumberFormat('en', {style: 'currency', currency: 'USD', minimumFractionDigits: 0}).format(9);            
'$9'
> Intl.NumberFormat('ru', {style: 'currency', currency: 'USD', minimumFractionDigits: 0}).format(9);            
'US$ 9'  

Maybe I missed some breaking changes, so I'll appreciate any help with that.
Otherwise, I think it's a regression and we should reflect on the proper solution.

V8 Engine confirmed-bug i18n-api

All 22 comments

Just wondering, do you have full-icu installed?

Nope! Only the small-icu that we have already packed.

Also, this bug is not reproducible on Mac by default.
On homebrew node is installed with system-icu option (proof).

Further investigation showed that the fallback logic for unknown locale changed:

$ node -v
v7.9.0
$ node
> Intl.NumberFormat('xx', {style: 'currency', currency: 'USD', minimumFractionDigits: 0}).format(9);            
'$9'
> Intl.NumberFormat(['xx', 'en'], {style: 'currency', currency: 'USD', minimumFractionDigits: 0}).format(9);            
'$9' 
> Intl.NumberFormat('xx', {style: 'currency', currency: 'USD', currencyDisplay: 'symbol'}).resolvedOptions()    
{ locale: 'en-US',                                      
  numberingSystem: 'latn',                              
  style: 'currency',                                    
  useGrouping: true,                                    
  minimumIntegerDigits: 1,                              
  minimumFractionDigits: 2,                             
  maximumFractionDigits: 2,                             
  currency: 'USD',                                      
  currencyDisplay: 'symbol' }

----

$ node -v
v8.4.0
$ node
> Intl.NumberFormat('xx', {style: 'currency', currency: 'USD', minimumFractionDigits: 0}).format(9);            
'US$ 9'
> Intl.NumberFormat(['xx', 'en'], {style: 'currency', currency: 'USD', minimumFractionDigits: 0}).format(9);            
'$9' 
> Intl.NumberFormat('xx', {style: 'currency', currency: 'USD', currencyDisplay: 'symbol'}).resolvedOptions()    
{ locale: 'und',            
  numberingSystem: 'latn',  
  style: 'currency',        
  useGrouping: true,        
  minimumIntegerDigits: 1,  
  minimumFractionDigits: 2, 
  maximumFractionDigits: 2, 
  currency: 'USD',          
  currencyDisplay: 'symbol' } 

And we don't have any test for that. We just verify that in case of unknown locale we return defaults here.

Interestingly, %GetDefaultICULocale() returns en-US so that test would fail on Node.js v8.x...

@nodejs/intl Please weigh in.

So it looks like the test case is hitting this piece of code in V8...

https://github.com/nodejs/node/blob/56a0577b2ebf111346779cf790a1f54e0e464078/deps/v8/src/objects/intl-objects.cc#L456-L459

the comment makes it look like there is a legitimate bug somewhere. Digging deeper...

v8/v8@2b5a36d571625f82a19b2ad7aabc249ff4b0a3f4 from July might or might not help with this. It's not in our tree yet.

@bnoordhuis Unfortunately that commit does not fix this. I think I got to the bottom of this, however. The key lies in the following lines:

https://github.com/nodejs/node/blob/56a0577b2ebf111346779cf790a1f54e0e464078/deps/v8/src/objects/intl-objects.cc#L707-L720

The docs for uloc_forLanguageTag says:

int32_t uloc_forLanguageTag(const char* langtag,
                            char* localeID,
                            int32_t localeIDCapacity,
                            int32_t* parsedLength,
                            UErrorCode* err);

parsedLength: if not NULL, successfully parsed length for the input language tag is set

Returns: the length of the locale ID.

So the returned value should actually be used for the icu_length parameter and checked, instead of parsedLength.

For the test case in the OP, bcp47_locale contains "und" as normalized by other layers, so icu_length would always be 3. This piece of code would then read icu_result which is uninitializd. This might explain a bit of non-deterministic behaviors between different ICU versions and different machines (e.g. why @bnoordhuis's patch fixes the tests for him).

If that were fixed, we should also use the default ICU locale instead of returning NULL, to truly fix the tests.

All of this is included in the following patch, which fixes this bug and also factors out the common BCP 47 conversion code for me:

From 588bb87e12c9867eca7141b93fff4329a38e5650 Mon Sep 17 00:00:00 2001
From: Timothy Gu <[email protected]>
Date: Thu, 7 Sep 2017 22:08:41 +0800
Subject: [PATCH] Fix BCP47-to-ICU conversion

---
 deps/v8/src/objects/intl-objects.cc | 77 ++++++++++---------------------------
 1 file changed, 21 insertions(+), 56 deletions(-)

diff --git a/deps/v8/src/objects/intl-objects.cc b/deps/v8/src/objects/intl-objects.cc
index fd6546b390..7a56821a19 100644
--- a/deps/v8/src/objects/intl-objects.cc
+++ b/deps/v8/src/objects/intl-objects.cc
@@ -698,26 +698,30 @@ void SetResolvedBreakIteratorSettings(Isolate* isolate,
         .Assert();
   }
 }
+
+// Convert BCP47 into ICU locale format.
+icu::Locale ConvertFromBCP47(Handle<String> locale) {
+  v8::String::Utf8Value bcp47_locale(v8::Utils::ToLocal(locale));
+  if (bcp47_locale.length() != 0) {
+    UErrorCode status = U_ZERO_ERROR;
+    char icu_result[ULOC_FULLNAME_CAPACITY];
+    int icu_length = uloc_forLanguageTag(*bcp47_locale, icu_result,
+                                         ULOC_FULLNAME_CAPACITY, nullptr,
+                                         &status);
+    if (!U_FAILURE(status) && icu_length != 0) {
+      return icu::Locale(icu_result);
+    }
+  }
+  return icu::Locale();
+}
+
 }  // namespace

 // static
 icu::SimpleDateFormat* DateFormat::InitializeDateTimeFormat(
     Isolate* isolate, Handle<String> locale, Handle<JSObject> options,
     Handle<JSObject> resolved) {
-  // Convert BCP47 into ICU locale format.
-  UErrorCode status = U_ZERO_ERROR;
-  icu::Locale icu_locale;
-  char icu_result[ULOC_FULLNAME_CAPACITY];
-  int icu_length = 0;
-  v8::String::Utf8Value bcp47_locale(v8::Utils::ToLocal(locale));
-  if (bcp47_locale.length() != 0) {
-    uloc_forLanguageTag(*bcp47_locale, icu_result, ULOC_FULLNAME_CAPACITY,
-                        &icu_length, &status);
-    if (U_FAILURE(status) || icu_length == 0) {
-      return NULL;
-    }
-    icu_locale = icu::Locale(icu_result);
-  }
+  icu::Locale icu_locale = ConvertFromBCP47(locale);

   icu::SimpleDateFormat* date_format =
       CreateICUDateFormat(isolate, icu_locale, options);
@@ -753,20 +757,7 @@ void DateFormat::DeleteDateFormat(const v8::WeakCallbackInfo<void>& data) {
 icu::DecimalFormat* NumberFormat::InitializeNumberFormat(
     Isolate* isolate, Handle<String> locale, Handle<JSObject> options,
     Handle<JSObject> resolved) {
-  // Convert BCP47 into ICU locale format.
-  UErrorCode status = U_ZERO_ERROR;
-  icu::Locale icu_locale;
-  char icu_result[ULOC_FULLNAME_CAPACITY];
-  int icu_length = 0;
-  v8::String::Utf8Value bcp47_locale(v8::Utils::ToLocal(locale));
-  if (bcp47_locale.length() != 0) {
-    uloc_forLanguageTag(*bcp47_locale, icu_result, ULOC_FULLNAME_CAPACITY,
-                        &icu_length, &status);
-    if (U_FAILURE(status) || icu_length == 0) {
-      return NULL;
-    }
-    icu_locale = icu::Locale(icu_result);
-  }
+  icu::Locale icu_locale = ConvertFromBCP47(locale);

   icu::DecimalFormat* number_format =
       CreateICUNumberFormat(isolate, icu_locale, options);
@@ -804,20 +795,7 @@ icu::Collator* Collator::InitializeCollator(Isolate* isolate,
                                             Handle<String> locale,
                                             Handle<JSObject> options,
                                             Handle<JSObject> resolved) {
-  // Convert BCP47 into ICU locale format.
-  UErrorCode status = U_ZERO_ERROR;
-  icu::Locale icu_locale;
-  char icu_result[ULOC_FULLNAME_CAPACITY];
-  int icu_length = 0;
-  v8::String::Utf8Value bcp47_locale(v8::Utils::ToLocal(locale));
-  if (bcp47_locale.length() != 0) {
-    uloc_forLanguageTag(*bcp47_locale, icu_result, ULOC_FULLNAME_CAPACITY,
-                        &icu_length, &status);
-    if (U_FAILURE(status) || icu_length == 0) {
-      return NULL;
-    }
-    icu_locale = icu::Locale(icu_result);
-  }
+  icu::Locale icu_locale = ConvertFromBCP47(locale);

   icu::Collator* collator = CreateICUCollator(isolate, icu_locale, options);
   if (!collator) {
@@ -852,20 +830,7 @@ void Collator::DeleteCollator(const v8::WeakCallbackInfo<void>& data) {
 icu::BreakIterator* V8BreakIterator::InitializeBreakIterator(
     Isolate* isolate, Handle<String> locale, Handle<JSObject> options,
     Handle<JSObject> resolved) {
-  // Convert BCP47 into ICU locale format.
-  UErrorCode status = U_ZERO_ERROR;
-  icu::Locale icu_locale;
-  char icu_result[ULOC_FULLNAME_CAPACITY];
-  int icu_length = 0;
-  v8::String::Utf8Value bcp47_locale(v8::Utils::ToLocal(locale));
-  if (bcp47_locale.length() != 0) {
-    uloc_forLanguageTag(*bcp47_locale, icu_result, ULOC_FULLNAME_CAPACITY,
-                        &icu_length, &status);
-    if (U_FAILURE(status) || icu_length == 0) {
-      return NULL;
-    }
-    icu_locale = icu::Locale(icu_result);
-  }
+  icu::Locale icu_locale = ConvertFromBCP47(locale);

   icu::BreakIterator* break_iterator =
       CreateICUBreakIterator(isolate, icu_locale, options);
-- 
2.11.0

Does this patch look okay to be submitted to V8?

/cc @nodejs/v8 @addaleax

@sclinede I don't think this is a bug. The display for USD changed to US$ because there are a lot of currencies which use the dollarsign in the world.

@srl295 I think the problem lies in the fact that V8 isn't getting the default ICU locale, not that ICU changed its behaviors. The default locale did not change between 58 and 59, nor did the from/toLanguageTag functions.

@TimothyGu ok - read through the code now (and the change I mentioned above is an old one, so nevermind that).

So:

If that were fixed, we should also use the default ICU locale instead of returning NULL, to truly fix the tests.

If there are other errors besides parse errors here, it might want to propagate those.

@srl295 as for me the problem is not about actual sign for the dollar. I understand that there are different kinds of dollars exist. The problem is that logic changed _silently_. So my thoughts were it should be mentioned in upgrade notes or there should be no diff to previous version of Node.

@sclinede Again the difference of behavior is a bug in V8. We are working to resolve it.

@srl295

If there are other errors besides parse errors here, it might want to propagate those.

Good point.

@TimothyGu thanks for the clarification

@timothygu I can’t say much about the patch’s content beyond that I don’t see anything wrong with it, and it seems perfectly fine to be submitted to me – let me know if you need anything or run into any trouble!

So this bug actually goes deeper than https://github.com/nodejs/node/issues/15223#issuecomment-327814621, but about how V8 checks if a system default is supported fundamentally. I've filed a CL at V8 (https://chromium-review.googlesource.com/c/v8/v8/+/668350) that would address those issues, while I will be submitting the patch in https://github.com/nodejs/node/issues/15223#issuecomment-327814621 separately.

Appears to affect Date.toLocaleDateString() also. As of 8.5.0 (possibly earlier) it returns an ISO date string; with 7.10 it returns a locale-formatted string like 7/17/2017.

@TimothyGu any chance you can provide a status update on this? Doesn't look like your patch landed in V8 yet?

@apapirovski it was reviewed and ready to get in but seems like the V8 team forgot about it :/

@TimothyGu so, is this issue fixed by https://chromium-review.googlesource.com/c/v8/v8/+/668350?

If so, we'll have to backport the patch to Node 8 and master (v10.x will get it from there)

@targos Yes I believe so. I'll try to find time to backport it, but if not it would be great if someone else could :)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

fanjunzhi picture fanjunzhi  Â·  3Comments

vsemozhetbyt picture vsemozhetbyt  Â·  3Comments

danialkhansari picture danialkhansari  Â·  3Comments

cong88 picture cong88  Â·  3Comments

seishun picture seishun  Â·  3Comments