Node: The letter '莽' is no longer upper-cased by .toUpperCase from 7.1.0

Created on 24 Nov 2016  路  16Comments  路  Source: nodejs/node

  • Version: 7.1.0
  • Platform: OSX

Hello @nodejs,

I stumbled upon what seems to be a bug in node.js v7.1.0.

In node v6.9.1 & v7.0.0, we have the following:

'su莽ais'.toUpperCase()
>>> 'SU脟AIS'

while in node v7.1.0

'su莽ais'.toUpperCase()
>>> 'SU莽AIS'

I noticed this because it made a CI build fail (ref) and broke some of the phonetic algorithms of the Talisman library.

Sorry if this is normal and I did not read the changelogs carefully enough.

Have a good day.

i18n-api question

Most helpful comment

It's fixed in v8 ToT: https://bugs.chromium.org/p/v8/issues/detail?id=5681&desc=2
Thanks again for the report.

All 16 comments

Does it work when you install the full-icu package? V8 delegates more to ICU in v7.x and the release binaries only include en_US because of size constraints.

Hello @bnoordhuis, I tried the following:

git clone [email protected]:Yomguithereal/talisman.git
cd talisman
git checkout 003d44a3aaa81155331aaa23bc7708fa489428d6
npm install
npm install full-icu
npm test

I still have the same problem. Did I do it wrong?

I could replicate this bug on Linux Mint :

$uname -a
Linux ### 3.19.0-32-generic #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

cc @nodejs/intl

I tried to build nodejs up to the 0f871e1 commit and uppercasing works as expected. This seems to confirm that the introduction of small-icu in the 3d1766f492 commit is the source of the problem, as @bnoordhuis said.

confirmed here too

[hf@archT440 ~]$ node -v
v7.2.0
[hf@archT440 ~]$ node
> 'a莽o'.toUpperCase()
'A莽O'
> 

It's not that commit - it's ICU 58.1. Applying ICU 57 on top of current HEAD also gives the right result. Or rather, some interaction between node/v8 and ICU 58. Because ICU's own toUpper works correctly for v58.1 here.

Also configure --without-intl produces

The latest v8 (ToT) does not use el-Upper transliterator any more and just uses ICU's uppercase API. I'll test with ToT v8.

Seems to be something wrong in v8's optimizations under Runtime_StringToUpperCaseI18N. Commenting out the optimizations produces the correct result. Thanks @jungshik - I may float a patch that just disables these lines for now, to be replaced when v8 is refreshed.

a couple possible workarounds

  • compile: configure --without-intl
  • runtime: node --no_icu_case_mapping

I can reproduce the bug with the latest ToT (without reading the bug much, I thought it's about Greek) when --icu_case_mapping flag is on. So, there's obviously a bug. Thank you for the report.

@jungshik It's Bug synergy!!

There was a typo in ToLatin1Upper() ( E7 should be F7) in runtime-i18n.cc

Thanks @srl295 and @jungshik. Tell me if I can do anything to help.

It's fixed in v8 ToT: https://bugs.chromium.org/p/v8/issues/detail?id=5681&desc=2
Thanks again for the report.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

speakeasypuncture picture speakeasypuncture  路  152Comments

silverwind picture silverwind  路  113Comments

benjamingr picture benjamingr  路  135Comments

VanCoding picture VanCoding  路  204Comments

yury-s picture yury-s  路  89Comments