Hello @nodejs,
I stumbled upon what seems to be a bug in node.js v7.1.0.
In node v6.9.1 & v7.0.0, we have the following:
'su莽ais'.toUpperCase()
>>> 'SU脟AIS'
while in node v7.1.0
'su莽ais'.toUpperCase()
>>> 'SU莽AIS'
I noticed this because it made a CI build fail (ref) and broke some of the phonetic algorithms of the Talisman library.
Sorry if this is normal and I did not read the changelogs carefully enough.
Have a good day.
Does it work when you install the full-icu package? V8 delegates more to ICU in v7.x and the release binaries only include en_US because of size constraints.
Hello @bnoordhuis, I tried the following:
git clone [email protected]:Yomguithereal/talisman.git
cd talisman
git checkout 003d44a3aaa81155331aaa23bc7708fa489428d6
npm install
npm install full-icu
npm test
I still have the same problem. Did I do it wrong?
I could replicate this bug on Linux Mint :
$uname -a
Linux ### 3.19.0-32-generic #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
cc @nodejs/intl
I tried to build nodejs up to the 0f871e1 commit and uppercasing works as expected. This seems to confirm that the introduction of small-icu in the 3d1766f492 commit is the source of the problem, as @bnoordhuis said.
confirmed here too
[hf@archT440 ~]$ node -v
v7.2.0
[hf@archT440 ~]$ node
> 'a莽o'.toUpperCase()
'A莽O'
>
It's not that commit - it's ICU 58.1. Applying ICU 57 on top of current HEAD also gives the right result. Or rather, some interaction between node/v8 and ICU 58. Because ICU's own toUpper works correctly for v58.1 here.
Also configure --without-intl produces 脟
The latest v8 (ToT) does not use el-Upper transliterator any more and just uses ICU's uppercase API. I'll test with ToT v8.
Seems to be something wrong in v8's optimizations under Runtime_StringToUpperCaseI18N. Commenting out the optimizations produces the correct result. Thanks @jungshik - I may float a patch that just disables these lines for now, to be replaced when v8 is refreshed.
a couple possible workarounds
configure --without-intlnode --no_icu_case_mappingI can reproduce the bug with the latest ToT (without reading the bug much, I thought it's about Greek) when --icu_case_mapping flag is on. So, there's obviously a bug. Thank you for the report.
@jungshik It's Bug synergy!!
There was a typo in ToLatin1Upper() ( E7 should be F7) in runtime-i18n.cc
Thanks @srl295 and @jungshik. Tell me if I can do anything to help.
It's fixed in v8 ToT: https://bugs.chromium.org/p/v8/issues/detail?id=5681&desc=2
Thanks again for the report.
Most helpful comment
It's fixed in v8 ToT: https://bugs.chromium.org/p/v8/issues/detail?id=5681&desc=2
Thanks again for the report.