Njs: String.prototype.split() fails to split a Unicode string correctly

Created on 14 Feb 2019  ·  3 Comments  ·  Source: nginx/njs

Hi!

>> '\u0431\u0435,\u043b\u0438,\u0431\u0435\u0440,\u0434\u0430'.split(',')
[
 'бе',
 'ли',
 'бер',
 'да'
]
>> '\u0431\u0435,\u043b\u0438,\u0431\u0435\u0440,\u0434\u0430'.split('')
[
 '�',
 '�',
 '�',
 '�',
 ',',
 '�',
 '�',
 '�',
 '�',
 ',',
 '�',
 '�',
 '�',
 '�',
 '�',
 '�',
 ',',
 '�',
 '�',
 '�',
 '�'
]
>> 
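
The output above is consistent with the string being cut at byte boundaries of a UTF-8 encoding: each two-letter Cyrillic group produces four replacement characters, and the three-letter group produces six. That reading is an assumption drawn from the output, not something the thread confirms. A minimal sketch that reproduces the same effect in a standard engine such as Node.js (the variable names are illustrative only):

```javascript
// Assumption: the broken result comes from splitting a UTF-8 buffer per byte.
// 'бе' is two code points, each encoded as two bytes in UTF-8.
const bytes = new TextEncoder().encode('\u0431\u0435');

// Decode every byte on its own. A lone lead or continuation byte is not
// valid UTF-8, so the decoder emits U+FFFD (the replacement character).
const decoder = new TextDecoder('utf-8');
const pieces = Array.from(bytes, (b) => decoder.decode(Uint8Array.of(b)));

console.log(bytes.length); // 4
console.log(pieces);       // four U+FFFD replacement characters
```

Splitting on ',' is unaffected in this model because the comma is a single byte and never falls inside a multi-byte sequence.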

bug

All 3 comments

Since our Unicode strings are not UTF-16 strings, I think we should not break surrogate pairs here, even though splitting by UTF-16 code units, as the spec requires, would break them.
More on this:
https://stackoverflow.com/questions/4547609/how-do-you-get-a-string-to-a-character-array-in-javascript/34717402#34717402
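
For comparison, here is a small sketch of what the comment refers to, as seen in an engine that stores strings as UTF-16 (for example Node.js). It is not njs code and says nothing about how njs implements split(); it only shows the difference between splitting by code unit and iterating by code point.

```javascript
// U+1F44D is one code point but two UTF-16 code units (a surrogate pair).
const s = '\u{1F44D}';

// split('') cuts between code units, so the pair is separated
// into two lone surrogates.
const byCodeUnit = s.split('');

// The string iterator walks code points, so the pair stays together.
const byCodePoint = Array.from(s);

console.log(byCodeUnit.length);  // 2
console.log(byCodePoint.length); // 1
```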

@drsm Thank you for the report.

Please try the patch below:
https://gist.github.com/xeioex/35d9cc06fb9559ca32ce1e085c7f2d92


>> 'αβγ'.split('')
[
 'α',
 'β',
 'γ'
]

>> '囲碁織'.split('')
[
 '囲',
 '碁',
 '織'
]

>> '𝟘𝟙𝟚𝟛'.split('')
[
 '𝟘',
 '𝟙',
 '𝟚',
 '𝟛'
]

>> 'яαяαяα'.split('α')
[
 'я',
 'я',
 'я',
 ''
]

@xeioex
The patch works fine for me, thanks!

>> 'фыва asdf 👍'.split('')
[
 'ф',
 'ы',
 'в',
 'а',
 ' ',
 'a',
 's',
 'd',
 'f',
 ' ',
 '👍'
]