Three.js: Unicode character split issue in createPaths() (src/extras/core/Font.js)

Created on 4 May 2018 · 10 Comments · Source: mrdoob/three.js

Description of the problem

When creating paths for ordinary text, the current function createPaths( text, size, divisions, data ) works fine, but it fails for characters whose code point is above U+FFFF because of the following split call:
var chars = String( text ).split( '' );
For example, the character "𝄞" (U+1D11E) is split into two parts, so it cannot be looked up correctly in the font's glyphs.
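A minimal sketch reproducing the problem. It assumes glyphs are looked up in a plain object keyed by character; the `glyphs` object and the `lookupGlyphs` helper are hypothetical stand-ins, not the actual Font.js data structures.

```javascript
// Hypothetical glyph table keyed by character, standing in for the font data.
const glyphs = { 'A': 'glyph-A', '\u{1D11E}': 'glyph-G-clef' };

// Look up every "character" produced by a given splitting strategy;
// missing entries come back as undefined.
function lookupGlyphs(chars) {
  return chars.map((ch) => glyphs[ch]);
}

const text = 'A\u{1D11E}'; // "A" followed by U+1D11E (musical symbol G clef)

// split('') works on UTF-16 code units, so U+1D11E becomes two lone surrogates.
const byCodeUnit = lookupGlyphs(text.split(''));
// byCodeUnit: ['glyph-A', undefined, undefined]

// Array.from() uses the string iterator, which yields whole code points.
const byCodePoint = lookupGlyphs(Array.from(text));
// byCodePoint: ['glyph-A', 'glyph-G-clef']
```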

I guess few people have requirements like mine, so the impact is probably close to zero. Still, it would be better to handle these scenarios properly. 😃

Thanks.

Three.js version
  • [ ] Dev
  • [x] r92
  • [ ] ...
Browser
  • [ ] All of them
  • [x] Chrome
  • [ ] Firefox
  • [ ] Internet Explorer
OS
  • [ ] All of them
  • [ ] Windows
  • [x] macOS
  • [ ] Linux
  • [ ] Android
  • [ ] iOS
Hardware Requirements (graphics card, VR Device, ...)
Bug

All 10 comments

I think Array.from() should solve the problem: https://jsfiddle.net/5LLchg9q/

Array.from() is not supported in IE 11 and other older browsers, but we could still use split() as a fallback. Or we could just add a polyfill.
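Besides split() or a full polyfill, another option for older browsers is to pair surrogates by hand. The sketch below is only an illustration, not the polyfill mentioned above; the name `splitCodePoints` is hypothetical. It walks the UTF-16 code units with `charCodeAt`, keeps a high surrogate together with a following low surrogate, and passes lone surrogates through as single elements. It uses only ES5 features.

```javascript
// Split a string into code points using only ES5 features, as a possible
// fallback where Array.from() is unavailable (e.g. IE 11).
function splitCodePoints(text) {
  var str = String(text);
  var result = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    // High surrogate (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF):
    // keep both code units together as one element.
    if (code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        result.push(str.charAt(i) + str.charAt(i + 1));
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    // Ordinary BMP character or lone surrogate: one element.
    result.push(str.charAt(i));
  }
  return result;
}
```

On well-formed text this should give the same array as Array.from(), so it could serve as the fallback branch instead of split('').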

Yes, both Array.from(text) and [...text] work.
I think using split() as a fallback is reasonable.
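Both forms rely on the same built-in string iterator, which steps through a string by code point. A small check that `Array.from(text)`, `[...text]` and a `for...of` loop agree, while `split('')` does not, for a string mixing BMP and supplementary characters:

```javascript
const text = 'a\u{1D11E}b\u{1F603}'; // "a", G clef, "b", smiling face emoji

const viaArrayFrom = Array.from(text);
const viaSpread = [...text];

const viaForOf = [];
for (const ch of text) {
  viaForOf.push(ch); // each iteration yields one whole code point
}

const viaSplit = text.split(''); // UTF-16 code units

// Code-point based approaches give 4 elements; split('') gives 6 code units.
const lengths = [viaArrayFrom.length, viaSpread.length, viaForOf.length, viaSplit.length];
// lengths: [4, 4, 4, 6]
```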

So instead of

var chars = String( text ).split( '' );

let's do this:

var chars = Array.from ? Array.from( text ) : String( text ).split( '' ); // see #13988
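The proposed line can be wrapped in a small helper to see how it behaves; the name `getChars` is hypothetical. Note that `Array.from( text )` is applied to `text` directly, so a non-string value such as a number would not be converted (Array.from(42) returns an empty array). This sketch calls `String( text )` before either branch, as an assumption about the intended behaviour.

```javascript
// Split text into characters by code point where Array.from() exists,
// falling back to UTF-16 code units (the previous behaviour) otherwise.
function getChars(text) {
  var str = String(text); // coerce first, as the original split() line did
  return Array.from ? Array.from(str) : str.split('');
}
```

In browsers without Array.from() the fallback branch still splits characters above U+FFFF into two code units, so those characters remain unsupported there.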

Would you like to do a PR with the change?

Hi @Mugen87, I've created PR #13998. Please kindly review it. Thanks.

https://jsfiddle.net/5LLchg9q/1/

Why not use
String.prototype.match.call(string, /[\uD800-\uDBFF][\uDC00-\uDFFF]?|[^\uD800-\uDFFF]|./g)
instead of
String( text ).split( '' )
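Applied to a string, gero3's suggestion returns an array of matches in which each surrogate pair is kept together. A sketch (the helper name `matchChars` is hypothetical). One caveat: `String.prototype.match` returns `null` rather than an empty array when nothing matches, for example for an empty string, so a guard is needed.

```javascript
// gero3's pattern: a high surrogate optionally followed by a low surrogate,
// or any non-surrogate code unit, or (via ".") any remaining code unit
// other than a line terminator.
var CHAR_PATTERN = /[\uD800-\uDBFF][\uDC00-\uDFFF]?|[^\uD800-\uDFFF]|./g;

function matchChars(text) {
  // match() returns null when there are no matches (e.g. for ''),
  // so fall back to an empty array.
  return String.prototype.match.call(String(text), CHAR_PATTERN) || [];
}
```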

@gero3 Can you please explain the regex a bit? 😇

@Mugen87,
For code points in the range U+10000 to U+10FFFF, I believe JS strings encode them in UTF-16 (sometimes called extended UCS-2) as a surrogate pair: a high surrogate in the range \uD800-\uDBFF followed by a low surrogate in the range \uDC00-\uDFFF.
For example, U+10001 is stored as \uD800\uDC01.
high surrogate = Math.floor((code point - 0x10000) / 0x400) + 0xD800
low surrogate = (code point - 0x10000) % 0x400 + 0xDC00
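The two formulas can be checked against the engine's own encoding. A sketch (the function name `toSurrogatePair` is hypothetical) that computes the pair for a supplementary code point and compares the result with `String.fromCodePoint`:

```javascript
// Compute the UTF-16 surrogate pair for a code point in U+10000..U+10FFFF.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('not a supplementary code point: ' + codePoint);
  }
  var offset = codePoint - 0x10000; // 20-bit value
  var high = Math.floor(offset / 0x400) + 0xD800; // top 10 bits
  var low = (offset % 0x400) + 0xDC00; // bottom 10 bits
  return [high, low];
}

// U+10001 -> [0xD800, 0xDC01]; U+1D11E (G clef) -> [0xD834, 0xDD1E]
var pair = toSurrogatePair(0x1D11E);
var matchesEngine =
  String.fromCharCode(pair[0], pair[1]) === String.fromCodePoint(0x1D11E);
```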

So I guess @gero3 intends to match all those pairs with [\uD800-\uDBFF][\uDC00-\uDFFF], and to exclude invalid characters consisting of a single lone surrogate with [^\uD800-\uDFFF]? (Although a lone surrogate should not occur in the first place.)

If so, I think we should use one of these regexes:
/[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\uD800-\uDFFF]/g
or
/[\0-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]/g
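The difference between gero3's pattern and the two alternatives above only shows up for malformed input containing lone surrogates. A sketch comparing them (the variable names and the `tokens` helper are illustrative): gero3's pattern keeps each lone surrogate as its own element, while both alternatives skip it. On well-formed text all three give the same result as `Array.from()`.

```javascript
var GERO3 = /[\uD800-\uDBFF][\uDC00-\uDFFF]?|[^\uD800-\uDFFF]|./g;
var PROPOSED_A = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\uD800-\uDFFF]/g;
var PROPOSED_B = /[\0-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]/g;

// Return all matches as an array ([] instead of null when nothing matches).
function tokens(text, pattern) {
  return text.match(pattern) || [];
}

var wellFormed = 'a\u{1D11E}b';
var loneSurrogates = 'a\uD834b\uDD1E'; // lone high, then lone low surrogate

// Well-formed text: every pattern yields ['a', '\u{1D11E}', 'b'].
// Lone surrogates: GERO3 yields ['a', '\uD834', 'b', '\uDD1E'],
// while PROPOSED_A and PROPOSED_B yield ['a', 'b'] (surrogates are skipped).
```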

@gero3, please correct me if I'm wrong. Thanks.

https://jsfiddle.net/vj49vuxb/2/

@mooncaker816 If you are happy with the current implementation in your PR, I would prefer to keep it that way. It's simply easier to read than a new regex.

The regex looks like a condensed version of https://github.com/dotcypress/runes

The examples in the README file seem to work with Array.from().
