I am currently working on a text layout for SkiaSharp and need to be able to configure the shaper to report UTF-16 indices instead of UTF-8 indices. The only way to do that is to add the text to the buffer as UTF-16 rather than UTF-8. The shaper should process the text using the current encoding; otherwise the clusters have the wrong indices.
I would expect that the shaper uses the current encoding.
UTF-8 is always used.
VS bug #742852
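To illustrate why the encoding matters for cluster indices, here is a minimal sketch in plain Python. It does not call HarfBuzz or SkiaSharp; the helper names `cluster_offsets` and `utf8_to_utf16_index` are hypothetical and exist only for this example. It assumes that a cluster index is the offset of a character in code units of the encoding that was added to the buffer.

```python
def cluster_offsets(text, encoding):
    """Return the starting code-unit offset of each code point in `text`."""
    unit_size = {"utf-8": 1, "utf-16": 2, "utf-32": 4}[encoding]
    # The "-le" codecs never emit a byte order mark, so lengths stay exact.
    codec = encoding if unit_size == 1 else encoding + "-le"
    offsets = []
    position = 0
    for ch in text:
        offsets.append(position)
        position += len(ch.encode(codec)) // unit_size
    return offsets


def utf8_to_utf16_index(text, byte_offset):
    """Convert a UTF-8 byte offset (on a character boundary) to a UTF-16 index."""
    prefix = text.encode("utf-8")[:byte_offset].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2


# 'a', e-acute, euro sign, an emoji outside the BMP, 'b'
sample = "a\u00e9\u20ac\U0001F600b"

utf8_offsets = cluster_offsets(sample, "utf-8")
utf16_offsets = cluster_offsets(sample, "utf-16")
utf32_offsets = cluster_offsets(sample, "utf-32")

print(utf8_offsets)   # [0, 1, 3, 6, 10]
print(utf16_offsets)  # [0, 1, 2, 3, 5]
print(utf32_offsets)  # [0, 1, 2, 3, 4]
```

A .NET string is indexed in UTF-16 code units, so the UTF-8 offsets above would point at the wrong characters as soon as the text contains anything outside ASCII. A helper like `utf8_to_utf16_index` could work around that after shaping, but letting the shaper consume UTF-16 directly avoids the conversion.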
This is a good thing. I will update harfbuzz and get support for utf8/16/32.
I will try to implement this myself in the next few days. In general, that means adding AddUtf16 and AddUtf32 to Buffer and changing SKShaper to use the current TextEncoding. GlyphId will not be a valid value. I will probably need more advanced features in the future, so it makes sense that I implement it myself.
Cool! Just an FYI, there is a SKTextEncoding and a SKEncoding.
I see Google uses both differently: SKTextEncoding is for taking a chunk of memory and writing it to the screen, while SKEncoding is for taking a chunk of text in memory and converting it to glyph data. They were using SKTextEncoding for both in the past, but they split it.
But, looking at HarfBuzz, they have hb_buffer_add_codepoints, which takes the glyph IDs if I am not mistaken... I am not sure how that works off the top of my head, but you can try it out and it may be what you need.
Code points are not glyph IDs. Each character is mapped to one code point, but one code point can be mapped to several glyphs. The code point of the character a could be mapped to glyph ID 1 in font A, but in font B it is mapped to 2. The code point should be the same.
Still learning this stuff so I could be mistaken. This is quite a complex topic.
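The distinction can be sketched in a few lines of Python. The two lookup tables below are hypothetical stand-ins for a font's character-to-glyph mapping, and the glyph IDs are invented for illustration (glyph 0 is used for missing characters, following the usual .notdef convention). This only shows the nominal one-to-one lookup, not the substitutions a real shaper applies afterwards.

```python
# Hypothetical character-to-glyph tables; the glyph IDs are made up.
FONT_A_CMAP = {0x61: 1, 0x62: 2}
FONT_B_CMAP = {0x61: 2, 0x62: 7}


def nominal_glyphs(text, cmap, missing_glyph=0):
    """Look up one glyph ID per code point, falling back to `missing_glyph`."""
    return [cmap.get(ord(ch), missing_glyph) for ch in text]


text = "ab"
code_points = [ord(ch) for ch in text]       # the same for every font
glyphs_a = nominal_glyphs(text, FONT_A_CMAP)  # depends on font A
glyphs_b = nominal_glyphs(text, FONT_B_CMAP)  # depends on font B

print([hex(cp) for cp in code_points])  # ['0x61', '0x62']
print(glyphs_a)                         # [1, 2]
print(glyphs_b)                         # [2, 7]
```

The code points identify the text and stay constant, while the glyph IDs only make sense together with the font that produced them.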
We will figure it out 👍
Merged in https://github.com/mono/SkiaSharp/pull/766 and others.