Gutenberg: Normalize characters with combining marks to correctly composed characters

Created on 17 Apr 2017 · 2 Comments · Source: WordPress/gutenberg

At https://github.com/tinymce/tinymce/issues/1971 and https://core.trac.wordpress.org/ticket/30130 @Zodiac1978 describes an occasional problem when pasting Unicode text into TinyMCE:

I have a PDF file with German umlauts (for example ä, ö, ü), and if I copy and paste them into TinyMCE in WordPress I get the base vowel (u, o, a, U, O, A) followed by a combining diaeresis (U+0308, http://www.fileformat.info/info/unicode/char/0308/index.htm) instead of a single precomposed character.

This results in some problems:

Search for words with umlauts doesn't work

Proofreading fails

W3C validation fails with the warning "Text run is not in Unicode Normalization Form C." because precomposed characters are preferred (see: http://www.w3.org/International/docs/charmod-norm/#choice-of-normalization-form)
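
To illustrate why search breaks, here is a small sketch comparing the two encodings of the same visible word. The sample word and the `codePoints` helper are assumptions made up for this example; they do not come from the original report.

```javascript
// Two strings that render identically but differ in code points.
const precomposed = 'M\u00FCnchen'; // "ü" as one code point, U+00FC
const decomposed = 'Mu\u0308nchen'; // "u" followed by U+0308 COMBINING DIAERESIS

// Hypothetical helper: list the code points of a string as 4-digit hex.
function codePoints(text) {
  return Array.from(text, (ch) =>
    ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(precomposed === decomposed);            // false
console.log(decomposed.includes(precomposed));      // false, so a plain search misses it
console.log(precomposed.length, decomposed.length); // 7 8
console.log(codePoints('u\u0308'));                 // ['0075', '0308']
```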

We're probably in a good place to fix this in our JavaScript code.

With ES6 we have a normalize function in JS:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize

And there is a polyfill for older browsers:
https://github.com/walling/unorm
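
A minimal sketch of what that could look like, assuming we only need NFC and that a missing `normalize` should leave the text untouched. The `toNFC` name is hypothetical; with a polyfill loaded, the method should be present and the fallback branch would not be taken.

```javascript
// Hypothetical helper: compose characters to NFC when the runtime supports it.
function toNFC(text) {
  // String.prototype.normalize is ES6; older browsers need a polyfill.
  if (typeof text.normalize === 'function') {
    return text.normalize('NFC');
  }
  // Without support, return the input unchanged rather than guess.
  return text;
}

const pasted = 'u\u0308'; // "u" + combining diaeresis
const composed = toNFC(pasted);

console.log(pasted.length);           // 2
console.log(composed.length);         // 1
console.log(composed === '\u00FC');   // true, a single precomposed "ü"
```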

[Feature] Raw Handling [Type] Bug

nylen

All 2 comments

Unless there's a misunderstanding of the issue here, this appears to be blocked by the upstream issue https://github.com/tinymce/tinymce/issues/1971 . If I'm wrong, can you clarify the action steps necessary for Gutenberg specifically?

From today's editor bug scrub: https://wordpress.slack.com/archives/C02QB2JS7/p1518111268000525

aduth on 8 Feb 2018

Oh, I remember this, 3 years ago:

We could use normalize in JavaScript, but with limited browser support.

This is not a TinyMCE problem though; it affects all input. A PHP solution might be better, but we could also consider cleaning this up for the editor on paste.

Moving back to formatting.

Not blocked by the TinyMCE issue. We can clean this up on paste in the visual editor at least, maybe also in text. I'll have a look once #5966 is merged.
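
A rough sketch of the on-paste cleanup idea, under the assumption that we read the plain-text clipboard payload and normalize it before insertion. `getNormalizedPasteText` is a hypothetical name, not an existing TinyMCE or Gutenberg function, and a real fix would also have to cover the HTML payload.

```javascript
// Hypothetical helper: read plain text from a paste event and compose it to NFC.
function getNormalizedPasteText(event) {
  const data = event.clipboardData;
  if (!data) {
    return '';
  }
  const text = data.getData('text/plain');
  // Fall back to the raw text when normalize is unavailable.
  return typeof text.normalize === 'function' ? text.normalize('NFC') : text;
}

// In a browser this could be wired up roughly as follows
// (insertText is a placeholder for whatever the editor uses):
//
// element.addEventListener('paste', (event) => {
//   event.preventDefault();
//   insertText(getNormalizedPasteText(event));
// });
```

Keeping the normalization in a function that only takes the event makes it easy to exercise with a stub object instead of a real clipboard.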