Forking from #970: JS.md should talk about Unicode normalization.
A simple example from that document: my name "Jean-François Bastien" can be normalized two ways, with Ç ↔ C+◌̧.
This is a nice gotcha in Unicode. While interfacing between JS and wasm it would be good to know what to expect from producers and consumers. We may choose not to normalize, but we should say so.
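To make the gotcha concrete, here is a minimal sketch in plain JS. The two string literals are my own illustrative spellings of the name (one precomposed, one decomposed), not anything taken from JS.md:

```javascript
// Two spellings of the same visible text (illustrative strings, assumed for this sketch).
const precomposed = "Jean-Fran\u00E7ois"; // "ç" as the single code point U+00E7
const decomposed = "Jean-Franc\u0327ois"; // "c" followed by combining cedilla U+0327

// Without normalization the strings are different sequences of code units.
const equalRaw = precomposed === decomposed;

// After normalizing both to the same form, they compare equal.
const equalNFC = precomposed.normalize("NFC") === decomposed.normalize("NFC");
const equalNFD = precomposed.normalize("NFD") === decomposed.normalize("NFD");

// The lengths differ too, since the decomposed form carries one extra code unit.
const lengths = [precomposed.length, decomposed.length];

console.log(equalRaw, equalNFC, equalNFD, lengths);
```

So two names that render identically can compare unequal unless someone normalizes them first, and which form you pick (NFC vs. NFD here) determines the bytes you end up with.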
I see 4 ways in which we can discuss normalization in JS.md:
If we choose 2. or 3. we should specify which form of normalization we expect (because of course there are multiple forms of normalization).
1, 2, and 3 seem like a good source of esoteric bugs in JS engines.
I vote 4.
I think the convertToJSString function in Web.md#names already specifies 4. Seems fine to add clarifying text to say that no normalization occurs, though.
Agreed with @lukewagner.
FWIW, CSS doesn't normalize at all either.
Yeah, nothing in the web platform uses Unicode normalization, other than string.normalize() in JavaScript and IDNA in URLs. 4 is definitely what you want here.
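For illustration, a rough sketch of what "no normalization" looks like from the JS side, assuming option 4 means names are passed through unchanged, as the comments above suggest. The `fakeExports` object is a hypothetical stand-in built by hand, not a real WebAssembly exports object:

```javascript
// Hypothetical names that render the same but use different code point sequences.
const nfcName = "Fran\u00E7ois"; // precomposed U+00E7
const nfdName = "Franc\u0327ois"; // "c" + combining cedilla U+0327

// Hand-built stand-in for an exports object (assumption: not a real wasm instance).
const fakeExports = Object.create(null);
fakeExports[nfcName] = 1;
fakeExports[nfdName] = 2;

// JS property keys are compared code unit by code unit, so these are two distinct keys.
const keyCount = Object.keys(fakeExports).length;
const viaNfc = fakeExports[nfcName];
const viaNfd = fakeExports[nfdName];

// A caller who wants normalization-insensitive lookup has to normalize explicitly.
const viaNormalized = fakeExports[nfdName.normalize("NFC")];

console.log(keyCount, viaNfc, viaNfd, viaNormalized);
```

In other words, with no normalization the two spellings are simply different names, and any folding is left to whoever produces or consumes them.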
Sweet. I want to make sure we document these decisions, and it seems we've reached consensus. Closing.