HTML: should it be called "case-sensitive"?

Created on 7 Nov 2019 · 6 Comments · Source: whatwg/html

https://html.spec.whatwg.org/multipage/infrastructure.html#case-sensitivity-and-string-comparison

When discussing issue #5066, members of the I18N WG expressed concern that the term "case-sensitive" is misleading, since what actually happens is a code point-by-code point comparison. Should "case-sensitive" be called something else, such as "identical" or "codepoint"?
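To make the distinction concrete, here is a minimal Python sketch of a code point-by-code point comparison. The helper names `code_points` and `is_identical` are hypothetical and come from neither HTML nor Infra; they only illustrate that the comparison looks at code point values and knows nothing about case as such.

```python
from typing import List


def code_points(s: str) -> List[int]:
    """Return the sequence of Unicode code points in s."""
    return [ord(ch) for ch in s]


def is_identical(a: str, b: str) -> bool:
    """Hypothetical helper: true only if both strings are the same code point sequence."""
    return code_points(a) == code_points(b)


print(is_identical("checkbox", "checkbox"))  # True
print(is_identical("checkbox", "Checkbox"))  # False: U+0063 vs U+0043
```

A case difference is just one of several ways two strings can fail this test, which is why the term "case-sensitive" undersells what the comparison does.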

Note: I'm aware that the term "case sensitive" is already used and linked widely, not only in HTML but also in many attendant specs that refer to the definition provided by HTML.

i18n-needs-resolution

All 6 comments

I'd be okay with minting "equals" for this (or perhaps "is", since I suspect that's generally what we use these days), in https://infra.spec.whatwg.org/#strings. But someone would have to do the work of identifying all specifications and getting them all changed.

"visually equivalent"? Though I suppose that would have other issues.

@Yay295 There are plenty of visually equivalent strings that should not be treated as equivalent. Cf. here
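One illustration of this point (my own example, not necessarily the one the lost link pointed to) is a pair of strings that render alike but use letters from different scripts. The sketch below assumes a hypothetical helper `describe` that lists the code points of a string.

```python
from typing import List


def describe(s: str) -> List[str]:
    """Hypothetical helper: list each code point in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


latin = "paypal"            # all Latin letters
mixed = "p\u0430yp\u0430l"  # U+0430 CYRILLIC SMALL LETTER A in place of Latin "a"

print(latin == mixed)      # False
print(describe(latin)[1])  # U+0061
print(describe(mixed)[1])  # U+0430
```

Treating such pairs as equal would make two different identifiers collide, so visual equivalence is not a safe basis for matching.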

"case dependent" might be a good way to go too?

I reviewed the current use of case-sensitive in HTML. There are 39 occurrences of it (not counting the 2 in the definition). In a number of cases, the phrase is used to emphasize that value-matching expects string equality, especially with regard to case. And this is, obviously, only in the HTML spec. Before making PRs, I thought I ought to check on the right course of action. What I had in mind was:

  1. In Infra, define "is" as code point-by-code point comparison. Include a synonym, "identical to", for cases where the spec needs to emphasize the equality.
    a. Include a note emphasizing that this is case and code point sequence (normalization) sensitive.
  2. In HTML, replace "case-sensitive" with a reference to "is".
    a. Include a note explaining that this was formerly called "case-sensitive", and retain the anchor/definition for other, yet-to-be-modified specs.
  3. Replace the 39 occurrences with suitable edits to "is"/"identical to" (or whatever we decide to spell them as).
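The normalization sensitivity mentioned in step 1a can be sketched in Python as follows. The function name `identical_to` is only a stand-in for the proposed term, not an existing API. Python's `==` on `str` already compares code point sequences, and `unicodedata.normalize` is used solely to show that the two spellings are canonically equivalent.

```python
import unicodedata


def identical_to(a: str, b: str) -> bool:
    """Stand-in for the proposed "is"/"identical to": plain code point equality."""
    return a == b


composed = "caf\u00E9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(identical_to(composed, decomposed))  # False: different code point sequences
print(identical_to(composed, unicodedata.normalize("NFC", decomposed)))  # True
```

Under the proposed definition the two spellings do not match unless a spec explicitly normalizes first, and that is the behavior the note would call out.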

Does this sound like the right approach?

Generally sounds good to me.

In Infra, define "is" as code point-by-code point comparison. Include a synonym, "identical to", for cases where the spec needs to emphasize the equality.

Probably you'd also want to include a note that by default, specs using Infra compare strings in this manner. I don't want people to think that if a spec compares strings without linking to "is" or "identical to", then it's undefined. (This would basically be a port of HTML's "Except where otherwise stated, string comparisons must be performed in a case-sensitive manner.")

a. Include a note explaining that this was formerly called "case-sensitive", and retain the anchor/definition for other, yet-to-be-modified specs.

Since HTML doesn't have automatic cross-linking anyway, we'd have to add a bullet point for the relevant "is" to the Infra section of https://html.spec.whatwg.org/#dependencies. At that point maybe you should just add the appropriate anchor there (probably as an empty <span id="case-sensitive"></span>). That seems cleaner than having a leftover <dfn> in https://html.spec.whatwg.org/#case-sensitivity-and-string-comparison.

At that point https://html.spec.whatwg.org/#case-sensitivity-and-string-comparison will be almost deleted; the remainder would just be the uses of "prefix match" which are in AppCache and which @annevk preferred to leave alone until we remove AppCache.
