I just noticed that automatically generated IDs in respec do not remove parentheses from the ID value, resulting, for example, in URLs like this:
https://rawgit.com/w3c/wcag21/target-size_ISSUE-60/guidelines/#target-size-(no-exception)
Parentheses are not only hard to type, they are also quite uncommon in URLs. Email clients sometimes struggle with those characters when auto-linking, too, and drop the final ")", which breaks the link.
Can respec just ignore parentheses in IDs, resulting in links like:
https://rawgit.com/w3c/wcag21/target-size_ISSUE-60/guidelines/#target-size-no-exception
I recognize that the HTML spec allows any value for IDs, but this is more a style question, and I hope it can be considered.
Stripping those characters is doable (I agree... they are super ugly), but it risks the id no longer being unique, so it might be a bit tricky. Need to check what the impact would be.
It used to be the behavior that respec would trim parentheses. The example that triggered this issue is that in a WCAG 2.1 publication of 30 June, parentheses were trimmed in generated IDs, as in https://www.w3.org/TR/2017/WD-WCAG21-20170630/#audio-only-and-video-only-prerecorded, while in a 28 July publication of the same document they are not, as in https://www.w3.org/TR/2017/WD-WCAG21-20170728/#audio-only-and-video-only-(prerecorded). The change in behaviour broke some of our cross references, which I manually updated in order to get the publication out, but I would much prefer going back to the previous behaviour. We actually built a lot of infrastructure around the predictable generated ID pattern.
I am reasonably sure the following commit is where the change in behaviour came in, but I don't know how to fix it.
https://github.com/w3c/respec/commit/c1a7be5890bd8b136418609074dec86cd2e75e8b
BTW the reason I'm pretty sure that commit is where the regression came in is, if I run the code in deleted lines 167 - 170, I get the previous result with parentheses excluded, while if I run the replacement code in added lines 171 - 174 I get the new result with parentheses left in. I tested this in a browser console with:
/* old version, result is "audio-only-and-video-only-prerecorded" */
alert ("Audio-only and Video-only (Prerecorded)".toLowerCase().split(/[^-.0-9a-z_]+/i).join("-").replace(/^-+/, "").replace(/-+$/, ""));
/* new version, result is "audio-only-and-video-only-(prerecorded)" */
alert ("Audio-only and Video-only (Prerecorded)".toLowerCase().replace(/\s/gm, "-").replace(/^-+/, "").replace(/-+$/, "").replace(/-+/g, "-"));
Thanks for checking, @michael-n-cooper. I need to look back a bit further as to why I made the change in the first place. I'll see about re-introducing the .split(/[^-.0-9a-z_]+/i) part.
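If the `.split(/[^-.0-9a-z_]+/i)` step is re-introduced, the uniqueness concern raised earlier in this thread could be handled by appending a counter when two headings collapse to the same id. The following is only a sketch: `slugify`, `makeUniqueId`, and the `-1`, `-2` suffix scheme are hypothetical names and choices for illustration, not respec's actual code.

```javascript
// Hypothetical helper: the "old version" transformation quoted above.
function slugify(text) {
  return text
    .toLowerCase()
    .split(/[^-.0-9a-z_]+/i) // drop anything outside - . 0-9 a-z _
    .join("-")
    .replace(/^-+/, "")
    .replace(/-+$/, "");
}

// Hypothetical helper: append a numeric suffix until the id is unused.
// The suffix scheme is an assumption, not respec's behaviour.
function makeUniqueId(text, usedIds) {
  const base = slugify(text);
  let candidate = base;
  let counter = 0;
  while (usedIds.has(candidate)) {
    counter += 1;
    candidate = `${base}-${counter}`;
  }
  usedIds.add(candidate);
  return candidate;
}

const used = new Set();
// Two different headings that strip down to the same id.
const first = makeUniqueId("Target Size (No Exception)", used);
const second = makeUniqueId("Target Size: No Exception", used);
console.log(first, second);
```

Note that a suffix like this keeps ids unique but makes them depend on document order, which may matter for anyone relying on a predictable generated ID pattern.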
Thanks, @michael-n-cooper, for the excellent research!
Another impact of the change in the generation of ids: the internal links to those anchors are invalid, because at least some characters would need to be URL-encoded.
For instance, the TOC link to https://w3c.github.io/webrtc-pc/#x%22hold%22-functionality in the WebRTC spec fails in validator.nu because the quote characters are not %-encoded in the markup.
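To illustrate the encoding problem, here is a minimal sketch of how an `href` could be built from a generated id so that characters such as quotes are %-encoded. `fragmentHref` is a hypothetical helper, not part of respec; it relies only on the standard `encodeURIComponent` function.

```javascript
// Hypothetical helper: build a same-document link target from an element id.
function fragmentHref(id) {
  // encodeURIComponent percent-encodes the double quote as %22.
  return "#" + encodeURIComponent(id);
}

const quoted = fragmentHref('x"hold"-functionality');
console.log(quoted); // "#x%22hold%22-functionality"

// Parentheses are left as-is by encodeURIComponent, so encoding alone
// would not remove them from links.
const parens = fragmentHref("target-size-(no-exception)");
console.log(parens); // "#target-size-(no-exception)"
```

Encoding at link time makes the markup valid, but it does not address the readability and auto-linking problems described above, which is an argument for restricting the characters in the id itself.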
Twitter also seems to struggle with these URLs: when pasted into Twitter, they are not parsed properly. I'll try to fix this now...
Ok, I have a PR ready for this: #1387... note that there is a very high probability that _a lot of links are going to break_ once I merge that.
Once I merge, I'll let spec-prod know... but naturally people will get upset 😿. If folks yell at us for this, I would appreciate some support - but it's going to have to be an "ask for forgiveness, not for permission" situation.