As I understand it, '_' (U+005F LOW LINE) belongs to Unicode general category 'Pc' (Connector_Punctuation), and so has the 'ID_Continue' property, and so matches the ECMAScript nonterminal UnicodeIDContinue. This means that IdentifierPart derives '_' in two distinct ways (via UnicodeIDContinue and via the '_' literal), and so is technically ambiguous. I don't think this causes any semantic ambiguity (because the spec doesn't much care about how IdentifierPart matches source text), but it's odd.
This is correct:
/\p{ID_Continue}/u.test('_');
// → true
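The earlier steps in that chain can be checked the same way. This is a quick sketch, assuming an engine that supports Unicode property escapes (ES2018 or later); the variable names are only illustrative.

```javascript
// '_' (U+005F) is in general category Pc (Connector_Punctuation) ...
const lowLineIsPc = /\p{General_Category=Connector_Punctuation}/u.test('_');
// ... but it does not have ID_Start, which is why IdentifierStart lists `_` explicitly.
const lowLineIsIDStart = /\p{ID_Start}/u.test('_');

console.log(lowLineIsPc, lowLineIsIDStart);
// → true false
```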
We can just drop the _ from IdentifierPart.
spec.html | 1 -
1 file changed, 1 deletion(-)
diff --git a/spec.html b/spec.html
index b552ed8..6b5339e 100644
--- a/spec.html
+++ b/spec.html
@@ -9813,7 +9813,6 @@
IdentifierPart ::
UnicodeIDContinue
`$`
- `_`
`\` UnicodeEscapeSequence
<ZWNJ>
<ZWJ>
In that case please add a note: it would be confusing for a non-expert reader to see _ in IdentifierStart but not in IdentifierPart.
We could also make IdentifierPart consume IdentifierStart to avoid the duplication. IdentifierStart is guaranteed to be a subset of IdentifierPart because ID_Start is guaranteed to be a subset of ID_Continue.
spec.html | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/spec.html b/spec.html
index b552ed8..a64ca7c 100644
--- a/spec.html
+++ b/spec.html
@@ -9811,10 +9811,8 @@
`\` UnicodeEscapeSequence
IdentifierPart ::
+ IdentifierStart
UnicodeIDContinue
- `$`
- `_`
- `\` UnicodeEscapeSequence
<ZWNJ>
<ZWJ>
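The claim that ID_Start is a subset of ID_Continue can be spot-checked against whatever Unicode version the engine ships. This is a rough sketch, not a proof: it only covers the two Unicode properties (not `$`, `_`, or escapes), and the function name is made up for illustration.

```javascript
// Exhaustively look for code points that have ID_Start but lack ID_Continue.
const idStartOnly = /^\p{ID_Start}$/u;
const idContinueOnly = /^\p{ID_Continue}$/u;

function findIDStartNotIDContinue() {
  const counterexamples = [];
  for (let cp = 0; cp <= 0x10FFFF; cp++) {
    const ch = String.fromCodePoint(cp);
    if (idStartOnly.test(ch) && !idContinueOnly.test(ch)) {
      counterexamples.push(cp);
    }
  }
  return counterexamples;
}

const counterexamples = findIDStartNotIDContinue();
console.log(counterexamples.length);
// → 0
```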
IdentifierPart ::
IdentifierStart
UnicodeIDContinue
<ZWNJ>
<ZWJ>
With that, IdentifierPart would be ambiguous on all the characters in the intersection of IdentifierStart and UnicodeIDContinue (i.e., _ and \p{ID_Start}).
It seems to me that the simplest way to eliminate this ambiguity is:
_UnicodeIDContinue_ ::
any Unicode code point other than U+005F (LOW LINE) with the Unicode property “ID_Continue”
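To make the ambiguity concrete, here is a small sketch that counts how many IdentifierPart alternatives match a single character under each variant. It makes several assumptions: the `\` UnicodeEscapeSequence, <ZWNJ>, and <ZWJ> alternatives are left out, the narrowed UnicodeIDContinue is applied to the existing IdentifierPart production, and all helper names are hypothetical.

```javascript
const ID_START = /^\p{ID_Start}$/u;
const ID_CONTINUE = /^\p{ID_Continue}$/u;
// ID_Continue minus U+005F, expressed with a negative lookahead.
const ID_CONTINUE_NO_LOW_LINE = /^(?!_)\p{ID_Continue}$/u;

// IdentifierStart without the escape-sequence alternative.
const isIdentifierStart = (ch) => ID_START.test(ch) || ch === '$' || ch === '_';

// Number of alternatives in a production that match the character.
function countDerivations(alternatives, ch) {
  return alternatives.filter((matches) => matches(ch)).length;
}

// Existing grammar: UnicodeIDContinue | `$` | `_`
const current = [(ch) => ID_CONTINUE.test(ch), (ch) => ch === '$', (ch) => ch === '_'];
// IdentifierStart-based proposal: IdentifierStart | UnicodeIDContinue
const viaIdentifierStart = [isIdentifierStart, (ch) => ID_CONTINUE.test(ch)];
// Existing grammar with U+005F excluded from UnicodeIDContinue
const narrowed = [(ch) => ID_CONTINUE_NO_LOW_LINE.test(ch), (ch) => ch === '$', (ch) => ch === '_'];

console.log(countDerivations(current, '_'));            // → 2
console.log(countDerivations(viaIdentifierStart, '_')); // → 2
console.log(countDerivations(viaIdentifierStart, 'a')); // → 2
console.log(countDerivations(viaIdentifierStart, '$')); // → 1
console.log(countDerivations(narrowed, '_'));           // → 1
console.log(countDerivations(narrowed, 'a'));           // → 1
```

A count above 1 means the character can be derived in more than one way, so only the narrowed variant is unambiguous for every character tried here.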
@jmdyck Since a1f915b1e858b8e876706ac1580bb4497c9b6365 and #1053 have landed, does that mean this can be closed?
Yup, thanks.