Ecma262: IdentifierPart is ambiguous re '_' ?

Created on 5 Jan 2018  ·  7 Comments  ·  Source: tc39/ecma262

As I understand it, '_' (U+005F LOW LINE) belongs to Unicode general category 'Pc' (Connector_Punctuation), and so has the 'ID_Continue' property, and so matches the ECMAScript nonterminal UnicodeIDContinue. This means that IdentifierPart derives '_' in two distinct ways (via UnicodeIDContinue and via the '_' literal), and so is technically ambiguous. I don't think this causes any semantic ambiguity (because the spec doesn't much care about how IdentifierPart matches source text), but it's odd.

All 7 comments

This is correct:

/\p{ID_Continue}/u.test('_');
// → true
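To make the double derivation concrete, here is a minimal sketch that lists which alternatives of IdentifierPart can derive a given code point. The table `identifierPartAlternatives` and the helper `matchingAlternatives` are hypothetical names made up for this illustration; they model the production as quoted in this thread, not anything defined by the spec.

```javascript
// Hypothetical model of the single-code-point alternatives of IdentifierPart
// as they stood when this issue was filed. The `\` UnicodeEscapeSequence
// alternative is left out because it never matches a single raw code point.
const identifierPartAlternatives = [
  ['UnicodeIDContinue', (ch) => /^\p{ID_Continue}$/u.test(ch)],
  ['`$`', (ch) => ch === '$'],
  ['`_`', (ch) => ch === '_'],
  ['<ZWNJ>', (ch) => ch === '\u200C'],
  ['<ZWJ>', (ch) => ch === '\u200D'],
];

// Return the names of all alternatives that can derive `ch`.
function matchingAlternatives(ch) {
  return identifierPartAlternatives
    .filter(([, matches]) => matches(ch))
    .map(([name]) => name);
}

// '_' has general category Pc (Connector_Punctuation).
console.log(/^\p{Pc}$/u.test('_'));
// '_' is derived by two alternatives, which is the ambiguity reported above.
console.log(matchingAlternatives('_'));
// '$' is derived by exactly one alternative.
console.log(matchingAlternatives('$'));
```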

We can just drop the _ from IdentifierPart.

spec.html | 1 -
 1 file changed, 1 deletion(-)

diff --git a/spec.html b/spec.html
index b552ed8..6b5339e 100644
--- a/spec.html
+++ b/spec.html
@@ -9813,7 +9813,6 @@
       IdentifierPart ::
         UnicodeIDContinue
         `$`
-        `_`
         `\` UnicodeEscapeSequence
         <ZWNJ>
         <ZWJ>
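
A quick way to convince oneself that this patch changes nothing observable is to compare the production before and after over every code point. This is only a sketch: `partBefore`, `partAfter` and `differingCodePoints` are hypothetical helpers, and the escape-sequence alternative is ignored because the patch does not touch it.

```javascript
const isIDContinue = (ch) => /^\p{ID_Continue}$/u.test(ch);

// Single-code-point alternatives of IdentifierPart before the patch.
const partBefore = (ch) =>
  isIDContinue(ch) || ch === '$' || ch === '_' || ch === '\u200C' || ch === '\u200D';

// The same production with the `_` alternative deleted.
const partAfter = (ch) =>
  isIDContinue(ch) || ch === '$' || ch === '\u200C' || ch === '\u200D';

// Collect every code point on which the two predicates disagree.
function differingCodePoints() {
  const diffs = [];
  for (let cp = 0; cp <= 0x10FFFF; cp++) {
    const ch = String.fromCodePoint(cp);
    if (partBefore(ch) !== partAfter(ch)) diffs.push(cp);
  }
  return diffs;
}

// Expected to print 0, since '_' is already covered by ID_Continue.
console.log(differingCodePoints().length);
```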

In that case please add a note: it would be confusing for a non-expert reader to see _ in IdentifierStart but not in IdentifierPart.

We could also make IdentifierPart consume IdentifierStart to avoid the duplication. IdentifierStart is guaranteed to be a subset of IdentifierPart because ID_Start is guaranteed to be a subset of ID_Continue.

spec.html | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/spec.html b/spec.html
index b552ed8..a64ca7c 100644
--- a/spec.html
+++ b/spec.html
@@ -9811,10 +9811,8 @@
         `\` UnicodeEscapeSequence

       IdentifierPart ::
+        IdentifierStart
         UnicodeIDContinue
-        `$`
-        `_`
-        `\` UnicodeEscapeSequence
         <ZWNJ>
         <ZWJ>
  IdentifierPart ::
    IdentifierStart
    UnicodeIDContinue
    <ZWNJ>
    <ZWJ>
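
The subset relation this refactoring relies on can be spot-checked against the Unicode data shipped with whatever engine runs the script. The function name `countStartNotContinue` is made up for this sketch, and the result reflects that engine's Unicode version rather than being a proof.

```javascript
const idStart = /^\p{ID_Start}$/u;
const idContinue = /^\p{ID_Continue}$/u;

// Count code points that have ID_Start but lack ID_Continue.
function countStartNotContinue() {
  let count = 0;
  for (let cp = 0; cp <= 0x10FFFF; cp++) {
    const ch = String.fromCodePoint(cp);
    if (idStart.test(ch) && !idContinue.test(ch)) count++;
  }
  return count;
}

// Expected to print 0 if ID_Start is a subset of ID_Continue.
console.log(countStartNotContinue());
```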

With that, IdentifierPart would be ambiguous on all the characters in the intersection of IdentifierStart and UnicodeIDContinue (i.e., _ and \p{ID_Start}).
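
As a rough illustration of that intersection, the sketch below filters a handful of sample characters down to those that both IdentifierStart and UnicodeIDContinue would derive. `isIdentifierStart` and `ambiguousSample` are hypothetical helpers, and IdentifierStart is restricted to raw code points (no escape sequences).

```javascript
const isIDStart = (ch) => /^\p{ID_Start}$/u.test(ch);
const isIDContinue = (ch) => /^\p{ID_Continue}$/u.test(ch);

// IdentifierStart, restricted to single raw code points.
const isIdentifierStart = (ch) => isIDStart(ch) || ch === '$' || ch === '_';

// Keep only the characters derivable via both IdentifierStart and
// UnicodeIDContinue in the refactored IdentifierPart.
function ambiguousSample(chars) {
  return chars.filter((ch) => isIdentifierStart(ch) && isIDContinue(ch));
}

// '$' drops out because it lacks ID_Continue; '1' drops out because it
// is not an IdentifierStart. The remaining three are ambiguous.
console.log(ambiguousSample(['_', '$', 'a', '\u00E9', '1']));
```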

It seems to me that the simplest way to eliminate this ambiguity is:

_UnicodeIDContinue_ ::
any Unicode code point other than U+005F (LOW LINE) with the Unicode property “ID_Continue”
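
One way to picture the proposed definition is as a single regular expression: ID_Continue with U+005F carved out by a negative lookahead. The name `unicodeIDContinueProposed` is hypothetical and this is only a sketch of the wording above, not necessarily the text that ended up in the spec.

```javascript
// ID_Continue minus U+005F (LOW LINE), per the wording proposed above.
const unicodeIDContinueProposed = /^(?!_)\p{ID_Continue}$/u;

// '_' is no longer derived by UnicodeIDContinue, so only the literal
// `_` alternative of IdentifierPart can derive it.
console.log(unicodeIDContinueProposed.test('_'));

// Other ID_Continue code points are unaffected, including other
// Connector_Punctuation characters such as U+203F (UNDERTIE).
console.log(unicodeIDContinueProposed.test('a'));
console.log(unicodeIDContinueProposed.test('9'));
console.log(unicodeIDContinueProposed.test('\u203F'));
```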

@jmdyck Now that a1f915b1e858b8e876706ac1580bb4497c9b6365 and #1053 have landed, can this be closed?

Yup, thanks.
