Ecma262: IdentifierPart is ambiguous re '_' ?

Created on 5 Jan 2018 · 7 Comments · Source: tc39/ecma262

As I understand it, '_' (U+005F LOW LINE) belongs to Unicode general category 'Pc' (Connector_Punctuation), and so has the 'ID_Continue' property, and so matches the ECMAScript nonterminal UnicodeIDContinue. This means that IdentifierPart derives '_' in two distinct ways (via UnicodeIDContinue and via the '_' literal), and so is technically ambiguous. I don't think this causes any semantic ambiguity (because the spec doesn't much care about how IdentifierPart matches source text), but it's odd.
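To make the two derivations concrete, here is a rough sketch that lists which IdentifierPart alternatives match a single character. The helper name `identifierPartAlternatives` and the alternative labels are made up for illustration, and the escape-sequence, ZWNJ, and ZWJ alternatives are left out for brevity. It assumes an engine that supports Unicode property escapes.

```javascript
// Hypothetical helper (not part of the spec): returns the names of the
// IdentifierPart alternatives that match a single-character string.
// Only three alternatives are modelled here.
function identifierPartAlternatives(ch) {
  const alternatives = [
    ['UnicodeIDContinue', /^\p{ID_Continue}$/u],
    ['$ literal', /^\$$/],
    ['_ literal', /^_$/],
  ];
  return alternatives
    .filter(([, pattern]) => pattern.test(ch))
    .map(([name]) => name);
}

console.log(identifierPartAlternatives('_')); // two alternatives match
console.log(identifierPartAlternatives('$')); // only the literal matches
```

Any character for which more than one name comes back is one the grammar can derive in more than one way.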

jmdyck

👍2

All 7 comments

This is correct:

/\p{ID_Continue}/u.test('_');
// → true

We can just drop the _ from IdentifierPart.

spec.html | 1 -
 1 file changed, 1 deletion(-)

diff --git a/spec.html b/spec.html
index b552ed8..6b5339e 100644
--- a/spec.html
+++ b/spec.html
@@ -9813,7 +9813,6 @@
       IdentifierPart ::
         UnicodeIDContinue
         `$`
-        `_`
         `\` UnicodeEscapeSequence
         &lt;ZWNJ&gt;
         &lt;ZWJ&gt;

mathiasbynens on 5 Jan 2018

👍1

In that case please add a note: it would be confusing for a non-expert reader to see _ in IdentifierStart but not in IdentifierPart.

nicolo-ribaudo on 5 Jan 2018

We could also make IdentifierPart consume IdentifierStart to avoid the duplication. IdentifierStart is guaranteed to be a subset of IdentifierPart because ID_Start is guaranteed to be a subset of ID_Continue.

spec.html | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/spec.html b/spec.html
index b552ed8..a64ca7c 100644
--- a/spec.html
+++ b/spec.html
@@ -9811,10 +9811,8 @@
         `\` UnicodeEscapeSequence

       IdentifierPart ::
+        IdentifierStart
         UnicodeIDContinue
-        `$`
-        `_`
-        `\` UnicodeEscapeSequence
         &lt;ZWNJ&gt;
         &lt;ZWJ&gt;
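The subset claim can be spot-checked against whatever Unicode version the engine ships. This is only a sketch: the function name `idStartNotContinue` is invented here, and the scan says nothing about future Unicode versions beyond what the property definitions already guarantee.

```javascript
// Scan every code point and collect any that has ID_Start but lacks
// ID_Continue. The result reflects the engine's built-in Unicode data.
function idStartNotContinue() {
  const isStart = /^\p{ID_Start}$/u;
  const isContinue = /^\p{ID_Continue}$/u;
  const violations = [];
  for (let cp = 0; cp <= 0x10FFFF; cp++) {
    if (cp >= 0xD800 && cp <= 0xDFFF) continue; // skip surrogate code points
    const ch = String.fromCodePoint(cp);
    if (isStart.test(ch) && !isContinue.test(ch)) violations.push(cp);
  }
  return violations;
}

console.log(idStartNotContinue().length); // 0 if ID_Start is a subset of ID_Continue
```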

mathiasbynens on 5 Jan 2018

  IdentifierPart ::
    IdentifierStart
    UnicodeIDContinue
    &lt;ZWNJ&gt;
    &lt;ZWJ&gt;

With that, IdentifierPart would be ambiguous on all the characters in the intersection of IdentifierStart and UnicodeIDContinue (i.e., _ and \p{ID_Start}).
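As a rough illustration of that overlap, the sketch below counts how many of the first two alternatives of the rewritten IdentifierPart match a character. The names `isIdentifierStart`, `isUnicodeIDContinue`, and `derivationCount` are hypothetical, and the escape-sequence form of IdentifierStart is ignored.

```javascript
// IdentifierStart without the escape-sequence alternative:
// UnicodeIDStart, `$`, or `_`.
const isIdentifierStart = (ch) => /^[\p{ID_Start}$_]$/u.test(ch);
const isUnicodeIDContinue = (ch) => /^\p{ID_Continue}$/u.test(ch);

// Number of ways the rewritten IdentifierPart derives ch via its
// IdentifierStart and UnicodeIDContinue alternatives.
function derivationCount(ch) {
  return Number(isIdentifierStart(ch)) + Number(isUnicodeIDContinue(ch));
}

console.log(derivationCount('a')); // 2: ID_Start and ID_Continue
console.log(derivationCount('_')); // 2: `_` literal and ID_Continue
console.log(derivationCount('$')); // 1: `$` literal only
console.log(derivationCount('7')); // 1: ID_Continue only
```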

jmdyck on 5 Jan 2018

👍1

It seems to me that the simplest way to eliminate this ambiguity is:

_UnicodeIDContinue_ ::
any Unicode code point other than U+005F (LOW LINE) with the Unicode property “ID_Continue”
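One way to approximate that revised definition in a regular expression is a negative lookahead in front of the property escape. The constant name `unicodeIDContinueRevised` is invented for this sketch, which only mirrors the proposed wording and is not spec text.

```javascript
// "Any code point other than U+005F (LOW LINE) with ID_Continue",
// expressed as a negative lookahead plus a Unicode property escape.
const unicodeIDContinueRevised = /^(?!_)\p{ID_Continue}$/u;

console.log(unicodeIDContinueRevised.test('_'));      // false: excluded explicitly
console.log(unicodeIDContinueRevised.test('a'));      // true
console.log(unicodeIDContinueRevised.test('\u203F')); // true: UNDERTIE, another Pc character
```

With this definition, `_` would be derivable by IdentifierPart only through its literal alternative, while the other Connector_Punctuation characters stay in UnicodeIDContinue.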