Boa: The lexer doesn't take into account goal symbols

Created on 1 Apr 2020 · 11 Comments · Source: boa-dev/boa

The following code, taken from the harness/assert.js file in the test262 suite, fails to parse:

function assert(mustBeTrue, message) {
  if (mustBeTrue === true) {
    return;
  }

  if (message === undefined) {
    message = 'Expected true but got ' + assert._toString(mustBeTrue);
  }
  $ERROR(message);
}

assert._isSameValue = function (a, b) {
  if (a === b) {
    // Handle +/-0 vs. -/+0
    return a !== 0 || 1 / a === 1 / b;
  }

  // Handle NaN vs. NaN
  return a !== a && b !== b;
};

With the error:

ParsingError: Expected one of ';', '}' or 'line terminator', got 'b' at line 24, col 24

Interestingly, the error is given in column 24 (marked here):

    return a !== 0 || 1 / a === 1 / b;
                       ^

But I'm guessing the issue is with the last b. I think this is a regression, since this particular file parsed fine before the rewrite, but it could also be caused by something new in the file. Once fixed, we need to add a test for it.

Labels: bug, lexer

All 11 comments

I'll work on fixing this. Ideally, this should be fixed before Boa 0.7.

After some debugging, the error seems to be coming from:

1 / a === 1 / b

Checking the token stream: cargo run -- --dump-tokens

[
    Token {
        kind: NumericLiteral(
            1.0,
        ),
        pos: Position {
            column_number: 1,
            line_number: 1,
        },
    },
    Token {
        kind: RegularExpressionLiteral(
            " a === 1 ",
            "",
        ),
        pos: Position {
            column_number: 3,
            line_number: 1,
        },
    },
    ...
]

It seems to be a lexer bug, not a parser one: it lexes / a === 1 / as a regex.

Hope that helps. :)

Good find!
I do like the AST and token output we have now.

Do we need to refactor the lexer now? 😂

I'm not exactly sure what the lexer is supposed to do in this situation, since / a === 1 / is a valid regex on its own, but not in the context of 1 / a === 1 / b.

We probably need a context-aware lexer or something like that. Any thoughts?

Hmm, interesting. We could maybe check which tokens can precede a regex, and see if the previous token is one of those?

we probably need a context aware lexer or something like that. any thoughts?

I'm sure the lexer is context aware in some places, so it shouldn't be too hard to see what's before it and work it out based on that. Basically what @Razican said.
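
As a rough illustration of the "look at the previous token" idea suggested above, here is a minimal sketch. It is not Boa's lexer: the Token enum, regex_allowed and lex are hypothetical names, and the lexer only handles integers, identifiers, flagless regexes and simple punctuators.

```rust
// Hypothetical sketch: none of these names come from Boa's code base.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Ident(String),
    Punct(String),
    Regex(String),
}

/// Returns true when a `/` that follows `prev` may start a regular expression.
/// After a value-like token it is treated as division instead.
fn regex_allowed(prev: Option<&Token>) -> bool {
    match prev {
        None => true,
        Some(Token::Number(_)) | Some(Token::Ident(_)) | Some(Token::Regex(_)) => false,
        Some(Token::Punct(p)) => p != ")" && p != "]" && p != "}",
    }
}

fn lex(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            // Integer literals only, to keep the sketch short.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            tokens.push(Token::Number(text.parse().unwrap()));
        } else if c.is_alphabetic() || c == '_' || c == '$' {
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_' || chars[i] == '$') {
                i += 1;
            }
            tokens.push(Token::Ident(chars[start..i].iter().collect()));
        } else if c == '/' && regex_allowed(tokens.last()) {
            // Regex body runs up to the next `/` (no escapes, classes or flags).
            let start = i + 1;
            i = start;
            while i < chars.len() && chars[i] != '/' {
                i += 1;
            }
            tokens.push(Token::Regex(chars[start..i].iter().collect()));
            i += 1; // step over the closing slash
        } else {
            // One punctuator character plus any trailing `=` (so `===`, `!==`).
            let start = i;
            i += 1;
            while i < chars.len() && chars[i] == '=' {
                i += 1;
            }
            tokens.push(Token::Punct(chars[start..i].iter().collect()));
        }
    }
    tokens
}
```

With this sketch, lex("1 / a === 1 / b") produces two division punctuators rather than a regex, while lex("x = /ab/") still produces a Regex token. The heuristic is known to be incomplete: a keyword such as return is lexed here as an identifier, so return /re/ would be misread, and a closing parenthesis can be followed by either a division or a regex, as in if (x) /re/.test(y).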

There seems to be some information on context-aware lexical grammar in the spec. I will review this today and see if I can improve the lexer.

https://v8.dev/blog/understanding-ecmascript-part-3
Funnily enough this post came out talking about the same thing.

Reading this, it seems that the parser needs to call the lexer, and we cannot have a full list of tokens before calling the parser. So this clearly needs a rewrite of the way the parser and lexer work together.
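
To make the "parser calls the lexer" idea concrete, here is a minimal sketch under stated assumptions. The Goal enum loosely mirrors the spec's InputElementDiv and InputElementRegExp goal symbols; Lexer, next_token and parse_expression are hypothetical names, not Boa's API, and the input is assumed to be ASCII.

```rust
// Hypothetical sketch; the names below are illustrative, not Boa's API.

/// Lexical goal chosen by the parser for the next token.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Goal {
    Div,    // a `/` is a division punctuator
    RegExp, // a `/` starts a regular expression literal
}

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Ident(String),
    Punct(String),
    Regex(String),
}

struct Lexer<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { src: src.as_bytes(), pos: 0 }
    }

    /// Advances while `pred` holds and returns the consumed text.
    fn take_while(&mut self, pred: impl Fn(u8) -> bool) -> String {
        let start = self.pos;
        while self.pos < self.src.len() && pred(self.src[self.pos]) {
            self.pos += 1;
        }
        String::from_utf8_lossy(&self.src[start..self.pos]).into_owned()
    }

    /// Lexes exactly one token under the goal supplied by the caller.
    fn next_token(&mut self, goal: Goal) -> Option<Token> {
        self.take_while(|b| b.is_ascii_whitespace());
        let first = *self.src.get(self.pos)?;
        if first.is_ascii_digit() {
            let text = self.take_while(|b| b.is_ascii_digit());
            Some(Token::Number(text.parse::<f64>().ok()?))
        } else if first.is_ascii_alphabetic() || first == b'_' || first == b'$' {
            let name = self.take_while(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'$');
            Some(Token::Ident(name))
        } else if first == b'/' && goal == Goal::RegExp {
            self.pos += 1;
            let body = self.take_while(|b| b != b'/');
            if self.pos < self.src.len() {
                self.pos += 1; // step over the closing slash
            }
            Some(Token::Regex(body))
        } else {
            // One punctuator character plus any trailing `=` (so `===`).
            self.pos += 1;
            let rest = self.take_while(|b| b == b'=');
            Some(Token::Punct(format!("{}{}", first as char, rest)))
        }
    }
}

/// Toy parser for `operand (operator operand)*` that pulls tokens on demand.
/// It records the tokens it saw and does not validate their kinds.
fn parse_expression(src: &str) -> Option<Vec<Token>> {
    let mut lexer = Lexer::new(src);
    let mut seen = Vec::new();
    // An operand is expected first, so a `/` here would start a regex.
    seen.push(lexer.next_token(Goal::RegExp)?);
    // After an operand an operator is expected, so a `/` is division.
    while let Some(op) = lexer.next_token(Goal::Div) {
        seen.push(op);
        seen.push(lexer.next_token(Goal::RegExp)?);
    }
    Some(seen)
}
```

The point of the design is that the goal comes from the parser's position in the grammar rather than from a guess based on the previous token, which is why the token list cannot be built in full before parsing starts.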

I would say we can release version 0.7 with this known limitation, and we can work on this later. I'm still working on the parser modularization, which I think is a good place to start.

I think that string interning would also help with this, as we could maybe have tokens that are Copy, which would make it easier to pass them between the lexer and the parser.
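
A minimal sketch of what interning could look like, assuming a simple table-based interner. Sym, Interner and this Token enum are hypothetical names for illustration and do not reflect Boa's actual types.

```rust
use std::collections::HashMap;

// Hypothetical sketch; `Sym`, `Interner` and `Token` are illustrative names.

/// Small handle that stands in for an interned string.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Sym(u32);

#[derive(Default)]
struct Interner {
    ids: HashMap<String, Sym>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or stores `s` and creates one.
    fn intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Sym(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), sym);
        sym
    }

    /// Looks up the text behind a symbol produced by this interner.
    fn resolve(&self, sym: Sym) -> &str {
        &self.strings[sym.0 as usize]
    }
}

/// With strings replaced by symbols, the token type can derive `Copy`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Number(f64),
    Ident(Sym),
    Regex { body: Sym, flags: Sym },
}
```

Because every variant holds only plain values, a token can be handed from the lexer to the parser by value without cloning any heap data. The trade-off is that anything that needs the text has to go through the interner.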
