Boa: The lexer doesn't take into account goal symbols

Created on 1 Apr 2020 · 11 Comments · Source: boa-dev/boa

The following code, taken from the harness/assert.js file in the test262 suite, fails to parse:

function assert(mustBeTrue, message) {
  if (mustBeTrue === true) {
    return;
  }

  if (message === undefined) {
    message = 'Expected true but got ' + assert._toString(mustBeTrue);
  }
  $ERROR(message);
}

assert._isSameValue = function (a, b) {
  if (a === b) {
    // Handle +/-0 vs. -/+0
    return a !== 0 || 1 / a === 1 / b;
  }

  // Handle NaN vs. NaN
  return a !== a && b !== b;
};

With the error:

ParsingError: Expected one of ';', '}' or 'line terminator', got 'b' at line 24, col 24

Interestingly, the error is given in column 24 (marked here):

    return a !== 0 || 1 / a === 1 / b;
                       ^

But I'm guessing the issue is with the last b. I think this is a regression, since this particular file parsed fine before the rewrite, though the failure might come from something newly added to the file. Once this is fixed, we need to add a test for it.

bug lexer

Razican

👍1

All 11 comments

I'll work on fixing this. Ideally, this should be fixed before Boa 0.7.

Razican on 1 Apr 2020

👍1

After some debugging, the error seems to be coming from:

1 / a === 1 / b

Checking the token stream: cargo run -- --dump-tokens

[
    Token {
        kind: NumericLiteral(
            1.0,
        ),
        pos: Position {
            column_number: 1,
            line_number: 1,
        },
    },
    Token {
        kind: RegularExpressionLiteral(
            " a === 1 ",
            "",
        ),
        pos: Position {
            column_number: 3,
            line_number: 1,
        },
    },
    ...
]

It seems to be a lexer bug, not a parser one: the lexer treats / a === 1 / as a regex.

Hope that helps. :)

HalidOdat on 1 Apr 2020

👀1 👍1

Good find!
I do like the ast and token output we have now

jasonwilliams on 1 Apr 2020

Do we need to refactor the lexer now? 😂

jasonwilliams on 1 Apr 2020

😄2

I'm not exactly sure what the lexer is supposed to do in this situation, since / a === 1 / is a valid regex on its own, but not in the context of 1 / a === 1 / b.

HalidOdat on 1 Apr 2020

We probably need a context-aware lexer or something like that. Any thoughts?

HalidOdat on 1 Apr 2020

Hmmm, interesting. Maybe we could check which tokens can precede a regex, and see whether the previous token is one of them?
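
For illustration, here is a minimal sketch of that previous-token check. It is not Boa's code: the TokenKind enum and the slash_starts_regex function are hypothetical names made up for this example. The rule is only a heuristic. After a token that can end an expression, a / is treated as division; anywhere else it is treated as the start of a regex. It is known to misjudge cases such as a / following the ) of an if condition or the } of a block, and it ignores postfix ++ and --.

```rust
/// Simplified token kinds (hypothetical; not Boa's actual token type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TokenKind {
    Identifier(String),
    NumericLiteral(f64),
    Keyword(String),
    Punctuator(String),
    RegularExpressionLiteral(String, String),
}

/// Returns true if a `/` that follows `prev` should start a regex literal.
/// Heuristic: a `/` after something that ends an expression is a division.
fn slash_starts_regex(prev: Option<&TokenKind>) -> bool {
    match prev {
        // Start of input: there is no left operand, so `/` cannot divide.
        None => true,
        // Operands end an expression, so a following `/` is division.
        Some(TokenKind::Identifier(_))
        | Some(TokenKind::NumericLiteral(_))
        | Some(TokenKind::RegularExpressionLiteral(_, _)) => false,
        // A few keywords act as operands; others (`return`, `typeof`, ...)
        // expect an expression next.
        Some(TokenKind::Keyword(k)) => {
            !matches!(k.as_str(), "this" | "super" | "true" | "false" | "null")
        }
        // Closing brackets usually end an expression (not always: see above).
        Some(TokenKind::Punctuator(p)) => !matches!(p.as_str(), ")" | "]" | "}"),
    }
}
```

With this rule, the / after the 1 in 1 / a === 1 / b is a division, because the previous token is a numeric literal, while the / in x = /ab+c/ starts a regex, because the previous token is the = punctuator.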

Razican on 1 Apr 2020

👍1

> We probably need a context-aware lexer or something like that. Any thoughts?

I'm sure the lexer is already context-aware in some places, so it shouldn't be too hard to look at what comes before the / and work it out based on that. Basically what @Razican said.

jasonwilliams on 2 Apr 2020

👍2

There seems to be some information on context-aware lexical grammar in the spec. I will review this today and see if I can improve the lexer.

Razican on 2 Apr 2020

👍1

https://v8.dev/blog/understanding-ecmascript-part-3
Funnily enough this post came out talking about the same thing.

jasonwilliams on 3 Apr 2020

❤2 😄1

Reading this, it seems that the parser needs to call the lexer, and we cannot have a full list of tokens before calling the parser. So this clearly needs a rewrite of the way the parser and lexer work together.
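
A rough sketch of what that interaction could look like, under stated assumptions: the Goal, Token and Lexer types and the lex_expression driver are hypothetical and are not Boa's API. The two goals are modelled on InputElementDiv and InputElementRegExp from the ECMAScript spec; the other goals, regex flags, escapes and character classes are all left out. The point is only that the lexer produces one token per call, and the caller says how a / should be read.

```rust
/// Lexical goal requested by the parser (other spec goals omitted).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Goal {
    Div,
    RegExp,
}

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Identifier(String),
    Punctuator(String),
    RegExp(String),
}

struct Lexer<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { src, pos: 0 }
    }

    /// Consumes characters while `pred` holds and returns the consumed slice.
    fn take_while(&mut self, pred: impl Fn(char) -> bool) -> &'a str {
        let src = self.src;
        let start = self.pos;
        while let Some(c) = src[self.pos..].chars().next() {
            if !pred(c) {
                break;
            }
            self.pos += c.len_utf8();
        }
        &src[start..self.pos]
    }

    /// Lexes exactly one token, on demand, under the goal the caller supplies.
    fn next_token(&mut self, goal: Goal) -> Option<Token> {
        let src = self.src;
        self.take_while(char::is_whitespace);
        let c = src[self.pos..].chars().next()?;
        if c.is_ascii_digit() {
            let text = self.take_while(|c| c.is_ascii_digit() || c == '.');
            return text.parse().ok().map(Token::Number);
        }
        if c.is_alphabetic() || c == '_' || c == '$' {
            let text = self.take_while(|c| c.is_alphanumeric() || c == '_' || c == '$');
            return Some(Token::Identifier(text.to_string()));
        }
        if c == '/' && goal == Goal::RegExp {
            self.pos += 1; // opening slash
            let body = self.take_while(|c| c != '/');
            if self.pos >= src.len() {
                return None; // unterminated regex; a real lexer would report an error
            }
            self.pos += 1; // closing slash
            return Some(Token::RegExp(body.to_string()));
        }
        // Punctuators: only `===` and single characters in this sketch.
        let rest = &src[self.pos..];
        let len = if rest.starts_with("===") { 3 } else { c.len_utf8() };
        self.pos += len;
        Some(Token::Punctuator(rest[..len].to_string()))
    }
}

/// Stand-in for a parser. A real parser would pick the goal from the grammar
/// production it is in; here it only tracks "operand expected" vs "operator expected".
fn lex_expression(src: &str) -> Vec<Token> {
    let mut lexer = Lexer::new(src);
    let mut tokens = Vec::new();
    let mut goal = Goal::RegExp; // at the start an operand is expected
    while let Some(token) = lexer.next_token(goal) {
        goal = match token {
            Token::Punctuator(_) => Goal::RegExp,
            _ => Goal::Div,
        };
        tokens.push(token);
    }
    tokens
}
```

In this sketch, lex_expression("1 / a === 1 / b") yields seven tokens with both slashes as punctuators, whereas asking the same lexer for a token under Goal::RegExp at the start of "/ a === 1 / b" yields a regex with the body " a === 1 ", which is the token shown in the dump above.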

I would say we can release version 0.7 with this known limitation, and we can work on this later. I'm still working on the parser modularization, which I think is a good place to start.

I think that String interning would also help with this, as we could maybe have Tokens that are Copy, and therefore ease their manipulation from one side to another.
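
To make the interning idea concrete, here is a small sketch. Every name in it (Sym, Interner, get_or_intern, resolve, and this TokenKind) is hypothetical and does not describe Boa's types. Each distinct string is stored once and tokens carry only an integer handle, so the token kind can derive Copy.

```rust
use std::collections::HashMap;

/// Handle to an interned string; `Copy` because it is just an index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Sym(u32);

/// Stores each distinct string once and hands out `Sym` handles.
#[derive(Default)]
struct Interner {
    map: HashMap<String, Sym>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing handle for `s`, or stores `s` and returns a new one.
    fn get_or_intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Sym(self.strings.len() as u32);
        self.strings.push(s.to_string());
        self.map.insert(s.to_string(), sym);
        sym
    }

    /// Looks up the string behind a handle produced by this interner.
    fn resolve(&self, sym: Sym) -> &str {
        &self.strings[sym.0 as usize]
    }
}

/// Token kind with no owned strings, so it can be `Copy`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    Identifier(Sym),
    NumericLiteral(f64),
    Punctuator(Sym),
}
```

The trade-off is that anything that needs the text of a token, such as an error message, has to go through the interner, so the interner must be reachable wherever tokens are printed.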

Razican on 3 Apr 2020

❤2
