Taking the code from the harness/assert.js file in the test262 suite, the following fails to parse:
function assert(mustBeTrue, message) {
  if (mustBeTrue === true) {
    return;
  }
  if (message === undefined) {
    message = 'Expected true but got ' + assert._toString(mustBeTrue);
  }
  $ERROR(message);
}
assert._isSameValue = function (a, b) {
  if (a === b) {
    // Handle +/-0 vs. -/+0
    return a !== 0 || 1 / a === 1 / b;
  }
  // Handle NaN vs. NaN
  return a !== a && b !== b;
};
With the error:
ParsingError: Expected one of ';', '}' or 'line terminator', got 'b' at line 24, col 24
Interestingly, the error is given in column 24 (marked here):
return a !== 0 || 1 / a === 1 / b;
^
But I'm guessing the issue is with the last b. I think this is a regression, since this particular file parsed fine before the rewrite, but it might have been something new in the file. Once fixed, we need to add a test for it.
I'll work on fixing this. Ideally, this should be fixed before Boa 0.7.
After some debugging, the error seems to come from:
1 / a === 1 / b
Checking the token stream: cargo run -- --dump-tokens
[
    Token {
        kind: NumericLiteral(
            1.0,
        ),
        pos: Position {
            column_number: 1,
            line_number: 1,
        },
    },
    Token {
        kind: RegularExpressionLiteral(
            " a === 1 ",
            "",
        ),
        pos: Position {
            column_number: 3,
            line_number: 1,
        },
    },
    ...
]
It seems to be a lexer bug, not a parser one: it lexes / a === 1 / as a regex.
Hope that helps. :)
Good find!
I do like the AST and token output we have now.
Do we need to refactor the lexer now? 😂
I'm not exactly sure what the lexer is supposed to do in this situation, since / a === 1 / is a valid regex on its own, but not in the context of 1 / a === 1 / b.
We probably need a context-aware lexer or something like that. Any thoughts?
Hmm, interesting. Maybe we could check which tokens can precede a regex, and see whether the previous token is one of those?
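A minimal sketch of the previous-token idea, assuming a hypothetical Prev summary of the last token (this is not Boa's actual TokenKind). A / that follows a number, a string, a plain identifier, ), ], }, or postfix ++/-- is read as division. At the start of input, after any other punctuator, or after keywords such as return or typeof, it starts a regex. The keyword list is an illustrative subset, not an exhaustive one.

```rust
/// Hypothetical summary of the previous token (not Boa's `TokenKind`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Prev<'a> {
    NumericLiteral,
    StringLiteral,
    /// Identifiers and keywords alike, as written in the source.
    Identifier(&'a str),
    Punctuator(&'a str),
}

/// Illustrative subset of keywords after which an expression (and so a regex) may start.
/// `yield` and `await` are omitted because whether they act as keywords depends on context.
const REGEX_KEYWORDS: &[&str] = &[
    "return", "typeof", "instanceof", "in", "new", "delete", "void", "throw", "case", "do", "else",
];

/// Heuristic: does a `/` following `prev` start a regex literal (true) or a division (false)?
pub fn slash_starts_regex(prev: Option<Prev<'_>>) -> bool {
    match prev {
        // Start of input: an expression may begin here.
        None => true,
        // A `/` right after a complete operand is division.
        Some(Prev::NumericLiteral) | Some(Prev::StringLiteral) => false,
        // Plain identifiers are operands; selected keywords expect an expression next.
        Some(Prev::Identifier(name)) => REGEX_KEYWORDS.iter().any(|k| *k == name),
        // `)`, `]`, `}` and postfix `++`/`--` usually close an operand; other punctuators
        // (operators, `(`, `,`, `=`, ...) expect an operand to follow.
        Some(Prev::Punctuator(p)) => !matches!(p, ")" | "]" | "}" | "++" | "--"),
    }
}
```

For 1 / a === 1 / b, both slashes follow a numeric literal, so both are read as division. The heuristic is known to misjudge some inputs, though: in if (x) /re/.test(s) a regex follows ), and after a block's closing } a statement can begin with a regex. Those cases are why the spec lets the syntactic grammar decide rather than the previous token.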
I'm sure the lexer is already context-aware in some places, so it shouldn't be too hard to look at what comes before the / and work it out from that. Basically what @Razican said.
There seems to be some information on context-aware lexical grammar in the spec. I will review this today and see if I can improve the lexer.
https://v8.dev/blog/understanding-ecmascript-part-3
Funnily enough, this post just came out and covers the same thing.
Reading this, it seems that the parser needs to drive the lexer, telling it which kind of token it expects next, so we cannot produce the full list of tokens before parsing starts. This clearly requires rewriting how the parser and lexer work together.
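A sketch of a parser-driven lexer, assuming hypothetical Lexer, Goal and Tok types (not Boa's API). The caller passes a goal with every request, mirroring the spec's InputElementDiv and InputElementRegExp goal symbols, and the lexer only reads / as the start of a regex under Goal::RegExp. tokenize_expression is a stand-in for the parser: it asks for Div right after an operand and RegExp otherwise. Regex escapes, character classes and most multi-character punctuators are ignored to keep it short.

```rust
/// Which lexical goal the parser wants, mirroring InputElementDiv / InputElementRegExp.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Goal {
    Div,
    RegExp,
}

/// Toy token kinds (hypothetical; not Boa's `TokenKind`).
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Num(f64),
    Ident(String),
    Punct(String),
    Regex(String, String), // (body, flags)
    Eof,
}

pub struct Lexer {
    chars: Vec<char>,
    pos: usize,
}

impl Lexer {
    pub fn new(src: &str) -> Self {
        Lexer { chars: src.chars().collect(), pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    /// Returns the next token; the caller (the parser) decides how a `/` is read.
    pub fn next_token(&mut self, goal: Goal) -> Tok {
        while matches!(self.peek(), Some(c) if c.is_whitespace()) {
            self.pos += 1;
        }
        let c = match self.peek() {
            Some(c) => c,
            None => return Tok::Eof,
        };
        let start = self.pos;
        if c.is_ascii_digit() {
            while matches!(self.peek(), Some(d) if d.is_ascii_digit() || d == '.') {
                self.pos += 1;
            }
            let text: String = self.chars[start..self.pos].iter().collect();
            Tok::Num(text.parse().unwrap_or(f64::NAN))
        } else if c.is_alphabetic() || c == '_' || c == '$' {
            while matches!(self.peek(), Some(d) if d.is_alphanumeric() || d == '_' || d == '$') {
                self.pos += 1;
            }
            Tok::Ident(self.chars[start..self.pos].iter().collect())
        } else if c == '/' && goal == Goal::RegExp {
            // Regex body runs to the next `/` (escapes and classes ignored in this toy).
            self.pos += 1;
            while matches!(self.peek(), Some(d) if d != '/') {
                self.pos += 1;
            }
            let body: String = self.chars[start + 1..self.pos].iter().collect();
            if self.peek() == Some('/') {
                self.pos += 1; // skip the closing `/`
            }
            let flags_start = self.pos;
            while matches!(self.peek(), Some(d) if d.is_alphabetic()) {
                self.pos += 1;
            }
            Tok::Regex(body, self.chars[flags_start..self.pos].iter().collect())
        } else if "=!<>".contains(c) {
            // Runs like `===` or `!==` become one punctuator.
            while matches!(self.peek(), Some(d) if "=!<>".contains(d)) {
                self.pos += 1;
            }
            Tok::Punct(self.chars[start..self.pos].iter().collect())
        } else {
            // Any other single character, including `/` under the Div goal.
            self.pos += 1;
            Tok::Punct(c.to_string())
        }
    }
}

/// Stand-in for the parser: after an operand it asks for `Goal::Div`, otherwise `Goal::RegExp`.
pub fn tokenize_expression(src: &str) -> Vec<Tok> {
    let mut lexer = Lexer::new(src);
    let mut toks = Vec::new();
    let mut after_operand = false;
    loop {
        let goal = if after_operand { Goal::Div } else { Goal::RegExp };
        let tok = lexer.next_token(goal);
        if tok == Tok::Eof {
            break;
        }
        after_operand = matches!(tok, Tok::Num(_) | Tok::Ident(_) | Tok::Regex(..));
        toks.push(tok);
    }
    toks
}
```

With this split, 1 / a === 1 / b lexes as two division punctuators, while x === /a b/g still yields a regex. Calling next_token(Goal::RegExp) for every token instead reproduces the " a === 1 " regex from the token dump above.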
I would say we can release version 0.7 with this known limitation and work on it later. I'm still working on the parser modularization, which I think is a good starting point.
I think string interning would also help with this: tokens could then be Copy, which would make it easier to pass them back and forth between the lexer and the parser.
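A minimal sketch of how interning could make tokens Copy, assuming a hypothetical Interner and Sym (not an existing Boa API). Each distinct identifier or string is stored once, and tokens carry only a small index into that table.

```rust
use std::collections::HashMap;

/// Hypothetical interned-string handle: just an index, so it is `Copy`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Sym(u32);

/// Stores each distinct string once and hands out `Sym` handles for it.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Sym>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing handle for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Sym(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), sym);
        sym
    }

    /// Looks the original text back up, e.g. for error messages.
    pub fn resolve(&self, sym: Sym) -> &str {
        &self.strings[sym.0 as usize]
    }
}

/// With identifiers and strings stored as `Sym`, the token kind can derive `Copy`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenKind {
    NumericLiteral(f64),
    Identifier(Sym),
    StringLiteral(Sym),
    Punctuator(char),
}
```

The trade-off is that the interner has to be shared by the lexer and parser (and kept around for error reporting), but tokens themselves become cheap to copy and compare.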