Boa: Streaming Parsing (from Lexer to Parser)

Created on 30 Mar 2020 · 9 Comments · Source: boa-dev/boa

Currently, Boa's Lexer runs to completion, and only once the array of tokens is filled is it sent over to the parser.

However, the parser doesn't need to sit idle while the Lexer is running: it can begin working through the tokens as they come through.

This is entirely inspired by how Go parses its templates; Rob Pike talks about it in more detail here:
https://youtu.be/HxaD_trXwRE?t=960
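In the talk, the lexer runs concurrently and hands tokens to the parser over a channel. A rough Rust analogue of that idea, using std::sync::mpsc and std::thread, might look like the sketch below. Token, spawn_lexer and parse_sum are hypothetical names for illustration, not Boa's API, and the "parser" only sums numbers.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

// Hypothetical token type for this sketch; Boa's real Token is much richer.
#[derive(Debug, PartialEq)]
enum Token {
    Num(i64),
    Plus,
}

/// Lexer stage: runs on its own thread and sends each token as soon as it is
/// recognised, instead of filling a Vec first.
fn spawn_lexer(src: String) -> Receiver<Token> {
    // Bounded buffer: the lexer can run at most 16 tokens ahead of the parser.
    let (tx, rx) = mpsc::sync_channel(16);
    thread::spawn(move || {
        let mut chars = src.chars().peekable();
        while let Some(&c) = chars.peek() {
            let tok = if c.is_ascii_digit() {
                let mut n = 0i64;
                while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                    n = n * 10 + i64::from(d);
                    chars.next();
                }
                Token::Num(n)
            } else {
                chars.next();
                if c == '+' { Token::Plus } else { continue } // skip whitespace etc.
            };
            if tx.send(tok).is_err() {
                return; // the parser hung up; stop lexing
            }
        }
        // `tx` is dropped here, which closes the channel.
    });
    rx
}

/// Parser stage: consumes tokens while the lexer is still producing them.
/// This toy "parser" only evaluates sums such as `1 + 22 + 300`.
fn parse_sum(tokens: Receiver<Token>) -> i64 {
    let mut total = 0;
    // Each iteration blocks until a token arrives; the loop ends once the
    // lexer thread drops its sender.
    for tok in tokens {
        if let Token::Num(n) = tok {
            total += n;
        }
    }
    total
}

fn main() {
    let total = parse_sum(spawn_lexer("1 + 22 + 300".to_string()));
    println!("{total}"); // prints 323
}
```

The bounded sync_channel is a design choice for the sketch: it gives back-pressure, so a fast lexer cannot buffer the whole file ahead of a slow parser.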

Parser Cursor

Does the cursor need to change?
Can it work in its current state? @Razican is working on the updated cursor right now.
https://github.com/jasonwilliams/boa/pull/287

Can Rust's Async/Await help here?

Maybe; I'm not sure.

Diagram (image not included)

@Razican @HalidOdat thoughts?

Labels: enhancement, help wanted, parser

All 9 comments

About the cursor: it might need to return a Result on each call, but otherwise the external API could stay largely unchanged.
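A minimal sketch of what "return a Result on each call" could mean, assuming a hypothetical Cursor that lexes one token per next() call. Token, LexError, Cursor and collect_tokens are illustrative names, not Boa's actual types; the point is that the parser side mostly just gains a ? operator.

```rust
// Hypothetical types for this sketch only; they are not Boa's actual API.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Punct(char),
}

#[derive(Debug, PartialEq)]
struct LexError {
    pos: usize, // byte offset of the offending character
    ch: char,
}

/// A cursor that lexes lazily: every call to `next()` lexes exactly one token,
/// so a lexing error surfaces when the parser asks for that token.
struct Cursor<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(src: &'a str) -> Self {
        Cursor { src, pos: 0 }
    }

    /// Ok(Some(tok)): next token; Ok(None): end of input; Err: lexing failed.
    fn next(&mut self) -> Result<Option<Token>, LexError> {
        let src = self.src;
        let rest = src[self.pos..].trim_start();
        self.pos = src.len() - rest.len(); // skip leading whitespace
        let Some(c) = rest.chars().next() else {
            return Ok(None);
        };
        if c.is_alphabetic() {
            let end = rest.find(|ch: char| !ch.is_alphanumeric()).unwrap_or(rest.len());
            self.pos += end;
            Ok(Some(Token::Ident(rest[..end].to_string())))
        } else if "(){};,".contains(c) {
            self.pos += c.len_utf8();
            Ok(Some(Token::Punct(c)))
        } else {
            Err(LexError { pos: self.pos, ch: c })
        }
    }
}

/// The parser side barely changes: it just propagates errors with `?`.
fn collect_tokens(src: &str) -> Result<Vec<Token>, LexError> {
    let mut cursor = Cursor::new(src);
    let mut tokens = Vec::new();
    while let Some(tok) = cursor.next()? {
        tokens.push(tok);
    }
    Ok(tokens)
}

fn main() {
    println!("{:?}", collect_tokens("foo(bar);"));
    println!("{:?}", collect_tokens("a # b"));
}
```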

We might also need to add an EOF token to the stream if we don't want the parser to wait indefinitely for more tokens. On the other hand, we might want exactly that.

This could be done with multiple threads: lexing in one thread, parsing in another and executing in a third. Making the parser itself concurrent might prove more difficult, though.
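The three-thread layout described above can be sketched with two std::sync::mpsc channels: lexer to parser, and parser to executor. The toy language here (single-digit sums terminated by ;) and the names Token, Stmt and run_pipeline are assumptions for illustration only; a real JavaScript executor would also need the parser to hand over complete statements, which is what Stmt stands in for.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical toy language for this sketch: sum statements such as "1+2;".
#[derive(Debug)]
enum Token {
    Num(i64),
    Plus,
    Semi,
}

// One parsed statement: the operands of a single sum expression.
#[derive(Debug)]
struct Stmt(Vec<i64>);

/// Runs lexer, parser and executor on three separate threads connected by channels.
fn run_pipeline(src: &str) -> Vec<i64> {
    let (tok_tx, tok_rx) = mpsc::channel::<Token>();
    let (stmt_tx, stmt_rx) = mpsc::channel::<Stmt>();
    let src = src.to_string();

    // Stage 1: lexer (single-digit numbers only, for brevity).
    let lexer = thread::spawn(move || {
        for c in src.chars() {
            let tok = match c {
                '0'..='9' => Token::Num(i64::from(c.to_digit(10).unwrap())),
                '+' => Token::Plus,
                ';' => Token::Semi,
                _ => continue, // skip whitespace and anything unknown
            };
            tok_tx.send(tok).unwrap();
        }
    });

    // Stage 2: parser, emits a statement every time it reaches ';'.
    // Operands after the last ';' are discarded in this toy.
    let parser = thread::spawn(move || {
        let mut operands = Vec::new();
        for tok in tok_rx {
            match tok {
                Token::Num(n) => operands.push(n),
                Token::Plus => {}
                Token::Semi => stmt_tx.send(Stmt(std::mem::take(&mut operands))).unwrap(),
            }
        }
    });

    // Stage 3: executor, evaluates each statement as soon as it is parsed.
    let executor = thread::spawn(move || {
        stmt_rx
            .iter()
            .map(|Stmt(ops)| ops.iter().sum::<i64>())
            .collect::<Vec<i64>>()
    });

    lexer.join().unwrap();
    parser.join().unwrap();
    executor.join().unwrap()
}

fn main() {
    println!("{:?}", run_pipeline("1+2; 3+4;"));
}
```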

The lexer can close the channel once it's finished; this signals to the parser that there is nothing more to come down the pipe.
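With Rust's std::sync::mpsc, "closing the channel" happens when every Sender is dropped: after the buffered items are drained, Receiver::recv() returns Err(RecvError) instead of blocking. That can stand in for an explicit EOF token. A small sketch; drain and the string tokens are hypothetical.

```rust
use std::sync::mpsc::{self, RecvError};
use std::thread;

/// Drains a token channel, returning the tokens plus the error recv() reported
/// once the lexer had hung up.
fn drain(rx: mpsc::Receiver<&'static str>) -> (Vec<&'static str>, RecvError) {
    let mut toks = Vec::new();
    loop {
        match rx.recv() {
            Ok(tok) => toks.push(tok),
            // Err(RecvError): every Sender is gone and the buffer is empty,
            // so the lexer is done and no EOF token is needed.
            Err(e) => return (toks, e),
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let lexer = thread::spawn(move || {
        for tok in ["let", "x", "=", "1", ";"] {
            tx.send(tok).unwrap();
        }
        // `tx` is dropped when this closure returns, closing the channel.
    });
    let (toks, _closed) = drain(rx);
    lexer.join().unwrap();
    println!("{toks:?}");
}
```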

Due to the way parsing and lexing are related (goal symbols), this may not be possible.

Related:
https://github.com/jasonwilliams/boa/issues/294


I think the parser could instead request new tokens from the lexer via calls to next() on the cursor.
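Because the parser drives each next() call in this model, it can also tell the lexer which goal symbol applies at that point, which is one way around the goal-symbol concern above. A minimal sketch, assuming a hypothetical Lexer whose next() takes a Goal argument. The two variant names follow the ECMAScript lexical goal symbols InputElementDiv and InputElementRegExp, but only the handling of / is modelled (no escapes, flags or other punctuators).

```rust
// Two of the ECMAScript lexical goal symbols; the others are omitted here.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Goal {
    InputElementDiv,
    InputElementRegExp,
}

// Hypothetical token type for this sketch.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Div,
    RegExp(String),
}

struct Lexer<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { src, pos: 0 }
    }

    /// Lexes one token on demand; the caller (the parser) picks the goal.
    fn next(&mut self, goal: Goal) -> Option<Token> {
        let src = self.src;
        let rest = src[self.pos..].trim_start();
        self.pos = src.len() - rest.len();
        let c = rest.chars().next()?;
        if c == '/' {
            if goal == Goal::InputElementRegExp {
                // Scan to the closing '/' (no escapes or flags, for brevity).
                let end = rest[1..].find('/')? + 2;
                self.pos += end;
                return Some(Token::RegExp(rest[..end].to_string()));
            }
            self.pos += 1;
            return Some(Token::Div);
        }
        if !c.is_alphanumeric() {
            return None; // other punctuators are out of scope for this sketch
        }
        let end = rest.find(|ch: char| !ch.is_alphanumeric()).unwrap_or(rest.len());
        self.pos += end;
        Some(Token::Ident(rest[..end].to_string()))
    }
}

fn main() {
    // The same characters lex differently depending on the goal.
    let mut as_div = Lexer::new("/ab/");
    println!("{:?}", as_div.next(Goal::InputElementDiv));
    let mut as_re = Lexer::new("/ab/");
    println!("{:?}", as_re.next(Goal::InputElementRegExp));
}
```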

I may close this, seeing as we can't do it that way due to the lexer needing context.


Actually, I think this is now more important than ever!

Instead of "streaming", we could do "lazy" lexing, with the parser iterating over the tokens and changing the lexer's context as needed. @maciejhirsz proposed using Logos for this, and I think it makes sense. I'm not sure whether there has been any progress on that front.

I've been on holiday since yesterday, so I should finally have time to dig into Boa :)

If anything, streaming (iterating over) the tokens during parsing makes things easier, since the Parser can supply the Lexer with context when needed. The main culprit in JS is the regex literal, and the rule here is fairly simple: any expression that begins with the division token / (that is, a unary prefix expression with operator /) should switch the lexer into regex mode. This is fairly easy to integrate into standard recursive Pratt parsing for nested expressions.
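A toy illustration of that rule inside a Pratt loop: the parser lexes in regex mode only in prefix position (where an expression begins) and in division mode in infix position, rewinding when it stops at a token it won't consume. This is a hypothetical sketch supporting only identifiers, integers, +, / and parentheses, producing S-expressions; it is not Boa's parser, and the regex scan ignores escapes, character classes and flags.

```rust
#[derive(Debug, PartialEq)]
enum Tok {
    Num(i64),
    Ident(String),
    Regex(String),
    Slash,
    Plus,
    LParen,
    RParen,
    Eof,
}

struct Parser<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> Parser<'a> {
    /// Lexes one token; `regex_mode` is chosen by the parser, not the lexer.
    fn lex(&mut self, regex_mode: bool) -> Tok {
        let src = self.src;
        let rest = src[self.pos..].trim_start();
        self.pos = src.len() - rest.len();
        let Some(c) = rest.chars().next() else { return Tok::Eof };
        let (tok, len) = match c {
            '/' if regex_mode => {
                let end = rest[1..].find('/').expect("unterminated regex") + 2;
                (Tok::Regex(rest[..end].to_string()), end)
            }
            '/' => (Tok::Slash, 1),
            '+' => (Tok::Plus, 1),
            '(' => (Tok::LParen, 1),
            ')' => (Tok::RParen, 1),
            _ if c.is_alphanumeric() => {
                let end = rest.find(|ch: char| !ch.is_alphanumeric()).unwrap_or(rest.len());
                let word = &rest[..end];
                match word.parse() {
                    Ok(n) => (Tok::Num(n), end),
                    Err(_) => (Tok::Ident(word.to_string()), end),
                }
            }
            _ => panic!("unexpected character {c:?}"),
        };
        self.pos += len;
        tok
    }

    /// Pratt loop: `min_bp` is the minimum binding power an infix operator needs.
    fn expr(&mut self, min_bp: u8) -> String {
        // Prefix position: a leading '/' starts a regex literal, not a division.
        let mut lhs = match self.lex(true) {
            Tok::Num(n) => n.to_string(),
            Tok::Ident(s) | Tok::Regex(s) => s,
            Tok::LParen => {
                let inner = self.expr(0);
                assert_eq!(self.lex(false), Tok::RParen, "expected ')'");
                inner
            }
            t => panic!("unexpected token in prefix position: {t:?}"),
        };
        loop {
            // Infix position: lex in division mode, rewinding if we stop here.
            let save = self.pos;
            let (op, bp) = match self.lex(false) {
                Tok::Plus => ("+", 1),
                Tok::Slash => ("/", 2),
                _ => {
                    self.pos = save;
                    break;
                }
            };
            if bp < min_bp {
                self.pos = save;
                break;
            }
            let rhs = self.expr(bp + 1); // bp + 1 makes operators left-associative
            lhs = format!("({op} {lhs} {rhs})");
        }
        lhs
    }
}

fn parse(src: &str) -> String {
    let mut p = Parser { src, pos: 0 };
    let out = p.expr(0);
    assert_eq!(p.lex(false), Tok::Eof, "trailing input");
    out
}

fn main() {
    for src in ["a / b / c", "/ab/ + 1", "x / /re/", "(a + b) / 2"] {
        println!("{src} => {}", parse(src));
    }
}
```

Note that the lexer never guesses: the same / character becomes Tok::Slash or the start of Tok::Regex purely based on which position the parser is in when it asks.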


Let us know if we can be of any help :)
