Antlr4: JavaScript target usability and browser support

Created on 27 Jan 2019 · 14 Comments · Source: antlr/antlr4

I've been an antlr user for a long time now. I started about 7 years ago, I think. Having done extensive research back then, I came to the conclusion that antlr was the best tool for writing parsers and compilers in Java. Over these past 7 years I have become even more convinced that antlr is an amazing tool. It has been my go-to tool for a long time.

Recently I found myself needing a small DSL in a browser-based JavaScript project. I followed all the documentation and ended up with a broken project. The antlr4 npm module requires the "fs" module, which has no equivalent in a modern browser. I tracked down the usages of the "fs" (file system) module, and they seem quite frivolous: convenience helpers for working directly with the file system, things that just about any node.js programmer should be able to do themselves, and if not, well, you probably won't understand how to use those convenience methods either.

Following the documentation at https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md, you find that the only ways to make the module (antlr4) work in a browser are to either use webpack or not use any packager/builder at all. That is quite poor. If you do use webpack, you still end up with those utility methods that rely on fs. They will now fail, though, as fs will be replaced with null:

/antlr4/CharStreams.js:

....
var isNodeJs = typeof window === 'undefined' && typeof importScripts === 'undefined';
var fs = isNodeJs ? require("fs") : null;
....

  // Asynchronously creates an InputStream from a file on disk given
  // the encoding of the bytes in that file (defaults to 'utf8' if
  // encoding is null).
  //
  // Invokes callback(error, result) on completion.
  fromPath: function(path, encoding, callback) {
    fs.readFile(path, encoding, function(err, data) {
      var is = null;
      if (data !== null) {
        is = new InputStream(data, true);
      }
      callback(err, is);
    });
  },

  // Synchronously creates an InputStream given a path to a file
  // on disk and the encoding of the bytes in that file (defaults to
  // 'utf8' if encoding is null).
  fromPathSync: function(path, encoding) {
    var data = fs.readFileSync(path, encoding);
    return new InputStream(data, true);
  }
...

As you can see, those methods will simply fail with a "no property XYZ on null (fs)" exception.

One way to address this would be to move the require statements into the methods that use them; this would at least not break bundling. Personally, I feel those methods should be split out into a separate set of utility classes.
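
A minimal sketch of the first suggestion: resolve `fs` inside the methods that need it, so that merely loading the module never touches the file system API. The `InputStream` stand-in and the `loadFs` helper are hypothetical names used only for illustration; this is not the runtime's actual code.

```js
// Hypothetical stand-in for the runtime's InputStream, to keep the sketch self-contained.
function InputStream(data, decodeToUnicodeCodePoints) {
    this.data = data;
    this.decodeToUnicodeCodePoints = decodeToUnicodeCodePoints;
}

// Hypothetical helper: resolve "fs" lazily, only when a file-based method is called.
function loadFs() {
    var isNodeJs = typeof window === 'undefined' && typeof importScripts === 'undefined';
    if (!isNodeJs) {
        throw new Error('File-based streams are only available in Node.js');
    }
    return require('fs');
}

var CharStreams = {
    // Works in every environment: no file system involved.
    fromString: function (str) {
        return new InputStream(str, true);
    },

    // Node.js only: "fs" is required here rather than at module load time.
    fromPathSync: function (path, encoding) {
        var data = loadFs().readFileSync(path, encoding || 'utf8');
        return new InputStream(data, true);
    }
};
```

With this shape, calling a file-based method in a browser fails with an explicit message instead of a property access on null. A bundler such as webpack may still see the `require('fs')` call statically, so a bundler-side setting may still be needed.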

All the code is written in an outdated style. node.js has supported ES6 modules for a while now; they allow more granular imports/exports than CommonJS-style require modules and are also supported in browsers.

In browsers, the amount of code you pull in matters. Having a monolithic library with everything inside of it is not desirable. You would prefer to import just the pieces that you need and bundle as little of the library in your distribution as possible.

Instead this is what antlr4 looks like:
```js
exports.atn = require('./atn/index');
exports.codepointat = require('./polyfills/codepointat');
exports.dfa = require('./dfa/index');
exports.fromcodepoint = require('./polyfills/fromcodepoint');
exports.tree = require('./tree/index');
exports.error = require('./error/index');
exports.Token = require('./Token').Token;
exports.CharStreams = require('./CharStreams').CharStreams;
exports.CommonToken = require('./Token').CommonToken;
exports.InputStream = require('./InputStream').InputStream;
exports.FileStream = require('./FileStream').FileStream;
exports.CommonTokenStream = require('./CommonTokenStream').CommonTokenStream;
exports.Lexer = require('./Lexer').Lexer;
exports.Parser = require('./Parser').Parser;
var pc = require('./PredictionContext');
exports.PredictionContextCache = pc.PredictionContextCache;
exports.ParserRuleContext = require('./ParserRuleContext').ParserRuleContext;
exports.Interval = require('./IntervalSet').Interval;
exports.Utils = require('./Utils');
```

Generated code suffers from similar problems. Here's a sample from the lexer:
```js
// Generated from Reactive.g4 by ANTLR 4.7.1
// jshint ignore: start
var antlr4 = require('antlr4/index');
```

Now we just pulled in the entire library. This is followed by:

var serializedATN = ["\u0003\u608b\ua72a\u8133\ub9ed\u417c\u3be7\u7786\u5964",
    "\u0002\u001b\u00a4\b\u0001\u0004\u0002\t\u0002\u0004\u0003\t\u0003\u0004",
    // another 98 lines like this
    "\u0002"].join("");

There are a couple of reasons why this is bad:

  • it's not readable
  • it could be replaced by a single string, which would have better performance and would take less space

    • if this is binary (is it?), it could be replaced with Base64 encoding instead, which is a much more common way of representing binary data.
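
To illustrate the two suggestions above, here is a small sketch with made-up data (the `chunks` array is a shortened placeholder, not a real serialized ATN). It shows that joining the chunks yields the same value as one string literal, and how the 16-bit code units could be round-tripped through Base64 using Node's `Buffer`. The helper names `toBase64` and `fromBase64` are hypothetical.

```js
// Shortened placeholder data, not a real serialized ATN.
var chunks = ["\u0003\u608b\ua72a", "\u0002\u001b\u00a4"];
var joined = chunks.join("");
var single = "\u0003\u608b\ua72a\u0002\u001b\u00a4";

// Encode each UTF-16 code unit as two little-endian bytes, then Base64 the bytes.
function toBase64(str) {
    var buf = Buffer.alloc(str.length * 2);
    for (var i = 0; i < str.length; i++) {
        buf.writeUInt16LE(str.charCodeAt(i), i * 2);
    }
    return buf.toString("base64");
}

// Reverse of toBase64: read the bytes back as 16-bit code units.
// (apply() is fine for a short sketch; very long inputs would need chunking.)
function fromBase64(b64) {
    var buf = Buffer.from(b64, "base64");
    var units = [];
    for (var i = 0; i < buf.length; i += 2) {
        units.push(buf.readUInt16LE(i));
    }
    return String.fromCharCode.apply(null, units);
}
```

In source form, a `\uXXXX` escape costs 6 characters per code unit, while Base64 over 2 bytes per code unit costs roughly 2.7. Whether that still matters after gzip would have to be measured. A browser build would need a `Buffer` replacement for the decoding step.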

The parser has similar issues. Beyond that, there's a fair amount of repetition across the lexer and parser, as well as within the parser.

Here's an example snippet that works fine in Java or any other compiled language, but is quite offensive in JS:

ReactiveParser.prototype.expression_sempred = function(localctx, predIndex) {
    switch(predIndex) {
        case 0:
            return this.precpred(this._ctx, 12);
        case 1:
            return this.precpred(this._ctx, 11);
        case 2:
            return this.precpred(this._ctx, 10);
        case 3:
            return this.precpred(this._ctx, 9);
        case 4:
            return this.precpred(this._ctx, 8);
        case 5:
            return this.precpred(this._ctx, 7);
        case 6:
            return this.precpred(this._ctx, 6);
        case 7:
            return this.precpred(this._ctx, 5);
        case 8:
            return this.precpred(this._ctx, 4);
        case 9:
            return this.precpred(this._ctx, 3);
        case 10:
            return this.precpred(this._ctx, 2);
        case 11:
            return this.precpred(this._ctx, 1);
        default:
            throw "No predicate with index:" + predIndex;
    }
};

Here's how it could look to respect code size:

ReactiveParser.prototype.expression_sempred = function (localctx, predIndex) {
    var precedence;
    switch (predIndex) {
        case 0:
            precedence = 12;
            break;
        case 1:
            precedence = 11;
            break;
        case 2:
            precedence = 10;
            break;
        case 3:
            precedence = 9;
            break;
        case 4:
            precedence = 8;
            break;
        case 5:
            precedence = 7;
            break;
        case 6:
            precedence = 6;
            break;
        case 7:
            precedence = 5;
            break;
        case 8:
            precedence = 4;
            break;
        case 9:
            precedence = 3;
            break;
        case 10:
            precedence = 2;
            break;
        case 11:
            precedence = 1;
            break;
        default:
            throw "No predicate with index:" + predIndex;
    }

    return this.precpred(this._ctx, precedence);
};

Here's another alternative:

ReactiveParser.prototype.expression_sempred = function (localctx, predIndex) {
    var precedence = [12,11,10,9,8,7,6,5,4,3,2,1][predIndex];

    if(precedence === undefined){
        throw "No predicate with index:" + predIndex;
    }

    return this.precpred(this._ctx, precedence);
};

This can be optimized by a minification tool, which can mangle precedence into something like a, resulting in a smaller code footprint and, consequently, faster load and parse times in the browser.
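
A quick way to check that the table-based variant behaves like the switch-based one is to run both against a stub in place of a real parser. Everything below is a hypothetical sketch: the switch is abbreviated to three cases, and `precpred` in the stub simply returns the precedence it was given.

```js
// Switch-based form, abbreviated to three cases for the sketch.
function sempredSwitch(localctx, predIndex) {
    switch (predIndex) {
        case 0: return this.precpred(this._ctx, 12);
        case 1: return this.precpred(this._ctx, 11);
        case 2: return this.precpred(this._ctx, 10);
        default: throw "No predicate with index:" + predIndex;
    }
}

// Table-based form; the table is hoisted so it is allocated only once.
var PRECEDENCES = [12, 11, 10];

function sempredTable(localctx, predIndex) {
    var precedence = PRECEDENCES[predIndex];
    if (precedence === undefined) {
        throw "No predicate with index:" + predIndex;
    }
    return this.precpred(this._ctx, precedence);
}

// Stub parser: precpred just echoes the precedence so results can be compared.
function makeStubParser() {
    return {
        _ctx: {},
        precpred: function (ctx, precedence) { return precedence; }
    };
}

// Returns true when both forms give the same result (or both throw) for predIndex.
function sameBehaviour(predIndex) {
    var parser = makeStubParser();
    var a, b;
    try { a = sempredSwitch.call(parser, null, predIndex); } catch (e) { a = "threw: " + e; }
    try { b = sempredTable.call(parser, null, predIndex); } catch (e) { b = "threw: " + e; }
    return a === b;
}
```

One caveat: `switch` compares with strict equality, so the two forms diverge for non-numeric keys such as the string "1", which the table lookup would accept. For the integer indices a generated parser passes in, they should agree.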

Frameworks like Chevrotain claim to be faster and lighter than others, including antlr:
https://sap.github.io/chevrotain/performance/

I think their claim need not hold. I love antlr and I'd like to see it become less awkward in JavaScript, and specifically in the browser space.

All 14 comments

I ran a bundle analyzer on my project and obtained some interesting numbers.

antlr4 module:
[image]

100k gzipped is not terrible, but it's more than I'd like to pay to be able to use my parser. For comparison, here's the entire core of the framework engine that I was working on for 5 years, in the same project; it's about the same size:
[image]
and almost 10% of that is the parser/lexer/visitor generated by antlr:
[image]

Hi
The main benefits of Antlr are:

  • left recursion in grammar
  • portability across target languages

We all know that every pro comes with a con, and the above are no exception. Their respective cons are:

  • heavy runtime
  • focus on comparable code vs language specific idioms

If you’re looking for the fastest and most compact Javascript parser, then Antlr is the wrong choice. If otoh you are fine with sacrificing a few kBs and optimizations for portability and ease of use, then I don’t think there is a better choice.

Re your other comments, I’m always surprised when people criticize code that they get for free in the same way they would for code they are paying for.

I can’t afford to dedicate 2 man/month per year of my free time only to keep the code up to date with the latest fashionable Javascript variant, and will not do so unless:

  • it measurably improves performance
  • it does not affect cross browser compatibility

But I’m totally open to pre agreed PRs

@ericvergnaud
Thank you for your response. I understand your position as the maintainer.

When it comes to comparable code: antlr is originally written in Java, and Java has a package system built in that works at a very granular level. JavaScript, until recently (ES6), did not have that. Now it does. Using said module system is not a matter of fashion, simply a matter of language evolution, like when Java did not have lambdas or C++ didn't support Unicode by default.

> focus on comparable code vs language specific idioms

JavaScript may be a language, but one also has to consider the environment. Browsers are quite different from node.js for a plethora of reasons.

> Re your other comments, I’m always surprised when people criticize code that they get for free in the same way they would for code they are paying for.

I appreciate your (I assume) work and the dedication of your time. If you believe that the things I said were not constructive, please point them out to me; I would love to clarify anything that was vague.

> it measurably improves performance

1) The size of your library does affect performance. Since JS is distributed in source form, every time you want to run a piece of code it has to be parsed, compiled and optimized. A larger library is slower to start and takes more time to optimize. That's a fairly generic argument, but it also addresses your point:

> heavy runtime

2) Code blocks with more statements in them take up more memory, which makes them less cache-friendly. This pertains to my specific example. My first suggestion had 2 statements per case (STORE_CONSTANT and JUMP) versus your many; I'm not even sure how many, but at the basic level at least 2 dereferences, a function call, 3 reads and a return which needs to put something on the stack. My point is that this code produces a lot of instructions as well as having a large code footprint.

> it does not affect cross browser compatibility

The current implementation is not browser compatible. What allows it to be used in the browser at all is webpack or a similar tool. If you rely on webpack, you can use import statements, as those can be transpiled for you automagically. If you don't want to rely on webpack, you can also distribute a pre-bundled version, built using said webpack. This is a de facto standard.

If I have come off rude, that was not my intention. For any offense to your sensibilities - I offer my apology. My intention is to help improve this project.

Have you tried the TypeScript ANTLR runtime? https://github.com/tunnelvisionlabs/antlr4ts

@KvanTTT
I tried it, it's quite good. Usability is much better, generated code is smaller in size. The runtime is larger though, mostly due to TS->JS compilation. Might improve in the future.

Please also note that the above ts runtime might not be on par with the official runtimes.

I'm working on a refactor of the antlr4 JavaScript runtime with a few goals:

  • Provide few to no breaking changes (the idea is to update to the latest syntax, and do some cleanup. That's it.)

    • update to the latest syntax and use classes

    • which are just sugar for what's in the codebase as-is but will vastly improve readability and possibly performance in some areas

  • Use JSDOC documentation for everything (and generate some docs)
  • Handle Browser vs Node environment detection
  • Use ES Modules vs CommonJS modules

    • allows the use of Rollup to generate a browser bundle, and only include the code that is required for browsers (might be all of it, but it's worth a look).

  • For Browsers, provide a pre-built bundle

    • minified for production

    • source maps for development

    • watch task for development (build bundle on change detection)

  • For NodeJS, provide pre-built (transpiled) code
  • The Antlr repository can maintain the pre-built source, and the published npm module can include the browser bundle and the nodejs library.
  • add Makefile and document build system
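
As a rough illustration of the environment-detection goal in the list above, here is a sketch that generalises the `window` / `importScripts` check already used in CharStreams.js. The function name `detectEnvironment` and the returned labels are assumptions made for this example, not part of the runtime.

```js
// Hypothetical helper: classify the environment from a global-like object.
// Taking the object as a parameter keeps the function easy to test.
function detectEnvironment(globalObject) {
    // Web workers expose importScripts() but no document.
    if (typeof globalObject.importScripts === 'function') {
        return 'worker';
    }
    // A regular browser page has both window and document.
    if (typeof globalObject.window !== 'undefined' &&
        typeof globalObject.document !== 'undefined') {
        return 'browser';
    }
    // Node.js exposes process.versions.node.
    if (typeof globalObject.process !== 'undefined' &&
        globalObject.process.versions &&
        globalObject.process.versions.node) {
        return 'node';
    }
    return 'unknown';
}
```

In real code this would typically be called once with the actual global object (for example `globalThis` where available) and the result used to decide which file or stream helpers to expose.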

I believe this to be the best first step forward for the runtime as it will allow easier refactoring in the future. This can begin long term (future) efforts to:

  • Host generated documentation
  • Unit Test with Jest
  • refactor "side effect" functions into "atomic" functions

    • including removing methods that accrete native methods

  • Provide a solution to make the library easier to work with, i.e., a base class or object that provides helper methods
  • Accrete Promise methods (to allow for the use of async/await)
  • split the JavaScript runtime into a shared core library and allow each environment to handle what they're capable of: NodeJS supports streams and Buffers natively, Browsers have WebWorkers to offload work from the main event loop thread

    • publish an npm module for core, browser, and node separately?

Of course, if the desired future is to use TypeScript (which is a great idea) that's fine too. I was just thinking less in terms of complete rewrites and more in terms of more short term improvements.
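
One of the longer-term items listed above, Promise-based methods, could look roughly like the sketch below. It wraps any function that follows the `callback(error, result)` convention, which is the shape `CharStreams.fromPath` uses. The names `withPromise`, `fakeFromPath` and `fromPathAsync` are hypothetical, and `fakeFromPath` is only a stand-in so the example runs without touching the file system.

```js
// Hypothetical helper: turn a callback(err, result) style function into one returning a Promise.
function withPromise(fn) {
    return function () {
        var self = this;
        var args = Array.prototype.slice.call(arguments);
        return new Promise(function (resolve, reject) {
            // Append the completion callback expected by the wrapped function.
            args.push(function (err, result) {
                if (err) {
                    reject(err);
                } else {
                    resolve(result);
                }
            });
            fn.apply(self, args);
        });
    };
}

// Stand-in for a callback-style loader with the shape fromPath(path, encoding, callback).
function fakeFromPath(path, encoding, callback) {
    setTimeout(function () {
        if (!path) {
            callback(new Error('no path given'), null);
        } else {
            callback(null, 'contents of ' + path);
        }
    }, 0);
}

var fromPathAsync = withPromise(fakeFromPath);
```

A caller could then write `var text = await fromPathAsync('grammar-input.txt', 'utf8');` inside an async function, while the existing callback API stays untouched.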

Hi Justin,
Thanks for this, but may I suggest that you break this down into individual items?
We can then have item-specific conversations to clarify expected benefits and potential show stoppers.
Éric

@JustinBeaudry

Sounds lovely, I'm sure a lot of people will appreciate it. I know for a fact that a lot of companies use antlr for front-end side parsing.

I would recommend not to provide anything related to multi-threading (such as Worker/WebWorker), making sure that the code can be used inside of that environment is awesome, but I would prefer to have the decision remain up to the user. Workers are heavy-weight threads, they eat up a fair bit of resources and take time to spin up/shut down, which makes them less suited for small work-loads.

As for buffers, they exist in the browser context too, just under a different name.

@ericvergnaud Agreed. Is there a better forum for this discussion? Or is this issue acceptable?

@Usnul

> I would recommend not to provide anything related to multi-threading (such as Worker/WebWorker)

Agreed. I was thinking more along the lines of providing some examples or even a script (gist?) that could be used to demonstrate this.

> As for buffers, they exists in browser context also, just are called differently.

Yes, ArrayBuffers are defined in the JavaScript spec (so supported in both), however, NodeJS Buffers have a benefit when calling .slice() on them, in that, the slice is just a "view" over the Buffer and shares the same memory. (though not sure if that's relevant at all to this project).
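
The difference described above can be seen in a few lines. This is only a sketch of the general `Buffer` / `ArrayBuffer` behaviour and is unrelated to any specific antlr4 code; the variable names are arbitrary.

```js
// Node.js Buffer: slice() returns a view onto the same memory.
var nodeBuf = Buffer.from([1, 2, 3, 4]);
var nodeView = nodeBuf.slice(0, 2);
nodeView[0] = 99; // also changes nodeBuf[0]

// ArrayBuffer: slice() copies the bytes into a new buffer.
var arrayBuf = new Uint8Array([1, 2, 3, 4]).buffer;
var copied = new Uint8Array(arrayBuf.slice(0, 2));
copied[0] = 99; // the original ArrayBuffer is untouched

// Typed arrays offer subarray() for a zero-copy view in both environments.
var typed = new Uint8Array([1, 2, 3, 4]);
var typedView = typed.subarray(0, 2);
typedView[0] = 99; // also changes typed[0]
```

So a shared-memory view is available in browsers as well, through `subarray()` on a typed array. As far as I know, recent Node.js documentation also points to `subarray()` as the preferred spelling for `Buffer`.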

This is the right place but not the right issue.
As mentioned, we need an issue per item so that each item can be discussed separately.

That sounds reasonable. I’ll create one issue for now and we’ll go from there. Thanks!

> antlr4 npm module requires "fs" module, which has no equivalent in a modern browser.

I fixed the error caused by this in my webpack config:

module.exports = {
    entry: './index.js',
    output: {
        path: './',
        filename: 'your.bundle.js'
    },
    node: {
        module: "empty",
        fs: "empty"
    }
};
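
The `node: { fs: "empty" }` setting above is the form used by webpack 4 and earlier. To my knowledge, webpack 5 dropped that option in favour of `resolve.fallback`, so a roughly equivalent config might look like the sketch below. The entry and output names are the same placeholders as in the snippet above, and the absolute output path via `path.resolve` is an assumption about the project layout.

```js
// webpack.config.js, webpack 5 style (a sketch, not taken from the ANTLR docs).
const path = require('path');

module.exports = {
    entry: './index.js',
    output: {
        // webpack expects an absolute output path.
        path: path.resolve(__dirname),
        filename: 'your.bundle.js'
    },
    resolve: {
        // Tell webpack not to look for browser replacements of these Node.js core modules.
        fallback: {
            module: false,
            fs: false
        }
    }
};
```

Either way, the file-based helpers in the runtime remain unusable in the browser; the setting only keeps the bundle from failing to build.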

Hi,

This is already documented here:
https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md

Eric
