Pegjs: Shorthand for Semantic Actions

Created on 10 Sep 2018 · 13 Comments · Source: pegjs/pegjs

It would be nice to add shorthand for semantic actions.

Say, instead of writing { return value }, we write, for example, { extract }, where extract is a function defined in the initializer.

For example:

{
  function extract(value) {
    return value;
  }
  function concat(head, tail) {
    return head ? [head].concat(tail) : [];
  }
  function toAddExpr(head, tail) {
    return { type: 'addExpr', expressions: concat(head, tail) };
  }
}

List
  = '(' _ head:Item? tail:( _ ',' _ value:Item { extract } )* _ ')' { concat }

// Another kind of list
Add
  = '(' _ head:Multiply? tail:( _ '+' _ value:Multiply { extract } )* _ ')' { toAddExpr }

First, this will let us reuse functions... in a nicer way :smile:
Second, this will make our grammar a bit more readable.

I believe it would be even more useful if the expression in the action shorthand could be a member expression, such as { foo.bar.baz }, instead of just an identifier, such as { foo }, so that grammar writers can organize their functions inside an object or even a module.

Labels: discussion, feature

All 13 comments

Interesting idea, but I personally don't think that would be clearer, since it is not obvious which arguments are being passed to the function. Also, if you needed to pass a "custom" parameter, you'd have to mix those with regular function calls, so it won't look as clean as in the simpler cases.

Someone in #235 suggested the syntax => to go along with arrow functions, which I thought was quite clean and concise.

  = '(' expr:some_expression ')' => expr
  ;

I'm leaning towards one of the following for the shorthand syntax:

  • => expr; _(needs support for syntax)_
  • { => expr } _(usable now, but needs unwrapping)_
  • { > expr } _(needs support for syntax)_

Haven't decided yet, so it's open for discussion.

As for what the OP wants, it would be best to implement a plugin (either after or before the shorthand syntax is decided) that uses Acorn or @babel/parser to unwrap the identifier or member expression, transform it into a call expression with the labels added as arguments, and return the generated code.
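A minimal sketch of the transform such a plugin might perform. To stay self-contained it swaps Acorn or @babel/parser for a regular expression that only accepts dotted identifiers, and expandShorthand is a hypothetical name. A real plugin would parse the action with one of those libraries and would presumably take the label names from the grammar AST rather than from a hand-written array.

```javascript
// Hypothetical helper: expands an action shorthand such as `extract` or
// `foo.bar.baz` into a full action body that calls it with the labels.
// ASSUMPTION: a regex stands in for a real JS parser (Acorn/@babel/parser);
// it only accepts dotted ASCII identifiers, not computed members or calls.
const MEMBER_PATH = /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/;

function expandShorthand(shorthand, labels) {
  const callee = shorthand.trim();
  if (!MEMBER_PATH.test(callee)) {
    throw new SyntaxError(`Not an identifier or member expression: ${callee}`);
  }
  // Labels are passed positionally, in the order they appear in the sequence.
  return `return ${callee}(${labels.join(", ")});`;
}

console.log(expandShorthand("concat", ["head", "tail"]));
// → return concat(head, tail);
console.log(expandShorthand("foo.bar.baz", ["value"]));
// → return foo.bar.baz(value);
```

Passing labels positionally is the simplest choice, but it is also exactly the "which arguments are being passed?" ambiguity raised in the first reply.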

=> expr;

Best in my opinion.

{ => expr }

Conflicts with Javascript syntax IMO. Since it's inside { } you'd expect it to be a full arrow function (() =>).

{ > expr}

A bit orthogonal to any other syntax in either PEG.js or JavaScript, and it doesn't immediately read as "this returns a value, shorthand" IMO, mainly because it's in curly braces, I think. Same argument as {=> expr}: you expect JavaScript to be in there.


Further, adding non-JS syntax in {} is a problem for syntax highlighters, linters, etc. I recommend against it.

If I may suggest yet another option, perhaps > by itself (not inside of a predicate block). That helps keep things aligned when you vertically space out rules:

    = foo:bar qux:(' '+ @qix)+
    > {foo, qux}
    ;

as well as inline

some_rule = foo:bar qux:(' '+ @qix)+ > {foo, qux};

Why is the semicolon needed for '=>'? I'd like to return a value in nested code instead of using buildList(), for example:

  = "(" _ head:Expression _ tail:("," _ expr:Expression => expr)* ")" {
      return [head, ...tail]
    }

I find this cleaner than using a magic index (below). Another option would be the ability to refer to nested labels, e.g. ("," _ tail:Expression)* ")".

  = "(" _ head:Expression _ tail:("," _ Expression)* ")" {
      return buildList(head, tail, 2)
    }
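For comparison, a sketch of what a buildList helper like the one called above typically looks like. It is modelled on similarly named helpers in PEG.js's example JavaScript grammar, but the exact body here is an assumption. The index 2 selects Expression from each ("," _ Expression) match, whose elements are the comma, the whitespace result and the expression; that positional coupling is the "magic index".

```javascript
// Sketch of the helpers a `buildList(head, tail, 2)` action relies on.
// `tail` is an array of matched sequences, e.g. [",", ws, expr] for ("," _ Expression);
// `index` picks the element to keep from each sequence.
function extractList(list, index) {
  return list.map((element) => element[index]);
}

function buildList(head, tail, index) {
  return [head].concat(extractList(tail, index));
}

// Simulated match result for "(1, 2, 3)": each tail element is [",", whitespace, Expression].
const tail = [[",", " ", 2], [",", " ", 3]];
console.log(buildList(1, tail, 2)); // → [ 1, 2, 3 ]
```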

I was looking at parser.pegjs, and I see there is a CodeBlock rule around line 434. What would need to be done to try this out? The rule Code simply reads SourceCharacter, which is just '.'

CodeBlock "code block"
  = "=>" __ @Code // this?
  / "{" @Code "}"

@mikeaustin Yeah, that's about right, but then there's no way for it to know where this sequence ends, so it will consume everything after =>

Maybe "Code" could be a little smarter, balancing brackets and braces, and handling LineTerminator? It wouldn't need to know about full JavaScript, but that may be harder than it sounds.
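A rough sketch of the bracket-balancing idea: scan from just after => and stop at a newline or ; at depth zero, or at a closing bracket that belongs to the enclosing rule. scanArrowBody is a hypothetical helper, not part of parser.pegjs. It deliberately ignores template literals, regex literals and comments, which is where the "harder than it sounds" part comes in.

```javascript
// Hypothetical scanner: finds where an `=>` action body ends by tracking
// bracket depth. It stops at a newline, `;` or an unmatched closer at depth 0.
// Simplified on purpose: it skips '...' and "..." strings but does NOT handle
// template literals, regex literals or comments.
const OPEN = { "(": ")", "[": "]", "{": "}" };
const CLOSERS = new Set([")", "]", "}"]);

function scanArrowBody(src, start) {
  const stack = [];
  let i = start;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "'" || ch === '"') {
      // Skip a string literal, honouring backslash escapes.
      i++;
      while (i < src.length && src[i] !== ch) i += src[i] === "\\" ? 2 : 1;
      i++;
      continue;
    }
    if (OPEN[ch]) {
      stack.push(OPEN[ch]);
    } else if (CLOSERS.has(ch)) {
      if (stack.length === 0) break; // closer belongs to the enclosing rule
      if (stack.pop() !== ch) throw new SyntaxError(`Mismatched ${ch} at ${i}`);
    } else if (stack.length === 0 && (ch === "\n" || ch === ";")) {
      break;
    }
    i++;
  }
  if (stack.length > 0) throw new SyntaxError("Unbalanced brackets");
  return src.slice(start, i).trim();
}

// The `=> expr` inside a repeated group ends at the group's closing `)`.
const rule = 'tail:("," _ expr:Expression => expr)* ")"';
console.log(scanArrowBody(rule, rule.indexOf("=>") + 2)); // → expr
```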

Actually, I have been thinking, these changes might just work:

CodeBlock "code block"
  = "=>" _ expression:CallExpression {
       return `return ${ expression };`;
     }
  / "{" @Code "}"
  / "{" { error("Unbalanced brace."); }

// Will be based on ECMAScript's CallExpression
CallExpression
  = ...
  / MemberExpression

// Will be based on ECMAScript's MemberExpression
MemberExpression
  = ...
  / ValidIdentifier

// Change `LabelIdentifier` into `ValidIdentifier`

This approach will still require integrating some of ECMAScript's primary expressions (numbers, booleans, arrays, etc.) so they can be used as arguments, so we will need to carefully figure out what to add.
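To get a feel for how small such a restricted expression language could be, here is a hypothetical hand-written recognizer in plain JavaScript (not the PEG.js grammar itself). It accepts member chains and call expressions, with only numbers and identifiers (which also covers true, false and null) as primary expressions, and returns the same "return <expression>;" body that the => alternative of CodeBlock above generates.

```javascript
// Hypothetical recognizer for a restricted `=> expression` body: identifiers,
// member chains (a.b.c) and calls (f(x, y)), with numbers and identifiers
// (which include true/false/null) as the only primary expressions.
// A sketch, not a full ECMAScript validator: strings, arrays and operators are rejected.
function parseArrowExpression(src) {
  const IDENT = /[A-Za-z_$][\w$]*/y; // sticky: matches only at lastIndex
  const NUMBER = /\d+(?:\.\d+)?/y;
  let pos = 0;

  const skipWs = () => {
    while (pos < src.length && /\s/.test(src[pos])) pos++;
  };
  const match = (re) => {
    re.lastIndex = pos;
    const m = re.exec(src);
    if (m === null) return null;
    pos = re.lastIndex;
    return m[0];
  };

  function primary() {
    skipWs();
    const tok = match(NUMBER) ?? match(IDENT);
    if (tok === null) throw new SyntaxError(`Expected expression at ${pos}`);
    return tok;
  }

  // CallExpression / MemberExpression: a primary followed by any mix of .name and (args).
  function callOrMember() {
    let expr = primary();
    for (;;) {
      skipWs();
      if (src[pos] === ".") {
        pos++;
        skipWs();
        const name = match(IDENT);
        if (name === null) throw new SyntaxError(`Expected property name at ${pos}`);
        expr += `.${name}`;
      } else if (src[pos] === "(") {
        pos++;
        const args = [];
        skipWs();
        if (src[pos] !== ")") {
          for (;;) {
            args.push(callOrMember());
            skipWs();
            if (src[pos] !== ",") break;
            pos++;
          }
        }
        if (src[pos] !== ")") throw new SyntaxError(`Expected ")" at ${pos}`);
        pos++;
        expr += `(${args.join(", ")})`;
      } else {
        return expr;
      }
    }
  }

  const expr = callOrMember();
  skipWs();
  if (pos !== src.length) throw new SyntaxError(`Unexpected input at ${pos}`);
  return `return ${expr};`;
}

console.log(parseArrowExpression("helpers.list(head, tail)")); // → return helpers.list(head, tail);
```

Adding arrays or strings as arguments would mean new branches in primary(), which is the "carefully figure out what to add" part.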


balancing brackets and braces

This won't be fixed until a proper JavaScript parser is built into the PEG.js parser, but to be honest I'm slightly hesitant about this, as there are a few plugin projects that generate parsers in other languages (C, PHP, TypeScript, etc.), and I am also working on a computer language that I hope to one day generate parsers in.


Alongside _PEG.js v0.12_, I will be working on OpenPEG, which will offer an NPM package that is essentially a stripped-down version of PEG.js without any JavaScript or parser generation involved, but with enough features that JavaScript-based projects like PEG.js can use it as a backend. When _v0.12_ comes out, I will try to ensure any plugin projects that generate custom parsers are notified of OpenPEG, and, before v1, implement a full ECMAScript 2015 parser in the PEG.js grammar parser.

FWIW, I've started using template literals for this. It also gets my text editor to syntax-highlight the JS...

exports = module.exports = functionBodies`${grammarScript}
...
objectText =
    head:word
    rest:(_txt_ word)*
    ${f=>{
        return new Txt(rest.reduce((a,b)=>([...a,...b]),[head]))
        }}
word = ch:(wordCharacter/escapedCharacter)+ ${chJoin}
...
`
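function functionBodies(glue, ...fns){
    // Tag function: `glue` holds the literal grammar text around each ${...} slot and
    // `fns` the interpolated values (usually functions). Each value's source text is
    // cut down to its outermost { ... } block, which PEG.js then reads as an action.
    return glue.map( (str,i) => str + (fns[i]||'').toString()
        .replace(/^[^{]*/,'')    // drop everything before the first "{"
        .replace(/[^}]*$/, '')   // drop everything after the last "}"
        ).join('')
    }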

function chJoin(ch){return ch.join('')}
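A self-contained run of the same tag function on a made-up one-rule grammar shows what PEG.js would actually receive. It relies on Function.prototype.toString returning the function's original source text, which ES2019-conforming engines (including current Node.js) do; minified or transpiled functions could break it.

```javascript
// Copy of the functionBodies tag above, applied to a tiny made-up grammar.
function functionBodies(glue, ...fns) {
  return glue
    .map((str, i) => str + (fns[i] || "").toString().replace(/^[^{]*/, "").replace(/[^}]*$/, ""))
    .join("");
}

function chJoin(ch) { return ch.join(""); }

// The interpolated function is replaced by its { ... } body, i.e. a PEG.js action.
const grammar = functionBodies`word = ch:[a-z]+ ${chJoin}`;
console.log(grammar);
// → word = ch:[a-z]+ { return ch.join(""); }
```

Arrow functions with a braced body work the same way, since everything before the first "{" (including the parameter list) is dropped.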

How about using the proposed pipe operator (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Pipeline_operator)? Since, basically, you're asking to pipe the data?

{
  function extract(value) {
    return value;
  }
  function concat(head, tail) {
    return head ? [head].concat(tail) : [];
  }
  function toAddExpr(head, tail) {
    return { type: 'addExpr', expressions: concat(head, tail) };
  }
}

List
  = '(' _ head:Item? tail:( _ ',' _ value:Item |> extract )* _ ')' |> concat

// Another kind of list
Add
  = '(' _ head:Multiply? tail:( _ '+' _ value:Multiply |> extract  )* _ ')' |> toAddExpr

None of this should happen. There should be no short syntax.

None of this is necessary if we just parse arrows normally, instead of trying to shim them in sideways.

While this is a neat idea, it sets up extensive grammar ambiguities versus JavaScript. Anyone who's parsed JS and remembers what a debacle `with` was knows that this will basically murder a parser.

Instead of trying to create fancy new stuff, we should just support Javascript. Arrow functions are older than ES6 and ES6 is from 2015. This was solved six years ago. No invention should happen here.

The pipe operator is badly flawed and probably won't actually make it into JavaScript, and both the language it originally comes from (F#) and the language that popularized it (Elixir) are backing away from it. Besides, this isn't piping in any sense.

The reason PEG was so successful was that it was minimal, and stayed close to the language, allowing it to be fast, small, and predictable.

Arrow functions are older than ES6 and ES6 is from 2015. This was solved six years ago. No invention should happen here.

Uh. That's news to me. Care to expand?

The reason PEG was so successful was that it was minimal, and stayed close to the language, allowing it to be fast, small, and predictable.

The reason PEG (as a concept, not this library) is successful is because of its ability to represent complex grammars (recursive, etc.) in a simplistic way. Packrat was not invented with this library; it is not a syntax. It's an algorithm.

Arrow functions are older than ES6 and ES6 is from 2015. This was solved six years ago. No invention should happen here.

Uh. That's news to me. Care to expand?

I don't really know what you're asking.

Arrow functions are older than ES6. This was the second biggest fight in ES6, is what derailed ES4, and what derailed ES5+. Everyone had been asking for them, at that point, since the mid-90s, because they did in fact already exist in E4X, and were taken away because Google and Apple threw a fit at Hixie about Microsoft ever inventing anything.

You now know E4X as React, and think Facebook invented it. Facebook thinks they ripped off Hyperscript. The Hyperscript guy is clear that he was just reimplementing a useful thing from old IE.

They were going to be left out of ES6 entirely, just like template strings, but then CoffeeScript came along and gave the JS community both, and then the JS community yelled until the ECMA folks budged. It only took 18 months.

Arrow functions do everything that needs to happen here. You even seem to be the person who brought them up in this thread, back in 2018, which makes your disagreement quite startling; I was trying to back you up.

More importantly to me, if it's done with an arrow function, nothing has been added.

Peg's differences from JS are absolutely minimal. Supporting this by just doing ES6 stuff means that list doesn't change.

That's extremely valuable.


The reason PEG (as a concept, not this library) is successful is because of its ability to represent complex grammars (recursive, etc.) in a simplistic way.

I don't agree. Many parsers do a much better job of this, and aren't even slightly popular, even with the people who know about them (like Earley).

The traditional explanation is a combination of error message quality and speed, but I don't agree with that either, because many parsers have better error messages and are faster (again, like Earley), and aren't even slightly popular, even with the people who know about them.

Also, note that PEG has three serious complexity ceilings.

One, anything you want to put through a PEG grammar has to have a combinatoric expression that doesn't overwhelm the local machine's cache and evaluation throughput (see, for example, #623).

Two, many common jobs, such as parsing BNF, are frequently brutally difficult in PEG (see, for example, #489).

Three, it's worth noting that every other JS PEG library, even significantly more powerful ones, has failed. I've tried to switch away, and come back, many times. In particular, I've tried to switch to Canopy a bunch of times, because it allows me to target C, Ruby, and Python in addition to JavaScript.

Granted, all I can speak for are the two dozen or so people I know who are using it. And I can, because I asked them a couple of days ago, when I realized the new non-maintainer was throwing away the software and replacing it with something he made from scratch, after years of no published changes.

But every single one of them told me either that they need a parser without a lot of native conceptual overhead, or that they need something fast and small whose behavior is reliable.

Unreleased 0.11 behaves meaningfully differently in Node vs. Chrome, even though Node runs on Chrome's V8 engine. Try writing some property tests against it. It's honestly kind of terrifying.


Packrat was not invented with this library; it is not a syntax. It's an algorithm.

I didn't say anything about Packrat, friend. I'm not sure what you're trying to correct.

Packrat parsing is not, however, an algorithm, for the same reason that sort isn't an algorithm. Packrat parsing is a task, and there are many ways to go about it.

Indeed, most introductory Haskell books make you write three or four different packrat parsers, because they're a great way to get really hung up on the performance problems of Haskell's approach to monads, and they want to show you how changing the approach to writing packrat parsers (that is, changing the algorithm) yields better results.
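A minimal sketch of the memoization at the heart of packrat parsing, written in just one of the several possible styles mentioned above: results are cached per (rule, position) pair. The toy grammar Sum <- Num "+" Sum / Num "-" Sum / Num, the names and the call counter are assumptions for illustration, not PEG.js internals.

```javascript
// Toy packrat parser (hypothetical example, not PEG.js internals).
// Grammar: Sum <- Num "+" Sum / Num "-" Sum / Num ;  Num <- [0-9]+
// Being right-recursive, it parses "1-2-3" as 1 - (2 - 3).
function makeParser(input, { memoize }) {
  const cache = new Map();
  let numCalls = 0;

  // Wraps a rule body so that, when memoizing, each (rule, position) pair is
  // evaluated at most once; results are { end, value } on success or null.
  function rule(name, body) {
    return (pos) => {
      const key = `${name}@${pos}`;
      if (memoize && cache.has(key)) return cache.get(key);
      const result = body(pos);
      if (memoize) cache.set(key, result);
      return result;
    };
  }

  const Num = rule("Num", (pos) => {
    numCalls++;
    let end = pos;
    while (end < input.length && input[end] >= "0" && input[end] <= "9") end++;
    return end > pos ? { end, value: Number(input.slice(pos, end)) } : null;
  });

  const Sum = rule("Sum", (pos) => {
    // Ordered choice: each failed alternative backtracks to `pos` and re-runs Num.
    for (const op of ["+", "-"]) {
      const left = Num(pos);
      if (left && input[left.end] === op) {
        const right = Sum(left.end + 1);
        if (right) {
          const value = op === "+" ? left.value + right.value : left.value - right.value;
          return { end: right.end, value };
        }
      }
    }
    return Num(pos);
  });

  return { parse: () => Sum(0), numCalls: () => numCalls };
}

for (const memoize of [false, true]) {
  const parser = makeParser("1-2-3", { memoize });
  const result = parser.parse();
  console.log(`memoize=${memoize}: value=${result.value}, Num evaluated ${parser.numCalls()} times`);
}
// → memoize=false: value=2, Num evaluated 7 times
// → memoize=true: value=2, Num evaluated 3 times
```

Both runs produce the same parse; memoization only removes the repeated work that backtracking causes, bringing Num down to one evaluation per input position.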


Please reconsider that thumbs down. This library has been dead for three years, and I'd like to resurrect it now.

Part of the reason the library is dead is that people keep trying to invent greenfield features instead of performing simple maintenance, like adding the es() module function that has been sitting in dev for two years.

I have to manually modify my PEG parsers by cutting lines and stapling hand-written JavaScript onto the end of them.

The starry eyes should close for a little while, and some practical greasy elbows should start. PEG's the only major NPM library I've ever seen with declining use. Given that I'm not aware of a reasonable replacement, that's bizarre and confusing to me.


I have bugfixes that I want to contribute right now, but I can't, because:

  1. 0.10 hasn't been published since dmajda left,
  2. 0.11 is three years old, has never been published, and it was announced a month ago that it would never be published, and
  3. the replacement, 0.12, isn't PEG.js at all, but something the other guy wrote from scratch in a different programming language, and none of us can see it.

I know it's impolite, but we need to face that this library is being killed.

It's time to face that we need to accept a normal development process if this library is ever going to see updates again. It's 2020. We haven't seen anything since 2017.

No, the dev branch doesn't count. And the new maintainer's gonna have to re-open a whole lot of issues, because 0.11 is not of sufficient quality, and starting over from 0.10 is a lot less work than fixing 0.11.
