Current behavior: 2 * 5 + SUM(1, 2, 3)
[
"2",
"*",
"5",
"+",
"SUM",
"(",
[
"1",
",",
"2",
",",
"3"
],
")"
]
Desired behaviour: 2 * 5 + SUM(1, 2, 3)
[
"2",
" ",
"*",
" ",
"5",
" ",
"+",
" ",
"SUM",
"(",
[
"1",
",",
" ",
"2",
",",
" ",
"3"
],
")"
]
Grammar to copy: https://pastebin.com/zpwqT6Uw
PEG.js playground: https://pegjs.org/online
What am I missing?
@futagoza incredibly sorry to bother you but it is the first time I'm dealing with PEG.js and this issue is critical for me. May I ask you for a little hint?
Best Regards,
Marek
I tried looking through your grammar (yesterday and just now) but because it's really hard to understand it (naming conventions aside, the format, to be honest, is all over the place), it took me a while to track down a solution:
const
returns [left_space, cnst, right_space]
)[].concat.apply([], con)
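For anyone unfamiliar with the idiom, `[].concat.apply([], con)` flattens an array by one level. Here is a minimal sketch in plain JavaScript; the sample `con` value is made up for illustration and is not taken from the grammar:

```javascript
// Hypothetical sample: each constant wrapped as [left_space, cnst, right_space].
const con = [[" ", "2", " "], [" ", "5", " "]];

// concat receives each inner array as a separate argument and appends its
// elements, so the result is flattened by exactly one level.
const flat = [].concat.apply([], con);

// On runtimes that support ES2019, Array.prototype.flat() does the same thing.
const flatModern = con.flat();

console.log(flat);
```

Only one level is flattened, so nested arrays such as the argument list of a function call stay nested.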
Even so, to be honest, this feels like a hacky solution to me. Do you have a link to a spec or something? It would help to know which rules I can and can't change to get the desired result without the hack above.
If not, as long as you're willing to put in the time to tidy up the grammar and rename some rules (so it's easier to figure out what you want), I'll gladly take another stab at it 😉
@marek-baranowski - Sorry I didn't see this until now. Hopefully this is still useful to you.
If you want to keep the spaces, just treat them like matchable content.
It's not really clear exactly what you'd want for two spaces. You could either have a string of two spaces, or an array of two one-space strings. Normally I'd expect the former, but everything in your output is per-character.
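The two options for a run of spaces can be sketched in plain JavaScript, outside of PEG.js. This assumes that a repeated character class such as `[ \r\n]+` hands its action an array with one entry per matched character; the `tx` value below is written by hand rather than produced by a parser:

```javascript
// Hand-written stand-in for what the action of `tx:[ \r\n]+` would receive
// when the input contains two consecutive spaces.
const tx = [" ", " "];

// Option 1: a single string containing the whole run.
const asString = tx.join("");

// Option 2: an array of one-space strings, one entry per character.
const asArray = tx.slice();

console.log(JSON.stringify(asString), JSON.stringify(asArray));
```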
Also ... why would you want stray characters like that, except for the function call? The parser should be summing those up for you.
Anyway
This is what you asked for:
Document = Expression*

Whitespace
  = tx:[ \r\n]+ { return tx.join(''); }

Number
  = str:[0-9]+ { return str.join(''); }

Oper
  = '+'
  / '-'
  / '/'
  / '*'
  / ','

Label
  = l:[a-zA-Z]+ { return l.join(''); }

Parens
  = '(' Whitespace? ex:Expression* Whitespace? ')' { return ex; }

Expression
  = Number
  / Oper
  / Whitespace
  / Label
  / Parens
  / [^()]+
Thing is, I am not super convinced that it's actually what you want. For example, you could instead parse the numbers and operators, and return a standardized node shape for each one:
Document = Expression*

Whitespace
  = tx:[ \r\n]+ {
      return { ast: 'whitespace', value: tx.join('') };
    }

Number
  = str:[0-9]+ {
      // str is an array of digit characters, so join it before parsing
      return { ast: 'number', value: parseInt(str.join(''), 10) };
    }

Oper
  = '+' { return { ast: 'oper', value: 'add' }; }
  / '-' { return { ast: 'oper', value: 'subtract' }; }
  / '/' { return { ast: 'oper', value: 'divide' }; }
  / '*' { return { ast: 'oper', value: 'multiply' }; }
  / ',' { return { ast: 'oper', value: 'sequence' }; }

Label
  = l:[a-zA-Z]+ {
      return { ast: 'label', value: l.join('') };
    }

Parens
  = '(' Whitespace? ex:Expression* Whitespace? ')' {
      return { ast: 'parens', value: ex };
    }

Expression
  = Number
  / Oper
  / Whitespace
  / Label
  / Parens
  / [^()]+
Now you still have your whitespace, but you also have a properly parsed tree, so you don't need to write a parser to parse your parser's output. It's also easy as pie now to start adding regularized features like line numbers and so forth.
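To illustrate why the node shape helps, here is a rough sketch of consuming such a tree in plain JavaScript. The `nodes` array is written by hand, assuming the shape the grammar above would return for `2 * 5 + SUM(1, 2, 3)`; `stripWhitespace` and `kinds` are hypothetical helper names, not part of PEG.js:

```javascript
// Hand-written sample, assuming the node shape produced by the grammar above
// for the input "2 * 5 + SUM(1, 2, 3)".
const ws = { ast: "whitespace", value: " " };
const nodes = [
  { ast: "number", value: 2 }, ws,
  { ast: "oper", value: "multiply" }, ws,
  { ast: "number", value: 5 }, ws,
  { ast: "oper", value: "add" }, ws,
  { ast: "label", value: "SUM" },
  { ast: "parens", value: [
    { ast: "number", value: 1 }, { ast: "oper", value: "sequence" }, ws,
    { ast: "number", value: 2 }, { ast: "oper", value: "sequence" }, ws,
    { ast: "number", value: 3 },
  ] },
];

// Hypothetical helper: drop whitespace nodes at every nesting level.
function stripWhitespace(list) {
  return list
    .filter((node) => node.ast !== "whitespace")
    .map((node) =>
      node.ast === "parens"
        ? { ast: "parens", value: stripWhitespace(node.value) }
        : node
    );
}

// Hypothetical helper: list the node kinds in order, descending into parens.
function kinds(list) {
  return list.map((node) =>
    node.ast === "parens" ? kinds(node.value) : node.ast
  );
}

console.log(JSON.stringify(kinds(stripWhitespace(nodes))));
```

Because every node carries an `ast` tag, a consumer can keep the whitespace when it needs to reproduce the source and drop it when it only cares about the expression.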
@marek-baranowski - I'd like to reduce the size of this issue tracker somewhat
If the above is what you need, would you please consider closing this issue? Thanks 😄
If it isn't, please let me know why, and I'll try again
@marek-baranowski Another gentle ping :smiley_cat:
Also, I wrote a PEG.js plugin, pegjs-syntactic-actions, to facilitate debugging of grammars, and specifically to see which characters are captured by which rule independently of the actions, which is probably your issue here as explained by @StoneCypher.
The reasoning behind this plugin is: I find it is often difficult to understand the global result when it is not what we expect, because it results from the combination of many small actions, and finding the action that behaves badly or strangely can be time-consuming. With this plugin, we see which rule captures which character, and it gives the name of the action to act on.
oh wow this is really neat actually