Vscode-ng-language-service: Textmate grammar for template expressions

Created on 23 Jan 2020  ·  27Comments  ·  Source: angular/vscode-ng-language-service

__Is your feature request related to a problem? Please describe.__

The language service extension now provides a Textmate grammar for the Angular template syntax (#483), and describes template expressions via a JavaScript grammar. However, template expressions are not JavaScript, and to have an accurate grammar long-term, it would be useful to define a grammar specifically for Angular template expressions.

__Additional context__

_Originally posted by @dannymcgee in https://github.com/angular/vscode-ng-language-service/issues/541#issuecomment-577494343_:

@ayazhafiz I've been thinking about the issue with embedded grammars getting stuck and the potential solution I mentioned earlier in this thread, and I made a really rough proof of concept that seems promising.

The basic idea is:

  • We embed our own subset of the JavaScript grammar instead of the full JS grammar, limited to just the syntax that's actually valid in Angular templates
  • We have two _versions_ of that grammar, one which recognizes single quotes as an end condition, and one that recognizes double quotes
  • For each template binding, we inject the correct grammar depending on whether the value was bound with double or single quotes

The hard (or annoying) part of that equation is having to maintain two nearly identical grammars for the JavaScript subset. The solution I came up with was to generate the JSON files dynamically with TypeScript, where each grammar "repository" is defined in a separate file. Then you can just swap out the imports for the handful of repositories that actually need to be different between the two versions. The proof of concept for that is here: https://github.com/dannymcgee/vscode-grammar-from-ts-experiment

There are some other advantages to doing it this way, namely much better code organization and much more readable regex patterns since they can be written as regex literals and benefit from syntax highlighting. And the additional benefits of defining our own grammar for the scripting language is that we can add more accurate grammar scopes for things like pipes and the as local-val syntax.

Do you think this is a reasonable solution? And if so, any guidance on how to split up my PRs for ease of review?

_Response; originally posted by @ayazhafiz in https://github.com/angular/vscode-ng-language-service/issues/541#issuecomment-577500730_

@dannymcgee thanks for taking the time to do this. I like your idea of a template syntax grammar; it's definitely the "correct" way to do it.

I'm not sure that we need two separate grammars for single and double quotes. I guess the case where this matters is if someone uses single quotes inside an attribute itself demarcated by single quotes (and symmetrically for double quotes), though I wonder how common this pattern is -- if it's not very common or does not significantly impact the stability of the grammar, it may not be worth the cost of maintaining two disjoint grammars. I'm not opposed to compiling the grammar from TypeScript definitions if the way to move forward is with two grammars. What do you think?

For PRs, incremental features should work fine. Looking at your PoC, for example, there could be a PR for all numeric things, then one for all objects, etc. After everything gets merged we can swap out the call to the JS grammar to the template syntax grammar.

I will open another issue for this where can discuss further.

feature

All 27 comments

I'm not sure that we need two separate grammars for single and double quotes. I guess the case where this matters is if someone uses single quotes inside an attribute itself demarcated by single quotes (and symmetrically for double quotes), though I wonder how common this pattern is -- if it's not very common or does not significantly impact the stability of the grammar, it may not be worth the cost of maintaining two disjoint grammars. I'm not opposed to compiling the grammar from TypeScript definitions if the way to move forward is with two grammars. What do you think?

If we define one embeddable grammar that recognizes both double- and single-quoted strings, then in many cases what _should_ be the end quote for the attribute binding will instead be recognized as the opening quote for a new string within the template expression. A common failure point I was seeing in my PoC testing was with ternary expressions:

<fa-icon [icon]="isLocked ? 'lock' : 'lock-open'"></fa-icon>

The JS grammar's end condition for ternary expressions looks like this: (?=$|[;,})\]]), so without encapsulating the ternary in parens or ending it with a semicolon, the grammar broke.

However, this particular case actually _isn't_ failing right now with the full JS grammar, but was with my cherry-picked subset, so this must have been a side effect of excluding some other pattern that I didn't think was relevant to template expressions. So it's possible that just carefully designing the template expression grammar from scratch instead of haphazardly copying and pasting from the JS grammar could sidestep issues like this -- but that is a whole lot more work. :)

There is also a third option. The way I designed my PoC was by defining the expression syntax as a separate language to be embedded into the template grammar, just like the JS grammar is being embedded now. If we instead built these expression patterns directly into the template grammar, we could potentially sidestep the whole issue by just selectively including certain sets of patterns inside certain attribute values. This would also be more accurate to the way Angular actually works, since interpolation and different types of bindings permit different types of expressions.

I agree with you, we should keep separate definition. I think that compiled
definitions would be preferred here.

On Wed, Jan 22, 2020 at 9:45 PM Danny McGee notifications@github.com
wrote:

I'm not sure that we need two separate grammars for single and double
quotes. I guess the case where this matters is if someone uses single
quotes inside an attribute itself demarcated by single quotes (and
symmetrically for double quotes), though I wonder how common this pattern
is -- if it's not very common or does not significantly impact the
stability of the grammar, it may not be worth the cost of maintaining two
disjoint grammars. I'm not opposed to compiling the grammar from TypeScript
definitions if the way to move forward is with two grammars. What do you
think?

If we define one embeddable grammar that recognizes both double- and
single-quoted strings, then in many cases what should be the end quote
for the attribute binding will instead be recognized as the opening quote
for a new string within the template expression. A common failure point I
was seeing in my PoC testing was with ternary expressions:

The JS grammar's end condition for ternary expressions looks like this:
(?=$|[;,})]]), so without encapsulating the ternary in parens or ending
it with a semicolon, the grammar broke.

However, this particular case actually isn't failing right now with the
full JS grammar, but was with my cherry-picked subset, so this must have
been a side effect of excluding some other pattern that I didn't think was
relevant to template expressions. So it's possible that just carefully
designing the template expression grammar from scratch instead of
haphazardly copying and pasting from the JS grammar could sidestep issues
like this -- but that is a whole lot more work. :)

There is also a third option. The way I designed my PoC was by defining
the expression syntax as a separate language to be embedded into the
template grammar, just like the JS grammar is being embedded now. If we
instead built these expression patterns directly into the template grammar,
we could potentially sidestep the whole issue by just selectively including
certain sets of patterns inside certain attribute values. This would also
be more accurate to the way Angular actually works, since interpolation and
different types of bindings permit different types of expressions.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/angular/vscode-ng-language-service/issues/571?email_source=notifications&email_token=AE6GL6U5LJQCDMLHSGFUQ73Q7EVIZA5CNFSM4KKQUDJKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJWC5GQ#issuecomment-577515162,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AE6GL6XJSTNHDNSBMZ3FBKTQ7EVIZANCNFSM4KKQUDJA
.

@ayazhafiz "We embed our own subset of the JavaScript grammar instead of the full JS grammar, limited to just the syntax that's actually valid in Angular templates"

I have done this with my grammar, and I can confirm it reduces a lot the JS grammar itself. Refer to angular-expressions.json5

@dannymcgee There is something in the expression pattern in the JS grammar itself, that causes it to break with a missing semicolon.

I have scavenged the reduced JS grammar in the past and added the missing fixes for it to fit the needs of angular expressions.

We have two versions of that grammar, one which recognizes single quotes as an end condition, and one that recognizes double quotes

I wonder how common this pattern is [...]

The specs says double quoted (attribute="value"), single quoted (attribute='value'), unquoted - if there is no space character in the attribute value (attribute=value) and empty (attribute) are all valid.

But I personally think it looks ugly and hard to read.

Is there any way I could help you guys?

Is there any way I could help you guys?

@ghaschel If you wanted to coordinate and help open PRs to get the expression grammar in, that would be fantastic. Maybe we could split tasks between the three of us? And since you've already done the work of extracting the relevant patterns from the JS grammar, if we can refer back to your repo that would make things go much faster and smoother.

Do either of you know if it possible (or more importantly, reasonable) to create a grammar that doesn't escape the starting/ending patterns that it is included in?

Any help would be welcome and highly appreciated. I think we can go ahead and start with incremental addition to the JSON grammar for template expressions, or refactor to TypeScript definitions, or both.

Do either of you know if it possible (or more importantly, reasonable) to create a grammar that doesn't escape the starting/ending patterns that it is included in?

Sorry, I'm not sure if I understand what you mean? I've only ever created outright replacement grammars, never attempted the injectionSelector thing we're doing here, if that's what you're referring to.

Do either of you know if it possible (or more importantly, reasonable) to create a grammar that doesn't escape the starting/ending patterns that it is included in?

I'm not sure I understand what you mean. There are two ways to match patters. An exact match, and a starting/ending match with specific scope matching inside. Is this related to what you're thinking?

Is there any way I could help you guys?

@ghaschel If you wanted to coordinate and help open PRs to get the expression grammar in, that would be fantastic. Maybe we could split tasks between the three of us? And since you've already done the work of extracting the relevant patterns from the JS grammar, if we can refer back to your repo that would make things go much faster and smoother.

I'm down with that. I'm just not sure to what repository to commit to and what technologies would we be using... Are there any real advantages in using Typescript instead of yaml/json5?

I have had in the past in my grammar the problems described in https://github.com/angular/vscode-ng-language-service/issues/575
I think the reduced/adapted JS grammar I derived would solve this problem

What I mean is a grammar that never matches outside the pattern that it is included in. A lot of the issues we are seeing are because JS syntax is bleeding out of the scope it was included in, trying to match a pattern past the assumed-ending match of its parent, because the grammar tries to finish matching JS first before closing the parent. This is my understanding, anyway -- please correct me if I am wrong.

I'm just not sure to what repository to commit to

All definitions for the syntaxes and their tests should be available in the syntaxes/ folder on the root of this repository. Let me know if you have any questions. I think the best course of action is to start with just adding JSON definitions for a template expression grammar in a similar way it is already being done (a new file). I will start at this weekend; my apologies, I have been very busy this week.

and what technologies would we be using... Are there any real advantages in using Typescript instead of yaml/json5?

I think the advantage of TS definitions relates to Danny's comment mentioned in the opening of this issue; we will have to eventually support separate grammars for attributes demarcated by single and double quotes. It's more maintainable to use as many common repositories as possible, for which we need some kind of scripting lang. I am not familiar with JSON5; does it provide something similar? In any case, this is not a blocker. I think the goal should first be to provide stability of the grammar (fixing low-hanging issues for what is being reported on a daily basis), and we can add the dual grammars/switch technologies concurrently.

@ayazhafiz What I mean is a grammar that never matches outside the pattern that it is included in. A lot of the issues we are seeing are because JS syntax is bleeding out of the scope it was included in, trying to match a pattern past the assumed-ending match of its parent, because the grammar tries to finish matching JS first before closing the parent. This is my understanding, anyway -- please correct me if I am wrong.

Ahh, okay. Yeah, for better or worse this is the default behavior for embedded TextMate grammars. It's essentially the parent grammar's responsibility to define the end condition for the embedded pattern. The problem as I understand it is that if the embedded grammar enters into its own multi-line state (i.e. one demarcated by "begin"/"end" patterns instead of a simple pattern match), the grammar parser will not look for the parent grammar's end condition until after it's found the embedded grammar's current state's end condition.

From what I can tell, the main reason this is such a problem with Angular templates is that there are valid Angular template expressions which aren't correctly parsed by the JS grammar (e.g., pipes are recognized as bitwise operators and the as local-val syntax is parsed as a typecast). That issue doesn't necessarily need to be fixed by forcing the embedded grammar to also check for the parent state's end condition — it could also be fixed by just ensuring that the embedded grammar is correctly pattern-matching according to the rules of Angular's template expression syntax instead of JavaScript's. It seems like that is what @ghaschel has done with his grammar.

That approach wouldn't fix the issue of some _invalid_ syntax breaking the syntax highlighting (like opening a brace in an expression and never closing it again), but I don't know how concerned we really need to be about that.

@ghaschel Are there any real advantages in using Typescript instead of yaml/json5?

The biggest advantage IMO is not having to double-escape characters in regex patterns and getting syntax highlighting for the expressions themselves (the TypeScript grammar scopes for regular expressions are pretty excellent, so I have my theme configured to take advantage of that). But it's an extra build step and an added dev dependency and it's not hugely important to me, so I'd be just as happy continuing with plain JSON as long as the files stay manageably small.

@dannymcgee it could also be fixed by just ensuring that the embedded grammar is correctly pattern-matching according to the rules of Angular's template expression syntax instead of JavaScript's. It seems like that is what @ghaschel has done with his grammar.

That's precisely what I've done with my grammars. That and adding some extra matches like optional chaining support.

@ayazhafiz I think the advantage of TS definitions relates to Danny's comment mentioned in the opening of this issue; we will have to eventually support separate grammars for attributes demarcated by single and double quotes. It's more maintainable to use as many common repositories as possible, for which we need some kind of scripting lang. I am not familiar with JSON5; does it provide something similar? In any case, this is not a blocker. I think the goal should first be to provide stability of the grammar (fixing low-hanging issues for what is being reported on a daily basis), and we can add the dual grammars/switch technologies concurrently.

JSON5 doesn't provide scripting like that, it's basically a more permissive JSON. The only advantages over plain JSON would be having multiple repositories divided by folders/subfolders and files that would be merged into a single json per main folder. As far I could understand we would be generating multiple JSON grammar files and having them being injected into the main language. I think something more dynamic like Typescript would be the ideal, then.

The biggest advantage IMO is not having to double-escape characters in regex patterns and getting syntax highlighting for the expressions themselves (the TypeScript grammar scopes for regular expressions are pretty excellent, so I have my theme configured to take advantage of that). But it's an extra build step and an added dev dependency and it's not hugely important to me, so I'd be just as happy continuing with plain JSON as long as the files stay manageably small.

I don't believe pure JSON would be a good choice for us as the expression grammar itself is already extensive, No problems working with Typescript for me and I'm fine with the extra build step.
Is something like the example you added in your PoC what we would be using?

I don't believe pure JSON would be a good choice for us as the expression grammar itself is already extensive, No problems working with Typescript for me and I'm fine with the extra build step.
Is something like the example you added in your PoC what we would be using?

Yep, that would be the plan. I should be able to go ahead and open a PR tonight to refactor the existing grammars to use that system, and from there we can start adding the custom template expression grammar.

Do you two mind if I use your existing grammars to start on the template expression grammar?

Do you two mind if I use your existing grammars to start on the template expression grammar?

I don't mind, but you'll definitely want to use @ghaschel's version as mine is wrong :)

I can make a PR with my grammar as soon as #581 is merged. =]

I started working on porting my grammar to the format @dannymcgee introduced in #581.

Okay cool. I decided to hold off since your comment. Excited to see it; let me know if we can help.


От: Guilherme Haschel notifications@github.com
Отправлено: среда, января 29, 2020 5:29 AM
Кому: angular/vscode-ng-language-service
Копия: hafiz; Mention
Тема: Re: [angular/vscode-ng-language-service] Textmate grammar for template expressions (#571)

I started working on porting my grammar to the format @dannymcgeehttps://github.com/dannymcgee introduced in #581https://github.com/angular/vscode-ng-language-service/pull/581.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHubhttps://github.com/angular/vscode-ng-language-service/issues/571?email_source=notifications&email_token=AE6GL6W2NQHGIAM35CO2CODRAGAC5A5CNFSM4KKQUDJKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKHFXSY#issuecomment-579754955, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AE6GL6SLH3QEY3NHNVUUPXLRAGAC5ANCNFSM4KKQUDJA.

image

Its going fine... There are a few regexes I'll have to rewrite because it seems that the tsc compiler doesn't see them as valid.

I think we can expect the expressions to be fully functional by friday at most.

Great to hear! Feel free to submit incremental PRs as you go, if you'd like.

@ghaschel Its going fine... There are a few regexes I'll have to rewrite because it seems that the tsc compiler doesn't see them as valid.

Oof, that's a pain, sorry about that! In those cases it might honestly be easier to just keep the patterns in string format instead of refactoring them to regex literal, unless there's a way we can update the build function to handle them. What's the specific issue you're seeing?

@ghaschel how is it going? Can we help in any way?

@ayazhafiz I'm sorry for disappearing like this. Had a really hard rest of the week at work, end of a project, had to work in the weekend.

It's going well. I will finish doing this by the end of the day.

No worries at all, I totally understand. Take your time! Just checking up. I saw your PR just now too.

Just added another commit to fix the snapshot tests. I had to regenerate template and inline template snapshot files.
Let me know if there's anything wrong with my pull request.

Will do. I'll take a look later today. I also am very busy at work ))

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

_This action has been performed automatically by a bot._

Was this page helpful?
0 / 5 - 0 ratings