Monaco-editor: Monarch grammar: Regex starting with ^ should only match start of source line, but it does not.

Created on 7 Feb 2020  路  6Comments  路  Source: microsoft/monaco-editor

monaco-editor version: 0.19.3
Browser: Chrome 79
OS: macOS
Playground code that reproduces the issue:

According to the Monarch docs:

If it starts with a ^ character, the expression only matches at the start of a source line.

I have tried both specifying a regex literal /^!/ as well as a string "^!", but neither works.

You can reproduce the issue on the special Monarch playground at https://microsoft.github.io/monaco-editor/monarch.html. In the Language syntax definition, specify:

return {
  // The main tokenizer for our languages
  tokenizer: {
    root: [
      [/^\!/, {token: 'delimiter.curly', next: 'jsonInBang', nextEmbedded: 'json'}],
    ],
    jsonInBang: [
      [/^\!/, {token: 'delimiter.curly', next: '@pop', nextEmbedded: '@pop'}],
    ],
  },
};

And then in the Language editor, specify:

!
  {
    "foo": "bar",
    "baz": 42,
    "quux": [
      {
        "foo": "bar!",
      }
    ],
    "quux2": [
      {
        "foo": "bar",
      }
    ]
  }
!

If you look carefully, the JSON inside the ! starts out being highlighted correctly, but after the ! in the string literal "bar!", it stops working. It appears the /^\!/ has matched it even though it is not the start of a source code line, which is not what the docs claim should happen.

bug

All 6 comments

Stepping through the code, I believe the issue is that MonarchTokenizer._myTokenize() has proper checks for ^:

https://github.com/Microsoft/vscode/blob/0aed1023c556dfcd9fc211855f14b5a183b8a3e0/src/vs/editor/standalone/common/monarch/monarchLexer.ts#L614-L625

In particular, it does this check:

if (pos === 0 || !rule.matchOnlyAtLineStart) {

By comparison, when in an embedded grammar, I believe MonarchTokenizer._findLeavingNestedModeOffset() is used to test the rules to determine whether to exit the embedded mode. As best I can tell, it not appear to check if matchOnlyAtLineStart is set on each rule.

Ah, so the issue is a bit more subtle than I realized. In monarchCompile.ts, as part of Rule.setRegex(), the regular expression is rewritten:

https://github.com/Microsoft/vscode/blob/0aed1023c556dfcd9fc211855f14b5a183b8a3e0/src/vs/editor/standalone/common/monarch/monarchCompile.ts#L372

wrapping it in ^(?:<original_regex>), dropping the ^, if necessary (and setting matchOnlyAtLineStart to reflect that change).

As such, this check in _findLeavingNestedModeOffset() is expected to succeed:

https://github.com/Microsoft/vscode/blob/0aed1023c556dfcd9fc211855f14b5a183b8a3e0/src/vs/editor/standalone/common/monarch/monarchLexer.ts#L499

since those characters should have been added by setRegex().

I suspect the fix might be as simple as changing this code:

https://github.com/Microsoft/vscode/blob/0aed1023c556dfcd9fc211855f14b5a183b8a3e0/src/vs/editor/standalone/common/monarch/monarchLexer.ts#L503-L506

To be something like:

if (result === -1 || (result !== 0 && rule.matchOnlyAtLineStart)) {
    continue;
}

Though it would probably be slightly more efficient to rewrite the regex to include a leading ^ before passing it to line.search() so that it can fail faster, if appropriate.

Also, I think the RegExp is re-created each time _findLeavingNestedModeOffset() is run, so it might benefit from caching it on the rule or in a WeakMap or something?

I'm trying to fix this myself and I have been following the instructions on CONTRIBUTING.md, but it is proving difficult. What I really want is a dev version of https://microsoft.github.io/monaco-editor/monarch.html that I can edit/refresh quickly.

I followed the instructions for npm run release and npm run build-website, but it takes 3 minutes to build on my laptop, so that is a slow dev cycle. Even once it is done, it does not seem to be loading my dev version, but what was downloaded into node_modules/, even if I add ?editor=src, so that's no good.

I poked around the various .html files in test/, but none of them seem to offer a similar level of interactivity as the playground.

It would be helpful if the docs provided more detail about how the maintainers get a good dev cycle going.

@bolinfest The trick is to use yarn run watch in the vscode repo. That will be a process that will continue executing until killed. Then, you can edit .ts files and within <1s the generated .js are written to disk and then you can reload the webpage.

It usually just means that you edit the .ts file with autosave on and then alt-tab to a browser and reload and then you see your change...

@alexdima I had that working for the establish test pages, but it did not seem to be working for the playground (which I cared about because I didn't see a good test page for grammars, so I wanted to use monarch.html). Should the fast edit/refresh cycle be working for the playground, too?

For testing changes like that, I typically use one of the generated files like at http://127.0.0.1:8080/monaco-editor/test/playground.generated/extending-language-services-custom-languages.html

and just edit the .html to contain the Monarch definition I want and the content I want and then just reload...

Was this page helpful?
0 / 5 - 0 ratings

Related issues

akosyakov picture akosyakov  路  3Comments

robclive picture robclive  路  3Comments

PinkyJie picture PinkyJie  路  3Comments

Kang-Jun-sik picture Kang-Jun-sik  路  3Comments

Spongman picture Spongman  路  3Comments