Vscode: Can I get scope / scopeRange at a position?

Created on 24 Nov 2015  ·  23 Comments  ·  Source: microsoft/vscode

_From @billti on November 1, 2015 6:10_

The API call document.getWordRangeAtPosition(position) appears to use its own definition of a word. For example, my tmLanguage defines attrib-name as a token/scope, yet getWordRangeAtPosition appears to break this into 2 words on the - character.

How can I get token ranges at a position based on my custom syntax? (And it would be really useful if I could get the scope name that goes along with it too).

_Copied from original issue: Microsoft/vscode-extensionbuilders#76_

api feature-request tokenization

All 23 comments

_From @vilic on November 1, 2015 15:19_

:+1:

_From @egamma on November 2, 2015 8:16_

Exposing the scope names in the API is on the backlog, but will not make it into the November update.

_From @jrieken on November 2, 2015 18:16_

@billti despite the lack of access to scopes, you can define a custom word definition so that it will be picked up by document.getWordRangeAtPosition. You can register an ITokenTypeClassificationSupport, which can contribute a regex to classify words.

_From @billti on November 2, 2015 19:3_

Thanks @jrieken, I spotted that, and it may be a useful interim solution. But generally for now, if I want to know the classification accurately for a position in a CFG, it seems I'll need to call document.getText() and run my own parser over it. Is that right?

_From @jrieken on November 3, 2015 9:59_

unfortunately yes

> @egamma on November 2, 2015 8:16
> Exposing the scope names in the API is on the backlog, but will not make it into the November update.

Is there any update on if/when we can expect a way to get the scopes at a position or offset?

@hoovercj all I can currently say is that this is still on the backlog, sorry.

@egamma Any progress on this? Is there any way I can contribute? :)

Would it be trivial to provide a command that returns a url to the TextMate grammar file being used for a particular document/languageId (or return the contents of the file to keep them read-only)? Then we could use vscode-textmate ourselves to get the token info at a particular location.

@siegebell -- As a short-term solution, I have successfully bundled a TextMate grammar with my extension, referenced it, and used the built-in vscode-textmate package to get token scopes in an extension.

It's a pain and it really should be part of the API, but it's definitely possible to do today.

I was given the advice to use: var tm = require(path.join(require.main.filename, '../../node_modules/vscode-textmate/release/main.js')); to access vscode-textmate, but since I have a language server I had to evaluate require.main.filename in the language client and pass it over with the initializationOptions to get the right value in my server.

@TimonVS exposing the scopes in the API requires that we revisit the internal representation of scopes. This requires major re-architecting, which makes it challenging to open up for contributions.

In the meantime, I've published an extension, scope-info, that provides an API to query the scope at a particular position. It works by querying the installed extensions for language definitions and grammars, and then maintains a parse-state for each open document using vscode-textmate. Only one instance will exist per vscode instance, regardless of how many other extensions depend on it.

Example usage:

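import * as vscode from 'vscode';
import * as api from 'scope-info';

async function example(doc: vscode.TextDocument, pos: vscode.Position) {
    // getExtension returns undefined when scope-info is not installed.
    const siExt = vscode.extensions.getExtension<api.ScopeInfoAPI>('siegebell.scope-info');
    if (!siExt) { return; }
    // activate() resolves to the API object exported by the extension.
    const si = await siExt.activate();
    const t1: api.Token = si.getScopeAt(doc, pos);
}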

Notes:

  • For typings, refer to scope-info.d.ts.
  • You can also query the vscode-textmate IGrammar and scope name of a language.
  • Your extension should list 'siegebell.scope-info' under extensionDependencies in its package.json.
  • If multiple extensions contribute to the same language, scope-info may pick the wrong one.
  • Scope-info might return a scope corresponding to a slightly newer or older document version than what your extension thinks is current.
  • Pull requests are welcome.
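
The "parse-state for each open document" idea can be illustrated with a toy sketch. This is not scope-info's code: it replaces the TextMate rule stack with a single hypothetical flag (inside a block comment or not) and all names are made up for illustration. In vscode-textmate the analogous step is passing the previous line's rule stack to tokenizeLine. The point shown here is only the caching strategy: store the end state of every line, and after an edit rescan until a line ends in the same state it ended in before.

```typescript
// Toy line state: are we inside a /* ... */ block comment at a line boundary?
// A real TextMate tokenizer carries a full rule stack instead of a single flag.
type LineState = "code" | "comment";

// Scans one line from a known start state and returns the state at its end.
function scanLine(text: string, start: LineState): LineState {
  let state: LineState = start;
  let i = 0;
  while (i < text.length) {
    const pair = text.slice(i, i + 2);
    if (state === "code" && pair === "/*") {
      state = "comment";
      i += 2;
    } else if (state === "comment" && pair === "*/") {
      state = "code";
      i += 2;
    } else {
      i += 1;
    }
  }
  return state;
}

// Caches the end state of every line so an edit only rescans what it affects.
class LineStateCache {
  private lines: string[];
  private endStates: LineState[] = [];

  constructor(lines: string[]) {
    this.lines = lines.slice();
    this.rescanFrom(0);
  }

  // State in effect at the start of `index` (the end state of the previous line).
  startStateOf(index: number): LineState {
    const previous = this.endStates[index - 1];
    return previous === undefined ? "code" : previous;
  }

  // Replaces one line and returns how many lines had to be rescanned.
  replaceLine(index: number, text: string): number {
    this.lines[index] = text;
    return this.rescanFrom(index);
  }

  private rescanFrom(index: number): number {
    let state = this.startStateOf(index);
    let scanned = 0;
    for (let i = index; i < this.lines.length; i++) {
      const end = scanLine(this.lines[i], state);
      scanned += 1;
      const unchanged = this.endStates[i] === end;
      this.endStates[i] = end;
      // Once a line ends in the state it ended in before, later lines are unaffected.
      if (unchanged) {
        break;
      }
      state = end;
    }
    return scanned;
  }
}

const cache = new LineStateCache(["a = 1;", "b = 2;", "c = 3;", "d = 4;"]);
console.log(cache.replaceLine(1, "b = 2; /* open")); // 3: lines 1 to 3 now end inside the comment
console.log(cache.startStateOf(3));                  // comment
console.log(cache.replaceLine(0, "a = 10;"));        // 1: end state unchanged, nothing else rescanned
```

Because the cache is updated asynchronously relative to edits in a real editor, a lookup can briefly reflect a neighbouring document version, which is the caveat mentioned in the notes above.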

> exposing the scopes in the API requires that we revisit the internal representation of scopes. This requires major re-architecting, which makes it challenging to open up for contributions.

@alexandrudima I believe the above was done as part of #18317

@aeschli Will #18068 cover the feature asked for in this issue, or are we suggesting that extension authors use https://marketplace.visualstudio.com/items?itemName=siegebell.scope-info?

Alex added a developer tool that lets you see the tokens at a location. See https://github.com/Microsoft/vscode/pull/17933#issuecomment-271515251

There's still no extension API that returns TextMate scopes. There are several reasons for that; one of them is that we don't want extensions to start depending on a particular tokenizer grammar.

I think it would be enough to get the color at a position and then map it to the applied style: string, number, keyword...

This would also be very useful for me. I'm writing an extension for a custom EBNF syntax. The TextMate grammar has all the information needed to provide linting, and even symbol renaming and basic syntax validation. (For validation, just filter for tokens that have no scope attached: those are unexpected tokens, i.e. syntax errors.)
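
A minimal sketch of that validation idea follows. It assumes tokens shaped like the ones vscode-textmate's tokenizeLine produces (startIndex, endIndex, scopes), redeclared locally so the example has no dependencies, and it interprets "no scope attached" as "carries only the grammar's root scope". The function name, the scope names, and the sample line are hypothetical.

```typescript
// Token shape modelled on vscode-textmate's line tokens; declared locally.
interface LineToken {
  startIndex: number;
  endIndex: number;
  scopes: string[];
}

interface UnexpectedToken {
  line: number;
  startIndex: number;
  endIndex: number;
  text: string;
}

// Reports every non-whitespace token that the grammar left with only its root scope.
function findUnscopedTokens(
  lineNumber: number,
  lineText: string,
  tokens: LineToken[],
  rootScope: string
): UnexpectedToken[] {
  const unexpected: UnexpectedToken[] = [];
  for (const token of tokens) {
    const text = lineText.slice(token.startIndex, token.endIndex);
    const onlyRootScope = token.scopes.every((scope) => scope === rootScope);
    if (onlyRootScope && text.trim().length > 0) {
      unexpected.push({
        line: lineNumber,
        startIndex: token.startIndex,
        endIndex: token.endIndex,
        text,
      });
    }
  }
  return unexpected;
}

// Hand-written tokens for one line; scope names are made up for illustration.
const root = "source.ebnf";
const ruleLine = "start = 'a' $ ;";
const ruleTokens: LineToken[] = [
  { startIndex: 0, endIndex: 5, scopes: [root, "entity.name.function.ebnf"] },
  { startIndex: 5, endIndex: 6, scopes: [root] },
  { startIndex: 6, endIndex: 7, scopes: [root, "keyword.operator.ebnf"] },
  { startIndex: 7, endIndex: 8, scopes: [root] },
  { startIndex: 8, endIndex: 11, scopes: [root, "string.quoted.single.ebnf"] },
  { startIndex: 11, endIndex: 12, scopes: [root] },
  { startIndex: 12, endIndex: 13, scopes: [root] },
  { startIndex: 13, endIndex: 14, scopes: [root] },
  { startIndex: 14, endIndex: 15, scopes: [root, "punctuation.terminator.ebnf"] },
];

console.log(findUnscopedTokens(0, ruleLine, ruleTokens, root).map((t) => t.text)); // [ '$' ]
```

Whitespace is skipped because grammars commonly leave it unscoped; whether that holds for a given grammar is something to check against its rules.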

I currently load the 'vscode-textmate' module that comes with vscode using some dirty workaround and use that to reparse open files. It's a lot of wasted CPU time and I can't easily do incremental changes. (I assume vscode already does this internally to speed up syntax highlighting)

Here are a few functions I could use:

  • Get token at position
  • Get a list of tokens for the entire file, or in a Range
  • Get token text, scopes & Range
  • Get a list of tokens filtered by scope (this can be achieved by using the above two, but could be optimized separately)
  • Open additional files and get them tokenized in the background (for #include directive)
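
To make the first few wishes concrete, here is a rough sketch of what such queries could look like over an already tokenized document. Nothing here is a VS Code API: the ScopedToken shape, the function names, and the scope names are all hypothetical, and the selector matching is a simplified prefix match rather than full TextMate scope-selector semantics.

```typescript
// Hypothetical token record: text, position, and the scope stack that applies to it.
interface ScopedToken {
  line: number;
  startIndex: number;
  endIndex: number;
  text: string;
  scopes: string[];
}

// "Get token at position": the token whose half-open range contains the character.
function tokenAt(tokens: ScopedToken[], line: number, character: number): ScopedToken | undefined {
  for (const token of tokens) {
    if (token.line === line && token.startIndex <= character && character < token.endIndex) {
      return token;
    }
  }
  return undefined;
}

// Simplified selector match: "string" matches "string" and "string.quoted.double.demo".
function hasScope(token: ScopedToken, selector: string): boolean {
  return token.scopes.some((scope) => scope === selector || scope.indexOf(selector + ".") === 0);
}

// "Get a list of tokens filtered by scope".
function tokensWithScope(tokens: ScopedToken[], selector: string): ScopedToken[] {
  return tokens.filter((token) => hasScope(token, selector));
}

// Hand-built tokens for a single line; scope names are made up for illustration.
const source = 'x = "hi" // note';
function tok(start: number, end: number, ...scopes: string[]): ScopedToken {
  return {
    line: 0,
    startIndex: start,
    endIndex: end,
    text: source.slice(start, end),
    scopes: ["source.demo", ...scopes],
  };
}
const tokens: ScopedToken[] = [
  tok(0, 1, "variable.other.demo"),
  tok(1, 2),
  tok(2, 3, "keyword.operator.demo"),
  tok(3, 4),
  tok(4, 8, "string.quoted.double.demo"),
  tok(8, 9),
  tok(9, 16, "comment.line.double-slash.demo"),
];

const underCursor = tokenAt(tokens, 0, 5);
console.log(underCursor ? underCursor.text : "(none)");            // "hi" (including the quotes)
console.log(tokensWithScope(tokens, "comment").map((t) => t.text)); // [ '// note' ]
```

A linear scan is fine for a sketch; a real implementation would likely index tokens per line and binary-search within the line.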

Here's my extension for some reference on how this information can be used:
https://github.com/Victorious3/vscode-TatSu/blob/635d3c1351b55048feac44f09203a95f1fc0c275/server/src/parse.ts

I don't understand why, in my language extension, I need to re-parse the whole file just to know whether a character is inside a comment or a string.
Other ideas this would enable:

  • grammar correction only for strings and comments
  • a separate editor for escaped strings where \r and \n are converted (like IntelliJ's language injection)
  • a regex visualizer for regex tokens
    etc.

> aeschli commented on Jan 19, 2017
>
> There's still no extension API that returns TextMate scopes. There are several reasons for that; one of them is that we don't want extensions to start depending on a particular tokenizer grammar.

That's actually very sad. The grammar has already done most of the work needed for making outlines, and now we have to start all over, retype all the same regexes into a TypeScript module, and repeat the work to get the same data?

Extensions covering languages that don't have servers usually bring their own grammar files too, so why can't they rely on the same grammar file for both needs?

Actually I think VS Code should build the outline from the grammar scopes (for languages that don't already have a symbol provider), as it would increase the number of languages that would benefit from the outline feature. The textmate grammar system is severely underutilized.

This is in addition to the common needs of language extensions, such as knowing where comments and strings are.

Write like this?

document.getText(document.getWordRangeAtPosition(position, /[a-zA-Z_][\-a-zA-Z0-9_]*/));
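
To see what the custom pattern changes, here is a small stand-alone sketch that mimics the idea of a regex-driven word lookup on a single line. It is not VS Code's implementation of getWordRangeAtPosition; wordRangeAt and WordRange are hypothetical helpers, and /\w+/ merely stands in for a word definition that stops at the hyphen.

```typescript
// Half-open character range [start, end) within one line of text.
interface WordRange {
  start: number;
  end: number;
}

// Returns the range of the first non-empty match of `pattern` that touches
// `offset`, or undefined when no match does.
function wordRangeAt(line: string, offset: number, pattern: RegExp): WordRange | undefined {
  // Copy the pattern with the global flag so exec() walks through every match.
  const scanner = new RegExp(pattern.source, pattern.ignoreCase ? "gi" : "g");
  let match: RegExpExecArray | null = scanner.exec(line);
  while (match !== null) {
    const start = match.index;
    const end = start + match[0].length;
    if (end === start) {
      scanner.lastIndex += 1; // skip empty matches so the loop always advances
    } else if (start <= offset && offset <= end) {
      return { start, end };
    }
    match = scanner.exec(line);
  }
  return undefined;
}

const line = '<div attrib-name="x">';
const offset = line.indexOf("name"); // cursor on the "n" of "name"

const plainWord = wordRangeAt(line, offset, /\w+/);
const hyphenWord = wordRangeAt(line, offset, /[a-zA-Z_][\-a-zA-Z0-9_]*/);

console.log(plainWord ? line.slice(plainWord.start, plainWord.end) : "(none)");    // name
console.log(hyphenWord ? line.slice(hyphenWord.start, hyphenWord.end) : "(none)"); // attrib-name
```

This only recovers the text range; it still says nothing about which scope the grammar assigned to it, which is the other half of the original request.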

I just ran into this issue when trying to extend auto-correct to behave in a smart way depending on the current cursor environment. Hence, I would love to see this functionality as well!

Are there any possible workarounds to get this working with the extension test host? I'd love to be able to write an end-to-end test to validate semantic highlighting is working, but couldn't find a way.

> I'd love to be able to write an end-to-end test to validate semantic highlighting is working, but couldn't find a way.

An example I've seen is here:
https://github.com/styled-components/vscode-styled-components/blob/master/src/tests/suite/colorization.test.js

The idea is that it has a fixture file, then calls captureSyntaxTokens and validates the result against a predefined results file. I'm not sure whether there are more efficient ways, but it works as an end-to-end test for syntax highlighting. I don't know if this changes for semantic highlighting.
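
The comparison step of such a test can be sketched independently of how the tokens are captured. The SnapshotToken shape, the diffTokens helper, and the scope strings below are assumptions for illustration, not the format used by the linked test or by captureSyntaxTokens.

```typescript
// Hypothetical snapshot entry: token text plus its space-separated scope stack.
interface SnapshotToken {
  text: string;
  scopes: string;
}

function describeToken(token: SnapshotToken | undefined): string {
  return token === undefined ? "<none>" : `"${token.text}" [${token.scopes}]`;
}

// Compares captured tokens with the stored snapshot, position by position,
// and returns one message per mismatch (an empty array means the test passes).
function diffTokens(actual: SnapshotToken[], expected: SnapshotToken[]): string[] {
  const problems: string[] = [];
  const count = Math.max(actual.length, expected.length);
  for (let i = 0; i < count; i++) {
    const a: SnapshotToken | undefined = actual[i];
    const e: SnapshotToken | undefined = expected[i];
    const same = a !== undefined && e !== undefined && a.text === e.text && a.scopes === e.scopes;
    if (!same) {
      problems.push(`token ${i}: expected ${describeToken(e)}, got ${describeToken(a)}`);
    }
  }
  return problems;
}

// Snapshot that would normally be read from the results file.
const expectedTokens: SnapshotToken[] = [
  { text: "const", scopes: "source.js storage.type.js" },
  { text: "css", scopes: "source.js entity.name.function.tagged-template.js" },
];
// Tokens that would normally come from the capture step.
const capturedTokens: SnapshotToken[] = [
  { text: "const", scopes: "source.js storage.type.js" },
  { text: "css", scopes: "source.js variable.other.js" },
];

console.log(diffTokens(capturedTokens, expectedTokens).length); // 1
```

Reporting every mismatch, rather than failing on the first one, tends to make grammar regressions easier to read in test output.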

> An example I've seen is here:
> https://github.com/styled-components/vscode-styled-components/blob/master/src/tests/suite/colorization.test.js

Thanks for the suggestion! It looks like this works well for testing TextMate grammars, but unfortunately not for semantic tokenization, as it invokes the grammar directly.
