_From @billti on November 1, 2015 6:10_
The API call `document.getWordRangeAtPosition(position)` appears to use its own definition of a word. For example, my tmLanguage defines `attrib-name` as a token/scope, yet `getWordRangeAtPosition` appears to break this into 2 words on the `-` character.
How can I get token ranges at a position based on my custom syntax? (And it would be really useful if I could get the scope name that goes along with it too).
_Copied from original issue: Microsoft/vscode-extensionbuilders#76_
_From @vilic on November 1, 2015 15:19_
:+1:
_From @egamma on November 2, 2015 8:16_
Exposing the scope names in the API is on the backlog, but will not make it into the November update.
_From @jrieken on November 2, 2015 18:16_
@billti despite the lack of access to scopes, you can define a custom word definition that will be picked up by `document.getWordRangeAtPosition`. You can register an `ITokenTypeClassificationSupport`, which can contribute a regex to classify words.
_From @billti on November 2, 2015 19:3_
Thanks @jrieken, I spotted that, and it may be a useful interim solution. But in general, for now, if I want to know the classification of a position in a CFG accurately, it seems I'll need to call `document.getText()` and run my own parser over it. Is that right?
_From @jrieken on November 3, 2015 9:59_
unfortunately yes
> @egamma on November 2, 2015 8:16
> Exposing the scope names in the API is on the backlog, but will not make it into the November update.
Is there any update on if/when we can expect a way to get the scopes at a position or offset?
@hoovercj all I can currently say is that this is still on the backlog, sorry.
@egamma Any progress on this? Is there any way I can contribute? :)
Would it be trivial to provide a command that returns a URL to the TextMate grammar file used for a particular document/languageId (or returns the contents of the file, to keep them read-only)? Then we could use `vscode-textmate` ourselves to get the token info at a particular location.
@siegebell, as a short-term solution, I have successfully included a TextMate grammar with my extension, referenced it, and used the built-in `vscode-textmate` package to get token scopes in an extension.
It's a pain and it really should be part of the API, but it's definitely possible to do today.
I was given the advice to use `var tm = require(path.join(require.main.filename, '../../node_modules/vscode-textmate/release/main.js'));` to access `vscode-textmate`, but since I have a language server, I had to evaluate `require.main.filename` in the language client and pass it over with the `initializationOptions` to get the right value in my server.
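With the `vscode-textmate` workaround above, a grammar loaded through `Registry.loadGrammar(scopeName)` gives an `IGrammar` whose `tokenizeLine(lineText, ruleStack)` returns the line's `tokens` (each with `startIndex`, `endIndex` and `scopes`) together with the `ruleStack` to pass when tokenizing the next line (`INITIAL` for the first one). Finding the scopes at a position is then a lookup over those tokens. The sketch below assumes the tokens are already available; `TokenLike` only mirrors the shape of `IToken`, and `scopesAt` is a hypothetical helper, not part of any API.

```typescript
// Mirrors the shape of vscode-textmate's IToken: startIndex inclusive, endIndex exclusive.
interface TokenLike {
  startIndex: number;
  endIndex: number;
  scopes: string[];
}

// Returns the scope stack (outermost first) of the token covering `character`,
// or undefined if no token covers it (e.g. past the end of the line).
function scopesAt(tokens: readonly TokenLike[], character: number): string[] | undefined {
  // A line's tokens are sorted and non-overlapping, so binary search works.
  let lo = 0;
  let hi = tokens.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const t = tokens[mid];
    if (character < t.startIndex) {
      hi = mid - 1;
    } else if (character >= t.endIndex) {
      lo = mid + 1;
    } else {
      return t.scopes;
    }
  }
  return undefined;
}
```

For short lines a linear scan would do just as well; the binary search only matters for very long lines.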
@TimonVS exposing the scopes in the API requires that we revisit the internal representation of scopes. This requires major re-architecting, which makes it challenging to open up for contributions.
In the meantime, I've published an extension, scope-info, that provides an API to query the scope at a particular position. It works by querying the installed extensions for language definitions and grammars, and then maintains a parse-state for each open document using vscode-textmate. Only one instance will exist per vscode instance, regardless of how many other extensions depend on it.
Example usage:
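```typescript
import * as vscode from 'vscode';
import * as api from 'scope-info';

async function example(doc: vscode.TextDocument, pos: vscode.Position) {
  // getExtension() returns undefined when scope-info is not installed
  const siExt = vscode.extensions.getExtension<api.ScopeInfoAPI>('siegebell.scope-info');
  if (!siExt) { return; }
  // activate() resolves to the API object that scope-info exports
  const si = await siExt.activate();
  const t1: api.Token = si.getScopeAt(doc, pos);
}
```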
Notes:

- … `IGrammar` and scope name of a language.
- … `extensionDependency`.

> exposing the scopes in the API requires that we revisit the internal representation of scopes. This requires major re-architecting, which makes it challenging to open up for contributions.
@alexandrudima I believe the above was done as part of #18317
@aeschli Will #18068 cover the feature requested in this issue, or are we suggesting that extension authors use https://marketplace.visualstudio.com/items?itemName=siegebell.scope-info?
Alex added a developer tool that lets you see the tokens at a location. See https://github.com/Microsoft/vscode/pull/17933#issuecomment-271515251
There's still no extension API that returns TextMate scopes. There are several reasons for that; one of them is that we don't want extensions to start depending on a particular tokenizer grammar.
I think it would be enough to get the color at a position and then map it to the applied style: string, number, keyword, ...
This would also be very useful for me. I'm writing an extension for a custom EBNF syntax. The TextMate grammar has all the information needed to provide linting, even for renaming symbols and basic syntax validation. (For syntax validation, just filter for tokens that have no scope attached: such a token is unexpected, i.e. a syntax error.)
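The syntax-validation idea can be sketched on top of per-line `vscode-textmate` tokens. It assumes that text no grammar rule matched carries only the grammar's root scope (so `scopes.length` is 1), and that the grammar scopes every valid construct; `findUnscopedTokens` and `UnscopedToken` are hypothetical names. Whitespace is skipped because grammars commonly leave it unscoped.

```typescript
// Mirrors the shape of vscode-textmate's IToken.
interface TokenLike {
  startIndex: number;
  endIndex: number;
  scopes: string[];
}

interface UnscopedToken {
  line: number;
  start: number;
  end: number;
  text: string;
}

// Reports non-whitespace tokens that carry only the root scope, i.e. text
// that no grammar rule matched. Only meaningful if the grammar assigns a
// scope to every valid construct of the language.
function findUnscopedTokens(lines: string[], tokensPerLine: TokenLike[][]): UnscopedToken[] {
  const result: UnscopedToken[] = [];
  tokensPerLine.forEach((tokens, line) => {
    for (const t of tokens) {
      const text = lines[line].slice(t.startIndex, t.endIndex);
      if (t.scopes.length <= 1 && text.trim() !== "") {
        result.push({ line, start: t.startIndex, end: t.endIndex, text });
      }
    }
  });
  return result;
}
```

Adjacent unmatched characters may be merged into a single token, so a reported `text` can span more than the offending character.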
I currently load the `vscode-textmate` module that ships with VS Code through a dirty workaround and use it to re-parse open files. It wastes a lot of CPU time, and I can't easily apply incremental changes. (I assume VS Code already does this internally to speed up syntax highlighting.)
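Incremental re-tokenization can be done by caching each line's end state: after an edit, re-tokenize from the changed line and stop as soon as a line ends in the same state as before, since every later line then starts from an unchanged state. This is a minimal sketch that handles single-line edits only. `LineTokenizer` is a hypothetical abstraction (with `vscode-textmate`, `tokenizeLine` and its rule stack would play these roles, assuming the state objects can be compared with an `equals` method), and the block-comment tokenizer is a toy used only to exercise the cache.

```typescript
// Hypothetical abstraction over a line-based tokenizer: each line is
// tokenized starting from the state left by the previous line.
interface LineTokenizer<S, T> {
  initial: S;
  tokenizeLine(line: string, state: S): { tokens: T; endState: S };
  equals(a: S, b: S): boolean;
}

// Caches tokens and end-of-line states per line.
class IncrementalTokens<S, T> {
  readonly tokens: T[] = [];
  private readonly endStates: S[] = [];
  private readonly lines: string[];

  constructor(private readonly tokenizer: LineTokenizer<S, T>, lines: string[]) {
    this.lines = [...lines];
    for (let i = 0; i < this.lines.length; i++) this.tokenize(i);
  }

  // Tokenizes line i and reports whether its end state differs from the cached one.
  private tokenize(i: number): boolean {
    const start = i === 0 ? this.tokenizer.initial : this.endStates[i - 1];
    const { tokens, endState } = this.tokenizer.tokenizeLine(this.lines[i], start);
    const previous = this.endStates[i];
    this.tokens[i] = tokens;
    this.endStates[i] = endState;
    return previous === undefined || !this.tokenizer.equals(previous, endState);
  }

  // Replaces the text of line i (no lines inserted or removed) and returns
  // how many lines had to be re-tokenized.
  setLine(i: number, text: string): number {
    this.lines[i] = text;
    let count = 0;
    for (let j = i; j < this.lines.length; j++) {
      count++;
      if (!this.tokenize(j)) break; // later lines start from an unchanged state
    }
    return count;
  }
}

// Toy tokenizer: the state is "inside a /* */ comment", and each line's
// "tokens" are just a label for the state the line starts in.
const blockComments: LineTokenizer<boolean, string> = {
  initial: false,
  tokenizeLine(line, inComment) {
    let state = inComment;
    let pos = 0;
    for (;;) {
      const at = line.indexOf(state ? "*/" : "/*", pos);
      if (at < 0) break;
      state = !state;
      pos = at + 2;
    }
    return { tokens: inComment ? "starts-in-comment" : "starts-in-code", endState: state };
  },
  equals: (a, b) => a === b,
};
```

A real version would also need to handle inserted and deleted lines, for example by splicing the cached arrays before re-tokenizing.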
Here are a few functions I could use:

- … `#include` directive)

Here's my extension for reference on how this information can be used:
https://github.com/Victorious3/vscode-TatSu/blob/635d3c1351b55048feac44f09203a95f1fc0c275/server/src/parse.ts
I don't understand why, in my language extension, I need to re-parse the whole file to know whether a character is inside a comment or a string.
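Until scopes are exposed, answering "is this offset inside a comment or a string?" means scanning the text yourself. A minimal sketch for a hypothetical C-like syntax (`//` and `/* */` comments, `"..."` and `'...'` strings with backslash escapes); `regionAt` is an illustrative name, and the scanner ignores things a real language would need, such as nested comments, template strings, or regex literals.

```typescript
type Region = "code" | "comment" | "string";

// Scans from the start of `text` and returns the region containing `offset`.
function regionAt(text: string, offset: number): Region {
  let i = 0;
  while (i < text.length) {
    const start = i;
    let region: Region = "code";
    if (text.startsWith("//", i)) {
      // Line comment: runs up to (not including) the next newline.
      region = "comment";
      const nl = text.indexOf("\n", i);
      i = nl < 0 ? text.length : nl;
    } else if (text.startsWith("/*", i)) {
      // Block comment: runs through the closing */ or to end of text.
      region = "comment";
      const close = text.indexOf("*/", i + 2);
      i = close < 0 ? text.length : close + 2;
    } else if (text[i] === '"' || text[i] === "'") {
      // String: stops at the matching quote or an unescaped newline.
      region = "string";
      const quote = text[i++];
      while (i < text.length && text[i] !== quote && text[i] !== "\n") {
        i += text[i] === "\\" ? 2 : 1; // skip escaped characters
      }
      if (text[i] === quote) i++; // consume the closing quote
    } else {
      i++;
    }
    if (offset >= start && offset < i) return region;
  }
  return "code";
}
```

Scanning from the start on every query is linear in the file size; a real extension would cache the scan per document version.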
Other ideas: …
> aeschli commented on Jan 19, 2017
> There's still no extension API that returns TextMate scopes. There are several reasons for that; one of them is that we don't want extensions to start depending on a particular tokenizer grammar.
That's actually very sad. The grammar already does most of the work needed for building outlines, and now we have to start all over, type all the same regexes into a TypeScript module, and repeat the work to get the same data?
Extensions covering languages that don't have language servers usually bring their own grammar files too, so why can't they rely on the same grammar file for both needs?
Actually I think VS Code should build the outline from the grammar scopes (for languages that don't already have a symbol provider), as it would increase the number of languages that would benefit from the outline feature. The textmate grammar system is severely underutilized.
This is in addition to the common needs of language extensions, such as knowing where comments and strings are.
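A sketch of building outline entries from grammar scopes, assuming the grammar follows the common TextMate naming conventions (`entity.name.function`, `entity.name.type`). The scope-to-kind table and the `outlineFromTokens` helper are hypothetical; grammars vary in which scopes they assign, so the mapping would need tuning per language.

```typescript
// Mirrors the shape of vscode-textmate's IToken.
interface TokenLike {
  startIndex: number;
  endIndex: number;
  scopes: string[];
}

interface OutlineEntry {
  name: string;
  kind: string;
  line: number;
  character: number;
}

// Assumed mapping from scope prefixes to outline kinds; the first match wins.
const SYMBOL_SCOPES: [string, string][] = [
  ["entity.name.function", "function"],
  ["entity.name.type", "type"],
];

// True if `scope` equals `prefix` or is a more specific scope under it.
function hasPrefix(scope: string, prefix: string): boolean {
  return scope === prefix || scope.startsWith(prefix + ".");
}

function outlineFromTokens(lines: string[], tokensPerLine: TokenLike[][]): OutlineEntry[] {
  const entries: OutlineEntry[] = [];
  tokensPerLine.forEach((tokens, line) => {
    for (const t of tokens) {
      const match = SYMBOL_SCOPES.find(([prefix]) => t.scopes.some((s) => hasPrefix(s, prefix)));
      if (match) {
        entries.push({
          name: lines[line].slice(t.startIndex, t.endIndex),
          kind: match[1],
          line,
          character: t.startIndex,
        });
      }
    }
  });
  return entries;
}
```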
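Something like this?

```typescript
const range = document.getWordRangeAtPosition(position, /[a-zA-Z_][\-a-zA-Z0-9_]*/);
const word = range ? document.getText(range) : undefined; // getText(undefined) would return the whole document
```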
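To see why the custom regex matters, the following sketch approximates what `getWordRangeAtPosition(position, regex)` does on a single line: it returns the regex match that contains the position, counting a position right after a word as inside it. It is an illustrative re-implementation (`wordRangeAt` is a hypothetical name), not VS Code's code.

```typescript
interface WordRange {
  start: number; // inclusive
  end: number; // exclusive
}

// Returns the first non-empty match of `pattern` in `lineText` whose span
// touches `character`, or undefined if there is none.
function wordRangeAt(lineText: string, character: number, pattern: RegExp): WordRange | undefined {
  // A global regex is needed so that exec() walks through successive matches.
  const re = new RegExp(pattern.source, pattern.flags.replace("g", "") + "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(lineText)) !== null) {
    if (m[0].length === 0) {
      re.lastIndex++; // avoid looping forever on empty matches
      continue;
    }
    const start = m.index;
    const end = start + m[0].length;
    if (start <= character && character <= end) return { start, end };
    if (start > character) break; // later matches cannot contain the position
  }
  return undefined;
}
```

With a pattern that excludes `-`, a position inside `attrib-name` only yields `name`; allowing `-` after the first character yields the whole attribute name.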
I just ran into this issue when trying to extend auto-correct to behave in a smart way depending on the current cursor environment. Hence, I would love to see this functionality as well!
Are there any possible workarounds to get this working with the extension test host? I'd love to be able to write an end-to-end test to validate semantic highlighting is working, but couldn't find a way.
> I'd love to be able to write an end-to-end test to validate semantic highlighting is working, but couldn't find a way.
An example I've seen is here:
https://github.com/styled-components/vscode-styled-components/blob/master/src/tests/suite/colorization.test.js
The idea is that it has a fixture file, calls `captureSyntaxTokens`, and validates the result against a predefined results file. I'm not sure if there are more efficient ways, but it works as an end-to-end test for syntax highlighting. I don't know whether this changes for semantic highlighting.
> An example I've seen is here:
> https://github.com/styled-components/vscode-styled-components/blob/master/src/tests/suite/colorization.test.js
Thanks for the suggestion! It looks like this works well for testing TextMate grammars, but unfortunately not for semantic tokenization, as it invokes the grammar directly.