How do I get the token type? E.g. knowing the type at a position (in e.g. a .css model), whether it's a value, a color, etc.
Before, I used something like tokens = model.getLineTokens, then tokenIndex = tokens.findTokenIndexAtOffset, and then tokens.getTokenType(tokenIndex). There was a list of the token types (not a sane type, but it worked).
But now the token holds only color info, no type info. I assume this is all for performance reasons, but:
I get that I can call the new monaco.editor.tokenize, which brings back some good token info, but that seems like overkill: before, I could use what was already parsed and, on updates, only iterate the range that was modified (including the state of the range that was modified).
Thoughts?
And thanks a bunch. I know this is digging into internal code and using non-public API.
@AndersMad I was also doing some digging and then it turned out: https://github.com/Microsoft/monaco-editor/issues/332#issuecomment-274429960
@akosyakov did you end up tokenizing the entire document for each change then?
@AndersMad yes, whenever the content or language of a model changes. So far there have been no performance issues.
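For reference, a position lookup on top of a full re-tokenization could look roughly like the sketch below. It assumes the result of monaco.editor.tokenize(text, languageId) is an array with one entry per line, each entry being an array of tokens with an `offset` (0-based start column) and a `type` string, sorted by offset. The helper names findTokenAtOffset and getTokenTypeAt are made up for this example and are not part of Monaco.

```javascript
// Binary search for the last token whose start offset is <= the given offset.
// `lineTokens` is assumed to be one line's worth of tokens, sorted by `offset`.
function findTokenAtOffset(lineTokens, offset) {
  let lo = 0;
  let hi = lineTokens.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (lineTokens[mid].offset <= offset) {
      found = mid; // candidate; keep looking to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found === -1 ? null : lineTokens[found];
}

// Hypothetical helper: token type at a Monaco-style position
// (1-based lineNumber and column). Returns null if nothing is found.
function getTokenTypeAt(allTokens, position) {
  const lineTokens = allTokens[position.lineNumber - 1];
  if (!lineTokens) {
    return null;
  }
  const token = findTokenAtOffset(lineTokens, position.column - 1);
  return token ? token.type : null;
}
```

In the editor this would be fed with something like `getTokenTypeAt(monaco.editor.tokenize(model.getValue(), "css"), editor.getPosition())`, re-running the tokenize call whenever the content or language changes.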
The model no longer holds on to the tokens, it only holds the resolved token+theme information.
OK, so I think the idea here is to manually mark the tokens dirty on each update and refresh them (preferably in a worker process): keep the token[][] array (from monaco.editor.tokenize) with an iterator on top, and let each service use that, updating lazily if it is marked dirty. Methods like getModeIdAtPosition are also gone (e.g. for lookup).
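A minimal sketch of that dirty-flag idea, without the worker part: the class name LazyTokenCache and its two constructor callbacks are hypothetical, and the tokenizer is injected so that the cache itself does not depend on Monaco.

```javascript
// Caches the token[][] for a document and recomputes it only when a
// consumer asks for tokens after the document was marked dirty.
class LazyTokenCache {
  // getText: () => string, returns the current document text
  // tokenizeText: (text) => token[][], e.g. a wrapper around
  //   monaco.editor.tokenize(text, "css") (assumed wiring, not shown here)
  constructor(getText, tokenizeText) {
    this.getText = getText;
    this.tokenizeText = tokenizeText;
    this.tokens = null;
    this.dirty = true;
  }

  // Call on every content or language change; this is cheap.
  markDirty() {
    this.dirty = true;
  }

  // Services call this; the expensive tokenize runs at most once per change burst.
  getTokens() {
    if (this.dirty) {
      this.tokens = this.tokenizeText(this.getText());
      this.dirty = false;
    }
    return this.tokens;
  }
}
```

Wiring it up would presumably look like `model.onDidChangeContent(() => cache.markDirty())`, with `getText` set to `() => model.getValue()`. Several edits in a row then cost only one re-tokenization, paid when the next service reads the tokens.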
@alexandrudima: however, I do find it a bit much overhead to get the model text (which joins all lines) and then have monaco.editor.tokenize split the string by CRLF again, instead of being able to run it directly on the model lines. Maybe simple access to the tokenizer, e.g. a public version of getSafeTokenizationSupport? Then one could maintain the token data very fast for non-mixed-language models too (by line updates).
You're perfectly correct, it is terribly inefficient.
To get the human readable (non themed) tokens (in an efficient way) you could access some privates on the model. I do not wish to expose this on the API at this time, as I would like to explore running tokenizers in web workers and that would break any exposed API.
var editor = monaco.editor.create(document.getElementById("container"), {
    value: "function hello() {\n\talert('Hello world!');\n}",
    language: "javascript"
});

function getTokensAtLine(model, lineNumber) {
    // Force line's state to be accurate
    model.getLineTokens(lineNumber, /*inaccurateTokensAcceptable*/false);
    // Get the tokenization state at the beginning of this line
    var freshState = model._lines[lineNumber - 1].getState().clone();
    // Get the human readable tokens on this line
    return model._tokenizationSupport.tokenize(model.getLineContent(lineNumber), freshState, 0).tokens;
}

// Could observe TokenizationRegistry.onDidChange instead of timeout
setTimeout(function() {
    var tokens = getTokensAtLine(editor.getModel(), 2);
    console.log(tokens);
}, 1000);
ah - it's right there on the model.. I was looking at inspectTokens.ts that is accessing a lot more internals to get the tokenizationSupport.. thanks! :)
.. and fyi it works super - I can now tokenize a changed range in a document, adding e.g. color decorators to a css file, and it is blazingly fast!! I generated a huge 1 MB .css file and edited it with no problems..
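For anyone wanting to try the same range-based approach, here is a rough, Monaco-free sketch of the general idea: re-tokenize from the first changed line and stop as soon as a line past the edit ends in the same state it ended in before. Everything in it is an assumption for illustration: toyTokenizeLine is a stand-in tokenizer that only knows "code" and "comment", states are plain strings compared with ===, the `{ tokens, endState }` result shape is assumed, and the edit is assumed not to add or remove lines. With the real tokenizer you would presumably compare state objects with their own equality check.

```javascript
// Toy line tokenizer: splits a line into "code" and "comment" (/* ... */) tokens.
// Returns the tokens plus the state the line ends in.
function toyTokenizeLine(text, startState) {
  const tokens = [];
  let state = startState;
  let i = 0;
  while (i < text.length) {
    if (state === "code") {
      const open = text.indexOf("/*", i);
      if (open === -1) {
        tokens.push({ offset: i, type: "code" });
        break;
      }
      if (open > i) {
        tokens.push({ offset: i, type: "code" });
      }
      tokens.push({ offset: open, type: "comment" });
      state = "comment";
      i = open + 2;
    } else {
      // i === 0 only when the comment continues from the previous line.
      if (i === 0) {
        tokens.push({ offset: 0, type: "comment" });
      }
      const close = text.indexOf("*/", i);
      if (close === -1) {
        break;
      }
      state = "code";
      i = close + 2;
    }
  }
  return { tokens: tokens, endState: state };
}

// Re-tokenize lines first..last (0-based, inclusive) and keep going only
// while the end state of a line differs from what was cached for it.
// cache[i] holds { tokens, endState } and must be filled for lines before `first`.
// Returns the number of lines that were actually tokenized.
function retokenizeRange(lines, cache, first, last, tokenizeLine, initialState) {
  let state = first === 0 ? initialState : cache[first - 1].endState;
  let count = 0;
  for (let i = first; i < lines.length; i++) {
    const oldEndState = cache[i] ? cache[i].endState : undefined;
    cache[i] = tokenizeLine(lines[i], state);
    state = cache[i].endState;
    count++;
    if (i >= last && state === oldEndState) {
      break; // later lines start in the same state as before, so they are still valid
    }
  }
  return count;
}
```

An edit that stays inside one rule touches a single line, while an edit that opens an unterminated comment runs on until the state settles again, which is why typical edits in a large file stay cheap.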