Monaco-editor: token info (changes made on 0.8)

Created on 1 Feb 2017 · 8 Comments · Source: microsoft/monaco-editor

How do I get the token type? E.g. to know the type at a position (say, in a .css model): whether it's a value, a color, etc.

Before, I used something like tokens = model.getLineTokens, then tokenIndex = tokens.findTokenIndexAtOffset, and then tokens.getTokenType(tokenIndex), which exposed the list of token types (not a sane type, but it worked).

But now the token only carries color info, no type info. I assume this is all because of perf, but:

I get that I can call the new monaco.editor.tokenize, which brings back good token info, but that seems like overkill: before, I could use what was already parsed and, on updates, only iterate the range that was modified (including the state of the range that was modified).
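For reference, a minimal sketch of the lookup described above, run against the Token[][] result of monaco.editor.tokenize(text, languageId). It assumes each token has the shape { offset, type, language } with a 0-based offset into its line; the helper names findTokenAtOffset and getTokenTypeAt are made up for illustration and are not part of the Monaco API.

```javascript
// Assumed token shape: { offset, type, language }, where offset is the
// 0-based start index of the token within its line.
function findTokenAtOffset(lineTokens, offset) {
    // Binary search for the last token whose start offset is <= offset.
    let low = 0;
    let high = lineTokens.length - 1;
    let found = -1;
    while (low <= high) {
        const mid = (low + high) >> 1;
        if (lineTokens[mid].offset <= offset) {
            found = mid;
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }
    return found === -1 ? null : lineTokens[found];
}

// Look up the token type at a 1-based (lineNumber, column) position.
// tokensByLine is expected to hold one token array per line.
function getTokenTypeAt(tokensByLine, lineNumber, column) {
    const lineTokens = tokensByLine[lineNumber - 1];
    if (!lineTokens) {
        return null;
    }
    const token = findTokenAtOffset(lineTokens, column - 1);
    return token ? token.type : null;
}
```

In the editor this could be called as getTokenTypeAt(monaco.editor.tokenize(model.getValue(), "css"), position.lineNumber, position.column). Note that this re-tokenizes the whole text, which is exactly the overhead in question.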

thoughts?

and thanks a bunch - I know this is digging into internal code/using non-pub API


AndersMad

Most helpful comment

You're perfectly correct, it is terribly inefficient.

To get the human readable (non themed) tokens (in an efficient way) you could access some privates on the model. I do not wish to expose this on the API at this time, as I would like to explore running tokenizers in web workers and that would break any exposed API.

var editor = monaco.editor.create(document.getElementById("container"), {
    value: "function hello() {\n\talert('Hello world!');\n}",
    language: "javascript"
});

function getTokensAtLine(model, lineNumber) {
    // Force line's state to be accurate
    model.getLineTokens(lineNumber, /*inaccurateTokensAcceptable*/false);
    // Get the tokenization state at the beginning of this line
    var freshState = model._lines[lineNumber - 1].getState().clone();
    // Get the human readable tokens on this line
    return model._tokenizationSupport.tokenize(model.getLineContent(lineNumber), freshState, 0).tokens;
}

// Could observe TokenizationRegistry.onDidChange instead of timeout
setTimeout(function() {
    var tokens = getTokensAtLine(editor.getModel(), 2);
    console.log(tokens);
}, 1000);

alexdima on 3 Feb 2017

❤1 👍1

All 8 comments

@AndersMad I was also doing some digging and then it turned out: https://github.com/Microsoft/monaco-editor/issues/332#issuecomment-274429960

akosyakov on 2 Feb 2017

@akosyakov did you end up tokenizing the entire document for each change then?

AndersMad on 2 Feb 2017

@AndersMad yes, whenever the content or language of a model changes. So far there have been no performance issues.

akosyakov on 2 Feb 2017

The model no longer holds on to the tokens, it only holds the resolved token+theme information.

alexdima on 3 Feb 2017

OK, so I think the idea here is to manually mark it dirty on each update and then update it (preferably in a worker process), keeping the token[][] array (from monaco.editor.tokenize) with an iterator on top, and letting each service use that (lazily updated if marked dirty). This matters because methods like getModeIdAtPosition are also gone (e.g. for lookup).

@alexandrudima: however, I do find it a bit of an overhead to get the model text (which joins all lines) and then have monaco.editor.tokenize split the string by CRLF again, instead of being able to run it directly on the model lines. Maybe simple access to the tokenizer, e.g. a public version of getSafeTokenizationSupport? Then one could maintain the token data very fast for non-mixed-language models too (by line updates).
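The dirty-flag idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: LazyTokenCache is a hypothetical class name, and the text source and tokenizer are injected as plain functions so that the cache itself does not depend on any Monaco internals.

```javascript
// Hypothetical helper: caches a token[][] array and recomputes it lazily.
class LazyTokenCache {
    // getText: () => string, returns the current document text.
    // tokenize: (text) => token[][], e.g. a wrapper around monaco.editor.tokenize.
    constructor(getText, tokenize) {
        this._getText = getText;
        this._tokenize = tokenize;
        this._tokens = null;
        this._dirty = true;
    }

    // Call on every content or language change; does no work by itself.
    markDirty() {
        this._dirty = true;
    }

    // Re-tokenizes only if something changed since the last call.
    getTokens() {
        if (this._dirty) {
            this._tokens = this._tokenize(this._getText());
            this._dirty = false;
        }
        return this._tokens;
    }

    // Iterator on top of the cache: yields every token with its 1-based line number.
    *[Symbol.iterator]() {
        const lines = this.getTokens();
        for (let i = 0; i < lines.length; i++) {
            for (const token of lines[i]) {
                yield { lineNumber: i + 1, token };
            }
        }
    }
}
```

Wiring it up might look like new LazyTokenCache(() => model.getValue(), text => monaco.editor.tokenize(text, "css")) together with model.onDidChangeContent(() => cache.markDirty()). Each service then calls cache.getTokens() and pays for tokenization at most once per change.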

AndersMad on 3 Feb 2017

ah - it's right there on the model.. I was looking at inspectTokens.ts that is accessing a lot more internals to get the tokenizationSupport.. thanks! :)

AndersMad on 3 Feb 2017

.. and FYI, it works great: I can now tokenize a changed range in a document, adding e.g. color decorators to a CSS file, and it is blazingly fast!! I generated a huge 1 MB .css file and edited it with no problems.
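A rough sketch of what walking a changed range might look like, assuming per-line tokens of the shape { offset, type, ... } such as those returned by a helper like getTokensAtLine. The function name collectTokenRanges and its callback parameters are hypothetical, and which token types count as colors is left to the caller because the exact type names depend on the tokenizer.

```javascript
// Hypothetical helper: collects 1-based column ranges for tokens of interest
// on the lines startLine..endLine (inclusive).
// getLineContent: (lineNumber) => string
// getLineTokens:  (lineNumber) => [{ offset, type }, ...] sorted by offset
// isWanted:       (type) => boolean, chosen by the caller
function collectTokenRanges(getLineContent, getLineTokens, startLine, endLine, isWanted) {
    const ranges = [];
    for (let lineNumber = startLine; lineNumber <= endLine; lineNumber++) {
        const text = getLineContent(lineNumber);
        const tokens = getLineTokens(lineNumber);
        for (let i = 0; i < tokens.length; i++) {
            if (!isWanted(tokens[i].type)) {
                continue;
            }
            // A token ends where the next one starts, or at the end of the line.
            const start = tokens[i].offset;
            const end = i + 1 < tokens.length ? tokens[i + 1].offset : text.length;
            ranges.push({
                lineNumber: lineNumber,
                startColumn: start + 1,
                endColumn: end + 1,
                text: text.slice(start, end)
            });
        }
    }
    return ranges;
}
```

The resulting ranges could then be turned into editor decorations for just the edited lines, rather than for the whole document.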

AndersMad on 3 Feb 2017

👍1
