Vscode-powershell: Add support for all token types with the new semantic highlighting

Created on 2 Aug 2020  路  11Comments  路  Source: PowerShell/vscode-powershell

Summary of the new feature

Powershell has 20 tokentypes and it would be nice if the semantic highlighting supported all of them. See: https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.pstokentype

The current semantic highlighting implementation is currently missing the following types:

  • Attribute
  • CommandArgument
  • GroupEnd/GroupStart
  • LineContinuation (backtick)
  • LoopLabel
  • StatementSeparator (Semicolon)

If/When these gets added I would suggest changing the following mappings:

  • Value in Using namespace <Value> should use CommandArgument scope. Currently it uses Function.
  • Attributes like [cmdletbinding()] should use Attribute scope. Currently it uses Type.
  • Names in functions and configurations should use CommandArgument scope. Currently they use the Function scope except functions with no dashes which has no semantic token type.
  • Unquoted parameter values like "C:\SomePath" in Get-ChildItem C:\SomePath should use CommandArgument scope. Currently they use the function scope.

The following screenshot hopefully shows what I mean:
Mapping issue

Area-Semantic Highlighting Area-UI Issue-Enhancement

Most helpful comment

If there's a way to do inheritance or something where you have a new token type called Attribute that falls back to Type if the theme doesn't support it that might be cool

The problem is that no popular theme will support it, and we've just increased complexity without fixing the issue. Most people will continue using their Dark+/Monokai/Solarized theme and see attributes and types the same. So it'll end up being like in https://github.com/PowerShell/vscode-powershell/issues/2856#issuecomment-669240917, where we correctly categorise the token differently and the colour is the same.

Otherwise I'm not sure it's a great idea to use a classification that isn't accurate (when an accurate one exists).

This is the real problem with themes. The themes themselves tend to be hacks designed to achieve nice coloration in the face of bad token categorisation. Which in turns forces the token classifiers to lie. I seem to recall there was a contentious change in the EditorSyntax repo where tokens were categorised more correctly and people complained because highlighting got worse.

I think the ideal here would be for us to define a correct token type, and a hack fallback, like: attribute -> label. That way the ISE theme could colour things with high specificity but users of popular themes will still get the color differentiation they're looking for.

All 11 comments

PSTokenType is the old type... It's a Windows PowerShell v2 relic...

Token is the new type:
https://docs.microsoft.com/dotnet/api/system.management.automation.language.token

Which has a TokenKind:
https://docs.microsoft.com/dotnet/api/system.management.automation.language.tokenkind

Interesting, I didn't know that. I still think it's worth taking inspiration from PSTokenType when mapping TokenKind to the VS code tokens because 20 tokens is a lot more manageable than 100+ and the old TokenType is basically logical groupings of the newer more granular TokenKind.
Personally I'm mostly interested in getting separate tokens for Attributes and unquoted parameters so I can make them stick out a bit more from everything else.

The best way to see if we can do anything about this is to run it through the PowerShell parser/tokenizer:

$tokens = $null
$errs = $null
[System.management.automation.language.Parser]::ParseInput($myScriptAsAStr, [ref] $tokens, [ref] $errs)

Then look at $tokens. If there's something distinguishing about Attributes or un-quoted parameters, we can do something!... But if there isn't, then we can't, I'm afraid.

(I didn't test that code, but it should be something along those lines... The docs: https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.language.parser.parseinput?view=powershellsdk-7.0.0#System_Management_Automation_Language_Parser_ParseInput_System_String_System_Management_Automation_Language_Token____System_Management_Automation_Language_ParseError____)

Then you can do something :) Attributes have the "AttributeName" token flag which as far as I can tell is completely unique to attributes.
Unquoted parameters have the kind+TokenFlag combination of "Generic" + "None" which would affect the same things I suggested "CommandArgument" would affect in the original post above.

For convenience here's a table showing the relevant types and their Kind+TokenFlag combinations:

Token | Kind | TokenFlags
------------ | ------------- | -------------
Command | Generic | CommandName
Unquoted Parameter | Generic | None
Type | Identifier | TypeName
Attribute| Identifier | TypeName,AttributeName

I don't know how I would do it. Updating the if statements and/or switch would be simple enough, but none of the standard semantic token types seem to really fit with the Powershell tokens so I would have to define a new type/sub type like it describes here: https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#semantic-token-classification

Yeah I'd rather not add a new token type... Maybe use one of the token types we don't actually use.

Type is the most accurate (it is a type, but it is also an attribute). If there's a way to do inheritance or something where you have a new token type called Attribute that falls back to Type if the theme doesn't support it that might be cool. Otherwise I'm not sure it's a great idea to use a classification that isn't accurate (when an accurate one exists).

If there's a way to do inheritance or something where you have a new token type called Attribute that falls back to Type if the theme doesn't support it that might be cool

The problem is that no popular theme will support it, and we've just increased complexity without fixing the issue. Most people will continue using their Dark+/Monokai/Solarized theme and see attributes and types the same. So it'll end up being like in https://github.com/PowerShell/vscode-powershell/issues/2856#issuecomment-669240917, where we correctly categorise the token differently and the colour is the same.

Otherwise I'm not sure it's a great idea to use a classification that isn't accurate (when an accurate one exists).

This is the real problem with themes. The themes themselves tend to be hacks designed to achieve nice coloration in the face of bad token categorisation. Which in turns forces the token classifiers to lie. I seem to recall there was a contentious change in the EditorSyntax repo where tokens were categorised more correctly and people complained because highlighting got worse.

I think the ideal here would be for us to define a correct token type, and a hack fallback, like: attribute -> label. That way the ISE theme could colour things with high specificity but users of popular themes will still get the color differentiation they're looking for.

@rjmholt if that's possible, I like that plan. I'll probably open an issue on vscode to start this discussion.

Also, I was skimming through the LSP docs and I noticed there's also a section for modifiers:

https://microsoft.github.io/language-server-protocol/specifications/specification-3-16/#textDocument_semanticTokens

export enum SemanticTokenModifiers {
    declaration = 'declaration',
    definition = 'definition',
    readonly = 'readonly',
    static = 'static',
    deprecated = 'deprecated',
    abstract = 'abstract',
    async = 'async',
    modification = 'modification',
    documentation = 'documentation',
    defaultLibrary = 'defaultLibrary'
}

I believe these can be added on to the TokenType to provide additional context. At first glance nothing jumps at me.

Was this page helpful?
0 / 5 - 0 ratings