Powershell: tokenizer treats commands as expandable strings

Created on 1 Aug 2019  路  17Comments  路  Source: PowerShell/PowerShell

The tokenizer seems to treat commands (and some other tokens) as expandable strings. Why is this? It makes misleading colorization in PSReadLine.

function crazy$function {echo test} # `$function` appears to be a variable, but is not

crazy$function # again, `$function` appears to be a variable that will expand, but it is not

image

Another one that occurs is if you start a command with a dash, the tokenizer marks it as a parameter, but it should mark it as a command.

Issue-Question WG-Engine

Most helpful comment

By design, a command name can contain any characters that can be in a file name since, rather obviously, files are commands. And, since it's also perfectly valid to have a file foo$bar.ps1, such names must be supported as commands. So functionally, everything is fine and cannot change. But it would be nice to fix the highlighting . The token is correct in the AST. Can you not use that?

All 17 comments

I'd say that is edge cases and maybe it makes no sense to fix.

I think its very critical that highlighting works correctly, it provides guidance to a less experienced user. I just don't know if there is something dependent on this. I can't imagine so, since its actually not a token.

Highlighting is actually taken based iirc, so it is a different token. The parser just treats it as a command name due to context I guess?

@msftrncs I think @iSazonov means that having a function name likefoo$bar is an edge-case.

I'm not sure that 'edge-case' should be in the vocabulary of a software that tries to pride itself on not having limits (or at least syntax to get around limits). Since a file name could have a $, and since an unquoted command is NOT expandable (without an operator), the fact that the perceived token $function is displayed as a variable is misleading. Its exactly the sort of thing that syntax highlighting is supposed to resolve. Its the exact reason I have overhauled the PowerShell\EditorSyntax tmLanguage syntax (open demo PR), and this is the exact reason I have an open PR trying to fix as many of the completion imperfections as I can, because there really shouldn't be any edge-cases.

I think the issue might be, if for some reason it really does need to be a token independent of the rest of the command name, that the TokenFlags.CommandName is not set on the $function token. However, since the tokenizer knew the first part was a command name, there really isn't any reason for it marking off the $function portion as a separate token, its obviously aware of the context.

I was able to show a correction for the command named -hello, because even though it is tokenized as a Parameter (I don't necessarily agree with that tokenization either) its also flagged as a CommandName so I moved PSReadline's determination of a command name color above that of parameters and now the command/function -hello colors as a command. However, it doesn't work for the command crazy$function, the $function part is a separate token and not flagged as a command name like the first part is.

Windows PowerShell with PSReadLine unmodified
image
PowerShell 7 with modified PSReadLine
image

I am not however, saying this is more critical than functionality that actually provides work. There is many parts of that also broken, and probably also being considered 'edge-case'.

I was able to figure out why this happens.

The parser blindly parses nested tokens in a CommandArgumentContext.CommandName condition. When the parse cleans up going through the AST stage, it makes sure to make the correct AST for the non-expandable token, but since it returns the token for the command as is, the NestedTokens field is still live, which means the token's type is still StringExpandableToken, so PSReadLine then breaks down and colors each token. If I selectively wipe out the NestedTokens field, I can stop this behavior, and I didn't seem to cause any ill effects.

image

What if there is a variable with name $function or $green?

@iSazonov the variable isn't expanded and is never evaluated as a variable, it would seem.

image

My question is about "test$green" - what do we expect and why?

There is a conflict there, really. The _tokenizer_ sees two tokens, as evidenced by the token-based highlighting. The _parser_ makes the choice to treat them as one command name, it seems.

Maybe @daxian-dbw would know if we ought to change this behaviour? At least from _appearances_ I would think most people's expectation is that the variable is indeed separate from the command name there.

It is a PowerShell design principle - don't limit users without urgent need. And it is a breaking change area - we can change almost nothing in parser without breaking change.

By design, a command name can contain any characters that can be in a file name since, rather obviously, files are commands. And, since it's also perfectly valid to have a file foo$bar.ps1, such names must be supported as commands. So functionally, everything is fine and cannot change. But it would be nice to fix the highlighting . The token is correct in the AST. Can you not use that?

Yeah fixing the highlighting would help folks not get the wrong impression. Seems like @msftrncs has one possible way to fix it. Not sure if that's the ideal solution but it seems to be functional at least.

A fix might be as trivial as adding:

            functionNameToken.TokenFlags |= TokenFlags.CommandName;

in FunctionDeclarationRule.

@lzybkr, I think, if the flag CommandName was set on a function name token, it would have the side effect of highlighting it as a command, which is probably not desired here (though that's how it works in most syntax highlighting in editors themes, but not all.

@bpayette, PSReadLine uses the tokens. It was the discrepancy between the AST and what I was seeing in PSReadLine that led me to understand the process.

@iSazonov, I've made changes in the parser, nothing has broken on me ... yet! 馃槂

I will present a PR, as I think I found the needed places to improve PSReadLine's highlighting with out making any changes in PSReadLine. However, I understand, changing the output from the parser is always a breaking change.

I made an error in my previous statement. I think one change is still required in PSReadLine as nothing really is wrong in the tokenizer for a Parameter token.

The coloring for tokens with the CommandName flag set still needs to be captured before all the other conditions, in order to catch:

-hello # is a command, but highlights as a parameter, but is marked with a CommandName flag

@lzybkr, I think, if the flag CommandName was set on a function name token, it would have the side effect of highlighting it as a command, which is probably not desired here (though that's how it works in most syntax highlighting in editors themes, but not all.

I don't follow. The function names a command, it is invoked as a command, so why should the name token not specify CommandName?

I do wonder though if the tokenizer should have some additional context that a command name is expected, which is slightly different than simply being in "command mode", in which case nested tokens would never be scanned, just like if the token was enclosed in single quotes.

@lzybkr, I think, if the flag CommandName was set on a function name token, it would have the side effect of highlighting it as a command, which is probably not desired here (though that's how it works in most syntax highlighting in editors themes, but not all.
I don't follow. The function names a command, it is invoked as a command, so why should the name token not specify CommandName?

2 problems I see, to clarify why I said that:

  1. Currently, function names remain highlighted in the default color, not in 'Command' color, a change people could probably accept. (I notice Class names do get highlighted in the 'Type' color, but then why not method names (in Class) highlighted like members?).
  2. To fix the expandable token issue you'd probably be looking at making it so CommandName flag never highlights nested tokens, but expandable command names are possible via the & operator and they are already marked with CommandName flag. A possibility would be to mark all the NestedTokens of the non-expandable command names with the CommandName flag too.

I do wonder though if the tokenizer should have some additional context that a command name is expected, which is slightly different than simply being in "command mode", in which case nested tokens would never be scanned, just like if the token was enclosed in single quotes.

I'm finding a reason to be careful here. While my completion PR demonstrates eliminating using the parser/tokenizer for the basis of argument completions, I'm finding that more difficult for command completions. However, if I was to go back to using the parser/tokenizer and implement a better CompletionRequiresQuotes method (using the parser), it will be very important that there would be a way to detect the differences between tokenizing a command (where command mode vs expression mode hasn't yet been determined) and an argument (where command mode is for sure).

It might be that there could use to be some token testing methods, such as to test 'IsInputSuitableFor[Expandable]CommandName()' and 'IsInputSuitableForArgument()' that could prevent calling up the entire parser, just to see if an input can be represented in a single token unmodified.

Was this page helpful?
0 / 5 - 0 ratings