Any bare (unquoted) string argument that has a - in it is being highlighted as a command:

The problem is that there is no way to distinguish between a function name in a declaration and a parameter from just the token information alone. I think that we'll need to add other checks (i.e. position or looking into the AST)


I think the important part is that the Test token and the That-Thing token have different flags:
> $ast = [System.Management.Automation.Language.Parser]::ParseInput('get-banana this-thing', [ref]$tok
s, [ref]$errs)
C:\Users\Robert Holt\Documents\Dev\Microsoft\PowerShellEditorServices [async-ps-consumer +0 ~4 -0 !]
> $toks[0].TokenFlags
CommandName
C:\Users\Robert Holt\Documents\Dev\Microsoft\PowerShellEditorServices [async-ps-consumer +0 ~4 -0 !]
> $toks[1].TokenFlags
None
That's how PSReadLine is able to differentiate them I imagine
@rjmholt I think what they're getting at is that the current regex based highlighter treats the name in a function definition like a command (which doesn't have the flag).
Personally I'd rather have the name in a function definition be parsed as a bare word than the current situation (that's how both the ISE and PSRL work currently). But a fix for both might be just keeping track of if the previous token kind was Function. Then also use TokenFlags.CommandName.
There's no way to differentiate between the Get-AA token (should be labelled as a command/function) and the That-Thing token (which should be labelled as something else) with only the Token info. So without using additional data, we either have the situation we have now, or Get-AA would not be highlighted
@justinytchen You also could change this up a bit:
to be:
Token[] tokens = file.ScriptTokens;
for (int i = 0; i < tokens.Length; i++)
{
Token token = tokens[i];
PushToken(token, builder);
if (token.Kind == TokenKind.Function)
{
// + bounds checks
Token next = tokens[++i];
PushTokenAsFunctionNameIfBareWord(next, builder);
}
}
I think what they're getting at is that the current regex based highlighter treats the name in a function definition like a command (which doesn't have the flag).
Yes, I noticed that. But in @Jaykul's original issue description he seems to be referring to the CommandAst situation rather than the FunctionDefinitionAst situation.
The FunctionDefinitionAst scenario is easy to address: we look back a token and see if it's the function keyword. Given that the grammar defines the full nature of the context, rather than just the tokens, we can't do a simple token search for all tokens, but I believe we can in this case.
A more sophisticated approach is to track our position in both the AST and the token array simulataneously and for each token look into where we are in the AST to determine the token semantics.
I've written an example where we track through tokens as we move through an AST here.
Or the converse, where we find the AST element corresponding to our token:
/// <summary>
/// An AST visitor to find the smallest AST containing a given IScriptPosition.
/// </summary>
internal class FindAstFromPositionVisitor : AstVisitor2
{
private readonly IScriptPosition _position;
private Ast _astAtPosition;
/// <summary>
/// Create a FindAstFromPositionVisitor around a given position,
/// ready to find the smallest AST containing that position.
/// </summary>
/// <param name="position">The position to find the containing AST of.</param>
public FindAstFromPositionVisitor(IScriptPosition position)
{
_position = position;
}
/// <summary>
/// Retrieve the computed smallest containing AST after a visit has been performed.
/// </summary>
/// <returns>The smallest AST containing the originally given position in the visited AST.</returns>
public Ast GetAst()
{
return _astAtPosition;
}
public override AstVisitAction VisitArrayExpression(ArrayExpressionAst arrayExpressionAst) => VisitAst(arrayExpressionAst);
public override AstVisitAction VisitArrayLiteral(ArrayLiteralAst arrayLiteralAst) => VisitAst(arrayLiteralAst);
public override AstVisitAction VisitAssignmentStatement(AssignmentStatementAst assignmentStatementAst) => VisitAst(assignmentStatementAst);
public override AstVisitAction VisitAttribute(AttributeAst attributeAst) => VisitAst(attributeAst);
public override AstVisitAction VisitAttributedExpression(AttributedExpressionAst attributedExpressionAst) => VisitAst(attributedExpressionAst);
public override AstVisitAction VisitBaseCtorInvokeMemberExpression(BaseCtorInvokeMemberExpressionAst baseCtorInvokeMemberExpressionAst) => AstVisitAction.SkipChildren;
public override AstVisitAction VisitBinaryExpression(BinaryExpressionAst binaryExpressionAst) => VisitAst(binaryExpressionAst);
public override AstVisitAction VisitBlockStatement(BlockStatementAst blockStatementAst) => VisitAst(blockStatementAst);
public override AstVisitAction VisitBreakStatement(BreakStatementAst breakStatementAst) => VisitAst(breakStatementAst);
public override AstVisitAction VisitCatchClause(CatchClauseAst catchClauseAst) => VisitAst(catchClauseAst);
public override AstVisitAction VisitCommand(CommandAst commandAst) => VisitAst(commandAst);
public override AstVisitAction VisitCommandExpression(CommandExpressionAst commandExpressionAst) => VisitAst(commandExpressionAst);
public override AstVisitAction VisitCommandParameter(CommandParameterAst commandParameterAst) => VisitAst(commandParameterAst);
public override AstVisitAction VisitConfigurationDefinition(ConfigurationDefinitionAst configurationDefinitionAst) => VisitAst(configurationDefinitionAst);
public override AstVisitAction VisitConstantExpression(ConstantExpressionAst constantExpressionAst) => VisitAst(constantExpressionAst);
public override AstVisitAction VisitContinueStatement(ContinueStatementAst continueStatementAst) => VisitAst(continueStatementAst);
public override AstVisitAction VisitConvertExpression(ConvertExpressionAst convertExpressionAst) => VisitAst(convertExpressionAst);
public override AstVisitAction VisitDataStatement(DataStatementAst dataStatementAst) => VisitAst(dataStatementAst);
public override AstVisitAction VisitDoUntilStatement(DoUntilStatementAst doUntilStatementAst) => VisitAst(doUntilStatementAst);
public override AstVisitAction VisitDoWhileStatement(DoWhileStatementAst doWhileStatementAst) => VisitAst(doWhileStatementAst);
public override AstVisitAction VisitDynamicKeywordStatement(DynamicKeywordStatementAst dynamicKeywordStatementAst) => VisitAst(dynamicKeywordStatementAst);
public override AstVisitAction VisitErrorExpression(ErrorExpressionAst errorExpressionAst) => VisitAst(errorExpressionAst);
public override AstVisitAction VisitErrorStatement(ErrorStatementAst errorStatementAst) => VisitAst(errorStatementAst);
public override AstVisitAction VisitExitStatement(ExitStatementAst exitStatementAst) => VisitAst(exitStatementAst);
public override AstVisitAction VisitExpandableStringExpression(ExpandableStringExpressionAst expandableStringExpressionAst) => VisitAst(expandableStringExpressionAst);
public override AstVisitAction VisitFileRedirection(FileRedirectionAst redirectionAst) => VisitAst(redirectionAst);
public override AstVisitAction VisitForEachStatement(ForEachStatementAst forEachStatementAst) => VisitAst(forEachStatementAst);
public override AstVisitAction VisitForStatement(ForStatementAst forStatementAst) => VisitAst(forStatementAst);
public override AstVisitAction VisitFunctionDefinition(FunctionDefinitionAst functionDefinitionAst) => VisitAst(functionDefinitionAst);
public override AstVisitAction VisitFunctionMember(FunctionMemberAst functionMemberAst) => VisitAst(functionMemberAst);
public override AstVisitAction VisitHashtable(HashtableAst hashtableAst) => VisitAst(hashtableAst);
public override AstVisitAction VisitIfStatement(IfStatementAst ifStmtAst) => VisitAst(ifStmtAst);
public override AstVisitAction VisitIndexExpression(IndexExpressionAst indexExpressionAst) => VisitAst(indexExpressionAst);
public override AstVisitAction VisitInvokeMemberExpression(InvokeMemberExpressionAst methodCallAst) => VisitAst(methodCallAst);
public override AstVisitAction VisitMemberExpression(MemberExpressionAst memberExpressionAst) => VisitAst(memberExpressionAst);
public override AstVisitAction VisitMergingRedirection(MergingRedirectionAst redirectionAst) => VisitAst(redirectionAst);
public override AstVisitAction VisitNamedAttributeArgument(NamedAttributeArgumentAst namedAttributeArgumentAst) => VisitAst(namedAttributeArgumentAst);
public override AstVisitAction VisitNamedBlock(NamedBlockAst namedBlockAst) => VisitAst(namedBlockAst);
public override AstVisitAction VisitParamBlock(ParamBlockAst paramBlockAst) => VisitAst(paramBlockAst);
public override AstVisitAction VisitParameter(ParameterAst parameterAst) => VisitAst(parameterAst);
public override AstVisitAction VisitParenExpression(ParenExpressionAst parenExpressionAst) => VisitAst(parenExpressionAst);
public override AstVisitAction VisitPipeline(PipelineAst pipelineAst) => VisitAst(pipelineAst);
public override AstVisitAction VisitPropertyMember(PropertyMemberAst propertyMemberAst) => VisitAst(propertyMemberAst);
public override AstVisitAction VisitReturnStatement(ReturnStatementAst returnStatementAst) => VisitAst(returnStatementAst);
public override AstVisitAction VisitScriptBlock(ScriptBlockAst scriptBlockAst) => VisitAst(scriptBlockAst);
public override AstVisitAction VisitScriptBlockExpression(ScriptBlockExpressionAst scriptBlockExpressionAst) => VisitAst(scriptBlockExpressionAst);
public override AstVisitAction VisitStatementBlock(StatementBlockAst statementBlockAst) => VisitAst(statementBlockAst);
public override AstVisitAction VisitStringConstantExpression(StringConstantExpressionAst stringConstantExpressionAst) => VisitAst(stringConstantExpressionAst);
public override AstVisitAction VisitSubExpression(SubExpressionAst subExpressionAst) => VisitAst(subExpressionAst);
public override AstVisitAction VisitSwitchStatement(SwitchStatementAst switchStatementAst) => VisitAst(switchStatementAst);
public override AstVisitAction VisitThrowStatement(ThrowStatementAst throwStatementAst) => VisitAst(throwStatementAst);
public override AstVisitAction VisitTrap(TrapStatementAst trapStatementAst) => VisitAst(trapStatementAst);
public override AstVisitAction VisitTryStatement(TryStatementAst tryStatementAst) => VisitAst(tryStatementAst);
public override AstVisitAction VisitTypeConstraint(TypeConstraintAst typeConstraintAst) => VisitAst(typeConstraintAst);
public override AstVisitAction VisitTypeDefinition(TypeDefinitionAst typeDefinitionAst) => VisitAst(typeDefinitionAst);
public override AstVisitAction VisitTypeExpression(TypeExpressionAst typeExpressionAst) => VisitAst(typeExpressionAst);
public override AstVisitAction VisitUnaryExpression(UnaryExpressionAst unaryExpressionAst) => VisitAst(unaryExpressionAst);
public override AstVisitAction VisitUsingExpression(UsingExpressionAst usingExpressionAst) => VisitAst(usingExpressionAst);
public override AstVisitAction VisitUsingStatement(UsingStatementAst usingStatementAst) => VisitAst(usingStatementAst);
public override AstVisitAction VisitVariableExpression(VariableExpressionAst variableExpressionAst) => VisitAst(variableExpressionAst);
public override AstVisitAction VisitWhileStatement(WhileStatementAst whileStatementAst) => VisitAst(whileStatementAst);
private AstVisitAction VisitAst(Ast ast)
{
if (!AstContainsPosition(ast))
{
return AstVisitAction.SkipChildren;
}
_astAtPosition = ast;
return AstVisitAction.Continue;
}
private bool AstContainsPosition(Ast ast)
{
return IsBefore(ast.Extent.StartScriptPosition, _position)
&& IsBefore(_position, ast.Extent.EndScriptPosition);
}
private static bool IsBefore(IScriptPosition left, IScriptPosition right)
{
return left.LineNumber < right.LineNumber
|| (left.LineNumber == right.LineNumber && left.ColumnNumber <= right.ColumnNumber);
}
}
Ok so the problem is actually that generic tokens with no flags are currently highlighted as functions.
Bareword arguments without -s are tokenised as identifiers rather than generic tokens, so they fall through to the grammar.
We can recognise a bareword identifier most correctly as an identifier or a generic token that is subordinate to a command token... Harder to do though.
We need to pick a new value based on this list. I would guess Member, Property or EnumMember would work.
Property is probably the least threatening because Member in the Monokai theme was lime green lol
because Member in the Monokai theme was lime green
Ugh, yeah, this is what I mean by themes requiring hacks
A "generic token" that's a parameter ... is actually a string, not a member.
@Jaykul true but nothing currently highlights a bare word string literal command arg like every other string literal. member is probably the closest to what existing tools highlight it as.
Are you really arguing that since everyone else is wrong, this should be wrong too? It's a string. The fact that PowerShell _allows_ you to leave off the quotes doesn't change the fact that it's a string. It should be highlighted as a string if it's Kind = "Identifier" or Kind = "Generic" with TokenFlags = "None" (and follows a command?).
It's most definitely _not_ a member. Highlighting it as member is _worse than nothing_ (although not worse than highlighting it as a command) in my opinion.
If you really decide you _need_ to use member, please instead send a PR to the SemanticTokenTypes enum to add an a "unspecified" or "otherliteral" or something.
Are you really arguing that since everyone else is wrong, this should be wrong too?
Nah, I'm saying people will be annoyed if it's highlighted like every other string literal, because that's what they're used to.
It's a string. The fact that PowerShell allows you to leave off the quotes doesn't change the fact that it's a string.
Yeah, no one here thinks it isn't a string. Doesn't change the fact that folks expect it to be highlighted differently 馃し
Yeah it's not about how PowerShell uses it, it's about how it's highlighted. Categorising it as a string for highlighting purposes will make 'input' and input the same colour, whereas in PSReadLine for example, they are not at all (here vs 'there'):

Are you really arguing that since everyone else is wrong, this should be wrong too? It's a string.
Please keep your tone civil. We can be more constructive without rhetoric like this.
I don't think there's that much backward compatibility to worry about. It's already drastically changing most of the highlighting... and not just breaking changes like using statements with paths, escaped characters, #requires, etc.
And PSReadline is _wrong_ about that too. Both your here and 'there' are strings. Why would you want to make them seem different? No offense intended. I mean, I did most of the regex parser, I'm not trying to blame anyone. I'm just pointing out that it's literally a string (and now, we know that) but it's not highlighted as one just because ... we never bothered to get it right.
@Jaykul that's all fair and I don't necessarily disagree. Tbh, I think it would bother people. I don't really have any specific logic to back that up with, but I think it would get a lot of complaints.
Personally I'd also rather stick closer to what PSRL does, even if it isn't correct. PSRL highlighting barewords differently changed how they feel vs quoted strings imo, for better or worse.
Both your here and 'there' are strings. Why would you want to make them seem different?
Yeah, I definitely understand your reasoning here.
We're in a strange set because bareword strings are a rare occurrence among languages, so there's not good precedent or mechanisms built in (in an environment mainly designed to cater to C# and TypeScript) for them.
I don't think there's that much backward compatibility to worry about
Agreed. Certainly as a visual feature, I don't personally see any concerns about "breaking changes" so to speak.
I think of highlighting as a visual representation of tokeniser mode. Even though bareword strings and quoted strings represent the same type of value, when I'm typing them or looking at my program, the thing I care about is what delimits them. A quote is easy to miss, but the specific colour of a quoted string tells me that I'm in a special mode where the rules are different. That's not true of bareword strings, because the tokeniser will find a lot more reasons to end the string token.
Certainly I think that if we made a change to classify quoted strings and bareword strings the same way, the majority of our users would have complaints due to struggling to differentiate the different types of strings in their scripts.
In an ideal world, we would have a BareWordString token type that a VSCode theme could pick up on so that anyone could choose to make them the same or different colours. But in the real world we need a fallback that will cater to common themes out there that are mainly built for programming languages with tame syntaxes. I think that's the essence of my proposal in https://github.com/PowerShell/vscode-powershell/issues/2852#issuecomment-669259784.
(The unfortunate side-effect of the success of the C-syntax-family languages is that much tooling forgets just how different some languages are)
While you're convincing me this whole product is not for me, I disagree fundamentally with the base assumption that your users want what they already have.
Here's PSReadLine, Semantic, and EditorSyntax:

PSReadLine is better than the others, and (when it's not coloring strings as commands) sematic is better than editor syntax, but in all three the (lack of) highlighting of bare strings is incorrect and _does not help authors understand their code._
I have no problem with the idea of having a language-specific token (or, in this case, a shell-language-specific token) and a fallback token type for themes that don't understand the type of language that PowerShell is, but if you want to do that, you also need to provide a first-class theme that you can point to as an example (and in _this_ day and age, you need a dark version of it too). However, that's not what's happening here. And in any case, the ISE theme doesn't provide better information. If anything, it's worse in this case, because keywords are virtually indistinquishable from bareword strings.
If you wanted to have different colors for bareword, single quoted, and double quoted strings, I would would be down with that. But that falls down if Single and Double quoted strings are one kind of string, and bare strings are the only one that's totally different...
So I think we need to take this in two stages.
The first stage is parity: being as close to PSReadLine as possible. This will allow both the console and editing experience to be on the same playing field. (this was actually the original goal of doing this work from the beginning)
The second stage is improving: Once both experiences are roughly aligned, then we can go back and make the changes in both to support the language better.
Most helpful comment
So I think we need to take this in two stages.
The first stage is parity: being as close to PSReadLine as possible. This will allow both the console and editing experience to be on the same playing field. (this was actually the original goal of doing this work from the beginning)
The second stage is improving: Once both experiences are roughly aligned, then we can go back and make the changes in both to support the language better.