Roslyn: SyntaxGenerator.NameOfExpression generates node with incorrect contextual kind

Created on 12 Jan 2018 · 35Comments · Source: dotnet/roslyn

Version Used: 2.6.0

Steps to Reproduce:

Use the API SyntaxGenerator.NameOfExpression to generate a nameof expression for C#. The identifier generated for nameof should be a contextual keyword, but it is not. Following test should pass, but fails:

var nameofExpression = (InvocationExpressionSyntax)_g.NameOfExpression(_g.IdentifierName("x"));
Assert.True(((IdentifierNameSyntax)nameofExpression.Expression).Identifier.IsContextualKeyword());

See skipped test TestNameOfExpressions_ContextualKeyword added with https://github.com/dotnet/roslyn/pull/24213

As a result, binding fails for the compilation with generated syntax node with CS0103 The name 'nameof' does not exist in the current context.

Area-Compilers Bug

Source

mavasani

Most helpful comment

That is expected behavior

I do not agree, and I am not sure why that is an expected behavior. Are you suggesting that any syntax node generated out of SyntaxGenerator needs the entire source file to be reparsed to ensure no errors? I donot believe that was the original goal of SyntaxGenerator.

Additionally, I don't understand how SyntaxGenerator.NameOfExpression generating an identifier token with incorrect syntax kind (SyntaxKind.Identifier instead of SyntaxKind.NameOfKeyword) is by design.

mavasani on 12 Jan 2018

👍3

All 35 comments

Hrmm.. Should IdentifierName just do this automatically?

CyrusNajmabadi on 12 Jan 2018

@CyrusNajmabadi I am fine if @dotnet/roslyn-compiler team agrees to implement it both the C# and VB syntax factory methods for IdentifierName. Otherwise, we need to detect this case and workaround in the SyntaxGenerator.

mavasani on 12 Jan 2018

Tagging @jcouv for his opinion as he has dealt with similar nameof issues recently.

mavasani on 12 Jan 2018

I suppose as this is just a problem with NameOfExpression then we could just patch that method alone.

CyrusNajmabadi on 12 Jan 2018

@mavasani I believe this may not be a bug. The same situation applies for var.

The exception to this would be a case where nameof is used in a location where a method name is not allowed, such as the default value for a string parameter.

sharwell on 12 Jan 2018

@sharwell This is most certainly a bug. Our nameof fixer uses this API here to generate a nameof(x) node in place of string literal "x", where x is a valid parameter name, and this causes the new compilation with the code fix application in the unit test framework to generate CS0103 The name 'nameof' does not exist in the current context.. Note that the code fix works fine in IDE because the IDE will reparse the fixed text, at which point the compiler will use the correct syntax kind (SyntaxKind.NameOfKeyword). See https://github.com/dotnet/roslyn-analyzers/issues/1364 for details.

mavasani on 12 Jan 2018

👍1

@mavasani That is expected behavior. The unit test infrastructure must reparse the tree starting with text or in some cases the unit tests will fail to detect a regression.

https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/c0fc688c0c796a8f8a18ac237b6a8c9e102a61d3/StyleCop.Analyzers/StyleCop.Analyzers.Test/Helpers/CodeFixVerifier.Helper.cs#L127-L133

sharwell on 12 Jan 2018

That is expected behavior

mavasani on 12 Jan 2018

👍3

The same situation applies for var.

Note: 'var' is not a contextual keyword. The parser does not produce any special nodes for it (same with 'dynamic)'. They are true identifiers that just have special meaning during binding if they didn't bind to a preexisting symbol. That's not the same with things like "nameof" where the lexer will do the following:

c# else if (SyntaxFacts.IsContextualKeyword(info.Kind)) { info.ContextualKind = info.Kind; info.Kind = SyntaxKind.IdentifierToken; }

While reparsing is truly necessary to understand the real result that will happen from making the change, many parts of our system do just do a series of tree transformations, knowing that they should get reasonable intermediary results. For example, that's how simplification and what-not work.

In this case, as this is SyntaxGenerator, and it's literally bein asked to make a NameOfExpression, it really should make a NameOfExpression that is equivalent to what the parser woudl make.

CyrusNajmabadi on 12 Jan 2018

👍2

forcing to reparse basically break down whole our tree transformation pipe pattern. that basically means we don't support real tree transformation. it all just tree -> text -> tree -> text -> tree transformation which will lose efficiency and all our annotation ability.

sure, we can't make tree to be 100% same as parsed one, but we should try as much as we can.

heejaechang on 12 Jan 2018

This was also hit by @jamesqo here: https://github.com/dotnet/roslyn-analyzers/pull/1498#discussion_r160038294.

I believe any user who is using this API in their fixer will hit this if they try to replace the node in their existing syntax tree and attempt to verify if the new compilation has any diagnostics or not.

mavasani on 12 Jan 2018

In this case, as this is SyntaxGenerator, and it's literally bein asked to make a NameOfExpression, it really should make a NameOfExpression that is equivalent to what the parser woudl make.

First, I'll say that if we hadn't already shipped this method, then I would agree with this.

NameOfExpression should have returned a tree that uses NameOfKeyword
The resulting expression should not have been allowed in cases where nameof would parse as an identifier

However, at this point, NameOfExpression produces a syntax tree that is usable both in cases where nameof is a context-sensitive keyword and in cases where nameof is an identifier. A change at this point would have two negative side effects:

It will break code that depends on the current form produced by NameOfExpression.
It will not improve the ability of NameOfExpression to be used in tree transformations that round-trip to text. At the same time it fixes one such case, this change would break the other.

Are you suggesting that any syntax node generated out of SyntaxGenerator needs the entire source file to be reparsed to ensure no errors? I donot believe that was the original goal of SyntaxGenerator.

This is absolutely the case. It may not have been the goal, but there are many cases where trivia changes during the reparse to dramatically change the interpretation of the tree. Test sequences must reparse from text as a final step or there is no way to ensure the results are correct.

sharwell on 12 Jan 2018

💭 If the behavior did change, it might make more sense to change the node type at the time the node is added to the tree. If a nameof identifier is added to the tree where nameof could only be a keyword, it could be converted to a context sensitive keyword at that time.

sharwell on 12 Jan 2018

It will break code that depends on the current form produced by NameOfExpression.

Can you clarify what these cases are? I do not understand how someone invoking SyntaxGenerator.NameOfExpression would expect the underlying token for nameof to be an identifier, not a keyword.

mavasani on 12 Jan 2018

I dont agree backward compatibility needs to be maintained with all cost. and unless the people who gets broken by this is really big. I believe it is 100% fine to break those small number and ask them to update. and I believe 99% people will expect if you asked nameofexpression, they expect they get nameofexpression.

heejaechang on 12 Jan 2018

If the behavior did change, it might make more sense to change the node type at the time the node is added to the tree. If a nameof identifier is added to the tree where nameof could only be a keyword, it could be converted to a context sensitive keyword at that time.

I don;t believe we do those? that is why it is so easy to create malformed tree. and that is one of reason why we made SyntaxGenerator.

heejaechang on 12 Jan 2018

@heejaechang Analyzers don't get to update. They have to add branches that take different code paths depending on what version of Roslyn is detected at runtime. It's very frustrating.

sharwell on 12 Jan 2018

It will not improve the ability of NameOfExpression to be used in tree transformations that round-trip to text. At the same time it fixes one such case, this change would break the other.

When this does indeed break any actual users (which I believe is for very small minority users, if any, who want to generate nameof expression node with nameof as identifier instead of keyword), we can consider adding a new overload that enables that. I agree with HeeJae, we should fix the current API to meet majority of user's expectations.

mavasani on 12 Jan 2018

Analyzers don't get to update. They have to add branches that take different code paths depending on what version of Roslyn is detected at runtime. It's very frustrating.

then we should fix those. providing a way for analyzer to handle versioning.

heejaechang on 12 Jan 2018

Are you suggesting that any syntax node generated out of SyntaxGenerator needs the entire source file to be reparsed to ensure no errors? I donot believe that was the original goal of SyntaxGenerator.

going back to text is really not an option in product code. test might be. but never in product code.

first. we will need to re-generate full text from updated tree (unless we can somehow figure out generating partial text from diff. in current tree, there is no way since it doesn't hold onto previous tree it is generated from)

second, since it doesn't have previous text version as well, it will do full parse rather than incremental parse.

third, since it is full parse, it will lose all attached information with previous tree such as annotation completely.

this basically completely disable our tree transformation support. or we need to re-implement each step (1,2,3) to support incrementally update text, incremental parsing, keeping annotation between tree->text conversion.

sure we can probably do all those, but until we do, going back to text is not an option.

heejaechang on 12 Jan 2018

going back to text is really not an option in product code. test might be. but never in product code.

:memo: I was referring to testing the whole time. I would like to avoid breaking changes to the shipped methods' semantics if possible.

sharwell on 12 Jan 2018

but issue is in product side as well. let's say fixer created a tree with wrong nameof expression node. if we add formatting rule on nameof expression, it won't work since formatter doesn't see it as nameof expression.

heejaechang on 12 Jan 2018

FYI: Compiler binds a syntax of the form nameof(x) as a BoundNameOfOperator only if nameof token is of SytnaxKind.NameOfKeyword, see here. Otherwise, it is just considered a regular invocation expression, and it is no longer a "nameof expression" anymore.

mavasani on 12 Jan 2018

Otherwise, it is just considered a regular invocation expression, and it is no longer a "nameof expression" anymore.

Yes, this leads to some fun code:

https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/c0fc688c0c796a8f8a18ac237b6a8c9e102a61d3/StyleCop.Analyzers/StyleCop.Analyzers/ReadabilityRules/SA1121UseBuiltInTypeAlias.cs#L287-L310

sharwell on 12 Jan 2018

I would like to avoid breaking changes to the shipped methods' semantics if possible.

I think the same argument holds here as was made when we changed the semantics of analyzer execution time reporting with /reportanalyzer switch: Do we have any user who is relying on the current (incorrect) functionality, i.e. they expect SyntaxGenerator.NameOfExpression to parse as an invocation with nameof not being a keyword? If not, we should fix the existing API to do the right thing and wait for user feedback.

mavasani on 12 Jan 2018

Note: this is about ContextualKind right? The nameof keyword will still have the kind "IdentifierToken". The only change is that if you ask for its ContextualKind then that will return 'NameOfKeyword'.

While this is technically a 'breaking change' it fits into so many bug fix breaking changes where the old behavior was clearly wrong, and the new behavior so appropriate and right, that we make the change. Yes, it should likely go through the compat council. But i have little doubt it will be accepted.

CyrusNajmabadi on 13 Jan 2018

For example, other similar types of changes include how hte compiler has changed what can be returned for GetSymbolInfo and GetTypeInfo. Any changes here can always break someone who deepely depended on the old behavior. However, when we've determined that the old behavior was definitely just buggy and suboptimal, we've been willing to take the change as the new behavior is far more in line with what people need, and the fallout is generally so minor. This would be one of those cases for me.

CyrusNajmabadi on 13 Jan 2018

i.e. they expect SyntaxGenerator.NameOfExpression to parse as an invocation with nameof not being a keyword

Note: to be clear: nameof(x) should parse as an invocation with an identifier name. That is correct. The problem is that identifier does not also have the ContextualKind set that states that it has the ContextualKind "NameofKeyword". This is definitely problematic.

Note: this is not even directly observable by users. ContextualKind is an internal property that only the compiler looks at.

The only way to observe htis change is if there was code that depended on making one of these nameof expressions that then depended on the compiler being unable to compile it. That seems extremely far-fetched.

CyrusNajmabadi on 13 Jan 2018

Note: this is about ContextualKind right? The nameof keyword will still have the kind "IdentifierToken". The only change is that if you ask for its ContextualKind then that will return 'NameOfKeyword'.

I wasn't aware of the distinction. I believe this would resolve any disagreement as the major observable problem would be if Kind changed.

sharwell on 13 Jan 2018

Yeah, nameof is in that strange category in C#. Things like "from" will end up having their 'kind' actually change once the parser realizes its in a query expression (whereas it will stay 'identifier' if it shows up elsewhere). This is because there is enough information at parse time to know how the 'from' is being used.

This doesn't apply to 'nameof' as syntactially it's completely ambiguous. It truly is parsed as an invocation, and the binder just uses the contextual kind as a quick way to not actually check the characters against "nameof".

CyrusNajmabadi on 13 Jan 2018

@mavasani In your original post above, you referenced the following method: http://source.roslyn.io/#Microsoft.CodeAnalysis.CSharp/CSharpExtensions.cs,177

With a change only to ContextualKind, wouldn't the proposed test still fail, since the extension method operates on Kind instead of ContextualKind?

sharwell on 13 Jan 2018

Yes. The test would fail. But i think, more importantly, the following would not happen:

As a result, binding fails for the compilation with generated syntax node with CS0103 The name 'nameof' does not exist in the current context.

Binding should now succeed as it will check the internal .ContextualKind and see that this is something it should try to bind as a special nameof expression.

CyrusNajmabadi on 13 Jan 2018

With a change only to ContextualKind, wouldn't the proposed test still fail, since the extension method operates on Kind instead of ContextualKind?

Yes indeed. I hadn't looked at the implementation of that method and assumed it will have identical check to what the binder performs. This feels super confusing!

mavasani on 13 Jan 2018

This feels super confusing!

Here's the important thing: ContextualKind is really an internal detail. We actually removed it from the public surface area a long time ago as it was too confusing to have both. really, it's just used to allow the compiler to do fast checks of identifiers without needing to do string checks.

However, we also wanted a way for people to ask if something was a keyword or not, and we wanted to separate out contextual keywords as the language understand them to be a different concept. So we have that IsContextualKeyword helper.

Now, nameof is in a special position. It really is just an identifier. but the compiler wanted that fast check. So it abuses the fact that the internal .ContextualKind is NameofKeyword. This is extremely weird as the user can never really observe a token whose kind is NameofKeyword!!.

Honestly, the most appropriate thing to do would be to just have the compiler check for the literal text "nameof", like it looks for the literal text 'var' or 'dynamic':

c# internal static bool IsVar(this Syntax.InternalSyntax.SyntaxToken node) { return node.Kind == SyntaxKind.IdentifierToken && node.ValueText == "var"; }

i would also approve such a change at the binder level. It really shouldn't affect perf. And it would be much more in line with how we do everything else.

CyrusNajmabadi on 13 Jan 2018

👍1

Also, in a perfect world, there would be no SyntaxKind.NameOfKeyword (like there is no SyntaxKind.VarKeyword/DynamicKeyword). However, that would be a breaking change for certain. But we could put documentation on this that says "this was a mistake. you won't ever get this. our bad."

CyrusNajmabadi on 13 Jan 2018

👍1

Was this page helpful?

0 / 5 - 0 ratings