Version Used: 2.0.0-rc2 from NuGet
Steps to Reproduce:
Execute SyntaxFactory.ParseTypeName("void").
Expected Behavior:
ParseTypeName returns a valid syntax node for the void type.
Actual Behavior:
ParseTypeName returns a syntax node for the void type with the following diagnostic:
error CS1547: Keyword 'void' cannot be used in this context
I have discovered this while investigating an SO question. I'm not sure if the behavior I expect is actually correct or not, but the current actual behavior is surprising to me.
I also couldn't find any tests for this one way or the other. (There is a test for other working keyword-types, which would pass for VoidKeyword, since the test does not check diagnostics.)
I think the fix would be to change the implementation of LanguageParser.ParseTypeName() from return ParseType(); to return ParseTypeOrVoid();.
I'm not convinced that void should be accepted as a type by this method. The language doesn't say that void is a type, though it does appear in syntactic positions where types would also be allowed, such as
return_type
: type
| 'void'
;
I think, for all intensive porpoises, we should allow this. Otherwise, it's just very difficult to do bog-standard stuff like easily parse the return type of a method.
This is one of those areas where we need to think of things less from the perspective of the language, and more from teh perspective of the API. Also, it could always have been the case that the language said "this is a type", but then had semantic rules that stated "this type cannot be used anywhere but in X, Y and Z".
@CyrusNajmabadi There are many nonterminals in the language for which we do not have parse methods. A method return type is one of them. From the perspective of the API, you should either parse significant parts of your program, or assemble them from primitive elements, as there is no way to parse most of the intermediate pieces.
A method return type is one of them.
Only because the current API has this issue. If we address this issue, then there won't be a problem parsing 100% of types that can be used in 100% of situations right?
Basically, right now i can't see how the current approach actually helps anyone. it seems to have just been an accidental implementation detail that just ends up being problematic. We can trivially loosen things to what would be intuitive and expected, without any downsides that i can tell.
There are many nonterminals in the language for which we do not have parse methods.
Sure. But we chose 'Type' as a reasonable entry-point to expose. It seems utterly silly that this method would literally handle every possible type in C# except 'void'. This makes the method useful for almost every place you might want to have a type, instead of being useful everywhere where you'd want a type.
or assemble them from primitive elements,
Yes. Hence makign it so that ParseType can properly parse 'void' would be sensible. That way you can construct things appropriately from primitive elements. If we don't do this, then anyone who wants to parse the return type for a method needs to not only use 'ParseTypeName' but also then special case this single type.
@CyrusNajmabadi void isn't a type.
the distinction is so pendantic as to not be helpful IMO :). In the actual syntax model we model it as a TypeSyntax. We also model things like the return-type of a MethodDeclaration as a TypeSyntax. We model the inner type of a PointerTypeSyntax as TypeSyntax (i.e. 'void' in 'void'). At the *syntactic level, our model effectively accepts this as a type. However, our helper to give the person a type fails for only this specific case.
This special case serves no purpose at this level and only adds complexity to precisely those consumers that would want to use the helper parsing functions.
We should parse it as a type, then report diagnostics in the cases where it can't be used.
@CyrusNajmabadi The behavior of the parse methods is defined by the language grammar, not by the data structure that the methods return.
@AdamSpeight2008 The syntax node builders do not "report diagnostics".
@gafter I'm saying we should change the stage at which it is check, that it meet the "official" grammar.
. The program can be syntactically correct, but grammatically incorrect. eg the programmer should be able to build syntactically invalid and grammatically invalid trees.
Thus since we effectively treating void as a type within the model we use, then we should treat as such. But produce diagnostics when it is grammatical incorrect.
@gafter I firmly disagree with you :). This is an argument we've had many times before. In essence, i think the problem is that you're effectively making API decisions based on a document that defines the language, but does not define the API. As i've argued plenty of times in the past, the grammar is simply what Anders/Mads decided was what they felt was appropriate to balance out how much they'd want to codify in the spec simply, vs what they felt would need more complex text to explain. Mads could turn around tomorrow and decide "actually, we'll just make 'void' grammatically type, and put in some text elsewhere in the spec to state that it's illegal to use it in contexts X, Y and Z".
Furthermore, as has been clear from literally the last 20 PRs i've sent through on the compiler. Most of the parser errors we produce today are not actually necessary to create at the parsing level. Indeed, it's not even desirable to create them there as it has negative effects on downstream consumers.**
We have to divorce the idea that "The behavior of the parse methods is defined by the language grammar". The behavior of the parser is influenced by the grammar. However, it is the compiler that is defined by the entire specification. We are free (and should be encouraged) to make sensible API and implementation decisions that allow our individual layers to deviate when appropriate, as long as we're abiding by the overall definition.
As we go forward, we will end up in the situation that the compiler reports absolutely no errors except for missing tokens, or skipped tokens. At that point, we'll end up essentially addressing this bug anyways. Parsing will be extremely relaxed, and will certainly accept a much larger superset of the language without producing errors that the grammar would generally require. My last 20 PRs have been steps along that, and we have many more to go. For example, the grammar would never allow "if (foo) var v = bar();" and our current parser produces an error here. However, post https://github.com/dotnet/roslyn/pull/16134 this will no longer be the case. The parser will allow this without any errors, and it will be up to the rest of hte compiler to report the error.
--
* For example, the grammar states that it is syntactically illegal for someone to write "where T : new(), X, class", and yet, we *now no longer report syntax errors about this. Instead, the errors are deferred to later in the compilation pipeline. That way we abide by the constraint that the compiler produces the errors, without any sort of guarantee about which stage of the compiler will do it.
@gafter
What are the advantages of not have void being parsed as type? Or being treated not as a type?
@AdamSpeight2008 I would suggest trying to produce an alternative version of the language specification that does it the other way, and comparing the two for clarity and brevity. I suspect that will answer your question.
I agree that having such restrictions in teh language-specification grammar makes the language specification shorter to write. However, that doesn't mean that i think that that means the parser must necessarily enforce such grammar restrictions through diagnostics in our API.
If anything, i think the language grammar should be as restrictive as possible (and we've heavily gone that route, especially with how we treat interfaces in the specification). However, that doesn't mean that if we parse an interface, that we then have the parser produce such specific grammar diagnostics itself.
IMO, the parser's job is simple to produce a tree that fits our syntactic model. Diagnostics should be limited simply to cases where we literally cannot make the text fit any possible syntactic-model shape. i.e. because we either are missing required tokens, or we have tokens we have to skip. Outside of this, we should attempt to postpone diagnostics as much as possible.
This also has many other benefits (that i've outlined elsewhere). For example, currently, if a user has written:
unsafe void Foo(void bar, int baz) { ... }
Then this method cannot be incrementally reused. Indeed, if any edits happen in this file, this diagnostics on 'void bar' will taint the entire spine up its tree and will impact reparsing. It will also impact IDE features that back-off due to tree diagnostics being present. There is no need for this as this is a case where we completely understand and are able to represent the above code in our syntactic model.
To me, it's important for us to decouple our language specification from our APIs. Our language specification lays out what should be accepted by the compiler and what should not. It helps inform the general shape of our APIs. But it does not require specific implementation choices about where we perform certain checks, or where we find specific diagnostics.
Most helpful comment
@gafter I firmly disagree with you :). This is an argument we've had many times before. In essence, i think the problem is that you're effectively making API decisions based on a document that defines the language, but does not define the API. As i've argued plenty of times in the past, the grammar is simply what Anders/Mads decided was what they felt was appropriate to balance out how much they'd want to codify in the spec simply, vs what they felt would need more complex text to explain. Mads could turn around tomorrow and decide "actually, we'll just make 'void' grammatically type, and put in some text elsewhere in the spec to state that it's illegal to use it in contexts X, Y and Z".
Furthermore, as has been clear from literally the last 20 PRs i've sent through on the compiler. Most of the parser errors we produce today are not actually necessary to create at the parsing level. Indeed, it's not even desirable to create them there as it has negative effects on downstream consumers.**
We have to divorce the idea that "The behavior of the parse methods is defined by the language grammar". The behavior of the parser is influenced by the grammar. However, it is the compiler that is defined by the entire specification. We are free (and should be encouraged) to make sensible API and implementation decisions that allow our individual layers to deviate when appropriate, as long as we're abiding by the overall definition.
As we go forward, we will end up in the situation that the compiler reports absolutely no errors except for missing tokens, or skipped tokens. At that point, we'll end up essentially addressing this bug anyways. Parsing will be extremely relaxed, and will certainly accept a much larger superset of the language without producing errors that the grammar would generally require. My last 20 PRs have been steps along that, and we have many more to go. For example, the grammar would never allow "if (foo) var v = bar();" and our current parser produces an error here. However, post https://github.com/dotnet/roslyn/pull/16134 this will no longer be the case. The parser will allow this without any errors, and it will be up to the rest of hte compiler to report the error.
--
* For example, the grammar states that it is syntactically illegal for someone to write "where T : new(), X, class", and yet, we *now no longer report syntax errors about this. Instead, the errors are deferred to later in the compilation pipeline. That way we abide by the constraint that the compiler produces the errors, without any sort of guarantee about which stage of the compiler will do it.