Quote from the current doc:
The null-forgiving operator has no effect at run time. It only affects the compiler's static flow analysis by changing the null state of the expression. At run time, expression x! evaluates to the result of the underlying expression x.
The document currently focuses primarily on the usage of suppressing warnings of code flow analysis.
But when combining the null-forgiving operator with a leading null-conditional member-access operator the compiler will indeed emit different code and this will have effect at run time -- which comes unexpected.
Example:
``C#
void Main()
{
var names = T1(); //names == null` now
var names2 = T2(); // throws
}
IEnumerable
{
FileInfo? fi = null;
return fi?.Directory?.EnumerateFileSystemInfos().Select(fsi => fsi.Name);
}
IEnumerable
{
FileInfo? fi = null;
return fi?.Directory?.EnumerateFileSystemInfos()!.Select(fsi => fsi.Name);
}
```
This happens because in T2() the code actually compiles to return (fi?.Directory?.EnumerateFileSystemInfos()).Select(fsi => fsi.Name); (notice the additional parentheses).
I'm sure this is by design and is covered by the sentence "expression x! evaluates to the result of the underlying expression x". But since any article or documentation I have read about the null-forgiving operator so far, focused only about getting rid of warnings, this (change in) behavior was very unexpected, at least to me.
⚠Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
@springy76 thanks!
I've simplified your example a bit:
using System;
using System.Collections.Generic;
using System.Linq;
#nullable enable
class Program
{
private static void Main()
{
try
{
T1(null);
}
catch (Exception)
{
Console.WriteLine("T1 throws");
}
try
{
T2(null);
}
catch (Exception)
{
Console.WriteLine("T2 throws");
}
}
private static IEnumerable<int> T1(int[]? numbers)
{
return numbers?.ToArray().ToArray();
}
private static IEnumerable<int> T2(int[]? numbers)
{
return numbers?.ToArray()!.ToArray();
}
}
The above program produces the following output:
T2 throws
Note that T1 is converted into the following code:
private static IEnumerable<int> T1([System.Runtime.CompilerServices.Nullable(2)] int[] numbers)
{
if (numbers == null)
{
return null;
}
return Enumerable.ToArray(Enumerable.ToArray(numbers));
}
while T2:
private static IEnumerable<int> T2([System.Runtime.CompilerServices.Nullable(2)] int[] numbers)
{
return Enumerable.ToArray((numbers != null) ? Enumerable.ToArray(numbers) : null);
}
@BillWagner is that expected behavior?
Let's ask @jcouv for some thoughts on this, thank you @pkulikov and @springy76.
Yeah, I see it as:
private static IEnumerable<int> T1(int[]? numbers)
{
return numbers?.ToArray().ToArray();
}
private static IEnumerable<int> T2(int[]? numbers)
{
return (numbers?.ToArray()).ToArray();
}
It is a really odd use case, it's literally like expressing that something could be null, and then something else chained off of that could be null and then saying that the last thing is definitely not null. It could be null, it could be null, it's not null.
I can't resist a good puzzle. I think the behavior is correct, and is covered in operator precedence and in the nullable reference type spec on the null forgiving operator, and finally in the C# 6.0 spec on the null conditional operator:
I think that means the left operand of ! must be considered as non-null, so execute the ToArray() method regardless of the earlier left operand(s) of any ?. operator.
What's equally interesting is that the statement "The null forgiving operator has no runtime effect" is still strictly true (and was taken directly from the feature spec referenced above). It isn't very helpful, though. The code generated is based on the precedence of the operators, and the fact that they are left-associative.
I'm still not sure how best to fix this in the docs, but you've found a super interesting edge case @springy76 We'll keep noodling.
Precedence. Does that mean that numbers?.ToArray()!.ToArray() is interpreted as (numbers?.ToArray()!).ToArray() because of precedence?
Sort of. It's both precedence and associativity. The ! operator is a post-fix unary operator. Its operand is whatever is left of it. Because ToArray() needs a receiver, its operand becomes (numbers?.ToArray()). Because ! has higher precedence than . (primary vs. unary), the expression numbers?.ToArray()! is considered non-null and then becomes the receiver to the final ToArray.
It took my a few reads of the sections I cited above, and I'm still not completely certain I'm analyzing it correctly. (I first thought it might be a compiler bug. Now I don't think so.) I'd like to wait for @jcouv to weigh in as well before we suggest a fix.
ok, then we need to write such code (it compiles!) :)
numbers?.ToArray()!?.ToArray();
Thanks @BillWagner for the analysis (which I think is correct).
@gafter, Would this warrant changing the precedence of post-fix ! (suppression)?
The precedence of ! causes x?.ToArray()!.ToArray() to parse with an implicit grouping (x?.ToArray()!).ToArray(), which affects semantics...
I don't see what other precedence it could be...
@BillWagner thank you for the analysis; it helped. However, it's hard to think in terms of precedence because the spec on that _doesn't_ mention null-conditional operators. And it's possible to think that the precedence of ?. is the same as precedence of ., which is primary. So, I'll try to make my own analysis to understand this issue better.
Given (I assume the following two statements are true):
x! is a primary expressionnull_conditional_expression
: primary_expression null_conditional_operations
;
null_conditional_operations
: null_conditional_operations? '?' '.' identifier type_argument_list?
| null_conditional_operations? '?' '[' argument_list ']'
| null_conditional_operations '.' identifier type_argument_list?
| null_conditional_operations '[' argument_list ']'
| null_conditional_operations '(' argument_list? ')'
;
numbers?.ToArray().ToArray() is a null-conditional expression with numbers as primary_expression and ?.ToArray().ToArray() as null_conditional_operations
We can't do the similar split for numbers?.ToArray()!.ToArray() because ! cannot be included into null_conditional_operations (it's not identifier, I guess). That means that numbers?.ToArray()!.ToArray() is NOT a null-conditional expression. According to the grammar, only numbers?.ToArray() is a null-conditional expression, let's mark it A. Then, we get A!.ToArray(), which is equivalent to A.ToArray(), which is equivalent to (numbers?.ToArray()).ToArray()
Furthermore, numbers?.ToArray()!?.ToArray() is a null-conditional expression with numbers?.ToArray()! as primary_expression and ?.ToArray() as null_conditional_operations.
So, if this issue to be fixed, the definition of a null-conditional expression should be updated, not only the precedence of null-forgiving (to unary??).
@BillWagner @jcouv is that correct analysis?
Let me go into detail how I (we) spotted this problem yesterday, because it's exactly the other way round than the example I gave in the original post:
It began with a collegue who claimed that inserting a "!" somewhere fixed a NRE. I protested and bet a staple of crates of beer that the null-forgiving operator won't fix any code but only give the code analyzer a hint: "I know it better, just be calm and stop warning me".
But he proved me wrong.
In our case it was exactly the other way round (code without "!" threw NRE) because the LINQ extension method in our code (I used Enumerable.Select(fsi => fsi.Name) in the sample above) was named EmptyIfNull and as the name implies is designed to handle null for input (and will return Array.Empty<T>() in this case):
```C#
var data = mightBeNullObject?.ArrayTypeProperty.EmptyIfNull().Select(x => x.Prop);
hashset.UnionWith(data); // ArgumentNullException when mightBeNullObject is null (data is then null, too)
So in our case the "?." stripped away the call to `.EmptyIfNull<T>()` and the resulting IEnumerable variable was still null and with the additional "!" the call to `.EmptyIfNull<T>()` was forced just the same way as putting the left side of the expression into parenthesis:
```C#
var data = mightBeNullObject?.ArrayTypeProperty!.EmptyIfNull().Select(x => x.Prop);
hashset.UnionWith(data); // data is never null now
equals
C#
var data = (mightBeNullObject?.ArrayTypeProperty).EmptyIfNull().Select(x => x.Prop);
hashset.UnionWith(data); // data is never null now
It has no effect at run time; however, it may affect how the expression is parsed at compile time:
C#
int[] a = null;
string s1 = a?.First().ToString(); // null: ToString() is not called
string s2 = a?.First()!.ToString(); // Empty string: ToString() is called on default(int?)
Yes, if you decompile it (using ILSpy for example) you'll see that the compiler emitted different code (by adding parenthesis or maybe an intermediary variable): string s2 = ( a?.First() ).ToString();
I just did not expect the compiler to behave different in any way because I only expected reducing of false-positive-warnings.
As already stated in the original post: I don't think the compiler is doing something wrong, it just came unexpected due to the lack of explanation or any mentioning in all the samples.
@BillWagner @IEvangelist @springy76 what about adding the following note to the article about the null-forgiving operator:
[!NOTE]
The compiler interprets expressionx?.y!.zas(x?.y)!.z. Because of that, at run timex?.y!.zis evaluated as(x?.y).z.
@BillWagner @IEvangelist @springy76 what about adding the following note to the article about the null-forgiving operator:
[!NOTE]
The compiler interprets expressionx?.y!.zas(x?.y)!.z. Because of that, at run timex?.y!.zis evaluated as(x?.y).z.
I think that note would be appropriate, I'm curious what @BillWagner would think? I'd probably reword it a little (let me know your thoughts):
[!NOTE]
The compiler interprets thex?.y!.zexpression as(x?.y)!.z. Due to this interpretation and the operator precedence, at runtimex?.y!.zis evaluated as(x?.y).z.
@IEvangelist I would omit the words "the operator precedence": the parentheses in the first sentence are enough. Also to me, "operator precedence" here is more confusing than explaining (however, I don't know how it is to the remaining audience).
By the way, "runtime" is the noun that is in "Common Language Runtime", while "at run time/at compile time".
I first thought it might be a compiler bug. Now I don't think so.
Have we clarified with the compiler team that it is indeed the intended behavior here? If not, I would create a Roslyn issue asking for clarification. The present behavior is counterintuitive. We might consider changing the grammar to allow null-forgiving operators in null_conditional_operations.
@BillWagner @IEvangelist I've thought more about it and can explain why mentioning precedence confuses me (and can confuse others).
We say that a + b * c is evaluated as a + (b * c) because * has a higher precedence. So, a?.b!.c is evaluated as (a?.b)!.c. That looks as if ?. has a higher precedence because it's evaluated still first.
However, I don't think we need to mention precedence here at all. Precedence, for a given operand, decides to which operator the operand binds: to the one with a higher precedence. For example, in a + b * c, b binds to * because * has higher precedence than +.
That said, in expression a?.b.c.d!.e, all operators are unary postfix. Each operand binds to an operator on its right, there is no choice (thus, there is no need in precedence). Thus, all operators are evaluated from left to right in order. So, the compiler tries to evaluate the first ?.. To do that, I guess, it needs to form a null-conditional expression. A null-conditional expression can contain only (null-conditional) member and index access operations. The ! operator breaks that sequence of member and index accesses. That's why a?.b.c.d!.e is interpreted as (a?.b.c.d)!.e.
There are two solutions: (1) to allow ! in a null-conditional expression, (2) update the docs
As for the docs, currently they say:
The null-forgiving operator has no effect at run time. It only affects the compiler's static flow analysis by changing the null state of the expression. At run time, expression
x!evaluates to the result of the underlying expressionx.
Maybe we should say something like that:
The null-forgiving operator has no effect at run time. Expression
x!evaluates to the result of the underlying expressionx. However, at compile time, the null-forgiving operator affects the compiler's static flow analysis by changing the null state of the expression. In the case of a null-conditional expression, that changes the way how it's evaluated when you add the!operator to it. The compiler interpretsa?.b!.cas(a?.b)!.c, which evaluates to(a?.b).cat run time.
That said, in expression
a?.b.c.d!.e, _all_ operators are unary postfix.
@pkulikov I am not sure how you came to this conclusion. Expressions a?. and a. have no meaning by themselves, because those operators require both _left-_ and _right-_ hand sides. For instance, the C++ reference describes the _first_ and the _second_ operands of the a.b member access operator. If you do not wish to use the word _operand_ for syntax constructions, we might agree to call them _lhs_ and _rhs_. The operators would still be binary though.
Having said that, I do not think it is strictly necessary to distinguish priorities of those special operators (implied by the grammar) in the table in question. Its main idea is that primary expressions are evaluated first (they have the highest precedence), then all other 'normal' operators are evaluated according to their priorities. For that purpose, it seems OK to extend primary expressions with null-conditional operators even if the outdated formal spec is worded slightly differently.
Maybe we should say something like that:
The null-forgiving operator has no effect at run time. Expression
x!evaluates to the result of the underlying expressionx. However, at compile time, the null-forgiving operator affects the compiler's static flow analysis by changing the null state of the expression. In the case of a null-conditional expression, that changes the way how it's evaluated when you add the!operator to it. The compiler interpretsa?.b!.cas(a?.b)!.c, which evaluates to(a?.b).cat run time.
The words "that changes" imply it is the same effect; however, those are two separate effects:
! operator may change the null state of the expression. That may suppress a warning.! operator may change how the primary expression is parsed and, therefore, evaluated.I am not sure how you came to this conclusion.
@AntonLapounov this is how I understand the language spec. According to it, a null-conditional expression and a member access expression (as it's a primary expression) are unary expressions. That means they have one operand. Indeed, they require right-hand part, but it's not operand (it can be a member identifier or a list of member and indexer access operations). For example, the spec on null-conditional says:
The null-conditional operator applies a list of operations to its operand...
So, there is a distinction between "operand" and "rhs". This distinction is important, because precedence concerns operands.
The words "that changes" imply it is the same effect; however, those are two separate effects:
@AntonLapounov thanks for that remark. I agree my suggestion can be improved like that:
At compile time, the null-forgiving operator affects the compiler's static flow analysis by changing the null state of the expression. At run time, expression
x!evaluates to the result of the underlying expressionx. Typically, the introduction of the!operator doesn't affect run time evaluation. However, if you introduce the!operator in a null-conditional expression, you might observe the change in behavior at run time. That happens because the compiler interpretsa?.b!.cas(a?.b)!.c, which evaluates to(a?.b).cat run time.
@springy76 @IEvangelist @BillWagner what do you think?
@pkulikov I agree with your analysis, and I like the updated text you're proposing. The last sentence was tripping me up a bit (this is a minor).
That happens because the compiler interprets
a?.b!.cas(a?.b)!.c, which evaluates to(a?.b).cat run time.
Does this read a bit better written this way?
The reason for this, is that the compiler interprets
a?.b!.cas(a?.b)!.c, which evaluates to(a?.b).cat run time.
I am thoroughly enjoying this thread, being new to this team - I'm learning a lot from it. Thank you
@IEvangelist I would make the intro even shorter (is that grammatically correct?):
That is because the compiler...
BTW: Not to oversee what happens on nullable structs while talking about nullable reference types:
C#
static System.TimeSpan T1(DateTime? dt)
=> dt?.TimeOfDay!.GetValueOrDefault(Timeout.InfiniteTimeSpan);
Without ! the member GetValueOrDefault of the intermediary Nullable<T> struct would not be valid at this place at all since TimeOfDay itself is just a struct (TimeSpan).
@IEvangelist I would make the intro even shorter (is that grammatically correct?):
That is because the compiler...
I'm good with that, I think the "That happens because" was throwing me off...
this is how I understand the language spec. According to it, a null-conditional expression and a member access expression (as it's a primary expression) are _unary_ expressions. That means they have one operand.
That is not correct interpretation of the spec. While every primary_expression matches the unary_expression production, that does not mean that every operator within a primary_expression has to be a unary operator. Consider, for example, this unary_expression: a[1+2].b. It contains three operators:
+ taking arguments 1 and 2.[] taking arguments a and 3 (the result of 1+2).. taking arguments a[3] and b.While you might claim that b is not a 'real' operand of the last operator and name it _rhs_ instead, it would be difficult to deny that 3 is a real operand of a[3], which is certainly a unary_expression.
For example, the spec on null-conditional says:
The null-conditional operator applies a list of operations to its operand...
And that list of operations works as the second argument/operand of the operator. That is commonly used terminology. Look, for example, at Safe navigation operator Wiki page:
In object-oriented programming, the safe navigation operator (also known as optional chaining operator, safe call operator, null-conditional operator) is a binary operator that returns null if its first argument is null; otherwise it performs a dereferencing operation as specified by the second argument (typically an object member access or an array index).
So, there is a distinction between "operand" and "rhs". This distinction is important, because precedence concerns operands.
Operator precedence rules dictate how an expression is interpreted by the compiler and translated into the expression tree. That includes determining arguments of each operator. It does not matter whether you call them operands or lhs/rhs — you still have to determine their textual boundaries. If some part of the grammar is more complicated and cannot be expressed as an operator-precedence grammar, that is fine, but then you should avoid assigning precedence to operators in that part of the grammar. That is what you did in #18001 and I came here to explain why it was wrong:
- moved
x?.yandx?[y](null-conditionals) from the primary precedence row to the unary precedence row (putting it to the primary expressions group was not correct)
Thanks for all the discussion. I'll make the following proposal:
In light of dotnet/csharplang#3393, dotnet/csharplang#3297 and dotnet/roslyn#43659 which open possible changes o the spec and implementation, I'd like to re-focus this on explaining the current behavior. (Thanks @AntonLapounov @agocke and @jcouv for driving those discussions).
In the short term, I think the best solution is Option 1 in https://github.com/dotnet/docs/pull/18037/files#r414319212 as @AntonLapounov says. In addition, we should add a note as previously discussed. I suggest the following edit to the previous proposal:
[!NOTE]
In C# 8, The expressionx?.y!.zis parsed as(x?.y)!.z. Due to this interpretation(x?.y)!is considered not null. Therefore, if eitherxoryis null, the expression throws aNullReferenceException.
I specifically mention C# 8 to take into account change that might come from dotnet/csharplang#3393.
In the longer term, as the standardization effort catches up, we should remove this table from the language reference section, and refer to the corresponding table in the spec.
Can we give this is thumbs up or down as a proposed solution?
Therefore, if either
xoryis null, the expression throws aNullReferenceException.
Nitpicking, it may throw, but not necessary if z is a method call:
C#
int[] a = null;
// s2 is the empty string: ToString() is called on default(int?)
string s2 = a?.First()!.ToString();
What's about the following?
In C# 8 the expression
x?.y!.zis parsed as(x?.y)!.z. Due to this interpretationzis evaluated even ifxis null, which may result in aNullReferenceException.
BTW: What is inside LINQ expressions? ?. isn't allowed there but ! is allowed -- but seems to have no impact on the expression.
I opened #18124 to address this. Thanks for all the discussion.
@springy76 Can you open a new issue for the question on LINQ expressions? I'm not sure what you mean, and I'd like to start a new discussion for that concern.
@BillWagner
I just wanted to note that
Expression<Func<DirectoryInfo, string>> expr = di => di.Parent?.Name;
does not compile at all (CS8072 An expression tree lambda may not contain a null propagating operator) but
Expression<Func<DirectoryInfo, string>> expr = di => di.Parent!.Name;
is no problem, seems to be ignored by (or transparent to) the expression code builder.
Thanks @springy76 That's a different issue. (And we don't document it well.)
There are a number of new syntax elements that aren't representable in LINQ expression trees. I'll create an issue and we'll address that as well.
Most helpful comment
Thanks for all the discussion. I'll make the following proposal:
In light of dotnet/csharplang#3393, dotnet/csharplang#3297 and dotnet/roslyn#43659 which open possible changes o the spec and implementation, I'd like to re-focus this on explaining the current behavior. (Thanks @AntonLapounov @agocke and @jcouv for driving those discussions).
In the short term, I think the best solution is Option 1 in https://github.com/dotnet/docs/pull/18037/files#r414319212 as @AntonLapounov says. In addition, we should add a note as previously discussed. I suggest the following edit to the previous proposal:
I specifically mention C# 8 to take into account change that might come from dotnet/csharplang#3393.
In the longer term, as the standardization effort catches up, we should remove this table from the language reference section, and refer to the corresponding table in the spec.
Can we give this is thumbs up or down as a proposed solution?