Version Used: 2.9.0
Steps to Reproduce:
public class C
{
public override string ToString() => null;
}
Console.WriteLine(("" + new C()) == null);
Console.WriteLine((new C() + "") == null);
Console.WriteLine(("" + new C() + "") == null);
Console.WriteLine((new C() + "" + "") == null);
Expected Behavior:
According to section 7.8.4 "Addition Operator" of the C# Language Specification:
String concatenation:
string operator +(string x, string y);
string operator +(string x, object y);
string operator +(object x, string y);
These overloads of the binary + operator perform string concatenation. If an operand of string concatenation is null, an empty string is substituted. Otherwise, any non-string argument is converted to its string representation by invoking the virtual ToString method inherited from type object. If ToString returns null, an empty string is substituted.
...
The string concatenation operator never returns a null value
Therefore I'd expect the code above to print
False
False
False
False
Actual Behavior:
True
True
False
False
The compiler folds the "" constant in all three cases, reducing them all to a call to string.Concat(new C()). However in the third and fourth cases, it appends a test to see if string.Concat returns null, and if so it substitutes "". string.Concat tests whether its argument is null (and substitutes "" if so), but doesn't test whether calling .ToString() on its argument results in null.
Using null instead of "" seems to exhibit the same behaviour. If you use string.Empty, the compiler doesn't do the same constant folding, and the issue doesn't appear.
Interestingly this behaviour was different pre-Roslyn - in that case, you get
True
True
True
True
I'm thinking about perhaps tackling this, if I get the time.
As far as I can see, there are a few options:
string.Concat(object) to never return null. This is by far the simplest approach, but of course there are backwards compatibility considerations. The documentation says "The Concat(Object) method represents arg0 as a string by calling its parameterless ToString method", which while it doesn't explicitly say that the value of ToString() is returned even if it is null, it does imply as much.string.Concat(object) ?? "" instead. This makes the code a bit more complex when it folds multiple string concatenations into a single call to string.Concat, but I don't think it will be too bad. object?.ToString() ?? "" instead of a call to string.Concat(object). This is slightly more complex than 2 for the same reason, but does avoid a method call.Any guidance on which option is best would be appreciated.
From reading the code, I'm under the impression that the fact that new C() + "" + "" returns non-null is unintended - what happens it that new C() + "" gets simplified to string.Concat(new C()), then the compiler tries to combine it with the last emptystring, sees an expression of the form <string> + "" and simplifies that to <string> ?? "", without checking to see whether <string> is a call to string.Concat(...) (here). I think this remains harmless but unnecessary in all 3 options above (in options 2 and 3 you'll get a slightly ugly string.Concat(...) ?? "" ?? "".
Method calls: MakeBinaryOperator -> RewriteStringConcatenation -> TryFoldTwoConcatOperands -> RewriteStringConcatenationOneExpr.
Tests: ConcatEmptyString
Since this is a breaking change, I think someone will probably have to decide whether it's an acceptable one, though I have no idea who to speak to about this I'm afraid.
If you create a pull request, that's likely to drive taking that decision forward. Otherwise it's not impossible it will never come up/no-one will get round to it.
@YairHalberstadt thanks, this is what I've done
:thought_balloon: this may be a case where the compiler is assuming types correctly implement object.ToString() by not returning null. Other cases have appeared in the past where special handling was included for these cases, but it is supposed to be at worst a reasonable assumption, if not a safe assumption.
I think that's probably a valid point. The reason I dug into this is because the spec does say:
If ToString returns null, an empty string is substituted.
Thanks for providing the detailed breakdown of the problem here.
Looking at this though, why do you believe this breaking change should be taken? It appears to have been this way since C# 1.0. I'm guessing likely due to a miscommunication between the language and framework team on the behavior of ToString returning null. But it's also been operating this way for close to 15 years now without causing significant problems for customers.
Generally we take breaking changes of this sort only when there are strong justifications for doing so: evidence of massive bad app behavior, significant perf wins for corner case breaks, etc ... I'm not seeing that in this case. Are there bugs out there I'm not considering?
@jaredpar The Roslyn compiler is inconsistent with the specification, and also inconsistent with the pre-Roslyn compiler. I doubt there are customers relying on the incorrect null result value. I think we should fix the compiler to comply with the specification and document the compatibility impact.
It's also inconsistent with itself, as @canton7 showed above.
One thing I didn't notice before: a compiled expression seems to have the correct behaviour, so it's inconsistent with that as well.
Looking at this though, why do you believe this breaking change should be taken?
I raised the issue because I happened to come across this behaviour, and wanted to make sure the team were aware of it. I put together the PR because it was fun.
My only (slight) motivation for wanting this fixed is a sense of neatness and consistency, for the language to meet its spec - I don't have any personal investment in this, and if you decide this is wontfix that's perfectly fine by me.
(Another data point is that "" + c + c, "" + c + c + c and "" + c + c + c + c (calling string.Concat(object, object), string.Concat(object, object, object) and string.Concat(object[]) respectively) all return "")
This is being addressed in https://github.com/dotnet/roslyn/pull/34208 by @canton7
Most helpful comment
@jaredpar The Roslyn compiler is inconsistent with the specification, and also inconsistent with the pre-Roslyn compiler. I doubt there are customers relying on the incorrect
nullresult value. I think we should fix the compiler to comply with the specification and document the compatibility impact.