Version Used: Microsoft.CodeAnalysis.CSharp.Workspaces.1.2.1
context
The code in Microsoft.CodeAnalysis.CSharp.Workspaces/Extensions/ParenthesizedExpressionSyntaxExtensions.cs is part of the Simplification feature. The function bool CanRemoveParentheses(this ParenthesizedExpressionSyntax node) determines if parentheses can be removed.
As part of this, the function RemovalChangesAssociation is called (same source file). This function will correctly determine that, for example, the parentheses in a + (b * c) can be removed, but not in a * (b + c).
problem
The problem arises with expressions of the form a + (b + c) and a * (b * c). According to this function, the parentheses can be removed. However, that is not correct.
For example, consider the following code:
class C
{
public static string operator+(C left, C right) => "";
public static string Problem => "" + (new C() + new C());
}
expected behavior
Removal of the parenthesis in the Problem expression alters the meaning of the expression, so they cannot be removed.
actual behavior
The current implementation will remove them.
In my example, removing the parentheses was bad enough to make the program not compile. But there are very few cases where removing them would be allowed (see also https://ericlippert.com/2015/10/27/optimizing-associative-operations/), and in fact, when it is allowed, it's not needed (https://ericlippert.com/2013/06/17/string-concatenation-behind-the-scenes-part-one/).
The lines that check for this should simply be removed.
I'm not sure I understand the change of milestone to version聽3.0. Does this mean you do not intend to fix this obvious bug in VS 2017? You only need to delete line 258 to 265 in https://github.com/dotnet/roslyn/blob/master/src/Workspaces/CSharp/Portable/Extensions/ParenthesizedExpressionSyntaxExtensions.cs
This is causing real bugs in real code fixes. Please correct it.
@KrisVandermotten This doesn't currently meet our bar for RTM. We can revisit this after that release though. Thanks!
Would it help if I create a pull request? Seems like a lot of work to delete two lines, but I'm willing to do it.
PRs welcome. But i'd rather us do a full fix. i.e. keep teh optimization we have now. But not apply if there are overloaded operators.
Not sure what you mean by "optimization".
Also, this is not about overloaded operators.
Take this program:
using System;
class Program
{
static void Main(string[] args)
{
int a = 7;
const int B = 17;
const int C = 18;
Console.WriteLine(a + (B + C));
}
}
In Visual Studio 2015 Update 3, optimizations on, the Main method compiles to:
.method private hidebysig static void Main (string[] args) cil managed
{
.maxstack 8
.entrypoint
ldc.i4.7
ldc.i4.s 35
add
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
If the parentheses are removed, it compiles to
.method private hidebysig static void Main (string[] args) cil managed
{
.maxstack 8
.entrypoint
ldc.i4.7
ldc.i4.s 17
add
ldc.i4.s 18
add
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
Removing the parentheses changes the semantics of the code, even to the point that different IL is generated. There are no overloaded operators involved.
In the above example, I'm using integers, and in an unchecked context addition of integers is indeed fully associative. It is not in an checked context, where int.MaxValue + 1 - 7 throws, while int.MaxValue + (1 - 7) does not. And what about floats and doubles? Their additions and multiplications are never fully associative!
Again, addition and multiplication operators in C# are left associative. If I use parentheses to change the semantics of the expression, removing those is a plain and simple bug. Pease, just fix it.
Removing the parentheses changes the semantics of the code, even to the point that different IL is generated.
Any time we change code we may change IL.
And what about floats and doubles? Their additions are never fully associative!
The optimization i was referring to was the one about strings (which i believe you even linked to).
In the above example, I'm using integers, and in an unchecked context addition of integers is indeed fully associative. It is not in an unchecked context, where int.MaxValue + 1 - 7 throws, while int.MaxValue + (1 - 7) does not.
So, in an unchecked context, we would like to retain this behavior. If it's:
We should leave parens.
The optimization i was referring to was the one about strings (which i believe you even linked to).
Are you saying that the string optimization works because of this code? If it is, then you're absolutely correct that those lines cannot be simply removed.
However, I doubt that. I mean: assume this code is called on every expression of type string, in order to change a + (b + c) to a + b + c. Then it would be called on "a" + (1 + 2), and clearly it isn't. AFAIK, that code is only called by refactorings and code fixes to simplify the syntax they generate, not by the optimizer.
Note in fact that "a" + (1 + 2) is another example of an expression where this code incorrectly removes the parentheses.
Take this code
int i = 1 + 2;
string s = "a" + i;
Now, on i, run "Inline Temporary Variable". That changes the code to
string s = "a" + 1 + 2;
Now, correct me if I'm wrong, but that is a bug, no? It should generate
string s = "a" + (1 + 2);
And in fact it first does, but the parentheses are then incorrectly removed.
Compare that to
int i = 1 + 2;
var j = 17 * i;
Run the same refactoring, and it correctly produces
var j = 17 * (1 + 2);
Fix the bug. Please.
What i'm saying is:
I'd like us to keep removing the parens in the cases where it is safe to do so. I would like us to keep the parens in the cases where it isn't.
What I am saying is:
Determining that it actually is safe to remove them would be very complex, and it would require the SemanticModel, which is not available to that code.
We can certainly thread through a semantic model.
There virtually are no cases where it is safe to remove them.
Sure there are. You've already identified at least two of them. For example, string literals and also unchecked integral operations.
Sure there are. You've already identified several of them. For example, string literals and also unchecked integral operations.
Unchecked integral operations or not always safe to remove the parentheses. For example, it may kill the constant folding optimization, as demonstrated above.
So yes, there are some cases where it is safe. The longer we discuss this, the shorter the list becomes. But how often do you run into these cases? And when you do, how bad is it that a pair of parentheses remains? On the other hand, how bad is it when they are incorrectly removed?
As you can see; I am rather passionate about this. Please understand that I am because I have been bitten badly by the bug.
Do you agree that there is a bug in the code? Do you agree that the "Inline Temporary Variable" refactoring is not correct, in the sense that in some cases it changes the semantics of code where it shouldn't?
So now you have two options.
One option is to delete those five lines. That makes this refactoring, and all other refactorings and code fixes correct, even though sometimes it generates parens that aren't needed. In a future release, that cosmetic issue can be fixed.
The other option is to not fix the bug, and have many refactorings and code fixes produce incorrect code, until some future release where the bug may be fixed.
What is the lesser evil?
So now you have two options.
I see three options. The two you outlined, and the additional one: Remove the parens when it is safe to do so. Leave them when it isn't.
Do you agree that there is a bug in the code?
Of course. Ths issue is already marked as 'bug'. We've already accepted that this is a bug and should be fixed. I'm just stating the preferred fix i would like to see.
Unchecked integral operations or not always safe to remove the parentheses. For example, it may kill the constant folding optimization, as demonstrated above.
Can you give an example of where you see this. I'm not sure what you're referring to when you say "as demonstrated above".
Thanks!
I see three options. The two you outlined, and the additional one: Remove the parens when it is safe to do so. Leave them when it isn't.
I should clarify: it sounds to me that for 2.0, you have two options. That third one is for 3.0. in your own words: "This doesn't currently meet our bar for RTM. We can revisit this after that release though."
"This doesn't currently meet our bar for RTM. We can revisit this after that release though."
Yes. This did not meet the bar for RTM. We can now revisit this and do something about this. PRs welcome BTW :)
Can you give an example of where you see this. I'm not sure what you're referring to when you say "as demonstrated above".
In the example in this comment above, removing the parentheses kills the constant folding optimization.
I'll have some time to take a look at this tomorrow. I'm itching to fix a bug anyways. 馃槂
removing the parentheses kills the constant folding optimization.
It changes the IL. But we don't guarantee IL will not be affected by our changes. I'm concerned more with a semantic change at runtime. In the case you referenced, there is no observable change.
Do you have examples of where observable changes happen in unchecked contexts?
I want to be clear however, we will continue removing parentheses where it safe to do so. The way this system works is that we "over" parenthesize expressions in order to make them "inlineable". Then, after inlining, the simplification engine removes any parentheses that were added by the engine if it is possible to do so. (Note that we'll never remove parentheses that the user had already types.) In a couple of cases above it looks like we can do better. However, it is by design that refactoring is not always perfect. That is why we provide a preview -- so you can see what the refactoring is going to do before it does it.
That works for me @DustinCampbell . Looks like we should either blacklist some cases, or whitelist others. I think i'm leaning toward a whitelist myself. That way just removing this optimization is a case where the whitelist is initially empty. As we find cases that are safe to do, we just add them to the whitelist.
In the case you referenced, there is no observable change.
Correct. There is only possible performance impact (depending on whether the JIT or .NET Native can optimizae it afterwards).
Do you have examples of where observable changes happen in unchecked contexts?
No.
Changing expressions of the form a + (b + c) to a + b + c and a * (b * c) to a * b * c for integral types is safe if the types of a, b and c are all equal.
It goes without saying that it's not when they differ, e.g. long + (int + int) is not the same thing as long + int + int.
Ok. Then i think we have a reasonable path forward. @DustinCampbell If you can't get to this, i can take this.
Or @KrisVandermotten If you want to dip your toes in here, we're glad to take PRs!
Or @KrisVandermotten If you want to dip your toes in here, we're glad to take PRs!
:-)
Maybe next time. I trust Dustin and yourself on this one ;-)
I trust Dustin and yourself on this one ;-)
Famous last words?
I said
Changing expressions of the form
a + (b + c)toa + b + canda * (b * c)toa * b * cfor integral types is safe if the types of a, b and c are all equal.
It goes without saying that it's not when they differ, e.g.long + (int + int)is not the same thing aslong + int + int.
That should be if the _converted_ types are all the same. For example, assuming an unchecked context, the parentheses can be removed in
int Add(int a, short b, short c) => a + (b + c);
Any progress?
I've got a fix. Just filling out the tests.