Roslyn: Bug in ParenthesizedExpressionSyntaxExtensions.cs?

Created on 13 Jun 2016 · 28Comments · Source: dotnet/roslyn

Version Used: Microsoft.CodeAnalysis.CSharp.Workspaces.1.2.1

context
The code in Microsoft.CodeAnalysis.CSharp.Workspaces/Extensions/ParenthesizedExpressionSyntaxExtensions.cs is part of the Simplification feature. The function bool CanRemoveParentheses(this ParenthesizedExpressionSyntax node) determines if parentheses can be removed.

As part of this, the function RemovalChangesAssociation is called (same source file). This function will correctly determine that, for example, the parentheses in a + (b * c) can be removed, but not in a * (b + c).

problem
The problem arises with expressions of the form a + (b + c) and a * (b * c). According to this function, the parentheses can be removed. However, that is not correct.

For example, consider the following code:

class C
{
    public static string operator+(C left, C right) => "";

    public static string Problem => "" + (new C() + new C());
}

expected behavior
Removal of the parenthesis in the Problem expression alters the meaning of the expression, so they cannot be removed.

actual behavior
The current implementation will remove them.

Area-IDE Bug Resolution-Fixed

Source

KrisVandermotten

All 28 comments

In my example, removing the parentheses was bad enough to make the program not compile. But there are very few cases where removing them would be allowed (see also https://ericlippert.com/2015/10/27/optimizing-associative-operations/), and in fact, when it is allowed, it's not needed (https://ericlippert.com/2013/06/17/string-concatenation-behind-the-scenes-part-one/).
The lines that check for this should simply be removed.

KrisVandermotten on 25 Sep 2016

I'm not sure I understand the change of milestone to version 3.0. Does this mean you do not intend to fix this obvious bug in VS 2017? You only need to delete line 258 to 265 in https://github.com/dotnet/roslyn/blob/master/src/Workspaces/CSharp/Portable/Extensions/ParenthesizedExpressionSyntaxExtensions.cs
This is causing real bugs in real code fixes. Please correct it.

KrisVandermotten on 29 Jan 2017

@KrisVandermotten This doesn't currently meet our bar for RTM. We can revisit this after that release though. Thanks!

CyrusNajmabadi on 30 Jan 2017

Would it help if I create a pull request? Seems like a lot of work to delete two lines, but I'm willing to do it.

KrisVandermotten on 8 Feb 2017

PRs welcome. But i'd rather us do a full fix. i.e. keep teh optimization we have now. But not apply if there are overloaded operators.

CyrusNajmabadi on 8 Feb 2017

Not sure what you mean by "optimization".

Also, this is not about overloaded operators.

Take this program:

using System;

class Program
{
    static void Main(string[] args)
    {
        int a = 7;
        const int B = 17;
        const int C = 18;

        Console.WriteLine(a + (B + C));
    }
}

In Visual Studio 2015 Update 3, optimizations on, the Main method compiles to:

.method private hidebysig static void Main (string[] args) cil managed 
{
.maxstack 8
.entrypoint
ldc.i4.7
ldc.i4.s 35
add
call void [mscorlib]System.Console::WriteLine(int32)
ret
}

If the parentheses are removed, it compiles to

.method private hidebysig static void Main (string[] args) cil managed 
{
.maxstack 8
.entrypoint
ldc.i4.7
ldc.i4.s 17
add
ldc.i4.s 18
add
call void [mscorlib]System.Console::WriteLine(int32)
ret
}

Removing the parentheses changes the semantics of the code, even to the point that different IL is generated. There are no overloaded operators involved.

In the above example, I'm using integers, and in an unchecked context addition of integers is indeed fully associative. It is not in an checked context, where int.MaxValue + 1 - 7 throws, while int.MaxValue + (1 - 7) does not. And what about floats and doubles? Their additions and multiplications are never fully associative!

Again, addition and multiplication operators in C# are left associative. If I use parentheses to change the semantics of the expression, removing those is a plain and simple bug. Pease, just fix it.

KrisVandermotten on 8 Feb 2017

Removing the parentheses changes the semantics of the code, even to the point that different IL is generated.

Any time we change code we may change IL.

And what about floats and doubles? Their additions are never fully associative!

The optimization i was referring to was the one about strings (which i believe you even linked to).

CyrusNajmabadi on 8 Feb 2017

In the above example, I'm using integers, and in an unchecked context addition of integers is indeed fully associative. It is not in an unchecked context, where int.MaxValue + 1 - 7 throws, while int.MaxValue + (1 - 7) does not.

So, in an unchecked context, we would like to retain this behavior. If it's:

checked
involving non-associative operators
involving overloaded operators

We should leave parens.

CyrusNajmabadi on 8 Feb 2017

The optimization i was referring to was the one about strings (which i believe you even linked to).

Are you saying that the string optimization works because of this code? If it is, then you're absolutely correct that those lines cannot be simply removed.

However, I doubt that. I mean: assume this code is called on every expression of type string, in order to change a + (b + c) to a + b + c. Then it would be called on "a" + (1 + 2), and clearly it isn't. AFAIK, that code is only called by refactorings and code fixes to simplify the syntax they generate, not by the optimizer.

Note in fact that "a" + (1 + 2) is another example of an expression where this code incorrectly removes the parentheses.

Take this code

int i = 1 + 2;
string s = "a" + i;

Now, on i, run "Inline Temporary Variable". That changes the code to

string s = "a" + 1 + 2;

Now, correct me if I'm wrong, but that is a bug, no? It should generate

string s = "a" + (1 + 2);

And in fact it first does, but the parentheses are then incorrectly removed.

Compare that to

int i = 1 + 2;
var j = 17 * i;

Run the same refactoring, and it correctly produces

var j = 17 * (1 + 2);

Fix the bug. Please.

KrisVandermotten on 9 Feb 2017

What i'm saying is:

I'd like us to keep removing the parens in the cases where it is safe to do so. I would like us to keep the parens in the cases where it isn't.

CyrusNajmabadi on 9 Feb 2017

What I am saying is:

There virtually are no cases where it is safe to remove them.
In the one case where it is always safe to remove them, string concatenation, there is no need to remove them.
Determining that it actually is safe to remove them would be very complex, and it would require the SemanticModel, which is not available to that code.

KrisVandermotten on 9 Feb 2017

Determining that it actually is safe to remove them would be very complex, and it would require the SemanticModel, which is not available to that code.

We can certainly thread through a semantic model.

There virtually are no cases where it is safe to remove them.

Sure there are. You've already identified at least two of them. For example, string literals and also unchecked integral operations.

CyrusNajmabadi on 9 Feb 2017

Sure there are. You've already identified several of them. For example, string literals and also unchecked integral operations.

Unchecked integral operations or not always safe to remove the parentheses. For example, it may kill the constant folding optimization, as demonstrated above.

So yes, there are some cases where it is safe. The longer we discuss this, the shorter the list becomes. But how often do you run into these cases? And when you do, how bad is it that a pair of parentheses remains? On the other hand, how bad is it when they are incorrectly removed?

As you can see; I am rather passionate about this. Please understand that I am because I have been bitten badly by the bug.

Do you agree that there is a bug in the code? Do you agree that the "Inline Temporary Variable" refactoring is not correct, in the sense that in some cases it changes the semantics of code where it shouldn't?

So now you have two options.

One option is to delete those five lines. That makes this refactoring, and all other refactorings and code fixes correct, even though sometimes it generates parens that aren't needed. In a future release, that cosmetic issue can be fixed.

The other option is to not fix the bug, and have many refactorings and code fixes produce incorrect code, until some future release where the bug may be fixed.

What is the lesser evil?

KrisVandermotten on 9 Feb 2017

So now you have two options.

I see three options. The two you outlined, and the additional one: Remove the parens when it is safe to do so. Leave them when it isn't.

Do you agree that there is a bug in the code?

Of course. Ths issue is already marked as 'bug'. We've already accepted that this is a bug and should be fixed. I'm just stating the preferred fix i would like to see.

CyrusNajmabadi on 9 Feb 2017

Unchecked integral operations or not always safe to remove the parentheses. For example, it may kill the constant folding optimization, as demonstrated above.

Can you give an example of where you see this. I'm not sure what you're referring to when you say "as demonstrated above".

Thanks!

CyrusNajmabadi on 9 Feb 2017

I see three options. The two you outlined, and the additional one: Remove the parens when it is safe to do so. Leave them when it isn't.

I should clarify: it sounds to me that for 2.0, you have two options. That third one is for 3.0. in your own words: "This doesn't currently meet our bar for RTM. We can revisit this after that release though."

KrisVandermotten on 9 Feb 2017

"This doesn't currently meet our bar for RTM. We can revisit this after that release though."

Yes. This did not meet the bar for RTM. We can now revisit this and do something about this. PRs welcome BTW :)

CyrusNajmabadi on 9 Feb 2017

Can you give an example of where you see this. I'm not sure what you're referring to when you say "as demonstrated above".

In the example in this comment above, removing the parentheses kills the constant folding optimization.

KrisVandermotten on 9 Feb 2017

I'll have some time to take a look at this tomorrow. I'm itching to fix a bug anyways. 😃

DustinCampbell on 9 Feb 2017

❤1

removing the parentheses kills the constant folding optimization.

It changes the IL. But we don't guarantee IL will not be affected by our changes. I'm concerned more with a semantic change at runtime. In the case you referenced, there is no observable change.

Do you have examples of where observable changes happen in unchecked contexts?

CyrusNajmabadi on 9 Feb 2017

I want to be clear however, we will continue removing parentheses where it safe to do so. The way this system works is that we "over" parenthesize expressions in order to make them "inlineable". Then, after inlining, the simplification engine removes any parentheses that were added by the engine if it is possible to do so. (Note that we'll never remove parentheses that the user had already types.) In a couple of cases above it looks like we can do better. However, it is by design that refactoring is not always perfect. That is why we provide a preview -- so you can see what the refactoring is going to do before it does it.

DustinCampbell on 9 Feb 2017

That works for me @DustinCampbell . Looks like we should either blacklist some cases, or whitelist others. I think i'm leaning toward a whitelist myself. That way just removing this optimization is a case where the whitelist is initially empty. As we find cases that are safe to do, we just add them to the whitelist.

CyrusNajmabadi on 9 Feb 2017

👍1

In the case you referenced, there is no observable change.

Correct. There is only possible performance impact (depending on whether the JIT or .NET Native can optimizae it afterwards).

Do you have examples of where observable changes happen in unchecked contexts?

No.

Changing expressions of the form a + (b + c) to a + b + c and a * (b * c) to a * b * c for integral types is safe if the types of a, b and c are all equal.

It goes without saying that it's not when they differ, e.g. long + (int + int) is not the same thing as long + int + int.

KrisVandermotten on 9 Feb 2017

Ok. Then i think we have a reasonable path forward. @DustinCampbell If you can't get to this, i can take this.

Or @KrisVandermotten If you want to dip your toes in here, we're glad to take PRs!

CyrusNajmabadi on 9 Feb 2017

Or @KrisVandermotten If you want to dip your toes in here, we're glad to take PRs!

:-)

Maybe next time. I trust Dustin and yourself on this one ;-)

KrisVandermotten on 9 Feb 2017

I trust Dustin and yourself on this one ;-)

Famous last words?

DustinCampbell on 9 Feb 2017

😄1

I said

Changing expressions of the form a + (b + c) to a + b + c and a * (b * c) to a * b * c for integral types is safe if the types of a, b and c are all equal.
It goes without saying that it's not when they differ, e.g. long + (int + int) is not the same thing as long + int + int.

That should be if the _converted_ types are all the same. For example, assuming an unchecked context, the parentheses can be removed in

int Add(int a, short b, short c) => a + (b + c);

Any progress?

KrisVandermotten on 18 Feb 2017

I've got a fix. Just filling out the tests.

DustinCampbell on 18 Feb 2017

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Too many open issues :)

marler8997 · 3Comments

Proposal: Bidirectional mapping syntax:

NikChao · 3Comments

"Implement abstract class" reorders members

OndrejPetrzilka · 3Comments

Inline temporary variable makes function return wrong value

nlwolf · 3Comments

Proposal: better editor support for commented code

vbcodec · 3Comments