Roslyn: Multiple `+ sizeof()` not being compiled down to a single add instruction in conjunction with an argument

Created on 19 Dec 2018 · 34 Comments · Source: dotnet/roslyn

Version Used: should be the latest

Steps to Reproduce:

https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQxASwDYB8ACAGAAhwEYBuAWACgcBmIgJgIGECBvKgzoutAOxmYQMAYwAUfAbwCUbDl3k4A7AV4EA1AShoAXnAD2AM1HAAnvBkatuw6ICuUABZ6EMC5p36jEqRUryAvlT+QA

  1. Write some code
    public int Calc(int n) {
        return n + sizeof(byte) + sizeof(ushort) + sizeof(int);
    }
  2. ildasm it
    // Methods
    .method public hidebysig 
        instance int32 Calc (
            int32 n
        ) cil managed 
    {
        // Method begins at RVA 0x2050
        // Code size 8 (0x8)
        .maxstack 8

        IL_0000: ldarg.1
        IL_0001: ldc.i4.1
        IL_0002: add
        IL_0003: ldc.i4.2
        IL_0004: add
        IL_0005: ldc.i4.4
        IL_0006: add
        IL_0007: ret
    } // end of method C::Calc
  3. See the multiple adds and be sad

Expected Behavior:

    // Methods
    .method public hidebysig 
        instance int32 Calc (
            int32 n
        ) cil managed 
    {
        // Method begins at RVA 0x2050
        // Code size 8 (0x8)
        .maxstack 8

        IL_0000: ldarg.1
        IL_0001: ldc.i4.7
        IL_0002: add
        IL_0007: ret
    } // end of method C::Calc

Actual Behavior:

    // Methods
    .method public hidebysig 
        instance int32 Calc (
            int32 n
        ) cil managed 
    {
        // Method begins at RVA 0x2050
        // Code size 8 (0x8)
        .maxstack 8

        IL_0000: ldarg.1
        IL_0001: ldc.i4.1
        IL_0002: add
        IL_0003: ldc.i4.2
        IL_0004: add
        IL_0005: ldc.i4.4
        IL_0006: add
        IL_0007: ret
    } // end of method C::Calc

Thankfully the JIT ASM is properly adding 0x7:

C.Calc(Int32)
    L0000: mov eax, edx
    L0002: add eax, 0x7
    L0005: ret
Area-Compilers

Most helpful comment

The operator is left-associative, so the correct expression binding here is (((n + sizeof(byte)) + sizeof(ushort)) + sizeof(int)).

Ah, I see; and indeed, if we change it to return sizeof(byte) + sizeof(ushort) + sizeof(int) + n;, the compiler folds the expression to 7 + n.

Maybe it would be worth updating the analyzers/code-fixes to take this into account...

All 34 comments

thankfully the JIT ASM is properly adding 0x7

Sounds like this is fine then. Why would the compiler need to do anything here?

something this simple seems like a no-brainer for the compiler to do

Why does the compiler need to do anything here, given that it won't have any effect on the final code that is executed? Any effort here would be time not spent on things that would have a more positive impact on an application.

That said, if you wanted to contribute a fix here, it's possible that it could be accepted :)

The same happens for:

    public int Calc2(int n) {
        return n + 1 + 2 + 3;   
    }

@SirJosh3917 Constant calculation can be forced via:

    public int Calc(int n) {
        const int size = sizeof(byte) + sizeof(short) + sizeof(int);
        return n + size;
    }

This happens only when the operators have the same precedence. Once you throw in an operator with a higher precedence (say, * or /) or add parentheses to change the order of evaluation, the appropriate parts of the expression are indeed constant-folded.
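For example, the parenthesized variant described above does get folded (a minimal sketch; the method name is made up, and the folding can be confirmed on sharplab.io):

```csharp
public int CalcParenthesized(int n) {
    // The parenthesized sub-expression is a constant expression, so the
    // compiler folds it to 7 and emits a single add against n.
    return n + (sizeof(byte) + sizeof(ushort) + sizeof(int));
}
```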

I'd say this happens because these expressions are evaluated left-to-right. The binder can do constant-folding as long as the left side of the operation is constant.

    int a = 1 + 2 + i + 3; // folds 1 + 2 -> 3
    int b = i + (4 + 5);   // folds 4 + 5 -> 9
    int c = i + 6 * 7;     // folds 6 * 7 -> 42

Why does the compiler need to do anything here given that it won't have any affect on the final code that is executed? Any effort here would be time not spent on things that would have a more positive effect on an application?

@CyrusNajmabadi, because it can impact multiple things downstream in the runtime/JIT. The JIT, due to time constraints/etc has to take a very narrow view of things when determining inlining and certain other optimizations (unlike an AOT compiler, it doesn't have the ability to do all the analysis itself).

If the C# compiler has a case where it can produce better/smaller IL and remove the need for the JIT to do the additional analysis/etc, then it can have a positive impact downstream (the method has a higher chance of being inlined, the JIT doesn't need to produce as many nodes, it doesn't have to fold the values together, itself etc). This also means that, since the JIT doesn't have to do that work itself, it can instead spend that time on optimizations that the C# compiler can't or won't make.

This seems like a fairly trivial case and something the compiler should already be handling (and indeed it does in some cases).

When the parentheses are added, they are flagged as unnecessary by an analyzer. Therefore, just applying an analyzer changes the generated code, _possibly_ degrading JIT constant folding time.

@CyrusNajmabadi, because it can impact multiple things downstream in the runtime/JIT. The JIT, due to time constraints/etc has to take a very narrow view of things when determining inlining and certain other optimizations (unlike an AOT compiler, it doesn't have the ability to do all the analysis itself).

But it already does this. So the JIT has made the determination that constant folding is under its purview (which is basically the position of every real-world JIT i've ever seen). This isn't a hypothetical where we can wax philosophical about whose responsibility this is. It's literally a case where the JITs said "yup, we're going to do this". So why reimplement optimizations they're already making?

possibly degrading jit constant folding time.

Let's actually measure instead of guessing. Can someone give the data on the end impact to the user of folding this versus not? It shouldn't be hard to have a benchmark that shows if there is anything to be gained here. That's how we should be driving any work based on speculation that it might have a perf impact of some sort.

Wondering why 1 + 2 + 3 isn't considered constant in i + 1 + 2 + 3, as 1 + 2 + 3 does become a left hand side term?

Later edit: figured it out. It binds as ((i + 1) + 2) + 3, so 1 + 2 + 3 never forms its own sub-expression.

@CyrusNajmabadi one concrete example would be losing inlining as it can put a method above the quota.

https://github.com/dotnet/coreclr/blob/c0c51f83e56eb37bb5a257c0e82ca6676894d6c1/src/jit/compiler.h#L9353

Removing IL instructions will increase inlineability of a method.

Instead of guessing.

@CyrusNajmabadi, This isn't guessing. What the JIT does and doesn't do is fairly well known and being constantly iterated on. The limitations around inlining and problems from bloated IL are also fairly well known (there are multiple issues over on CoreCLR discussing all of this).

It is not and should not be up to the JIT to perform every optimization. The C# compiler, where possible, should be trying to generate efficient IL while still preserving any semantics it requires. In cases like this, where it can reduce the size of the IL emitted while still preserving the semantics, it should, because that takes the burden off the JIT (we are currently emitting almost 9 bytes of IL in Release code, when we could be emitting 3 bytes instead). The JIT has to do everything at runtime and is very much limited by that fact. It has to worry about memory overhead and the startup time of the actual user code it is trying to emit "just in time", and as most of us know very well, startup time is very important 😄. The C# compiler, on the other hand, does everything at compile time; it is much less limited and has the time/ability to perform additional analysis (just like a native or AOT compiler has). It can and should do that where possible, so that the JIT has less overhead on itself.

Removing IL instructions will increase inlineability of a method.

Why is inlinability dependent on the original size, and not the resultant size?

Wondering why 1 + 2 + 3 isn't considered constant in i + 1 + 2 + 3, as 1 + 2 + 3 does become a left hand side term?

Because the first is: ((i + 1) + 2) + 3. Something would need to see and evaluate that this was always the same as i + (1 + 2 + 3). Note that for things like floats this is definitely not ensured. I'm not 100% sure, though, whether it can be guaranteed for integral types. I'm sure someone can chime in on that if there's a case where it's not safe.
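A sketch of why that regrouping is unsafe for floats, using values at the edge of float precision (the specific numbers are illustrative, not from the thread):

```csharp
float x = 16777216f; // 2^24, where float stops representing every integer exactly
float leftToRight = (x + 1f) + 1f; // each + 1f rounds back down to 16777216f
float regrouped   = x + (1f + 1f); // 16777218f is exactly representable
// leftToRight == 16777216f, regrouped == 16777218f: regrouping changed the result
```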

The JIT has to do everything at runtime

This is not true.

The C# compiler, on the other hand, does everything at compile time, it is much less limited

There is no reason there can't be an IL compiler that does things like this. Then, instead of requiring that every, single, compiler, under, the sun, implement the same optimizations, we just get the optimizations once. If this can all be determined by the jit anyways, then there's no reason this couldn't just be in code that utilizes that same prowess, just with a flag of "you don't have to worry about time because we're not in a time-limited scenario".

What's next, go make F# implement this? And VB? And COBOL? :) And every single other optimization that can be done purely on IL that the runtime already has the smarts for? :) This would be a massive duplication of effort instead of putting the effort in a single place where everyone can benefit from it. IL-based optimizations should not be in the front ends. Any sensible compiler infrastructure would place them in the 'middle to back' ends. Can you imagine LLVM expecting all the front ends to do this sort of thing? Of course not. Constant folding operates sensibly on the IR representation so that literally every LLVM-backed language gets this for free.

@CyrusNajmabadi, This isn't guessing.

Absent any practical data showing the cost, it certainly seems like guessing. If this is something that does have such an impact, then it should be something we can show (and thus measure the efficacy of). However, even if this did have an impact (because inlining depends on pre-optimized IL), then this would still be something i would say should be done by an IL-optimizer since that's the appropriate representation level for this sort of analysis. It's a one time optimization that you want applied to all IL, not just C# IL (unless there is something i'm missing here and this is a C# specific situation).

if this is C# specific, then i retract. However, based on the fact that the runtime does optimize these for many IL constructs, it feels unrelated to C#, and thus is better at a compiler layer below that.

The JIT has to do everything at runtime

This is not true.

Yes it is. The JIT only happens at runtime. There may be other intermediate processes that can run before (like ngen, crossgen, aot, etc) and which can pass additional information to the JIT; but when it comes to the JIT itself, it has to take exactly what it was given and work with it at runtime.

There is no reason there can't be an IL compiler that does things like this.

For some optimizations, yes. I completely agree that having an intermediate and dedicated ilopt.exe would be great. It should take the IL and do analysis on it. It should do all the simple optimizations (like constant folding) and it should try to do more expensive analysis and leave hints/breadcrumbs to the JIT where possible. However, we have no such team to do that today and not all optimizations are possible to do at this level.

  • A valid F# optimization is not necessarily a valid C# optimization (and vice versa). F#, for example, supports method inlining at compile time. C# does not.
  • Sometimes, the best optimizations are made when you have the full context of the original source file

However, we have no such team to do that today and not all optimizations are possible to do at this level.

Sure. But this is one of the ones that can and should be.

A valid F# optimization is not necessarily a valid C# optimization (and vice versa). F#, for example, supports method inlining at compile time. C# does not.

Of course. I mentioned as much above. But that's not the case here. Constant folding is like one of the oldest, most well understood, and easiest optimizations to be done at an IR level. It may literally have been the first example of an optimization in my compiler class 20 years ago.

However, we have no such team to do that today

It seems weird that instead of investing the effort so these things only have to be done once, we externalize the cost by making N teams each do them. This is a common way that costs increase in a way that isn't measured well. Because each team pays a small amount more, no one cares. But the net effect is far more cost than just investing in the systems that let this be done only once. Imagine if we had done this for actual native compilation, and each compiler now needed to do not only the IL optimizations but also the native optimizations.

Other projects out there, like LLVM, figured this out. It's unclear why we don't.

Yes it is. The JIT only happens at runtime

Sorry. You're right. By 'runtime' i meant the '.NET runtime', not the JIT part in particular. We have invested a lot in the past in things like AOT/native compilation for that, so that we don't need the JIT to be so responsible for all of this. But, over and over, it seems like we don't fully invest here, and we just push it on other teams to handle.

There has been some thought about using the mono linker for things like this (announcement).

This optimization is clearly illegal according to the C# language specification, as constant folding only occurs on constant expressions.

This optimization is clearly illegal according to the C# language specification, as constant folding only occurs on constant expressions.

@agocke, could you clarify? Does C# take a view that sub-expressions cannot themselves be folded?

Looking at the spec, as best as I can tell, we don't have a single-expression for return n + sizeof(byte) + sizeof(ushort) + sizeof(int);, but rather a return-statement which has an additive-expression which itself has two sub-expressions (what looks to be a primary-expression and another additive-expression), the latter of which qualifies as a constant-expression.

Given that sizeof is allowed as part of a constant-expression, that we have the sub-expressions that would qualify as a constant-expression, and that parenthesizing part of the expression (that is doing return n + (sizeof(byte) + sizeof(ushort) + sizeof(int));) does cause the folding to happen, I would expect that the compiler should also be able to do the folding on the original code as well (or at least could not find anything in the spec that prevents or allows this).

return-statement which has an additive-expression which itself has two sub-expressions (what looks to be a primary-expression and another additive-expression), the latter of which qualifies as a constant-expression

That is incorrect. https://github.com/dotnet/csharplang/blob/master/spec/expressions.md#operator-precedence-and-associativity

The operator is left-associative, so the correct expression binding here is (((n + sizeof(byte)) + sizeof(ushort)) + sizeof(int)). None of those binary expressions are constant.

The operator is left-associative, so the correct expression binding here is (((n + sizeof(byte)) + sizeof(ushort)) + sizeof(int)).

Ah, I see; and indeed, if we change it to return sizeof(byte) + sizeof(ushort) + sizeof(int) + n;, the compiler folds the expression to 7 + n.

Maybe it would be worth updating the analyzers/code-fixes to take this into account...

Maybe it would be worth updating the analyzers/code-fixes to take this into account...

Could you clarify what you mean here?

This optimization is clearly illegal according to the C# language specification, as constant folding only occurs on constant expressions.

Constant-folding would be an internal implementation detail. C# defines semantics for what a given expression means. As long as the generated IL produces an indistinguishable result, then the compiler should be able to do whatever it wants.

And, for the bits about 'constants' that's only relevant in the sense of being able to assign this sort of thing into a 'const' field/local. i.e. in that case the expression must actually have a constant value. There is nothing that precludes the compiler from taking i + 1 + 2 and emitting that in il as i + 3 as long as it can be proven that the results will be the same in all cases.

I was indicating it might be nice if the analyzers (such as the Remove unnecessary parentheses analyzer) understood when certain changes are impactful to the compiler's view of the world.

However, I agree that just having the compiler understand that + for integer types in an unchecked context is commutative would also be nice.

I was indicating it might be nice if the analyzers (such as the Remove unnecessary parentheses analyzer) understood when certain changes are impactful to the compiler's view of the world.

IMO, they shouldn't be :). I mean, literally any change the analyzers/fixers make could change something that affects some jit. The goal is to be useful. If someone would actually be affected by this change, then i would strongly advise that they just mark their methods as perf critical and not allow anything to touch it.

Note: this is a similar conversation that was happening about IDE features helping people use linq (or not) if that's what they want. But these can def impact perf depending on how critical a hotspot something is.

This optimization would absolutely be observable in runtime behavior but I'm going to leave how as an exercise to the reader :)

This optimization would absolutely be observable in runtime behavior but I'm going to leave how as an exercise to the reader

For checked contexts (where any individual operation could cause an overflow) and for non-integer types (where primitive operations are not always associative), this could possibly be illegal and produce observable side-effects. However, I'm not sure how this would be impactful for simple scenarios like the above, as the JIT will already perform this optimization for integer types in an unchecked context (as there are no side-effects for this scenario and the + operation for integer types is considered both associative and commutative).
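For instance, in a checked context the fold would be observable (a hypothetical illustration, not code from the thread):

```csharp
int n = int.MaxValue;
try {
    // Binds as ((n + 1) + -1); the inner n + 1 overflows under checked.
    _ = checked(n + 1 + -1);
} catch (OverflowException) {
    // Reached. Folding the constants first, checked(n + (1 + -1)) is just
    // n + 0, which never overflows, so the fold would change observable behavior.
}
```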

I am actually curious. Is it observable in an unchecked context?

I don't see any way it can be observed in an unchecked context.

Sorry, for addition on integers. It's obviously observable for any non-associative operation, regardless of context.
