Roslyn: csc optimizer suboptimal

Created on 29 May 2018 · 6 Comments · Source: dotnet/roslyn

Version Used:
2.7.0.62715 (db02128e)

Steps to Reproduce:

Save the source code below with the variant you want to investigate uncommented.
Compile it with csc /o.
Observe the IL with a disassembler such as ildasm.
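If ildasm is not at hand, a rough alternative is to compare the size of the IL that csc emitted for each variant at run time. The sketch below assumes each variant has been copied into its own method; `ParityVariants`, `Ternary`, `IfElse`, `Direct` and `IlSizeReport` are hypothetical names. It relies on the reflection call `MethodBase.GetMethodBody().GetILAsByteArray()`, which returns the raw IL bytes, so the sizes it reports will differ between builds with and without `/o`.

```C#
using System;
using System.Reflection;

// Hypothetical container: each variant from the issue copied into its own method.
static class ParityVariants
{
    public static bool Ternary(int val) { bool retval = (val % 2 == 0) ? true : false; return retval; }
    public static bool IfElse(int val) { bool retval; if (val % 2 == 0) retval = true; else retval = false; return retval; }
    public static bool Direct(int val) { return val % 2 == 0; }
}

static class IlSizeReport
{
    // Returns the number of IL bytes csc emitted for the named public static method of ParityVariants.
    public static int IlSize(string methodName)
    {
        MethodInfo method = typeof(ParityVariants).GetMethod(methodName, BindingFlags.Public | BindingFlags.Static)
            ?? throw new ArgumentException($"No method named {methodName}");
        byte[] il = method.GetMethodBody()?.GetILAsByteArray() ?? Array.Empty<byte>();
        return il.Length;
    }

    // Prints the IL size of every variant so they can be compared side by side.
    public static void Print()
    {
        foreach (string name in new[] { "Ternary", "IfElse", "Direct" })
        {
            Console.WriteLine($"{name}: {IlSize(name)} bytes of IL");
        }
    }
}
```

Byte counts only hint at differences; ildasm (or sharplab.io) is still needed to see the actual instruction sequences.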

Expected Behavior:
All cases result in the same optimal IL.

Actual Behavior:
Many cases result in total crap IL.

Use the following code to experiment:
```C#
static class C
{
    // Compiling with optimization turned ON, the following lines should
    // be able to generate the same code.
    static bool fn(this int val)
    {
        // bool retval = (val % 2 == 0) ? true : false; return retval;
        bool retval = (val % 2 != 0) ? false : true; return retval;
        // bool retval; retval = (val % 2 == 0) ? true : false; return retval;
        // bool retval; if (val % 2 == 0) retval = true; else retval = false; return retval;
        // if (val % 2 == 0) { return true; } else return false;

        // the following three are thankfully generating the same code with /o
        // bool retval; retval = (val % 2 == 0); return retval;
        // bool retval = (val % 2 == 0); return retval;
        // return val % 2 == 0;
    }
}
```
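Before comparing IL, it helps to confirm that the variants really are interchangeable. A minimal sketch, assuming each variant is moved into its own method (the names `Variants`, `V1` to `V5`, `Direct` and `AllAgree` are hypothetical; the issue toggles variants by commenting lines in and out of `fn`). It checks agreement on a range of inputs including negatives, where C#'s `%` returns a negative remainder for odd values.

```C#
using System;

// Hypothetical names: one method per variant of the issue's `fn`.
static class Variants
{
    public static bool V1(int val) { bool retval = (val % 2 == 0) ? true : false; return retval; }
    public static bool V2(int val) { bool retval = (val % 2 != 0) ? false : true; return retval; }
    public static bool V3(int val) { bool retval; retval = (val % 2 == 0) ? true : false; return retval; }
    public static bool V4(int val) { bool retval; if (val % 2 == 0) retval = true; else retval = false; return retval; }
    public static bool V5(int val) { if (val % 2 == 0) { return true; } else return false; }
    public static bool Direct(int val) => val % 2 == 0;

    // Returns true when every variant agrees with Direct for all inputs in [from, to].
    public static bool AllAgree(int from, int to)
    {
        Func<int, bool>[] variants = { V1, V2, V3, V4, V5 };
        for (long i = from; i <= to; i++)   // long counter avoids overflow at int.MaxValue
        {
            int val = (int)i;
            bool expected = Direct(val);
            foreach (Func<int, bool> f in variants)
            {
                if (f(val) != expected) return false;
            }
        }
        return true;
    }
}
```

Agreement on sampled inputs is evidence rather than proof, but for these pure functions of `val % 2` the extremes and a window around zero cover every case that behaves differently.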

Labels: Area-Compilers, Feature Request, Resolution-By Design

Opened by tamlin-mike

All 6 comments

The flag /o causes the compiler to not generate debug information, but the compiler (by design) performs only the simplest of local "optimizations". Generally, we put optimizations in our runtime compilers.
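One place where the effect of `/o` is observable without a disassembler is the assembly-level `System.Diagnostics.DebuggableAttribute`, whose flags tell the JIT whether it may optimize. As far as we know, csc marks non-optimized builds with the `DisableOptimizations` mode; the exact flags for each switch combination are an assumption here, not something this thread states. A sketch that reads the attribute by reflection; `OptimizationCheck` and `JitMayOptimize` are hypothetical names, and an attached debugger or runtime settings can still change what the JIT actually does.

```C#
using System;
using System.Diagnostics;
using System.Reflection;

static class OptimizationCheck
{
    // Returns true when the assembly's DebuggableAttribute (if any) does not disable the JIT optimizer.
    // Assumption: an absent attribute means the JIT optimizes by default.
    public static bool JitMayOptimize(Assembly assembly)
    {
        DebuggableAttribute attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr == null || !attr.IsJITOptimizerDisabled;
    }

    // Prints the result for the assembly containing this class.
    public static void Report()
    {
        Assembly self = typeof(OptimizationCheck).Assembly;
        Console.WriteLine($"{self.GetName().Name}: JIT optimizer enabled = {JitMayOptimize(self)}");
    }
}
```

Building the same project with and without `/o` and calling `Report()` should show the flag flip, while the IL differences the issue complains about remain small by comparison.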

gafter on 30 May 2018


@tamlin-mike, it may also be worth noting that, while the IL could "be improved", the only thing that actually matters is the output of the JIT.

In some cases, the "more complex" IL has special recognition by the JIT or is done to meet language specification requirements.
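Since the JIT output is what counts, one way to look at each variant's machine code in isolation is to keep the JIT from inlining it into its caller. A sketch using `[MethodImpl(MethodImplOptions.NoInlining)]`; the class and method names are hypothetical. The resulting code can then be viewed with a tool such as sharplab.io, or, on .NET 7 and later, with the `DOTNET_JitDisasm` environment variable (treat the exact setting as an assumption and check your runtime's documentation).

```C#
using System;
using System.Runtime.CompilerServices;

static class JitTargets
{
    // NoInlining keeps each variant as its own JIT-compiled method,
    // so its machine code can be inspected separately from the caller.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool IsEvenTernary(int val) => (val % 2 == 0) ? true : false;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool IsEvenDirect(int val) => val % 2 == 0;

    // Calls both methods so they get JIT-compiled (with tiered compilation,
    // repeated calls may also promote them to fully optimized code).
    public static int Exercise(int iterations)
    {
        int evens = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (IsEvenTernary(i)) evens++;
            if (IsEvenDirect(i)) evens++;
        }
        return evens;
    }
}
```

Comparing the two listings this way answers the question the thread keeps circling: whether the IL shape survives into the machine code.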

tannergooding on 30 May 2018


I didn't realize I'd have to back this up with profiling data.

@gafter I thought it was an actual optimizer and therefore expected a certain amount of quality and attention to detail. Now that you have explained it's a non-optimizer _by design_, I see my expectations are diametrically opposed to reality.
Perhaps this fact (_only the simplest of local "optimizations"_) should be added to the help output from csc, to prevent future confusion from people expecting an actual optimizer to kick in?

@tannergooding I find that argument to be like claiming "what machine code c2.dll emits is irrelevant; all that matters is the microcode the CPU generates from it". Had such a claim been made by someone on the c2 compiler optimization team, I'd expect a vacancy in a very short amount of time.

To me, in general, larger = slower (except for cache-line alignment). More complex IL means more work later (= slower to parse and optimize) for the JIT compiler to figure things out, and since the JIT by necessity has far less time to optimize than the offline compiler, it logically follows that it simply can't do as good a job (= slower generated CPU machine code).
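The argument above is testable. A crude sketch using `System.Diagnostics.Stopwatch` that times two of the variants over the same inputs; `CrudeTiming`, `IfElse`, `Direct`, `Count` and `Run` are hypothetical names. It is not a rigorous benchmark: calling through a delegate adds overhead that may swamp any difference, there is only one short warm-up and no statistics, and a harness such as BenchmarkDotNet is the usual tool for real measurements.

```C#
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class CrudeTiming
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static bool IfElse(int val) { bool retval; if (val % 2 == 0) retval = true; else retval = false; return retval; }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static bool Direct(int val) => val % 2 == 0;

    // Runs f over 0..n-1 and counts true results, so the work cannot be discarded as dead code.
    static int Count(Func<int, bool> f, int n)
    {
        int count = 0;
        for (int i = 0; i < n; i++) if (f(i)) count++;
        return count;
    }

    // Times both variants and returns both counts so callers can check they agree.
    public static (int IfElseCount, int DirectCount) Run(int n)
    {
        Count(IfElse, 1000); Count(Direct, 1000); // short warm-up so both are JIT-compiled first

        Stopwatch sw = Stopwatch.StartNew();
        int a = Count(IfElse, n);
        long ifElseMs = sw.ElapsedMilliseconds;

        sw.Restart();
        int b = Count(Direct, n);
        long directMs = sw.ElapsedMilliseconds;

        Console.WriteLine($"if/else: {ifElseMs} ms, direct: {directMs} ms");
        return (a, b);
    }
}
```

A call such as `CrudeTiming.Run(100_000_000)` gives a first impression; only a repeatable difference across many runs would count as the kind of evidence requested later in the thread.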

> In some cases, the "more complex" IL has special recognition by the JIT or is done to meet language specification requirements.

I fail to see how this has any relevance to the provided example. Could you elaborate?

tamlin-mike on 30 May 2018

You may want to read this: https://blogs.msdn.microsoft.com/ericlippert/2009/06/11/what-does-the-optimize-switch-do/

Joe4evr on 30 May 2018

> Perhaps this fact (only the simplest of local "optimizations") should be added to the help output from csc, to prevent future confusion from people expecting an actual optimizer to kick in?

What tangible problem would that solve (beyond the case of people reading a check box label and jumping to conclusions)?

> Had such a claim been made by someone on the c2 compiler optimization team, I'd expect a vacancy in a very short amount of time.

Perhaps. But we'll never know because you created the issue in the wrong repository. This is basically the "c1/c1xx" repository, not the c2 "repository". If you have an actual problem with the performance of the code generated by the JIT for such patterns then you should post in the coreclr repository.

> To me, in general, larger = slower (unless for cache-line alignment); More complex IL would mean more work later (= slower to parse and optimize) for the JIT compiler to try to figure stuff out, and since the JIT by necessity has got way less time to optimize than the off-line compiler, it follows logic it simply can't do as good of a job (= slower generated CPU machine code).

Do you have any evidence for all this? It sounds like you are making all kinds of assumptions.

> I fail to see how this has any relevance to the provided example. Could you elaborate?

It really shouldn't be relevant, but the JIT has its own issues and it can be quite sensitive to the IL shape.

mikedn on 30 May 2018

Example 1, Example 2, Example 3, and Example 5 generate equivalent assembly on the Desktop JIT for x86.

Example 4 generates slightly different assembly, but variable assignment can have side effects, so I question whether 4 is actually semantically equivalent to the others.

Example 1:

sharplab.io

**C#:**

```C#
bool retval = (val % 2 == 0) ? true : false; return retval;
```

**IL:**

```
IL_0000: ldarg.0
IL_0001: ldc.i4.2
IL_0002: rem
IL_0003: brfalse.s IL_0007
IL_0005: ldc.i4.0
IL_0006: ret
IL_0007: ldc.i4.1
IL_0008: ret
```

**x86 Assembly:**

```asm
L0000: and ecx, 0x80000001
L0006: jns L000d
L0008: dec ecx
L0009: or ecx, 0xfffffffe
L000c: inc ecx
L000d: test ecx, ecx
L000f: jz L0014
L0011: xor eax, eax
L0013: ret
L0014: mov eax, 0x1
L0019: ret
```
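The first five x86 instructions in this listing compute C#'s signed remainder `val % 2`, which is -1 rather than 1 for odd negative inputs. A sketch that mirrors that `and`/`jns`/`dec`/`or`/`inc` sequence in C# and can be checked against `%`; it also shows that for the `== 0` test alone a single bit mask is equivalent, a shortcut the listing does not take. `RemainderByTwo`, `AsJitted` and `IsEvenByMask` are hypothetical names.

```C#
using System;

static class RemainderByTwo
{
    private const int SignAndLowBit = unchecked((int)0x80000001);
    private const int AllButLowBit = unchecked((int)0xFFFFFFFE);

    // Mirrors: and ecx, 0x80000001 / jns / dec ecx / or ecx, 0xfffffffe / inc ecx
    public static int AsJitted(int x)
    {
        unchecked
        {
            int r = x & SignAndLowBit;   // keep only the sign bit and the low bit
            if (r < 0)                   // jns skips the fix-up when the sign bit is clear
            {
                r--;                     // dec ecx (wraps int.MinValue to int.MaxValue)
                r |= AllButLowBit;       // or ecx, 0xfffffffe
                r++;                     // inc ecx
            }
            return r;                    // 0, 1 or -1, matching x % 2
        }
    }

    // For the even test only the low bit matters, regardless of sign.
    public static bool IsEvenByMask(int x) => (x & 1) == 0;
}
```

For example, `-3 & 0x80000001` is `0x80000001`; the fix-up turns it into `0x80000000`, then `0xFFFFFFFE` (-2), then -1, which is exactly `-3 % 2` in C#.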

Example 2:

sharplab.io

**C#:**

```C#
bool retval = (val % 2 != 0) ? false : true; return retval;
```

**IL:**

```
IL_0000: ldarg.0
IL_0001: ldc.i4.2
IL_0002: rem
IL_0003: brtrue.s IL_0007
IL_0005: ldc.i4.1
IL_0006: ret
IL_0007: ldc.i4.0
IL_0008: ret
```

**x86 Assembly:**

```asm
L0000: and ecx, 0x80000001
L0006: jns L000d
L0008: dec ecx
L0009: or ecx, 0xfffffffe
L000c: inc ecx
L000d: test ecx, ecx
L000f: jnz L0017
L0011: mov eax, 0x1
L0016: ret
L0017: xor eax, eax
L0019: ret
```

Example 3:

sharplab.io

**C#:**

```C#
bool retval; retval = (val % 2 == 0) ? true : false; return retval;
```

**IL:**

```
IL_0000: ldarg.0
IL_0001: ldc.i4.2
IL_0002: rem
IL_0003: brfalse.s IL_0007
IL_0005: ldc.i4.0
IL_0006: ret
IL_0007: ldc.i4.1
IL_0008: ret
```

**x86 Assembly:**

```asm
L0000: and ecx, 0x80000001
L0006: jns L000d
L0008: dec ecx
L0009: or ecx, 0xfffffffe
L000c: inc ecx
L000d: test ecx, ecx
L000f: jz L0014
L0011: xor eax, eax
L0013: ret
L0014: mov eax, 0x1
L0019: ret
```

Example 4:

sharplab.io

**C#:**

```C#
bool retval; if (val % 2 == 0) retval = true; else retval = false; return retval;
```

**IL:**

```
IL_0000: ldarg.0
IL_0001: ldc.i4.2
IL_0002: rem
IL_0003: brtrue.s IL_0009
IL_0005: ldc.i4.1
IL_0006: stloc.0
IL_0007: br.s IL_000b
IL_0009: ldc.i4.0
IL_000a: stloc.0
IL_000b: ldloc.0
IL_000c: ret
```

**x86 Assembly:**

```asm
L0000: push ebp
L0001: mov ebp, esp
L0003: and ecx, 0x80000001
L0009: jns L0010
L000b: dec ecx
L000c: or ecx, 0xfffffffe
L000f: inc ecx
L0010: test ecx, ecx
L0012: jnz L001b
L0014: mov eax, 0x1
L0019: jmp L001d
L001b: xor eax, eax
L001d: pop ebp
L001e: ret
```

Example 5:

sharplab.io

**C#:**

```C#
if (val % 2 == 0) { return true; } else return false;
```

**IL:**

```
IL_0000: ldarg.0
IL_0001: ldc.i4.2
IL_0002: rem
IL_0003: brtrue.s IL_0007
IL_0005: ldc.i4.1
IL_0006: ret
IL_0007: ldc.i4.0
IL_0008: ret
```

**x86 Assembly:**

```asm
L0000: and ecx, 0x80000001
L0006: jns L000d
L0008: dec ecx
L0009: or ecx, 0xfffffffe
L000c: inc ecx
L000d: test ecx, ecx
L000f: jnz L0017
L0011: mov eax, 0x1
L0016: ret
L0017: xor eax, eax
L0019: ret
```