Runtime: Cast to ushort is dropped in release

Created on 1 Jun 2018 · 14Comments · Source: dotnet/runtime

The following program gives different outputs for release and debug:

public class Program
{
    static short s_2 = -1000;

    public static void Main()
    {
        ulong var1 = (ushort)(1U ^ s_2);
        Console.WriteLine(var1);
    }
}

In release, the result is 4294966297. In debug, the result is 64537.
@mikedn has analyzed the cause here.

This issue repros on .NET framework as well with 64-bit JIT (it does not repro with 32-bit JIT).

area-CodeGen-coreclr bug

Source

jakobbotsch

Most helpful comment

One way to think about narrow types is that when optimizing we'd prefer that they appear only at the boundaries of trees -- widening at leaves (~loads) and narrowing at the root (~stores). As @mikedn says this more or less mirrors IL and machine semantics where narrow types are a storage concept (args, locals, fields), not an operation concept.

When there are casts in the middle of a tree the general idea is to migrate them either up to the root or down to the leaves of the tree (or sometimes both, in complex trees with multiple casts), as this can often lead to simplifications.

But as should be clear, this is something that needs to be done carefully...

AndyAyersMS on 25 Jun 2018

👍3

All 14 comments

cc @dotnet/jit-contrib

jkotas on 1 Jun 2018

I took a closer look at this and this caused by optNarrowTree. It's handling of GT_CAST pays attention only to type sizes and concludes that the tree can be narrowed because short and ushort have the same size:

https://github.com/dotnet/coreclr/blob/422b0e81797d5428812c6ef825e22565e2dc63f3/src/jit/optimizer.cpp#L5830

This is correct with respect to the parent XOR node that can be narrowed from long to int. However, it is not correct to also eliminate the top level cast to ushort.

Ironically, the same optNarrowTree and fgMorphCast combination is also responsible for failing to eliminate casts that are truly redundant in other situations (e.g. dotnet/runtime#10337).

optNarrowTree could probably use a rewrite. As is now, the code is difficult to follow, buggy, misses various optimization opportunities and is likely inefficient (does it really need to traverse the tree twice?).

mikedn on 11 Jun 2018

This one is quite interesting:

// Generated by Fuzzlyn on 2018-06-11 20:55:08
// Seed: 6332377919370969218
// Reduced from 80.2 KiB to 0.6 KiB
// Debug: Outputs 65487
// Release: Outputs -49
struct S0
{
    public byte F0;
    public sbyte F1;
    public sbyte F2;
    public S0(byte f0, sbyte f1, sbyte f2)
    {
        F0 = f0;
        F1 = f1;
        F2 = f2;
    }
}

public class Program
{
    static bool[][] s_9 = new bool[][]{new bool[]{true}};
    public static void Main()
    {
        S0 vr45 = new S0(0, 0, 0);
        M5(vr45);
    }

    static bool M5(S0 arg1)
    {
        arg1 = new S0(0, -50, 0);
        char var0 = (char)(1U ^ arg1.F1);
        System.Console.WriteLine((int)var0);
        return s_9[0][0];
    }
}

In 32-bit and 64-bit .NET Core, the results are as described by the top comments. However in 32-bit .NET Framework (using JIT32), the results are the opposite: In debug, the program prints -49 and in release, the program prints 65487 (the correct result). Should I open a separate issue for this? I am unsure where to report JIT32 bugs.

jakobbotsch on 11 Jun 2018

However in 32-bit .NET Framework (using JIT32), the results are the opposite: In debug, the program prints -49 and in release, the program prints 65487 (the correct result).

Yep, the same optNarrowTree bug exists in JIT32 as well. It happens so that JIT32 manages to do constant folding (correctly) in release mode and hides the bug. Which then brings the question - why does JIT32 (and probably RyuJIT as well) attempts to do narrowing in debug mode...

I am unsure where to report JIT32 bugs.

Developer Community I'd guess - https://developercommunity.visualstudio.com/spaces/61/index.html

mikedn on 12 Jun 2018

👍1

This can cause problems in debug too:

public class Program
{
    static int s_1 = 0xff;
    public static void Main()
    {
        int vr14 = (ushort)(sbyte)s_1;
        System.Console.WriteLine(vr14);
    }
}

This program prints -1 in both debug and release, even though vr14 cannot be negative. The culprit is again optNarrowTree; this tree is morphed in these steps:

               [000014] --CXG-------              *  CAST      int <- ushort <- int
               [000013] --CXG-------              \--*  CAST      int <- byte <- int
               [000005] ----G-------                 |  /--*  FIELD     int    s_1
               [000012] --CXG-------                 \--*  COMMA     int
               [000011] H-CXG-------                    \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000007] ------------ arg0                  +--*  CNS_INT   long   0x7ffe0f4845d0
               [000008] ------------ arg1                  \--*  CNS_INT   int    1

               [000014] --CXG-------              *  CAST      int <- ushort <- int
               [000013] --CXG-------              \--*  CAST      int <- byte <- int
               [000005] ----G+------                 |  /--*  CLS_VAR   int    Hnd=0xf485288 Fseq[s_1]
               [000012] --CXG+------                 \--*  COMMA     int
               [000011] H-CXG+------                    \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000007] -----+------ arg0 in rcx           +--*  CNS_INT   long   0x7ffe0f4845d0
               [000008] -----+------ arg1 in rdx           \--*  CNS_INT   int    1

then the inner cast is narrowed and removed (correct):

               [000014] --CXG-------              *  CAST      int <- ushort <- int
               [000005] ----G+------              |  /--*  CLS_VAR   byte   Hnd=0xf485288 Fseq[s_1]
               [000012] --CXG+------              \--*  COMMA     int
               [000011] H-CXG+------                 \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000007] -----+------ arg0 in rcx        +--*  CNS_INT   long   0x7ffe0f4845d0
               [000008] -----+------ arg1 in rdx        \--*  CNS_INT   int    1

However, the outer-cast has no way to know that it should not narrow this tree further, because of the comma-node (which has type int vs its second oper, which has type byte). So the widening cast is removed and we just end up with:

               [000005] ----G+------                 /--*  CLS_VAR   byte   Hnd=0xf485288 Fseq[s_1]
               [000012] --CXG+------              /--*  COMMA     int
               [000011] H-CXG+------              |  \--*  CALL help long   HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000007] -----+------ arg0 in rcx  |     +--*  CNS_INT   long   0x7ffe0f4845d0
               [000008] -----+------ arg1 in rdx  |     \--*  CNS_INT   int    1
               [000016] -ACXG-------              *  ASG       int
               [000015] D----+-N----              \--*  LCL_VAR   int    V00 loc0

So as far as I can tell, the culprit is this:
https://github.com/dotnet/coreclr/blob/86ca8b5e282ee0b42e27c69cb42c159dfc6e5a4b/src/jit/optimizer.cpp#L5873
which should not be using genActualType. However I am not too sure why the JIT tries so hard to keep nodes wide? Does it ever make sense for a comma-node to have a different type than its right operand?

jakobbotsch on 25 Jun 2018

So as far as I can tell, the culprit is this:

That might be but it's not the only problem with narrow tree, at least because you can reproduce the bug without a COMMA node. It's somewhat difficult to pinpoint the actual problem because it's not really clear what optNarrowTree is trying to do and what a true result really means. My impression is that optNarrowTree's main intention was to deal with long->int narrowing (an useful optimization on 32 bit targets for which this code was originally written) but it strayed, perhaps by accident, into cast folding territory and it's ill designed for that.

I wrote a new one (originally with the intention to address some CQ issues) but it's still pretty much WIP. And it's probably better if a more surgical fix is found for this particular issue. Though I'm a bit skeptical that such a fix exists, one or two I tried resulted in unrelated CQ regressions.

However I am not too sure why the JIT tries so hard to keep nodes wide?

JIT's IR tends to follow IL model (and that one follows typical high level language models) where small integer operations (add, sub, mul etc.) do not exist. Supporting small integer ops would likely add significant complexity and have questionable benefit.

Does it ever make sense for a comma-node to have a different type than its right operand?

Sure, because normally there's no such thing as a small int typed operand. The few nodes that do use small int types (most commonly indirs) actually produce int values by implicit widening.

mikedn on 25 Jun 2018

👍1

But as should be clear, this is something that needs to be done carefully...

AndyAyersMS on 25 Jun 2018

👍3

I see. Thanks to both of you for the explanations, that helps. So to sum up: we would (almost?) never want to actually relabel internal nodes to smaller types?
I assume this is also the reason for the notation on cast-nodes: int <- ushort <- int, etc., to indicate that it is just used to discard bits but the nodes are left the same size?

jakobbotsch on 25 Jun 2018

So to sum up: we would (almost?) never want to actually relabel internal nodes to smaller types?

Yeah, almost. There are some places where the JIT manages to use small types in less usual circumstances. 2 or 3 cases I happen to know about:

XARCH lowering sometimes narrows relops to small int and in doing so may also narrow some binary ops (AFAIR AND/OR/XOR). That's intended to take advantage of the fact that XARCH does have byte/word instructions and it happens to work reasonably well because not much happens after lowering (e.g. no more optimizations).
fgMorphSmpOp sometimes produces TYP_BYTE relop nodes. That's also an optimization intended for XARCH that doesn't have means to produce a 32 bit integer 0/1 value from condition codes (its SETcc instruction is byte only). This particular optimization is a hack that really should be in lowering, you can tell that from the fact that the same code also needs to undo the optimization to prevent storing byte values into lclvars.
AFAIR it may also be possible to see TYP_BOOL relops though I don't remember the exact details.

I assume this is also the reason for the notation on cast-nodes: int <- ushort <- int, etc., to indicate that it is just used to discard bits but the nodes are left the same size?

Yep. These casts are fun, you can classify them as narrowing or widening depending on how you look at them.

mikedn on 25 Jun 2018

👍2

I have surgical fixes for these three cases. I verified no diffs on x64 crossgen frameworks. I will verify diffs on ({x64, x86} x {crossgen, pmi} x {frameworks + tests}) and will create a PR next week.

erozenfeld on 30 Jun 2018

👍2

@jakobbotsch I can repro and have fixes for your first and third examples above. However, I can't repro the second example (that uses struct S0). I get the correct result on core in debug and release x64. Can you double check that you get -49 in release?

erozenfeld on 6 Jul 2018

Can you double check that you get -49 in release?

I think I must have made a mistake in my comment previously. That example only reproduces with 32-bit JIT. I cannot reproduce it with 64-bit JIT.
I have already reported it over at Developer Community -- except that the result is the opposite of .NET core on the full .NET framework.
If you want I can open another issue for it, if it is not the same bug after all.

EDIT: To be clear it reproduces with RyuJIT for 32-bit code. With JIT32 (on full .NET framework), the behavior is flipped: in debug it prints the wrong result, and in release the correct result.

jakobbotsch on 6 Jul 2018

Thank you for clarification. I reproduced that example with x86 RyuJIT and verified that my fixes cover it.

erozenfeld on 7 Jul 2018

👍1

No problem. Once you open the PR I can try to look at the rest of my examples and see if I can find some other corner cases not covered.

jakobbotsch on 7 Jul 2018

Was this page helpful?

0 / 5 - 0 ratings