Executing the following code:
```C#
private static void Main(string[] args)
{
    var ar = new double[]
    {
        100
    };
    FillTo1(ref ar, 5);
    Console.WriteLine(string.Join(",", ar.Select(a => a.ToString()).ToArray()));
}

public static void FillTo1(ref double[] dd, int N)
{
    if (dd.Length >= N)
        return;
    double[] Old = dd;
    double d = double.NaN;
    if (Old.Length > 0)
        d = Old[0];
    dd = new double[N];
    for (int i = 0; i < Old.Length; i++)
    {
        dd[N - Old.Length + i] = Old[i];
    }
    for (int i = 0; i < N - Old.Length; i++)
        dd[i] = d;
}
```
The result in Debug mode is 100,100,100,100,100, but in Release mode it is 100,100,100,100,0.
Tested on .NET Framework 4.7.1 and .NET Core 2.0.0.
Also posted to Stack Overflow (https://stackoverflow.com/questions/47591915/why-code-behavior-is-different-in-release-debug-mode).
(Comment aimed at the repo maintainers, not the OP, for categorisation.) This looks to be a recent RyuJIT bug; see the answer on the SO question for context.
The code generated for the first for loop is busted:
```
33C0     xor eax, eax
85ED     test ebp, ebp
7E25     jle SHORT G_M24688_IG07

G_M24688_IG06:
488B17          mov rdx, gword ptr [rdi]            ; rdx = dd
8D0C03          lea ecx, [rbx+rax]                  ; 0 + i ?!
4C63C0          movsxd r8, eax
C4817B1044C610  vmovsd xmm0, qword ptr [r14+8*r8+16] ; xmm0 = old[i]
3B4A08          cmp ecx, dword ptr [rdx+8]          ; dd[n - old.Length + i] range check
7344            jae SHORT G_M24688_IG10
4863C9          movsxd rcx, ecx
C4E17B1144CA10  vmovsd qword ptr [rdx+8*rcx+16], xmm0 ; dd[n - old.Length + i] = xmm0
FFC0            inc eax                             ; i++
3BE8            cmp ebp, eax                        ; old.Length > i
7FDB            jg SHORT G_M24688_IG06

G_M24688_IG07:
33C0     xor eax, eax                               ; i = 0
8BDE     mov ebx, esi                               ; ebx = n
2BDD     sub ebx, ebp                               ; n - old.Length
85DB     test ebx, ebx
7E18     jle SHORT G_M24688_IG09                    ; n - old.Length <= 0

G_M24688_IG08:
488B17          mov rdx, gword ptr [rdi]
3B4208          cmp eax, dword ptr [rdx+8]
7322            jae SHORT G_M24688_IG10
4863C8          movsxd rcx, eax
C4E17B1174CA10  vmovsd qword ptr [rdx+8*rcx+16], xmm6
FFC0            inc eax
3BD8            cmp ebx, eax
7FE8            jg SHORT G_M24688_IG08              ; n - old.Length > i
```
Where `N - Old.Length + i` should be, we instead have `0 + i`. `N - Old.Length` is computed and stored in `ebx` only after that loop. For the repro above (`Old = { 100 }`, `N = 5`) this means the first loop copies `Old[0]` to `dd[0]` instead of `dd[4]`; the fill loop then sets `dd[0..3]` to 100, leaving `dd[4]` at its default 0, which is why Release prints 100,100,100,100,0.
If FillTo1 is modified to this:
```C#
public static void FillTo1(ref double[] dd, ref double[] dx, int N)
{
    if (dx.Length >= N)
        return;
```
and called with `FillTo1(ref ar, ref ar, 5)`, the problem disappears. It reappears when `dx` is changed back to `dd`. The only IL difference in that case is `ldarg.1` vs. `ldarg.0`.
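(For completeness, a minimal sketch of the full two-ref variant used in that comparison, under the assumption that the rest of the body is identical to the original FillTo1:)

```C#
public static void FillTo1(ref double[] dd, ref double[] dx, int N)
{
    if (dx.Length >= N)   // the early-out now reads the second ref parameter
        return;
    double[] Old = dd;
    double d = double.NaN;
    if (Old.Length > 0)
        d = Old[0];
    dd = new double[N];
    for (int i = 0; i < Old.Length; i++)
        dd[N - Old.Length + i] = Old[i];
    for (int i = 0; i < N - Old.Length; i++)
        dd[i] = d;
}

// Called as:
// FillTo1(ref ar, ref ar, 5);
```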
If you want a workaround, the best one is to stop using ref. Not only does that avoid the bug, it also results in slightly better code being generated. Something like:
```C#
public static double[] Test(double[] dd, int n)
{
    if (dd.Length >= n)
        return dd;
    double[] nn = new double[n];
    for (int i = 0; i < dd.Length; i++)
        nn[nn.Length - dd.Length + i] = dd[i];
    double d = dd.Length > 0 ? dd[0] : double.NaN;
    for (int i = 0; i < nn.Length - dd.Length; i++)
        nn[i] = d;
    return nn;
}
```
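The call site then changes from passing by ref to reassigning the return value; adapting the original Main, for example:

```C#
var ar = new double[] { 100 };
ar = Test(ar, 5);   // reassign the result instead of passing by ref
Console.WriteLine(string.Join(",", ar.Select(a => a.ToString()).ToArray()));
```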
@mikedn thanks, I just reported it to help get this bug fixed.
There are multiple workarounds, including adding a Console.WriteLine() or Thread.Sleep(1) call, declaring var len = Old.Length and using len instead of Old.Length everywhere, and not using ref as you mentioned.
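For reference, the length-caching workaround would look something like the sketch below (based on the original FillTo1; presumably it helps because `N - len` then contains no nested `Old.Length` expression for CSE to rewrite):

```C#
public static void FillTo1_Workaround(ref double[] dd, int N)
{
    if (dd.Length >= N)
        return;
    double[] Old = dd;
    int len = Old.Length;                       // cache the length once
    double d = len > 0 ? Old[0] : double.NaN;
    dd = new double[N];
    for (int i = 0; i < len; i++)
        dd[N - len + i] = Old[i];               // use len instead of Old.Length everywhere
    for (int i = 0; i < N - len; i++)
        dd[i] = d;
}
```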
A rather interesting bug.
Loop hoisting creates a copy of N - Old.Length in the pre-header of the first loop to allow CSE to pick it up. This copy is almost dead, its result is not used anywhere at this point. But it can't be removed during morphing because Old.Length has an exception side-effect.
The second loop is turned into a do/while loop and in the process a copy of loop's condition - again N - Old.Length - is also made. This works much like loop hoisting except that this copy is actually used.
When CSE starts there are 4 occurrences of N - Old.Length - one def and one use for every loop. There are also a few other CSE candidates, including Old.Length.
So far so good, nothing unusual.
Now CSE happens to process Old.Length first and replaces all the occurrences in N - Old.Length with a local variable. Every time it does that it remorphs statements, and this seems to be where things go down the drain. The statement created by loop hoisting loses its side effects and gets removed as being useless.
When CSE finally processes N - Old.Length it replaces the use inside the first loop with a local variable. But the definition has been removed so the variable will remain uninitialized.
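To make that concrete, here is a source-level sketch (hypothetical names, not actual JIT output) of what the method effectively computes after the bad CSE; the temp that should hold N - Old.Length has lost its only definition, so it reads whatever was left in the register (zero in the repro):

```C#
// Illustrative sketch only. 'len' and 'hoisted' are hypothetical names for the CSE
// temps; in the real code 'hoisted' is simply never assigned because its def was deleted.
public static void FillTo1_AfterBadCse(ref double[] dd, int N)
{
    if (dd.Length >= N)
        return;
    double[] Old = dd;
    double d = Old.Length > 0 ? Old[0] : double.NaN;
    dd = new double[N];

    int len = Old.Length;   // CSE of Old.Length
    int hoisted = 0;        // was meant to be N - len (the hoisted def of N - Old.Length),
                            // but that statement was removed during CSE

    for (int i = 0; i < len; i++)
        dd[hoisted + i] = Old[i];   // writes dd[0] instead of dd[N - len]

    for (int i = 0; i < N - len; i++)
        dd[i] = d;                  // fills dd[0 .. N-len-1]; in the repro dd[4] stays 0
}
```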
That said, I've no idea what changed recently that could have caused this. There were some changes around side-effect updates, I suppose one possibility is that before side-effects were not updated correctly and that prevented the hoisted expression from being removed. But anyway, it looks like morph should not remove CSE candidate trees while CSE is in progress.
cc @dotnet/jit-contrib
Looks like CSE#03 (N - Old.Length) has two defs and has uses of CSE#02 (Old.Length) nested inside. When CSE#02 is processed we decrement def counts for the first def of CSE#03, but we don't do the same decrement for the second def.
So after processing CSE#02, CSE#03 still looks viable despite its def having been removed, and things go off the rails there.
Some possible fixes:
- In this particular case it seems that we should not remove the def, because doing so defeats loop hoisting.
- Or perhaps postpone morphing until CSE is complete? It's not clear why the same statement should be morphed multiple times (apart from wasting cycles...).
Looks like just not removing statements during CSE is the way to go; it is simple and doesn't have much CQ impact. Also should easily backport to 2.0... Will have a PR up in a sec.
We do the re-morphing after introducing a CSE def so that the assignment flag gets set correctly for the new tree.
> so that the assignment flag gets set correctly for the new tree
GTF_ASG? Can't gtUpdateSideEffects be used instead of morphing?
Maybe but that function is a more recent addition, and we need to update all the way up to the top of the tree. When we have a tree with nested CSE we need the assignment flag to preserve the prior CSE defs.
It was updated to be GTF_ASG precise by PR dotnet/coreclr#13668
Reopening since we should port this to 2.0 servicing.
Went and looked at why this doesn't happen in 1.1.
Note CSE numbering differs, but the upshot is that the array length CSE cost went up and so the sort order changed. In 1.1 we CSE'd the outer expression first.
Haven't drilled into what caused the cost change yet, as it seems like the outer expression should still be more costly.
```
;; 1.1
CSE candidate #1, vn=$c2 cseMask=0000000000000001 in BB04, [cost= 3, size= 3]:
N002 ( 3, 3) CSE #1 (use)[000125] ---X-------- * arrLen int $c2
N001 ( 1, 1) [000114] ------------ \--* lclVar ref V02 loc0 u:3 $1c1
CSE candidate #2, vn=$203 cseMask=0000000000000002 in BB06, [cost= 5, size= 5]:
N002 ( 3, 3) CSE #1 (use)[000052] ---X-------- /--* arrLen int $c2
N001 ( 1, 1) [000051] ------------ | \--* lclVar ref V02 loc0 u:3 $1c1
N004 ( 5, 5) CSE #2 (use)[000054] ---X----R--- * - int $203
N003 ( 1, 1) [000050] ------------ \--* lclVar int V01 arg1 u:2 $c0
CSE #2,cseMask=0000000000000002,useCnt=2: [def=100, use=800] :: N004 ( 5, 5) CSE #2 (def)[000213] ---X---HR--- * - int $203
CSE #1,cseMask=0000000000000001,useCnt=7: [def= 50, use=1400] :: N002 ( 3, 3) CSE #1 (def)[000020] ---X-------- * arrLen int $c2

;; latest (and likely 2.0)
CSE candidate #1, vn=$1c0 cseMask=0000000000000001 in BB03, [cost= 3, size= 2]:
N002 ( 3, 2) CSE #1 (use)[000011] *--XG------- * IND ref <l:$1c0, c:$201>
N001 ( 1, 1) [000010] ------------ \--* LCL_VAR byref V00 arg0 u:2 $80
CSE candidate #2, vn=$c2 cseMask=0000000000000002 in BB03, [cost= 3, size= 3]:
N002 ( 3, 3) CSE #2 (use)[000020] ---X-------- * ARR_LENGTH int <l:$c2, c:$c3>
N001 ( 1, 1) [000019] ------------ \--* LCL_VAR ref V02 loc0 u:3 <l:$1c0, c:$201>
CSE candidate #3, vn=$247 cseMask=0000000000000004 in BB06, [cost= 5, size= 5]:
N002 ( 3, 3) CSE #2 (use)[000052] ---X-------- /--* ARR_LENGTH int <l:$c2, c:$c3>
N001 ( 1, 1) [000051] ------------ | \--* LCL_VAR ref V02 loc0 u:3 <l:$1c0, c:$201>
N004 ( 5, 5) CSE #3 (use)[000054] ---X----R--- * SUB int <l:$247, c:$246>
N003 ( 1, 1) [000050] ------------ \--* LCL_VAR int V01 arg1 u:2 $c0
CSE #2,cseMask=0000000000000002,useCnt=8: [def=100, use=1450] :: N003 ( 5, 4) CSE #2 (def)[000003] ---XG------- * ARR_LENGTH int <l:$c2, c:$c1>
CSE #3,cseMask=0000000000000004,useCnt=2: [def=100, use=800] :: N004 ( 5, 5) CSE #3 (def)[000213] ---X---HR--- * SUB int <l:$247, c:$246>
CSE #1,cseMask=0000000000000001,useCnt=1: [def=100, use= 50] :: N002 ( 3, 2) CSE #1 (def)[000002] *--XG------- * IND ref <l:$1c0, c:$200>
```
Looks like in 2.0 there is an extra CSE def of the array length which we pick up and prefer. This new CSE exists because of value numbering through byrefs. Because it has an indir this def has higher cost than the other equivalent CSEs.
Normally for nested CSEs we'd expect the parent CSE to always be more costly than the child. Because of this the algorithm naturally processes nested CSEs from outer->inner and so avoids the bug we see here.
But when, as in this case, the child CSE's expression shape can differ between occurrences and the parent CSE only wraps the cheap occurrences, it is possible to see a child occurrence that is not less costly than the parent. This allows the child CSE to be processed before the parent and exposes the bug.
There are likely ways to expose this in 1.1: if you can create a child CSE with two variant forms that value number the same, put the most expensive one in the dominant position, and then create a wrapping CSE that only wraps the cheaper occurrences, something similar could happen.
```
;; 1.1
N002 ( 3, 3) CSE #1 (def)[000020] ---X-------- --* arrLen int $c2
N001 ( 1, 1) [000019] ------------ --* lclVar ref V02 loc0 u:3 $1c1

;; 2.0
N003 ( 5, 4) CSE #2 (def)[000003] ---XG------- --* ARR_LENGTH int
N002 ( 3, 2) CSE #1 (def)[000002] *--XG------- --* IND ref
N001 ( 1, 1) [000001] ------------ --* LCL_VAR byref V00 arg0 u:2 $80
```
Closing -- the issue is fixed in master via dotnet/coreclr#15323 and fixed in the 2.0 branch (available in 2.0.6) by dotnet/coreclr#15360.