Runtime: tail. call got stack overflow error !?

Created on 9 Aug 2020  路  18Comments  路  Source: dotnet/runtime

Description

I try to convert while loop code to to tail call recursion function but I got a stack overflow error when use tail. call while jmp has no problem.

Runtime framework version

.Net Core 3.1

Base code

    Sub double_to_fraction()
        Dim Base = System.Math.PI
        Dim N1, N2 As Integer
        Dim Tmp = 1.0

        While Tmp <> Base
            If Tmp < Base Then
                N1 += 1
            Else
                N2 += 1
                N1 = System.Math.Round(N2 * Base)
            End If
            Tmp = N1 / N2
        End While

        Console.WriteLine(N1 & "/" & N2)
    End Sub

Recursion form

    Sub double_to_fraction2()
        Dim Base = System.Math.PI
        Dim N = (T:=1, B:=1, N:=1.0).
                       do(Base,
                           Function(V, S) V.N = S,
                           Function(V, S)
                               If V.N < S Then
                                   V.T += 1
                               Else
                                   V.B += 1
                                   V.T = System.Math.Round(V.B * S)
                               End If
                               V.N = V.T / V.B
                               Return V
                           End Function)

        Console.WriteLine(N.T & "/" & N.B)
    End Sub

Tail call recursion method (do method) [Error : Stack overflow]

//===========================
// la.2.0.1 used f: la ; : la.3.0.1 used.1 la.1.2.3 re use-me
//===========================
//  "using data" isn't appear in code but I use it for shorten code on this post.
using data = ValueTuple`3[int,int,double];

data do(data, double, Func`3[data,double,bool], Func`3[data,double,data])
{
ldarg 2
ldarg 0
ldarg 1
call bool Invoke(data, double)
brfalse 0
ldarg 0
ret
0:
ldarg 3
ldarg 0
ldarg 1
call data Invoke(data, double)
ldarg 1
ldarg 2
ldarg 3
tail.call data  do(data, double, Func`3[data,double,bool], Func`3[data,double,data])
ret
}

Jump op recursion (do method) [Worked]

//===========================
// la.2.0.1 used t: la.3.0.1 used.1 sa jmp-me : la
//===========================
//  "using data" isn't appear in code but I use it for shorten code on this post.
using data = ValueTuple`3[int,int,double];

data do(data, double, Func`3[data,double,bool], Func`3[data,double,data])
{
ldarg 2
ldarg 0
ldarg 1
call Boolean Invoke(data, Double)
brtrue 0
ldarg 3
ldarg 0
ldarg 1
call data Invoke(data, Double)
starg 0
jmp data do(data, Double, Func`3[data,double,bool], Func`3[data,double,data])
0:
ldarg 0
ret
}

Comment

Tail op is supposed to fixed stack overflow from recursion method not cause it so I believe this should be a bug.

Addition info

I write do method via System.Reflection.Emit.DynamicMethod, if nothing wrong in runtime framework, it might be cause from ILGenerator.

Reproduce error project

This is repo of vs solution of tail call stack over flow error, for source code of ILS.dll view at IL Script.

tail call stack overflow.zip

category:correctness
theme:tail-call
skill-level:expert
cost:large

area-CodeGen-coreclr needs more info

Most helpful comment

Regardless of the platform this should also be fixed in .NET 5.0 by #341.

All 18 comments

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

There are various limitations handling tail calls with structs that depend on architecture and ABI. We have been working to relax these over time.

If this was runs on windows x64, I suspect it might have been fixed in 5.0 with #33004. Can you give 5.0 a try?

cc @dotnet/jit-contrib

Regardless of the platform this should also be fixed in .NET 5.0 by #341.

I'll take a look. @RevensofT any chance you can point me at a more complete repro case?

I was able to construct what I think is a faithful repro. This appears to be fixed in 5.0.

C:\bugs\r40581>dotnet run -c Release -f net5.0
245850922/78256779

Logical depth of recursion is the sum of the numerator and denominator above, so around 100 million calls.

Disassembly for the do method on x64 windows is:

; Assembly listing for method Helper:Do(System.ValueTuple`3[Int32,Int32,Double],double,System.Func`3[ValueTuple`3,Double,Boolean],System.Func`3[ValueTuple`3,Double,ValueTuple`3]):System.ValueTuple`3[Int32,Int32,Double]
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 RetBuf       [V00,T01] (  6,  4   )   byref  ->  rdi
;  V01 arg0         [V01,T00] (  5,  8   )   byref  ->  rsi
;  V02 arg1         [V02,T09] (  5,  4   )  double  ->  [rsp+0x90]
;  V03 arg2         [V03,T03] (  4,  3.50)     ref  ->  rbx         class-hnd
;  V04 arg3         [V04,T08] (  2,  1   )     ref  ->  [rsp+0xA0]   class-hnd
;  V05 OutArgs      [V05    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V06 tmp1         [V06    ] (  2,  2   )  struct (16) [rsp+0x48]   do-not-enreg[XSB] addr-exposed "struct address for call/obj"
;* V07 tmp2         [V07    ] (  0,  0   )  double  ->  zero-ref    V13.Item3(offs=0x00) P-INDEP "field V01.Item3 (fldOffset=0x0)"
;* V08 tmp3         [V08    ] (  0,  0   )     int  ->  zero-ref    V13.Item1(offs=0x08) P-INDEP "field V01.Item1 (fldOffset=0x8)"
;* V09 tmp4         [V09    ] (  0,  0   )     int  ->  zero-ref    V13.Item2(offs=0x0c) P-INDEP "field V01.Item2 (fldOffset=0xc)"
;  V10 tmp5         [V10    ] (  2,  2   )  double  ->  [rsp+0x48]   do-not-enreg[X] addr-exposed V06.Item3(offs=0x00) P-DEP "field V06.Item3 (fldOffset=0x0)"
;  V11 tmp6         [V11    ] (  2,  2   )     int  ->  [rsp+0x50]   do-not-enreg[X] addr-exposed V06.Item1(offs=0x08) P-DEP "field V06.Item1 (fldOffset=0x8)"
;  V12 tmp7         [V12    ] (  2,  2   )     int  ->  [rsp+0x54]   do-not-enreg[X] addr-exposed V06.Item2(offs=0x0c) P-DEP "field V06.Item2 (fldOffset=0xc)"
;* V13 tmp8         [V13    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref"
;  V14 tmp9         [V14    ] (  6,  8   )  struct (16) [rsp+0x38]   do-not-enreg[XSB] addr-exposed "by-value struct argument"
;  V15 tmp10        [V15,T04] (  2,  4   )     ref  ->  rax         "argument with side effect"
;  V16 tmp11        [V16,T06] (  2,  2   )     ref  ->  rax         "argument with side effect"
;  V17 tmp12        [V17,T07] (  2,  2   )    long  ->  rdx         "argument with side effect"
;  V18 tmp13        [V18    ] (  2,  2   )  struct (16) [rsp+0x28]   do-not-enreg[XSB] addr-exposed "substitute local for return buffer"
;  V19 ReturnAddress[V19    ] (  1,  0.50)    long  ->  [rsp+0x78]   do-not-enreg[X] addr-exposed "Return address"
;  V20 rat0         [V20,T02] (  3,  6   )     ref  ->  rax         "delegate invoke call"
;  V21 rat1         [V21,T05] (  3,  3   )     ref  ->  rax         "delegate invoke call"
;
; Lcl frame size = 88

G_M54699_IG01:
       57                   push     rdi
       56                   push     rsi
       55                   push     rbp
       53                   push     rbx
       4883EC58             sub      rsp, 88
       C5F877               vzeroupper
       488BF9               mov      rdi, rcx
       488BF2               mov      rsi, rdx
       498BD9               mov      rbx, r9
                                                ;; bbWeight=1    PerfScore 6.00
G_M54699_IG02:
       488BC3               mov      rax, rbx
       C5FA6F06             vmovdqu  xmm0, xmmword ptr [rsi]
       C5FA7F442438         vmovdqu  xmmword ptr [rsp+38H], xmm0
       488B4808             mov      rcx, gword ptr [rax+8]
       488D542438           lea      rdx, bword ptr [rsp+38H]
       C5FB11942490000000   vmovsd   qword ptr [rsp+90H], xmm2
       FF5018               call     qword ptr [rax+24]System.Func`3[ValueTuple`3,Double,Boolean][System.ValueTuple`3[System.Int32,System.Int32,System.Double],System.Double,System.Boolean]:Invoke(System.ValueTuple`3[Int32,Int32,Double],double):bool:this
       85C0                 test     eax, eax
       7419                 je       SHORT G_M54699_IG05
                                                ;; bbWeight=1    PerfScore 10.50
G_M54699_IG03:
       C5FA6F1E             vmovdqu  xmm3, xmmword ptr [rsi]
       C5FA7F1F             vmovdqu  xmmword ptr [rdi], xmm3
       488BC7               mov      rax, rdi
                                                ;; bbWeight=0.50 PerfScore 1.63
G_M54699_IG04:
       4883C458             add      rsp, 88
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret
                                                ;; bbWeight=0.50 PerfScore 1.63
G_M54699_IG05:
       488BAC24A0000000     mov      rbp, gword ptr [rsp+A0H]
       488BC5               mov      rax, rbp
       488D542448           lea      rdx, bword ptr [rsp+48H]
       C5FA6F1E             vmovdqu  xmm3, xmmword ptr [rsi]
       C5FA7F5C2438         vmovdqu  xmmword ptr [rsp+38H], xmm3
       488B4808             mov      rcx, gword ptr [rax+8]
       4C8D442438           lea      r8, bword ptr [rsp+38H]
       C5FB109C2490000000   vmovsd   xmm3, qword ptr [rsp+90H]
       FF5018               call     qword ptr [rax+24]System.Func`3[ValueTuple`3,Double,ValueTuple`3][System.ValueTuple`3[System.Int32,System.Int32,System.Double],System.Double,System.ValueTuple`3[System.Int32,System.Int32,System.Double]]:Invoke(System.ValueTuple`3[Int32,Int32,Double],double):System.ValueTuple`3[Int32,Int32,Double]:this
       C5F9104C2448         vmovupd  xmm1, xmmword ptr [rsp+48H]
       C5F9114C2438         vmovupd  xmmword ptr [rsp+38H], xmm1
       488D4C2438           lea      rcx, bword ptr [rsp+38H]
       C5FB108C2490000000   vmovsd   xmm1, qword ptr [rsp+90H]
       4C8BC3               mov      r8, rbx
       4C8BCD               mov      r9, rbp
       E8D6F0FFFF           call     ILStubClass:IL_STUB_StoreTailCallArgs(System.ValueTuple`3[Int32,Int32,Double],double,System.Object,System.Object)
       488D4C2478           lea      rcx, bword ptr [rsp+78H]
       4C8D442428           lea      r8, bword ptr [rsp+28H]
       48BAD892D3A6FA7F0000 mov      rdx, 0x7FFAA6D392D8
       E895ADFEFF           call     System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
       C5FA6F442428         vmovdqu  xmm0, xmmword ptr [rsp+28H]
       C5FA7F07             vmovdqu  xmmword ptr [rdi], xmm0
       488BC7               mov      rax, rdi
                                                ;; bbWeight=0.50 PerfScore 12.88
G_M54699_IG06:
       4883C458             add      rsp, 88
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret
                                                ;; bbWeight=0.50 PerfScore 1.63

; Total bytes of code 209, prolog size 11, PerfScore 56.45, (MethodHash=759e2a54) for method Helper:Do(System.ValueTuple`3[Int32,Int32,Double],double,System.Func`3[ValueTuple`3,Double,Boolean],System.Func`3[ValueTuple`3,Double,ValueTuple`3]):System.ValueTuple`3[Int32,Int32,Double]
; ============================================================

@jakobbotsch do we expect to see code like this after the call to DispatchTailCalls?

Still chasing down where things fail with 3.1; moving to future as this isn't a 5.0 work item.

@AndyAyersMS The code after DispatchTailCalls is what I added in #39815 - it copies the value from a buffer on stack that holds a return value during a call to DispatchTailCalls to the return buffer of Helper:Do (that can point to the GC heap)

Maybe I'm confused, but if DispatchTailCalls returns control back to Do, then I don't see how Do can be properly tail recursive.

The only reachable return in Do should be the first one; the code after the call to DispatchTailCalls should never be executed. So that last bit of code (including epilog) looks like it should be dead code and not needed.

Maybe I'm confused, but if DispatchTailCalls returns control back to Do, then I don't see how Do can be properly tail recursive.

The only reachable return in Do should be the first one; the code after the call to DispatchTailCalls should never be executed. So that last bit of code (including epilog) looks like it should be dead code and not needed.

It returns as normal if there is a previous tail call dispatcher on the stack that will handle the call (in which case no call will be done now and the written value will be ignored), or if this is the first call (in which case it the written value is the actual return value). So the control flow will be (relatively) normal.

I don't get an overflow on 3.1 either. Note I'm using ILASM to generate the do method, so perhaps the result doesn't exactly match what @RevensofT 's repro does...

Suspect that since dynamic methods aren't eligible for inlining they're also not eligible to make tail calls, even if they're tail calling themselves. Let me try and confirm this...

@AndyAyersMS
This is repo of vs solution of tail call stack over flow error, for source code of ILS.dll view at IL Script.

tail call stack overflow.zip

Thanks @RevensofT -- will report back what I see soon.

@RevensofT I have tried your repro on windows x64 with 3.1 and don't get an overflow.

Were you running into to this on some other OS or ISA?

@AndyAyersMS I run on Windows 10 x64 1909 with .net core 3.1, below is video clip I run this repro and got stack overflow.
Tester.zip 1.77 MB

@AndyAyersMS Do you intend to keep looking for a repro, or can this be closed?

I reran using the zipped repro above. Runs to completion under VS in release with jit optimizations enabled, but I do see a stack overflow in debug

@RevensofT were you running debug versions above? I could not see which config you were launching from your video. If so, this behavior is by design. Debug versions of methods can have larger stack frames.

image

image

@AndyAyersMS thanks for update info, it's run on debug setting, can I enlarge stack frame on debug setting ?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

v0l picture v0l  路  3Comments

iCodeWebApps picture iCodeWebApps  路  3Comments

matty-hall picture matty-hall  路  3Comments

omariom picture omariom  路  3Comments

btecu picture btecu  路  3Comments