Runtime: How to debug StackOverflowException

Created on 27 Oct 2017 · 17 Comments · Source: dotnet/runtime

@Daniel15 commented on Wed Oct 25 2017

I'm getting this error while moving a site from ASP.NET Core 1.1 on Mono to ASP.NET Core 2.0 on .NET Core 2.0:

dbug: Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker[2]
      Executed action method Daniel15.Web.Controllers.ShortUrlController.Index (Daniel15.Web), returned result Microsoft.AspNetCore.Mvc.ContentResult.
Process is terminating due to StackOverflowException.
[1]    12976 abort      LD_LIBRARY_PATH=/tmp/ssltest ASPNETCORE_ENVIRONMENT=Development =

How do I get a full stack trace for the StackOverflowException to determine where it's coming from?

area-ExceptionHandling-coreclr question

Petermarcu

👍12

Most helpful comment

@cdmihai Presumably at this point it would be hard to print the stack trace (there is no stack with which to work, after all).
But I want to join in and comment that _anything_ would be good here. Having even a small portion of the stack trace should usually be enough to tell us what is recursing and narrow down investigation times considerably.

ayende on 1 Jun 2018

👍7

All 17 comments

@janvorli your stack overflow work was in 2.0, I think?

danmosemsft on 2 Nov 2017

Any news about this? Any idea how to get at least some idea about what is going on?

ayende on 21 Jan 2018

The only thing that could work is to run the app under lldb and when it hits the stack overflow, load the libsosplugin.so and run "clrstack -f".

janvorli on 22 Jan 2018

👍1

@janvorli Any suggestions for doing this on Windows?
We are trying with procdump right now.
The problem is that this is happening in production, and the kind of things we can do there are limited.

ayende on 22 Jan 2018

SO questions suggest either using WinDbg or reproducing it in VS while debugging. This is a bit hard when the issue is hard to repro and happens in processes spawned by the entry process (or when it's not happening on Windows). Just printing out the stack trace would be so helpful ...

cdmihai on 1 Jun 2018

@cdmihai Presumably at this point it would be hard to print the stack trace (there is no stack with which to work, after all).
But I want to join in and comment that _anything_ would be good here. Having even a small portion of the stack trace should usually be enough to tell us what is recursing and narrow down investigation times considerably.

ayende on 1 Jun 2018

👍7

> The only thing that could work is to run the app under lldb and when it hits the stack overflow, load the libsosplugin.so and run "clrstack -f".

@janvorli How do Microsoft devs debug this kind of bug in prod?
Not every bug can be reproduced easily in a local environment.

patricksuo on 18 Oct 2018

> Having even a small portion of the stack trace should usually be enough to tell us what is recursing and narrow down investigation times considerably.

This is exactly what Go does. (In the stack trace below, I elided some frames manually.)

supei@sandbox-dev-hk:~$ cat a.go
package main

func foo()() {
    foo()
}

func main(){
    foo()
}

supei@sandbox-dev-hk:~$ go run a.go
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow

runtime stack:
runtime.throw(0x46d1a8, 0xe)
    /home/supei/go/src/runtime/panic.go:608 +0x72
runtime.newstack()
    /home/supei/go/src/runtime/stack.go:1008 +0x729
runtime.morestack()
    /home/supei/go/src/runtime/asm_amd64.s:429 +0x8f

goroutine 1 [running]:
main.foo()
    /home/supei/a.go:3 +0x2e fp=0xc020086378 sp=0xc020086370 pc=0x44e9fe
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc020086388 sp=0xc020086378 pc=0x44e9f0
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc020086398 sp=0xc020086388 pc=0x44e9f0
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc0200863a8 sp=0xc020086398 pc=0x44e9f0
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc020086998 sp=0xc020086988 pc=0x44e9f0
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc0200869a8 sp=0xc020086998 pc=0x44e9f0
...additional frames elided...
exit status 2

patricksuo on 18 Oct 2018

In other words, just as CoreCLR allocates an OutOfMemoryException instance up front, could we allocate some space (1KB should be more than enough) and print the stack trace from there?

ayende on 18 Oct 2018

👍2

Go has a dynamic, per-goroutine stack that lives on the heap. The Go runtime grows and shrinks the stack as needed.
In the stack overflow scenario, the runtime stops the goroutine just before it would require an abnormally large stack growth.

I'm not familiar with dotnet. I guess managed code runs on the native thread stack.
Maybe the thread stack guard page mechanism is something that could help.

patricksuo on 18 Oct 2018

> I guess managed code runs on the native thread stack.

That's right. We already run sigsegv handler on an alternate stack to be able to at least print the message and not just silently die. This alternate stack is kept as small as possible since we need to allocate it for each thread. That size would likely not be enough to run the code that's necessary to dump the stack trace. But since we've recently switched to allocating the alternate stack space using mmap, we could actually reserve larger VM space and commit just the size needed by the regular sigsegv handling. On stack overflow, we could commit more of the space so that we have enough to dump the stack trace.
I've created dotnet/runtime#825 assigned to myself to track it.

janvorli on 18 Oct 2018

👍4

I currently have a problem where I cannot even get a stack trace with the Visual Studio debugger... So anything that could help us get a clue would be welcome... :-)

[Edit: We solved this problem in the meantime via "print-debugging" - we used log entries to nail down the exact place where the code crashes, so it's not urgent any more...]

markusschaber on 15 Nov 2018

+1 :|

facundofarias on 5 Dec 2018

Does using WinDbg and SOS still work with .NET Core?

As described here: https://stackoverflow.com/a/49882734/684096

BrunoJuchli on 11 Jan 2019

> That's right. We already run sigsegv handler on an alternate stack to be able to at least print the message and not just silently die. This alternate stack is kept as small as possible since we need to allocate it for each thread. That size would likely not be enough to run the code that's necessary to dump the stack trace. But since we've recently switched to allocating the alternate stack space using mmap, we could actually reserve larger VM space and commit just the size needed by the regular sigsegv handling. On stack overflow, we could commit more of the space so that we have enough to dump the stack trace.

Where is the stack trace dumped to, standard error or standard output? I am debugging in an orchestrated, containerized environment. When the app crashes because of a StackOverflowException, the container goes away and all that is left is stderr and stdout:
2019-02-28T14:33:34.98-0500 [APP/PROC/WEB/0] ERR Process is terminating due to StackOverflowException.
What's the best way to debug a StackOverflowException in this kind of environment?

fwanggg on 28 Feb 2019

👍1

Wait ... you're already outputting "Process is terminating due to StackOverflowException." ... Too bad we can't walk down the frames and output them. This can be done in a constant amount of RAM.

jhudsoncedaron on 15 Apr 2019
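
As a rough illustration of why a handful of frames is enough, and why capturing them needs only a bounded buffer, here is a Go sketch. It is not how the .NET runtime works. The names `topFrames`, `repeated`, `recurse` and the `maxFrames` value of 16 are hypothetical, and a depth limit stands in for the real overflow so that the program can keep running. It records only the innermost frames in a fixed-size array and reports which function dominates them.

```go
package main

import (
	"fmt"
	"runtime"
)

// maxFrames bounds how much of the stack is recorded (illustrative value).
const maxFrames = 16

// topFrames records at most maxFrames of the innermost frames of its
// caller, using a fixed-size array for the return addresses.
func topFrames() []string {
	var pcs [maxFrames]uintptr
	n := runtime.Callers(2, pcs[:]) // skip runtime.Callers and topFrames
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	names := make([]string, 0, n)
	for {
		f, more := frames.Next()
		names = append(names, f.Function)
		if !more {
			break
		}
	}
	return names
}

// repeated returns the function name that occurs most often and its count.
func repeated(names []string) (string, int) {
	counts := make(map[string]int)
	best, bestN := "", 0
	for _, n := range names {
		counts[n]++
		if counts[n] > bestN {
			best, bestN = n, counts[n]
		}
	}
	return best, bestN
}

// recurse stands in for runaway recursion; the limit replaces the real
// overflow so the sketch can return normally.
func recurse(depth, limit int) []string {
	if depth >= limit {
		return topFrames()
	}
	return recurse(depth+1, limit)
}

func main() {
	frames := recurse(0, 1000)
	name, count := repeated(frames)
	fmt.Printf("%s appears in %d of the top %d frames\n", name, count, len(frames))
}
```

With runaway recursion the innermost frames are almost always the recursing functions themselves, so even this truncated view points at the culprit.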

Got this from the console ...

Api> Route matched with {action = "Get", controller = "App"}. Executing controller action with signature Microsoft.AspNetCore.Mvc.IActionResult Get(Microsoft.AspNet.OData.Query.ODataQueryOptions`1[Core.Objects.Entities.CMS.App]) on controller Api.Controllers.AppController (Api).
Api>
Api> Process is terminating due to StackOverflowException.

I put a breakpoint in the action ... it's not getting that far ... so how do I debug stack overflows in DI?