Runtime: How to debug StackOverflowException

Created on 27 Oct 2017  ·  17 Comments  ·  Source: dotnet/runtime

@Daniel15 commented on Wed Oct 25 2017

I'm getting this error while moving a site from ASP.NET Core 1.1 on Mono to ASP.NET Core 2.0 on .NET Core 2.0:

dbug: Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker[2]
      Executed action method Daniel15.Web.Controllers.ShortUrlController.Index (Daniel15.Web), returned result Microsoft.AspNetCore.Mvc.ContentResult.
Process is terminating due to StackOverflowException.
[1]    12976 abort      LD_LIBRARY_PATH=/tmp/ssltest ASPNETCORE_ENVIRONMENT=Development =

How do I get a full stack trace for the StackOverflowException to determine where it's coming from?

area-ExceptionHandling-coreclr question

All 17 comments

@janvorli your stack overflow work was in 2.0, I think?

Any news about this? Any idea how to get at least some idea about what is going on?

The only thing that could work is to run the app under lldb and when it hits the stack overflow, load the libsosplugin.so and run "clrstack -f".

@janvorli Any suggestions for doing this on Windows?
We are trying with procdump right now.
The problem is that this is happening in production, and the kind of things we can do there are limited.

SO questions suggest either using windbg or reproing it in VS while debugging. This is a bit hard when the issue is hard to repro and happens in processes spawned by the entry process (or when it's not happening on windows). Just printing out the stack trace would be so helpful ...

@cdmihai Presumably at this point it would be hard to print the stack trace (there is no stack with which to work, after all).
But I want to join in and comment that _anything_ would be good here. Having even a small portion of the stack trace should usually be enough to tell us what is recursing and narrow down investigation times considerably.

> The only thing that could work is to run the app under lldb and when it hits the stack overflow, load the libsosplugin.so and run "clrstack -f".

@janvorli How do Microsoft devs debug this kind of bug in prod?
Not every bug can be reproduced easily in a local environment.

> Having even a small portion of the stack trace should usually be enough to tell us what is recursing and narrow down investigation times considerably.

This is exactly what Go does. (I manually elided some frames in the stack trace below.)

supei@sandbox-dev-hk:~$ cat a.go
package main

func foo()() {
    foo()
}

func main(){
    foo()
}

supei@sandbox-dev-hk:~$ go run a.go
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow

runtime stack:
runtime.throw(0x46d1a8, 0xe)
    /home/supei/go/src/runtime/panic.go:608 +0x72
runtime.newstack()
    /home/supei/go/src/runtime/stack.go:1008 +0x729
runtime.morestack()
    /home/supei/go/src/runtime/asm_amd64.s:429 +0x8f

goroutine 1 [running]:
main.foo()
    /home/supei/a.go:3 +0x2e fp=0xc020086378 sp=0xc020086370 pc=0x44e9fe
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc020086388 sp=0xc020086378 pc=0x44e9f0
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc020086398 sp=0xc020086388 pc=0x44e9f0
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc0200863a8 sp=0xc020086398 pc=0x44e9f0
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc020086998 sp=0xc020086988 pc=0x44e9f0
main.foo()
    /home/supei/a.go:4 +0x20 fp=0xc0200869a8 sp=0xc020086998 pc=0x44e9f0
...additional frames elided...
exit status 2

In other words, just as CoreCLR allocates an OutOfMemoryException instance up front, could we set aside some space (1 KB should be more than enough) and print the stack trace from there?

Go has dynamic (goroutine) stacks that live on the heap. The Go runtime grows and shrinks each stack as needed.
In the stack overflow scenario, the runtime stops the goroutine just before it would require an abnormally large stack growth.
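
To make the growable-stack behaviour described above concrete, here is a minimal sketch. The helper names `depth` and `stackLimit` are illustrative and not from the thread; `debug.SetMaxStack` is the standard library call that controls the limit reported in the "goroutine stack exceeds ... limit" message above. The recursion depth of one million is an arbitrary choice that stays far below the limit.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// depth recurses n levels and returns how many levels it reached.
// Go does not eliminate tail calls, so every level needs its own frame.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

// stackLimit reads the current per-goroutine stack limit in bytes.
// SetMaxStack returns the previous value, so the limit is set and then
// immediately restored.
func stackLimit() int {
	prev := debug.SetMaxStack(1 << 20)
	debug.SetMaxStack(prev)
	return prev
}

func main() {
	fmt.Println("goroutine stack limit in bytes:", stackLimit())

	// One million frames fit because the runtime grows the stack on
	// demand instead of relying on a fixed-size thread stack.
	fmt.Println("reached depth:", depth(1_000_000))
}
```

Unbounded recursion, as in the `a.go` example above, eventually hits the limit and produces the fatal error with the partial trace; that error cannot be recovered from inside the program.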

I'm not familiar with .NET. I guess managed code runs on the native thread stack.
Maybe the thread stack guard page mechanism is something that could help.

> I guess managed code runs on the native thread stack.

That's right. We already run the SIGSEGV handler on an alternate stack so that we can at least print the message instead of silently dying. This alternate stack is kept as small as possible since we need to allocate one for each thread. That size would likely not be enough to run the code that's necessary to dump the stack trace. But since we've recently switched to allocating the alternate stack space using mmap, we could actually reserve a larger VM space and commit just the size needed by the regular SIGSEGV handling. On stack overflow, we could commit more of the space so that we have enough to dump the stack trace.
I've created dotnet/runtime#825 and assigned it to myself to track this.
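
The reserve-then-commit idea above can be sketched as follows. This is not CoreCLR's code. It is a small Unix-only illustration written in Go to match the other example in this thread. It assumes that "reserve" means an anonymous `mmap` with `PROT_NONE` and that "commit" means `mprotect` to read/write. The helper names (`reserve`, `commit`, `release`, `pageSize`) and the page counts are made up for the example. A real alternate stack grows downward, so it would commit from the high end of the region; the sketch commits from the start to keep the slicing simple.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// pageSize returns the system page size; mprotect works on whole pages.
func pageSize() int { return os.Getpagesize() }

// reserve maps size bytes of anonymous memory with no access rights.
// The address range is set aside, but touching it faults until committed.
func reserve(size int) ([]byte, error) {
	return syscall.Mmap(-1, 0, size, syscall.PROT_NONE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
}

// commit makes the first n bytes of a reserved region readable and writable.
func commit(region []byte, n int) error {
	return syscall.Mprotect(region[:n], syscall.PROT_READ|syscall.PROT_WRITE)
}

// release unmaps the whole region.
func release(region []byte) error { return syscall.Munmap(region) }

func main() {
	page := pageSize()

	// Reserve a large range up front (hypothetical size: 64 pages).
	region, err := reserve(64 * page)
	if err != nil {
		fmt.Println("reserve failed:", err)
		return
	}
	defer release(region)

	// Normal case: commit only what regular signal handling needs.
	if err := commit(region, 2*page); err != nil {
		fmt.Println("commit failed:", err)
		return
	}
	region[0] = 1

	// Stack overflow case: commit the rest so a stack walker has room.
	if err := commit(region, len(region)); err != nil {
		fmt.Println("commit failed:", err)
		return
	}
	region[len(region)-1] = 1
	fmt.Println("committed bytes:", len(region))
}
```

The point of the split is that address space is reserved for every thread, while the extra memory is committed only for a thread that actually overflows.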

I currently have a problem where I cannot even get a stack trace with the Visual Studio debugger... So anything that could help us get a clue would be welcome... :-)

[Edit: We solved this problem in the meantime via "print-debugging": we used log entries to nail down the exact place where the code crashes, so it's not urgent any more...]

+1 :|

Does using WinDbg and SOS still work with .NET Core?

As described here: https://stackoverflow.com/a/49882734/684096

> That's right. We already run the SIGSEGV handler on an alternate stack so that we can at least print the message instead of silently dying. This alternate stack is kept as small as possible since we need to allocate one for each thread. That size would likely not be enough to run the code that's necessary to dump the stack trace. But since we've recently switched to allocating the alternate stack space using mmap, we could actually reserve a larger VM space and commit just the size needed by the regular SIGSEGV handling. On stack overflow, we could commit more of the space so that we have enough to dump the stack trace.

Where is the stack trace dumped: standard error or standard output? I am debugging in an orchestrated, containerized environment. When the app crashes because of a StackOverflowException, the container goes away and all that is left is stderr and stdout:
2019-02-28T14:33:34.98-0500 [APP/PROC/WEB/0] ERR Process is terminating due to StackOverflowException.
What's the best way to debug a StackOverflowException in this kind of environment?

Wait ... you're already outputting "Process is terminating due to StackOverflowException" ... Too bad we can't walk down the frames and output them. This can be done in a constant amount of RAM.
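
As an illustration of the "constant amount of RAM" point, here is a minimal Go sketch (Go, not .NET, to match the earlier example in this thread). It captures only the innermost frames of a deep recursion into a fixed-size array. The names `maxFrames`, `captureTop`, `recurse`, `frameNames` and `allRecurse` are hypothetical; `runtime.Callers` and `runtime.CallersFrames` are standard library functions. Only the capture step is bounded here: turning addresses into names may allocate, and a real crash handler would have to deal with that separately.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// maxFrames bounds how many frames are captured, so the buffer has the
// same size no matter how deep the recursion is.
const maxFrames = 8

// captureTop records up to maxFrames return addresses of the calling
// goroutine, innermost first, into a fixed-size array.
//
//go:noinline
func captureTop() ([maxFrames]uintptr, int) {
	var pcs [maxFrames]uintptr
	// skip=2 omits runtime.Callers and captureTop themselves.
	n := runtime.Callers(2, pcs[:])
	return pcs, n
}

// recurse stands in for runaway recursion and captures the stack at the
// deepest level.
func recurse(depth int) ([maxFrames]uintptr, int) {
	if depth == 0 {
		return captureTop()
	}
	return recurse(depth - 1)
}

// frameNames symbolizes the captured addresses into function names.
func frameNames(pcs [maxFrames]uintptr, n int) []string {
	names := make([]string, 0, n)
	frames := runtime.CallersFrames(pcs[:n])
	for {
		f, more := frames.Next()
		names = append(names, f.Function)
		if !more {
			break
		}
	}
	return names
}

// allRecurse reports whether every captured frame belongs to recurse.
func allRecurse(names []string) bool {
	for _, name := range names {
		if !strings.HasSuffix(name, ".recurse") {
			return false
		}
	}
	return len(names) > 0
}

func main() {
	pcs, n := recurse(1000)
	for _, name := range frameNames(pcs, n) {
		fmt.Println(name)
	}
}
```

Even eight frames are enough here to see which function is recursing, which is the point made earlier in the thread about a partial trace.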

Got this from the console ...

Api> Route matched with {action = "Get", controller = "App"}. Executing controller action with signature Microsoft.AspNetCore.Mvc.IActionResult Get(Microsoft.AspNet.OData.Query.ODataQueryOptions`1[Core.Objects.Entities.CMS.App]) on controller Api.Controllers.AppController (Api).
Api>
Api> Process is terminating due to StackOverflowException.

I put a breakpoint in the action, but execution never gets that far. So how do I debug stack overflows in DI?
