Mono: [merp] crashing thread sometimes missing native frames in MERP report

Created on 23 Oct 2019 · 3 Comments · Source: mono/mono

We are seeing some crash reports that are missing the native frames (the "unmanaged_frames" array) for the crashed thread.

Steps to Reproduce

We are able to reproduce it using this debug command in MonoDevelop: https://github.com/mono/monodevelop/pull/9049

Current Behavior

We are seeing managed and native frames for all threads except the one labeled "crashed = true", which is missing its native frames.

Expected Behavior

Expecting to see the backtrace for all threads, including (most importantly?) the crashed thread.

On which platforms did you notice this

[x] macOS
[ ] Linux
[ ] Windows

Version Used:

Mono JIT compiler version 6.6.0.126 (2019-08/8969f2cc99b Mon Oct 14 18:19:47 EDT 2019)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
    TLS:
    SIGSEGV:       altstack
    Notification:  kqueue
    Architecture:  amd64
    Disabled:      none
    Misc:          softdebug
    Interpreter:   yes
    LLVM:          yes(610)
    Suspend:       hybrid
    GC:            sgen (concurrent by default)

Most helpful comment

Recording the underlying issue:

When Mono gets a SIGSEGV or SIGBUS, the signal handler runs on an alternate stack (altstack), set up with sigaltstack and the SA_ONSTACK flag for sigaction. (We do this to detect null pointer dereferences and stack overflows and turn them into managed NullReferenceException and StackOverflowException, respectively.) The signal handler is supposed to examine the signal context and, if the fault happened in a managed method, raise an NRE or SOE. To do that it uses altstack_handle_and_restore.

The issue was that the crash reporter code expects to run backtrace, but we were running it from the main SIGSEGV handler, still on the altstack, without first restoring to the original stack. backtrace doesn't know how to get from the altstack back to the main stack, so it found nothing it could unwind and we got an empty native stack trace.

The solution was to run the crash reporting code from altstack_handle_and_restore, which runs back on the main stack.
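
To make the altstack behaviour concrete, here is a minimal sketch in C. It is not Mono's code. The names install_altstack_handler, handler_ran_on_altstack and on_signal are hypothetical, and it assumes a POSIX system. It delivers a harmless signal with raise instead of faulting, and checks that the handler's locals live inside the alternate stack, which is where a stack walker started from the handler would begin.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* request sigaltstack/SA_ONSTACK under strict -std= modes */
#endif

#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed size chosen for the sketch; SIGSTKSZ is not a constant on every libc. */
#define ALT_STACK_SIZE (64 * 1024)

static void *alt_stack_base;                       /* lowest address of the altstack */
static volatile sig_atomic_t handler_on_altstack;  /* result recorded by the handler */

/* Hypothetical handler: only checks where its own stack frame lives. */
static void on_signal(int signo)
{
    char marker;  /* a local, so its address is on the stack the handler runs on */
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack_base;

    (void)signo;
    handler_on_altstack = (here >= lo && here < lo + ALT_STACK_SIZE);
}

/* Register an alternate stack and a handler that must run on it.
 * Returns 0 on success, -1 on failure. */
int install_altstack_handler(int signo)
{
    stack_t ss;
    struct sigaction sa;

    alt_stack_base = malloc(ALT_STACK_SIZE);
    if (alt_stack_base == NULL)
        return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack_base;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_ONSTACK;  /* without this flag the handler uses the normal stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Deliver the signal to ourselves and report whether the handler
 * observed its frame inside the alternate stack. */
int handler_ran_on_altstack(int signo)
{
    handler_on_altstack = 0;
    raise(signo);  /* in a single-threaded process the handler runs before raise returns */
    return handler_on_altstack;
}
```

Calling install_altstack_handler(SIGUSR1) followed by handler_ran_on_altstack(SIGUSR1) should report that the handler's frame is on the altstack. A stack walker that only follows frames within the thread's normal stack bounds has nothing to walk from there, which matches the empty native trace described above.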

All 3 comments

@lambdageek I filed this for bookkeeping. The fix is in https://github.com/mono/mono/pull/17466 and we can close this when the backport to 2019-08 completes. Thanks!

Seems like the fix works:
Crashing thread before: https://gist.github.com/kdubau/50a2b627799d31f4e42bc747315b8fcf#file-gistfile1-txt-L503-L706 (missing "unmanaged_frames" array).
And after: https://gist.github.com/kdubau/387c9a6fe9ebf8b36dc0602b41971645#file-gistfile1-txt-L640-L1083 (has the frames, L843)
