Sdk: Expose debug-id via API, or include in stacktrace

Created on 1 Sep 2020 · 8 Comments · Source: dart-lang/sdk

Today, the way to demangle stack traces when the DWARF information is split from the library is to pass the stack trace string, such as:

$ cat stacktrace.txt

Warning: This VM has been configured to produce stack traces that violate the Dart standard.
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 29278, tid: 29340, name 1.ui
isolate_dso_base: 6fe9d64000, vm_dso_base: 6fe9d64000
isolate_instructions: 6fe9d74000, vm_instructions: 6fe9d66000
    #00 abs 0000006fe9f4e87b virt 00000000001ea87b _kDartIsolateSnapshotInstructions+0x1da87b
    #01 abs 0000006fe9f4e4a3 virt 00000000001ea4a3 _kDartIsolateSnapshotInstructions+0x1da4a3
    #02 abs 0000006fe9d83ca3 virt 000000000001fca3 _kDartIsolateSnapshotInstructions+0xfca3
    #03 abs 0000006fe9f06513 virt 00000000001a2513 _kDartIsolateSnapshotInstructions+0x192513
    #04 abs 0000006fe9f0b457 virt 00000000001a7457 _kDartIsolateSnapshotInstructions+0x197457
    #05 abs 0000006fe9f5150f virt 00000000001ed50f _kDartIsolateSnapshotInstructions+0x1dd50f
    #06 abs 0000006fe9d83d07 virt 000000000001fd07 _kDartIsolateSnapshotInstructions+0xfd07
    #07 abs 0000006fe9f06513 virt 00000000001a2513 _kDartIsolateSnapshotInstructions+0x192513
    #08 abs 0000006fe9f0b457 virt 00000000001a7457 _kDartIsolateSnapshotInstructions+0x197457
    #09 abs 0000006fe9f5150f virt 00000000001ed50f _kDartIsolateSnapshotInstructions+0x1dd50f
    #10 abs 0000006fe9d82fc7 virt 000000000001efc7 _kDartIsolateSnapshotInstructions+0xefc7
    #11 abs 0000006fe9d82eab virt 000000000001eeab _kDartIsolateSnapshotInstructions+0xeeab

To the native_stack_trace tool, for example:

 decode translate -d app.android-arm64.symbols -v -i stacktrace.txt

Even though this is very straightforward, it has some limitations:

  1. It assumes the developer knows exactly which file contains the correct DWARF information.
  2. The tool is built to parse the whole stack trace (including the header with the isolate/VM addresses), so it assumes you log the whole string.

To address the first point, I'd like to ask (or propose) that the stack trace include the relevant debug_id.
A debug id was added to generated ELF files and their debug files (when split) through this change. It would be helpful if that debug id was included in the stack trace as well.

native_stack_trace could even check against that debug_id to make sure it in fact received the correct file as a parameter and warn the user otherwise.

If that's not an option (i.e., you don't want to change the exception string format), please provide an API that we can query in Dart to get the debug id. That would be used by crash reporting tools (like https://sentry.io) to report which debug_id to use when symbolicating the stack trace on the server.

Point number 2 could be addressed by having a more fine-grained API in the native_stack_trace package that returns a set of objects describing the stack trace (with frames and addresses). Or, ideally, a way to _convert_ the dynamic stack trace at runtime to such a representation, in order to avoid having to parse the string on the client to report the frames to the server.
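To illustrate the kind of structured representation meant here, below is a minimal Python sketch that parses a non-symbolic stack trace like the one above into objects. It is not the native_stack_trace API; the names `Frame`, `ParsedTrace`, and `parse_trace` are hypothetical, and the regular expressions only assume the line layout shown in the sample trace.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Frame:
    index: int      # frame number, e.g. 0 for "#00"
    absolute: int   # "abs" address as reported at runtime
    virtual: int    # "virt" address relative to the DSO base
    symbol: str     # snapshot symbol the frame falls into
    offset: int     # offset from that symbol


@dataclass
class ParsedTrace:
    isolate_dso_base: Optional[int]
    isolate_instructions: Optional[int]
    frames: List[Frame]


# Both patterns are assumptions based only on the sample trace in this issue.
_HEADER_RE = re.compile(r"(\w+): ([0-9a-f]+)")
_FRAME_RE = re.compile(
    r"#(\d+) abs ([0-9a-f]+) virt ([0-9a-f]+) (\w+)\+0x([0-9a-f]+)"
)


def parse_trace(text: str) -> ParsedTrace:
    header: Dict[str, int] = {}
    frames: List[Frame] = []
    for line in text.splitlines():
        match = _FRAME_RE.search(line)
        if match:
            frames.append(Frame(
                index=int(match[1]),
                absolute=int(match[2], 16),
                virtual=int(match[3], 16),
                symbol=match[4],
                offset=int(match[5], 16),
            ))
        elif "_base:" in line or "_instructions:" in line:
            # Header lines carry comma-separated "key: hexvalue" pairs.
            for key, value in _HEADER_RE.findall(line):
                header[key] = int(value, 16)
    return ParsedTrace(
        isolate_dso_base=header.get("isolate_dso_base"),
        isolate_instructions=header.get("isolate_instructions"),
        frames=frames,
    )
```

A client could then report `frames` directly instead of shipping the whole string, which is the parsing step this request would like to avoid doing by hand.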

This is a blocker for this Flutter issue.

area-vm

All 8 comments

This looks like a reasonable request to me. I see that @sstrickl has already commented on the original Flutter issue - the main complication seems to be accessing build-id (LC_UUID) in iOS builds.

Tentatively assigning to @sstrickl

Just a note, I have started on this, but it's requiring a bit of rework on how we do an end-run around the embedder API for extra information that only exists for precompiled snapshots, and I still need to check into how to handle the build ID generation for assembly snapshots. Will update here if I run into any blockers (none expected at the moment though).

CL 163207 is now under review. That CL adds build IDs to non-symbolic stack traces for ELF-compiled snapshots.

Having build IDs be consistent between assembly snapshots and their separate ELF debugging information (by generating them ourselves for assembly output) remains to be done.

The CL mentioned above has just landed (and I believe I've tested with enough trybots that I don't expect a revert, but of course we'll see if anything surprising shows up).

CL 163585 is a followup that does the same for assembly, including generating our own build ID section in assembly for ELF-native platforms. I plan to put it up for review now. It does leave the question open for what to do about snapshots that are split into multiple loading units (right now no build IDs are generated for such), but I've created #43516 for thinking about that as I don't think that's yet a primary workflow.

Hi. We'd like to know if there's something we can help with. This is a blocker for https://github.com/flutter/flutter/issues/59321, and we'd like to try our symbolication on the server as well. Let us know if we could be of any help, thanks.

So just to clarify the current state, ELF snapshots should already include a build ID that's also reported in non-symbolic stack traces. Assembly ones do not, for two major reasons:

  • When assembling to ELF, clang and gcc both include their own GNU build ID unless specifically told not to.
  • When assembling to Mach-O, I don't know if there's a different standard for build IDs there.

The CL I mentioned above is still valid (though needs updating due to changes since), so I'll return to it now and see about getting it in. I think I'll actually split it into two parts:

  • An initial CL that adds the information to assembly snapshots to print a build ID in non-symbolic stack traces which matches a build ID section in the separately generated debugging information (which is always in ELF format currently and thus we can add the GNU build ID note section to it).
  • A followup CL that, for ELF-native targets (e.g., Linux), adds directives in the assembly to generate a GNU build ID note section with the same information.

With only the first, then you can correlate non-symbolic stack traces to the separate debugging information for both ELF and assembly, but you can only correlate them with unstripped snapshots for direct-to-ELF snapshots. That might be enough for all needs and the second CL may not be necessary.

If we do the second as well, then there'll still be some work that I'll need to coordinate with Flutter and internal customers, to change the assembly process for snapshots in their toolchains to elide the assembler-added build ID. Until that happens, there'll be two build ID sections in each assembled ELF snapshot, only one of which will match the build ID reported in non-symbolic stack traces, and I'm not sure how well external tools will handle multiple build ID sections.

> When assembling to Mach-O, I don't know if there's a different standard for build IDs there.

Compilers should place it in the LC_UUID load command. As the name suggests, it has to be a UUID, which differs from ELF, where either an MD5 or a SHA1 of the code (.text section) is used. I've seen compilers produce reproducible UUIDs, which suggests they are digesting the code as well.

When dsymutil generates the dSYM structure, it also ensures that the LC_UUID header is copied into the dSYM file, along with all moved debug sections.

> but you can only correlate them with unstripped snapshots for direct-to-ELF snapshots

Can you elaborate on this? As far as I'm aware, most tools in this space are aware of the build ID program header (NT_GNU_BUILD_ID) and section (.note.gnu.build-id). Both strip and objcopy ensure that the sections are never stripped but always copied over.
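For reference, here is a minimal Python sketch of how a tool might read the build ID out of the raw contents of a `.note.gnu.build-id` section (or a PT_NOTE segment). It relies on the standard ELF note layout (three 32-bit words for name size, descriptor size, and type, followed by the name and descriptor, each padded to 4 bytes) and on `NT_GNU_BUILD_ID` being note type 3 with the owner name `GNU`. Locating the note bytes inside the ELF file is out of scope, and the function name `find_gnu_build_id` is hypothetical.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type used for GNU build IDs


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (note fields are 4-byte padded)."""
    return (n + 3) & ~3


def find_gnu_build_id(note_data: bytes, little_endian: bool = True) -> Optional[str]:
    """Return the build ID as a hex string, or None if no such note exists."""
    header_format = "<III" if little_endian else ">III"
    pos = 0
    while pos + 12 <= len(note_data):
        namesz, descsz, note_type = struct.unpack_from(header_format, note_data, pos)
        pos += 12
        name = note_data[pos:pos + namesz]
        pos += _align4(namesz)
        desc = note_data[pos:pos + descsz]
        pos += _align4(descsz)
        if len(desc) != descsz:
            return None  # truncated note data
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None
```

The returned hex string is what a symbolication tool would compare against the ID reported alongside the stack trace.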

I still have to look into how your assembly process works, so please forgive me if I'm making wrong assumptions. Ideally, your assembly process does not care about build ids and lets standard tooling like the linker take care of that. If you have custom tooling for this, then the only thing to do is ensure the sections do not get removed or get copied (depending on whether you strip or split).

Since the libraries are loaded into readable process memory at runtime, you need no further modifications to the binary. Usually, debuggers and crash reporting tools obtain a list of all loaded libraries that includes:

  • the path and name of the library
  • the memory address at which the library is loaded
  • the platform-dependent identifier
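As a rough illustration of the first two items, on Linux-like systems such a list can be derived from the text of `/proc/self/maps`. The Python sketch below is not the Sentry implementation; `modules_from_maps` is a hypothetical name, the sample paths are made up, and treating the lowest mapped address of a file as its load address is a simplifying assumption. Reading the platform-dependent identifier (the build ID) is not covered here.

```python
from typing import Dict, Tuple


def modules_from_maps(maps_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each file-backed path to its (lowest start, highest end) address."""
    modules: Dict[str, Tuple[int, int]] = {}
    for line in maps_text.splitlines():
        # Fields: address range, permissions, offset, device, inode, pathname.
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue  # skip anonymous mappings and pseudo-entries like [stack]
        start_text, end_text = fields[0].split("-")
        start, end = int(start_text, 16), int(end_text, 16)
        path = fields[5]
        if path in modules:
            low, high = modules[path]
            modules[path] = (min(low, start), max(high, end))
        else:
            modules[path] = (start, end)
    return modules


# Hypothetical sample input; real code would read /proc/self/maps instead.
SAMPLE_MAPS = """\
6fe9d64000-6fe9d74000 r--p 00000000 fd:01 1234 /data/app/example/libapp.so
6fe9d74000-6fe9f80000 r-xp 00010000 fd:01 1234 /data/app/example/libapp.so
7ffc000000-7ffc021000 rw-p 00000000 00:00 0 [stack]
"""

for path, (start, end) in modules_from_maps(SAMPLE_MAPS).items():
    print(f"{path}: {start:#x}-{end:#x}")
```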

You can find examples for this in the Sentry Native SDK here: getsentry/sentry-native/src/modulefinder. We have actually started using this code for Flutter in the meantime, as it allows us to easily symbolicate frames even from third-party and system libraries using the standard approach for native symbolication:

  • Take the absolute address reported by the Dart VM
  • Subtract the base address from the library the address points into
  • Look up the relative address in the DWARF information resolved using the build ID
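The first two steps above can be sketched in a few lines of Python. The `Module` type and `resolve` function are hypothetical, and the module name, size, and build ID in the usage example are placeholders; only the base address and frame address come from the stack trace at the top of this issue. The final DWARF lookup is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Module:
    name: str
    base: int       # address at which the library is loaded
    size: int       # size of the mapped image in bytes
    build_id: str   # platform-dependent identifier (placeholder here)


def resolve(address: int, modules: List[Module]) -> Optional[Tuple[Module, int]]:
    """Find the module containing `address` and the address relative to its base."""
    for module in modules:
        if module.base <= address < module.base + module.size:
            return module, address - module.base
    return None


# Base taken from "isolate_dso_base" in the sample trace; the rest is made up.
modules = [Module("libapp.so", 0x6FE9D64000, 0x400000, "<build id>")]

hit = resolve(0x6FE9F4E87B, modules)  # "abs" address of frame #00
if hit is not None:
    module, relative = hit
    print(module.name, hex(relative))  # prints "libapp.so 0x1ea87b"
```

The relative address matches the `virt` value the VM prints for frame #00, which is the value a server would look up in the debug file selected by the build ID.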

By the way, I'm happy to contribute to this if you like, but may need some guidance for the places to look at in the Dart project.

I thought it may be nice to share an example of how the information could be reported and how this looks in Sentry. Let's start with the library list read with the code I linked above. It includes the name of the library, the absolute memory range, and the build id called "Code ID":

image

It's important to note that this list is much longer and includes all loaded libraries. It's definitely worth exposing all of them in case the stack trace includes system frames, or calls to third-party native modules.

The reported and symbolicated stack trace then looks like this. We report the absolute addresses only:

image

The corresponding relative addresses actually used for the lookup in debug information would then be these, for instance:

image
