Rust: debuginfo: Add support for split-debuginfo on platforms that allow it

Created on 4 Jul 2016  ·  24 Comments  ·  Source: rust-lang/rust

DWARF v5 will standardize something sometimes called "Split DWARF" or "debuginfo fission". The gist is this: debuginfo can be very large (gigabytes) and can contain lots of relocations, so the linker spends a lot of time copying and relocating it into the final executable. "Split DWARF" is an approach that basically skips this linker step: since the debuginfo is already located in the individual object files generated during compilation, why not just "link" to the debuginfo there and let the debugger do any relocations on demand? This can potentially mean a drastic reduction of compile times for builds with debuginfo.

LLVM already supports this on Linux, as far as I know, and it might be a good fit for incremental compilation where the linker could easily become the most time-consuming step (although that might be nixed by using gold's incremental linking feature).

It might also be interesting for providing pre-built debuginfo separately via *.dwp files.

A-debuginfo A-incr-comp C-feature-request

All 24 comments

Technically it is already possible to have “split” DWARF debug info, even without v5, which you can load into gdb.

The only problem is generating it separately in the first place, because currently the only way to get split debug info to the best of my knowledge, is to use objcopy --only-keep-debug and then strip the original library/executable.
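For anyone who wants to script that objcopy workflow, here is a minimal sketch that only builds the commands and does not run them. The function name `split_debuginfo_commands` and the `<binary>.debug` naming are my own choices for illustration. The third step (`--add-gnu-debuglink`) is not mentioned above; it is an optional extra that lets a debugger find the file without a manual `add-symbol-file`.

```rust
use std::ffi::OsString;
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) the binutils commands that move the debug
/// info of `binary` into a sibling `<binary>.debug` file.
/// The function name and the `.debug` suffix are illustrative assumptions.
fn split_debuginfo_commands(binary: &Path) -> Vec<Command> {
    let mut debug_file = binary.as_os_str().to_os_string();
    debug_file.push(".debug");

    // 1. Copy only the debug sections into the separate file.
    let mut keep = Command::new("objcopy");
    keep.arg("--only-keep-debug").arg(binary).arg(&debug_file);

    // 2. Remove the debug sections from the original binary.
    let mut strip = Command::new("strip");
    strip.arg("--strip-debug").arg(binary);

    // 3. Optional: record the debug file's name in the binary so that a
    //    debugger can locate it automatically.
    let mut link_arg = OsString::from("--add-gnu-debuglink=");
    link_arg.push(&debug_file);
    let mut link = Command::new("objcopy");
    link.arg(link_arg).arg(binary);

    vec![keep, strip, link]
}
```

Calling `status()` on each returned command in order would perform the split, assuming binutils is installed.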

> The only problem is generating it separately in the first place, because currently the only way to get split debug info to the best of my knowledge, is to use objcopy --only-keep-debug and then strip the original library/executable.

How does gdb know where to find the debuginfo in that case? With the DWARF 5 approach, the 'skeleton' debuginfo entries store the path to the object files.
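To make the 'skeleton' idea concrete, here is a rough sketch of how a consumer might resolve the split file from a skeleton unit. It assumes, as far as I understand the format, that the skeleton carries a compilation directory, a .dwo file name and an ID used to check the match. The struct and function names are hypothetical and this is not how any particular debugger is implemented.

```rust
use std::path::PathBuf;

/// Hypothetical, simplified view of a skeleton compilation unit.
struct SkeletonUnit {
    /// Compilation directory (DW_AT_comp_dir).
    comp_dir: PathBuf,
    /// Name of the split file (DW_AT_dwo_name, or DW_AT_GNU_dwo_name in the
    /// pre-standard GNU flavour).
    dwo_name: PathBuf,
    /// ID that the split file must also carry.
    dwo_id: u64,
}

/// Resolves where the split debuginfo file is expected to live: an absolute
/// name is used as-is, a relative one is taken relative to `comp_dir`.
fn locate_dwo(unit: &SkeletonUnit) -> PathBuf {
    if unit.dwo_name.is_absolute() {
        unit.dwo_name.clone()
    } else {
        unit.comp_dir.join(&unit.dwo_name)
    }
}

/// Checks that a split file found on disk belongs to this skeleton unit.
fn dwo_matches(unit: &SkeletonUnit, id_found_in_dwo: u64) -> bool {
    unit.dwo_id == id_found_in_dwo
}
```

The point of the sketch is that the paths are baked in at compile time, so the split files have to stay where the skeleton says they are (or be packaged some other way).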

You use a command like add-symbol-file. Sure, it's not the most seamless experience, but, like I said, it at least works.

> How does gdb know where to find the debuginfo in that case?

There are two ways, but the better of the two is the build-id feature. See the docs. This is what all the distros use.
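As an illustration of the build-id lookup, here is a small sketch of the path a debugger derives from a build ID: the first byte (two hex digits) becomes a directory name and the remaining bytes become the file name with a `.debug` suffix, under a `.build-id` directory inside the debug directory. The function name is made up, and `/usr/lib/debug` in the usage note is just the common default, so treat both as assumptions.

```rust
use std::path::{Path, PathBuf};

/// Computes `<debug_dir>/.build-id/<first byte>/<remaining bytes>.debug`
/// for a build ID given as raw bytes. Returns `None` if the ID is too short
/// to be split into a directory part and a file part.
fn build_id_debug_path(debug_dir: &Path, build_id: &[u8]) -> Option<PathBuf> {
    if build_id.len() < 2 {
        return None;
    }
    // Lower-case hex encoding of the whole ID.
    let hex: String = build_id.iter().map(|b| format!("{:02x}", b)).collect();
    // Two hex digits for the directory, the rest for the file name.
    let (dir, file) = hex.split_at(2);
    Some(
        debug_dir
            .join(".build-id")
            .join(dir)
            .join(format!("{}.debug", file)),
    )
}
```

For example, with debug directory `/usr/lib/debug` and build ID bytes `ab cd ef 01`, this yields `/usr/lib/debug/.build-id/ab/cdef01.debug`.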

However, this form of splitting is very different from Fission. Fission trades off link time speed for some debugger performance (I believe, I haven't used it in anger). Build-id is more about being able to split off the debuginfo for separate packaging -- and, contra Fission, it slows down the development cycle, as the splitting is a separate step.

It's not always appropriate to use Fission so this would have to be a compile-time flag of some kind.

@tromey Thanks for the background info! Would you agree that Fission with .dwp packages would be the cleaner way of distributing debuginfo separately? I know, this feature just wasn't available before, so distros had to do it differently. But now that it is there (in some places at least :)), it seems like the superior approach.

> It's not always appropriate to use Fission so this would have to be a compile-time flag of some kind.

Yes, that's how I envisioned it.

> Would you agree that Fission with .dwp packages would be the cleaner way of distributing debuginfo separately?

They are maybe mildly cleaner (but maybe not); but the approach currently taken by the distros has the advantage that all the tools already work with it; whereas nobody did the work to make everything use Fission. Also the distros, IME, were more space-sensitive and less link-time-sensitive than developers; leading to dwz instead, which is somewhat tied into the build-id and packaging schemes.

@tromey That is good to know. Thanks!

Adding a cc from https://github.com/rust-lang/rust/issues/47240 to this as well. OSX supports this by default but we're actually going out of our way to undo the compile time win by running dsymutil by default. It looks like we don't need to though and instead need to ensure that we leave the object files without deleting them. That's probably also relevant for this where we'll need to avoid deleting some intermediate artifacts!

> LLVM already supports this on Linux, as far as I know,

To be clear: there are two flavours of split-DWARF: the pre-standard GNU flavour and the DWARF5 standardized flavour, and they are different. (Par for the course due to DWARF's broken development model :-(.) LLVM/gcc/gdb support the former but not the latter and probably there is not yet any tool support for the DWARF5 standard flavour. See http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-February/004325.html. So your choice is between using the GNU flavour or waiting some unknown length of time for the standard flavour to be usable.

Using split-DWARF would create inconvenience for some users. E.g. we move Rust binaries around between machines/containers and sometimes need to debug binaries on machines other than where they were built. Split-DWARF would mean we have to move around the object files as well for debugging to work, which would be less efficient than just not splitting. So we'd want split-DWARF to be an option we could turn off (and linking performance without split-DWARF would continue to matter for us).

> Split-DWARF would mean we have to move around the object files as well for debugging to work

Really? Is there nothing like Microsoft's PDB for Linux?

You mean a way to link the debuginfo from all your object files into a single large debuginfo file for the binary?

I don't know if such a tool exists, but even if it did you'd lose most of the performance benefits described at the top of this issue, plus it would still be less convenient than having the debuginfo packed into the binary itself.

> You mean a way to link the debuginfo from all your object files into a single large debuginfo file for the binary?

Yes

> I don't know if such a tool exists, but even if it did you'd lose most of the performance benefits described at the top of this issue, plus it would still be less convenient than having the debuginfo packed into the binary itself.

Linking all the object files into a single large debuginfo only would have to be done when cutting releases, not during day-to-day development. Basically one would publish the big debuginfo file as a release artifact and probably build some automation for retrieving it to assist with debugging release versions of the software.

> Linking all the object files into a single large debuginfo only would have to be done when cutting releases

For you perhaps, but in https://github.com/rust-lang/rust/issues/34651#issuecomment-374420388 I explained that we frequently need to move binaries around and debug them.

> For you perhaps, but in #34651 (comment) I explained that we frequently need to move binaries around and debug them.

Right, I think different users have different goals. My use case is handled well by split dwarf because I care about build speed first for day-to-day builds and I want a PDB-like thing for making small final releases that can still be debugged well, even if those final releases take a long time to build.

> I want a PDB-like thing for making small final releases that can still be debugged well, even if those final releases take a long time to build.

Your final release situation is already served well by making a binary with full debuginfo and then moving the debuginfo out into an external debuginfo file. So split-DWARF is only needed to improve your build times.

Yes, this would be a performance optimization first and foremost. And it would probably not be the default setting. That being said, there are no concrete plans for implementing this yet.

I don't know about the DWARF5 flavor, but with the GNU flavor there's a dwp tool that functions like dsymutil to generate a file.dwp containing the linked debug info from a -gsplit-dwarf build:
https://gcc.gnu.org/wiki/DebugFissionDWP
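A rough sketch of that GNU-flavour workflow, again only constructing the commands without running them. The function name and the file names are placeholders. `-gsplit-dwarf` is the flag mentioned above; the `-e` and `-o` options for dwp are what I believe the binutils tool accepts, so check `dwp --help` before relying on them.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) a compile, link and package sequence for a
/// single C source file using GNU-flavour split DWARF.
/// `source` and `exe` are placeholder names supplied by the caller.
fn split_dwarf_commands(source: &str, exe: &str) -> Vec<Command> {
    let object = Path::new(source).with_extension("o");

    // Compile: the object file keeps only skeleton debuginfo, the bulk is
    // written to a .dwo file next to it.
    let mut compile = Command::new("gcc");
    compile
        .args(["-g", "-gsplit-dwarf", "-c", source, "-o"])
        .arg(&object);

    // Link: the linker no longer has to copy and relocate the bulk of the
    // debuginfo.
    let mut link = Command::new("gcc");
    link.arg(&object).args(["-o", exe]);

    // Package (optional, e.g. for releases): collect the .dwo files
    // referenced by the executable into a single .dwp file.
    let mut package = Command::new("dwp");
    package
        .args(["-e", exe, "-o"])
        .arg(format!("{}.dwp", exe));

    vec![compile, link, package]
}
```

The packaging step is the part that plays the role of dsymutil here, and it only needs to run when the debuginfo has to travel with the binary.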

Note that @alexcrichton landed a patch to rustc to allow it to stop running dsymutil after every build (because it's slow):
https://github.com/rust-lang/rust/pull/47784

and he's planning on making this the default behavior in cargo to get faster rebuilds on Mac (https://github.com/rust-lang/rust/issues/47240). If that change lands then Mac builds will be functionally equivalent to split-DWARF builds on Linux.

@luser ah, unfortunately the plan to turn it on by default was shot down when it was realized that line numbers disappeared from RUST_BACKTRACE=1.

Bummer! Presumably we'd hit similar issues with split-DWARF on Linux. We need to lock fitzgen in a room and make him finish unwind-rs. 😉

@alexcrichton This might be a long shot, but has anything changed lately that might unblock this change?

I believe the current state of this can roughly be summarized as:

  • This isn't implemented by default anywhere

    • On OSX you can pass -Zrun-dsymutil=no to simulate what compile times would be like

    • On Linux we have not implemented the requisite support to use fission with DWARF

    • I don't think anyone's looked into Windows MSVC or MinGW

  • Switching on by default at least needs to preserve debugger backtraces and RUST_BACKTRACE=1 backtraces by default

    • For the latter we've switched to the backtrace crate where development can more easily happen. Notably the gimli-symbolize feature is pretty mature now, and development can likely happen on that feature to see what it would take to get RUST_BACKTRACE to support this

Getting all that done I believe is a bare minimum for even considering turning this on by default, but just adding an option could likely be stabilized much sooner. We could likely add an option with a Linux implementation and stabilize the OSX functionality under that option (and probably do some cursory Windows investigation too)

Does #73441 help toward this goal? Though I now see that https://github.com/rust-lang/backtrace-rs/issues/287 is currently an open issue.

> I don't know about the DWARF5 flavor, but with the GNU flavor there's a dwp tool that functions like dsymutil to generate a file.dwp containing the linked debug info from a -gsplit-dwarf build:
> https://gcc.gnu.org/wiki/DebugFissionDWP

It still doesn't support DWARF5 but gcc's split dwarf works just fine, also in gdb.
For reference, this is similar to running mspdbcmf on a /DEBUG:FASTLINK pdb on Windows.
