In the case of Vec<T> and String, it would be possible to have e.g.:
    impl<'a, T: 'a> DebuggerView<'a> for Vec<T> {
        type View = &'a [T];
        fn debugger_view(&'a self) -> Self::View { &self[..] }
    }
    impl<'a> DebuggerView<'a> for String {
        type View = &'a str;
        fn debugger_view(&'a self) -> Self::View { &self[..] }
    }
We would then need:
- pretty-printer support for &[T] and &str
- a way to find <X as DebuggerView>::debugger_view in X's debuginfo
- the debugger going through <X as DebuggerView>::debugger_view for pretty-printing
I suspect there are better APIs that could accommodate more complex data structures, but I'm not sure what the limitations of the pretty-printer scripts are when it comes to calling arbitrary functions.
There's also the possibility that we could encode this sort of information as data, instead of compiling it into functions, but that's probably a lot more work.
cc @michaelwoerister
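A minimal sketch of how the DebuggerView idea above could fit together, assuming a trait shaped like the one the impls imply. The trait definition itself and the render helper are hypothetical (they do not appear in the thread); render only stands in for whatever a pretty-printer would do with the returned view. The T: 'a bound is what makes &'a [T] well-formed as an associated type.

```rust
use std::fmt::Debug;

/// Hypothetical trait, reconstructed from the shape of the impls in the comment.
pub trait DebuggerView<'a> {
    /// A simpler type that a debugger already knows how to display.
    type View;
    fn debugger_view(&'a self) -> Self::View;
}

// `T: 'a` is required so that `&'a [T]` is a well-formed type.
impl<'a, T: 'a> DebuggerView<'a> for Vec<T> {
    type View = &'a [T];
    fn debugger_view(&'a self) -> Self::View {
        &self[..]
    }
}

impl<'a> DebuggerView<'a> for String {
    type View = &'a str;
    fn debugger_view(&'a self) -> Self::View {
        &self[..]
    }
}

/// Hypothetical stand-in for the debugger side: obtain the view, then
/// print it with whatever support exists for the view type.
pub fn render<'a, X>(x: &'a X) -> String
where
    X: DebuggerView<'a>,
    X::View: Debug,
{
    format!("{:?}", x.debugger_view())
}
```

With this, render(&vec![1, 2, 3]) formats the slice view rather than the Vec itself, which is the point of the proposal: the debugger only needs to understand &[T] and &str.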
How does this work when cross-compiling? Is it possible to easily strip this code from the executable to reduce its size while still being able to debug the binary (important for embedded)?
@jonas-schievink You mean with remote debugging? As long as the gdb server (or however you're controlling the target) lets you call functions, this should work.
Also, for embedded, would you have dynamic allocation? Pretty printers are mostly needed in that case. And even if you do, each of these impls would be a couple instructions (at least the above ones).
> You mean with remote debugging? As long as the gdbserver (or however you're controlling the target) lets you call functions, this should work.
Ah, I see. Though this sounds like it would disrupt the target state more than the current solution. Is anything similar to this implemented by other languages?
> Also, for embedded, would you have dynamic allocation?
You can, yes. I suppose in that case (if you can afford a heap) the overhead of a few instructions isn't terrible but it's still a regression nonetheless.
How would this be controlled? With -Cdebuginfo? Does that currently have any impact on generated code or is everything it adds removable with strip?
:+1: on the approach, this is essentially what KotlinNative is doing: calling into runtime to "reflect" values
A production-grade DebuggerView should be richer and express children.
The MVP here can be much simpler though:
- add fn debug<T: std::fmt::Debug>(value: &T, buf: *const u8, buf_len: usize)
- force instantiation of debug for all T's
- have the debugger call debug::<T>
Hm, maybe even crazier:
- add fn debug(value: *const T, buf: *const u8, buf_len: usize)
- make that function itself look at the debug info associated with value (this might need some additional info like a frame pointer passed to it)
- based on the debug info for value, debug calls into fmt::Debug using the appropriate mangled name
That way, the debugger can blindly call debug on any pointer, without containing logic for figuring out the symbol name from a type.
Note that this can actually be prototyped as a crate! The only thing the crate can't do is force eager instantiation of fmt::Debug, but, for a POC, it would be good enough to pre-instantiate some stdlib types and provide a generate_debug_printers!(HashMap<String, MyType>) macro.
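As a rough sketch of the first variant (the generic debug function), the following formats a value through fmt::Debug into a caller-provided buffer. It deviates from the signature in the comment in two ways, both assumptions: the buffer is *mut u8 rather than *const u8 because it is written to, and the function returns the number of bytes written. FixedBuf and debug_to_string are hypothetical helpers; debug_to_string only plays the role of the debugger that allocates the buffer and reads it back.

```rust
use std::fmt::{self, Debug, Write};

/// Hypothetical helper: a `fmt::Write` sink over a fixed-size buffer.
/// Output that does not fit is silently dropped (possibly mid-character).
struct FixedBuf<'a> {
    bytes: &'a mut [u8],
    used: usize,
}

impl Write for FixedBuf<'_> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let room = self.bytes.len() - self.used;
        let n = s.len().min(room);
        self.bytes[self.used..self.used + n].copy_from_slice(&s.as_bytes()[..n]);
        self.used += n;
        Ok(())
    }
}

/// Formats `value` with `fmt::Debug` into `buf` and returns the number of
/// bytes written.
///
/// # Safety
/// `buf` must be valid for writes of `buf_len` bytes.
pub unsafe fn debug<T: Debug>(value: &T, buf: *mut u8, buf_len: usize) -> usize {
    let bytes = unsafe { std::slice::from_raw_parts_mut(buf, buf_len) };
    let mut out = FixedBuf { bytes, used: 0 };
    // `FixedBuf::write_str` never fails, so the result can be ignored.
    let _ = write!(out, "{:?}", value);
    out.used
}

/// Hypothetical stand-in for the debugger: allocate a buffer, call `debug`,
/// and read the result back.
pub fn debug_to_string<T: Debug>(value: &T, capacity: usize) -> String {
    let mut buf = vec![0u8; capacity];
    let written = unsafe { debug(value, buf.as_mut_ptr(), buf.len()) };
    String::from_utf8_lossy(&buf[..written]).into_owned()
}
```

A real prototype would still have to make sure each debug::<T> instantiation the debugger might want actually exists in the binary, which is the part the comment says a crate cannot do on its own.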
The problem with using fmt::Debug is that I don't know the extent of the power of Python scripts.
> make that function itself look at the debug info associated with value (this might need some additional info like a frame pointer passed it)
That seems... extremely complicated, compared to the debugger which already has it.
Also, if DWARF type IDs are exposed in the Python scripts, then it should be trivial to get a per-type symbol name that way.
While I like the idea, I see two problems:
Debug instances hide sensitive information (e.g. encryption keys) to prevent accidentally leaking it. While debugging, however, you may want to be able to inspect it.
Oh, I would prefer if this wasn't based on fmt::Debug but rather provided access, through collections, to elements, for debug purposes. So more like &_: IntoIterator for debugging.
That is, I want the debugger to do inside, what it was doing outside, e.g. Vec.
Just a random idea: Is it possible to store &mut [MaybeUninit<T>] inside Vec<T> instead of ptr + cap or &mut [T] instead of ptr + len? That would make special casing Vec<T> inside the debugger pretty printer unnecessary.
That wouldn't work for, say, HashMap. And there's no nice way to fit both capacity and length into one slice.
I'm not sure if there is a reliable way for calling trait methods from inside debuggers yet.
@michaelwoerister We would be exposing it as a regular function with a predictable symbol name, so that we can find it from the DWARF type information (via type ID perhaps?).
I think we need to add the type ID as custom DWARF attribute for a type (maybe call it DW_AT_rust_type_id) to be able to access the type id from inside a debugger.
I'm not talking about our TypeId, but the code in the compiler led me to believe that "Type ID" is a DWARF concept - however, if we can invent custom DWARF attributes, we can do better than an ID, we can have an entire mangled symbol presumably.
I don't think that DWARF has a type id concept. I believe it only cares about offsets within a debug section.
> however, if we can invent custom DWARF attributes, we can do better than an ID, we can have an entire mangled symbol presumably.
Yes, that would be nicer than storing a type id.
Okay so I dug into that stuff and I found how it works:
- rustc_codegen_llvm tracks something it calls unique_type_id / UniqueTypeId:
- DW_FORM_ref_sig8:
That is only for DWARF version 4 and higher. Only those versions support type units, which contain that value as the type_signature header field.
I didn't see this mentioned yet -- apologies if it has: doesn't this only work if you've got an active process? When debugging a core file, you're not going to be able to call anything. It'd be unfortunate to lose that ability.
(granted, I have never actually debugged a rust process, so I don't know the current state of the type inspector script support.)
EDIT: I mean, granted, debugging a coredump is probably 10-100x more likely in C/C++ land, but still... 😂
@uberjay I don't think it was mentioned, that's a very good point!
That would suggest it might be better to encode structural information in some data format (a bytecode? DWARF already has some features like that), as opposed to using machine code.
This would also alleviate @jonas-schievink's concern with regards to remote debugging, I think, even if it may be harder to implement overall.
Something I wasn't aware of, that looks very useful: Windows natvis!
Spotted new natvis definitions being added in #66597 (cc @MaulingMonkey)
This is, for example, Vec's natvis definition:
https://github.com/rust-lang/rust/blob/b5f265eeed23ac87ec6b4a7e6bc7cb4ea3e67c31/src/etc/natvis/liballoc.natvis#L3-L13
This looks decently declarative to me, and the more advanced features are pretty interesting. I definitely prefer it over Python code, even if it's wrapped in XML.
Here's a more advanced example (HashMap from #66597):
https://github.com/rust-lang/rust/blob/b5f265eeed23ac87ec6b4a7e6bc7cb4ea3e67c31/src/etc/natvis/libstd.natvis#L28-L49
I could see this being integrated as either an unstable attribute or an associated const - but we could go further and generate the C expressions necessary (and/or some sort of DWARF ops) from a subset of MIR, for example, so that it has to fully go through type-checking first.
In terms of support, VS Code even supports (a subset of) natvis for gdb/lldb, but there are also other projects, like gdb-natvis and natvis4gdb.
And if we come up with our own solution that can be lowered to natvis, then we can have our own pretty-printers and also interface with all of the above as well.
natvis is indeed pretty useful! They're used by VS/VCS/WinDbg/CDB as the main means of visualizing the C++ stdlib containers, smart pointers, etc. - and they can be embedded into pdbs (as rustc does via undocumented link.exe flags) where they're automatically picked up by debuggers. They work fine on minidumps, VS can hot-reload .natvis files if they're part of your project, you can use them on third party types you don't even have source code for... there's even a non-Microsoft game console out there with natvis support in their debug engine.
Cons: Using XML for a programming language is admittedly kinda horrific, and CustomListItems amounts to that. Fortunately it's typically only needed for hash/tree containers. It's also very C++ oriented - my initial PR did a bit of mangling to make Rust debug info look C++y enough for natvis to work with slices and strs (although not reliably... something to do with fat pointer types perhaps?)
My own wild fantasies include auto-generating natvis files for rust enums, and teaching rust to gather and embed natvis files found in dependency crates (or maybe using build scripts to emit link args from said crates would be sufficient? I should experiment with that...)
Interestingly, DWARF is more flexible than I thought, and it looks like at the very least Vec<T> and String could be implemented natively in it - see https://github.com/rust-lang/rust/issues/37504#issuecomment-578231071.
I could maybe see this scaling to HashMap (to get something akin to [(K, V)] being shown in the debugger), but for LinkedList and BTreeMap you would have some nesting wherever there's recursion through indirection.
> When debugging a core file, you're not going to be able to call anything. It'd be unfortunate to lose that ability.
In addition to this, in gdb we normally recommend that pretty-printers not make inferior calls because it involves letting the inferior run; this can be very confusing when other threads run when you are trying to print something. Also, these calls involve stack manipulations, which is unpleasant when also trying to unwind -- gdb calls into these things when printing the stack trace.
Overall that makes me want to go for something more like "Vec<T> is a dynamic array where ptr is in this field and the length is in this other field".
Maybe offsets instead of field indices (using offset_of!?)
We can do this with a trait and an associated const, and generate DWARF from it, maybe even NatVis (if we can rely on embedding it into pdbs).
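To make the "trait and an associated const" idea a bit more concrete, here is a minimal sketch under several assumptions: DebuggerLayout, DebuggerLayoutInfo, MyVec and read_elements are all hypothetical names, and MyVec is a #[repr(C)] stand-in because the real Vec<T> has private fields and no guaranteed layout. The description is pure data (offsets computed with offset_of!, which needs Rust 1.77 or later), so a consumer can follow it by reading memory only, without calling any function in the inspected program.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical declarative description of a container's layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DebuggerLayout {
    /// "A dynamic array where the pointer is at this offset and the
    /// length is at this other offset."
    DynamicArray { ptr_offset: usize, len_offset: usize, elem_size: usize },
}

/// Hypothetical trait carrying the description as an associated const.
pub trait DebuggerLayoutInfo {
    const LAYOUT: DebuggerLayout;
}

/// Stand-in for `Vec<T>` with a fixed layout.
#[allow(dead_code)] // fields are only read through raw offsets below
#[repr(C)]
pub struct MyVec<T> {
    ptr: *const T,
    cap: usize,
    len: usize,
}

impl<T> DebuggerLayoutInfo for MyVec<T> {
    const LAYOUT: DebuggerLayout = DebuggerLayout::DynamicArray {
        ptr_offset: offset_of!(MyVec<T>, ptr),
        len_offset: offset_of!(MyVec<T>, len),
        elem_size: size_of::<T>(),
    };
}

/// Plays the role of the debugger: follows the layout description using
/// raw memory reads only.
///
/// # Safety
/// `T` must be the real element type of `container`, and the pointer and
/// length stored in it must describe valid, initialized memory.
pub unsafe fn read_elements<C: DebuggerLayoutInfo, T: Copy>(container: &C) -> Vec<T> {
    let DebuggerLayout::DynamicArray { ptr_offset, len_offset, elem_size } = C::LAYOUT;
    assert_eq!(elem_size, size_of::<T>());
    let base = (container as *const C).cast::<u8>();
    unsafe {
        let ptr = base.add(ptr_offset).cast::<*const T>().read();
        let len = base.add(len_offset).cast::<usize>().read();
        std::slice::from_raw_parts(ptr, len).to_vec()
    }
}

/// Small usage example: borrow storage from a real `Vec` and read it back.
pub fn example() -> Vec<u32> {
    let backing = vec![10u32, 20, 30];
    let v = MyVec { ptr: backing.as_ptr(), cap: backing.capacity(), len: backing.len() };
    unsafe { read_elements::<MyVec<u32>, u32>(&v) }
}
```

Because the description is data rather than machine code, the same information could in principle be lowered to DWARF or NatVis, and it would keep working on core files, which is the concern raised earlier in the thread.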