Ghidra: Is there a nicer way to see vtable's function calls directly in the decompiler?

Created on 26 Apr 2019 · 11 Comments · Source: NationalSecurityAgency/ghidra

When I am dealing with a C++ binary, it's expected that I will have to deal with vtables and that the binary will call functions by accessing the instance's vtables. Defining the vtable doesn't seem too difficult as I can type the data to an array of func* and it shows perfectly in the listing. I can then retype the appropriate field in the class's structure to func* so the decompiler knows it is an array of func.

The problem is this is the best I can get in the decompiler (this is an example I made up to illustrate my point):

(*this->vtable[6])(local_20);

Now, at least it does tell me that it is calling the 7th function in the vtable, but I would like the decompiler to be able to infer WHICH function is being called. It can't, because it cannot know that this field is a vtable and so won't really change once assigned, which would have been done earlier. I haven't found an option to mark a structure's field as a constant that will never change, so I am forced to manually check the vtable in the listing to figure out what is at index 6 in the table.

The only other solution I have found is to create a new structure containing all the entries. However, not only will it JUST show the function name and not the actual reference, it will also be completely separate from the class structure, which is incredibly inconvenient to create for every vtable as I am dealing with hundreds of classes (my particular binary has debugging information).

My question: Is there a better way to deal with this and if there isn't, would it be possible to fix this? It seems to be a huge inconvenience to check the table every time I see an indirect function call.

All 11 comments

I can get to
(*this->vtable.MyClass::MyMethod)(this, param1, param2)
by typing/naming the members of a vtable structure appropriately and then typing the vtable pointer of the actual structure with it.

So basically: MyClass with a vtable member of type vtable-MyClass* and vtable-MyClass with a MyClass::MyMethod member (name can be anything, I chose to put the class name so it shows up for derived classes) of type void __thiscall MyMethod(MyClass* this, int param1, int param2)
Then go to the actual memory location of the vtable (either check a ctor/dtor or find the VFTABLE some other way) and type it as vtable-MyClass (Note that sometimes the vtable isn't the right size and has extra function pointers at the end, you might have to clear code bytes before typing)

It works, but is still kinda clunky in syntax. Helps readability more than this->vtable[6] though.
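
As a rough model of the two-structure layout described above (not Ghidra API code), the following Python `ctypes` sketch declares an object whose first field points to a separate vtable structure with a named, typed function pointer member. The member name `MyClass_MyMethod` stands in for `MyClass::MyMethod`, and the `value` field and `my_method` callback are hypothetical, added only so the call can be exercised.

```python
import ctypes


class MyClass(ctypes.Structure):
    """The object; fields are assigned below because the types refer to each other."""


# void MyMethod(MyClass *this, int param1, int param2)
MyMethodFn = ctypes.CFUNCTYPE(None, ctypes.POINTER(MyClass), ctypes.c_int, ctypes.c_int)


class VtableMyClass(ctypes.Structure):
    # One named, typed member per vtable slot instead of an anonymous func* array.
    _fields_ = [("MyClass_MyMethod", MyMethodFn)]


MyClass._fields_ = [
    ("vtable", ctypes.POINTER(VtableMyClass)),  # vtable pointer typed with the second struct
    ("value", ctypes.c_int),                    # hypothetical data member
]

calls = []


@MyMethodFn
def my_method(this, param1, param2):
    # Record the arguments so the dispatch can be inspected afterwards.
    calls.append((this.contents.value, param1, param2))


vtable_instance = VtableMyClass(my_method)
obj = MyClass(ctypes.pointer(vtable_instance), 42)

# Reads like the decompiler output: (*this->vtable.MyClass::MyMethod)(this, 1, 2)
obj.vtable.contents.MyClass_MyMethod(ctypes.byref(obj), 1, 2)
```

The point of the second structure is that the call site names the slot instead of indexing into an array, which is what makes the decompiler output easier to read.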

But yeah, it would be nice if there was a way to help this vtable reconstruction; be it structure definition based on the vtables from metadata, naming from the referenced methods or just a vtable field that lets you edit this in the structure (or class!) somewhere.

@BhaaLseN how is this done? I can see classes in the symbol tree, but when I try to retype a variable, it expects a type from the "data type" tab. Classes are not there, of course. Is there some extra step to create a data type from a class in the symbol tree?

Edit: I can see we create new structures manually. This is a chore though, especially for vtables. Is there really no way to create a struct datatype from class symbol?

Yeah, this needs to be done _by hand_ at this point; which is why I also put my 👍 for better support.

But as mentioned, right now you need _two_ structs; your actual object/class/structure (which was probably auto-created) and a second one for the vtable (which you have to do by hand); then type the vtable member of the first one with a pointer to the second one.

I made a script for detecting vtables and creating datatypes for them. It does this by looking for "typeinfo name" mangled strings, finding typeinfos by looking at references and then finding vtables the same way. I haven't tested it on more than two binaries, so it's probably got some bugs, but it should work on any Itanium ABI binary (GCC, clang and some others).
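
The search order described above (typeinfo name strings, then typeinfos, then vtables) can be sketched outside Ghidra on a raw byte image. This is not the author's script: it is a minimal illustration that assumes 64-bit little-endian pointers, simple non-nested class names, a typeinfo whose second word points at the name, and a vtable whose function slots directly follow the typeinfo pointer. All function names, addresses and the demo image are hypothetical.

```python
import re
import struct

PTR = struct.Struct("<Q")  # assumption: 64-bit little-endian pointers

# "<length><identifier>\0", e.g. b"7MyClass\0"; nested or templated names are ignored.
NAME_RE = re.compile(rb"(?<=\x00)(\d+)([A-Za-z_]\w*)\x00")


def typeinfo_names(image, base):
    """Yield (address, class_name) for strings that look like typeinfo names."""
    for m in NAME_RE.finditer(image):
        if int(m.group(1)) == len(m.group(2)):
            yield base + m.start(), m.group(2).decode("ascii")


def pointer_refs(image, base, target):
    """Yield addresses of aligned pointer-sized words whose value equals target."""
    for off in range(0, len(image) - PTR.size + 1, PTR.size):
        if PTR.unpack_from(image, off)[0] == target:
            yield base + off


def find_vtables(image, base, code_range):
    """Map class name -> function pointers found after a reference to its typeinfo."""
    lo, hi = code_range
    found = {}
    for name_addr, name in typeinfo_names(image, base):
        for name_ref in pointer_refs(image, base, name_addr):
            typeinfo = name_ref - PTR.size  # name pointer is the 2nd word of a typeinfo
            for ti_ref in pointer_refs(image, base, typeinfo):
                slots = []
                off = ti_ref - base + PTR.size  # first slot follows the typeinfo pointer
                while off + PTR.size <= len(image):
                    value = PTR.unpack_from(image, off)[0]
                    if not lo <= value < hi:
                        break  # stop at the first word that is not a code address
                    slots.append(value)
                    off += PTR.size
                if slots:
                    found[name] = slots
    return found


# Hypothetical demo image: a name string, a typeinfo, and a two-slot vtable.
BASE = 0x1000
CODE_RANGE = (0x400000, 0x401000)
image = bytearray(80)
image[8:17] = b"7MyClass\x00"
PTR.pack_into(image, 24, 0xDEAD)     # typeinfo word 0: placeholder vptr
PTR.pack_into(image, 32, BASE + 8)   # typeinfo word 1: pointer to the name
PTR.pack_into(image, 40, 0)          # vtable: offset-to-top
PTR.pack_into(image, 48, BASE + 24)  # vtable: typeinfo pointer
PTR.pack_into(image, 56, 0x400100)   # vtable slot 0
PTR.pack_into(image, 64, 0x400200)   # vtable slot 1
vtables = find_vtables(bytes(image), BASE, CODE_RANGE)
```

A real script would work on Ghidra's references and memory blocks rather than a flat byte scan, and would need to handle the other typeinfo kinds used for single and multiple inheritance.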

Don't worry if you run it and get a lot of "ERR" messages, those might just be artifacts of typename-like strings in the binary or classes where the tool can't get all the necessary data.

It doesn't deal with inheriting from a multiple inheritance class yet, but I'll hopefully fix that later.

Unfortunately, this script can't currently represent the type hierarchy faithfully (i.e. how gdb's print does it). This script creates multiple vptrs in a struct at particular offsets, but what's really happening is that entire objects (of parent classes) are embedded in the child object. This is all fine and could be implemented easily, but the first embedded object's vptr is also the whole object's vptr, so the pointed-to vtable has extra function pointers appended at the end. This would require something like the types being parametric over their vtable type (or just "hardcoded" support for C++ inheritance) in Ghidra.
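
To make the embedding concrete, here is a simplified Python `ctypes` model of a child object that contains two parent objects, each with its own vptr. The type names are hypothetical, and the layout ignores details such as tail-padding reuse and virtual bases, so treat it as an approximation rather than the exact Itanium ABI layout.

```python
import ctypes


class ParentA(ctypes.Structure):
    _fields_ = [("vptr", ctypes.c_void_p), ("a", ctypes.c_int)]


class ParentB(ctypes.Structure):
    _fields_ = [("vptr", ctypes.c_void_p), ("b", ctypes.c_int)]


class Child(ctypes.Structure):
    # Whole parent objects are embedded; Child has no separate vptr of its own.
    _fields_ = [("base_a", ParentA), ("base_b", ParentB), ("c", ctypes.c_int)]


def vptr_offsets(struct_type, base=0):
    """Return the offset of every vptr, recursing into embedded structures."""
    offsets = []
    for name, field_type in struct_type._fields_:
        offset = base + getattr(struct_type, name).offset
        if name == "vptr":
            offsets.append(offset)
        elif issubclass(field_type, ctypes.Structure):
            offsets.extend(vptr_offsets(field_type, offset))
    return offsets


child_vptrs = vptr_offsets(Child)
```

In this model the vptr at offset 0 belongs to the embedded `ParentA`, yet it is also the vptr of the whole `Child`. That is why a field typed as a pointer to `ParentA`'s vtable cannot describe the extra function pointers appended for `Child`.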

Some people from the CMU have written a framework for static analysis of object-oriented code, including a tool for recovering classes and methods and a corresponding Ghidra plugin. It looks much more comprehensive than my script and should be the best way going forward.

But it cannot compile on Windows.

I have been doing what @BhaaLseN said: typing function definitions (names and types), using them in a vtable struct definition and applying that to the vtable data. This works nicely, and the decompiler is able to infer the arguments and their types.

However, I have a roughly 100MB binary in which I have reversed almost all function definitions, names and types. Manually creating vtables for a binary with over 100k functions is not fun...

If only there was a plugin or script where you select the vtable pointer array, hit F2, and it auto-creates a vtable structure for you using the pointee types that you have already reversed!

However, creating a vtable data structure from an array of function pointers creates a structure like this:

image

What is bad is that the functions are already typed correctly, so this is a lot of manual labor for very low reward.

Small example

Decided to compile a small example to see what ghidra's decompiler is able to recover:

image

1) Inserting function definitions in the Data Type Manager

image
image
image
image

2) Replacing pointers with function definitions in the vtable and vtable placeholder

image

image

image

Finished product:

image
image

A bit disappointed:

  • The aes_p->mode enumeration makes the decompiler choke. On other samples, enumerations work fine most of the time. Is this a bug?
  • aes_init(): the returned data could have been the first field of the struct, but it is not!
  • main: return type and stack variables recovered OK.

image

Here are the source, binary, makefile and Ghidra project file (.gzf):
aes.tar.gz

@pabx06 That is one heck of a write-up. That will make it easier to follow your issue.

I can't agree more. Typing vtable types is so much labor. It would be nice to have a script that pulls the function definitions and names from the selected pointers and creates a vtable. Imagine a 100MB sample to reverse...
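
A minimal sketch of what such a script could compute, written as plain Python rather than against the Ghidra API: given the pointer values in a vtable and a table of functions that were already typed, it renders a C struct declaration with one typed function pointer per slot. The helper name `vtable_struct_decl`, the addresses and the `known_functions` table format are all hypothetical.

```python
def vtable_struct_decl(struct_name, slots, known_functions):
    """Render a C struct with one function pointer member per vtable slot.

    slots: list of function addresses read from the vtable.
    known_functions: address -> (name, return_type, [parameter declarations]).
    """
    lines = [f"struct {struct_name} {{"]
    for index, address in enumerate(slots):
        info = known_functions.get(address)
        if info is None:
            # Nothing recovered for this slot: fall back to an untyped pointer.
            lines.append(f"    void *slot_{index};")
            continue
        name, return_type, params = info
        member = name.replace("::", "_")  # "::" is not valid in a C member name
        lines.append(f"    {return_type} (*{member})({', '.join(params)});")
    lines.append("};")
    return "\n".join(lines)


# Hypothetical input: slot 0 is already typed, slot 1 is unknown.
known = {
    0x401000: ("MyClass::MyMethod", "void", ["MyClass *this", "int param1", "int param2"]),
}
declaration = vtable_struct_decl("vtable_MyClass", [0x401000, 0x402000], known)
```

In a real Ghidra script the same loop would read the selected pointers from the listing and build structure components from the existing function signatures instead of emitting text.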
