Hello,
I asked the following question: https://github.com/NationalSecurityAgency/ghidra/issues/2143 and @cetfor gave a very good answer. My question is: is there any way to get the references (or accessing instructions) for those variables (or high symbols)? For example, I can get variable references using:
variables = function.getAllVariables()
for variable in variables:
    # using the reference manager
    refmanager.getReferencesTo(variable)
    # or alternatively
    variable.getSymbol().getReferences()
But I want the references for the variables or symbols that come from the decompiler interface (as in @cetfor's answer). Thanks so much in advance, and I appreciate your attention.
Hey @Ruturaj4,
This is not as direct a response as you might like, so please bear with me here. You can use the symbols to get High Variables, then call getInstances() on each High Variable, then getDescendants() on each instance. This looks really confusing and is probably not what you expect, so let's look at an example.
Consider this decompiled code generated by Ghidra:
undefined8 func(int param_1,int param_2)
{
long in_FS_OFFSET;
uint auStack88 [8];
undefined4 auStack56 [10];
long local_10;
local_10 = *(long *)(in_FS_OFFSET + 0x28);
auStack56[param_1] = 1;
printf("%d\n",(ulong)auStack88[param_2]);
if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
/* WARNING: Subroutine does not return */
__stack_chk_fail();
}
return 0;
}
We can get the instances of the symbols, and descendants of the instances with something like this:
from ghidra.app.decompiler import DecompileOptions
from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor
func_name = "func"
func = getGlobalFunctions(func_name)[0]
options = DecompileOptions()
monitor = ConsoleTaskMonitor()
ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(func.getProgram())
res = ifc.decompileFunction(func, 60, monitor)
high_func = res.getHighFunction()
lsm = high_func.getLocalSymbolMap()
symbols = lsm.getSymbols()
for i, symbol in enumerate(symbols):
    print("\nSymbol {}:".format(i+1))
    print(" name: {}".format(symbol.name))
    print(" dataType: {}".format(symbol.dataType))
    hs = symbol.getHighVariable() # note important part here
    instances = hs.getInstances() # note important part here
    for instance in instances:
        print("\n instance: {}".format(instance))
        print(" type: {}".format(type(instance)))
        print(" uniqueID: {}".format(instance.uniqueId))
        print(" PCAddress: {}".format(instance.getPCAddress()))
        for desc in instance.getDescendants():
            print(" Descendant: {}".format(desc))
Which will return this:
Symbol 1:
name: auStack56
dataType: undefined4[10]
instance: (stack, 0xffffffffffffffc8, 40)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 0
PCAddress: NO ADDRESS
Symbol 2:
name: auStack88
dataType: uint[8]
instance: (stack, 0xffffffffffffffa8, 32)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 1
PCAddress: NO ADDRESS
Symbol 3:
name: in_FS_OFFSET
dataType: long
instance: (register, 0x110, 8)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 303
PCAddress: 001006eb
Descendant: (unique, 0x100000a3, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)
instance: (register, 0x110, 8)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 348
PCAddress: NO ADDRESS
Descendant: (unique, 0x1000009b, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)
Descendant: (register, 0x110, 8) INDIRECT (register, 0x110, 8) , (const, 0x36, 4)
Symbol 4:
name: local_10
dataType: long
instance: (stack, 0xfffffffffffffff0, 8)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 409
PCAddress: 001006c1
Descendant: (stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x24, 4)
instance: (stack, 0xfffffffffffffff0, 8)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 465
PCAddress: 001006cc
Descendant: (stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x36, 4)
instance: (stack, 0xfffffffffffffff0, 8)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 458
PCAddress: 001006eb
Descendant: (register, 0x206, 1) INT_NOTEQUAL (stack, 0xfffffffffffffff0, 8) , (unique, 0x1ff0, 8)
Descendant: (stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x45, 4)
instance: (stack, 0xfffffffffffffff0, 8)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 462
PCAddress: 00100704
Symbol 5:
name: param_1
dataType: int
instance: (register, 0x38, 4)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 367
PCAddress: NO ADDRESS
Descendant: (register, 0x0, 8) INT_SEXT (register, 0x38, 4)
Symbol 6:
name: param_2
dataType: int
instance: (register, 0x30, 4)
type: <type 'ghidra.program.model.pcode.VarnodeAST'>
uniqueID: 369
PCAddress: NO ADDRESS
Descendant: (register, 0x0, 8) INT_SEXT (register, 0x30, 4)
The instances and descendants are probably not what you'd expect, as they are VarnodeAST and PcodeOpAST objects, but they play a really important role in the underlying pcode that helped generate this decompiled code. We can dump this "refined pcode" of PcodeOpAST objects for the function like this:
from ghidra.util.task import ConsoleTaskMonitor
from ghidra.app.decompiler import DecompileOptions, DecompInterface
# == helper functions =============================================================================
def get_high_function(func):
    options = DecompileOptions()
    monitor = ConsoleTaskMonitor()
    ifc = DecompInterface()
    ifc.setOptions(options)
    ifc.openProgram(getCurrentProgram())
    # Setting a simplification style will strip useful `indirect` information.
    # Please don't use this unless you know why you're using it.
    #ifc.setSimplificationStyle("normalize")
    res = ifc.decompileFunction(func, 60, monitor)
    high = res.getHighFunction()
    return high
def dump_refined_pcode(func, high_func):
    opiter = high_func.getPcodeOps()
    while opiter.hasNext():
        op = opiter.next()
        print("{}".format(op.toString()))
        print(type(op))
# == run examples =================================================================================
func = getGlobalFunctions("func")[0] # assumes only one function named `func`
hf = get_high_function(func) # we need a high function from the decompiler
dump_refined_pcode(func, hf) # dump straight refined pcode as strings
This will dump the equivalent pcode for the decompiled code:
(unique, 0x1000009b, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)
(unique, 0x1ff0, 8) LOAD (const, 0x1b1, 4) , (unique, 0x9e0, 8)
(unique, 0x9e0, 8) CAST (unique, 0x1000009b, 8)
(stack, 0xfffffffffffffff0, 8) COPY (unique, 0x1ff0, 8)
(register, 0x0, 8) INT_SEXT (register, 0x38, 4)
--- STORE (const, 0x1b1, 4) , (unique, 0x720, 8) , (const, 0x1, 4)
(stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x24, 4)
(unique, 0x1000005b, 8) PTRSUB (register, 0x20, 8) , (const, 0xffffffffffffffc8, 8)
(unique, 0x720, 8) PTRADD (unique, 0x1000005b, 8) , (register, 0x0, 8) , (const, 0x4, 8)
(register, 0x0, 8) INT_SEXT (register, 0x30, 4)
(unique, 0x1fd0, 4) LOAD (const, 0x1b1, 4) , (unique, 0x720, 8)
(unique, 0x1000007b, 8) PTRSUB (register, 0x20, 8) , (const, 0xffffffffffffffa8, 8)
(unique, 0x720, 8) PTRADD (unique, 0x1000007b, 8) , (register, 0x0, 8) , (const, 0x4, 8)
(register, 0x30, 8) INT_ZEXT (unique, 0x1fd0, 4)
--- CALL (ram, 0x100580, 8) , (unique, 0x10000043, 8) , (register, 0x30, 8)
(register, 0x110, 8) INDIRECT (register, 0x110, 8) , (const, 0x36, 4)
(stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x36, 4)
(unique, 0x10000043, 8) COPY (const, 0x1007b4, 8)
(register, 0x0, 8) COPY (const, 0x0, 8)
(unique, 0x100000a3, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)
(unique, 0x1ff0, 8) LOAD (const, 0x1b1, 4) , (unique, 0x9e0, 8)
(register, 0x206, 1) INT_NOTEQUAL (stack, 0xfffffffffffffff0, 8) , (unique, 0x1ff0, 8)
(unique, 0x9e0, 8) CAST (unique, 0x100000a3, 8)
--- CBRANCH (ram, 0x100709, 1) , (register, 0x206, 1)
--- CALL (ram, 0x100570, 8)
--- RETURN (const, 0x1, 4)
(stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x45, 4)
--- RETURN (const, 0x0, 8) , (register, 0x0, 8)
If you glance back and forth between this "refined pcode" and the decompiled code, you'll be able to see the correlation. Note that this pcode is very different from the "raw pcode" you see in the GUI because it's been heavily processed by Ghidra.
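To make that correlation a bit more concrete, here is a rough sketch in plain Python (not a Ghidra API) that parses dump lines in the format shown above and indexes which opcodes read each varnode. The helper name index_readers and the regular expression are my own assumptions for illustration. Matching on text also cannot tell apart two instances that print identically, so treat it as a reading aid only.

```python
import re
from collections import defaultdict

# Matches one varnode as printed in the dump, e.g. "(register, 0x110, 8)".
VARNODE = re.compile(r"\([a-z]+, 0x[0-9a-f]+, \d+\)")

def index_readers(dump_lines):
    """Map each input varnode string to the opcodes of the ops that read it.

    Hypothetical helper: it only understands the text format printed above.
    """
    readers = defaultdict(list)
    for line in dump_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("---"):
            # Op without an output varnode: "--- OPCODE in0 , in1 ..."
            rest = line[3:].strip()
        else:
            # "out OPCODE in0 , in1 ...": skip the leading output varnode
            match = VARNODE.match(line)
            if match is None:
                continue
            rest = line[match.end():].strip()
        parts = rest.split(None, 1)
        opcode = parts[0]
        inputs = VARNODE.findall(parts[1]) if len(parts) > 1 else []
        for varnode in inputs:
            readers[varnode].append(opcode)
    return dict(readers)

# A few lines copied from the dump above.
sample = [
    "(unique, 0x1000009b, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)",
    "(register, 0x110, 8) INDIRECT (register, 0x110, 8) , (const, 0x36, 4)",
    "(unique, 0x100000a3, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)",
    "--- CBRANCH (ram, 0x100709, 1) , (register, 0x206, 1)",
]
readers = index_readers(sample)
print(readers["(register, 0x110, 8)"])
```

The three readers found for (register, 0x110, 8) are the same three operations listed as descendants under in_FS_OFFSET in the earlier output, there split across its two instances.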
Each variable can appear in multiple locations (these are the instances), and each instance can be read by multiple pcode operations (these are the descendants).
So like I said, this is probably not what you expected, but this is the best answer I know how to give. The pseudo C Ghidra shows comes from these PcodeOpAST objects, and I'm not sure how much processing can really be done at the pseudo C level. I believe it all has to happen at the PcodeOpAST level.
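As a rough sketch of working at that level: each PcodeOp carries a sequence number whose target is the address of the instruction it was derived from, so you can collect addresses for a symbol by walking its instances. The function name use_addresses and the "write"/"read" labels are my own, not part of Ghidra. I'm also assuming that the defining op of an instance counts as a write and each descendant as a read, which is only approximate: INDIRECT ops, for example, model possible side effects rather than real accesses.

```python
def use_addresses(high_symbol):
    """Collect addresses of pcode ops that define or read a decompiler symbol.

    Hypothetical helper. Returns a sorted list of
    (address_string, kind, opcode_mnemonic) tuples.
    """
    high_var = high_symbol.getHighVariable()
    if high_var is None:
        # A symbol may have no high variable attached
        return []
    found = set()
    for varnode in high_var.getInstances():
        def_op = varnode.getDef()  # op that produces this instance, if any
        if def_op is not None:
            addr = str(def_op.getSeqnum().getTarget())
            found.add((addr, "write", def_op.getMnemonic()))
        for op in varnode.getDescendants():  # ops that take this instance as input
            addr = str(op.getSeqnum().getTarget())
            found.add((addr, "read", op.getMnemonic()))
    return sorted(found)

# Inside a Ghidra script, with high_func obtained as in the examples above:
# for symbol in high_func.getLocalSymbolMap().getSymbols():
#     print("{}: {}".format(symbol.getName(), use_addresses(symbol)))
```

The returned addresses could then be looked up in the listing to find the matching disassembly, though the decompiler rewrites the pcode heavily, so a given address is a hint rather than a guaranteed one-to-one reference.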
This is great, @cetfor. Thanks so much for your answer. Maybe I will have to link the pcode and the disassembly in some way to get the corresponding disassembly references for the variables seen in the decompiler interface.