Ghidra: how to get references for highsymbols

Created on 6 Aug 2020 · 2 Comments · Source: NationalSecurityAgency/ghidra

Hello,

I asked the following question: https://github.com/NationalSecurityAgency/ghidra/issues/2143, and @cetfor gave a very good answer. My follow-up question: is there any way to get the references (or the accessing instructions) for those variables (or high symbols)? For example, I can get references for ordinary variables using:

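refmanager = currentProgram.getReferenceManager()
variables = function.getAllVariables()
for variable in variables:
    # using the reference manager
    refs = refmanager.getReferencesTo(variable)
    # or, alternatively, via the variable's symbol
    refs = variable.getSymbol().getReferences()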

But I want the references for variables or symbols obtained from the decompiler interface (as in @cetfor's answer). Thanks so much in advance; I appreciate your attention.

Most helpful comment

Hey @Ruturaj4,

This is not as direct a response as you may want, so please bear with me here. You can use the symbols to get High Variables, then call getInstances() on each High Variable, then getDescendants() on each instance. This looks really confusing and is probably not what you expect, so let's look at an example.

Consider this decompiled code generated by Ghidra:

undefined8 func(int param_1,int param_2)

{
  long in_FS_OFFSET;
  uint auStack88 [8];
  undefined4 auStack56 [10];
  long local_10;

  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  auStack56[param_1] = 1;
  printf("%d\n",(ulong)auStack88[param_2]);
  if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return 0;
}

We can get the instances of the symbols, and descendants of the instances with something like this:

from ghidra.app.decompiler import DecompileOptions
from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

func_name = "func"
func = getGlobalFunctions(func_name)[0]
options = DecompileOptions()
monitor = ConsoleTaskMonitor()
ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(func.getProgram())
res = ifc.decompileFunction(func, 60, monitor)
high_func = res.getHighFunction()
lsm = high_func.getLocalSymbolMap()
symbols = lsm.getSymbols()

for i, symbol in enumerate(symbols):
    print("\nSymbol {}:".format(i+1))
    print("  name:         {}".format(symbol.name))
    print("  dataType:     {}".format(symbol.dataType))
    hs = symbol.getHighVariable()  # note important part here
    instances = hs.getInstances()  # note important part here
    for instance in instances:
        print("\n  instance:     {}".format(instance))
        print("  type:         {}".format(type(instance)))
        print("  uniqueID:     {}".format(instance.uniqueId))
        print("  PCAddress:    {}".format(instance.getPCAddress()))
        for desc in instance.getDescendants():
            print("  Descendant:   {}".format(desc))

Which will return this:

Symbol 1:
  name:         auStack56
  dataType:     undefined4[10]

  instance:     (stack, 0xffffffffffffffc8, 40)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     0
  PCAddress:    NO ADDRESS

Symbol 2:
  name:         auStack88
  dataType:     uint[8]

  instance:     (stack, 0xffffffffffffffa8, 32)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     1
  PCAddress:    NO ADDRESS

Symbol 3:
  name:         in_FS_OFFSET
  dataType:     long

  instance:     (register, 0x110, 8)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     303
  PCAddress:    001006eb
  Descendant:   (unique, 0x100000a3, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)

  instance:     (register, 0x110, 8)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     348
  PCAddress:    NO ADDRESS
  Descendant:   (unique, 0x1000009b, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)
  Descendant:   (register, 0x110, 8) INDIRECT (register, 0x110, 8) , (const, 0x36, 4)

Symbol 4:
  name:         local_10
  dataType:     long

  instance:     (stack, 0xfffffffffffffff0, 8)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     409
  PCAddress:    001006c1
  Descendant:   (stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x24, 4)

  instance:     (stack, 0xfffffffffffffff0, 8)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     465
  PCAddress:    001006cc
  Descendant:   (stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x36, 4)

  instance:     (stack, 0xfffffffffffffff0, 8)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     458
  PCAddress:    001006eb
  Descendant:   (register, 0x206, 1) INT_NOTEQUAL (stack, 0xfffffffffffffff0, 8) , (unique, 0x1ff0, 8)
  Descendant:   (stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x45, 4)

  instance:     (stack, 0xfffffffffffffff0, 8)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     462
  PCAddress:    00100704

Symbol 5:
  name:         param_1
  dataType:     int

  instance:     (register, 0x38, 4)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     367
  PCAddress:    NO ADDRESS
  Descendant:   (register, 0x0, 8) INT_SEXT (register, 0x38, 4)

Symbol 6:
  name:         param_2
  dataType:     int

  instance:     (register, 0x30, 4)
  type:         <type 'ghidra.program.model.pcode.VarnodeAST'>
  uniqueID:     369
  PCAddress:    NO ADDRESS
  Descendant:   (register, 0x0, 8) INT_SEXT (register, 0x30, 4)

The instances and descendants are probably not what you'd expect: they are VarnodeAST and PcodeOpAST objects, respectively. They play a really important role, though, in the underlying pcode that helped generate this decompiled code. We can dump this "refined pcode" of PcodeOpAST objects for the function like this:

from ghidra.util.task import ConsoleTaskMonitor
from ghidra.app.decompiler import DecompileOptions, DecompInterface

# == helper functions =============================================================================
def get_high_function(func):
    options = DecompileOptions()
    monitor = ConsoleTaskMonitor()
    ifc = DecompInterface()
    ifc.setOptions(options)
    ifc.openProgram(getCurrentProgram())
    # Setting a simplification style will strip useful `indirect` information.
    # Please don't use this unless you know why you're using it.
    #ifc.setSimplificationStyle("normalize") 
    res = ifc.decompileFunction(func, 60, monitor)
    high = res.getHighFunction()
    return high

def dump_refined_pcode(func, high_func):
    # Iterate over every PcodeOpAST in the decompiled function.
    opiter = high_func.getPcodeOps()
    while opiter.hasNext():
        op = opiter.next()
        print("{}".format(op.toString()))

# == run examples =================================================================================
func = getGlobalFunctions("func")[0]    # assumes only one function named `func`
hf = get_high_function(func)            # we need a high function from the decompiler
dump_refined_pcode(func, hf)            # dump straight refined pcode as strings

This will dump the equivalent pcode for the decompiled code:

(unique, 0x1000009b, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)
(unique, 0x1ff0, 8) LOAD (const, 0x1b1, 4) , (unique, 0x9e0, 8)
(unique, 0x9e0, 8) CAST (unique, 0x1000009b, 8)
(stack, 0xfffffffffffffff0, 8) COPY (unique, 0x1ff0, 8)
(register, 0x0, 8) INT_SEXT (register, 0x38, 4)
 ---  STORE (const, 0x1b1, 4) , (unique, 0x720, 8) , (const, 0x1, 4)
(stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x24, 4)
(unique, 0x1000005b, 8) PTRSUB (register, 0x20, 8) , (const, 0xffffffffffffffc8, 8)
(unique, 0x720, 8) PTRADD (unique, 0x1000005b, 8) , (register, 0x0, 8) , (const, 0x4, 8)
(register, 0x0, 8) INT_SEXT (register, 0x30, 4)
(unique, 0x1fd0, 4) LOAD (const, 0x1b1, 4) , (unique, 0x720, 8)
(unique, 0x1000007b, 8) PTRSUB (register, 0x20, 8) , (const, 0xffffffffffffffa8, 8)
(unique, 0x720, 8) PTRADD (unique, 0x1000007b, 8) , (register, 0x0, 8) , (const, 0x4, 8)
(register, 0x30, 8) INT_ZEXT (unique, 0x1fd0, 4)
 ---  CALL (ram, 0x100580, 8) , (unique, 0x10000043, 8) , (register, 0x30, 8)
(register, 0x110, 8) INDIRECT (register, 0x110, 8) , (const, 0x36, 4)
(stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x36, 4)
(unique, 0x10000043, 8) COPY (const, 0x1007b4, 8)
(register, 0x0, 8) COPY (const, 0x0, 8)
(unique, 0x100000a3, 8) INT_ADD (register, 0x110, 8) , (const, 0x28, 8)
(unique, 0x1ff0, 8) LOAD (const, 0x1b1, 4) , (unique, 0x9e0, 8)
(register, 0x206, 1) INT_NOTEQUAL (stack, 0xfffffffffffffff0, 8) , (unique, 0x1ff0, 8)
(unique, 0x9e0, 8) CAST (unique, 0x100000a3, 8)
 ---  CBRANCH (ram, 0x100709, 1) , (register, 0x206, 1)
 ---  CALL (ram, 0x100570, 8)
 ---  RETURN (const, 0x1, 4)
(stack, 0xfffffffffffffff0, 8) INDIRECT (stack, 0xfffffffffffffff0, 8) , (const, 0x45, 4)
 ---  RETURN (const, 0x0, 8) , (register, 0x0, 8)

If you glance back and forth between this "refined pcode" and the decompiled code, you'll be able to see the correlation. Note that this pcode is very different from the "raw pcode" you see in the GUI because it has been heavily processed by Ghidra.
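For comparison, the raw pcode can also be listed per instruction from a script. This is only a minimal sketch: it assumes the `func` object from the script above and the `currentProgram` global that Ghidra injects into scripts, and `raw_pcode_lines` is a made-up helper name, not part of the Ghidra API.

```python
def raw_pcode_lines(listing, body):
    """Return one text line per raw pcode op, prefixed by its instruction address."""
    lines = []
    # Listing.getInstructions(body, True) walks the instructions in address order.
    for instr in listing.getInstructions(body, True):
        # Instruction.getPcode() returns the raw, unsimplified pcode ops.
        for op in instr.getPcode():
            lines.append("{}  {}".format(instr.getAddress(), op))
    return lines

# In a Ghidra script, with `func` from the script above:
#   for line in raw_pcode_lines(currentProgram.getListing(), func.getBody()):
#       print(line)
```

Putting this output next to the refined dump should make it easier to see how much the decompiler has rewritten.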

Each variable can appear in multiple locations (these are the instances), and each instance can be read by multiple pcode operations (these are the descendants).
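As a rough sketch of how that could be turned into something reference-like: collect, per symbol, the address of the op that defines each instance and the addresses of the ops that read it. Treating the defining op as a "write" and the descendants as "reads" is an interpretation, not Ghidra terminology, and `collect_accesses` and `_site` are hypothetical helper names. Note that INDIRECT ops mark possible side effects (for example across a call) rather than real accesses, so you may want to filter them out by mnemonic.

```python
def _site(op):
    """Describe a pcode op as (instruction address, mnemonic)."""
    return (str(op.getSeqnum().getTarget()), op.getMnemonic())

def collect_accesses(high_func):
    """Map each decompiler symbol name to the ops that write or read it."""
    accesses = {}
    for symbol in high_func.getLocalSymbolMap().getSymbols():
        high_var = symbol.getHighVariable()
        if high_var is None:  # symbol with no variable attached
            continue
        writes, reads = set(), set()
        for instance in high_var.getInstances():
            def_op = instance.getDef()  # None when the instance is an input
            if def_op is not None:
                writes.add(_site(def_op))
            for op in instance.getDescendants():  # ops taking this instance as input
                reads.add(_site(op))
        accesses[symbol.getName()] = {"writes": sorted(writes), "reads": sorted(reads)}
    return accesses

# In a Ghidra script, with `high_func` from the first script above:
#   for name, acc in collect_accesses(high_func).items():
#       print(name, acc)
```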

So, like I said, this is probably not what you expected, but this is the best answer I know how to give. The pseudo C that Ghidra shows comes from these PcodeOpAST objects, and I'm not sure how much processing can really be done at the pseudo C level. I believe it all has to happen at the PcodeOpAST level.

This is great, @cetfor. Thanks so much for your answer. Maybe I will have to link the pcode and the disassembly in some way to get the corresponding disassembly references for the variables seen in the decompiler interface.
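One way that linking might look, sketched under assumptions: each PcodeOpAST carries a sequence number whose target is the address of the machine instruction it was generated from, so that address can be looked up in the program listing. `instructions_for` is a hypothetical helper name; the listing calls used are `getInstructionAt` and `getInstructionContaining`.

```python
def instructions_for(listing, addresses):
    """Return (address, instruction text) pairs; text is None if nothing is found."""
    results = []
    for addr in addresses:
        instr = listing.getInstructionAt(addr)
        if instr is None:
            # Fall back in case the address lies inside an instruction.
            instr = listing.getInstructionContaining(addr)
        text = str(instr) if instr is not None else None
        results.append((str(addr), text))
    return results

# In a Ghidra script, for a descendant `desc` from the loop in the answer:
#   addr = desc.getSeqnum().getTarget()
#   print(instructions_for(currentProgram.getListing(), [addr]))
```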
