Ghidra: Is there any way to get predicted variables using script?

Created on 27 Jul 2020 · 2 Comments · Source: NationalSecurityAgency/ghidra

I have a simple program:

#include <stdio.h>

int func(int i, int j);

int main()
{
  int a;
  a = func(15, 3);
  return a;
}

int func(int i, int j)
{
  int b1[5], b2[10];

  b2[i] = 1;
  printf("%d\n", b1[j]);

  return 0;
}

I am using a Python script to get the local variables from a stripped binary compiled from the program above.

I use function.getLocalVariables(), or something like function.getStackFrame().getStackVariables(), to get the local variables. Interestingly, this script doesn't give me all the variables that can be seen in the decompiler window. For example, in the case above, I get the following in the decompiler window (for the function func):

[image: decompiler window output for func]

Here, the predicted buffers can be seen. But instead I get:

array(ghidra.program.model.listing.Variable, [[undefined4 local_5c@Stack[-0x5c]:4], [undefined4 local_60@Stack[-0x60]:4]])

which are clearly not the predicted buffers. Is there any way to get those buffers?

Most helpful comment

Hey @Ruturaj4,

What you're asking for is possible. You need to use the decompiler interface to get that information. Here's an example using Python:

from ghidra.app.decompiler import DecompileOptions
from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

# Look up the function by name (replace with the name of your function)
name = "myFunctionName"
func = getGlobalFunctions(name)[0]

# Set up the decompiler for the program that contains the function
options = DecompileOptions()
monitor = ConsoleTaskMonitor()
ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(func.getProgram())

# Decompile with a 60-second timeout, then fetch the decompiler's local symbols
res = ifc.decompileFunction(func, 60, monitor)
high_func = res.getHighFunction()
lsm = high_func.getLocalSymbolMap()
symbols = lsm.getSymbols()

# Print each symbol's name and its size in bytes
for i, symbol in enumerate(symbols):
    print("Symbol {}: {} (size: {})".format(i+1, symbol.getName(), symbol.getSize()))

And here's an example output:

Symbol 1: auStack56 (size: 40)
Symbol 2: auStack88 (size: 32)
Symbol 3: in_FS_OFFSET (size: 8)
Symbol 4: local_10 (size: 8)
Symbol 5: param_1 (size: 4)
Symbol 6: param_2 (size: 4)

Note that the sizes returned here are in bytes, so something like undefined4 auStack88 [12] will return size: 48 (12 * 4). Use print(dir(symbol)) to see what else you can get from these symbols. Everything you're looking for should be there.
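The byte-size arithmetic above can also be run in reverse to recover an element count. This is only a minimal sketch, assuming you already know the element width (for example 4 for undefined4); the helper name element_count is hypothetical and not part of the Ghidra API:

```python
def element_count(total_bytes, element_bytes):
    """Return how many elements of element_bytes bytes fit in total_bytes."""
    if element_bytes <= 0:
        raise ValueError("element size must be positive")
    if total_bytes % element_bytes != 0:
        raise ValueError("size is not a whole multiple of the element size")
    return total_bytes // element_bytes


# undefined4 auStack88 [12] is reported with size 48: 48 bytes / 4 bytes per element
print(element_count(48, 4))
```

Inside a script, the symbol's data type (symbol.getDataType()) may already carry the element information if the decompiler typed the symbol as an array, so treat this helper as a fallback.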

All 2 comments

I smell entropy here.
Variadic functions like printf, snprintf, ... take a format string that leaks entropy:
1) The number of parameters
2) The types of the parameters, which could be propagated.

For example: printf("%d", auStack40[param_2]) => signed int => auStack40[param_2] is a signed int => auStack40 is an array of signed int. But the decompiler analysed the variables and chose uint.
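To illustrate the idea of reading argument types out of a format string, here is a rough sketch in plain Python. It is not something Ghidra does or exposes; the function name infer_printf_arg_types is hypothetical, and only a subset of conversions is modeled (the length modifiers hh, h, j, z and t are deliberately rejected rather than guessed):

```python
import re

# One printf conversion: flags, width, precision, length modifier, conversion char.
_CONVERSION = re.compile(
    r"%[-+ #0]*(?P<width>\*|\d+)?(?:\.(?P<prec>\*|\d*))?"
    r"(?P<length>hh|h|ll|l|j|z|t|L)?(?P<conv>[diouxXeEfFgGaAcspn%])"
)

_SIGNED = {"": "int", "l": "long", "ll": "long long"}
_UNSIGNED = {"": "unsigned int", "l": "unsigned long", "ll": "unsigned long long"}
_FLOAT = "eEfFgGaA"
_OTHER = {"c": "int", "s": "char *", "p": "void *", "n": "int *"}


def infer_printf_arg_types(fmt):
    """Return the C argument types a printf-style format string expects, in order."""
    types = []
    for m in _CONVERSION.finditer(fmt):
        conv = m.group("conv")
        if conv == "%":
            # "%%" prints a literal percent sign and consumes no argument
            continue
        # A '*' width or precision each consumes one extra int argument
        if m.group("width") == "*":
            types.append("int")
        if m.group("prec") == "*":
            types.append("int")
        length = m.group("length") or ""
        if conv in "di" and length in _SIGNED:
            types.append(_SIGNED[length])
        elif conv in "ouxX" and length in _UNSIGNED:
            types.append(_UNSIGNED[length])
        elif conv in _FLOAT and length in ("", "l", "L"):
            types.append("long double" if length == "L" else "double")
        elif conv in _OTHER and length == "":
            types.append(_OTHER[conv])
        else:
            # Not modeled in this sketch; fail loudly instead of guessing a type
            raise NotImplementedError("unmodeled conversion: " + m.group(0))
    return types


# The format string from the program in the question
print(infer_printf_arg_types("%d\n"))
```

The length of the returned list gives the number of variadic arguments and each entry gives a candidate type, which is the information the comment above suggests could be propagated back to the array.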
