Ghidra: Possible to access higher-level IR in decompilation process?

Created on 5 Sep 2019 · 3 Comments · Source: NationalSecurityAgency/ghidra

This question comes to you in two parts:

  1. Is the HighFunction we see in Ghidra's source another form of IR? Is it possible to access that representation, or is PCode the only IR involved in the decompilation process?
  2. If a higher-level IR exists, can we print the IR as well?

Thanks!

Question

Most helpful comment

Yes, it is another form of IR.

The HighFunction object contains the AST (abstract syntax tree) built by the decompiler, which is Ghidra's IR. You can't easily modify it by hand, but you can retrieve and inspect it by using a DecompInterface object to fetch a DecompileResults object. The HighFunction returned by DecompileResults.getHighFunction() is full of good information, including the full set of varnodes, the list of basic blocks, the jump tables, etc. To find out everything it can give you, though, you also need to experiment with passing a DecompileOptions object to the DecompInterface instance via its setOptions method.

I'd recommend looking at the full API for both, but it's trivial to get the basic DecompileResults and HighFunction objects in the Jython interpreter. Just open a Python window, then do the following:

>>> from ghidra.app.decompiler import DecompInterface
>>> ifc = DecompInterface()
>>> ifc.openProgram(currentProgram)
True
>>> timeout_secs = 60
>>> results = ifc.decompileFunction(getFunctionContaining(currentAddress), timeout_secs, monitor)
>>> results.__class__
<type 'ghidra.app.decompiler.DecompileResults'>
>>> high_func = results.getHighFunction()
>>> high_func.__class__
<type 'ghidra.program.model.pcode.HighFunction'>
>>> high_func.getBasicBlocks()
[basic@00000010]
>>> high_func.getVarnodes(currentAddress)
java.util.TreeMap$NavigableSubMap$SubMapKeyIterator@62fa9b87
>>> results.decompiledFunction.getC()
u'\nvoid _init(void)\n\n{\n  undefined *puVar1;\n  \n  FUN_00000040();\n  puVar1 = PTR_DAT_0000003c;\n  *(undefined4 *)(PTR_DAT_0000003c + -4) = 0;\n  ...[trimmed to save space]
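
The getVarnodes call above prints only an object identity because it returns a Java Iterator, not a list. Here is a minimal sketch of one way to look inside it, and to print the IR asked about in the question: drain the iterator into a list and stringify each element. The helper names drain and dump_high_pcode are made up for this example, and I'm assuming getPcodeOps() on the HighFunction returns a Java Iterator over the function's p-code ops, so check the API docs for your Ghidra version.

```python
def drain(java_iter):
    """Collect everything from a java.util.Iterator-style object into a list.

    Only relies on hasNext() and next(), which Java iterators expose.
    """
    items = []
    while java_iter.hasNext():
        items.append(java_iter.next())
    return items


def dump_high_pcode(high_func):
    """Return one string per p-code op in the HighFunction's syntax tree.

    Assumption: high_func.getPcodeOps() returns an iterator of p-code ops.
    """
    return [str(op) for op in drain(high_func.getPcodeOps())]


# Inside Ghidra's Python window, continuing the session above:
#   for line in dump_high_pcode(high_func):
#       print(line)
#   for vn in drain(high_func.getVarnodes(currentAddress)):
#       print(vn)
```

Draining into a list first costs some memory on big functions, but it lets you index, count, and re-read the ops, which a one-shot iterator won't.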

By default, the decompiler is configured to yield only the basic information, I think to make communication between Swing and the decompiler faster. You can change that: call .getOptions() on the interface object to get the DecompileOptions object, set the fields you want the decompiler to populate, then call .setOptions(<modified options object>) to reconfigure the decompiler with the new options.
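
A rough sketch of that reconfiguration step, under some assumptions. Instead of fetching the options with .getOptions(), this variation takes a DecompileOptions object you construct yourself and fills it from the program's settings. The function name configure_decompiler is hypothetical, and grabFromProgram, toggleSyntaxTree and toggleCCode are the calls I believe exist on DecompileOptions and DecompInterface, so verify them against the API docs for your Ghidra version.

```python
def configure_decompiler(ifc, options, program):
    """Hand an options object to the interface and request full output.

    ifc     -- a DecompInterface instance
    options -- a DecompileOptions instance
    program -- the Program being decompiled (e.g. currentProgram)
    """
    # Assumption: copies the program's saved decompiler settings into options.
    options.grabFromProgram(program)
    # Reconfigure the interface with the (possibly modified) options object.
    ifc.setOptions(options)
    # Assumption: these toggles ask for the syntax tree and C text in results.
    ifc.toggleSyntaxTree(True)
    ifc.toggleCCode(True)
    return ifc


# Inside Ghidra's Python window:
#   from ghidra.app.decompiler import DecompInterface, DecompileOptions
#   ifc = configure_decompiler(DecompInterface(), DecompileOptions(), currentProgram)
#   ifc.openProgram(currentProgram)
```

Set any individual fields on the options object before passing it in; the function only wires the pieces together.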

I've played with scripting the decompiler to a decent extent at this point, fwiw, and I'm happy to answer any other questions you might have on this as they come up, so don't hesitate to ask.

Thanks for your detailed response @hedgeberg!