Ghidra: Ghidra doesn't work well against dropbox binary

Created on 30 May 2019 · 8 Comments · Source: NationalSecurityAgency/ghidra

Summary: Ghidra 9.0.4 doesn't work well against the dropbox binary.

Versions used: Ghidra 9.0.4 on Ubuntu 18.04 LTS host.

Target binary: https://clientupdates.dropboxstatic.com/dbx-releng/client/dropbox-lnx.x86_64-73.4.118.tar.gz (open the 23 MB dropbox binary contained in this archive).

Reproduction steps: Navigate to the _PyEval_EvalFrameDefault function and compare the Ghidra disassembly with the corresponding disassembly from IDA Freeware.

Problem: It seems that Ghidra is unable to parse this function completely (the disassembly is incomplete). It also doesn't recover the switch case labels.

I am attaching a screenshot from IDA Freeware for this function. IDA Freeware works very well (and automatically) against this function and is able to recover all the switch case labels.


Is there a way to get similar results from Ghidra?

I am hoping to use Ghidra in my work, if possible.

Bug


All 8 comments

I would turn off applying DWARF information when you run analysis. There appears to be an issue with the information in the file. I can't tell if IDA is suffering from the same issue, but it is possible.

Switch statements are recovered by the decompiler, and that particular routine is massive. By default the decompiler times out after 30 seconds when recovering switches; you can increase that limit. There can also be issues with the size of the function information the decompiler returns, so you may need to increase the decompiler payload size under Edit -> Tool Options... -> Decompiler.

There are other ways to recover switches if the decompiler fails, but we haven't needed them in quite a while because the decompiler is usually very good at recovering switches, and it also recovers why each switch case was taken. This looks like some sort of dynamic dispatch to other actual functions, which is why it takes so long to decompile and why recovery of the switch statement times out. If the decompiler does eventually return, I would advise raising the timeout on the DecompilerSwitch analyzer above 30 seconds. I may be able to give you a script that recovers the switch quickly; it is something we've been thinking of putting back into the analysis for the odd case where the decompiler fails. I'll take a look and see if something can be done in the meantime.

A minor issue, but decoding of the VPROTQ instruction appears to be missing as well.

After raising the decompiler timeout to 1000 seconds and the payload limit to 100 MB, I was able to get the routine _PyEval_EvalFrameDefault to decompile, although it may contain many sub-switch statements that remain unrecovered. I need to look at them to see. I haven't yet discovered why the switch is not recovering, but will continue looking.

The issue is the use of very large switch statements instead of functions. Without looking too deeply, I imagine this is generated code, possibly compiled Python?

Thank you for looking into this :+1:

This problematic function in the Dropbox binary actually comes from the Python 3.7.x source tarball (3.7.2, I think). The binary was built with GCC 4.8.4.

This is how the function looks in its original source form: https://github.com/python/cpython/blob/e042a4553efd0ceca2234f68a4f1878f2ca04973/Python/ceval.c#L2677

https://github.com/kholia/dedrop/tree/master/src/dedrop-ng has some notes on reversing this binary with IDA. I am using this script to recover the switch case labels.

I was going to ask what level of hand analysis you are interested in doing. Obviously everyone wants push-button automation, but sometimes a little manual help is necessary unless the end goal is complete automation. I think a fairly simple script, or hand analysis, would recover the code. The question is: what next?

The source code explains some of the issues: its authors really like gotos and inlining duplicated code. I think it would gain some speed if the code were collapsed, but I suppose they were trying to avoid stack handling?

There are two key switch tables that lead to most of the non-disassembled code. They can be disassembled by hand fairly easily; it all depends on what outcome you are looking for:
Recovering the original C code with the decompiler?
Recovering references?
Recovering the Python bytecode and ignoring the driver?

The heuristic switch recovery should be added back in for cases where the code is too massive for the decompiler to handle in a reasonable time. We can look at faster switch recovery options in the decompiler, but that won't help right now.
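The following is a rough Python sketch, run outside Ghidra, of what such a heuristic could look like. It assumes the common GCC layout for x86-64 position-independent switch tables, in which each entry is a signed 32-bit offset from the start of the table (target = table address + entry). It also assumes `image` is a flat memory image where address `image_base + i` lives at byte `i`. The case count is treated as unknown, so the sketch keeps reading entries until a target falls outside the function's bounds. `recover_jump_table` and its parameters are hypothetical names, not part of Ghidra's API.

```python
import struct


def recover_jump_table(image, image_base, table_addr, func_start, func_end,
                       max_entries=1024):
    """Heuristically decode a jump table of signed 32-bit, table-relative
    offsets (assumed GCC x86-64 PIC layout). Stops at the first entry whose
    target lies outside [func_start, func_end) or past the end of `image`."""
    targets = []
    for i in range(max_entries):
        off = table_addr - image_base + 4 * i  # byte offset of entry i in image
        if off < 0 or off + 4 > len(image):
            break
        (rel,) = struct.unpack_from("<i", image, off)  # little-endian int32
        target = table_addr + rel
        if not func_start <= target < func_end:
            break  # heuristic: an out-of-function target ends the table
        targets.append(target)
    return targets


# Hypothetical example: three valid entries at 0x1000, then a bogus one.
image = bytes(0x1000) + struct.pack("<4i", 0x40, 0x80, 0x40, 0x7FFF0000)
targets = recover_jump_table(image, 0, 0x1000, 0x1000, 0x2000)
print([hex(t) for t in targets])  # ['0x1040', '0x1080', '0x1040']
```

Stopping at the first out-of-range target is only a heuristic. When the bounds check before the indirect jump is visible (for example `cmp eax, N` followed by `ja default`), using that N as the entry count is usually more reliable.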

I would like to recover the switch case labels with some automation (full automation is not a must), e.g. a simple script.
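Once the jump targets are known, producing IDA-style case labels is mostly bookkeeping: you group the case values by the address they jump to. The sketch below is illustrative only. It assumes `targets[i]` is the jump target for case value `first_case + i`, and that entries going to the default target, if that target is known, should stay unlabeled. `label_cases` is a hypothetical helper. Applying the resulting labels inside Ghidra would still require its scripting API, which is not shown here.

```python
from collections import defaultdict


def label_cases(targets, first_case=0, default_target=None):
    """Group case values by jump target and build labels such as "case 0, 2".

    Assumes targets[i] is the jump target for case value first_case + i.
    Entries equal to default_target are skipped (left as the default case).
    """
    cases_by_target = defaultdict(list)
    for i, target in enumerate(targets):
        if target == default_target:
            continue
        cases_by_target[target].append(first_case + i)
    # Sort by address so the labels come out in code order.
    return {
        target: "case " + ", ".join(str(v) for v in values)
        for target, values in sorted(cases_by_target.items())
    }


labels = label_cases([0x1040, 0x1080, 0x1040])
for addr, label in labels.items():
    print(hex(addr), label)  # prints "0x1040 case 0, 2", then "0x1080 case 1"
```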

It would be great to bring back this "heuristic switch recovery" thing.

For now, I will try turning off the DWARF analysis option. Thanks!

Another thing you can experiment with is changing the image base when importing the binary. This binary is PIC, and by default Ghidra moves it from 0x00000 to 0x100000, but the DWARF handling doesn't account for that shift.
If you change the image base back to 0 (which can cause its own small difficulties elsewhere in Ghidra), the DWARF data will line up.
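The mismatch comes down to simple offset arithmetic. Assume the binary is linked at base 0 and Ghidra loads it at 0x100000. Every address recorded in the DWARF data then has to be shifted by the difference between the two bases. If that shift is skipped, the debug information lands 0x100000 bytes below the code it actually describes. The DWARF address below is a made-up value for illustration, and `rebase` is a hypothetical helper.

```python
# Assumed values: link-time base 0 (typical for PIC), Ghidra's default 0x100000.
LINK_BASE = 0x0
GHIDRA_BASE = 0x100000


def rebase(addr, link_base=LINK_BASE, load_base=GHIDRA_BASE):
    """Map an address from the link-time base to the base used in Ghidra."""
    return addr - link_base + load_base


dwarf_low_pc = 0x4A1230  # hypothetical DW_AT_low_pc of some function
print(hex(rebase(dwarf_low_pc)))               # 0x5a1230: where the code sits in Ghidra
print(hex(rebase(dwarf_low_pc, load_base=0)))  # 0x4a1230: image base 0, no shift needed
```

With the image base set back to 0, the shift is zero, which is why the DWARF data lines up in that case.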

> If you change the image base back to 0 (which can cause its own small difficulties for other things in Ghidra), the dwarf data will line up.

Doing so helped a lot.


The disassembly is no longer broken, and it seems that the switch case labels were also recovered.

Time to learn some Ghidra scripting to port this IDA script to Ghidra.

Thank you!
