Ghidra: Decompiler: x86-16 external functions are not being handled

Created on 10 Jul 2019  路  8Comments  路  Source: NationalSecurityAgency/ghidra

Describe the bug
In x86-16 files such as the NE (new executable) format for Windows where external symbols to DLLs are loaded, the decompiler for whatever reason is being informed of external references but not querying to get the details needed for naming and type information. Instead the raw decompilation contains names like func_0x009002();

To Reproduce
1) Also useful debugging tip - on Windows an API Monitoring tool like API Monitor v2 64-bit from www.rohitab.com (with Tools -> Options -> Memory -> Maximum size of captured buffers = 4096, and filter for Data Access and Storage -> Local File Systems -> File Management -> Kernel32.dll) monitoring Ghidra's javaw.exe can yield this interface details (in the child decompile.exe process):
WriteFile: getMappedSymbolsXML........<addr space="ram" offset="0x9002"/>
ReadFile: <result>.<parent>.<val/></parent>.<mapsym>.<externrefsymbol name="Ordinal_1_exref">.<addr space="ram" offset="0x9002"/></externrefsymbol>.<addr space="ram" offset="0x9002"/><rangelist/></mapsym>.</result>

Then the decompilation completes without anything further.

Expected behavior
The decompiler is supposed to send getExternalRefXML to get the function details for the external referenced function. It does this consistently on 32 and 64-bit Windows modules.

Environment (please complete the following information):

  • OS: Windows 10
  • Java Version: 11.0.3
  • Ghidra Version: 9.0.4
Bug

All 8 comments

Can you send the debug decompiler xml dump from the pull down menu in the upper right corner of the decompiler?

Yes certainly. Here it is.

Also an easy example to see the issues with external calls not being properly handled by the decompiler (as well as no call fixups which could lead to bad code segment registers in a large enough program (not this one) - see issue #765 ) is here:
http://www.minesweeper.info/downloads/Winmine31.html

winmine.zip

I tried to choose a simple function. It should display KERNEL_Ordinal_102 or KERNEL::Ordinal_102 instead of func_...

Reference #485 and #765 maybe should merge to single issue

Conclusion: coreaction.cc has only handled CPUI_CALLIND and not CPUI_CALL (which a ptr:offset call is) for resolving external symbols. Suggestion is to resolve CPUI_CALLs. The Java client could send a function prototype instead of marking external functions in 16-bit modules as external. But this is merely a workaround not a solution.

Yes, it seems the NE importer is improperly creating external references which were designed for indirect calls. Working on fix so the importer properly creates external functions.

Is this necessarily needed? There is a real difference in the ASM which still requires the fix which is proposed and working. I do not see where the NE loader would need a change. There are several interesting things in NE files:
The imports are far pointers (not relative/near pointers as per 32/64 bit) - CALLF 0x#:0x# vs CALL ptr [0x#]
The imports are relocated and fictitious far pointers into the import table which is loaded and then the actual far pointers are copied into their place by the NE loader at initialization (not simple indirect reference relocations, into a pointer table) - this means there is no IAT, and you cannot hook a call by merely changing the import table pointer, the Ghidra NE loader currently makes fictitious extensions to the code segment(s) (convenient way to mask other bugs to do with the CS register after far calls being wrong).

IDA Pro on its part does not create fictitious locations at the end of the code and instead leaves the original fictitious pointer value but of course those addresses are contiguous and cannot actually be called. Ghidra's fictitious locations are 1 byte each so obviously no placeholder code could even fit a single return instruction there either. Regardless these things need to be considered.

This is part of a series of changes to 16-bit analysis that are making their way through review (some are already in master). The NeLoader hadn't been visited in a while, so some adjustment was necessary anyway to support "protected mode" addressing via the new ProtectedAddressSpace class. For this particular ticket, the decompiler was receiving pointer information for direct calls when it should have been receiving function information. This fix lets the decompiler receive the information normally for a direct call, without having to requery for the information at a later stage of the decompilation.

We'll repost here when the change is up on master, and you can take a look.

Thanks for the info, looking forward to see what comes about - too many details to discuss here as you point out. I had contributed a large batch of changes to address some of them albeit some were so deep in the architecture, it was difficult to say if my approach was the ideal one or not, despite that outwardly they appeared to work (and were refined to attempt not to break anything else).

This is now in master.

This seems to be a fudge as when trying to create types like:
typedef char far* LPSTR;
or
typedef const char far* LPCSTR;
or any "far*" we end up with an incorrect 2-byte pointer reference that subsequently breaks analysis further down the line.

I've been trying to understand how this shortcoming can be rectified, but it seems like the DNA of Ghidra will need updating to accommodate pointers of differing sizes simultaneously. I would appreciate some pointers (pun unintended) on possible solutions.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

lab313ru picture lab313ru  路  3Comments

loudinthecloud picture loudinthecloud  路  3Comments

huettenhain picture huettenhain  路  3Comments

awsaba picture awsaba  路  3Comments

pd0wm picture pd0wm  路  3Comments