Ghidra: How to compile the decompiled code?

Created on 20 Mar 2019 · 4 Comments · Source: NationalSecurityAgency/ghidra

I could not re-compile the code generated by the decompiler. For the simple code below

#include <stdio.h>

int main(){
    int a;
    scanf("%d",&a);
    int b = a * 2;
    printf("%d\n", b);
    return 0;
}

the Ghidra decompiler generates hundreds of lines of code. The main function looks like this:

undefined8 main(void)
{
    long in_FS_OFFSET;
    int local_18;
    uint local_14;
    long local_10;

    local_10 = *(long *)(in_FS_OFFSET + 0x28);
    __isoc99_scanf(&DAT_00100814,&local_18);
    local_14 = local_18 * 2;
    printf("%d\n",(ulong)local_14);
    if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
        // WARNING: Subroutine does not return
        __stack_chk_fail();
    }
    return 0;
}

Now, how do I compile this code? There are no definitions for ulong, uint, code, scanf, etc. Am I missing a flag or something when decompiling?

All 4 comments

Hi anwarmamat,

As you may know, compilation is an inherently lossy process. Information such as variable names and types is lost unless explicit steps are taken to preserve it during compilation, for example by requesting debug symbols. Once compiled, variables are basically just unnamed, unsized locations in the memory or register space.

The decompiler attempts to make reasonable deductions about types, based on how the variables are used, in order to produce something resembling the original code. For instance, in your example, it concluded that the local variables local_14 and local_18, corresponding to b and a, were originally defined as unsigned and signed integers respectively. The type definitions were buried in the files you #include'd in your original code, but that information is not present in the executable. In some cases it makes sense for the decompiler to guess; in others it simply has insufficient information.
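To illustrate why the type guess can differ from the source: on a two's-complement machine, doubling an int and storing the result as either a signed or an unsigned value of the same width leaves the same bits in memory, so the binary alone does not say which one the author wrote. The following is a minimal sketch in C, not anything Ghidra produces; the function names are hypothetical and the uint typedef is an assumption standing in for the decompiler's type name.

```c
#include <string.h>

/* Assumption: stand-in for the decompiler's "uint" type name. */
typedef unsigned int uint;

/* What the original source did: the product is kept in a signed int. */
int doubled_signed(int a)
{
    return a * 2;
}

/* What the decompiler showed: the same product stored in an unsigned int. */
uint doubled_unsigned(int a)
{
    return (uint)(a * 2);
}

/* Returns 1 when both results have byte-for-byte identical storage. */
int same_bits(int a)
{
    int s = doubled_signed(a);
    uint u = doubled_unsigned(a);
    return memcmp(&s, &u, sizeof s) == 0;
}
```

For any a whose double fits in an int, same_bits(a) returns 1 on a two's-complement target, which is the situation the decompiler faces when it has to pick a type for local_14.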

For you to recompile the code produced by the decompiler, you'll have to explicitly supply some of the information that was lost. In particular, you'll need to add the #include's so that your compiler has information about types like "uint" and function declarations like "__isoc99_scanf". You'll also have to link against the library containing "scanf", as your executable was probably not statically linked.
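As a rough illustration of what has to be put back by hand, here is a minimal sketch of the decompiled function with the missing pieces supplied. Everything added is an assumption rather than Ghidra output: the typedefs are hand-written stand-ins, DAT_00100814 is assumed to hold the "%d" format string from the original source, and the function is renamed decompiled_main (a hypothetical name) so it can sit next to other code. The in_FS_OFFSET and __stack_chk_fail lines are the stack-protector check on x86-64 Linux; they are left out because the compiler generates its own.

```c
#include <stdio.h>

/* Hand-written stand-ins for the decompiler's type names (assumptions). */
typedef unsigned long long undefined8;  /* 8-byte value of unknown type */
typedef unsigned int       uint;
typedef unsigned long      ulong;       /* appears in the original printf line */

/* Assumption: this data item is the "%d" format string passed to scanf. */
static const char DAT_00100814[] = "%d";

/* The binary calls glibc's __isoc99_scanf; map the name back to scanf. */
#define __isoc99_scanf scanf

/* Renamed from main (hypothetical name); stack-protector lines removed. */
undefined8 decompiled_main(void)
{
    int local_18;
    uint local_14;

    __isoc99_scanf(DAT_00100814, &local_18);
    local_14 = local_18 * 2;
    /* The decompiler output passes (ulong)local_14 here; %d expects an int,
       so this sketch narrows the cast by hand. */
    printf("%d\n", (int)local_14);
    return 0;
}
```

To build a standalone program from this, rename the function back to main and change its return type to int, since C requires main to return int. Both steps are further manual edits of the kind described above.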

Shout if any of this is unclear,
Dave

More follow-up (and something I didn't know, so thanks for asking this question!): if you choose File -> Export Program from the CodeBrowser tool and select "Format: C/C++", the output file (by default in your home directory) writes out, at least on the sample I tested, all the typedef information you'll need to recompile, including function prototypes. That should obviate the need for adding a #include. Again, if the program wasn't statically linked originally, you'll need to link against the standard I/O library for scanf, printf, etc., but...

(always new stuff to discover)

I can recompile the decompiled code after adding a header or a missing type etc. Can you show me a simple example that can be re-compiled without manually modifying the decompiled code?

I think you should always expect to have to do some manual work; any example I could provide would likely be extremely contrived. In particular, there are known problems with the exported output emitting variable names that are not legal or not unique. Some of these are perhaps fixable, but there are always losses in the compilation process that make a perfect decompilation unlikely.
