Ghidra: ARM ASM C doesn't understand indirect loads

Created on 7 Jun 2020 · 5 Comments · Source: NationalSecurityAgency/ghidra

Describe the bug
Given this listing:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined FUN_08002cf4()
                               assume LRset = 0x0
                               assume TMode = 0x1
             undefined         r0:1           <RETURN>
             undefined4        Stack[-0x8]:4  local_8                                 XREF[1]:     08002d40(*)  
                             FUN_08002cf4                                    XREF[1]:     MAIN:08000af2(c)  
        08002cf4 80 b5           push       { r7, lr }
        08002cf6 6f 46           mov        r7,sp
        08002cf8 ff f7 d4 ff     bl         FUN_08002ca4                                     undefined FUN_08002ca4()
        08002cfc 01 06           lsl        r1,r0,#0x18
        08002cfe 08 16           asr        r0,r1,#0x18
        08002d00 00 28           cmp        r0,#0x0
        08002d02 1d d0           beq        LAB_08002d40
        08002d04 05 49           ldr        r1,[PTR_PTR_sActiveInputState_08002d1c]          = 08b857f8
        08002d06 08 68           ldr        r0,[r1,#0x0]=>PTR_sActiveInputState_08b857f8     = 02024c78
        08002d08 81 88           ldrh       r1,[r0,#0x4]=>sActiveInputState_02024c78.raw     = null
        08002d0a 05 48           ldr        r0,[INT_08002d20]                                = 303h
        08002d0c 81 42           cmp        r1,r0
        08002d0e 09 d1           bne        LAB_08002d24
        08002d10 ff f7 bc ff     bl         FUN_08002c8c                                     undefined FUN_08002c8c()
        08002d14 fe 20           mov        r0,#0xfe
        08002d16 bc f0 93 fe     bl         GBABIOS_RESET                                    void GBABIOS_RESET(void)
        08002d1a 11 e0           b          LAB_08002d40
                             PTR_PTR_sActiveInputState_08002d1c              XREF[1]:     FUN_08002cf4:08002d04(R)  
        08002d1c f8 57 b8 08     sActiveI   PTR_sActiveInputState_08b857f8                   = 02024c78
                             INT_08002d20                                    XREF[1]:     FUN_08002cf4:08002d0a(R)  
        08002d20 03 03 00 00     int        303h

Note that the listing line 08002d0a 05 48 ldr r0,[INT_08002d20] = 303h shows the loaded value 303h, while the C code for this line is if ((uint)(*PTR_PTR_sActiveInputState_08002d1c)->raw == INT_08002d20). It should be if ((uint)(*PTR_PTR_sActiveInputState_08002d1c)->raw == 0x303).
To Reproduce
You need some ARMv4T Thumb code (the example above is little-endian) that uses the 0x48 opcode form of ldr, or any other indirect load. It is not just this one case; it affects all the C code generated from such loads. When assigning pointers, assigning values, or comparing, the C code shows the label at the address and not the value stored at it.
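
For reference, the 0x48 form mentioned above is the Thumb PC-relative literal load (LDR Rd, [PC, #imm8*4]). The following is a minimal sketch in C of how its target address can be computed. The type and function names (LdrLiteral, decode_ldr_literal) are made up for illustration and are not part of Ghidra. Applied to the two loads in the listing, at 08002d04 (05 49) and 08002d0a (05 48), it yields the literal pool addresses 08002d1c and 08002d20.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch (not a Ghidra structure). */
typedef struct {
    int      is_ldr_literal; /* 1 if the halfword is LDR Rd, [PC, #imm8*4] */
    unsigned rd;             /* destination register r0..r7 */
    uint32_t target;         /* address of the literal pool word */
} LdrLiteral;

/* Decode a 16-bit Thumb instruction located at insn_addr.
   Encoding: bits 15..11 = 01001, bits 10..8 = Rd, bits 7..0 = imm8. */
LdrLiteral decode_ldr_literal(uint32_t insn_addr, uint16_t insn)
{
    LdrLiteral out = {0, 0u, 0u};

    /* Thumb instructions are halfword aligned. */
    assert((insn_addr & 1u) == 0u);

    if ((insn & 0xF800u) != 0x4800u)
        return out; /* some other instruction */

    out.is_ldr_literal = 1;
    out.rd = (insn >> 8) & 0x7u;

    /* In Thumb state PC reads as the instruction address + 4,
       and the literal load rounds it down to a multiple of 4. */
    uint32_t pc = (insn_addr + 4u) & ~(uint32_t)3u;
    out.target = pc + ((uint32_t)(insn & 0xFFu) << 2);
    return out;
}
```

The bytes in the listing are little-endian, so "05 48" is the halfword 0x4805 and "05 49" is 0x4905.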

Expected behavior
Either if ((uint)(*PTR_PTR_sActiveInputState_08002d1c)->raw == 0x303)
or

#define INT_08002d20 0x303
if ((uint)(*PTR_PTR_sActiveInputState_08002d1c)->raw == INT_08002d20)

Environment (please complete the following information):

  • OS: Windows 10
  • Java Version: 11.0.6
  • Ghidra Version: 9.1.2

All 5 comments

I think I had this problem before and I solved it by marking the respective code section non-writable.

In my case that totally made sense since if the code section is writable, values in the literal pool could in theory be non-constant.

Turning off W (the write flag) does change the code.

However, if it were in RAM, it should be *INT_08002d20, right? It is the value stored at the label, not the value of the label.

Even if it's in RAM, INT_08002d20 would be right, since INT_08002d20 is not a pointer. It refers to the variable in your literal pool. If the code actually used the address of that literal variable, it would refer to it as &INT_08002d20.

Initially I was also confused by the way Ghidra silently handles it, but it does make sense.
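
The distinction drawn here between INT_08002d20, &INT_08002d20 and *INT_08002d20 can be sketched in plain C. This is only an illustration, assuming the pool word is modelled as an ordinary writable global; the function names are hypothetical and nothing here is Ghidra output.

```c
#include <stdint.h>

/* Assumption: the literal pool word modelled as a writable global. */
uint32_t INT_08002d20 = 0x303u;

/* Compares against the value stored at the label. This is what the
   decompiler's "== INT_08002d20" means. */
int equals_pool_value(uint32_t raw)
{
    return raw == INT_08002d20;
}

/* Compares against the address of the label. The decompiler would
   print this case as "&INT_08002d20". */
int equals_pool_address(uintptr_t raw)
{
    return raw == (uintptr_t)&INT_08002d20;
}

/* Equivalent to equals_pool_value only while the word can never change,
   which is why the load is folded to 0x303 for read-only memory. */
int equals_folded_constant(uint32_t raw)
{
    return raw == 0x303u;
}

/* "*INT_08002d20" would not compile: INT_08002d20 is an integer,
   not a pointer. */
```

If the word is overwritten at run time, equals_pool_value follows the new value while equals_folded_constant keeps comparing against 0x303, which is the reason a writable block is not folded.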

Are literal pools constant? If the decompiler references the literal by name because its memory is writable, isn't that wrong, in that it is treating the literal as a variable?

So it is effectively:

static int INT_08002d20 = 0x303;
if ((uint)(*PTR_PTR_sActiveInputState_08002d1c)->raw == INT_08002d20)

While literals might be what the compiler generates for the platform, C has no concept of them, and they don't get exported. So when you export the code, it won't compile, because names such as "INT_08b924bc" are not defined anywhere. Nor can you copy the code from the decompiler window and compile it to test it without first manually defining all the values it references.
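
For a single snippet, the missing definitions can be supplied by hand. The sketch below is one hypothetical way to do that for the comparison in this issue. The InputState layout is an assumption (only a 16-bit raw field at offset 4 is known, from the ldrh in the listing), as are the placeholder field and the function name, and the pointers refer to host variables rather than the original addresses.

```c
#include <stdint.h>

/* Assumed layout: only "raw" at offset 4 is known from the listing. */
typedef struct {
    uint32_t unknown_0; /* hypothetical placeholder for bytes 0..3 */
    uint16_t raw;       /* offset 4, read with ldrh */
} InputState;

/* Hand-written stand-ins for the labels the decompiler references. */
InputState   sActiveInputState;
InputState  *PTR_sActiveInputState_08b857f8     = &sActiveInputState;
InputState **PTR_PTR_sActiveInputState_08002d1c = &PTR_sActiveInputState_08b857f8;
const uint32_t INT_08002d20 = 0x303u;

/* The comparison from the decompiled code, now compilable.
   The function name is made up for this sketch. */
int raw_matches_literal(void)
{
    return (uint32_t)(*PTR_PTR_sActiveInputState_08002d1c)->raw == INT_08002d20;
}
```

This only scales to small fragments; every label the exported code touches needs a definition like the ones above.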

If you're expecting to be able to recompile the output of the decompiler, you're going to be disappointed. It's an aid to understanding, and a starting point for reconstructing code manually if necessary. It's unlikely that round-tripping will ever be practical, as far too much information is lost.
