Radare2: Elf - incorrect information about PLT entries

Created on 17 Jul 2020 · 18 Comments · Source: radareorg/radare2

Work environment

| Questions | Answers
|------------------------------------------------------|--------------------
| OS/arch/bits (mandatory) | Ubuntu x86 64
| File format of the file you reverse (mandatory) | ELF
| Architecture/bits of the file (mandatory) | ARM
| r2 -v full output, not truncated (mandatory) | radare2 4.5.0-git 24948 @ linux-x86-64 git.4.4.0-429-ga933ba8be commit: a933ba8bebab7c97b8ffdb56ee8bb5394cfbab2e build: 2020-07-16__11:05:33

Expected behavior

Actual behavior

radare2 reports incorrect information even about PLT entries, e.g.:

$ r2 test
 -- This is an unregistered copy.
[0x00010a38]> is | grep read
482  0x00000f40 0x00010f40 GLOBAL FUNC   176      spec_read
490  0x00000ff0 0x00010ff0 GLOBAL FUNC   220      spec_fread
538  0x00005008 0x00015008 GLOBAL FUNC   72       BZ2_bzread
20   0x000006a8 0x000106a8 GLOBAL FUNC   16       imp.read

but read@plt is actually at 0x106ac, not 0x106a8 (an off-by-4 error):

$ objdump -d test  | grep read@plt
000106ac <read@plt>:
   10e8a:       f7ff fc0f       bl      106ac <read@plt>

(also confirmed independently in Ghidra)

Originally reported in https://github.com/BinaryAnalysisPlatform/bap/pull/1174#issuecomment-659531768

test.zip

IDA Pro 7.5 shows this:

.plt:000106A0                 CODE32
.plt:000106A0
.plt:000106A0 ; =============== S U B R O U T I N E =======================================
.plt:000106A0
.plt:000106A0 ; Attributes: thunk
.plt:000106A0
.plt:000106A0 ; int strtol(const char *nptr, char **endptr, int base)
.plt:000106A0 strtol                                  ; CODE XREF: j_strtol↑j
.plt:000106A0                 ADR     R12, 0x106A8
.plt:000106A4                 ADD     R12, R12, #0x1B000
.plt:000106A8                 LDR     PC, [R12,#(strtol_ptr - 0x2B6A8)]! ; __imp_strtol
.plt:000106A8 ; End of function strtol
.plt:000106A8
.plt:000106AC                 CODE16
.plt:000106AC
.plt:000106AC ; =============== S U B R O U T I N E =======================================
.plt:000106AC
.plt:000106AC ; Attributes: thunk
.plt:000106AC
.plt:000106AC ; ssize_t j_read(int fd, void *buf, size_t nbytes)
.plt:000106AC j_read                                  ; CODE XREF: spec_load+52↓p
.plt:000106AC                 BX      PC
.plt:000106AC ; ---------------------------------------------------------------------------
.plt:000106AE                 ALIGN 4
.plt:000106AE ; End of function j_read
.plt:000106AE
.plt:000106B0                 CODE32
.plt:000106B0
.plt:000106B0 ; =============== S U B R O U T I N E =======================================
.plt:000106B0
.plt:000106B0 ; Attributes: thunk
.plt:000106B0
.plt:000106B0 ; ssize_t read(int fd, void *buf, size_t nbytes)
.plt:000106B0 read                                    ; CODE XREF: j_read↑j
.plt:000106B0                 ADR     R12, 0x106B8
.plt:000106B4                 ADD     R12, R12, #0x1B000
.plt:000106B8                 LDR     PC, [R12,#(read_ptr - 0x2B6B8)]! ; __imp_read
.plt:000106B8 ; End of function read
.plt:000106B8
.plt:000106BC                 CODE16

ELF RBin bug high-priority test-required

All 18 comments

@HoundThe by the way, when I open the attached file with ASAN enabled there is a violation in DWARF code, please take a look:

[i] ℤ ASAN_OPTIONS=detect_odr_violation=0 r2 test
=================================================================
==1136700==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6270000308fb at pc 0x7f4d9788fa15 bp 0x7fff35f67560 sp 0x7fff35f67550
READ of size 1 at 0x6270000308fb thread T0
    #0 0x7f4d9788fa14 in parse_ext_opcode /home/radare/radare2/libr/bin/dwarf.c:698
    #1 0x7f4d978935f2 in parse_opcodes /home/radare/radare2/libr/bin/dwarf.c:970
    #2 0x7f4d97893f13 in parse_line_raw /home/radare/radare2/libr/bin/dwarf.c:1048
    #3 0x7f4d978a5d69 in r_bin_dwarf_parse_line /home/radare/radare2/libr/bin/dwarf.c:2198
    #4 0x7f4d99d63726 in bin_dwarf /home/radare/radare2/libr/core/cbin.c:919
    #5 0x7f4d99d8be19 in r_core_bin_info /home/radare/radare2/libr/core/cbin.c:4100
    #6 0x7f4d99d57694 in r_core_bin_set_env /home/radare/radare2/libr/core/cbin.c:220
    #7 0x7f4d99c5721c in r_core_file_do_load_for_io_plugin /home/radare/radare2/libr/core/cfile.c:441
    #8 0x7f4d99c59b39 in r_core_bin_load /home/radare/radare2/libr/core/cfile.c:651
    #9 0x7f4d91c262f4 in r_main_radare2 /home/radare/radare2/libr/main/radare2.c:1089
    #10 0x5579c8087713 in main /home/radare/radare2/binr/radare2/radare2.c:96
    #11 0x7f4d9107a041 in __libc_start_main (/lib64/libc.so.6+0x27041)
    #12 0x5579c80871ad in _start (/home/radare/radare2/binr/radare2/radare2+0x21ad)

0x6270000308fb is located 0 bytes to the right of 12283-byte region [0x62700002d900,0x6270000308fb)
allocated by thread T0 here:
    #0 0x7f4da1037837 in __interceptor_calloc (/lib64/libasan.so.6+0xb0837)
    #1 0x7f4d978a5be9 in r_bin_dwarf_parse_line /home/radare/radare2/libr/bin/dwarf.c:2183
    #2 0x7f4d99d63726 in bin_dwarf /home/radare/radare2/libr/core/cbin.c:919
    #3 0x7f4d99d8be19 in r_core_bin_info /home/radare/radare2/libr/core/cbin.c:4100
    #4 0x7f4d99d57694 in r_core_bin_set_env /home/radare/radare2/libr/core/cbin.c:220
    #5 0x7f4d99c5721c in r_core_file_do_load_for_io_plugin /home/radare/radare2/libr/core/cfile.c:441
    #6 0x7f4d99c59b39 in r_core_bin_load /home/radare/radare2/libr/core/cfile.c:651
    #7 0x7f4d91c262f4 in r_main_radare2 /home/radare/radare2/libr/main/radare2.c:1089
    #8 0x5579c8087713 in main /home/radare/radare2/binr/radare2/radare2.c:96
    #9 0x7f4d9107a041 in __libc_start_main (/lib64/libc.so.6+0x27041)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/radare/radare2/libr/bin/dwarf.c:698 in parse_ext_opcode
Shadow bytes around the buggy address:
  0x0c4e7fffe0c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4e7fffe0d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4e7fffe0e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4e7fffe0f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4e7fffe100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c4e7fffe110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]
  0x0c4e7fffe120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4e7fffe130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4e7fffe140: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4e7fffe150: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4e7fffe160: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1136700==ABORTING

@HoundThe please see parse_ext_opcode. I have hidden @XVilka's comment to keep this issue about just one problem.

From an initial analysis, it seems there are 2 issues with this file:
1) get_import_addr_arm in elf.c incorrectly computes the plt entry address.

    case R_ARM_JUMP_SLOT: {
        plt_addr += pos * 12 + 20;

This computation is apparently only right in some cases; it is wrong for the test binary.

2) radare2 assigns a 32-bit hint to each sym.imp symbol, while in this case it should be 16-bit (Thumb) code.
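To make point 1 concrete, here is a minimal Python sketch (not radare2 code) that evaluates the fixed-stride formula against addresses quoted elsewhere in this thread. The names `fixed_stride_entry` and `OBSERVED` are made up for illustration, and the import indices 0, 1, 2 are an assumption inferred from the order of the flags in the r2 disassembly.

```python
PLT_BASE = 0x1067C  # start of section..plt in the test binary, per the r2 listing in this thread


def fixed_stride_entry(pos: int, plt_base: int = PLT_BASE) -> int:
    """Address that `plt_addr + pos * 12 + 20` yields for import index `pos`."""
    return plt_base + pos * 12 + 20


# (assumed index, name, address reported elsewhere in this thread)
OBSERVED = [
    (0, "raise", 0x10690),   # first entry in the r2 disassembly
    (1, "strtol", 0x106A0),  # ARM entry in the IDA listing
    (2, "read", 0x106AC),    # read@plt according to objdump
]

for pos, name, actual in OBSERVED:
    guess = fixed_stride_entry(pos)
    print(f"{name:6} formula={guess:#x} observed={actual:#x} delta={actual - guess}")
```

Under these assumptions the formula agrees with the observed address for raise, but lands 4 bytes short for the two entries that are preceded by a 4-byte Thumb stub, which is consistent with the 0x106a8 vs 0x106ac mismatch reported above.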

@ret2libc I will take a look at it this afternoon.

I don't know why the PLT entry size is not a constant value.
One solution could be to search for matching bytes. (We don't need to scan the whole binary; we can assume the number of plt entries with

[0x00010a38]> /x 00f9bce5:00ffffff # we should use /x 00f8bce5:00f8ffff or some kind of signature
Searching 4 bytes in [0x2cc78-0x2dd28]
hits: 0
Searching 4 bytes in [0x2bf04-0x2cc78]
hits: 0
Searching 4 bytes in [0x10000-0x1b644]
hits: 11
0x00010698 hit0_0 74f9bce5
0x000106a8 hit0_1 68f9bce5
0x000106b8 hit0_2 5cf9bce5
0x000106c8 hit0_3 50f9bce5
0x000106d8 hit0_4 44f9bce5
0x000106e8 hit0_5 38f9bce5
0x000106f8 hit0_6 2cf9bce5
0x00010708 hit0_7 20f9bce5
0x00010718 hit0_8 14f9bce5
0x00010728 hit0_9 08f9bce5
0x00010734 hit0_10 00f9bce5
[0x00010a38]> s section..plt
[0x0001067c]> pd 20
            ;-- section..plt:
            ;-- .plt:
            0x0001067c      04e02de5       str lr, [sp, -4]!           ; [12] -r-x section size 372 named .plt
            0x00010680      04e09fe5       ldr lr, [0x0001068c]        ; [0x1068c:4]=0x1b974
            0x00010684      0ee08fe0       add lr, pc, lr
            0x00010688      08f0bee5       ldr pc, [lr, 8]!
            0x0001068c      74b90100       andeq fp, r1, r4, ror sb
            ;-- raise:
            0x00010690      00c68fe2       add ip, pc, 0, 12
            0x00010694      1bca8ce2       add ip, ip, 0x1b000
            ;-- hit0_0:
            0x00010698      74f9bce5       ldr pc, [ip, 0x974]!
            ;-- strtol:
            0x0001069c      7847c046       uxtab16mi r4, r0, r8, ror 8
            0x000106a0      00c68fe2       add ip, pc, 0, 12
            0x000106a4      1bca8ce2       add ip, ip, 0x1b000
            ;-- read:
            ;-- hit0_1:
            0x000106a8      68f9bce5       ldr pc, [ip, 0x968]!
            0x000106ac      7847           bx pc
            0x000106ae      c046           mov r8, r8
            0x000106b0      00c68fe2       add ip, pc, 0, 12
            ;-- free:
            0x000106b4      1bca8ce2       add ip, ip, 0x1b000
            ;-- hit0_2:
            0x000106b8      5cf9bce5       ldr pc, [ip, 0x95c]!
            0x000106bc      7847           bx pc
            0x000106be      c046           mov r8, r8
            ;-- memcpy:
            0x000106c0      00c68fe2       add ip, pc, 0, 12

get_import_addr(pos: 0) => section..plt + 20
get_import_addr(pos: 1) => hit0_0 + 4
...
get_import_addr(pos: x) => hit0_x + 4

It is ugly (and way slower), but it could work.
Maybe we could introduce a signature?

add ip, pc, 0, 12
add ip, ip, 0x1b000
ldr pc, [ip, ...]!

The match + 0xc
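As a rough illustration of the signature idea, the following Python sketch scans a byte buffer for the three-instruction ARM sequence with the immediates masked out. The mask/value pairs are derived from the bytes in the disassembly above, the function name `find_plt_entries` is hypothetical, and the sample buffer is just the 48 bytes starting at 0x10690 copied from that listing. It only reports where each match starts; how to turn a match into the import address (the suggestion here is match + 0xc) is left open.

```python
import struct
from typing import List

# (mask, value) for the three little-endian ARM words of one PLT entry.
# The low 12 bits (the immediates) are ignored.
SIGNATURE = [
    (0xFFFFF000, 0xE28FC000),  # add ip, pc, #imm
    (0xFFFFF000, 0xE28CC000),  # add ip, ip, #imm
    (0xFFFFF000, 0xE5BCF000),  # ldr pc, [ip, #imm]!
]


def find_plt_entries(data: bytes, base: int) -> List[int]:
    """Return the address of every 4-byte-aligned 12-byte run matching SIGNATURE."""
    hits = []
    for off in range(0, len(data) - 11, 4):
        words = struct.unpack_from("<3I", data, off)
        if all((w & mask) == value for w, (mask, value) in zip(words, SIGNATURE)):
            hits.append(base + off)
    return hits


# Bytes at 0x10690..0x106bf, copied from the r2 disassembly in this thread.
PLT_BYTES = bytes.fromhex(
    "00c68fe2 1bca8ce2 74f9bce5 7847c046"
    "00c68fe2 1bca8ce2 68f9bce5 7847c046"
    "00c68fe2 1bca8ce2 5cf9bce5 7847c046"
)

print([hex(a) for a in find_plt_entries(PLT_BYTES, 0x10690)])
```

Stepping by 4 assumes every ARM entry is word-aligned, which holds in the listing above because the Thumb stubs are themselves 4 bytes long.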

@ret2libc Any idea?

That looks like a very hacky solution :( I would prefer to understand whether there is something better that we can use to distinguish between the different kinds of PLT entries. Have you seen what IDA and other tools return? Have you looked at the ARM/ELF specs?

I think it could be interesting to look into how binutils handles variable-length PLT entries in Thumb mode.

> That looks like a very hacky solution :(

Indeed, it is very hacky.

> I would prefer to understand whether there is something better that we can use to distinguish between the different kinds of PLT entries.

I didn't find any info in the ARM spec.
Do you have any idea where the sample comes from?

> I think it could be interesting to look into how binutils handles variable-length PLT entries in Thumb mode.

I will look at this. Ty

Ghidra matches each PLT entry with its GOT entry during analysis. I stopped there because the Ghidra code base is way too big.
For binutils, I didn't find any interesting piece of code.

> For binutils, I didn't find any interesting piece of code.

Binutils does the same, but I wouldn't pretend that I understand all the peculiarities of the code.

Okay, I believe the main problem is that we link each PLT block with one GOT entry during ELF loading and not during analysis.

For me, the best solution is to create a new analysis command which will associate each PLT block with one GOT entry.

The idea is:

  1. Identify each PLT block (we ignore the PLT "header"):
# --------------------
            0x00010690      00c68fe2       add ip, pc, 0, 12
            0x00010694      1bca8ce2       add ip, ip, 0x1b000
            0x00010698      74f9bce5       ldr pc, [ip, 0x974]!
# --------------------
            0x0001069c      7847c046       uxtab16mi r4, r0, r8, ror 8
            0x000106a0      00c68fe2       add ip, pc, 0, 12
            0x000106a4      1bca8ce2       add ip, ip, 0x1b000
            0x000106a8      68f9bce5       ldr pc, [ip, 0x968]!
# --------------------
  2. Fill the GOT with specific addresses:
            ;-- reloc.raise:
            0x0002c00c      .dword 0x0001067c ; section..plt ; sym..plt ; pc ; r15; RELOC 32 raise // replace with 0xdeadbeef
            ;-- reloc.strtol:
            0x0002c010      .dword 0x0001067c ; section..plt ; sym..plt ; pc ; r15; RELOC 32 strtol // replace 0xboobface

0xdeadbeef -> raise
0xboobface -> strtol

This is not complex, but there are some traps.
The first one is to identify which GOT entries are associated with a PLT relocation.
Indeed, we need to modify only the entries linked to the PLT.

  3. Emulate each block inside the PLT:
    we read back the pc register, and if pc == specific_address we make the association.

PS:
For me, the IDA and Ghidra results are strange: indeed,
the addresses 0x000106ac and 0x000106ae are both valid entry points, but we can't know which one will be used by the program.

With this new analysis we should get valid results, and maybe some warnings when the pc is not valid (pc != specific addresses; example address => 0x000106ac).

I am not sure I expressed my idea correctly, so don't hesitate to ask questions.
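On the entry-point question in the PS: one possible reading of the listings is that `7847 c046` is a Thumb interworking stub (`bx pc` followed by a `mov r8, r8` nop). In Thumb state, PC reads as the instruction address + 4, so `bx pc` at 0x106ac would switch to ARM state at 0x106b0, and 0x106ae would only be padding. The Python sketch below checks for that stub in front of an ARM entry. The helper name `thumb_entry_for` is hypothetical and the bytes are copied from the r2 disassembly in this thread; treat it as an assumption about this one binary, not a general rule.

```python
from typing import Optional

# bx pc ; mov r8, r8 (nop), as little-endian Thumb halfwords
THUMB_STUB = bytes.fromhex("7847c046")

# Bytes at 0x10690..0x106bf, copied from the r2 disassembly in this thread.
PLT_BASE = 0x10690
PLT_BYTES = bytes.fromhex(
    "00c68fe2 1bca8ce2 74f9bce5 7847c046"
    "00c68fe2 1bca8ce2 68f9bce5 7847c046"
    "00c68fe2 1bca8ce2 5cf9bce5 7847c046"
)


def thumb_entry_for(data: bytes, base: int, arm_entry: int) -> Optional[int]:
    """Return the address of the Thumb stub placed right before `arm_entry`, if any."""
    off = arm_entry - base
    if off >= 4 and data[off - 4:off] == THUMB_STUB:
        return arm_entry - 4
    return None


for arm_entry in (0x10690, 0x106A0, 0x106B0):
    stub = thumb_entry_for(PLT_BYTES, PLT_BASE, arm_entry)
    print(hex(arm_entry), "->", hex(stub) if stub is not None else "no Thumb stub")
```

With this reading, the ARM entry at 0x106b0 gets the Thumb entry 0x106ac, which matches the read@plt address that objdump prints.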

The way we are trying to resolve these entries in BAP is by doing a very lightweight constant folding on:

            0x00010690      00c68fe2       add ip, pc, 0, 12
            0x00010694      1bca8ce2       add ip, ip, 0x1b000
            0x00010698      74f9bce5       ldr pc, [ip, 0x974]!

which is just pc + 0x1b000 + 0x974 => 0x2c00c (pc reads as 0x10698 here), or, more generally, plt.got.address = pc + base + offset, where base is invariant in the binary (in fact, I see that it is consistent across different binaries compiled with the same compiler/ABI) and offset differs for each PLT entry and lands us at the correct plt.got address.

For reference, this is how this PLT entry looks in BAP:

.address 0x10690
00005f88: sub raise(raise_result)
0000614a: raise_result :: out u32 = R0
0000003c: 
00000041: R12 := 0x10698
00000048: R12 := R12 + 0x1B000
00000051: #9 := R12
00000054: R12 := R12 + 0x974
00000058: call mem[#9 + 0x974, el]:u32 with noreturn

or, with a more aggressive optimization option:

00005f88: sub raise(raise_result)
0000614a: raise_result :: out u32 = R0
00000058: call mem[0x2C00C, el]:u32 with noreturn

So this 0x2C00C is the value that is stored in the offset field of the relocation entry that corresponds to the raise function.
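The constant folding described above can be sketched in a few lines of Python. This is not BAP's implementation, only an illustration of the arithmetic: it decodes the two ARM modified immediates and the 12-bit load offset from the raw instruction words and adds them to the PC value. The names `ror32`, `arm_imm` and `got_slot` are made up, and the sketch assumes the entry has exactly the `add ip, pc` / `add ip, ip` / `ldr pc, [ip, #imm]!` shape with a positive offset.

```python
import struct


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def arm_imm(word: int) -> int:
    """Decode an ARM modified immediate: imm8 rotated right by 2 * rot."""
    return ror32(word & 0xFF, 2 * ((word >> 8) & 0xF))


def got_slot(entry: bytes, addr: int) -> int:
    """Fold the three instructions of an ARM PLT entry at `addr` into a GOT address."""
    adr, add, ldr = struct.unpack("<3I", entry[:12])
    pc = addr + 8                      # in ARM state, PC reads as instruction address + 8
    ip = (pc + arm_imm(adr)) & 0xFFFFFFFF
    ip = (ip + arm_imm(add)) & 0xFFFFFFFF
    return (ip + (ldr & 0xFFF)) & 0xFFFFFFFF  # assumes the offset is added (U bit set)


# Instruction bytes copied from the r2 disassembly in this thread.
ENTRIES = {
    0x10690: bytes.fromhex("00c68fe2 1bca8ce2 74f9bce5"),
    0x106A0: bytes.fromhex("00c68fe2 1bca8ce2 68f9bce5"),
    0x106B0: bytes.fromhex("00c68fe2 1bca8ce2 5cf9bce5"),
}

for addr, raw in ENTRIES.items():
    print(f"{addr:#x} -> {got_slot(raw, addr):#x}")
```

The resulting GOT address can then be looked up among the relocation offsets to name the entry, as described for raise above.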

@XVilka Does this script output do what you want?

PS:
format is [plt_addr] -> [got_addr] ~(maybe a little bit of hard coding for the offset of the first plt entry)~
~And yes i know the first line is false, i don't know how to solve that~

0x10690 -> 0x2c00c
0x1069c -> 0x2c010
0x106ac -> 0x2c014
0x106ae -> 0x2c014
0x106bc -> 0x2c018
0x106be -> 0x2c018
0x106cc -> 0x2c01c
0x106dc -> 0x2c020
0x106de -> 0x2c020
0x106ec -> 0x2c024
0x106ee -> 0x2c024
0x106fc -> 0x2c028
0x1070c -> 0x2c02c
0x1070e -> 0x2c02c
0x1071c -> 0x2c030
0x1071e -> 0x2c030
0x1072c -> 0x2c034
0x10738 -> 0x2c038
0x1068c -> 0x2c00c
0x10794 -> 0x2c050
0x10796 -> 0x2c050
0x107a4 -> 0x2c054
0x107a6 -> 0x2c054
0x107b4 -> 0x2c058
0x107b6 -> 0x2c058
0x107c4 -> 0x2c05c
0x107c6 -> 0x2c05c
0x107d4 -> 0x2c060
0x107e0 -> 0x2c064
0x107e2 -> 0x2c064
0x10774 -> 0x2c048
0x10776 -> 0x2c048
0x10784 -> 0x2c04c
0x10786 -> 0x2c04c
0x10754 -> 0x2c040
0x10756 -> 0x2c040
0x10764 -> 0x2c044
0x10766 -> 0x2c044

@trufae since you are working on ARM relocs - you might be interested in this problem as well.
