| Questions | Answers
|------------------------------------------------------|--------------------
| OS/arch/bits (mandatory) | Ubuntu x86 64
| File format of the file you reverse (mandatory) | ELF
| Architecture/bits of the file (mandatory) | ARM
| r2 -v full output, not truncated (mandatory) | radare2 4.5.0-git 24948 @ linux-x86-64 git.4.4.0-429-ga933ba8be commit: a933ba8bebab7c97b8ffdb56ee8bb5394cfbab2e build: 2020-07-16__11:05:33
radare2 provides incorrect information even about PLT entries, e.g.
$ r2 test
-- This is an unregistered copy.
[0x00010a38]> is | grep read
482 0x00000f40 0x00010f40 GLOBAL FUNC 176 spec_read
490 0x00000ff0 0x00010ff0 GLOBAL FUNC 220 spec_fread
538 0x00005008 0x00015008 GLOBAL FUNC 72 BZ2_bzread
20 0x000006a8 0x000106a8 GLOBAL FUNC 16 imp.read
but read@plt is actually at 0x106ac, not 0x106a8 (an off-by-4 error):
$ objdump -d test | grep read@plt
000106ac <read@plt>:
10e8a: f7ff fc0f bl 106ac <read@plt>
(also confirmed independently in Ghidra)
Originally reported in https://github.com/BinaryAnalysisPlatform/bap/pull/1174#issuecomment-659531768
IDA Pro 7.5 shows this:
.plt:000106A0 CODE32
.plt:000106A0
.plt:000106A0 ; =============== S U B R O U T I N E =======================================
.plt:000106A0
.plt:000106A0 ; Attributes: thunk
.plt:000106A0
.plt:000106A0 ; int strtol(const char *nptr, char **endptr, int base)
.plt:000106A0 strtol ; CODE XREF: j_strtol↑j
.plt:000106A0 ADR R12, 0x106A8
.plt:000106A4 ADD R12, R12, #0x1B000
.plt:000106A8 LDR PC, [R12,#(strtol_ptr - 0x2B6A8)]! ; __imp_strtol
.plt:000106A8 ; End of function strtol
.plt:000106A8
.plt:000106AC CODE16
.plt:000106AC
.plt:000106AC ; =============== S U B R O U T I N E =======================================
.plt:000106AC
.plt:000106AC ; Attributes: thunk
.plt:000106AC
.plt:000106AC ; ssize_t j_read(int fd, void *buf, size_t nbytes)
.plt:000106AC j_read ; CODE XREF: spec_load+52↓p
.plt:000106AC BX PC
.plt:000106AC ; ---------------------------------------------------------------------------
.plt:000106AE ALIGN 4
.plt:000106AE ; End of function j_read
.plt:000106AE
.plt:000106B0 CODE32
.plt:000106B0
.plt:000106B0 ; =============== S U B R O U T I N E =======================================
.plt:000106B0
.plt:000106B0 ; Attributes: thunk
.plt:000106B0
.plt:000106B0 ; ssize_t read(int fd, void *buf, size_t nbytes)
.plt:000106B0 read ; CODE XREF: j_read↑j
.plt:000106B0 ADR R12, 0x106B8
.plt:000106B4 ADD R12, R12, #0x1B000
.plt:000106B8 LDR PC, [R12,#(read_ptr - 0x2B6B8)]! ; __imp_read
.plt:000106B8 ; End of function read
.plt:000106B8
.plt:000106BC CODE16
See also:
@HoundThe by the way, when I open the attached file with ASAN enabled there is a violation in DWARF code, please take a look:
[i] ℤ ASAN_OPTIONS=detect_odr_violation=0 r2 test 10:55:22
=================================================================
==1136700==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6270000308fb at pc 0x7f4d9788fa15 bp 0x7fff35f67560 sp 0x7fff35f67550
READ of size 1 at 0x6270000308fb thread T0
#0 0x7f4d9788fa14 in parse_ext_opcode /home/radare/radare2/libr/bin/dwarf.c:698
#1 0x7f4d978935f2 in parse_opcodes /home/radare/radare2/libr/bin/dwarf.c:970
#2 0x7f4d97893f13 in parse_line_raw /home/radare/radare2/libr/bin/dwarf.c:1048
#3 0x7f4d978a5d69 in r_bin_dwarf_parse_line /home/radare/radare2/libr/bin/dwarf.c:2198
#4 0x7f4d99d63726 in bin_dwarf /home/radare/radare2/libr/core/cbin.c:919
#5 0x7f4d99d8be19 in r_core_bin_info /home/radare/radare2/libr/core/cbin.c:4100
#6 0x7f4d99d57694 in r_core_bin_set_env /home/radare/radare2/libr/core/cbin.c:220
#7 0x7f4d99c5721c in r_core_file_do_load_for_io_plugin /home/radare/radare2/libr/core/cfile.c:441
#8 0x7f4d99c59b39 in r_core_bin_load /home/radare/radare2/libr/core/cfile.c:651
#9 0x7f4d91c262f4 in r_main_radare2 /home/radare/radare2/libr/main/radare2.c:1089
#10 0x5579c8087713 in main /home/radare/radare2/binr/radare2/radare2.c:96
#11 0x7f4d9107a041 in __libc_start_main (/lib64/libc.so.6+0x27041)
#12 0x5579c80871ad in _start (/home/radare/radare2/binr/radare2/radare2+0x21ad)
0x6270000308fb is located 0 bytes to the right of 12283-byte region [0x62700002d900,0x6270000308fb)
allocated by thread T0 here:
#0 0x7f4da1037837 in __interceptor_calloc (/lib64/libasan.so.6+0xb0837)
#1 0x7f4d978a5be9 in r_bin_dwarf_parse_line /home/radare/radare2/libr/bin/dwarf.c:2183
#2 0x7f4d99d63726 in bin_dwarf /home/radare/radare2/libr/core/cbin.c:919
#3 0x7f4d99d8be19 in r_core_bin_info /home/radare/radare2/libr/core/cbin.c:4100
#4 0x7f4d99d57694 in r_core_bin_set_env /home/radare/radare2/libr/core/cbin.c:220
#5 0x7f4d99c5721c in r_core_file_do_load_for_io_plugin /home/radare/radare2/libr/core/cfile.c:441
#6 0x7f4d99c59b39 in r_core_bin_load /home/radare/radare2/libr/core/cfile.c:651
#7 0x7f4d91c262f4 in r_main_radare2 /home/radare/radare2/libr/main/radare2.c:1089
#8 0x5579c8087713 in main /home/radare/radare2/binr/radare2/radare2.c:96
#9 0x7f4d9107a041 in __libc_start_main (/lib64/libc.so.6+0x27041)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/radare/radare2/libr/bin/dwarf.c:698 in parse_ext_opcode
Shadow bytes around the buggy address:
0x0c4e7fffe0c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c4e7fffe0d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c4e7fffe0e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c4e7fffe0f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c4e7fffe100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c4e7fffe110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]
0x0c4e7fffe120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4e7fffe130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4e7fffe140: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4e7fffe150: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4e7fffe160: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==1136700==ABORTING
@HoundThe please see parse_ext_opcode. I have hidden @XVilka's comment to keep this issue about just one problem.
From an initial analysis, it seems there are 2 issues with this file:
1) get_import_addr_arm in elf.c incorrectly computes the plt entry address.
case R_ARM_JUMP_SLOT: {
plt_addr += pos * 12 + 20;
This computation is apparently only right in some cases; it is not correct for the test binary.
2) radare2 assigns a 32-bit hint to each sym.imp symbol, while in this case it should be 16-bit code.
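To make the first point concrete, here is a minimal Python sketch of the fixed-stride arithmetic quoted above. It is not the real `get_import_addr_arm`; the function name `fixed_stride_plt_addr` is hypothetical and only mirrors the quoted expression. It assumes that `plt_addr` starts at the `.plt` section base (0x1067c in the test binary) and that `read` is the third jump-slot relocation (pos = 2), which is consistent with the addresses reported in this thread.

```python
PLT_BASE = 0x1067C  # start of .plt in the test binary


def fixed_stride_plt_addr(plt_base: int, pos: int) -> int:
    """Mirror of the quoted expression: 20-byte header, 12 bytes per entry."""
    return plt_base + pos * 12 + 20


# Assumption: read is the third R_ARM_JUMP_SLOT relocation (pos = 2).
computed = fixed_stride_plt_addr(PLT_BASE, 2)
observed = 0x106AC  # read@plt according to objdump

print(hex(computed), hex(observed))  # 0x106a8 0x106ac
```

The 4-byte drift appears to come from the extra 4-byte Thumb stub (`bx pc` plus padding) placed in front of some entries, which makes those entries 16 bytes long instead of 12.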
@ret2libc I will take a look at it this afternoon.
I don't know why the PLT entry size is not a constant value.
One solution could be to search for matching bytes. (We don't need to match the whole binary; we can assume the number of plt entries with
[0x00010a38]> /x 00f9bce5:00ffffff # we should use /x 00f8bce5:00f8ffff or some kind of signature
Searching 4 bytes in [0x2cc78-0x2dd28]
hits: 0
Searching 4 bytes in [0x2bf04-0x2cc78]
hits: 0
Searching 4 bytes in [0x10000-0x1b644]
hits: 11
0x00010698 hit0_0 74f9bce5
0x000106a8 hit0_1 68f9bce5
0x000106b8 hit0_2 5cf9bce5
0x000106c8 hit0_3 50f9bce5
0x000106d8 hit0_4 44f9bce5
0x000106e8 hit0_5 38f9bce5
0x000106f8 hit0_6 2cf9bce5
0x00010708 hit0_7 20f9bce5
0x00010718 hit0_8 14f9bce5
0x00010728 hit0_9 08f9bce5
0x00010734 hit0_10 00f9bce5
[0x00010a38]> s section..plt
[0x0001067c]> pd 20
;-- section..plt:
;-- .plt:
0x0001067c 04e02de5 str lr, [sp, -4]! ; [12] -r-x section size 372 named .plt
0x00010680 04e09fe5 ldr lr, [0x0001068c] ; [0x1068c:4]=0x1b974
0x00010684 0ee08fe0 add lr, pc, lr
0x00010688 08f0bee5 ldr pc, [lr, 8]!
0x0001068c 74b90100 andeq fp, r1, r4, ror sb
;-- raise:
0x00010690 00c68fe2 add ip, pc, 0, 12
0x00010694 1bca8ce2 add ip, ip, 0x1b000
;-- hit0_0:
0x00010698 74f9bce5 ldr pc, [ip, 0x974]!
;-- strtol:
0x0001069c 7847c046 uxtab16mi r4, r0, r8, ror 8
0x000106a0 00c68fe2 add ip, pc, 0, 12
0x000106a4 1bca8ce2 add ip, ip, 0x1b000
;-- read:
;-- hit0_1:
0x000106a8 68f9bce5 ldr pc, [ip, 0x968]!
0x000106ac 7847 bx pc
0x000106ae c046 mov r8, r8
0x000106b0 00c68fe2 add ip, pc, 0, 12
;-- free:
0x000106b4 1bca8ce2 add ip, ip, 0x1b000
;-- hit0_2:
0x000106b8 5cf9bce5 ldr pc, [ip, 0x95c]!
0x000106bc 7847 bx pc
0x000106be c046 mov r8, r8
;-- memcpy:
0x000106c0 00c68fe2 add ip, pc, 0, 12
get_import_addr(pos: 0) => section..plt + 20
get_import_addr(pos: 1) => hit0_0 + 4
...
get_import_addr(pos: x) => hit0_x + 4
It is ugly (and way slower), but it could work.
Maybe we could introduce a signature?
add ip, pc, 0, 12
add ip, ip, 0x1b000
ldr pc, [ip, ...]!
The match + 0xc
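As a rough illustration of the signature idea, the sketch below scans aligned 32-bit words for the three-instruction pattern, treating the low 12 bits (the immediate) of each instruction as wildcards. The byte string is copied from the `pd 20` listing above; the helper name `find_plt_entries` and the check for a preceding `bx pc` stub are assumptions made for this example, not existing radare2 code.

```python
import struct

PLT_BASE = 0x1067C
# .plt bytes copied from the `pd 20` listing, starting at 0x1067c.
PLT_BYTES = bytes.fromhex(
    "04e02de5" "04e09fe5" "0ee08fe0" "08f0bee5" "74b90100"  # PLT header
    "00c68fe2" "1bca8ce2" "74f9bce5"                          # entry at 0x10690
    "7847c046" "00c68fe2" "1bca8ce2" "68f9bce5"              # Thumb stub + entry
    "7847c046" "00c68fe2" "1bca8ce2" "5cf9bce5"              # Thumb stub + entry
)

# (mask, value) per little-endian instruction word; immediates are wildcards.
SIGNATURE = [
    (0xFFFFF000, 0xE28FC000),  # add ip, pc, #imm
    (0xFFFFF000, 0xE28CC000),  # add ip, ip, #imm
    (0xFFFFF000, 0xE5BCF000),  # ldr pc, [ip, #imm]!
]
THUMB_STUB = bytes.fromhex("7847c046")  # bx pc, followed by 2 bytes of padding


def find_plt_entries(data, base):
    """Yield (arm_entry, thumb_stub_or_None) for every signature match."""
    words = struct.unpack_from("<%dI" % (len(data) // 4), data)
    for i in range(len(words) - len(SIGNATURE) + 1):
        if all((words[i + j] & m) == v for j, (m, v) in enumerate(SIGNATURE)):
            off = i * 4
            has_stub = off >= 4 and data[off - 4:off] == THUMB_STUB
            yield base + off, (base + off - 4) if has_stub else None


for arm, thumb in find_plt_entries(PLT_BYTES, PLT_BASE):
    print(hex(arm), hex(thumb) if thumb is not None else None)
```

On these bytes the third match is the ARM entry at 0x106b0 with a Thumb stub at 0x106ac, which agrees with the `read@plt` address reported by objdump.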
@ret2libc Any idea?
That looks like a very hacky solution :( I would prefer to understand whether there is something better that we can use to distinguish between the different kinds of PLT entries. Have you seen what IDA and other tools return? Have you looked at the ARM/ELF specs?
I think it could be interesting to look into how binutils handles variable-length PLT entries in Thumb mode.
> That looks like a very hacky solution :(

Indeed, it is very hacky.

> I would prefer to understand whether there is something better that we can use to distinguish between the different kinds of PLT entries.

I didn't find any info in the ARM spec.
Do you have any idea where the sample comes from?

> I think it could be interesting to look into how binutils handles variable-length PLT entries in Thumb mode.

I will look at this. Ty
Ghidra matches the PLT entry with the GOT entry during analysis. I stopped there because the Ghidra code base is way too big.
For binutils, I didn't find any interesting piece of code.

> For binutils, I didn't find any interesting piece of code.

Binutils is doing the same, but I wouldn't pretend that I understand all the peculiarities of the code.
Okay, I believe the main problem is that we link each PLT block with one GOT entry during ELF loading and not during analysis.
For me, the best solution is to create a new analysis command which will associate each PLT block with one GOT entry.
The idea is:
we ignore the plt "header"
# --------------------
0x00010690 00c68fe2 add ip, pc, 0, 12
0x00010694 1bca8ce2 add ip, ip, 0x1b000
0x00010698 74f9bce5 ldr pc, [ip, 0x974]!
# --------------------
0x0001069c 7847c046 uxtab16mi r4, r0, r8, ror 8
0x000106a0 00c68fe2 add ip, pc, 0, 12
0x000106a4 1bca8ce2 add ip, ip, 0x1b000
0x000106a8 68f9bce5 ldr pc, [ip, 0x968]!
# --------------------
;-- reloc.raise:
0x0002c00c .dword 0x0001067c ; section..plt ; sym..plt ; pc ; r15; RELOC 32 raise // replace with 0xdeadbeef
;-- reloc.strtol:
0x0002c010 .dword 0x0001067c ; section..plt ; sym..plt ; pc ; r15; RELOC 32 strtol // replace 0xboobface
0xdeadbeef -> raise
0xboobface -> strtol
This is not complex, but there are some traps.
The first one is identifying which GOT entry is associated with a PLT relocation.
Indeed, we need to modify only the entries linked with the PLT.
PS:
For me, the IDA and Ghidra results are strange:
the addresses 0x000106ac and 0x000106ae are both valid entry points, but we can't know which one will be used by the program.
With this new analysis we should get valid results, and maybe some warnings when the pc is not valid (pc != one of the specific addresses, for example 0x000106ac).
I am not sure I expressed my idea correctly, so don't hesitate to ask questions.
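The association step could be sketched as a simple join on the GOT address, shown here in Python. This is only an illustration: the `RELOCS` names for 0x2c00c and 0x2c010 come from the relocations shown above, the `read` entry is an assumption inferred from objdump reporting `read@plt` at 0x106ac, and the `PLT_TO_GOT` pairs are taken from the script output posted later in this thread. The function name `entry_points_by_import` is hypothetical.

```python
from collections import defaultdict

# GOT slot -> imported name.
RELOCS = {
    0x2C00C: "raise",
    0x2C010: "strtol",
    0x2C014: "read",  # assumption: inferred from read@plt at 0x106ac
}

# PLT entry point -> GOT slot it jumps through (output of the analysis step).
PLT_TO_GOT = {
    0x10690: 0x2C00C,
    0x1069C: 0x2C010,
    0x106AC: 0x2C014,
    0x106AE: 0x2C014,
}


def entry_points_by_import(plt_to_got, relocs):
    """Group every PLT entry point under the import whose GOT slot it uses."""
    grouped = defaultdict(list)
    for plt_addr in sorted(plt_to_got):
        name = relocs.get(plt_to_got[plt_addr])
        if name is not None:  # skip slots that have no PLT relocation
            grouped[name].append(plt_addr)
    return dict(grouped)


for name, addrs in entry_points_by_import(PLT_TO_GOT, RELOCS).items():
    print(name, [hex(a) for a in addrs])
```

Grouping by import keeps both 0x106ac and 0x106ae attached to the same name, so the analysis does not have to guess which entry point the program will use.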
The way we are trying to resolve these entries in BAP is by doing very lightweight constant folding on:
0x00010690 00c68fe2 add ip, pc, 0, 12
0x00010694 1bca8ce2 add ip, ip, 0x1b000
0x00010698 74f9bce5 ldr pc, [ip, 0x974]!
which is just pc + 0x1b000 + 0x974 => 0x2c00c (pc reads as 0x10698 at the first instruction), or, more generally, plt.got.address = pc + base + offset, where base is invariant in the binary (in fact, I see that it is consistent across different binaries compiled with the same compiler/ABI) and offset differs for each PLT entry and lands us at the correct plt.got address.
For reference, this is how this PLT entry looks in BAP:
.address 0x10690
00005f88: sub raise(raise_result)
0000614a: raise_result :: out u32 = R0
0000003c:
00000041: R12 := 0x10698
00000048: R12 := R12 + 0x1B000
00000051: #9 := R12
00000054: R12 := R12 + 0x974
00000058: call mem[#9 + 0x974, el]:u32 with noreturn
or, with a more aggressive optimization option:
00005f88: sub raise(raise_result)
0000614a: raise_result :: out u32 = R0
00000058: call mem[0x2C00C, el]:u32 with noreturn
So this 0x2C00C is the value that is stored in the offset field of the relocation entry that corresponds to the raise function.
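The constant folding described above can be sketched in a few lines of Python. This is not BAP's implementation, just a hand-rolled illustration: `arm_imm` decodes the ARM modified-immediate encoding (an 8-bit value rotated right by twice the 4-bit rotation field), and `fold_plt_entry` (a hypothetical name) assumes the exact three-instruction shape shown above, with an added (not subtracted) load offset.

```python
def arm_imm(imm12: int) -> int:
    """Decode an ARM modified immediate: imm8 rotated right by 2 * rot."""
    rot = (imm12 >> 8) * 2
    imm8 = imm12 & 0xFF
    return ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF


def fold_plt_entry(addr: int, words) -> int:
    """Return the GOT slot read by `add ip, pc; add ip, ip; ldr pc, [ip, #off]!`."""
    add1, add2, ldr = words
    ip = addr + 8                 # in ARM state, pc reads as instruction address + 8
    ip += arm_imm(add1 & 0xFFF)   # add ip, pc, #imm
    ip += arm_imm(add2 & 0xFFF)   # add ip, ip, #imm
    ip += ldr & 0xFFF             # ldr pc, [ip, #imm]! (offset assumed positive)
    return ip & 0xFFFFFFFF


# Little-endian words of the entry at 0x10690 (bytes 00c68fe2 1bca8ce2 74f9bce5).
got = fold_plt_entry(0x10690, (0xE28FC600, 0xE28CCA1B, 0xE5BCF974))
print(hex(got))  # 0x2c00c
```

Applied to the next entry at 0x106a0 (last word 0xE5BCF968), the same folding gives 0x2c010, which is the GOT slot shown for `strtol` earlier in the thread.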
@XVilka Does this script output do what you want?
PS:
format is [plt_addr] -> [got_addr] ~(maybe a little bit of hard coding for the offset of the first plt entry)~
~And yes i know the first line is false, i don't know how to solve that~
0x10690 -> 0x2c00c
0x1069c -> 0x2c010
0x106ac -> 0x2c014
0x106ae -> 0x2c014
0x106bc -> 0x2c018
0x106be -> 0x2c018
0x106cc -> 0x2c01c
0x106dc -> 0x2c020
0x106de -> 0x2c020
0x106ec -> 0x2c024
0x106ee -> 0x2c024
0x106fc -> 0x2c028
0x1070c -> 0x2c02c
0x1070e -> 0x2c02c
0x1071c -> 0x2c030
0x1071e -> 0x2c030
0x1072c -> 0x2c034
0x10738 -> 0x2c038
0x1068c -> 0x2c00c
0x10794 -> 0x2c050
0x10796 -> 0x2c050
0x107a4 -> 0x2c054
0x107a6 -> 0x2c054
0x107b4 -> 0x2c058
0x107b6 -> 0x2c058
0x107c4 -> 0x2c05c
0x107c6 -> 0x2c05c
0x107d4 -> 0x2c060
0x107e0 -> 0x2c064
0x107e2 -> 0x2c064
0x10774 -> 0x2c048
0x10776 -> 0x2c048
0x10784 -> 0x2c04c
0x10786 -> 0x2c04c
0x10754 -> 0x2c040
0x10756 -> 0x2c040
0x10764 -> 0x2c044
0x10766 -> 0x2c044
@trufae since you are working on ARM relocs - you might be interested in this problem as well.