ghidra_9.1-BETA_DEV (and also 9.0.4)
If I open a simple 16-bit DOS EXE (built with the NASM assembler and ulink), Ghidra doesn't detect it as an Old-Style DOS Exe.
The EXE works and is correctly assembled; I checked it with the DOSBox debugger and IDA Pro.
(I also tested several other assemblers; it's a linker thing.)
single.asm
; build with:
; nasm.exe (https://www.nasm.us/pub/nasm/releasebuilds/2.14.02/)
; ulink.exe (ftp://ftp.styx.cabel.net/pub/UniLink/)
;
; nasm.exe -f obj -o single.obj single.asm
; ulink.exe single.obj
BITS 16
segment seg000 align=16
text: db 'Hello World!',0ah,0dh,'$'
segment seg001 align=16
..start:
mov ax,seg000
mov ds,ax
push ax
pop ax
call far print
mov ax,0x4c00
int 0x21
segment seg002 align=16
print:
mov dx,text
mov ah,9
int 0x21
retf
segment seg003 stack
resb 256
Checked with several linkers:
wlink.exe: Open Watcom Linker Version 2.0 beta Sep 13 2019 01:44:55 (64-bit)
link.exe: Microsoft (R) Segmented Executable Linker Version 5.60.339 Dec 5 1994
optlink.exe: OPTLINK (R) for Win32 Release 8.00.17 (from the dmd package: dmd.2.088.0.windows)
ulink.exe: UniLink v1.11 [beta] (build 11.27) from ftp://ftp.styx.cabel.net/pub/UniLink/
All EXEs except the one linked with ulink.exe are detected as Old-Style DOS Exe.
The only real difference is a "UniLink" string between the header and the relocation table.
IDA Pro detects all of them as DOS MZ executables.
optlink.single.exe -> detected as Old-Style DOS Exe
exe_header:
signature: MZ
bytes_in_last_block: 0x0068
blocks_in_file: 0x0001
num_relocs: 0x0002
header_paragraphs: 0x0003
min_extra_paragraphs: 0x0010
max_extra_paragraphs: 0xffff
ss:sp: 0x0004:0x0100
checksum: 0x0000
cs:ip: 0x0001:0x0000
reloc_table_offset: 0x001e
overlay_number: 0x0000
data between header and relocation table:
00000000 00 00 ..
relocation_table:
0 0x0001:0x000A
1 0x0001:0x0001
ulink.single.exe -> detected as Raw binary
exe_header:
signature: MZ
bytes_in_last_block: 0x0088
blocks_in_file: 0x0001
num_relocs: 0x0002
header_paragraphs: 0x0005
min_extra_paragraphs: 0x0011
max_extra_paragraphs: 0xffff
ss:sp: 0x0004:0x0100
checksum: 0x0000
cs:ip: 0x0001:0x0000
reloc_table_offset: 0x0040
overlay_number: 0x0000
data between header and relocation table:
00000000 55 6E 69 4C 69 6E 6B 00 00 00 00 00 00 00 00 00 UniLink.........
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020 00 00 00 00 ....
relocation table:
0 0x0001:0x0001
1 0x0001:0x000A
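For anyone who wants to reproduce the dumps above, here is a minimal Python sketch that reads the fixed 0x1C-byte MZ header, the bytes between it and the relocation table, and the relocation entries. The names `parse_mz`, `MzHeader` and `ULINK_HEADER` are made up for this example, and `ULINK_HEADER` is a hand-built reconstruction of the ulink header values shown above (without the code image), not the real file.

```python
import struct
from collections import namedtuple

# Fixed part of the MZ header: 2-byte signature plus 13 little-endian words (0x1C bytes).
MZ_FORMAT = "<2s13H"
MZ_FIELDS = (
    "signature bytes_in_last_block blocks_in_file num_relocs header_paragraphs "
    "min_extra_paragraphs max_extra_paragraphs ss sp checksum ip cs "
    "reloc_table_offset overlay_number"
)
MzHeader = namedtuple("MzHeader", MZ_FIELDS)


def parse_mz(data):
    """Return (header, gap_bytes, relocations) for an MZ image held in `data`."""
    header = MzHeader._make(struct.unpack_from(MZ_FORMAT, data, 0))
    if header.signature != b"MZ":
        raise ValueError("not an MZ executable")
    fixed_size = struct.calcsize(MZ_FORMAT)  # 0x1C
    # Everything between the fixed fields and the relocation table.
    gap = data[fixed_size:header.reloc_table_offset]
    relocs = []
    for i in range(header.num_relocs):
        # Each entry is stored as offset first, then segment.
        offset, segment = struct.unpack_from(
            "<HH", data, header.reloc_table_offset + 4 * i
        )
        relocs.append((segment, offset))
    return header, gap, relocs


# Hypothetical reconstruction of the ulink header from the dump above.
ULINK_HEADER = (
    struct.pack(MZ_FORMAT, b"MZ", 0x0088, 0x0001, 0x0002, 0x0005, 0x0011,
                0xFFFF, 0x0004, 0x0100, 0x0000, 0x0000, 0x0001, 0x0040, 0x0000)
    + b"UniLink" + bytes(29)  # 0x24 bytes between header and relocation table
    + struct.pack("<HHHH", 0x0001, 0x0001, 0x000A, 0x0001)  # offset, segment pairs
)

header, gap, relocs = parse_mz(ULINK_HEADER)
print(hex(header.reloc_table_offset), len(gap), relocs)
```

To try it on a real file, pass `open("ulink.single.exe", "rb").read()` to `parse_mz` instead of the reconstructed bytes.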
NASM sample DOS EXEs built with several linkers:
http://s000.tinyupload.com/?file_id=77824670479507329081
data between header and relocation table by linker:
optlink.exe: 00 00
link.exe: 01 00
wlink.exe: 00 00 00 00
ulink.exe:
00000000 55 6E 69 4C 69 6E 6B 00 00 00 00 00 00 00 00 00 UniLink.........
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020 00 00 00 00 ....
I don't know why that binary is not detected (I tried it myself and it really doesn't work; maybe you should try with 9.0.4 just in case), but MS-DOS binaries certainly work. You can even find many issues here related to x86 in 16-bit mode.
I just wanted to clarify that the only one not working is ulink, correct?
The problem is here:
https://github.com/NationalSecurityAgency/ghidra/blob/208433c9f7b2e4af8cd26d0b757ecb74a37e8f07/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/MzLoader.java#L72-L74
and here:
https://github.com/NationalSecurityAgency/ghidra/blob/208433c9f7b2e4af8cd26d0b757ecb74a37e8f07/Ghidra/Features/Base/src/main/java/ghidra/app/util/bin/format/mz/DOSHeader.java#L255-L261
The ulink-linked EXE has an e_lfarlc value of 0x40.
private short [] e_res = new short[4]; // Reserved words
That is wrong: nothing prevents a linker from using more or fewer than 4 shorts. The size is not fixed and depends only on the value of e_lfarlc. As you can see, 4 linkers produce 3 different sizes.
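To make the arithmetic concrete, here is a small Python sketch. It assumes the fixed header fields end at 0x1C (right after the overlay number), which matches the two header dumps earlier in this thread. `reserved_bytes` is a hypothetical helper name.

```python
# Size of the fixed MZ header fields, which end with the overlay number.
FIXED_HEADER_SIZE = 0x1C


def reserved_bytes(e_lfarlc):
    """Bytes between the fixed header fields and the relocation table."""
    return max(e_lfarlc - FIXED_HEADER_SIZE, 0)


# e_lfarlc values taken from the two header dumps in this thread.
for linker, e_lfarlc in (("optlink", 0x1E), ("ulink", 0x40)):
    size = reserved_bytes(e_lfarlc)
    print(f"{linker}: {size} bytes = {size // 2} shorts")
```

Neither value corresponds to exactly 4 shorts, so a fixed `short[4]` does not describe either file.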
According to http://bytepointer.com/resources/win16_ne_exe_format_win3.0.htm:
A new-format .EXE file is
identified if the segmented executable header contains a valid
signature. If the signature is not valid, the file is assumed to be an
old-style format .EXE file
NE-file samples:
http://justsolve.archiveteam.org/wiki/New_Executable (Exe Samples)
http://cd.textfiles.com/aztechmb/AZTECH.EXE (it's a real NE file)
Even valid PE files (according to IDA) are detected as NE files, although they have no valid NE header ('NE' is missing in the segmented header), for example MSVC 1.5 link.exe.
AZTECH.EXE (NE) and old MSVC 1.5 LINK.EXE (PE)
http://s000.tinyupload.com/?file_id=12663273413500823409
The main issue here is that
https://github.com/NationalSecurityAgency/ghidra/blob/208433c9f7b2e4af8cd26d0b757ecb74a37e8f07/Ghidra/Features/Base/src/main/java/ghidra/app/util/bin/format/mz/DOSHeader.java#L255-L261
isn't checking what's at 0x3c when e_lfarlc is 0x40. When it's 0x40, we need to actually confirm that the NE signature is present before declaring it a new executable.
Why check for the value 0x40 at all? The distance is linker-specific and, according to the docs I've read, not standardized. Why not just read the value at offset 0x3c, check for an 'NE', PE, PE32 or PE+ signature there, and otherwise treat the file as Old-Style (exactly as described in http://bytepointer.com/resources/win16_ne_exe_format_win3.0.htm)?
https://github.com/zfigura/semblance/blob/cbdadd3cec26c686cc48782b9702151f522d66ce/src/dump.c#L28-L59
I don't know if you copied the header from here, but it's just not correct: https://blog.kowalczyk.info/articles/pefileformat.html
The word at offset 18h in the old-style .EXE header contains the
relative byte offset to the stub program's relocation table. If this
offset is 40h, then the double word at offset 3Ch is assumed to be the
relative byte offset from the beginning of the file to the beginning
of the segmented executable header. A new-format .EXE file is
identified if the segmented executable header contains a valid
signature. If the signature is not valid, the file is assumed to be an
old-style format .EXE file.
According to that, 0x40 is required for it to be a New Executable. However, we still need to go to the header and actually make sure the NE signature is there before saying it's a New Executable.
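The rule quoted above, plus the extra signature check, could be sketched in Python roughly as follows. This is not Ghidra's code. `is_new_executable` and `make_stub` are hypothetical names, and `make_stub` only fabricates the few header bytes that the check reads.

```python
import struct


def is_new_executable(data):
    """True only if e_lfarlc is 0x40 AND the header at [0x3C] starts with 'NE'."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfarlc,) = struct.unpack_from("<H", data, 0x18)
    if e_lfarlc != 0x40:
        return False
    # Dword at 0x3C: file offset of the segmented executable header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 2] == b"NE"


def make_stub(e_lfarlc, e_lfanew=0, new_sig=b""):
    """Hypothetical test image; with new_sig set, e_lfanew must be >= 0x40."""
    data = bytearray(0x40)
    data[0:2] = b"MZ"
    struct.pack_into("<H", data, 0x18, e_lfarlc)
    struct.pack_into("<I", data, 0x3C, e_lfanew)
    if new_sig:
        data.extend(bytes(e_lfanew - len(data)))  # pad up to the new header
        data.extend(new_sig)
    return bytes(data)


print(is_new_executable(make_stub(0x40)))               # ulink-like: zeros at 0x3C
print(is_new_executable(make_stub(0x1E)))               # optlink-like
print(is_new_executable(make_stub(0x40, 0x80, b"NE")))  # synthetic NE
```

In the ulink case the dword at 0x3C is zero, so the check lands on the `MZ` bytes at offset 0 and the file stays an old-style EXE.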
0x40: I'm not sure that is required, because of
A new-format .EXE file is
identified if the segmented executable header contains a valid
signature.
Perhaps it's not needed, but since I'm going to fix this for 9.1, I don't want to risk changing the behavior too much. I'm just going to add the additional check to address the problem while minimizing impact.
I would also add 'PE' to the check:
if MZ
    if NE -> New Executable
    else if PE -> Portable Executable
    else -> Old-Style
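A minimal Python sketch of that order of checks, without the 0x40 gate, might look like this. `classify` and `stub` are hypothetical names, and the test images are fabricated. PE32 and PE+ share the same `PE\0\0` signature (they differ only in the optional header magic), so one comparison covers both.

```python
import struct


def classify(data):
    """Classify an image by signature alone (no e_lfarlc == 0x40 gate)."""
    if data[:2] != b"MZ":
        return "not MZ"
    if len(data) >= 0x40:
        # Dword at 0x3C: candidate offset of a new-style header.
        (offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[offset:offset + 2] == b"NE":
            return "NE"
        if data[offset:offset + 4] == b"PE\x00\x00":
            return "PE"
    return "Old-Style"


def stub(e_lfanew, signature):
    """Hypothetical test image: 0x40-byte MZ stub, then `signature` at e_lfanew (>= 0x40)."""
    data = bytearray(0x40)
    data[0:2] = b"MZ"
    struct.pack_into("<I", data, 0x3C, e_lfanew)
    data += bytes(e_lfanew - 0x40) + signature
    return bytes(data)


print(classify(stub(0x80, b"NE")))
print(classify(stub(0x80, b"PE\x00\x00")))
print(classify(b"MZ" + bytes(0x3E)))  # zeros at 0x3C, like the ulink file
```

One caveat with this variant: in an old-style EXE with a short header, the bytes at 0x3C may already be relocation entries or code, so the offset read there is arbitrary and could in principle point at bytes that happen to match.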
Even valid PE files (according to IDA) are detected as NE files, although they have no valid NE header ('NE' is missing in the segmented header), for example MSVC 1.5 link.exe.
I have to correct myself: IDA wrongly detects the MSVC 1.5 link.exe as a PE file, but it's an Old-Style DOS Exe with the Phar Lap extender.
http://s000.tinyupload.com/?file_id=12663273413500823409 (the PE.msvc1.5.LINK.EXE seems to be an Old-Style DOS Exe)