I am hoping to use Ghidra as a hobbyist to reverse engineer parts of GBA games for modding. Unfortunately, they make extensive use of what is technically unpredictable behaviour: using the second half of a "bl" instruction pair in THUMB code to perform an absolute jump to the address in the link register.
It would be really useful for THUMB disassembly to recognise and support this, maybe as an optional feature as there is no guarantee it works on other processors. The processor in question is the ARM7TDMI, with architecture ARMv4T, little-endian.
Sample of affected code:
0d 4b ldr r3, =procAddr
9e 46 mov lr, r3
00 f8 bl #0
Details of this instruction pair are found on page A7-26 of the ARM Architecture Reference Manual.
Should that be a BX LR and not a bl #0? Also, I can't get binja or capstone to give me the same instructions for those bytes. Are you sure the sample is correct?
Definitely the second half of a bl pair. BL's instruction format on THUMB is 111H HOOO OOOO OOOO where HH is 11 for the second of the pair, and O is bits 11:1 of the offset for the relative branch-with-link; in this case, the #0 part of the instruction I quoted. Putting that together, you get F800 or 00 F8 when split into separate bytes for a little-endian binary.
As for the sample, I'm looking at it here in Ghidra and it's definitely correct. One possible reason you aren't getting the same instructions is that this is from a little-endian binary. Attached is a screenshot of another piece of affected code in Ghidra, this time with the instructions in between left in.

Edit: BX LR would be 4770, or 70 47 as separate bytes in little-endian byte order.
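To make the encoding argument above easy to check by hand, here is a minimal Python sketch that classifies a single little-endian THUMB halfword as one half of a BL/BLX pair. The function name `decode_thumb_bl_half` and the label strings are made up for illustration and are not part of Ghidra or any disassembler library.

```python
import struct

def decode_thumb_bl_half(raw: bytes):
    """Classify one little-endian THUMB halfword as a BL/BLX half.

    Returns (kind, offset11), or None if the halfword is not in the
    111H Hxxx xxxx xxxx family or is the H=00 case (unconditional B).
    """
    (hw,) = struct.unpack("<H", raw)      # little-endian 16-bit value
    if hw >> 13 != 0b111:                 # top three bits must be 111
        return None
    h = (hw >> 11) & 0b11                 # the two H bits
    offset11 = hw & 0x7FF                 # low 11 bits of the halfword
    kinds = {0b10: "bl prefix", 0b11: "bl suffix", 0b01: "blx suffix"}
    if h not in kinds:                    # H=00 is a plain branch, not BL
        return None
    return kinds[h], offset11

# The bytes from the sample, then the BX LR bytes for comparison.
print(decode_thumb_bl_half(bytes.fromhex("00f8")))
print(decode_thumb_bl_half(bytes.fromhex("7047")))
```

With the sample bytes 00 f8 this reports the second half of a bl pair with an offset field of 0, while 70 47 is not in the BL family at all.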
This turns out to be a nice exercise in SLEIGH programming, so here we go:
diff --git a/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc b/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc
index ea89edd..fb9f9a1 100644
--- a/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc
+++ b/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc
@@ -100,6 +100,7 @@ define token instrThumb (16)
soffset8=(0,7) signed
offset10=(0,9)
offset10S=(10,10)
+ offset11=(0,10)
soffset11=(0,10) signed
offset12=(0,11)
@@ -1383,6 +1384,19 @@ with : ARMcondCk=1 {
@endif # VERSION_5
+# half-bl, used in GBA games, technically unpredictable behaviour
+ThHalfBlOffset11: "" is offset11=0 { export 0:4; }
+ThHalfBlOffset11: "+"reloc is offset11 [ reloc = offset11 << 1; ] { export reloc; }
+
+:bl^ItCond "lr" ThHalfBlOffset11 is TMode=1 & ItCond & (op11=0x1f & ThHalfBlOffset11)
+{
+ build ItCond;
+ local tmp = lr + ThHalfBlOffset11;
+ lr = inst_next|1;
+ SetThumbMode(1);
+ call [tmp];
+}
+
:bl^ItCond ThAddr24 is TMode=1 & ItCond & (op11=0x1e; part2c1415=3 & part2c1212=1) & ThAddr24
{
build ItCond;
If you apply the patch and restart Ghidra, it should automatically recompile the SLEIGH specification.
I confirmed this disassembled the half-bl instruction properly on a test binary, and the decompiler picked it up as a call instruction as expected. (I'm not 100% sure I got the non-zero case right - but hopefully you don't run into any of those?)
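For anyone who wants to sanity-check the non-zero case, the semantics the patch is aiming for can be sketched in a few lines of Python. This mirrors the p-code in the patch above (target is lr plus the offset field shifted left by one, new lr is the next instruction's address with bit 0 set); the function name and the example addresses are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def half_bl_suffix(lr: int, inst_addr: int, offset11: int):
    """Model the lone second half of a THUMB bl pair.

    Returns (dest, new_lr) following the patch's p-code:
      tmp = lr + (offset11 << 1)
      lr  = inst_next | 1
    inst_next is inst_addr + 2 because the instruction is one halfword.
    """
    dest = (lr + (offset11 << 1)) & MASK32
    new_lr = ((inst_addr + 2) | 1) & MASK32
    return dest, new_lr

# Hypothetical addresses: lr was loaded with a routine at 0x08001234 and
# the "bl #0" halfword sits at 0x08000100.
dest, new_lr = half_bl_suffix(0x08001234, 0x08000100, 0)
print(hex(dest), hex(new_lr))
```

With an offset field of 0 the jump goes straight to the address held in lr, which is how the GBA code uses it; a non-zero field simply adds twice its value.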
Thank you so much for that patch. I've managed to tweak it to create a new processor variant so that the changes don't escape outside the GBA version. I don't know how to suitably create a patch for this so I'll list the steps I used in detail in case it helps anyone else: Update: patch now included.
diff --git a/Ghidra/Processors/ARM/data/languages/ARM.ldefs b/Ghidra/Processors/ARM/data/languages/ARM.ldefs
index f98077e..1440716 100644
--- a/Ghidra/Processors/ARM/data/languages/ARM.ldefs
+++ b/Ghidra/Processors/ARM/data/languages/ARM.ldefs
@@ -243,6 +243,22 @@
<external_name tool="IDA-PRO" name="arm"/>
<external_name tool="DWARF.register.mapping.file" name="ARM.dwarf"/>
</language>
+
+ <language processor="ARM"
+ endian="little"
+ size="32"
+ variant="v4t_gba"
+ version="1.101"
+ slafile="ARM4t_gba.sla"
+ processorspec="ARMt_v45.pspec"
+ manualindexfile="../manuals/ARM.idx"
+ id="ARM:LE:32:v4t">
+ <description>ARM/Thumb v4 little endian (GBA variant)</description>
+ <compiler name="default" spec="ARM_v45.cspec" id="default"/>
+ <external_name tool="gnu" name="armv4t"/>
+ <external_name tool="IDA-PRO" name="arm"/>
+ <external_name tool="DWARF.register.mapping.file" name="ARM.dwarf"/>
+ </language>
<language processor="ARM"
endian="big"
diff --git a/Ghidra/Processors/ARM/data/languages/ARM4t_gba.slaspec b/Ghidra/Processors/ARM/data/languages/ARM4t_gba.slaspec
new file mode 100644
index 0000000..d316c85
--- /dev/null
+++ b/Ghidra/Processors/ARM/data/languages/ARM4t_gba.slaspec
@@ -0,0 +1,7 @@
+
+@define ENDIAN "little"
+@define T_VARIANT ""
+@define GBA_VARIANT ""
+
+@include "ARM.sinc"
+
diff --git a/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc b/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc
index ea89edd..84d9563 100644
--- a/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc
+++ b/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc
@@ -100,6 +100,7 @@ define token instrThumb (16)
soffset8=(0,7) signed
offset10=(0,9)
offset10S=(10,10)
+ offset11=(0,10)
soffset11=(0,10) signed
offset12=(0,11)
@@ -1383,6 +1384,19 @@ with : ARMcondCk=1 {
@endif # VERSION_5
+@if defined(GBA_VARIANT)
+ThHalfBlOffset11: reloc is offset11 [ reloc = offset11 << 1; ] { export reloc; }
+
+:bl^ItCond lr",#"ThHalfBlOffset11 is TMode=1 & ItCond & op11=0x1f & ThHalfBlOffset11 & lr
+{
+ build ItCond;
+ local tmp = lr + ThHalfBlOffset11;
+ lr = inst_next|1;
+ SetThumbMode(1);
+ call [tmp];
+}
+@endif # GBA_VARIANT
+
:bl^ItCond ThAddr24 is TMode=1 & ItCond & (op11=0x1e; part2c1415=3 & part2c1212=1) & ThAddr24
{
build ItCond;
I'm leaving this issue open in case this is a desired modification when Ghidra becomes open source.
Try this method and see whether it works for you. It enables half-BL support up to ARM v6, where it became invalid (the patch guards it with @ifndef VERSION_6T2), so you don't need a new variant. The other BL half variants are added as well. If you can give this a try in the wild, I'd appreciate it.
diff --git a/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc b/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc
index d335ae0..a9c655d 100644
--- a/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc
+++ b/Ghidra/Processors/ARM/data/languages/ARMTHUMBinstructions.sinc
@@ -100,6 +100,7 @@
soffset8=(0,7) signed
offset10=(0,9)
offset10S=(10,10)
+ offset11=(0,10)
soffset11=(0,10) signed
offset12=(0,11)
@@ -1402,6 +1403,33 @@
call ThAddr24;
}
+@ifndef VERSION_6T2
+
+:bl^ItCond "#"^off is TMode=1 & ItCond & op11=0x1e & soffset11 [ off = inst_start + 4 + (soffset11 << 12); ]
+{
+ build ItCond;
+ lr = off:4;
+}
+
+:bl^ItCond "#"^off is TMode=1 & ItCond & op11=0x1f & offset11 [ off = offset11 << 1; ]
+{
+ build ItCond;
+ local dest = lr + off:4;
+ lr = inst_next|1;
+ SetThumbMode(1);
+ goto [dest];
+}
+
+:blx^ItCond "#"^off is TMode=1 & ItCond & op11=0x1d & offset11 & thc0000=0 [ off = offset11 << 1; ]
+{
+ build ItCond;
+ local dest = (lr & (~0x3)) + off:4;
+ lr = inst_next|1;
+ SetThumbMode(0);
+ call [dest];
+}
+@endif
+
:bl^ItCond ThAddr24 is TMode=1 & CALLoverride=1 & ItCond & (op11=0x1e; part2c1415=3 & part2c1212=1) & ThAddr24
{
build ItCond;
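As a cross-check of the two bl halves in the patch above, the following Python sketch models them separately and then chains them, which should reproduce an ordinary bl. It follows the patch's expressions (prefix: lr = inst_start + 4 + (soffset11 << 12); suffix: dest = lr + (offset11 << 1)). The helper names and the load address are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def bl_prefix_lr(inst_start: int, halfword: int) -> int:
    """First half (op11=0x1e): lr = inst_start + 4 + (soffset11 << 12)."""
    return (inst_start + 4 + (sign_extend(halfword & 0x7FF, 11) << 12)) & MASK32

def bl_suffix_dest(lr: int, halfword: int) -> int:
    """Second half (op11=0x1f): dest = lr + (offset11 << 1)."""
    return (lr + ((halfword & 0x7FF) << 1)) & MASK32

# F7FF FFFE is the familiar "bl to self" pair, so chaining both halves
# should land back on the address of the first halfword.
start = 0x08000100                      # hypothetical address of the pair
lr = bl_prefix_lr(start, 0xF7FF)
print(hex(bl_suffix_dest(lr, 0xFFFE)))
```

Treating the halves as two independent instructions is what lets the lone second half work: it does not care whether lr was set by a matching prefix or by a plain mov.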