When building arm64 defconfig with ThinLTO and CFI, the final link of vmlinux fails with the following errors:
ld.lld: error: undefined hidden symbol: gic_of_init
>>> referenced by xarray.c
>>> vmlinux.o:(__of_table_gic_400)
ld.lld: error: undefined hidden symbol: gic_of_init
>>> referenced by xarray.c
>>> vmlinux.o:(__of_table_arm11mp_gic)
ld.lld: error: undefined hidden symbol: gic_of_init
>>> referenced by xarray.c
>>> vmlinux.o:(__of_table_arm1176jzf_dc_gic)
ld.lld: error: undefined hidden symbol: gic_of_init
>>> referenced by xarray.c
>>> vmlinux.o:(__of_table_cortex_a15_gic)
ld.lld: error: undefined hidden symbol: gic_of_init
>>> referenced by xarray.c
>>> vmlinux.o:(__of_table_cortex_a9_gic)
ld.lld: error: undefined hidden symbol: gic_of_init
>>> referenced by xarray.c
>>> vmlinux.o:(__of_table_cortex_a7_gic)
ld.lld: error: undefined hidden symbol: gic_of_init
>>> referenced by xarray.c
>>> vmlinux.o:(__of_table_msm_8660_qgic)
ld.lld: error: undefined hidden symbol: gic_of_init
>>> referenced by xarray.c
>>> vmlinux.o:(__of_table_msm_qgic2)
ld.lld: error: undefined hidden symbol: gic_of_init
>>> referenced by xarray.c
>>> vmlinux.o:(__of_table_pl390)
Makefile:1111: recipe for target 'vmlinux' failed
Looking at the source code, which is in drivers/irqchip/irq-gic.c, we have the following:
int __init
gic_of_init(struct device_node *node, struct device_node *parent)
{
...
}
IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init);
...
include/linux/irqchip.h:
#define IRQCHIP_DECLARE(name, compat, fn) OF_DECLARE_2(irqchip, name, compat, fn)
include/linux/of.h:
#define _OF_DECLARE(table, name, compat, fn, fn_type) \
static const struct of_device_id __of_table_##name \
__attribute__((unused)) \
= { .compatible = compat, \
.data = (fn == (fn_type)NULL) ? fn : fn }
#define OF_DECLARE_2(table, name, compat, fn) \
_OF_DECLARE(table, name, compat, fn, of_init_fn_2)
Looking at drivers/irqchip/irq-gic.o, gic_of_init appears to be there still:
$ llvm-nm drivers/irqchip/irq-gic.o | grep gic_of_init
---------------- T gic_of_init
$ llvm-nm drivers/irqchip/irq-gic.o | grep gic_400
---------------- d __of_table_gic_400
However, things look different in vmlinux.o:
$ llvm-nm vmlinux.o | grep gic_of_init
U gic_of_init
0000000000000034 t gic_of_init
000000000000a380 t gic_of_init.cfi
000000000003f6ec T gic_of_init.cfi
...
The global gic_of_init is suddenly undefined and we have a static function with the same name, which is defined in drivers/irqchip/irq-gic-v3.c.
As an experiment, I renamed the static function in irq-gic-v3.c to gicv3_of_init and this fixed the compilation issue. Looks like a bug in ThinLTO?
$ llvm-nm vmlinux.o | grep gic_of_init
U gic_of_init
So an undefined reference, which is why the linker is complaining...
0000000000000034 t gic_of_init
But why t? IIRC, that means the symbol exists in the .text section but has static (local) linkage, so other object files cannot reference it.
I renamed the static function in irq-gic-v3.c to gicv3_of_init and this fixed the compilation issue.
Why would the name of the function have anything to do with it (unless there's some other reference to it, or an alias or something)? Also, gic_of_init is not marked static based on your snippet above.
There are two functions named gic_of_init. One of them is a global in irq-gic.c, and the other one is a static function in irq-gic-v3.c. ThinLTO seems to drop the global one from vmlinux.o (but only when CFI is enabled) and renaming the static one to a different name fixes this.
That sounds like a bug with CFI enabled ThinLTO. Can you please file a bug upstream at bugs.llvm.org? Sounds like a minimal test case isn't too difficult to put together either.
This has been fixed in ToT LLVM at some point, but I would have to bisect it to find the exact commit.
OK, it compiles with ToT LLVM, but without the workaround patch, the kernel hangs at early boot. I'll have to take a closer look at what goes wrong.
So, this happens because the compiler doesn't know that the static function is used outside the compilation unit, and the kernel isn't being exactly open about it.
The kernel has static structs containing pointers to static functions, stored in a dedicated data section. A different part of the kernel reads that section and uses the addresses stored there to call the functions from outside their compilation units.
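As a rough illustration of that pattern (not the kernel's actual code), the sketch below registers two static functions in a dedicated section and calls them from a separate walker. It assumes an ELF target linked with GNU ld or lld, which define __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers. The names struct ref, REF, refs, add_one, add_two and sum_refs are hypothetical.

```c
#include <assert.h>

/* Hypothetical table entry: one function pointer per registered hook. */
struct ref {
    int (*fn)(int);
};

/* Place a static, otherwise unreferenced entry into the "refs" section. */
#define REF(name, func)                                   \
    static const struct ref __ref_##name                  \
    __attribute__((used, section("refs"))) = { .fn = func }

static int add_one(int a) { return a + 1; }
static int add_two(int a) { return a + 2; }

REF(one, add_one);
REF(two, add_two);

/* Section bounds provided by the linker (assumption: GNU ld or lld, ELF). */
extern const struct ref __start_refs[];
extern const struct ref __stop_refs[];

/* Call every registered function and add up the results. */
int sum_refs(int arg)
{
    int total = 0;

    assert(__start_refs <= __stop_refs);
    for (const struct ref *r = __start_refs; r < __stop_refs; r++)
        total += r->fn(arg);
    return total;
}
```

Nothing in the source names add_one or add_two as callees of sum_refs; the only link between them is the data in the section. The kernel's OF tables have the same shape, with the table bounds coming from the linker script instead.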
While this works fine with ThinLTO alone, it breaks when CFI is enabled. Here's a reproducer:
refs.h:
#define __used __attribute__((__used__))
#define __section(S) __attribute__((__section__(#S)))
struct ref {
const void *data;
};
#define REF(name, fn) \
static const struct ref __ref_##name __used __section(.refs) = { \
.data = fn \
}
global.c:
#include "refs.h"
int func(int a)
{
return a + 1;
}
REF(global, func);
static.c:
#include "refs.h"
static int func(int a)
{
return a + 2;
}
REF(static, func);
refs.lds:
SECTIONS {
.func_refs : {
. = ALIGN(8);
__refs_start = .;
*(.refs);
__refs_end = .;
}
}
Imitating the kernel build process, we compile these as follows:
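# Compile each file to ThinLTO bitcode with CFI enabled.
FILES=( global static )
for i in "${FILES[@]}"; do
    clang \
        -flto=thin \
        -fvisibility=hidden \
        -fsanitize=cfi \
        -fno-sanitize-cfi-canonical-jump-tables \
        -fsanitize-cfi-cross-dso \
        -c -o "$i.o" "$i.c"
done
# Merge the objects with a relocatable link, then do the final link.
ld.lld -r -o all.o "${FILES[@]/%/.o}"
ld.lld -T refs.lds -o a.out all.o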
Note that if we omit -fno-sanitize-cfi-canonical-jump-tables, this fails to compile with a similar error to the one in the first comment. However, with -fno-sanitize-cfi-canonical-jump-tables this compiles just fine.
Here's what the .func_refs section looks like in the end:
$ objdump -s -j .func_refs a.out
a.out: file format elf64-x86-64
Contents of section .func_refs:
2078 40200000 00000000 40200000 00000000 @ ......@ ......
We would expect the section to contain the addresses of two different functions, but it contains the same address twice. Here are the relevant parts of the disassembly:
# This is the static function.
0000000000001000 <func>:
1000: 55 push %rbp
1001: 48 89 e5 mov %rsp,%rbp
1004: 89 7d fc mov %edi,-0x4(%rbp)
1007: 8b 45 fc mov -0x4(%rbp),%eax
100a: 83 c0 02 add $0x2,%eax
100d: 5d pop %rbp
100e: c3 retq
100f: cc int3
# And this is the global function.
0000000000001010 <func>:
1010: 55 push %rbp
1011: 48 89 e5 mov %rsp,%rbp
1014: 89 7d fc mov %edi,-0x4(%rbp)
1017: 8b 45 fc mov -0x4(%rbp),%eax
101a: 83 c0 01 add $0x1,%eax
101d: 5d pop %rbp
# Only one CFI jump table entry pointing to the static function.
0000000000002040 <func.cfi_jt>:
2040: e9 bb ef ff ff jmpq 1000 <func>
The compiler produced only one CFI jump table entry and used its address to replace both entries in the .func_refs section. This is also what happens in the kernel and we end up calling the wrong function during initialization.
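One way to catch this kind of collapse early would be a sanity check that scans the table for repeated addresses. The following is a minimal sketch, assuming that every entry is supposed to point to a distinct function. count_duplicate_refs is a hypothetical helper, not something that exists in the kernel or in LLVM, and struct ref mirrors the one in the reproducer's refs.h.

```c
#include <stddef.h>

struct ref {
    const void *data;
};

/* Count entries whose address already appeared earlier in [start, end). */
size_t count_duplicate_refs(const struct ref *start, const struct ref *end)
{
    size_t dups = 0;

    for (const struct ref *r = start; r < end; r++) {
        for (const struct ref *prev = start; prev < r; prev++) {
            if (prev->data == r->data) {
                dups++;
                break;  /* count each repeated entry only once */
            }
        }
    }
    return dups;
}
```

Run over the two entries shown in the .func_refs dump above, which both hold 0x2040, such a check would count one duplicate.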
As we can see from the reproducer, this bug is not arm64-specific, even though that's where we (first) ran into this in the kernel.
https://reviews.llvm.org/D67945 fixes Sami's reproducer above. Sami, could you please check whether this fixes the problem for the kernel?
Sami, could you please check whether this fixes the problem for the kernel?
Yes, this fixes the problem. Thanks!
Thanks for the confirmation, I've added a note to the commit message.
The LLVM fix landed as r373678.
I can confirm that ToT LLVM doesn't need the workaround anymore. Thank you, Peter!