Linux: multiple x86_64 lld boot failures

Created on 4 Aug 2020  ·  31Comments  ·  Source: ClangBuiltLinux/linux

Looking at the latest CI run, it looks like a bunch of x86_64 builds linked with LLD are no longer booting. Since this is across various trees, I wonder if this is a regression in LLD.

Needs Backport [ARCH] x86_64 [BUG] linux [FIXED][LINUX] 5.9 [TOOL] lld

All 31 comments

I was running a bisect on LLVM last night but I have not had a chance to check the results yet.

c41a18cf61790fc898dcda1055c3efbf442c14c0 is the first bad commit
commit c41a18cf61790fc898dcda1055c3efbf442c14c0
Author: Fangrui Song <[email protected]>
Date:   Sun Aug 2 23:05:50 2020 -0700

    [CMake] Default ENABLE_X86_RELAX_RELOCATIONS to ON

    This makes clang default to -Wa,-mrelax-relocations=yes, which enables
    R_386_GOT32X (GNU as enables it regardless of -mrelax-relocations=) and
    R_X86_64_[REX_]GOTPCRELX in MC. The produced object files require GNU ld>=2.26
    to link. binutils 2.26 is considered a very old release today.

 clang/CMakeLists.txt                                        | 2 +-
 llvm/utils/gn/secondary/clang/include/clang/Config/BUILD.gn | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

My git bisect run script:

#!/usr/bin/env bash

TC_BLD=${HOME}/cbl/github/tc-build
LNX=${HOME}/src/linux

set -x

"${TC_BLD}"/build-llvm.py \
    --assertions \
    --build-stage1-only \
    --no-update \
    --projects "clang;lld" \
    --targets X86 || exit 125

rm -rf "${LNX}"/out

PATH=${TC_BLD}/build/llvm/stage1/bin:${PATH} \
    make \
    -C "${LNX}" \
    -skj"$(nproc)" \
    LLVM=1 \
    O=out/x86_64 \
    defconfig bzImage || exit 125

"${HOME}"/cbl/github/boot-utils/boot-qemu.sh \
    -a x86_64 \
    -k "${LNX}"/out/x86_64 \
    -t 30s

https://reviews.llvm.org/rGc41a18cf61790fc898dcda1055c3efbf442c14c0

cc @MaskRay

Workaround: CFLAGS & ASFLAGS: -Wa,-mrelax-relocations=no

I am surprised that R_X86_64_GOTPCRELX or R_X86_64_REX_GOTPCRELX can cause problems. How does GCC+GNU as work? Modern GNU as enables R_X86_64_GOTPCRELX or R_X86_64_REX_GOTPCRELX everywhere.

The produced object files require GNU ld>=2.26
to link. binutils 2.26 is considered a very old release today.

The Linux kernel's minimum supported version of GNU binutils is currently 2.23:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst#n34

So we'll probably need that flag when building with Clang if LLD is not being used. Additionally, we'll have to figure out if maybe these relocation types are unsupported?

Workaround: CFLAGS & ASFLAGS: -Wa,-mrelax-relocations=no

What's tricky is that changes to LLVM like this require us to modify multiple trees in the Linux kernel. Patches take time to bake in linux-next, which is a staging/integration tree before the "merge window" opens to merge patches in the "mainline." The merge window only opens for two weeks once every two months. Only once patches hit mainline can they be accepted for inclusion into the "stable" tree, which Android and CrOS pull from for their kernels.

So changes like this to LLVM can end up breaking linux-next, mainline, and stable, and leave them broken or without CI coverage for months.

We can tolerate these kind of breaking changes once in a while, but if they start occurring more frequently then we have a problem. We probably need to wire up buildbots of the linux kernel for LLVM post submits. #1099 and #416 come to mind.

The Linux kernel's minimum supported version of GNU binutils is currently 2.23:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst#n34

With binutils<2.26, cmake ... -DENABLE_X86_RELAX_RELOCATIONS=off is simpler than adding -Wa,-mrelax-relocations=no. The LLVM build system could do a more sophisticated configure time check whether binutils>=2.26 is available but that will probably just add some kludge which will soon be removed in (say) one year.

If someone wants to do a bisection,

lld/ELF/Arch/X86_64.cpp adjustRelaxExpr

static const int cnt = atoi(getenv("CNT"));
...
if (type != R_X86_64_GOTPCRELX && type != R_X86_64_REX_GOTPCRELX)
  return relExpr;
if (cnt-- > 0)
  return relExpr;    /////// disable relaxation for first cnt relocations

Then bisect on cnt.

Heh, observing boot in qemu is interesting.

(gdb) lx-dmesg 
Python Exception <class 'gdb.error'> No symbol "log_first_idx" in specified context.:
Error occurred in Python: No symbol "log_first_idx" in specified context.

I can get snippets of the log via (gdb) x/1000s __log_buf. I can dump+view better with (gdb) dump binary memory dump.bin &__log_buf &__log_buf+12 (the +12 is curious, I can't do much larger values). then view with $ strings dump.bin | less.

But the kernel seems to start properly, just some symbols aren't observable.

Strange, if the bisection culprit involves new relocation types, shouldn't those appear in the output of llvm-readelf -r?

$ llvm-readelf -r vmlinux > vmlinux.readelf.r.txt  
$ grep R_X86_64_GOTPCRELX vmlinux.readelf.r.txt 
$ grep R_X86_64_REX_GOTPCRELX vmlinux.readelf.r.txt 

Strange, if the bisection culprit involves new relocation types, shouldn't those appear in the output of llvm-readelf -r?

R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX only appear in object files. They are resolved after linking.

Have someone tried GCC + LLD? I'm curious whether the combo works.

Have someone tried GCC + LLD? I'm curious whether the combo works.

Just tested, that combination doesn't build.

  LD      arch/x86/boot/setup.elf
ld.lld: error: section .text.startup file range overlaps with .header
>>> .text.startup range is [0x1040, 0x11FE]
>>> .header range is [0x11EF, 0x126B]
...

Weird, I thought my compiler was haunted (spoiler: it is 👻 ) but I actually can't reproduce the boot failure on a slightly new build of llvm (f2c04239955a8e0d71aa27f7ffa3bbba6c623aef). Was the problematic patch reverted? Doesn't look like it... @nathanchance can you repro that?

But if I checkout precisely c41a18cf61790fc898dcda1055c3efbf442c14c0, I can repro. Do I need to forward bisect to see when this was fixed?

last known bad: 7ba82a7320df
first known good: ee1c12708a45

commit ee1c12708a4519361729205168dedb2b61bc2638
("[SCEV] If Start>=RHS, simplify (Start smin RHS) = RHS for trip counts.")

hmm...strange, maybe I was holding it wrong.

Anyways, today I have c41a18cf61790fc898dcda1055c3efbf442c14c0 checked out, and am using @MaskRay 's diff to bisect.

diff --git a/lld/ELF/Arch/X86_64.cpp b/lld/ELF/Arch/X86_64.cpp
index 24711ec210a4..d08f642801ee 100644
--- a/lld/ELF/Arch/X86_64.cpp
+++ b/lld/ELF/Arch/X86_64.cpp
@@ -73,6 +73,8 @@ static const std::vector<std::vector<uint8_t>> nopInstructions = {
     {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
     {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00}};

+static int cnt = atoi(getenv("CNT"));
+
 X86_64::X86_64() {
   copyRel = R_X86_64_COPY;
   gotRel = R_X86_64_GLOB_DAT;
@@ -732,6 +734,8 @@ RelExpr X86_64::adjustRelaxExpr(RelType type, const uint8_t *data,
                                 RelExpr relExpr) const {
   if (type != R_X86_64_GOTPCRELX && type != R_X86_64_REX_GOTPCRELX)
     return relExpr;
+  if (cnt-- > 0)
+    return relExpr;
   const uint8_t op = data[-2];
   const uint8_t modRm = data[-1];

CNT=10000 make LLVM=1 -j71 boots. Bisection in progress. Will try to print the symbol based on the the value of cnt identified.

CNT=24 make LLVM=1 -j71 clean bzImage does not boot.
CNT=25 make LLVM=1 -j71 clean bzImage boots. Time to find the 25th symbol being relaxed.

  LD      arch/x86/boot/compressed/vmlinux
symbol 25: early_serial_base

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index c08714ae76ec..55eb5d41d90e 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -43,6 +43,7 @@ KBUILD_CFLAGS += -Wno-pointer-sign
 KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
 KBUILD_CFLAGS += -D__DISABLE_EXPORTS
+KBUILD_CFLAGS += -Wa,-mrelax-relocations=no

 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n

boots with stock LLVM. Let's narrow in on early_serial_base. Looks like it might be declared static in arch/x86/boot/compressed/misc.h...

File level bisection:

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index c08714ae76ec..8b23691ff9e3 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -43,6 +43,7 @@ KBUILD_CFLAGS += -Wno-pointer-sign
 KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
 KBUILD_CFLAGS += -D__DISABLE_EXPORTS
+CFLAGS_kaslr_64.o += -Wa,-mrelax-relocations=no

 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n

Doesn't contain the symbol early_serial_base, though maybe I did the optimization fuel wrong.

though maybe I did the optimization fuel wrong.

I did. Rather than bisect via optimization fuel, then disable the relaxation on a file, it's better to disable relaxation on all files in the dir, then enable it for the bad file, then bisect symbols via optimization fuel.


diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index c08714ae76ec..09fc50e818c1 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -43,6 +43,8 @@ KBUILD_CFLAGS += -Wno-pointer-sign
 KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
 KBUILD_CFLAGS += -D__DISABLE_EXPORTS
+subdir-ccflags-y += -Wa,-mrelax-relocations=no
+CFLAGS_kaslr_64.o += -Wa,-mrelax-relocations=yes

 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n

In that case, optimization fuel bisection produces 9 (boots) 8 (fails to boot). Relocation relaxation for the 9th symbol is _pgtable which is forward declared:

// arch/x86/boot/compressed/misc.h:100
extern unsigned char _pgtable[];                                                                                                                                    

defined in:

// arch/x86/boot/compressed/vmlinux.lds.S:70
_pgtable = . ;

and referenced in

// arch/x86/boot/compressed/kaslr_64.c
111   if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {                                                                                                 
112     debug_putstr("booted via startup_32()\n");                                                                                                                      
113     pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;                                                                                                               
114     pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;                                                                                                     
115     memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);                                                                                                             
116   } else {                                                                                                                                                          
117     debug_putstr("booted via startup_64()\n");                                                                                                                      
118     pgt_data.pgt_buf = _pgtable;       

Though it's not clear to my why relaxing that symbol in particular is problematic (if I even have the right symbol).

$ llvm-readelf -r arch/x86/boot/compressed/kaslr_64.o | grep _pgtable
...
0000000000000380  000000130000002a R_X86_64_REX_GOTPCRELX 0000000000000000 _pgtable - 4
0000000000000395  000000130000002a R_X86_64_REX_GOTPCRELX 0000000000000000 _pgtable - 4
00000000000003e3  000000130000002a R_X86_64_REX_GOTPCRELX 0000000000000000 _pgtable - 4
$ llvm-readelf -s arch/x86/boot/compressed/vmlinux | grep -e _pgtable -e _epgtable
...
   217: 000000000091c000     0 NOTYPE  GLOBAL DEFAULT     8 _pgtable
   230: 0000000000930000     0 NOTYPE  GLOBAL DEFAULT     8 _epgtable

FWIW

(if I even have the right symbol).

It must be _pgtable. If I unmodify my kernel, then modify LLD to not relocate that symbol, we boot:

diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp
index 751ded397768..8cf82fb594dc 100644
--- a/lld/ELF/Relocations.cpp
+++ b/lld/ELF/Relocations.cpp
@@ -1336,7 +1336,8 @@ static void scanReloc(InputSectionBase &sec, OffsetGetter &getOffset, RelTy *&i,
   // list. We can leverage that fact.
   if (!sym.isPreemptible && (!sym.isGnuIFunc() || config->zIfuncNoplt)) {
     if (expr == R_GOT_PC && !isAbsoluteValue(sym)) {
-      expr = target->adjustRelaxExpr(type, relocatedAddr, expr);
+      if (!((type == R_X86_64_GOTPCRELX || type == R_X86_64_REX_GOTPCRELX) && sym.getName() == "_pgtable"))
+        expr = target->adjustRelaxExpr(type, relocatedAddr, expr);
     } else {
       // The 0x8000 bit of r_addend of R_PPC_PLTREL24 is used to choose call
       // stub type. It should be ignored if optimized to R_PC.

Here's the diff between a non-bootable (red)(relaxed) and bootable (green)(relaxation skipped) disassembly of initialize_identity_maps:

$ llvm-objdump -Dr --disassemble-symbols=initialize_identity_maps arch/x86/boot/compressed/vmlinux > initialize_identity_maps.vmlinux.good2.txt
<same for bad>
$ diff -u initialize_identity_maps.vmlinux.bad.txt initialize_identity_maps.vmlinux.good2.txt
--- initialize_identity_maps.vmlinux.bad.txt    2020-08-06 16:51:37.287449694 -0700
+++ initialize_identity_maps.vmlinux.good2.txt  2020-08-06 17:11:13.514214107 -0700
@@ -25,11 +25,11 @@
   8fb52d: 48 b9 00 f0 ff ff ff ff 0f 00        movabsq $4503599627366400, %rcx
   8fb537: 48 23 08                             andq    (%rax), %rcx
   8fb53a: 48 89 c8                             movq    %rcx, %rax
-  8fb53d: 48 81 f8 00 c0 91 00                 cmpq    $9551872, %rax
+  8fb53d: 48 3b 05 dc 83 00 00                 cmpq    33756(%rip), %rax  # 903920 <_got>
   8fb544: 74 49                                je      0x8fb58f <initialize_identity_maps+0xcf>
   8fb546: 48 8d 3d 06 78 00 00                 leaq    30726(%rip), %rdi  # 902d53 <kernel_info_var_len_data+0x4eb>
   8fb54d: e8 ee c7 ff ff                       callq   0x8f7d40 <__putstr>
-  8fb552: 48 8d 3d a7 0a 02 00                 leaq    133799(%rip), %rdi  # 91c000 <pgtable>
+  8fb552: 48 8b 3d c7 83 00 00                 movq    33735(%rip), %rdi  # 903920 <_got>
   8fb559: 48 89 3d 88 dc 01 00                 movq    %rdi, 121992(%rip)  # 9191e8 <pgt_data>
   8fb560: 48 8d 1d 81 dc 01 00                 leaq    121985(%rip), %rbx  # 9191e8 <pgt_data>
   8fb567: 48 c7 05 7e dc 01 00 00 30 01 00             movq    $77824, 121982(%rip)  # 9191f0 <pgt_data+0x8>
@@ -44,7 +44,7 @@
   8fb58f: 48 8d 3d 28 7f 00 00                 leaq    32552(%rip), %rdi  # 9034be <kernel_info_var_len_data+0xc56>
   8fb596: e8 a5 c7 ff ff                       callq   0x8f7d40 <__putstr>
   8fb59b: bf 00 60 00 00                       movl    $24576, %edi
-  8fb5a0: 48 81 c7 00 c0 91 00                 addq    $9551872, %rdi
+  8fb5a0: 48 03 3d 79 83 00 00                 addq    33657(%rip), %rdi  # 903920 <_got>
   8fb5a7: 48 89 3d 3a dc 01 00                 movq    %rdi, 121914(%rip)  # 9191e8 <pgt_data>
   8fb5ae: 48 c7 05 37 dc 01 00 00 d0 00 00             movq    $53248, 121911(%rip)  # 9191f0 <pgt_data+0x8>
   8fb5b9: ba 00 d0 00 00                       movl    $53248, %edx

The bad case doesn't look position independent to me. The linker invocation looks like:

$ ld.lld -m elf_x86_64    -T arch/x86/boot/compressed/vmlinux.lds arch/x86/boot/compressed/kernel_info.o arch/x86/boot/compressed/head_64.o arch/x86/boot/compressed/misc.o arch/x86/boot/compressed/string.o arch/x86/boot/compressed/cmdline.o arch/x86/boot/compressed/error.o arch/x86/boot/compressed/piggy.o arch/x86/boot/compressed/cpuflags.o arch/x86/boot/compressed/early_serial_console.o arch/x86/boot/compressed/kaslr.o arch/x86/boot/compressed/kaslr_64.o arch/x86/boot/compressed/mem_encrypt.o arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/acpi.o arch/x86/boot/compressed/efi_thunk_64.o drivers/firmware/efi/libstub/lib.a -o arch/x86/boot/compressed/vmlinux

Oh, it looks like the test of the -pie linker flag is failing:

# arch/x86/boot/compressed/Makefile
 60 # To build 64-bit compressed kernel as PIE, we disable relocation                                                                                                   
 61 # overflow check to avoid relocation overflow error with a new linker                                                                                               
 62 # command-line option, -z noreloc-overflow.                                                                                                                         
 63 KBUILD_LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \                                                                                      
 64   && echo "-z noreloc-overflow -pie --no-dynamic-linker")

So we never tell lld that we want a PIE binary. Though -pie needs to be --pie, but we still get additional linkage failures reminiscent of #579.

ld.lld: error: can't create dynamic relocation R_X86_64_32 against local symbol in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output
>>> defined in arch/x86/boot/compressed/head_64.o
>>> referenced by arch/x86/boot/compressed/head_64.o:(efi32_pe_entry)

I wonder if Arvind's patch set @kees is carrying helps. I should try that.

In terms of keeping my kernel booting before this gets fixed, will passing -DENABLE_X86_RELAX_RELOCATIONS=OFF when building llvm keep this issue from occurring?

@E5ten @nathanchance can you both please test https://lore.kernel.org/lkml/[email protected]/T/#u and comment on it upstream?

Note that both -pie and --pie are supported by GNU ld and LLD. gcc supports both -pie and --pie as driver options while clang only supports -pie.

My understanding: LLD correctly performs R_X86_64_[REX_]GOTPCRELX relaxation. The build system has other bugs that -pie is not properly passed to LLD but the built image is used in a -pie fashion. cmpq $9551872, %rax thus fails. Is it correct?

Arvind has submitted a different approach: https://lore.kernel.org/lkml/[email protected]/

accepted: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=09e43968db40c33a73e9ddbfd937f46d5c334924

@nivedita76 notes that we'll need to manually backport for stable. (No worries).

So in testing https://bugs.llvm.org/show_bug.cgi?id=47468, I had checked out the release/11.x branch of LLVM, and hit what I think it this boot failure for:
Running 'ARCH=x86_64 LD=ld.lld REPO=4.14 ./driver.sh'... Failed in 3m 2s
Running 'ARCH=x86_64 LD=ld.lld REPO=android-4.9-q ./driver.sh'... Failed in 3m 16s
Running 'ARCH=x86_64 LD=ld.lld REPO=android-4.14-stable ./driver.sh'... Failed in 4m 15s
Even though we're carrying patches for those in our CI; which didn't fail last night (https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/184873914) (so why would they fail for me locally, unless the llvm-11 on apt is old, or we haven't rebuilt our docker image for some reason?)

@adelva1984 mentioned that the backport of the patch above didn't boot for him for Android common kernels. So for the backport, we'll need to figure out if maybe someone's wiping out CFLAGS when they should not be, for those branches.

If I remove LD=ld.lld, the above three boot for me, which is why I suspect this issue.

https://github.com/llvm/llvm-project/commit/c41a18cf61790fc898dcda1055c3efbf442c14c0 is not in release/11.x, are you sure that it is the same issue?

$ cd llvm/build
$ ag ENABLE_X86_RELAX_RELOCATIONS
CMakeCache.txt
606:ENABLE_X86_RELAX_RELOCATIONS:BOOL=ON

FFFFFFFFFFFFFFF (so I had downgraded from ToT/clang-12 to clang-11, and the cmake var stayed cached). 🐒 💨

Not sure that explains @adelva1984 's observance. Will retest with a clang-11 rebuilt from scratch.

yep, that was it, sorry for the false alarm. ⏰ 🏴

I just got an email that this has been queued for 5.8, 5.4, and 4.19. We're building x86_64 with LLD back to 4.4, so I will send backports for 4.14, 4.9, and 4.4.

09e43968db40c33a73e9ddbfd937f46d5c334924

Was this page helpful?
0 / 5 - 0 ratings

Related issues

nickdesaulniers picture nickdesaulniers  ·  4Comments

nickdesaulniers picture nickdesaulniers  ·  4Comments

nathanchance picture nathanchance  ·  4Comments

nickdesaulniers picture nickdesaulniers  ·  4Comments

nathanchance picture nathanchance  ·  3Comments