Linux: next-20190915 causes a boot regression for aspeed_g5_defconfig

Created on 17 Sep 2019  路  7Comments  路  Source: ClangBuiltLinux/linux

Commit 4cdabee7d6d2 ("ARM: configs: aspeed_g5: Enable AST2600") in -next causes a boot regression with aspeed_g5_defconfig, as determined by our CI (cc @shenki).

Good build: https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/235496835

Bad build: https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/235854196

I have done no further triage, I will try to run a config bisect to see which exact one triggers this issue.

Bisect log:

$ git bisect log
# bad: [f84e1c16f08d7d1d6955f8f1c2e6cc0587d6d14c] Add linux-next specific files for 20190916
# good: [4d856f72c10ecb060868ed10ff1b1453943fc6c8] Linux 5.3
git bisect start 'origin/master' 'origin/stable'
# bad: [2031f8db7809127fa5f6dd3626bb0a1ee5d7e6cf] Merge remote-tracking branch 'mtd/mtd/next'
git bisect bad 2031f8db7809127fa5f6dd3626bb0a1ee5d7e6cf
# bad: [7c08d3e78efd404bacc862d7ad524e535389ab08] Merge remote-tracking branch 'pci/next'
git bisect bad 7c08d3e78efd404bacc862d7ad524e535389ab08
# bad: [0bd1555c6c187a39eb5841d61af13308cdf1a58a] Merge remote-tracking branch 'arm-soc/for-next'
git bisect bad 0bd1555c6c187a39eb5841d61af13308cdf1a58a
# bad: [83a86bc1b4f701efd860a37f4a79f0e23a70c209] Merge branch 'arm/soc' into for-next
git bisect bad 83a86bc1b4f701efd860a37f4a79f0e23a70c209
# good: [f02bd65a5b63bbc8cfbe65073005909833b72bb5] Merge tag 'amlogic-dt64-2.1' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into arm/dt
git bisect good f02bd65a5b63bbc8cfbe65073005909833b72bb5
# bad: [f119c164735d85f2a41d14503cb9933d219c539e] arm-soc: document merges
git bisect bad f119c164735d85f2a41d14503cb9933d219c539e
# good: [bff00fdf03b0cba4219cca6e0c2a59cd701aec36] Merge tag 'imx-drivers-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/drivers
git bisect good bff00fdf03b0cba4219cca6e0c2a59cd701aec36
# good: [50e5bf56ba2fbbce3da74fb93aebb8e13f8eb8fc] arm-soc: document merges
git bisect good 50e5bf56ba2fbbce3da74fb93aebb8e13f8eb8fc
# good: [9e38081e5deb9c6cb7a7949b4f12959ea3ca8663] Merge tag 'renesas-arm-soc-for-v5.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into arm/soc
git bisect good 9e38081e5deb9c6cb7a7949b4f12959ea3ca8663
# bad: [597b97903543028cf6b708460b634f43762b3432] Merge tag 'aspeed-5.4-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/aspeed into arm/defconfig
git bisect bad 597b97903543028cf6b708460b634f43762b3432
# good: [9414d65d8b7ea41fa4062642910711cf434ae923] Merge tag 'sunxi-config64-for-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/defconfig
git bisect good 9414d65d8b7ea41fa4062642910711cf434ae923
# good: [24183825c3c7a9378e5a3a5555c6504edd0a302b] ARM: configs: aspeed: Refresh defconfigs
git bisect good 24183825c3c7a9378e5a3a5555c6504edd0a302b
# bad: [4cdabee7d6d2e439fea726a101e448c4ca6837f4] ARM: configs: aspeed_g5: Enable AST2600
git bisect bad 4cdabee7d6d2e439fea726a101e448c4ca6837f4
# good: [e6fe57ebd5290ef7548c298ff0cf5da575600ff3] ARM: configs: multi_v7: Add ASPEED G6
git bisect good e6fe57ebd5290ef7548c298ff0cf5da575600ff3
# first bad commit: [4cdabee7d6d2e439fea726a101e448c4ca6837f4] ARM: configs: aspeed_g5: Enable AST2600

Bisect run script

$ cat /tmp/tmp.lbWuipMiDT.sh
#!/bin/bash

# Move into kernel repo
cd /home/nathan/src/linux-next

# Apply patch to fix mcount build issues; if it can't be applied, revision is untestable because kernel won't build
grep -q "meabi gnu" arch/arm/Makefile || { git fp -1 --stdout b0fe66cf095016e0b238374c10ae366e1f087d11 | git ap || exit 125; }

# Build kernel (if build fails, revision is untestable)
TC_FOLDER=${HOME}/cbl/toolchains
PATH=${TC_FOLDER}/llvm-tot/bin:${TC_FOLDER}/binutils/bin:${PATH} \
make -j$(nproc) -s \
     ARCH=arm \
     CC=clang \
     CROSS_COMPILE=arm-linux-gnueabi- \
     KCFLAGS=-Wno-implicit-fallthrough \
     O=out \
     distclean aspeed_g5_defconfig all || exit 125

# Build should fail to boot four times
# There was some flakiness with doing just one boot
result=0
for i in $(seq 1 4); do
    timeout 180s unbuffer \
    qemu-system-arm -m 512m \
                    -machine romulus-bmc \
                    -no-reboot \
                    -dtb out/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dtb \
                    -initrd /home/nathan/cbl/git/ci-master/images/arm/rootfs.cpio \
                    -display none \
                    -serial mon:stdio \
                    -kernel out/arch/arm/boot/zImage
    result=$((result + ${?}))
done

# Remove any local changes so the bisect flip is clean
git reset --hard

# Only say this revision is bad 
if [[ ${result} -eq 496 ]]; then
    exit 1
else
    exit 0
fi
[ARCH] arm32 [BUG] qemu

Most helpful comment

Fixed in master in commit ece628fcf69cbbd, which will be in the 4.2
release (also tagged for stable so will end up on the 4.1.x branch at
some point).

https://bugs.launchpad.net/bugs/1844597

All 7 comments

Thanks for the report. The 'bad' boot ends like this:

[   47.406783] Freeing unused kernel memory: 1024K
[   68.845527] Writes:  Total: 170550  Max/Min: 0/0   Fail: 0 
[   68.868620] Checked W+X mappings: passed, no W+X pages found
[   68.870734] rodata_test: all tests were successful
[   68.872328] Run /init as init process
Starting syslogd: OK
The command "docker run \
    --env ARCH=${ARCH} \
    --env LD=${LD} \
    --env REPO=${REPO} \
    --rm \
    --workdir /travis \
    --volume ${TRAVIS_BUILD_DIR}:/travis \
    clangbuiltlinux/debian:llvm${LLVM_VERSION:-10}-latest /bin/bash -c './env-setup.sh && ./driver.sh && ccache -s'
" exited with 124.

I can't see what failed there. Was it killed due to a timeout?

Yes, the 124 exit code is from timeout.

I am doing a config bisect right now.

I don't know what exactly is going on but I cannot reproduce the CI failure locally in the Docker image. I have restarted the CI build to see if it happens again (I did that earlier and it failed again so I figured it was legitimate): https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/235854196

The issue I am noticing locally and didn't realize was a separate issue is that CONFIG_VFP somehow prevents a clang built kernel from booting in my local environment. I guess it might be related to qemu 4.1.0 but I will need to do some further investigation.

Turns out, this is a QEMU issue but it somehow interacts with Clang because this hang does not happen with GCC...

$ git bisect log
# bad: [6f214b30445793f2d397ab0dd3bf632511f97624] Merge remote-tracking branch 'remotes/rth/tags/pull-hppa-20190915' into staging
# good: [32a1a94dd324d33578dca1dc96d7896a0244d768] Update version for v3.1.0 release
git bisect start 'origin/master' 'v3.1.0'
# good: [2fc8d6f8e4d959b05e9225b46dad414bd8f54f88] Merge remote-tracking branch 'remotes/xtensa/tags/20190326-xtensa' into staging
git bisect good 2fc8d6f8e4d959b05e9225b46dad414bd8f54f88
# skip: [8caba36db5d70beb5353b5b5d28421b67275bcf0] i386/kvm: hv-evmcs requires hv-vapic
git bisect skip 8caba36db5d70beb5353b5b5d28421b67275bcf0
# skip: [14f5d874bcd533054648bb7cc767c7169eaf2f1c] target/mips: Fix emulation of ILVR.<B|H|W> on big endian host
git bisect skip 14f5d874bcd533054648bb7cc767c7169eaf2f1c
# bad: [0b9e918f03df47ae393fa68d9ac37ce4820cbaaf] fdc: Fix inserting read-only media in empty drive
git bisect bad 0b9e918f03df47ae393fa68d9ac37ce4820cbaaf
# good: [52f2b8961409be834abaee5189bff2cc9e372851] tests/perf: Test lseek influence on qcow2 block-status
git bisect good 52f2b8961409be834abaee5189bff2cc9e372851
# bad: [f4af847316ec3ee05f5ebb40fedec317ddca46fa] Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-20190701' into staging
git bisect bad f4af847316ec3ee05f5ebb40fedec317ddca46fa
# bad: [519ee7ae31e050eb0ff9ad35c213f0bd7ab1c03e] target/arm: Convert VDIV to decodetree
git bisect bad 519ee7ae31e050eb0ff9ad35c213f0bd7ab1c03e
# good: [29168c6585673402e5ff285d84fd1d44de977424] target/nios2: Use env_cpu, env_archcpu
git bisect good 29168c6585673402e5ff285d84fd1d44de977424
# good: [2736b5cbee8aaf1af668693edacf261851565cc0] tests/tcg: clean-up VPATH/TESTS for i386
git bisect good 2736b5cbee8aaf1af668693edacf261851565cc0
# good: [4747524f9f243ca5ff1f146d37e423c00e923ee1] Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2019-06-12' into staging
git bisect good 4747524f9f243ca5ff1f146d37e423c00e923ee1
# bad: [e3bb599d16e4678b228d80194cee328f894b1ceb] target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM to decodetree
git bisect bad e3bb599d16e4678b228d80194cee328f894b1ceb
# bad: [be1ba4d56eba5666ee03b40e286d7315862ab188] hw/core/bus.c: Only the main system bus can have no parent
git bisect bad be1ba4d56eba5666ee03b40e286d7315862ab188
# good: [785a602eae7ad97076b9794ebaba072ad4a9f74f] Merge remote-tracking branch 'remotes/kraxel/tags/vga-20190613-pull-request' into staging
git bisect good 785a602eae7ad97076b9794ebaba072ad4a9f74f
# bad: [fc1120a7f5f2d4b601003205c598077d3eb11ad2] target/arm: Implement NSACR gating of floating point
git bisect bad fc1120a7f5f2d4b601003205c598077d3eb11ad2
# good: [3a7a2b4e5cf0d49cd8b14e8225af0310068b7d20] target/arm: Use tcg_gen_gvec_bitsel
git bisect good 3a7a2b4e5cf0d49cd8b14e8225af0310068b7d20
# first bad commit: [fc1120a7f5f2d4b601003205c598077d3eb11ad2] target/arm: Implement NSACR gating of floating point

https://github.com/qemu/qemu/commit/fc1120a7f5f2d4b601003205c598077d3eb11ad2

$ timeout 60s unbuffer /tmp/tmp.ELdl7xxuc9/bin/qemu-system-arm -m 512m -machine romulus-bmc -no-reboot -dtb out/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dtb -initrd /home/nathan/cbl/git/ci-master/images/arm/rootfs.cpio -display none -serial mon:stdio -kernel out/arch/arm/boot/zImage
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.3.0-next-20190916 (nathan@archlinux-threadripper) (gcc version 9.2.0 (GCC)) #1 SMP Tue Sep 17 00:54:31 MST 2019
...
$ readelf --string-dump=.comment out/vmlinux

String dump of section '.comment':
  [     0]  ClangBuiltLinux clang version 10.0.0 (git://github.com/llvm/llvm-project 6bc1236d395187334051b99e2fcd715a54753f33) (based on LLVM 10.0.0svn)

$ timeout 60s unbuffer /tmp/tmp.ELdl7xxuc9/bin/qemu-system-arm -m 512m -machine romulus-bmc -no-reboot -dtb out/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dtb -initrd /home/nathan/cbl/git/ci-master/images/arm/rootfs.cpio -display none -serial mon:stdio -kernel out/arch/arm/boot/zImage; echo ${?}
124

cc @pm215

I completely forgot about using gdb to get a dmesg... (thanks Nick!)

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.3.0-next-20190918-dirty (nathan@archlinux-threadripper) (ClangBuiltLinux clang version 10.0.0 (git://github.com/llvm/llvm-project 93e1f73b6b2dc751a27568da5797cc2ba8436385) (based on LLVM 10.0.0svn)) #1 SMP Wed Sep 18 16:58:35 MST 2019
[    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
[    0.000000] CPU: VIPT aliasing data cache, unknown instruction cache
[    0.000000] OF: fdt: Machine model: Romulus BMC
[    0.000000] Memory policy: Data cache writeback
[    0.000000] Reserved memory: created CMA memory pool at 0x9d000000, size 16 MiB
[    0.000000] OF: reserved mem: initialized node framebuffer, compatible id shared-dma-pool
[    0.000000] Reserved memory: created CMA memory pool at 0x96000000, size 32 MiB
[    0.000000] OF: reserved mem: initialized node jpegbuffer, compatible id shared-dma-pool
[    0.000000] cma: Reserved 16 MiB at 0x9c000000
[    0.000000] On node 0 totalpages: 110336
[    0.000000]   Normal zone: 990 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 110336 pages, LIFO batch:31
[    0.000000] CPU: All CPU(s) started in SVC mode.
[    0.000000] percpu: Embedded 17 pages/cpu s48780 r0 d20852 u69632
[    0.000000] pcpu-alloc: s48780 r0 d20852 u69632 alloc=17*4096
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 109346
[    0.000000] Kernel command line: console=ttyS4,115200 earlyprintk
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 356684K/441344K available (8192K kernel code, 432K rwdata, 1564K rodata, 1024K init, 173K bss, 19124K reserved, 65536K cma-reserved, 0K highmem)
[    0.000000] random: get_random_u32 called from cache_alloc_refill+0x660/0x8d8 with crng_init=0
[    0.000000] ftrace: allocating 22519 entries in 44 pages
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=2 to nr_cpu_ids=1.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000000] i2c controller registered, irq 16
[    0.000000] clocksource: FTTMR010-TIMER2: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 77222644334 ns
[    0.000105] sched_clock: 32 bits at 24MHz, resolution 40ns, wraps every 86767015915ns
[    0.001670] Switching to timer-based delay loop, resolution 40ns
[    0.006484] Calibrating delay loop (skipped), value calculated using timer frequency.. 49.50 BogoMIPS (lpj=247500)
[    0.006612] pid_max: default: 32768 minimum: 301
[    0.007865] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.007903] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.019928] CPU: Testing write buffer coherency: ok
[    0.031644] Setting up static identity map for 0x80100000 - 0x80100078
[    0.032664] rcu: Hierarchical SRCU implementation.
[    0.037828] smp: Bringing up secondary CPUs ...
[    0.037885] smp: Brought up 1 node, 1 CPU
[    0.037953] SMP: Total of 1 processors activated (49.50 BogoMIPS).
[    0.037984] CPU: All CPU(s) started in SVC mode.
[    0.044616] devtmpfs: initialized
[    0.058685] VFP support v0.3: 
[    0.059159] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
[    0.059525] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-next-20190918-dirty #1
[    0.059547] Hardware name: Generic DT based system
[    0.059702] PC is at vfp_init+0x50/0x1f4
[    0.059721] LR is at vfp_init+0x4c/0x1f4
[    0.059738] pc : [<80b0383c>]    lr : [<80b03838>]    psr: 60000153
[    0.059756] sp : 9e497ec0  ip : 00000020  fp : 9e497ed8
[    0.059773] r10: 00000000  r9 : ffffe000  r8 : 80c06048
[    0.059792] r7 : 00000000  r6 : 80c0caac  r5 : 80c6c418  r4 : 80b037ec
[    0.059811] r3 : 00000000  r2 : 339aa372  r1 : 00000000  r0 : 00000012
[    0.059859] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
[    0.059883] Control: 00c5387d  Table: 80004008  DAC: 00000051
[    0.059997] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
[    0.060048] Stack: (0x9e497ec0 to 0x9e498000)
[    0.060205] 7ec0: 80b037ec 80b6bf0c 80b037ec ffffffff 00000000 00000000 9e497f48 80b01100
[    0.060310] 7ee0: 00000000 9eeff9e0 80a85734 809eb9be 00000000 8014b7f4 9eeff9e0 80a85734
[    0.060408] 7f00: 9e497f48 8014b7f4 000000a4 00000001 00000001 00000000 80b0133c 9e497f38
[    0.060509] 7f20: 00000000 9eeff9d5 339aa372 80b6be80 80b6bf0c 00000000 00000000 00000000
[    0.060606] 7f40: 00000000 00000000 9e497f70 80b01864 00000001 00000001 00000000 80b0133c
[    0.060703] 7f60: 00000001 8085d268 00000000 00000000 9e497f80 80b01758 00000000 00000000
[    0.060800] 7f80: 9e497f90 80b015e4 00000000 8085d268 9e497fa8 8085d27c 00000000 8085d268
[    0.060897] 7fa0: 00000000 00000000 00000000 801010e8 00000000 00000000 00000000 00000000
[    0.060993] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    0.061090] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    0.061625] [<80b0383c>] (vfp_init) from [<80b01100>] (do_one_initcall+0xa8/0x1e0)
[    0.061722] [<80b01100>] (do_one_initcall) from [<80b01864>] (do_initcall_level+0xfc/0x12c)
[    0.061742] [<80b01864>] (do_initcall_level) from [<80b01758>] (do_basic_setup+0x2c/0x3c)
[    0.061759] [<80b01758>] (do_basic_setup) from [<80b015e4>] (kernel_init_freeable+0x68/0x104)
[    0.061777] [<80b015e4>] (kernel_init_freeable) from [<8085d27c>] (kernel_init+0x14/0x26c)
[    0.061798] [<8085d27c>] (kernel_init) from [<801010e8>] (ret_from_fork+0x14/0x2c)
[    0.061835] Exception stack(0x9e497fb0 to 0x9e497ff8)
[    0.061896] 7fa0:                                     00000000 00000000 00000000 00000000
[    0.061998] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    0.062080] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    0.062263] Code: e5860000 e59f0174 ebd9d8fc e59f5170 (eef04a10) 
[    0.062679] ---[ end trace 2d338c91e4e74562 ]---
[    0.062814] Kernel panic - not syncing: Fatal exception
[    1.063431] Reboot failed -- System halted

Reported on the QEMU bug tracker: https://bugs.launchpad.net/qemu/+bug/1844597

Fixed in master in commit ece628fcf69cbbd, which will be in the 4.2
release (also tagged for stable so will end up on the 4.1.x branch at
some point).

https://bugs.launchpad.net/bugs/1844597

Was this page helpful?
0 / 5 - 0 ratings

Related issues

nathanchance picture nathanchance  路  3Comments

nickdesaulniers picture nickdesaulniers  路  3Comments

nathanchance picture nathanchance  路  3Comments

nickdesaulniers picture nickdesaulniers  路  4Comments

nickdesaulniers picture nickdesaulniers  路  4Comments