We are seeing kernel 'Oopses' while running the following combination:
Both of these elements seem to be required for the crash.
This is reproducible on stock Raspbian (2018-04-18).
We did our own build of Qt to use eglfs and libinput, but it's built from stock Qt 5.10.0. We carry one small patch to Qt, but have reproduced the crash without it.
Here are the kernel error messages we have seen. (closest to stock raspbian/qt first):
Captured over ssh, patched qt: https://gist.github.com/Ealdwulf/d26ae24f059e37acda58ef384bce9c27
Captured over serial, patched qt, kernel 4.14.50 rebuilt with more debug options[1]: https://gist.github.com/wizofe/469d8c4f092e5ae7d9adc9147f83e167
The following are the minimal steps to reproduce that we have so far:
# (connected to wifi and enabled ssh, and switched audio to Analog output).
sudo apt-get update
sudo apt-get install libts-0.0-0 tsconf
# This is built from stock Qt 5.10.0:
wget http://dev.kano.me/temp/kernel_crash_repr/libqt5all_5.10-1_armhf.deb
# This is a very simple Qt app:
wget http://dev.kano.me/temp/kernel_crash_repr/qmlmatrix_1.0-1.20170815_all.deb
# A long wav file:
wget http://dev.kano.me/temp/kernel_crash_repr/chippytoon.wav
# Despite the name, this just contains some symlinks:
wget http://dev.kano.me/temp/kernel_crash_repr/kano-graphics-libs_4.0.0-0.20180530_all.deb
sudo dpkg -i qmlmatrix_1.0-1.20170815_all.deb libqt5all_5.10-1_armhf.deb kano-graphics-libs_4.0.0-0.20180530_all.deb
# a script to run the qt app repeatedly
# (END is quoted so that $! and $QPID are written to the file literally)
cat <<'END' >>qtest
while true; do
qmlmatrix &
QPID=$!
sleep 240
kill $QPID
done
END
while true; do aplay chippytoon.wav ; done &
. ./qtest
The result is one of the following:
aplay: pcm_write:2011: write error: Input/output error , sometimes continues oblivious). After this it needs a reboot to start working again. It is not clear to us whether the kernel or firmware is at fault.
Please let us know if there are other logs or traces we could enable which would help diagnose this.
We have a patch to our app that seems to prevent the problem, but we have a fair few Qt apps, so we'd like to pursue this to be sure that the others aren't vulnerable at some lower probability.
[1]
CONFIG_DEBUG_INFO=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_FRAME_POINTER=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_KDB=y
CONFIG_KDB_KEYBOARD=y
Which platform is this running on?
Have you tried with the latest rpi-update firmware?
We are using a Pi 3B. I've seen it on a 2B as well.
I did reproduce with the latest rpi-update as of a few days ago, but not on vanilla Raspbian. I'll try that today.
Yes, the crash still occurs on the latest rpi-update firmware (firmware fe525d2be041c1a9b924824e430b5d51214315c4, kernel 4.14.50+).
It seems to take a lot longer to trigger (the above test was running for half an hour).
Here are some more kernel messages:
[ 1458.875616] Unhandled prefetch abort: unknown 1 (0x001) at 0xaaaa75a2
[ 1458.964953] systemd: 32 output lines suppressed due to ratelimiting
[ 1459.386477] EXT4-fs error (device mmcblk0p2): ext4_find_dest_de:1806: inode #2: block 8422: comm systemd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2863326613, rec_len=43690, name_len=170
[ 1459.775672] Unable to handle kernel paging request at virtual address 00002b89
[ 1459.783024] pgd = 80004000
[ 1459.785787] [00002b89] *pgd=00000000
[ 1459.785835] Unable to handle kernel paging request at virtual address 00002c2d
[ 1459.785838] pgd = b75b4000
[ 1459.785840] [00002c2d] *pgd=37483835, *pte=00000000, *ppte=00000000
[ 1459.785854] Internal error: Oops: 17 [#1] SMP ARM
[ 1459.785859] Modules linked in: fuse rfcomm cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic brcmfmac brcmutil cfg80211 rfkill snd_bcm2835(C) snd_pcm snd_timer snd uio_pdrv_genirq uio fixed evdev joydev hid_multitouch i2c_dev ip_tables x_tables ipv6
[ 1459.785922] CPU: 3 PID: 343 Comm: dhcpcd Tainted: G C 4.14.50-v7+ #1122
[ 1459.785923] Hardware name: BCM2835
[ 1459.785927] task: b7795a00 task.stack: b92d6000
[ 1459.785939] PC is at sock_poll+0x24/0xa8
[ 1459.785948] LR is at ep_send_events_proc+0x90/0x144
[ 1459.785951] pc : [<8066c2d0>] lr : [<802d6be4>] psr: 80000013
[ 1459.785954] sp : b92d7e40 ip : b92d7e60 fp : b92d7e5c
[ 1459.785957] r10: 00000000 r9 : b92d7f18 r8 : b657af30
[ 1459.785960] r7 : b92d7e68 r6 : b09f9780 r5 : afaa8700 r4 : b970558c
[ 1459.785963] r3 : 8066c2ac r2 : 00002b6d r1 : b92d7e68 r0 : b09f9780
[ 1459.785967] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 1459.785970] Control: 10c5383d Table: 375b406a DAC: 00000055
[ 1459.785975] Process dhcpcd (pid: 343, stack limit = 0xb92d6210)
[ 1459.785979] Stack: (0xb92d7e40 to 0xb92d8000)
[ 1459.785985] 7e40: b970558c b92d7ea0 7ea68ab8 b92d7f18 b92d7e9c b92d7e60 802d6be4 8066c2b8
[ 1459.785991] 7e60: 60000013 b657af00 00000000 00000019 802d6b54 b657af00 b92d7ea0 00000000
[ 1459.785997] 7e80: 802d6b54 b657af30 b92d7f18 00000000 b92d7ecc b92d7ea0 802d74b4 802d6b60
[ 1459.786004] 7ea0: b92d7ea0 b92d7ea0 b97efb40 00000001 7ea68ab8 b97efb40 b657af30 00000000
[ 1459.786009] 7ec0: b92d7f6c b92d7ed0 802d8864 802d7424 00000000 00000504 0000031c 00000000
[ 1459.786015] 7ee0: 2bbff380 b92d7f40 b968da40 b657af30 05f5e100 00000000 b92d7f00 b657af00
[ 1459.786021] 7f00: 56df3a87 0000020d 000008d0 00000000 12cb1a87 b92d7f20 00000001 7ea68ab8
[ 1459.786026] 7f20: 22a5f107 00000504 00000001 b7795a00 80148e4c 00000100 00000200 801edbe0
[ 1459.786032] 7f40: 00000000 000c283e 00000001 7ea68ab8 00000003 b92d7f78 b92d6000 00000000
[ 1459.786038] 7f60: b92d7fa4 b92d7f70 802d8bc0 802d8670 00000000 00000000 00007a03 00000000
[ 1459.786044] 7f80: 7ea68a0c 7ea68bb0 00000008 0020c49b 0000015a 80108204 00000000 b92d7fa8
[ 1459.786050] 7fa0: 80108060 802d8b10 7ea68bb0 00000008 00000003 7ea68ab8 00000001 000c283e
[ 1459.786056] 7fc0: 7ea68bb0 00000008 0020c49b 0000015a ffffffff 431bde83 000f423f 00ac8c50
[ 1459.786061] 7fe0: 00000000 7ea68a90 00018494 76f1ddb8 60000010 00000003 00000000 00000000
[ 1459.786080] [<8066c2d0>] (sock_poll) from [<802d6be4>] (ep_send_events_proc+0x90/0x144)
[ 1459.786089] [<802d6be4>] (ep_send_events_proc) from [<802d74b4>] (ep_scan_ready_list+0x9c/0x1cc)
[ 1459.786097] [<802d74b4>] (ep_scan_ready_list) from [<802d8864>] (SyS_epoll_wait+0x200/0x4a0)
[ 1459.786104] [<802d8864>] (SyS_epoll_wait) from [<802d8bc0>] (SyS_epoll_pwait+0xbc/0x17c)
[ 1459.786116] [<802d8bc0>] (SyS_epoll_pwait) from [<80108060>] (ret_fast_syscall+0x0/0x28)
[ 1459.786126] Code: e5905090 e1a06000 e1a07001 e5952014 (e59230c0)
[ 1459.786137] ---[ end trace bfc137ac1a74431c ]---
[ 1459.786688] Unable to handle kernel paging request at virtual address f7dd00e8
[ 1459.786690] pgd = b75b4000
[ 1459.786693] [f7dd00e8] *pgd=00000000
[ 1459.786699] Internal error: Oops: 5 [#2] SMP ARM
[ 1459.786703] Modules linked in: fuse rfcomm cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic brcmfmac brcmutil cfg80211 rfkill snd_bcm2835(C) snd_pcm snd_timer snd uio_pdrv_genirq uio fixed evdev joydev hid_multitouch i2c_dev ip_tables x_tables ipv6
[ 1459.786752] CPU: 3 PID: 343 Comm: dhcpcd Tainted: G D C 4.14.50-v7+ #1122
[ 1459.786754] Hardware name: BCM2835
[ 1459.786756] task: b7795a00 task.stack: b92d6000
[ 1459.786766] PC is at locks_remove_posix+0x24/0x150
[ 1459.786774] LR is at filp_close+0x68/0x8c
[ 1459.786776] pc : [<802e0d44>] lr : [<80286e40>] psr: 20000113
[ 1459.786779] sp : b92d7b48 ip : b92d7bf0 fp : b92d7bec
[ 1459.786781] r10: b7795a00 r9 : b96e4818 r8 : 00000000
[ 1459.786784] r7 : b96e4800 r6 : b96e4800 r5 : f7dd0000 r4 : 00000000
[ 1459.786787] r3 : afa25a18 r2 : b09f9300 r1 : b96e4800 r0 : b09f9300
[ 1459.786791] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 1459.786794] Control: 10c5383d Table: 375b406a DAC: 00000055
[ 1459.786798] Process dhcpcd (pid: 343, stack limit = 0xb92d6210)
[ 1459.786802] Stack: (0xb92d7b48 to 0xb92d8000)
[ 1459.786807] 7b40: 80c85000 00000002 80c85000 00000000 80c85704 b9c3b180
[ 1459.786813] 7b60: b97168a0 8025bea0 b968da40 0000ddac b968da78 b7795a00 b92d7bbc b92d7b88
[ 1459.786819] 7b80: 80277eec 8027792c 00000001 8025bea0 b9716c60 b97168a0 00000000 00000544
[ 1459.786825] 7ba0: b968da40 00000001 b968da78 b7795a00 ffffffff 8028c108 8028c05c b96739c0
[ 1459.786831] 7bc0: b96739c0 00000000 b09f9300 b96e4800 b96e4800 00000000 b96e4818 b7795a00
[ 1459.786836] 7be0: b92d7c0c b92d7bf0 80286e40 802e0d2c b96e488c 00000009 00000038 00000000
[ 1459.786842] 7c00: b92d7c34 b92d7c10 802aac78 80286de4 b7795a00 b7795f40 b96e4800 b968da40
[ 1459.786848] 7c20: 00000001 b968da78 b92d7c54 b92d7c38 802aad88 802aabd0 00000004 b7795a00
[ 1459.786853] 7c40: 00000000 00000544 b92d7c94 b92d7c58 80121db8 802aad3c 00000000 80c97204
[ 1459.786859] 7c60: 60000113 0000000b 80957070 80c97204 60000113 0000000b 80957070 00000017
[ 1459.786865] 7c80: 80c08a1c b7795a00 b92d7ccc b92d7c98 8010c498 80121a2c b92d6210 0000000b
[ 1459.786870] 7ca0: 80957008 00002c2d 00000017 b92d7df0 b968da40 b968da40 b968da78 00000014
[ 1459.786876] 7cc0: b92d7ce4 b92d7cd0 801145a8 8010c250 b92d7df0 00002c2d b92d7d3c b92d7ce8
[ 1459.786882] 7ce0: 807a2064 80114540 807a13a4 b9c12e00 00000000 00000000 00000003 ba385d40
[ 1459.786888] 7d00: b9ce7514 00000000 00000000 00010000 8019c2ec 80c093bc 00000017 807a1e1c
[ 1459.786894] 7d20: 00002c2d b92d7df0 b92d6000 00000000 b92d7dec b92d7d40 801011e0 807a1e28
[ 1459.786900] 7d40: b92d7d5c 801507b8 b91e4800 b7795a00 ba385d78 00000000 b7795e60 80b8ed40
[ 1459.786905] 7d60: b92d7d9c 80b8ed40 a2252a80 b9d1bc00 ba385d40 8079ba4c b92d7dcc b92d7d88
[ 1459.786911] 7d80: 80145a74 801edbe0 8013f2c8 80102da0 b9d26018 b7795a00 397f7000 b7795a00
[ 1459.786917] 7da0: b968da40 ba385d40 b7795a00 b9d1bc00 00000000 b968da40 00000153 ba3825c0
[ 1459.786922] 7dc0: 20000013 8066c2d0 80000013 8066c2d0 80000013 ffffffff b92d7e24 b657af30
[ 1459.786928] 7de0: b92d7e5c b92d7df0 807a15f4 801011a4 b09f9780 b92d7e68 00002b6d 8066c2ac
[ 1459.786934] 7e00: b970558c afaa8700 b09f9780 b92d7e68 b657af30 b92d7f18 00000000 b92d7e5c
[ 1459.786940] 7e20: b92d7e60 b92d7e40 802d6be4 8066c2d0 80000013 ffffffff b92d7eb4 7f000000
[ 1459.786945] 7e40: b970558c b92d7ea0 7ea68ab8 b92d7f18 b92d7e9c b92d7e60 802d6be4 8066c2b8
[ 1459.786951] 7e60: 60000013 b657af00 00000000 00000019 802d6b54 b657af00 b92d7ea0 00000000
[ 1459.786958] 7e80: 802d6b54 b657af30 b92d7f18 00000000 b92d7ecc b92d7ea0 802d74b4 802d6b60
[ 1459.786964] 7ea0: b92d7ea0 b92d7ea0 b97efb40 00000001 7ea68ab8 b97efb40 b657af30 00000000
[ 1459.786969] 7ec0: b92d7f6c b92d7ed0 802d8864 802d7424 00000000 00000504 0000031c 00000000
[ 1459.786980] 7ee0: 2bbff380 b92d7f40 b968da40 b657af30 05f5e100 00000000 b92d7f00 b657af00
[ 1459.786986] 7f00: 56df3a87 0000020d 000008d0 00000000 12cb1a87 b92d7f20 00000001 7ea68ab8
[ 1459.786991] 7f20: 22a5f107 00000504 00000001 b7795a00 80148e4c 00000100 00000200 801edbe0
[ 1459.786997] 7f40: 00000000 000c283e 00000001 7ea68ab8 00000003 b92d7f78 b92d6000 00000000
[ 1459.787003] 7f60: b92d7fa4 b92d7f70 802d8bc0 802d8670 00000000 00000000 00007a03 00000000
[ 1459.787009] 7f80: 7ea68a0c 7ea68bb0 00000008 0020c49b 0000015a 80108204 00000000 b92d7fa8
[ 1459.787014] 7fa0: 80108060 802d8b10 7ea68bb0 00000008 00000003 7ea68ab8 00000001 000c283e
[ 1459.787020] 7fc0: 7ea68bb0 00000008 0020c49b 0000015a ffffffff 431bde83 000f423f 00ac8c50
[ 1459.787026] 7fe0: 00000000 7ea68a90 00018494 76f1ddb8 60000010 00000003 00000000 00000000
[ 1459.787038] [<802e0d44>] (locks_remove_posix) from [<80286e40>] (filp_close+0x68/0x8c)
[ 1459.787048] [<80286e40>] (filp_close) from [<802aac78>] (put_files_struct+0xb4/0x10c)
[ 1459.787058] [<802aac78>] (put_files_struct) from [<802aad88>] (exit_files+0x58/0x5c)
[ 1459.787066] [<802aad88>] (exit_files) from [<80121db8>] (do_exit+0x398/0xb9c)
[ 1459.787075] [<80121db8>] (do_exit) from [<8010c498>] (die+0x254/0x34c)
[ 1459.787090] [<8010c498>] (die) from [<801145a8>] (__do_kernel_fault.part.0+0x74/0x84)
[ 1459.787101] [<801145a8>] (__do_kernel_fault.part.0) from [<807a2064>] (do_page_fault+0x248/0x3a4)
[ 1459.787110] [<807a2064>] (do_page_fault) from [<801011e0>] (do_DataAbort+0x48/0xc4)
[ 1459.787122] [<801011e0>] (do_DataAbort) from [<807a15f4>] (__dabt_svc+0x54/0x80)
[ 1459.787125] Exception stack(0xb92d7df0 to 0xb92d7e38)
[ 1459.787130] 7de0: b09f9780 b92d7e68 00002b6d 8066c2ac
[ 1459.787136] 7e00: b970558c afaa8700 b09f9780 b92d7e68 b657af30 b92d7f18 00000000 b92d7e5c
[ 1459.787140] 7e20: b92d7e60 b92d7e40 802d6be4 8066c2d0 80000013 ffffffff
[ 1459.787153] [<807a15f4>] (__dabt_svc) from [<8066c2d0>] (sock_poll+0x24/0xa8)
[ 1459.787161] [<8066c2d0>] (sock_poll) from [<802d6be4>] (ep_send_events_proc+0x90/0x144)
[ 1459.787168] [<802d6be4>] (ep_send_events_proc) from [<802d74b4>] (ep_scan_ready_list+0x9c/0x1cc)
[ 1459.787175] [<802d74b4>] (ep_scan_ready_list) from [<802d8864>] (SyS_epoll_wait+0x200/0x4a0)
[ 1459.787182] [<802d8864>] (SyS_epoll_wait) from [<802d8bc0>] (SyS_epoll_pwait+0xbc/0x17c)
[ 1459.787190] [<802d8bc0>] (SyS_epoll_pwait) from [<80108060>] (ret_fast_syscall+0x0/0x28)
[ 1459.787196] Code: e8bd4000 e590300c e1a06001 e5935028 (e59530e8)
[ 1459.787201] ---[ end trace bfc137ac1a74431d ]---
[ 1459.787205] Fixing recursive fault but reboot is needed!
[ 1460.462015] Unable to handle kernel NULL pointer dereference at virtual address 0000001c
[ 1460.462020] pgd = b9690000
[ 1460.462023] [0000001c] *pgd=00000000
[ 1460.462031] Internal error: Oops: 5 [#3] SMP ARM
[ 1460.462034] Modules linked in: (FEK)
[ 1460.462044] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462046] pgd = b9690000
[ 1460.462047] [00002b69] *pgd=00000000
[ 1460.462051] Internal error: Oops: 5 [#4] SMP ARM
[ 1460.462052] Modules linked in: (FEK)
[ 1460.462057] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462059] pgd = b9690000
[ 1460.462060] [00002b69] *pgd=00000000
[ 1460.462064] Internal error: Oops: 5 [#5] SMP ARM
[ 1460.462065] Modules linked in: (FEK)
[ 1460.462070] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462072] pgd = b9690000
[ 1460.462073] [00002b69] *pgd=00000000
[ 1460.462077] Internal error: Oops: 5 [#6] SMP ARM
[ 1460.462078] Modules linked in: (FEK)
[ 1460.462083] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462084] pgd = b9690000
[ 1460.462086] [00002b69] *pgd=00000000
[ 1460.462089] Internal error: Oops: 5 [#7] SMP ARM
[ 1460.462091] Modules linked in: (FEK)
[ 1460.462095] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462097] pgd = b9690000
[ 1460.462098] [00002b69] *pgd=00000000
[ 1460.462102] Internal error: Oops: 5 [#8] SMP ARM
[ 1460.462103] Modules linked in: (FEK)
[ 1460.462108] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462109] pgd = b9690000
[ 1460.462111] [00002b69] *pgd=00000000
[ 1460.462114] Internal error: Oops: 5 [#9] SMP ARM
[ 1460.462115] Modules linked in: (FEK)
[ 1460.462120] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462122] pgd = b9690000
[ 1460.462123] [00002b69] *pgd=00000000
[ 1460.462127] Internal error: Oops: 5 [#10] SMP ARM
[ 1460.462128] Modules linked in: (FEK)
[ 1460.462134] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462135] pgd = b9690000
[ 1460.462137] [00002b69] *pgd=00000000
[ 1460.462141] Internal error: Oops: 5 [#11] SMP ARM
[ 1460.462142] Modules linked in: (FEK)
[ 1460.462146] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462148] pgd = b9690000
[ 1460.462149] [00002b69] *pgd=00000000
[ 1460.462153] Internal error: Oops: 5 [#12] SMP ARM
[ 1460.462154] Modules linked in: (FEK)
[ 1460.462159] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462160] pgd = b9690000
[ 1460.462162] [00002b69] *pgd=00000000
[ 1460.462165] Internal error: Oops: 5 [#13] SMP ARM
[ 1460.462167] Modules linked in: (FEK)
[ 1460.462171] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462173] pgd = b9690000
[ 1460.462174] [00002b69] *pgd=00000000
[ 1460.462178] Internal error: Oops: 5 [#14] SMP ARM
[ 1460.462179] Modules linked in: (FEK)
[ 1460.462183] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462185] pgd = b9690000
[ 1460.462187] [00002b69] *pgd=00000000
[ 1460.462190] Internal error: Oops: 5 [#15] SMP ARM
[ 1460.462191] Modules linked in: (FEK)
[ 1460.462196] Unable to handle kernel paging request at virtual address 00002b69
[ 1460.462198] pgd = b9690000
[ 1460.462200] [00002b69] *pgd=00000000
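A log this long is easier to compare between runs once it is boiled down to the oops count and the faulting functions. The helper below is only a sketch: the name `summarise_oopses` is made up for illustration, and it assumes the log has been saved (or piped from `dmesg`) in the same format as above.

```shell
# Hypothetical helper: summarise kernel oopses in a log read from stdin.
# Prints the number of "Internal error: Oops" lines, then the faulting
# function from each "PC is at ..." line (only oopses that got as far as
# printing a register dump have one).
summarise_oopses() {
    awk '
        /Internal error: Oops/ { count++ }
        /PC is at /            { sub(/^.*PC is at /, ""); pcs[++n] = $0 }
        END {
            printf "oopses: %d\n", count
            for (i = 1; i <= n; i++) printf "pc: %s\n", pcs[i]
        }
    '
}
```

Usage would be something like `dmesg | summarise_oopses`, or `summarise_oopses < serial.log` for a capture taken over serial.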
Have you run vcgencmd get_throttled to confirm that you are not getting an under-voltage condition? Semi-random crashes under load in different places are usually voltage-related.
I was able to reproduce the issue on a Raspberry Pi 3 Model B+ with an official Raspberry Pi power supply. The issue only seems to occur with analog audio.
vcgencmd get_throttled always returned "throttled=0x0"
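For anyone checking their own board, a non-zero value from `vcgencmd get_throttled` is a bitmask. The sketch below decodes it using the bit meanings documented for the firmware; the function name `decode_throttled` is just an illustrative choice, and it takes the string as an argument rather than calling `vcgencmd` itself.

```shell
# Hypothetical helper: decode the value printed by `vcgencmd get_throttled`.
# Accepts either "throttled=0x50005" or a bare "0x50005".
decode_throttled() {
    value=$(( ${1#throttled=} ))    # strip the prefix, convert hex to a number
    if [ "$value" -eq 0 ]; then
        echo "ok: no under-voltage or throttling reported"
        return 0
    fi
    # Each line below is "<bit number> <meaning>".
    while read -r bit meaning; do
        if [ $(( value & (1 << bit) )) -ne 0 ]; then
            echo "bit $bit: $meaning"
        fi
    done <<'EOF'
0 under-voltage detected
1 ARM frequency capped
2 currently throttled
3 soft temperature limit active
16 under-voltage has occurred
17 ARM frequency capping has occurred
18 throttling has occurred
19 soft temperature limit has occurred
EOF
}
```

On the Pi itself this would be run as `decode_throttled "$(vcgencmd get_throttled)"`. Bits 16 to 19 are sticky since boot, so they catch a brown-out that happened before the check.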
@pelwell suggested disabling high quality audio by adding audio_pwm_mode=1 to config.txt. I tried this and the crash no longer occurs.
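To apply that workaround without ending up with duplicate entries, something like the following could be used. It is a minimal sketch: `set_config_option` is a made-up helper name, and it assumes the file is `/boot/config.txt` (the usual location on Raspbian) and that the shell has permission to write to it.

```shell
# Hypothetical helper: set KEY=VALUE in a config.txt-style file.
# Replaces an existing uncommented KEY= line, or appends one if absent.
set_config_option() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        tmp=$(mktemp)
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp"
        cat "$tmp" > "$file"        # overwrite in place, keeping the original file's ownership
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

From a root shell, `set_config_option /boot/config.txt audio_pwm_mode 1` followed by a reboot would select the lower quality mode; setting it back to `2` restores high quality audio.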
It may be that qmlmatrix is causing a high system/GL load which slows down the VC audio component or ALSA driver. If the ALSA VC audio driver fails to handle a buffer underflow correctly and tramples memory then it would explain the random oops.
I was eventually able to trigger the same issue by running glmark2 plus multiple instances of memtester. I'm not sure why qmlmatrix was worse, perhaps large texture uploads?
Thanks @timg236!
I'm glad someone else managed to reproduce it, because after moving offices I'm having difficulty doing so (at least on vanilla raspbian - we reproduced it many times on Kano OS, which is closely related).
In the version of qmlmatrix above, we missed setting the QML attribute renderTarget: Canvas.FramebufferObject which we believe meant that there was an additional copy-to-gpu option each frame. (Hence this fix). So that would be consistent with memory subsystem load.
We will try out audio_pwm_mode=1.
We are pretty sure that it's not a voltage issue; we have seen it both on kits where we don't get low voltage warnings and on those where we do. That's from the kernel rather than vcgencmd; I'll check vcgencmd as well just to be sure.
We have tested audio_pwm_mode=1 and it seems to prevent the problem.
By the way, the above is the simplest reproducer, not the fastest. We can reproduce it in about 5 minutes. When we are running another app at the same time as qmlmatrix (a love2d app, which runs on its own dispmanx layer) the bug seems to happen more quickly. We can give you a binary of that, or an image containing it, if that would help you debug the problem.
I think the existing tests are good enough. The problem seems to be a bug triggered by underflow in the audio_pwm_mode=2 code which corrupts VC memory and then causes invalid DMA writes which in turn cause random kernel oops.
I have a vc firmware patch which resolves the crash but needs a bit more testing.
N.B. This would only fix the crash and not the audio glitches. Ensuring that high quality analog audio gets enough resources when graphics/VCHIQ is heavily loaded might not be practical to fix.
That's really good news; we'll test it when it comes out.
I've seen this happen before on my Pi 3B running MPV (video + audio) and a Qt EGLFS overlay after a few days of playtime. The audio PCM device is stuck, while the mixer still works.
amixer cset iface=MIXER,name='PCM Playback Route' '2' to redirect the output from the 3.5mm audio jack to HDMI will unstick the audio.
Hello (and thanks to jdb for pointing me to this problem page). Here is another view / test which I believe quickly reveals the issue: RPi 3B+ with latest code, non-desktop, using mpg123 to play audio and 2 x SSH windows to start and stop audio.
Test 1:
dtparam=audio=on
audio_pwm_mode=1
window 1: start audio (good)
window 2: start audio (good and mixing with first)
window 1, stop and restart same audio (good)
Test 2:
dtparam=audio=on
audio_pwm_mode=2
window 1: start audio (good)
window 2: start audio (good and mixing with first)
window 1, stop and restart same audio (BAD).
Added here in case it helps anyone. Really looking forward to a fix for this, fingers crossed!
Latest rpi-update firmware has the fix that has been discussed here. Please update and test.
Hello all, and thanks to those involved in looking at this. The test case I described above (window 1: start audio (good), window 2: start audio (good and mixing with first), window 1, stop and restart same audio (BAD).) no longer fails. I've tested back and forth between windows and with the new rpi-update I can no longer break it. Thanks for the fix!
@Ealdwulf can you confirm the fix is good?
We are running a soak test, will let you know later today @popcornmix.
Looks pretty good - it's been running for 4 hours, restarting qmlmatrix every 450 seconds, with no crashes. We're going to leave the soak running overnight, but just out of an abundance of caution.
It's still going this morning, so it looks like it's fixed 👍.
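For anyone who wants to repeat a soak like this, here is a rough sketch of the loop. It is not the exact script we ran: `soak_test` is a hypothetical name, and grepping `dmesg` for "Internal error: Oops" as the failure check is an assumption based on the logs earlier in this thread. The audio loop (`while true; do aplay chippytoon.wav ; done &`) would need to be started separately.

```shell
# Hypothetical soak loop: restart APP every INTERVAL seconds until TOTAL
# seconds have elapsed, stopping early if the kernel log shows an oops.
soak_test() {
    app=$1 interval=$2 total=$3
    start=$(date +%s)
    runs=0
    while [ $(( $(date +%s) - start )) -lt "$total" ]; do
        "$app" &
        pid=$!
        sleep "$interval"
        kill "$pid" 2>/dev/null     # the app may already have exited
        wait "$pid" 2>/dev/null
        runs=$((runs + 1))
        # Assumed failure signature, taken from the oops logs in this thread.
        if dmesg 2>/dev/null | grep -q 'Internal error: Oops'; then
            echo "oops detected after $runs runs"
            return 1
        fi
    done
    echo "completed $runs runs without an oops"
}
```

A four hour run with the interval mentioned above would be `soak_test qmlmatrix 450 14400`.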