Describe the bug
When CONFIG_MULTITHREADING=n, interrupts are initially disabled (see bug #8393). When they are enabled, a usage fault happens immediately (it seems to happen while returning from the interrupt).
To Reproduce
Steps to reproduce the behavior:
#include <zephyr.h>
#include <sys/printk.h>

void main(void)
{
    /* enable interrupts */
    irq_unlock(0);
    printk("Hello World! %s\n", CONFIG_BOARD);
    /* wait for the interrupt coming from the LF clock being started */
    k_busy_wait(1000000);
}
prj.conf:
CONFIG_MULTITHREADING=n
CONFIG_LOG=y
*** Booting Zephyr OS build zephyr-v2.3.0-979-ga043d48c5472 ***
[00:00:02.426,116] <err> os: ***** USAGE FAULT *****
[00:00:02.426,116] <err> os: Illegal load of EXC_RETURN into PC
[00:00:02.426,116] <err> os: r0/a1: 0x00000004 r1/a2: 0x00000001 r2/a3: 0x00000001
[00:00:02.426,116] <err> os: r3/a4: 0x00000000 r12/ip: 0x00000020 r14/lr: 0x000029db
[00:00:02.426,116] <err> os: xpsr: 0x00000000
[00:00:02.426,147] <err> os: Faulting instruction address (r15/pc): 0xe000ed00
[00:00:02.426,147] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.426,147] <err> os: Current thread: 0x00000000 (unknown)
[00:00:03.378,143] <err> os: Halting system
Expected behavior
No error should appear.
Impact
Interrupts cannot be used when multithreading is off. Users cannot use any driver (not even an out-of-tree one that does not use kernel synchronization APIs).
Environment (please complete the following information):
Note that #26372 was probably failing because of that, too (apart from kernel API usage).
@nordic-krch what is the root cause of the usage fault? Won't CONFIG_LOG=y require multithreading by default? Could you test with something that doesn't require threads at all? I remember being able to enable interrupts with irq_unlock(0) and then use interrupts without a problem with multithreading disabled.
EDIT: See for example this use of an interrupt-driven UART: https://github.com/JuulLabs-OSS/mcuboot/blob/master/boot/zephyr/serial_adapter.c#L229 which is perfectly functional. So while it's true that interrupts are disabled by default with multithreading disabled, I am not quite sure that they are broken when enabled.
I was seeing the same issue yesterday when trying to use I2C functions from main() without a separate thread whilst creating/integrating a driver. The trace is as follows:
*** Booting Zephyr OS build zephyr-v2.0.0-8735-g2f1d9dded535 ***
[00:00:00.009,857] <err> os: ***** USAGE FAULT *****
[00:00:00.015,533] <err> os: Illegal load of EXC_RETURN into PC
[00:00:00.022,308] <err> os: r0/a1: 0xb672b501 r1/a2: 0x6a104a0b r2/a3: 0xbf1e2800
[00:00:00.031,005] <err> os: r3/a4: 0x62112100 r12/ip: 0xf9c2f007 r14/lr: 0xf3efb662
[00:00:00.039,703] <err> os: xpsr: 0xea4f0000
[00:00:00.044,952] <err> os: Faulting instruction address (r15/pc): 0xf1a08005
[00:00:00.052,856] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:00.060,699] <err> os: Current thread: 0x00000000 (unknown)
[00:00:00.067,474] <err> os: Halting system
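For anyone reading these traces: the "Illegal load of EXC_RETURN into PC" line appears to correspond to the INVPC bit of the Cortex-M UsageFault Status Register (the upper halfword of CFSR). Below is a minimal sketch of decoding that register, assuming the ARMv7-M bit layout. The function name `usage_fault_reason` and the description strings are illustrative only and are not Zephyr APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* UFSR is the upper halfword of CFSR (ARMv7-M layout assumed). */
#define CFSR_UFSR_SHIFT 16u

/* Hypothetical lookup table; the texts are illustrative descriptions. */
static const struct {
    uint32_t mask;
    const char *text;
} ufsr_bits[] = {
    { 1u << 0, "UNDEFINSTR: undefined instruction" },
    { 1u << 1, "INVSTATE: invalid execution state" },
    { 1u << 2, "INVPC: illegal load of EXC_RETURN into PC" },
    { 1u << 3, "NOCP: coprocessor access not permitted" },
    { 1u << 8, "UNALIGNED: unaligned memory access" },
    { 1u << 9, "DIVBYZERO: division by zero" },
};

/* Return a description of the first UsageFault bit set in cfsr, or NULL. */
const char *usage_fault_reason(uint32_t cfsr)
{
    uint32_t ufsr = cfsr >> CFSR_UFSR_SHIFT;

    for (size_t i = 0; i < sizeof(ufsr_bits) / sizeof(ufsr_bits[0]); i++) {
        if (ufsr & ufsr_bits[i].mask) {
            return ufsr_bits[i].text;
        }
    }
    return NULL;
}
```

On target, the value passed in would be read from the CFSR register. Here the function takes a plain integer so that it can be tried on a host.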
@carlescufi with CONFIG_LOG_MINIMAL=y I see the same issue, and I think the error comes from the clock interrupt (when the LF clock is ready). Could it be that serial recovery turns on multithreading?
No, it does not. I just checked: disabled logging and built with CONFIG_MCUBOOT_SERIAL and CONFIG_MULTITHREADING remains disabled.
@de-nordic and @nvlsianpu can you confirm that serial recovery is fully functional in mcuboot and that it keeps CONFIG_MULTITHREADING disabled?
@carlescufi I will look at this today and let you know.
@carlescufi CONFIG_MULTITHREADING is still disabled, but serial recovery does not work with the latest master commit (75949f470f56161e1c6708193384d5e4d32f3f2f at the time I am writing this).
I have enabled it for test purposes and then it worked.
Tested on nrf52840dk_nrf52840.
@de-nordic thanks.
You mean you enabled CONFIG_MULTITHREADING, right?
In this case, I think we need to find out whether it's GPIO or UART that are making use of multithreading, or if it's something else entirely.
@carlescufi do you want me to check it?
Sure, yes please @de-nordic since we are at it.
@nordic-krch this commit introduced the regression.
Waiting for @nordic-krch's input on https://github.com/zephyrproject-rtos/zephyr/commit/2881df3d0cb329452fbe32da6f9b368a82366888, but I am afraid that the change might have uncovered the issue rather than introduced it.
I have tried to pinpoint the exact location in mcuboot where the fault is triggered, and I have found that it happens (for me) on the second invocation of boot_serial_start:595, at the call f->read(...). But when I tried to step through it (si) in the gdb disassembly, I could basically put a rock on the Enter key and the issue would never happen.
@carlescufi @de-nordic @nordic-krch This issue caught my attention and I took a quick deeper look. I think the problem lies in incorrect configuration of stack pointer registers when CONFIG_MULTITHREADING is disabled.
Thread mode is configured to use PSP here:
https://github.com/zephyrproject-rtos/zephyr/blob/3bc6c555a51888b5996d06f68f7b65892a9c79e2/arch/arm/core/aarch32/cortex_m/reset.S#L95-L98
Because further initialization is done this way:
https://github.com/zephyrproject-rtos/zephyr/blob/7d90812f265c0d23bad904434cef9c1616ba08ad/kernel/init.c#L475-L479
PSP is not reconfigured to the top of the main stack by this code that is called from switch_to_main_thread():
https://github.com/zephyrproject-rtos/zephyr/blob/4c673395718e862949a6c153214662fbc92cd479/arch/arm/core/aarch32/thread.c#L430-L438
and after initialization is finished, PSP points to the same stack as MSP, just a little below MSP. Then, if an interrupt routine uses the stack (pointed to by MSP) more intensively, it can overwrite the values stacked there on exception entry (using PSP), and the return from exception may fail in various ways (most likely with a UsageFault). But as long as no interrupt routine uses too much stack, everything can work correctly for quite a long time. As @de-nordic already signaled:
Waiting for @nordic-krch input on the 2881df3, but I am afraid that the change might have uncovered the issue rather than introduce it.
And it seems this issue may occur on all Cortex-M SoCs. I'm not sure who would be the best person to look at this problem.
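To make the overlap easier to picture, here is a rough host-side model of the situation described above. It is not Zephyr code and does not touch real registers. It assumes a single downward-growing stack region, an 8-word exception frame pushed through PSP, and a handler that pushes its own data through MSP starting from the top of the same region. All names (`frame_survives`, `gap_words`, `handler_words`) and the stacked PC value are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64
#define FRAME_WORDS 8   /* r0-r3, r12, lr, pc, xpsr */
#define FRAME_PC    6   /* index of the stacked PC within the frame */

/* One shared region used by both "MSP" and "PSP"; it grows downward. */
static uint32_t stack[STACK_WORDS];

/* Model of exception entry: the frame is pushed via PSP. */
static uint32_t *exception_entry(uint32_t *psp, uint32_t return_pc)
{
    psp -= FRAME_WORDS;
    memset(psp, 0, FRAME_WORDS * sizeof(uint32_t));
    psp[FRAME_PC] = return_pc;
    return psp;
}

/* Model of a handler that pushes `words` words via MSP. */
static void handler_uses_stack(uint32_t *msp, int words)
{
    for (int i = 0; i < words; i++) {
        *--msp = 0xDEADBEEFu;
    }
}

/*
 * Returns 1 if the stacked PC is still intact after the handler ran.
 * gap_words is the distance of PSP below MSP at exception entry.
 * Requires gap_words + FRAME_WORDS <= STACK_WORDS and
 * handler_words <= STACK_WORDS.
 */
int frame_survives(int gap_words, int handler_words)
{
    uint32_t *msp = &stack[STACK_WORDS];  /* top of the shared region */
    uint32_t *psp = msp - gap_words;      /* PSP a little below MSP */
    const uint32_t return_pc = 0x00001234u; /* arbitrary example value */

    uint32_t *frame = exception_entry(psp, return_pc);
    handler_uses_stack(msp, handler_words);
    return frame[FRAME_PC] == return_pc;
}
```

In this model `frame_survives(16, 4)` returns 1, because the handler stays above the frame, while `frame_survives(16, 32)` returns 0, because the handler's pushes run over the stacked PC. That matches the observation that light interrupt routines can work for a long time before anything breaks.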
@anangl thanks for the extensive analysis!
@ioannisg should be able to look at this.
@anangl thanks!
@anangl thanks for the study you did - it's half of the work already :)
This was meant for #27343 but is still relevant here. I've updated the issue title.
Using #27136 on current master I've confirmed broken CONFIG_MULTITHREADING=n support on:
Clearly CONFIG_MULTITHREADING=n is a poorly tested configuration, and the failure is not Nordic-specific.