Zephyr: Usage Fault with CONFIG_NO_OPTIMIZATIONS even on samples/hello_world

Created on 2 Sep 2019 · 21 Comments · Source: zephyrproject-rtos/zephyr

Describe the bug
Zephyr crashes with a usage fault before getting to main() if CONFIG_NO_OPTIMIZATIONS is passed in.
BOARD=cc1352r1_launchxl
using samples/hello_world (custom application exhibits the same behaviour)

CONFIG_NO_OPTIMIZATIONS is required in order to see actual variable values, rather than 'optimized out', through JTAG / gdb (see screenshot).

More info on CONFIG_NO_OPTIMIZATIONS.

To Reproduce
Steps to reproduce the behavior:

echo 'CONFIG_NO_OPTIMIZATIONS=y' >> samples/hello_world/prj.conf
west build --pristine -b cc1352r1_launchxl samples/hello_world
west debug
See error

Expected behavior

Aside from being able to inspect variables in the debugger without seeing 'value optimized out', I would expect this:

***** Booting Zephyr OS build v2.0.0-rc1-265-gc4d2e173a42c *****
Hello World! cc1352r1_launchxl

Impact

For my purposes, this is a bit of a showstopper, as I was hoping to get application-level code tested over UART before moving it to BLE in time for the Linux Plumbers Conference.

Without the ability to inspect all variables, debugging via JTAG is only slightly more useful than printf debugging. I've found it often takes at least 3x as long to debug applications when having to rely on printed debugging.

However, in my case, I am using the UART for my application (i.e. cannot use printf debugging), and so it makes debugging effectively impossible.

Screenshots or console output

Backtrace:

Program received signal SIGINT, Interrupt.
k_cpu_idle () at /home/cfriedt/workspace/zephyrproject/zephyr/arch/arm/core/cpu_idle.S:99
99      bx lr
bt
#0  k_cpu_idle () at /home/cfriedt/workspace/zephyrproject/zephyr/arch/arm/core/cpu_idle.S:99
#1  0x0000466a in z_arch_system_halt (reason=0) at /home/cfriedt/workspace/zephyrproject/zephyr/kernel/fatal.c:29
#2  0x000023ca in k_sys_fatal_error_handler (reason=0, esf=0x20000910 <_interrupt_stack+2016>) at /home/cfriedt/workspace/zephyrproject/zephyr/kernel/fatal.c:42
#3  0x000024d4 in z_fatal_error (reason=0, esf=0x20000910 <_interrupt_stack+2016>) at /home/cfriedt/workspace/zephyrproject/zephyr/kernel/fatal.c:121
#4  0x00003cd4 in z_arm_fatal_error (reason=0, esf=0x20000910 <_interrupt_stack+2016>) at /home/cfriedt/workspace/zephyrproject/zephyr/arch/arm/core/fatal.c:50
#5  0x0000188e in _Fault (esf=0x20000910 <_interrupt_stack+2016>, exc_return=42) at /home/cfriedt/workspace/zephyrproject/zephyr/arch/arm/core/cortex_m/fault.c:867
#6  0x00001322 in __usage_fault () at /home/cfriedt/workspace/zephyrproject/zephyr/arch/arm/core/fault_s.S:161
#7  <signal handler called>
#8  0x0000be00 in ?? ()
#9  <signal handler called>
#10 z_errno () at /home/cfriedt/workspace/zephyrproject/zephyr/lib/os/fdtable.c:146
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Console:

FATAL: ***** USAGE FAULT *****
FATAL:   Illegal use of the EPSR
FATAL: r0/a1:  0x40001000  r1/a2:  0x0000002a  r2/a3:  0xe4000000
FATAL: r3/a4:  0x40001000 r12/ip:  0x00000000 r14/lr:  0xfffffffd
FATAL:  xpsr:  0x8000000f
FATAL: Faulting instruction address (r15/pc): 0x0000be00
FATAL: >>> ZEPHYR FATAL ERROR 0: CPU exception
FATAL: Current thread: 0x20000028 (unknown)
FATAL: Halting system
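
For readers decoding a dump like the one above by hand, here is a minimal Python sketch (not part of the original report) that unpacks the stacked xpsr and lr values. The bit positions follow the ARMv7-M architecture; the helper names decode_xpsr and decode_exc_return are hypothetical, and the lr table assumes a basic (non-floating-point) stack frame.

```python
# Sketch: decode the stacked xPSR and an EXC_RETURN-style lr value from a
# Cortex-M fault dump. Helper names are illustrative, not Zephyr APIs.

T_BIT = 1 << 24        # EPSR Thumb state bit; must be 1 on Cortex-M
N_BIT = 1 << 31        # APSR negative flag
IPSR_MASK = 0x1FF      # active exception number, bits [8:0]

# EXC_RETURN encodings for a basic (non-FP) frame on ARMv7-M.
EXC_RETURN = {
    0xFFFFFFF1: "handler mode, main stack (MSP)",
    0xFFFFFFF9: "thread mode, main stack (MSP)",
    0xFFFFFFFD: "thread mode, process stack (PSP)",
}


def decode_xpsr(xpsr: int) -> dict:
    """Split an xPSR value into the fields relevant to a usage fault."""
    return {
        "negative": bool(xpsr & N_BIT),
        "thumb": bool(xpsr & T_BIT),
        "exception_number": xpsr & IPSR_MASK,
    }


def decode_exc_return(lr: int) -> str:
    """Describe an EXC_RETURN value, if lr holds one."""
    return EXC_RETURN.get(lr, "not a basic-frame EXC_RETURN value")


if __name__ == "__main__":
    # Values copied from the console output above.
    print(decode_xpsr(0x8000000F))
    print(decode_exc_return(0xFFFFFFFD))
```

For the values above, the Thumb bit reads as clear, which is consistent with the "Illegal use of the EPSR" message, and the exception number field reads 15 (SysTick on ARMv7-M). This only restates what the dump contains; it does not by itself identify the root cause.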

Here is a screenshot of me attempting to inspect variables without the CONFIG_NO_OPTIMIZATIONS option. Likely, the default optimization level is -O2 or -Os.

[Screenshot: optimized-out]

Environment (please complete the following information):

OS: Ubuntu Bionic
Toolchain: GNU ARM Embedded 2019 Q3 (but presumably affects all gcc variants)
Commit: 0906671a7b17d7218c7f3acf0bcd991712dafd35

Additional context

ARM bug low

Source

cfriedt

All 21 comments

I have tested the hello_world sample on the nrf52840_pca10056 with CONFIG_NO_OPTIMIZATIONS enabled and it seems to work fine here. Do you have another board to test @cfriedt?

Also, in my case the variables show fine in the debugger (Segger Ozone):

carlescufi on 2 Sep 2019

@cfriedt also, please note that the right name of this Kconfig option is CONFIG_NO_OPTIMIZATIONS (note the S at the end). In your description you seem to have misspelled it.

carlescufi on 2 Sep 2019

> @cfriedt also, please note that the right name of this Kconfig option is CONFIG_NO_OPTIMIZATIONS (note the S at the end). In your description you seem to have misspelled it.

Fixed! Sorry about the typo(s) - I updated the description.

cfriedt on 2 Sep 2019

Setting priority to low since this doesn't seem to be a general issue, but one with this particular board.

carlescufi on 2 Sep 2019

Hmm... considering that plumbers is in a matter of days, I don't consider it low priority, but sure..

cfriedt on 2 Sep 2019

> Hmm... considering that plumbers is in a matter of days, I don't consider it low priority, but sure..

Well, typically we consider it low priority if it only affects a single board. That said, the problem with this issue is that I am not sure whether the board support is broken or whether it is a symptom of a deeper problem we need to understand.
I wonder if @bwitherspoon could help out here, since he's listed as the official maintainer of this board.

carlescufi on 2 Sep 2019

Hopefully. It happens quite systematically.

E.g. it's always after the 6th char printed out using uart_cc13xx_cc26xx_poll_out(). In this case, it's the space (" ") in the boot banner "***** Booting Zephyr OS build v2.0.0-rc1-265-gc4d2e173a42c *****".

One of the tricky things about this board is that there are some "magic" reserved sections in memory due to in-ROM drivers. I suspect that if those sections aren't reserved in the memory map, stack corruption is likely to occur.

I'll see if I can get a link to point to the linker script that TI uses in their SDK.

cfriedt on 2 Sep 2019

Stack trace is here:

#0  uart_cc13xx_cc26xx_poll_out (dev=0x20000ec0 <__device_uart_cc13xx_cc26xx_0>, c=32 ' ') at /home/cfriedt/workspace/zephyrproject/zephyr/drivers/serial/uart_cc13xx_cc26xx.c:55
#1  0x00003c14 in z_impl_uart_poll_out (dev=0x20000ec0 <__device_uart_cc13xx_cc26xx_0>, out_char=32 ' ') at ../include/drivers/uart.h:634
#2  0x00003c32 in uart_poll_out (dev=0x20000ec0 <__device_uart_cc13xx_cc26xx_0>, out_char=32 ' ') at zephyr/include/generated/syscalls/uart.h:17
#3  0x00000b12 in console_out (c=32) at /home/cfriedt/workspace/zephyrproject/zephyr/drivers/console/uart_console.c:90
#4  0x000009c0 in char_out (c=32, ctx_p=0x20000cd4 <_main_stack+932>) at /home/cfriedt/workspace/zephyrproject/zephyr/lib/os/printk.c:320
#5  0x000004f2 in z_vprintk (out=0x9a1 <char_out>, ctx=0x20000cd4 <_main_stack+932>, fmt=0x59cd " Booting Zephyr OS build v2.0.0-rc1-266-ged39fdcbd40d *****\n", ap=...) at /home/cfriedt/workspace/zephyrproject/zephyr/lib/os/printk.c:116
#6  0x000009ec in vprintk (fmt=0x59c8 "***** Booting Zephyr OS build v2.0.0-rc1-266-ged39fdcbd40d *****\n", ap=...) at /home/cfriedt/workspace/zephyrproject/zephyr/lib/os/printk.c:345
#7  0x00003abe in printk (fmt=0x59c8 "***** Booting Zephyr OS build v2.0.0-rc1-266-ged39fdcbd40d *****\n") at /home/cfriedt/workspace/zephyrproject/zephyr/lib/os/printk.c:399
#8  0x000025c8 in bg_thread_main (unused1=0x0 <z_errno>, unused2=0x0 <z_errno>, unused3=0x0 <z_errno>) at /home/cfriedt/workspace/zephyrproject/zephyr/kernel/init.c:267
#9  0x00003a5e in z_thread_entry (entry=0x259d <bg_thread_main>, p1=0x0 <z_errno>, p2=0x0 <z_errno>, p3=0x0 <z_errno>) at /home/cfriedt/workspace/zephyrproject/zephyr/lib/os/thread_entry.c:29
#10 0x00002732 in z_arch_switch_to_main_thread (_main=0x259d <bg_thread_main>, main_stack_size=1024, main_stack=0x20000930 <_main_stack>, main_thread=0x20000028 <_main_thread_s>) at ../arch/arm/include/kernel_arch_func.h:110
#11 switch_to_main_thread () at /home/cfriedt/workspace/zephyrproject/zephyr/kernel/init.c:425
#12 0x000027f6 in z_cstart () at /home/cfriedt/workspace/zephyrproject/zephyr/kernel/init.c:527
#13 0x00003d32 in _PrepC () at /home/cfriedt/workspace/zephyrproject/zephyr/arch/arm/core/prep_c.c:174
#14 0x00001402 in __start () at /home/cfriedt/workspace/zephyrproject/zephyr/arch/arm/core/cortex_m/reset.S:111

As soon as I exit z_vprintk() after printing out "***** " in the boot banner, the exception is entered.
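
The position of the failing character can be cross-checked against the backtrace itself. The following Python sketch (an illustration, not something posted in the thread) uses the two fmt pointers shown in frames #5 and #6; the function name char_at_fmt_pointer is hypothetical.

```python
# Sketch: recover which banner character was being printed from the two
# fmt pointers in the backtrace (frame #6 holds the start of the string,
# frame #5 holds the current position inside z_vprintk).

BANNER = "***** Booting Zephyr OS build v2.0.0-rc1-266-ged39fdcbd40d *****\n"


def char_at_fmt_pointer(banner: str, base: int, current: int) -> tuple:
    """Return (offset, character, character code) for the current pointer."""
    offset = current - base
    char = banner[offset]
    return offset, char, ord(char)


if __name__ == "__main__":
    # 0x59c8 and 0x59cd are taken from frames #6 and #5 above.
    print(char_at_fmt_pointer(BANNER, 0x59C8, 0x59CD))
```

The offset comes out as 5 (the sixth character) and the character code as 32, matching the c=32 ' ' argument seen in frame #0.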

cfriedt on 2 Sep 2019

Hmm... I just cat'ed all of the generated .ld files in the build directory, along with soc/arm/ti_simplelink/cc13x2_cc26x2/linker.ld, and they definitely do not take reserved memory regions into account.

cfriedt on 2 Sep 2019

Uploaded linker scripts from TI's 3.10 SDK

CC1352R1_LAUNCHXL_NoRTOS.lds.txt
CC1352R1_LAUNCHXL_TIRTOS.lds.txt

cfriedt on 2 Sep 2019

The Cortex-M linker script shows how to declare an entire vendor specific section.

It isn't immediately clear to me how to inject a section at a specific memory address as part of e.g. .data, .rodata, or .text.

Does Zephyr's linker script have a convenient way to do that?

cfriedt on 2 Sep 2019

> Does Zephyr's linker script have a convenient way to do that?

At the SoC level you can do it like this.

Directly in the main linker script you can do it like this.

Also take a look at this commit that moved the vector relay code to be CMake-handled, maybe it can serve as inspiration as well.

carlescufi on 2 Sep 2019

@carlescufi - I just threw up a "hack" PR. It fixes this issue, and so the stack was definitely getting corrupted because of reserved memory regions. I don't expect it to be approved as-is, but will work on it a bit more after LPC.

Would be nice to see if @bwitherspoon has some suggestions too.

cfriedt on 2 Sep 2019

Thanks @cfriedt for the PRs.

carlescufi on 2 Sep 2019

Have you tried increasing the size of various stacks? The ones being used can be found in your zephyr/.config file. Maybe try doubling them?
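
As a rough illustration of the "try doubling them" suggestion, the Python sketch below scans the text of a zephyr/.config file for stack-size options and prints the same options with doubled values. The function name doubled_stack_overlay and the sample values are assumptions made for this example; only the general CONFIG_..._STACK_SIZE=<number> line format is taken from Kconfig output.

```python
import re

# Matches lines such as CONFIG_MAIN_STACK_SIZE=1024 in a generated .config.
STACK_RE = re.compile(r"^(CONFIG_\w*STACK_SIZE)=(\d+)$")


def doubled_stack_overlay(config_text: str, factor: int = 2) -> str:
    """Return Kconfig lines with every *_STACK_SIZE value multiplied."""
    lines = []
    for line in config_text.splitlines():
        match = STACK_RE.match(line.strip())
        if match:
            name, size = match.group(1), int(match.group(2))
            lines.append(f"{name}={size * factor}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative sample only; real values come from build/zephyr/.config.
    sample = """\
CONFIG_MAIN_STACK_SIZE=1024
CONFIG_IDLE_STACK_SIZE=320
CONFIG_ISR_STACK_SIZE=2048
# CONFIG_SOME_OTHER_OPTION is not set
CONFIG_NO_OPTIMIZATIONS=y
"""
    print(doubled_stack_overlay(sample))
```

The printed lines could then be appended to prj.conf in the same way as CONFIG_NO_OPTIMIZATIONS=y in the reproduction steps, followed by a pristine rebuild. Doubling is only a starting point for an experiment, not a tuned setting.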

dcpleung on 4 Sep 2019

I have the same issue on both the CC1352 and CC2652 launchpads. I will have closer look at it this week.

bwitherspoon on 4 Sep 2019

I may have conflated an issue with OpenOCD and this issue. I was using upstream OpenOCD (which TI has committed to supporting). My other PR suggested using reset_config trst_only in OpenOCD because that allowed me to flash with upstream. However, it also does not properly reset the CPU or put the CPU into debug mode. So it's possible that (although flashing works) the usage fault I received was from the CPU going off on a tangent.

The TI ROM drivers do have several "magic" reserved memory regions though, so this change might still be useful for bringing that out into Kconfig.

cfriedt on 4 Sep 2019

@vanti can you check whether this is still an issue and look into it, since it seems specific to the TI platform?

galak on 22 Jul 2020

For what it's worth, I'm no longer having this issue, but now I'm using the OpenOCD from the Zephyr SDK.

cfriedt on 22 Jul 2020

@galak I am unable to reproduce this issue using hello_world. That said, whenever I see a crash with CONFIG_NO_OPTIMIZATIONS=y, it is usually due to the default stack sizes being too small. My advice for anyone who encounters this problem in the future would be to increase the stack sizes and see if that fixes the issue.

vanti on 23 Jul 2020

> @galak I am unable to reproduce this issue using hello_world. That said, whenever I see a crash with CONFIG_NO_OPTIMIZATIONS=y, it is usually due to the default stack sizes being too small. My advice for anyone who encounters this problem in the future would be to increase the stack sizes and see if that fixes the issue.

There is a related issue for this, #19340, so let's close this one for now.