The following is some barebones code I'm using to just boot to the main app space at `POST_APPLICATION_ADDR`:
#include "mbed.h"
#include "mbed-trace/mbed_trace.h"

int main()
{
    //mbed_trace_init();
    mbed_start_application(POST_APPLICATION_ADDR);
}
This code works, and I successfully boot from the bootloader into the main app space as expected. However, once I uncomment `mbed_trace_init();`, my app hangs after the bootloader and is unable to start the main app.
Using gdb, I've found that the fault seems to occur at some point within `__svcKernelInitialize()`. This is the disassembly of its definition:
Dump of assembler code for function osKernelInitialize:
0x0802463c <+0>: push {lr}
0x0802463e <+2>: sub sp, #12
0x08024640 <+4>: bl 0x8024638 <osRtxKernelPreInit>
0x08024644 <+8>: bl 0x80240a0 <IsIrqMode>
0x08024648 <+12>: mov r3, r0
0x0802464a <+14>: cmp r3, #0
0x0802464c <+16>: bne.n 0x8024658 <osKernelInitialize+28>
0x0802464e <+18>: bl 0x80240ba <IsIrqMasked>
0x08024652 <+22>: mov r3, r0
0x08024654 <+24>: cmp r3, #0
0x08024656 <+26>: beq.n 0x8024668 <osKernelInitialize+44>
0x08024658 <+28>: mvn.w r0, #5
0x0802465c <+32>: bl 0x8023df4 <EvrRtxKernelError>
0x08024660 <+36>: mvn.w r3, #5
0x08024664 <+40>: str r3, [sp, #4]
0x08024666 <+42>: b.n 0x8024672 <osKernelInitialize+54>
=> 0x08024668 <+44>: ldr r3, [pc, #16] ; (0x802467c <osKernelInitialize+64>)
0x0802466a <+46>: mov r12, r3
0x0802466c <+48>: svc 0
0x0802466e <+50>: mov r3, r0
0x08024670 <+52>: str r3, [sp, #4]
0x08024672 <+54>: ldr r3, [sp, #4]
0x08024674 <+56>: mov r0, r3
0x08024676 <+58>: add sp, #12
0x08024678 <+60>: ldr.w pc, [sp], #4
0x0802467c <+64>: rors r1, r3
0x0802467e <+66>: lsrs r2, r0, #32
From there, a few things happen inside of `irq_cm4f.S` before hitting an exception on line 93 of said file. After that, `except.S` is called, and from there a final fault message is seen in the debugger:
Program received signal SIGTRAP, Trace/breakpoint trap.
0x08003320 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681 errno = EBADF;
What's causing the fault within `irq_cm4f.S`?
My custom target uses an STM32F413RH chip, but inside `custom_targets.json` I've specified the device name to be `"device_name": "STM32F413ZH"` because this has sector information filled out and is virtually the same processor but with a larger package / more pins.
Target: Custom target with STM32F413RH processor
Toolchain: GCC_ARM 8.2.1
Tool: mbed-cli
Vers: mbed-os 5.13
[x] Question
[ ] Enhancement
[ ] Bug
For what it's worth, this is the diff of the progress through that SVC0 from `irq_cm4f.S` between a bootloader not using trace (AKA no faults, left side) and a bootloader that uses trace (AKA a fault, right side).
In the faulting condition, it seems that it diverges at line 72 because the contents of `print osRtxInfo` show that `thread.run.curr = 0x10` and `thread.run.next = 0x0`. Because these aren't equal, a different branch is taken. I'm guessing this means that when trace is included there's an extra thread that needs to close. Again, just guessing here.
Moving on, line 93 is where the exception is caused. This is the line:
STR R12,[R1,#TCB_SP_OFS] ; Store SP
The values involved here are as follows:
R12: 0x2004ff98
R01: 0x10
#TCB_SP_OFS: 0x80048A5*
I don't really understand the STR op code but something in the above values is disagreeing with it.
*n.b. I got this value in gdb using `print (uint32_t)TCB_SP_OFS` -- it's possible I chose the wrong cast type here.
@AGlass0fMilk @40Grit @0xc0170 Tagging you guys now that I've got something worth showing and the issue thread has moved.
The bootloader->main image transition basically tries to "reset" hardware things back to a safe state, then effectively runs the main image as if through the reset vector, so everything gets reinitialised.
> there's an extra thread that needs to close.
All software state is lost, so it's not about closing - we effectively do a software "reset" in `mbed_start_application` and run an entirely new stand-alone image.
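For context, here is a minimal host-side sketch of the first step of such a hand-off. A Cortex-M image begins with a vector table whose first word is the initial stack pointer and whose second word is the reset handler address, and a bootloader reads both before jumping. `BootEntry`, `read_boot_entry` and `looks_like_thumb_address` are hypothetical names used only for illustration; this models the idea and is not the Mbed OS source.

```cpp
#include <cstdint>

// Hypothetical holder for the two values a bootloader needs from an image.
struct BootEntry {
    uint32_t initial_sp;     // word 0 of the vector table
    uint32_t reset_handler;  // word 1 of the vector table
};

// Reads the first two words of a Cortex-M style vector table.
// 'table' stands in for the flash contents at the application address.
inline BootEntry read_boot_entry(const uint32_t *table)
{
    BootEntry entry;
    entry.initial_sp = table[0];
    entry.reset_handler = table[1];
    return entry;
}

// Vector table entries point at Thumb code, so bit 0 of the address is set.
inline bool looks_like_thumb_address(uint32_t pc)
{
    return (pc & 1u) != 0u;
}
```

With a table starting `{0x20050000, 0x08026e0d}` (illustrative values), `read_boot_entry` returns that stack pointer and entry address. On the target, the bootloader would then load SP and branch to the entry address, so the new image starts as if it had come through its own reset vector.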
But if the hardware is not correctly reset by either the bootloader's shutdown or main image's start-up code, then the new image could be being confused by hardware not being in the expected state.
In this case though, it seems like maybe it's just a failure to initialize RAM correctly? At the point that first `SVC_Handler` is called, RTX should not have been initialised yet, meaning its static `osRtxInfo` structure should be in its default state (as specified by its initialiser in `rtx_kernel.c`).
I would expect that structure to have been placed in the `.data` area of the linker map, and then copy-initialised by the assembler `CopyDataInit` in `startup_stm32f413xx.S` early in the main image boot.
Check to see why that apparently hasn't happened, or has gone wrong. Does the linker map show it between `_sdata` and `_edata`?
Maybe stick a watchpoint on the location of `osThreadInfo.thread.run.curr`. That `0x10` is nonsense - it should be 0 at startup, and later become a valid pointer value.
> I don't really understand the STR op code but something in the above values is disagreeing with it.
It's expecting R1 to be a pointer to an `osRtxThread_t` structure, and it's trying to set its `sp` member (`TCB_SP_OFS` is `offsetof(osRtxThread_t, sp)`). But R1 is not a valid pointer.
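As a rough host-side illustration of what that `STR` does: it writes the register value to the address "base register plus constant offset". The sketch below assumes a drastically simplified stand-in for `osRtxThread_t` (`FakeThread` is hypothetical; the real structure has more fields and a different `sp` offset).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for osRtxThread_t; the real layout differs.
struct FakeThread {
    uint8_t  id;
    uint8_t  state;
    uint8_t  flags;
    uint8_t  attr;
    uint32_t sp;  // saved stack pointer
};

// Address that "STR R12,[R1,#TCB_SP_OFS]" would write to:
// the value in R1 plus the constant offset of the 'sp' member.
inline uintptr_t str_target_address(uintptr_t r1)
{
    return r1 + offsetof(FakeThread, sp);
}

// Equivalent C++ view of the store when R1 holds a valid thread pointer.
inline void save_sp(FakeThread *curr, uint32_t r12)
{
    curr->sp = r12;
}
```

When R1 points at a real thread control block, the store lands in that block's `sp` member. With R1 = `0x10`, the computed target is a small address near zero rather than anything in the RAM where thread control blocks live, which is consistent with the exception being raised on that instruction.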
Did a comparison of `osRtxInfo` between the working bootloader (not calling `mbed_trace_init()`) and the failing code, captured at line 69 of this SVC handling source file for context. Here's the diff (left side works, right side fails):
Interesting that the `os_id` and `version` are mismatched. Definitely would have figured these to be the same.
Anyway, @kjbracey-arm I've compared the .map files of the successful bootloader (left) and the failing one (right) at the region between `_sdata` and `_edata`:
Not sure what to make of this. What's left out of the screenshot is just more of the same symbols with the right side just offset by 0x40 all the way to `_edata`. Interesting to see `.data.m_trace` show up here though. Does anything look bizarre here? Not sure if this is exactly what you wanted to see. Perhaps you meant the map files of the main application, `blinky_application.map`.
> Maybe stick a watchpoint on the location of `osThreadInfo.thread.run.curr`. That `0x10` is nonsense - it should be 0 at startup, and later become a valid pointer value.
Couldn't find a var named "osThreadInfo" so I'm assuming you meant "osRtxInfo". Here's the watchpoint trace of this var from start to fail:
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x10
New value = (osRtxThread_t *) 0x0
Reset_Handler ()
at libs/mbed-os/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F413xH/device/TOOLCHAIN_GCC_ARM/startup_stm32f413xx.S:91
91 ldr r0, =_sdata
(gdb) c
Continuing.
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x200012a4 <os_timer_thread_cb>
SVC_Handler () at irq_cm4f.S:102
102 irq_cm4f.S: No such file or directory.
(gdb) c
Continuing.
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x200012a4 <os_timer_thread_cb>
New value = (osRtxThread_t *) 0x20001198 <_main_obj>
SVC_Handler () at irq_cm4f.S:102
102 in irq_cm4f.S
(gdb) c
Continuing.
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x20001198 <_main_obj>
New value = (osRtxThread_t *) 0x0
0x08026e1c in OS_Tick_Setup (freq=134244071, handler=0x4cc28)
at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/Source/os_systick.c:77
77 }
(gdb) c
Continuing.
Breakpoint 13, 0x08024668 in osKernelInitialize ()
at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:519
519 if (IsIrqMode() || IsIrqMasked()) {
(gdb) c
Continuing.
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x10
0x0802421c in svcRtxKernelInitialize () at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:93
93 memset(&osRtxInfo.kernel, 0, sizeof(osRtxInfo) - offsetof(osRtxInfo_t, kernel));
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x08005ae8 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681 errno = EBADF;
In that last watchpoint hit, where the value `0x10` is assigned, the working bootloader instead assigns this value:
New value = (osRtxThread_t *) 0x20001104 <os_timer_thread_cb>
Not sure why having mbed_trace present is preventing this from being assigned to `osRtxInfo.thread.run.curr` rather than `0x10`...
> Not sure if this is exactly what you wanted to see. Perhaps you meant the map files of the main application
What I'm looking for is where `osRtxInfo` is in the main image map - that should be somewhere between `_sdata` and `_edata` so that it gets correctly initialised.
From the dump, it looks like there's been no attempt to actually initialise that structure at the point you've stopped in the SVC Handler.
The initialisation code should be copying the initialised data from `_sidata` in the ROM to `_sdata` in the RAM.
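A minimal sketch of that copy step, written in C++ rather than the startup assembler and run against plain arrays instead of real flash and RAM. `copy_data_init` is a hypothetical name; the parameters stand in for the linker symbols `_sidata`, `_sdata` and `_edata`.

```cpp
#include <cstdint>

// Copies the initial values of .data from their load address (flash)
// to their run address (RAM), one word at a time, until the end of
// the RAM region is reached.
inline void copy_data_init(const uint32_t *sidata, uint32_t *sdata, const uint32_t *edata)
{
    while (sdata < edata) {
        *sdata++ = *sidata++;
    }
}
```

If this loop did not run, or ran with the wrong bounds, anything placed in `.data` (such as `osRtxInfo`) would keep whatever the RAM happened to contain, including values left behind by the bootloader.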
To double-check - the crash is happening in the main image, right? As it tries to initialise its OS?
Also, I'd like to see the lines from `_sidata` to `_sdata` in each image's map.
Are all those watchpoints from the main image execution - so the `Reset_Handler` is after your `mbed_start_application`?
It looks like it is being set to zero, but I don't see where that `0x10` has come from. That doesn't correspond to the code you're stopping at. Except that I can see a nearby `osRtxInfo.isr_queue.max = osRtxConfig.isr_queue.max;`, which might be setting a value to 16.
So mismatch between loaded image and ELF in the debugger?
Is this a custom bootloader you've created that has the RTOS in use? Our own bootloaders disable the RTOS, so you may be hitting some sort of problem with interrupts not being properly shut down that we haven't?
The fact that `SVC_Handler` is being called before `osKernelInitialize` makes very little sense. As does being in `OS_Tick_Setup`. That shouldn't happen until `osKernelStart`, which should be after `osKernelInitialize`.
Are you sure you've got the right ELF file in the debugger?
> What I'm looking for is where `osRtxInfo` is in the main image map - that should be somewhere between `_sdata` and `_edata` so that it gets correctly initialised.
To be really clear here, I was previously using the `.map` file from the bootloader. That was probably wrong, so here's the section of the main application's `.map` file with the copy of `osRtxInfo`:
(`osRtxInfo` business occurs around `0x00000000200002ec`)
.iplt 0x0000000008033830 0x0
.iplt 0x0000000008033830 0x0 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
.ARM.extab
*(.ARM.extab* .gnu.linkonce.armextab.*)
0x0000000008033830 __exidx_start = .
.ARM.exidx 0x0000000008033830 0x8
*(.ARM.exidx* .gnu.linkonce.armexidx.*)
.ARM.exidx 0x0000000008033830 0x8 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/crt0.o
.ARM.exidx 0x0000000008033838 0x0 projects/blinky/build/debug/libs/mbed-os/platform/TARGET_CORTEX_M/TOOLCHAIN_GCC/except.o
0x28 (size before relaxing)
.ARM.exidx 0x0000000008033838 0x0 projects/blinky/build/debug/libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/TOOLCHAIN_GCC/TARGET_RTOS_M4_M7/irq_cm4f.o
0x18 (size before relaxing)
.ARM.exidx 0x0000000008033838 0x0 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/libgcc.a(_udivmoddi4.o)
0x8 (size before relaxing)
0x0000000008033838 __exidx_end = .
0x0000000008033838 __etext = .
0x0000000008033838 _sidata = .
.rel.dyn 0x0000000008033838 0x0
.rel.iplt 0x0000000008033838 0x0 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
.data 0x00000000200001d8 0xb78 load address 0x0000000008033838
0x00000000200001d8 __data_start__ = .
0x00000000200001d8 _sdata = .
*(vtable)
*(.data*)
.data 0x00000000200001d8 0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
0x00000000200001d8 __dso_handle
.data.irq_handler
0x00000000200001dc 0x4 projects/blinky/build/debug/libs/mbed-os/hal/mbed_lp_ticker_api.o
.data.irq_handler
0x00000000200001e0 0x4 projects/blinky/build/debug/libs/mbed-os/hal/mbed_us_ticker_api.o
.data._ZL11filehandles
0x00000000200001e4 0x100 projects/blinky/build/debug/libs/mbed-os/platform/mbed_retarget.o
.data._ZZ5_sbrkE4heap
0x00000000200002e4 0x4 projects/blinky/build/debug/libs/mbed-os/platform/mbed_retarget.o
.data._ZL14idle_hook_fptr
0x00000000200002e8 0x4 projects/blinky/build/debug/libs/mbed-os/rtos/TARGET_CORTEX/mbed_rtx_idle.o
.data.os 0x00000000200002ec 0xa4 projects/blinky/build/debug/libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.o
0x00000000200002ec osRtxInfo
.data.SystemCoreClock
0x0000000020000390 0x4 projects/blinky/build/debug/libs/mbed-os/targets/TARGET_STM/TARGET_STM32F4/device/system_stm32f4xx.o
0x0000000020000390 SystemCoreClock
.data._impure_ptr
0x0000000020000394 0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-impure.o)
0x0000000020000394 _impure_ptr
.data.impure_data
0x0000000020000398 0x428 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-impure.o)
.data.__malloc_av_
0x00000000200007c0 0x408 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-mallocr.o)
0x00000000200007c0 __malloc_av_
.data.__malloc_sbrk_base
0x0000000020000bc8 0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-mallocr.o)
0x0000000020000bc8 __malloc_sbrk_base
.data.__malloc_trim_threshold
0x0000000020000bcc 0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-mallocr.o)
0x0000000020000bcc __malloc_trim_threshold
.data.__global_locale
0x0000000020000bd0 0x16c /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-locale.o)
0x0000000020000bd0 __global_locale
0x0000000020000d40 . = ALIGN (0x8)
*fill* 0x0000000020000d3c 0x4
0x0000000020000d40 PROVIDE (__preinit_array_start = .)
*(.preinit_array)
0x0000000020000d40 PROVIDE (__preinit_array_end = .)
0x0000000020000d40 . = ALIGN (0x8)
0x0000000020000d40 PROVIDE (__init_array_start = .)
*(SORT_BY_NAME(.init_array.*))
*(.init_array)
.init_array 0x0000000020000d40 0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
0x0000000020000d44 PROVIDE (__init_array_end = .)
0x0000000020000d48 . = ALIGN (0x8)
*fill* 0x0000000020000d44 0x4
[!provide] PROVIDE (__fini_array_start = .)
*(SORT_BY_NAME(.fini_array.*))
*(.fini_array)
.fini_array 0x0000000020000d48 0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
[!provide] PROVIDE (__fini_array_end = .)
*(.jcr*)
0x0000000020000d50 . = ALIGN (0x8)
*fill* 0x0000000020000d4c 0x4
0x0000000020000d50 __data_end__ = .
0x0000000020000d50 _edata = .
*also note that this map file was generated from mbed-cli and I found it in the build folder of my main app project.
The chunk of the `.map` file above is identical between the `.map` file produced when the application includes trace init (fails) and the one produced when it omits it (succeeds).
> To double-check - the crash is happening in the main image, right? As it tries to initialise its OS?
To verify -- indeed the crash occurs within the main image as it tries to initialise the OS (I think more specifically it's failing to initialise the kernel)
> Also, I'd like to see the lines from `_sidata` to `_sdata` in each image's map.
So this region in the main app is in my comment above. That same region in the `bootloader_application.map` is as follows (left works, right fails):
> Are all those watchpoints from the main image execution - so the `Reset_Handler` is after your `mbed_start_application`?
I have both the bootloader and main application ELF files loaded into GDB. I've set a breakpoint at `mbed_application.c:175`, which is called from the bootloader - not the main application - and I've recorded the watchpoints here for you with this breakpoint in place:
(gdb) monitor reset
Resetting target
(gdb) c
Continuing.
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x200012a4 <os_idle_thread_stack+284>
New value = (osRtxThread_t *) 0x0
Reset_Handler ()
at libs/mbed-os/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F413xH/device/TOOLCHAIN_GCC_ARM/startup_stm32f413xx.S:91
91 ldr r0, =_sdata
(gdb)
Continuing.
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x20001144 <os_timer_thread_cb>
SVC_Handler () at irq_cm4f.S:102
102 irq_cm4f.S: No such file or directory.
(gdb)
Continuing.
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x20001144 <os_timer_thread_cb>
New value = (osRtxThread_t *) 0x20001038 <_main_obj>
SVC_Handler () at irq_cm4f.S:102
102 in irq_cm4f.S
(gdb)
Continuing.
Breakpoint 18, start_new_application (sp=0x20050000, pc=0x8026e0d <OS_Tick_Setup+52>)
at libs/mbed-os/platform/mbed_application.c:175
175 __asm volatile(
(gdb)
Continuing.
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x20001038 <_main_obj>
New value = (osRtxThread_t *) 0x0
OS_Tick_Setup (freq=<error reading variable: Cannot access memory at address 0x20050004>,
handler=<error reading variable: Cannot access memory at address 0x20050000>)
at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/Source/os_systick.c:72
72 SysTick->VAL = 0U;
(gdb)
Continuing.
Hardware watchpoint 15: osRtxInfo.thread.run.curr
Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x10
0x0802421c in svcRtxKernelInitialize () at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:94
94 uint32_t offset = offsetof(osRtxInfo_t, kernel);
(gdb)
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x08003880 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681 errno = EBADF;
So it seems the majority of the watch points are hit before entering the main app.
> It looks like it is being set to zero, but I don't see where that `0x10` has come from. That doesn't correspond to the code you're stopping at. Except that I can see a nearby `osRtxInfo.isr_queue.max = osRtxConfig.isr_queue.max;`, which might be setting a value to 16.
So perhaps there's an offset here causing this 0x10 that was actually meant for the field `osRtxConfig`? Would this explain the bizarre values in `os_id = 0x5 "v",` as well?
> So mismatch between loaded image and ELF in the debugger?
See my bottom response in this comment.
> Is this a custom bootloader you've created that has the RTOS in use? Our own bootloaders disable the RTOS, so you may be hitting some sort of problem with interrupts not being properly shut down that we haven't?
I've had a few people ask me about the bootloader. It's a managed mbed bootloader where the contents of `main.cpp` can be seen in my OP above. I've used `target.restrict_size: 0x00020000` in the `mbed_app.json` and that's about it. I don't think I'm explicitly using any RTOS features (unless trace does?). And this bootloader works fine (with trace) on supported mbed targets. It's my own custom target that's causing this trouble. Is there a way I can explicitly check the RTOS is being disabled here?
> The fact that `SVC_Handler` is being called before `osKernelInitialize` makes very little sense. As does being in `OS_Tick_Setup`. That shouldn't happen until `osKernelStart`, which should be after `osKernelInitialize`.
`SVC_Handler` is used for software interrupts right? Is there a way we can trace back to see who's calling these interrupts before `osKernelInitialize` gets called?
> Are you sure you've got the right ELF file in the debugger?
So I was making the mistake of assuming that when I typed `load` into gdb, it would flash the combined image of my main application with the bootloader. However, experimentally I've determined this not to be the case; instead it's loading both new versions of the ELF files, but only flashing the new image of the bootloader. I'm now using JFlashLite to flash the full `main.bin`. So I had the new bootloader flashed, with both the new ELF files loaded into gdb, but the main application was never being updated. Not sure if this matters as much as I thought it may have, since I was never really changing anything in my `main.cpp`, but there might be more in the binary that was changing other than the active app space that has to do with the bootloader from before it...
@kjbracey-arm What information would you like me to redo now that I'm properly flashing the binaries and loading the elfs to match?
Seeing above that there was some offset business I thought - at a high level here - perhaps this offset is coming from trace existing in the bootloader but not existing in the main application? So I initialised trace in my blinky (main) program as well and reflashed the image.
Sure enough, the bootloader successfully boots to the main app now and both are able to use trace messages.
[INFO][BOOT]: I'm tracing!
[INFO][MAIN]: Now in main!
I'm not sure what this tells us exactly, but I think this means for now I can have a working bootloader so long as the peripherals I use in it are also present in the main application as well. Surely this isn't intended behaviour?
EDIT: Something else that sucks is this problem exists backwards too. Meaning, if I have trace defined in my main app, but not in my bootloader, then it faults as well.
EDIT2: Another thing that may or may not be relevant -> The custom target we're using does not have an external low freq clock (no LSE). We have an 8 MHz external clock, configured as the HSE, that we're using for the high freq stuff, but the low freq stuff is being done with the LSI. I think I've configured my target correctly with this:
"config": {
    "clock_source": {
        "help": "Mask value : USE_PLL_HSE_EXTC | USE_PLL_HSE_XTAL (need HW patch) | USE_PLL_HSI",
        "value": "USE_PLL_HSE_EXTC|USE_PLL_HSI",
        "macro_name": "CLOCK_SOURCE"
    },
    "lpticker_lptim": {
        "help": "This target supports LPTIM. Set value 1 to use LPTIM for LPTICKER, or 0 to use RTC wakeup timer",
        "value": 1
    }
},
"overrides": {
    "lpticker_delay_ticks": 4,
    "lse_available": 0
},
But if I've made a mistake maybe that could be causing some troubles.
As @kjbracey-arm stated the bootloader is supposed to do a soft reset back into the reset vector which should avoid any of this weirdness.
> As @kjbracey-arm stated the bootloader is supposed to do a soft reset back into the reset vector which should avoid any of this weirdness.
How/Where would I confirm this?
> I have both the bootloader and main application ELF files loaded into GDB.
This makes me nervous - I've no experience with loading two ELF files simultaneously into GDB, and I'm not sure how you disambiguate. Some of your output would be consistent with you printing the content of the bootloader's `osRtxInfo` (or the memory where it would have been) while actually running the main image.
Could you instead just load the ELF file for the main image into GDB?
> I don't think I'm explicitly using any RTOS features
No, but it is an RTOS build, which means the RTOS is getting initialised, and you're running your bootloader main as an RTOS thread. As of 5.12/5.13, the RTOS can be excluded via a `requires: [ "bare-metal" ]` in your mbed_app.json, which would make it more like other bootloaders (and smaller). Still, the fact that your bootloader is RTOS is not _supposed_ to break anything.
> As @kjbracey-arm stated the bootloader is supposed to do a soft reset back into the reset vector which should avoid any of this weirdness.
Well, it's not really a reset, it's just "manually safe hardware and jump to `Reset_Handler`". Where that fails is for any hardware that was disrupted by the bootloader and not restored, and assumed to be in reset state by the main image.
> SVC_Handler is used for software interrupts right? Is there a way we can trace back to see who's calling these interrupts before osKernelInitialize gets called?
If using an up-to-date pyOCD for your connection, then a breakpoint on SVC_Handler should show you - the calling point would be shown as a different "thread" - there's 1 thread for handler mode, and 1 (or more) threads for process mode.
If using a more archaic tool, then you'd have to dump the 8 words from SP while stopped on a SVC_Handler breakpoint - the return address for the SVC would be in the seventh word at SP+24. Show all 8 words in a dump, and that will tell us exactly what the supervisor call was. (SP+16 should be the address of an RTX function - look it up in the map).
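To make the layout of those 8 words concrete, here is a small host-side sketch that names each slot of the basic Cortex-M exception frame and recovers the address of the SVC instruction from the stacked return address. `ExceptionFrame`, `decode_frame` and `svc_instruction_address` are hypothetical helper names, and the sketch assumes the dump was taken from whichever stack pointer was actually used for the exception entry.

```cpp
#include <cstdint>

// Basic Cortex-M exception stack frame, in the order it appears in memory from SP upwards.
struct ExceptionFrame {
    uint32_t r0;              // SP+0
    uint32_t r1;              // SP+4
    uint32_t r2;              // SP+8
    uint32_t r3;              // SP+12
    uint32_t r12;             // SP+16
    uint32_t lr;              // SP+20
    uint32_t return_address;  // SP+24: stacked PC
    uint32_t xpsr;            // SP+28
};

// Maps the 8 dumped words onto named fields.
inline ExceptionFrame decode_frame(const uint32_t words[8])
{
    return ExceptionFrame{words[0], words[1], words[2], words[3],
                          words[4], words[5], words[6], words[7]};
}

// SVC is a 16-bit Thumb instruction, so it sits 2 bytes before the stacked return address.
inline uint32_t svc_instruction_address(const ExceptionFrame &frame)
{
    return frame.return_address - 2u;
}
```

For the `svc 0` in the `osKernelInitialize` disassembly earlier in this thread (at `0x0802466c`), the stacked return address would be `0x0802466e`, and the `r12` slot would hold whatever that function loaded into R12 just before the `svc`.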
These really should not be happening.
Anyway, all your memory maps look correct, so don't need to worry about them any more. But we still need to figure out what's going on with that 0x10 pointer that makes it abort - can you go around again with correctly matched ELF (main image only) and run through the boot? You won't have source references then for the bootloader bit, but that doesn't matter.
> I can have a working bootloader so long as the peripherals I use in it are also present in the main application as well. Surely this isn't intended behaviour?
There isn't a very rigorous cleanup, tbh. I can imagine a situation where a peripheral is left active, then after entering the main image there's no driver (or init) code for it. Its interrupt generation should at least be masked by the `powerdown_nvic` call in `mbed_start_application`, which should minimise the opportunity for it to actually immediately break things. Power wastage would be more likely.
There's no general system "shut everything down" notification to give things a chance to clean up before the `start_new_application`.
@kjbracey-arm
Off topic but need to ask:
Is there any case here for the mbed bootloader to be linked to the same mbed-os instance in flash as the application?
In principle, no, there shouldn't be. In practice, maybe?
In bigger systems, main images (like the Linux kernel) tend to be very paranoid about what their bootloader may or may not have done, so they tend to have init code that manually sets/resets pretty much everything on entry. The bootloader could be years behind the kernel, so they just do not trust "reset" hardware state.
That does take code space though. It would be more space-efficient to trust the bootloader to put everything back it knows it touched, and have the main image just assume everything is in reset state.
As we're currently in a situation where neither of those is really happening, then I guess having bootloader and main image as close as possible does in practice minimise the chances of errors - such as this case potentially is.
An ideal would be to have a chip that did support "real reset into secondary image". Have a register that the bootloader could write to that did "reset, but jump into this handler".
Or maybe you could even do that in bootloader software - if you could reliably indicate "reset reason", then the bootloader itself could do `NVIC_Reset`, re-enter itself, detect the reset reason and jump straight to main image this time.
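A host-side sketch of that control flow, under a stated assumption: instead of a hardware reset-reason register, it uses a marker word that is assumed to survive a soft reset (for example in retained RAM or a backup register). All names here (`kBootMainMagic`, `BootAction`, `decide_on_entry`, `request_main_image`) are hypothetical, and this is not something Mbed OS is known to implement.

```cpp
#include <cstdint>

// Hypothetical marker the bootloader leaves in memory that survives a soft reset.
constexpr uint32_t kBootMainMagic = 0xB007AB1Eu;

enum class BootAction { RunBootloaderLogic, JumpToMainImage };

// Called at bootloader entry. 'retained_word' models a word in retained RAM
// or a backup register that keeps its value across the reset.
inline BootAction decide_on_entry(uint32_t &retained_word)
{
    if (retained_word == kBootMainMagic) {
        retained_word = 0u;  // consume the request so the next reset is a normal boot
        return BootAction::JumpToMainImage;
    }
    return BootAction::RunBootloaderLogic;
}

// Called when the bootloader has finished its work. A real implementation
// would trigger a system reset immediately after recording the request.
inline void request_main_image(uint32_t &retained_word)
{
    retained_word = kBootMainMagic;
}
```

The appeal of this shape is that the jump to the main image happens right after a genuine reset, before the bootloader has touched any peripherals, so the main image sees hardware in its reset state.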
I don't know if that approach has been attempted in Mbed OS - I've only seen the manual simulated-reset approach in `mbed_start_application`.
My earlier comment about soft reset was under the assumption that Mbed OS was using reset-reason and potentially some other core register.
I see now that is not the case.
Thinking further, that would require some sort of industry standard between BL and application to use the core in that way.
I'll make any further enquiries into this topic in a separate issue.
@40Grit @kjbracey-arm Going to try the bare metal setting when I get back, but I'll be out until tuesday. Just FYI so you don't think I've given up haha
Sorry for the delay, but I've returned and tried adding "requires bare-metal" to my `mbed_app.json`, which now looks like this:
"CUSTOM_TARGET": {
    "target.features_add": ["STORAGE"],
    "target.components_add": ["SPIF"],
    "target.components": ["SPIF"],
    "target.requires" : [ "bare-metal" ],
    "spif-driver.SPI_MOSI": "FLASH_SPI_MOSI",
    "spif-driver.SPI_MISO": "FLASH_SPI_MISO",
    "spif-driver.SPI_CLK": "FLASH_SPI_SCK",
    "spif-driver.SPI_CS": "FLASH_SPI_CS",
    "spif-driver.SPI_FREQ": 40000000,
    "target.restrict_size": "0x00020000"
},
I'm still crashing inside of `svcRtxKernelInitialize` unfortunately. I can follow the crash up until this line here inside of `svcRtxKernelInitialize`:
0x0802445e <+622>: ldr.w pc, [sp], #4
where `pc = 0x802445e <svcRtxKernelInitialize+622>` and `sp = 0x2004ffac`
After this if I try to go forward (stepping), I end up at `0x08004842` and the ELF file is undefined for this ROM region (using the main app's ELF file as requested by @kjbracey-arm). It's some point after this that the crash occurs.
I'm confused with what's going on here. Why is it accessing code at `0x08004842`?
EDIT: After adding the bootloader elf again to see what's going on, it seems to be accessing SVC_Handler code and just crashing as before:
Breakpoint 17, 0x0802445e in svcRtxKernelInitialize ()
at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:186
186 }
(gdb) stepi
SVC_Handler () at irq_cm4f.S:64
64 irq_cm4f.S: No such file or directory.
@kjbracey-arm @40Grit Just for the sake of sanity, could either of you two reiterate the requirements for defining a custom target? I'm worried I may have missed a very basic step here and perhaps this is the source of the trouble here. I've read over this section from you guys a number of times but I want to be sure here.
I'm specifically concerned about this section here:
> MCUs are required to have, and Families and Subfamilies may have: `release_versions`, `supported_toolchains`, `default_lib`, `public`, `device_has`.
Is my target definition for an MCU, Family, or Subfamily? Perhaps I'm missing one of these fields and that's what's causing this.
I have no direct affiliation with ARM. I just work for one of their partners (Embedded Planet)
Without watching the config, build, and debug session right in front of me, I'm starting to fish for ideas.
Another sanity check could be to check the part's errata.
What linker file does your build end up using?
@kjbracey-arm might the image built for the nucleo board run in the processor that @drynnbavis is using since it really only differs by package?
> Another sanity check could be to check the part's errata.
I've been through this for the STM32F4xH already and nothing jumped out at me.
> What linker file does your build end up using?
Pretty sure I'm using this one here
After some more poking around, I found that the `osKernelPreInit` section of assembly code is simply two lines: the first is a `nop` and the second is just a branch back to the `lr`. Off-topic here, but is there any point in this and is this OK?
@drynnbavis if your bootloader has no operational dependency on peripheral IO, I would see what happens if you flash the working binaries built for the nucleo board.
@DrynnBavis if your bootloader has no operational dependency on peripheral IO, I would see what happens if you flash the working binaries built for the nucleo board.
So I set my target to NUCLEO_F413ZH and made sure the bootloader + main code was previously working on my old target. The debugger gets trapped within mbed_rtos_start() and never seems to boot to main.
EDIT: Recall that there's no LSE on my custom board so I need to use LSI (as defined in my custom_targets.json). I don't see this in the target definition of the NUCLEO_F413ZH so this is probably not going to work.
@DrynnBavis Is there a fully internal clock you can use for the time being to bring this up?
@loverdeg-ep @40Grit I added an override to the NUCLEO_F413ZH target to disable the LSE (and thus use the LSI) with:
"overrides": {
"lpticker_delay_ticks": 4,
"tickless-from-us-ticker": true,
"lse_available": 0
},
and this works for the same setup that was working when I used my custom target. However, once I added trace back to only the bootloader, the same behaviour with the same fault location occurs. Good idea, but still no luck.
Do you have access to a debugger that supports SWO?
I'd like to see what happens if you redirect trace over SWO.
Do you have access to a debugger that supports SWO?
I'd like to see what happens if you redirect trace over SWO.
I do indeed, what test would you like me to redo over SWO?
I haven't used this interface personally yet but it should achieve trace through swo.
https://github.com/ARMmbed/mbed-os/blob/master/platform/mbed_retarget.h#L97
Use your custom target and a bare minimum but still failing config I guess?
Using SWO should eliminate the need for the UART and provide another data point, since enabling trace in only one of the app and the bootloader (exclusive-or fashion) seems to cause the failure.
Use your custom target and a bare minimum but still failing config I guess?
Even with SWO, I'm crashing in the same spot. Explicitly, I'm failing within SVC_0 during osKernelInitialize of the main program (the second time osKernelInitialize gets called).
Hmm, so has the system failed to set up the vector table pointer?
Maybe the target doesn't actually have RAM vector tables properly configured?
Almost all Mbed OS targets arrange for the initial exception vectors to be copied from ROM to RAM, and then set VTOR. This then allows NVIC_SetVector to work. A very few targets do not do this, and keep vectors in ROM.
I've not thought about this before, but there are two ways this can not work with bootloader
It's possible you've always been jumping into the bootloader. If so you've managed something quite fascinating, and ultimately desirable as an optimisation - sharing code between main image and bootloader.
There's no reason the bootloader's task and SVC dispatcher can't work for the main image, as long as osRtxInfo is in the same place for both images, and they're both configured the same....
@kjbracey-arm we led @DrynnBavis through the exercise of confirming vtor in a previous issue. I'm betting I didn't properly identify the failure to relocate or that the exercise needs to be done again.
@kjbracey-arm I do hope shared code has been achieved; been wanting to look into that for a while. I was going to take a position independent approach however.
If the vectors are not copied into RAM, VTOR will remain pointing at the bootloader's initial vector table, so any exceptions will make it jump into the bootloader.
@kjbracey-arm I've actually noticed that when I load only the main code's ELF, the pre-main() reset initialization work is occurring at addresses lower than the start address of the main code's region of ROM. I can confirm that osKernelInitialize is indeed occurring in the main code region, but the SVC_Handler as well as the Fault_Handler are being called from within the pre-main code ROM region. Are these assembly source files of the SVC_Handler and the Fault_Handler things that the vector table will be pointing to? Or rather, does it make sense that these two source files are being called at pre-main ROM regions?
Almost all Mbed OS targets arrange for the initial exception vectors to be copied from ROM to RAM, and then set VTOR. This then allows NVIC_SetVector to work. A very few targets do not do this, and keep vectors in ROM.
IIRC, the stack pointer was being assigned to 0x20050000, which to me seemed to be the start of RAM. So if I'm guessing correctly here, the vector table of the bootloader should be at the start of its ROM, correct (0x08000000)? And then the vector table of the main app should be at the start of RAM (0x20050000)? I'll try to confirm this when I get back to the office.
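For reference, word 0 of a Cortex-M vector table holds the initial main stack pointer and word 1 holds the reset handler address, so a value such as 0x20050000 read from word 0 is a stack top rather than the address of a vector table. A minimal sketch of that layout, assuming SRAM begins at 0x20000000 (the SRAM_BASE value reported later in this thread); the struct, macro, and helper names are hypothetical:

```c
#include <stdint.h>

/* Assumption: SRAM base as reported by GDB later in this thread. */
#define ASSUMED_SRAM_BASE UINT32_C(0x20000000)

/* First two entries of a Cortex-M vector table (hypothetical type name). */
typedef struct {
    uint32_t initial_sp;     /* word 0: loaded into MSP on reset */
    uint32_t reset_handler;  /* word 1: address of Reset_Handler, bit 0 set for Thumb */
} vector_table_head_t;

/* Distance of the initial stack pointer above the assumed SRAM base. */
static uint32_t sp_offset_from_sram(const vector_table_head_t *head)
{
    return head->initial_sp - ASSUMED_SRAM_BASE;
}
```

With initial_sp = 0x20050000 the helper returns 0x50000 (320 KiB), which is what you would expect from a stack that starts at the top of RAM and grows downward.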
I'm betting I didn't properly identify the failure to relocate or that the exercise needs to be done again.
@40Grit I wouldn't mind running through these exercises again if you guys would like to see some results. I think I have a better understanding now of what's going on so I could probably give you more helpful data.
Stepping through the debugger from after mbed_start_application() in the bootloader, I've found this:
The fault is occurring within Reset_Handler, within the function _start(), on this line ->
0x08020244 <+44>: blx r3 // r3 = 0x8023b91
Continuing over this line throws the crash, so let's step into it (note, something interesting here is that pc=0x8023b90 and NOT 0x8023b91, which is the actual value of r3 that I think we're supposed to branch into. Seems odd to me that r3 doesn't even load a seemingly valid address in the first place!). From here it branches into this function:
software_init_hook () at libs/mbed-os/rtos/TARGET_CORTEX/TOOLCHAIN_GCC_ARM/mbed_boot_gcc_arm.c:46
46 mbed_stack_isr_start = (unsigned char *) &__StackLimit;
From here, I'm able to use nexti until this line:
=> 0x08023bb2 <+34>: bl 0x8023c98 <mbed_init>
stepping into here I arrive at:
mbed_init () at libs/mbed-os/rtos/TARGET_CORTEX/mbed_boot.c:76
76 mbed_mpu_manager_init();
from here a couple of branches get called:
(gdb) disas
Dump of assembler code for function mbed_init:
0x08023c98 <+0>: push {r3, lr}
0x08023c9a <+2>: bl 0x80220d8 <mbed_mpu_init>
0x08023c9e <+6>: bl 0x8023ccc <mbed_cpy_nvic>
0x08023ca2 <+10>: bl 0x8029400 <mbed_sdk_init>
0x08023ca6 <+14>: bl 0x802a178 <us_ticker_init>
=> 0x08023caa <+18>: bl 0x8023d14 <mbed_rtos_init>
0x08023cae <+22>: nop
0x08023cb0 <+24>: pop {r3, pc}
End of assembler dump.
I'm able to nexti until mbed_rtos_init, so again, stepping into it I get to:
=> 0x08023d16 <+2>: bl 0x80247c0 <osKernelInitialize>
In here, both IsIrqMode and IsIrqMasked return false, so we end up at this line:
__svcKernelInitialize () at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:488
488 SVC0_0 (KernelInitialize, osStatus_t)
(gdb) disas
Dump of assembler code for function osKernelInitialize:
0x080247c0 <+0>: push {lr}
0x080247c2 <+2>: sub sp, #12
0x080247c4 <+4>: bl 0x80247bc <osRtxKernelPreInit>
0x080247c8 <+8>: bl 0x8024208 <IsIrqMode>
0x080247cc <+12>: mov r3, r0
0x080247ce <+14>: cmp r3, #0
0x080247d0 <+16>: bne.n 0x80247dc <osKernelInitialize+28>
0x080247d2 <+18>: bl 0x8024222 <IsIrqMasked>
0x080247d6 <+22>: mov r3, r0
0x080247d8 <+24>: cmp r3, #0
0x080247da <+26>: beq.n 0x80247ec <osKernelInitialize+44>
0x080247dc <+28>: mvn.w r0, #5
0x080247e0 <+32>: bl 0x8023f5c <EvrRtxKernelError>
0x080247e4 <+36>: mvn.w r3, #5
0x080247e8 <+40>: str r3, [sp, #4]
0x080247ea <+42>: b.n 0x80247f6 <osKernelInitialize+54>
0x080247ec <+44>: ldr r3, [pc, #16] ; (0x8024800 <osKernelInitialize+64>)
0x080247ee <+46>: mov r12, r3
=> 0x080247f0 <+48>: svc 0
0x080247f2 <+50>: mov r3, r0
0x080247f4 <+52>: str r3, [sp, #4]
0x080247f6 <+54>: ldr r3, [sp, #4]
0x080247f8 <+56>: mov r0, r3
0x080247fa <+58>: add sp, #12
0x080247fc <+60>: ldr.w pc, [sp], #4
0x08024800 <+64>: muls r1, r0
0x08024802 <+66>: lsrs r2, r0, #32
End of assembler dump.
Where finally, we get to line 93 of irq_cm4f.S before entering the HardFault_Handler().
So there it is... a complete trace through the hardfault I'm seeing in my main program from using trace in my bootloader and not my main application for a custom target. @kjbracey-arm @40Grit @loverdeg-ep if any of you would like to see register values or any other tests please don't hesitate to ask.
Note: all of this seems to be working in ROM regions > 0x8020000 (start address of main) EXCEPT for the SVC_Handler, which seems to start in this region:
Dump of assembler code for function SVC_Handler:
=> 0x08004974 <+0>: tst.w lr, #4
I can see VTOR being set in system_clock.c. The Nucleo_F413ZH version looks like it sets it just to FLASH_BASE, which is wrong if you're the main image living above a bootloader.
Nucleo F429ZI is setting it to NVIC_FLASH_VECTOR_ADDRESS, and that device does work with bootloaders.
So if you were based on F413ZH, I think you need to follow through and copy bits from F429ZI to get that initial VTOR set correctly - figure out where that define comes from and get it set up.
After target-specific init, the generic mbed_cpy_nvic copies from the initial VTOR into RAM, then changes VTOR to point to that RAM, so it's relying on that initial VTOR pointing to the correct ROM table.
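The copy-then-repoint step described above can be sketched on a host machine as follows. This is not the Mbed OS implementation, only a minimal model of it: fake_core_t, relocate_vectors, svc_handler_after_relocation and the 16-entry table size are all hypothetical, and the "VTOR" here is an ordinary pointer rather than the SCB register. The only architectural fact relied on is that SVCall is exception number 11.

```c
#include <stdint.h>
#include <string.h>

#define NUM_VECTORS  16u  /* hypothetical size; the real table length is device-specific */
#define SVCALL_INDEX 11u  /* SVCall is exception number 11 on Cortex-M */

/* Hypothetical stand-in for the core: a pretend VTOR plus the RAM table. */
typedef struct {
    const uint32_t *vtor;             /* table the core currently fetches vectors from */
    uint32_t ram_table[NUM_VECTORS];  /* destination of the copy */
} fake_core_t;

/* Copy whatever table VTOR points at into RAM, then repoint VTOR at the copy. */
static void relocate_vectors(fake_core_t *core)
{
    memcpy(core->ram_table, core->vtor, sizeof core->ram_table);
    core->vtor = core->ram_table;
}

/* SVC handler address the core would use after relocation. */
static uint32_t svc_handler_after_relocation(const uint32_t *initial_vtor)
{
    fake_core_t core;
    core.vtor = initial_vtor;
    relocate_vectors(&core);
    return core.vtor[SVCALL_INDEX];
}
```

If the initial pointer refers to the bootloader's table, the RAM copy contains the bootloader's SVC handler, which matches the symptom in this thread of SVC_Handler running from an address below 0x08020000.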
@LMESTM - could it be a task for you to make the start-up code for various STM devices consistent, if only some support bootloaders?
something interesting here is that pc=0x8023b90 and NOT 0x8023b91
The 1 bit at the bottom just indicates the "Thumb" instruction set, rather than "ARM". All function pointers on a Cortex-M device will have their bottom bit set, but instructions are located at even addresses, and PC readouts are even.
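As a small illustration of the point above, the relationship between the value in r3 and the PC that GDB reports can be written as plain bit arithmetic. The helper names are hypothetical; the addresses are the ones from this thread.

```c
#include <stdint.h>

/* Address of the first instruction executed after a BLX to `target`:
 * bit 0 selects the instruction set and is not part of the address. */
static uint32_t branch_destination(uint32_t target)
{
    return target & ~UINT32_C(1);
}

/* Non-zero when the branch target selects Thumb state (bit 0 set). */
static int selects_thumb(uint32_t target)
{
    return (int)(target & UINT32_C(1));
}
```

For r3 = 0x8023b91, branch_destination gives 0x8023b90 and selects_thumb is non-zero, so the observed PC is exactly what is expected.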
@kjbracey-arm I think @drynnbavis is not seeing the issue on nucleo_f413zh.
But is seeing the issue on a custom target using f413rh.
But is seeing the issue on a custom target using f413rh.
Yes, but he must have copied that code from somewhere, so if he copied it from 413ZH, it would have inherited that issue.
I'm missing something.
Apparently the bootloader and application work fine on the nucleo board. (f413zh)
It's interesting that this target code feels the need to explicitly set VTOR for ROM anyway.
Given that we've just reset, we surely must have entered our Reset Handler, so VTOR must be correctly set?
The only issue is that the bootloader itself doesn't modify VTOR before jumping into the reset handler manually. That would be a reason to set VTOR in main image, but surely the bootloader should have done it itself...
Edit: actually it does, powerdown_scb does set VTOR to the address of the new image.
Apparently the bootloader and application work fine on the nucleo board. (f413zh)
According to whom? Whose bootloader? @DrynnBavis's?
Looking at that code, I don't see how any bootloader would work - FLASH_BASE is just defined to 0x08000000U, so VTOR will not be right, and mbed_cpy_nvic will copy from the wrong place.
Mind you, I'm just asserting that by code inspection. No idea who's using a Nucleo F413ZH.
But was that working by jumping into the bootloader and all the data happening to be in the same place, as it does on the custom board sometimes, if the bootloader is RTOS-based?
I see on that issue confirmation that VTOR was correct (0x08020000) entering the image, but by reading the code, SystemInit would then set it to 0x08000000, then mbed_cpy_nvic would copy from 0x08000000 into RAM.
The moral of the story may be that SystemInit should not be setting VTOR.
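The sequence described above can be sketched as plain arithmetic. This is a host-side model, not target code: vtor_after_system_init and its flag are hypothetical, while FLASH_BASE, VECT_TAB_OFFSET and the 0x08020000 entry value are the ones quoted in this thread.

```c
#include <stdint.h>

#define FLASH_BASE      UINT32_C(0x08000000)  /* value quoted in this thread */
#define VECT_TAB_OFFSET UINT32_C(0x00)        /* value quoted in this thread */

/* Model of what VTOR holds once SystemInit has run.
 * vtor_on_entry: value left by the bootloader before jumping to the image.
 * system_init_sets_vtor: non-zero if SystemInit contains the
 * "SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET" line. */
static uint32_t vtor_after_system_init(uint32_t vtor_on_entry,
                                       int system_init_sets_vtor)
{
    if (system_init_sets_vtor) {
        return FLASH_BASE | VECT_TAB_OFFSET;  /* overwrites the bootloader's choice */
    }
    return vtor_on_entry;                     /* bootloader's value survives */
}
```

With the assignment present, an entry value of 0x08020000 becomes 0x08000000, so the later copy into RAM reads the bootloader's table; with it removed, 0x08020000 is preserved.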
dumb luck that nucleo is working then?
Is this in ST's court now?
If all this analysis is correct, @DrynnBavis can modify his own custom target's VTOR setup to match a working STM target. Or just take the VTOR setting out altogether - I don't think it's required.
ST should look at the inconsistency between their targets.
I have a very similar issue with my custom bootloader based on the SAML21J18. I can use mbed-trace just fine; however, when I use SDBlockDevice the application fails to start. Should I create a new issue before I go into any more detail here?
New issue please. Reference this one if you want.
If all this analysis is correct, @DrynnBavis can modify his own custom target's VTOR setup to match a working STM target. Or just take the VTOR setting out altogether - I don't think it's required.
@kjbracey-arm or @40Grit (your time zone matches mine a little better), what code am I modifying here? Can you give me a file name and line number? Thanks
@DrynnBavis https://github.com/ARMmbed/mbed-os/issues/11205#issuecomment-524220737
The comment linked above should give a starting point.
Somehow missed this on first read, thanks @loverdeg-ep. I've got to leave for something right now but I'll try this when I return in a few hours.
@DrynnBavis and we actually are probably in the same timezone. I just get up super early and watch github.
It is the only chance I have to get in contact with the experts in Oulu.
YES. IT'S WORKING!!!
@kjbracey-arm I tried using NVIC_FLASH_VECTOR_ADDRESS but it was undefined, so I commented this out entirely:
// /* Configure the Vector Table location add offset address ------------------*/
// #ifdef VECT_TAB_SRAM
// SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
// #else
// SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */
// #endif
This seems to have fixed my troubles for now. Will having this commented out cause any problems in the future? Curious to know why this works on the ZH chip but not the RH chip...
To confirm: I did indeed copy system_clock.c from the nucleo board. Commenting out the Vector Table location and offset address section within SystemInit of system_clock.c fixed my issue.
What's the best protocol here in terms of closing this issue? Is there a PR I can make to give a better target definition for the RH chips?
I'm also still interested in exactly why this solved the problem. So I uncommented that section (back to the faulty code) and set a break point right before this point:
#ifdef VECT_TAB_SRAM
SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */
#endif
Running info macro VECT_TAB_SRAM in GDB, I found it to be undefined
SRAM_BASE = SRAM1_BASE = 0x20000000U
FLASH_BASE = 0x08000000U
VECT_TAB_OFFSET = 0x00
So because VECT_TAB_SRAM is undefined, it would seem we are indeed using the bootloader's vector table in ROM rather than the main application's vector table in SRAM. But surely this exact code inside of system_clock.c is being used for the ZH targets? Why is this not a problem there?
Does this have anything to do with me using an LSI rather than an LSE?
I've been through this type of exercise multiple times myself and had the Cortex-M user manual next to me when writing custom bootloaders. I go step by step through the assembly and watch the status of the registers. This usually gets me the type of understanding you are looking for with issues like these.
Even so, I don't know the architecture well enough to diagnose this without spending a couple of hours on analysis.
Figure out who is most active from ST in Mbed and copy them on this thread. Or open a ticket with ST and point them to this issue.
However, it will probably fall to you to learn the "rule book" and definitively prove exactly why two parts which should(?) have the same memory map, programmed with the same binary, behave differently.
The next step I would take is to prove that both parts have all the same peripherals mapped to all the same places in memory.
If that is true, I would think the binary should work the same in both as long as neither part were relying on any external signals.
Curious to know why this works on the ZH chip but not the RH chip...
There's no evidence there's any chip difference here, is there? The code sets VTOR wrong, but gets away with it if (a) there is no bootloader (so "wrong" happens to be right), or (b) the bootloader has an RTOS and the memory layout is the same (or similar enough).
Presumably in all your ZH builds, you've ended up with images with matching RTOS memory layout. If adding/removing trace doesn't affect the ZH, then there's likely no deep meaningful significance - some difference in link order/padding means trace ends up not shifting osRtxInfo in that build. Feel free to investigate the maps, in case it reveals some inconsistency, but it may be nothing.
Feel free to investigate the maps, in case it reveals some inconsistency, but it may be nothing.
Very behind on my work now since this issue came up, so first I'll have to work on that. But later this week, if I've the time then I'll definitely look into it.
@LMESTM - could it be a task for you to make the start-up code for various STM devices consistent, if only some support bootloaders?
@DrynnBavis @kjbracey-arm - sorry I was away when copied here and I missed it.
The vector relocation mechanism has mostly been pushed by Arm engineers and has evolved over the past year, but it was indeed not updated consistently across all STM32 targets, only case by case when enabling the bootloader feature.
Anyway, I agree that we probably have to clean things up a little bit now.
To make things clear, does your custom target boot OK if you modify the system_clock.c file as below rather than commenting out those lines:
/* Configure the Vector Table location add offset address ------------------*/
#ifdef VECT_TAB_SRAM
SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
SCB->VTOR = NVIC_FLASH_VECTOR_ADDRESS; /* Vector Table Relocation in Internal FLASH */
#endif
In an ideal world, you shouldn't need to set VTOR on boot, if intending to carry on using the ROM vectors. It seems reasonable to assume that as you've just booted, you came through the Reset_Handler, so VTOR must be correct.
The catch is that a bootloader might have entered your reset vector without adjusting VTOR. Our bootloader does set it, but others might not. So setting it seems reasonable.
This implementation was introduced back in 2017 here: https://github.com/ARMmbed/mbed-os/pull/3798
but I'm also fine with completely removing those lines from all our STM targets if we consider it reliable in mbed-os scope.
I've just had a chat with someone who is looking at booting into Mbed OS from another bootloader altogether, and he believes that it doesn't set VTOR, so better safe than sorry. (Mind you, it doesn't even set MSP or CONTROL correctly either, so you'd probably need to add more...).
@kjbracey-arm Maybe this is naive but it would be nice if there were some standards surrounding this stuff.
"CMSIS-boot"
Well, if the bootloader is jumping into the vector table reset handler of a "raw" standalone image, then it's effectively avoiding having its own standard and using the "reset" standard. In which case it's kind of the bootloader's responsibility to have the chip in perfect reset state, right?
So by that standard, we shouldn't have to write VTOR. But in practice, the main image is the malleable one while bootloaders tend to be locked in, so it's the main image that has to cope with whatever bootloaders do.
The fact that they are "locked in" I'd say is more reason to develop a standard.
That way an application could be better prepared to know what it is dealing with.
I'll keep thinking on this one and find the right place to converse further.
To make things clear, does your custom target boot OK if you modify the system_clock.c file as below rather than commenting out those lines:
/* Configure the Vector Table location add offset address ------------------*/
#ifdef VECT_TAB_SRAM
SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
SCB->VTOR = NVIC_FLASH_VECTOR_ADDRESS; /* Vector Table Relocation in Internal FLASH */
#endif
@LMESTM To confirm, no: that macro NVIC_FLASH_VECTOR_ADDRESS is undefined. I've commented that chunk out completely to get my bootloader/main app working. I haven't had the time yet to check where that macro should be defined and see if maybe I could fix that instead.
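For anyone who wants to try the NVIC_FLASH_VECTOR_ADDRESS variant on a target where the macro is missing, one possible approach is a guarded fallback definition. This is only a sketch under assumptions, not what the F429ZI target does: it assumes the build provides MBED_APP_START when a bootloader is configured, and the 0x08020000 default is simply the main image start address from this thread. The initial_vtor helper is hypothetical and exists only so the value can be inspected on a host.

```c
#include <stdint.h>

/* Assumption: the build defines MBED_APP_START when the application is
 * linked above a bootloader. The fallback is the main image start address
 * reported in this thread. */
#ifndef MBED_APP_START
#define MBED_APP_START UINT32_C(0x08020000)
#endif

/* Fallback for targets whose headers do not define the macro. */
#ifndef NVIC_FLASH_VECTOR_ADDRESS
#define NVIC_FLASH_VECTOR_ADDRESS MBED_APP_START
#endif

/* Hypothetical helper: the value SystemInit would write to SCB->VTOR. */
static uint32_t initial_vtor(void)
{
    return (uint32_t)NVIC_FLASH_VECTOR_ADDRESS;
}
```

On the target, the corresponding line would be the SCB->VTOR = NVIC_FLASH_VECTOR_ADDRESS assignment quoted above; whether that is preferable to removing the assignment altogether is the open question discussed in this thread.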
For future readers that have scrolled this far looking for a solution: the fix for me was to comment out that entire chunk of code in my comment just above this, inside the system_clock.c file that I had in the folder of my custom target (I had copied this file from the folder nucleo_f413zh -- an Mbed-supported target with an STM32F413ZH chip, while I was using an STM32F413RH).
Thank you to everyone involved in debugging this. Quite the struggle but I'm really happy we actually came to a fix.
I'm going to close this issue now as the problem has a working solution. Though I'll still be following here (or in another thread if we do that instead).