Mbed-os: Custom target silently fails to start main app after adding `mbed_trace_init();` to bootloader

Created on 13 Aug 2019 · 72Comments · Source: ARMmbed/mbed-os

Description

The following is some barebones code I'm using to just boot to the main app space at POST_APPLICATION_ADDR:

#include "mbed.h"
#include "mbed-trace/mbed_trace.h"

int main()
{
    //mbed_trace_init();
    mbed_start_application(POST_APPLICATION_ADDR);
}

This code works, and I successfully boot from the bootloader into the main app space as expected. However, once I uncomment mbed_trace_init();, my app hangs after the bootloader and is unable to start the main app.

Using gdb, I've found that the fault seems to arrive at some point within __scvKernelInitialize (). This is the disassemble print out of its definition:

Dump of assembler code for function osKernelInitialize:
   0x0802463c <+0>: push    {lr}
   0x0802463e <+2>: sub sp, #12
   0x08024640 <+4>: bl  0x8024638 <osRtxKernelPreInit>
   0x08024644 <+8>: bl  0x80240a0 <IsIrqMode>
   0x08024648 <+12>:    mov r3, r0
   0x0802464a <+14>:    cmp r3, #0
   0x0802464c <+16>:    bne.n   0x8024658 <osKernelInitialize+28>
   0x0802464e <+18>:    bl  0x80240ba <IsIrqMasked>
   0x08024652 <+22>:    mov r3, r0
   0x08024654 <+24>:    cmp r3, #0
   0x08024656 <+26>:    beq.n   0x8024668 <osKernelInitialize+44>
   0x08024658 <+28>:    mvn.w   r0, #5
   0x0802465c <+32>:    bl  0x8023df4 <EvrRtxKernelError>
   0x08024660 <+36>:    mvn.w   r3, #5
   0x08024664 <+40>:    str r3, [sp, #4]
   0x08024666 <+42>:    b.n 0x8024672 <osKernelInitialize+54>
=> 0x08024668 <+44>:    ldr r3, [pc, #16]   ; (0x802467c <osKernelInitialize+64>)
   0x0802466a <+46>:    mov r12, r3
   0x0802466c <+48>:    svc 0
   0x0802466e <+50>:    mov r3, r0
   0x08024670 <+52>:    str r3, [sp, #4]
   0x08024672 <+54>:    ldr r3, [sp, #4]
   0x08024674 <+56>:    mov r0, r3
   0x08024676 <+58>:    add sp, #12
   0x08024678 <+60>:    ldr.w   pc, [sp], #4
   0x0802467c <+64>:    rors    r1, r3
   0x0802467e <+66>:    lsrs    r2, r0, #32

From there, a few things happen inside of irq_cm4f.S before hitting an exception on line 93 of said file. After that except.S is called from there a final fault message is seen in the debugger:

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08003320 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681         errno = EBADF;

Question

What's causing the fault within irq_cm4f.S?

Issue request type

My custom target uses an STM32F413RH chip, but inside custom_targets.json I've specified the device name to be "device_name": "STM32F413ZH" because this has sector information filled out and is virtually the same processor but with a larger package / more pins.

Target: Custom target with STM32F413RH processor
Toolchain: GCC_ARM 8.2.1
Tool: mbed-cli
Vers: mbed-os 5.13

[x] Question
[ ] Enhancement
[ ] Bug

Source

DrynnBavis

Most helpful comment

YES. IT'S WORKING!!!

@kjbracey-arm I tried using NVIC_FLASH_VECTOR_ADDRESS but it was undefined, so I commented this out enitrely:

//     /* Configure the Vector Table location add offset address ------------------*/
// #ifdef VECT_TAB_SRAM
//     SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
// #else
//     SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */
// #endif

This seems to have fixed my troubles for now. Will having this commented out cause any problems in the future? Curious to know why this works on the ZH chip but not the RH chip...

To confirm: I did indeed copy system_clock.c from the nucleo board. Commenting out the Vector Table location and offset address section within SystemInit of system_clock.c fixed my issue.

DrynnBavis on 23 Aug 2019

🎉3

All 72 comments

For what it's worth, this is the diff of the progress through that SVC0 from irq_cm4f.S between a bootloader not using trace (AKA no faults, left side) and bootloader that uses trace (AKA a fault, right side).

DrynnBavis on 13 Aug 2019

In the faulting condition, it seems that it diverges from 72 because the contents of print osRtxInf show that thread.run.curr = 0x10 and thread.run.next = 0x0. Because these aren't equal, a different branch is jumped to. I'm guessing this means that when trace is included there's an extra thread that needs to close. Again, just guessing here.

Moving on, line 93 is where the exception is caused. This is the line:

STR      R12,[R1,#TCB_SP_OFS]   ; Store SP

The values involved here are as follows:

R12: 0x2004ff98
R01: 0x10
#TCB_SP_OFS: 0x80048A5*

I don't really understand the STR op code but something in the above values is disagreeing with it.

*n.b. I got this value in gdb using print (uint32_t)TCB_SP_OFS -- it's possible I chose the wrong cast type here.

DrynnBavis on 13 Aug 2019

@AGlass0fMilk @40Grit @0xc0170 Tagging you guys now that I've got something worth showing and the issue thread has moved.

DrynnBavis on 13 Aug 2019

The bootloader->main image transition basically tries to "reset" hardware things back to a safe state, then effectively runs the main image as if through the reset vector, so everything gets reinitialised.

there's an extra thread that needs to close.

All software state is lost, so it's not about closing - we effectively do a software "reset" in mbed_start_application and run an entirely new stand-alone image.

But if the hardware is not correctly reset by either the bootloader's shutdown or main image's start-up code, then the new image could be being confused by hardware not being in the expected state.

In this case though, it seems like maybe it's just a failure to initialize RAM correctly? At the point that first SVC_Handler is called, RTX should not have been initialised yet, meaning all its static osRtxInfo structure should be in its default state (as specified by its initialiser in rtx_kernel.c).

I would expect that structure to have been placed in the .data area of the linker map, and then copy-initialised by the assembler CopyDataInit in startup_stm32f413xx.S early in the main image boot.

Check to see why that apparently hasn't happened, or has gone wrong. Does the linker map show it between _sdata and _edata?

Maybe stick a watchpoint on the location of osThreadInfo.thread.run.curr. That '0x10` is nonsense - it should be 0 at startup, and later become a valid pointer value.

I don't really understand the STR op code but something in the above values is disagreeing with it.

It's expecting R1 to be a pointer to a osRtxThread_t structure, and it's trying to set its sp member (TCB_SP_OFS is offsetof(osRtxThread_t, sp)). But R1 is not a valid pointer.

kjbracey-arm on 13 Aug 2019

🚀2

Did a comparison of osRtxInfo between the working bootloader (not calling mbed_trace_init()) and the failing code captured at line 69 of this SVC handling source file for context. Here's the dif (left side works, right side fails):

Interesting that the os_id and version are miss-matching. Definitely would have figured these to be the same.

Anyway, @kjbracey-arm I've compared the .map files of the successful bootloader (left) and the failing (right) at the region between _sdata and _edata:

Not sure what to make of this. What's left out of the screen shot is just more of the same symbols with the right side just offset by 0x40 all the way to _edata. Interesting to see .data.m_trace show up here though. Does anything look bizarre here? Not sure if this is exactly what you wanted to see. Perhaps you meant the map files of the main application, blinky_application.map.

DrynnBavis on 13 Aug 2019

Maybe stick a watchpoint on the location of osThreadInfo.thread.run.curr. That '0x10` is nonsense - it should be 0 at startup, and later become a valid pointer value.

Couldn't find a var named "osThreadInfo" so I'm assuming you meant "osRtxInfo". Here's the watchpoint trace of this var from start to fail:

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x10
New value = (osRtxThread_t *) 0x0
Reset_Handler ()
    at libs/mbed-os/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F413xH/device/TOOLCHAIN_GCC_ARM/startup_stm32f413xx.S:91
91    ldr  r0, =_sdata
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x200012a4 <os_timer_thread_cb>
SVC_Handler () at irq_cm4f.S:102
102 irq_cm4f.S: No such file or directory.
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x200012a4 <os_timer_thread_cb>
New value = (osRtxThread_t *) 0x20001198 <_main_obj>
SVC_Handler () at irq_cm4f.S:102
102 in irq_cm4f.S
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x20001198 <_main_obj>
New value = (osRtxThread_t *) 0x0
0x08026e1c in OS_Tick_Setup (freq=134244071, handler=0x4cc28)
    at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/Source/os_systick.c:77
77  }
(gdb) c
Continuing.

Breakpoint 13, 0x08024668 in osKernelInitialize ()
    at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:519
519   if (IsIrqMode() || IsIrqMasked()) {
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x10
0x0802421c in svcRtxKernelInitialize () at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:93
93    memset(&osRtxInfo.kernel, 0, sizeof(osRtxInfo) - offsetof(osRtxInfo_t, kernel));
(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08005ae8 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681         errno = EBADF;

That last line where the value of 0x10 is assigned, is actually given this value in the working bootloader:

    New value = (osRtxThread_t *) 0x20001104 <os_timer_thread_cb>

Not sure why having mbed_trace present is preventing this from being assigned to osRtxInfo.thread.run.curr rather than 0x10...

DrynnBavis on 13 Aug 2019

Not sure if this is exactly what you wanted to see. Perhaps you meant the map files of the main application

What I'm looking for is where osRtxInfo is in the main image map - that should be somewhere between _sdata and _edata so that it gets correctly initialised.

From the dump, it looks like there's been no attempt to actually initialise that structure at the point you've stopped in the SVC Handler.

The initialisation code should be copying the initialised data from _sidata in the ROM to _sdata in the RAM.

kjbracey-arm on 14 Aug 2019

To double-check - the crash is happening in the main image, right? As it tries to initialise its OS?

Also, I'd like to see the lines from _sidata to _sdata in each image's map.

kjbracey-arm on 14 Aug 2019

Are all those watchpoints from the main image execution - so the Reset_Handler is after your mbed_start_application?

It looks like it is being set to zero, but I don't see where that 0x10 has come from. That doesn't correspond to the code you're stopping at. Except that I can see a nearby osRtxInfo.isr_queue.max = osRtxConfig.isr_queue.max;, which might be setting a value to 16.

So mismatch between loaded image and ELF in the debugger?

Is this is a custom bootloader you've created that has the RTOS in use? Our own bootloaders disable the RTOS, so you may be hitting some sort of problem with interrupts not being properly shut down that we haven't?

The fact that SVC_Handler is being called before osKernelInitialize makes very little sense. As does being in OS_Tick_Setup. That shouldn't happen until osKernelStart, which should be after osKernelInitialize.

Are you sure you've got the right ELF file in the debugger?

kjbracey-arm on 14 Aug 2019

What I'm looking for is where osRtxInfo is in the main image map - that should be somewhere between _sdata and _edata so that it gets correctly initialised.

To be really clear here, I was previously using the .map file from the bootloader. That was probably wrong, so here's the section of the main application's .map file with the copy of osRtxInfo:

(osRtxInfo business occurs around 0x00000000200002ec)

.iplt           0x0000000008033830        0x0
 .iplt          0x0000000008033830        0x0 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o

.ARM.extab
 *(.ARM.extab* .gnu.linkonce.armextab.*)
                0x0000000008033830                __exidx_start = .

.ARM.exidx      0x0000000008033830        0x8
 *(.ARM.exidx* .gnu.linkonce.armexidx.*)
 .ARM.exidx     0x0000000008033830        0x8 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/crt0.o
 .ARM.exidx     0x0000000008033838        0x0 projects/blinky/build/debug/libs/mbed-os/platform/TARGET_CORTEX_M/TOOLCHAIN_GCC/except.o
                                         0x28 (size before relaxing)
 .ARM.exidx     0x0000000008033838        0x0 projects/blinky/build/debug/libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/TOOLCHAIN_GCC/TARGET_RTOS_M4_M7/irq_cm4f.o
                                         0x18 (size before relaxing)
 .ARM.exidx     0x0000000008033838        0x0 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/libgcc.a(_udivmoddi4.o)
                                          0x8 (size before relaxing)
                0x0000000008033838                __exidx_end = .
                0x0000000008033838                __etext = .
                0x0000000008033838                _sidata = .

.rel.dyn        0x0000000008033838        0x0
 .rel.iplt      0x0000000008033838        0x0 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o

.data           0x00000000200001d8      0xb78 load address 0x0000000008033838
                0x00000000200001d8                __data_start__ = .
                0x00000000200001d8                _sdata = .
 *(vtable)
 *(.data*)
 .data          0x00000000200001d8        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
                0x00000000200001d8                __dso_handle
 .data.irq_handler
                0x00000000200001dc        0x4 projects/blinky/build/debug/libs/mbed-os/hal/mbed_lp_ticker_api.o
 .data.irq_handler
                0x00000000200001e0        0x4 projects/blinky/build/debug/libs/mbed-os/hal/mbed_us_ticker_api.o
 .data._ZL11filehandles
                0x00000000200001e4      0x100 projects/blinky/build/debug/libs/mbed-os/platform/mbed_retarget.o
 .data._ZZ5_sbrkE4heap
                0x00000000200002e4        0x4 projects/blinky/build/debug/libs/mbed-os/platform/mbed_retarget.o
 .data._ZL14idle_hook_fptr
                0x00000000200002e8        0x4 projects/blinky/build/debug/libs/mbed-os/rtos/TARGET_CORTEX/mbed_rtx_idle.o
 .data.os       0x00000000200002ec       0xa4 projects/blinky/build/debug/libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.o
                0x00000000200002ec                osRtxInfo
 .data.SystemCoreClock
                0x0000000020000390        0x4 projects/blinky/build/debug/libs/mbed-os/targets/TARGET_STM/TARGET_STM32F4/device/system_stm32f4xx.o
                0x0000000020000390                SystemCoreClock
 .data._impure_ptr
                0x0000000020000394        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-impure.o)
                0x0000000020000394                _impure_ptr
 .data.impure_data
                0x0000000020000398      0x428 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-impure.o)
 .data.__malloc_av_
                0x00000000200007c0      0x408 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-mallocr.o)
                0x00000000200007c0                __malloc_av_
 .data.__malloc_sbrk_base
                0x0000000020000bc8        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-mallocr.o)
                0x0000000020000bc8                __malloc_sbrk_base
 .data.__malloc_trim_threshold
                0x0000000020000bcc        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-mallocr.o)
                0x0000000020000bcc                __malloc_trim_threshold
 .data.__global_locale
                0x0000000020000bd0      0x16c /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-locale.o)
                0x0000000020000bd0                __global_locale
                0x0000000020000d40                . = ALIGN (0x8)
 *fill*         0x0000000020000d3c        0x4 
                0x0000000020000d40                PROVIDE (__preinit_array_start = .)
 *(.preinit_array)
                0x0000000020000d40                PROVIDE (__preinit_array_end = .)
                0x0000000020000d40                . = ALIGN (0x8)
                0x0000000020000d40                PROVIDE (__init_array_start = .)
 *(SORT_BY_NAME(.init_array.*))
 *(.init_array)
 .init_array    0x0000000020000d40        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
                0x0000000020000d44                PROVIDE (__init_array_end = .)
                0x0000000020000d48                . = ALIGN (0x8)
 *fill*         0x0000000020000d44        0x4 
                [!provide]                        PROVIDE (__fini_array_start = .)
 *(SORT_BY_NAME(.fini_array.*))
 *(.fini_array)
 .fini_array    0x0000000020000d48        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
                [!provide]                        PROVIDE (__fini_array_end = .)
 *(.jcr*)
                0x0000000020000d50                . = ALIGN (0x8)
 *fill*         0x0000000020000d4c        0x4 
                0x0000000020000d50                __data_end__ = .
                0x0000000020000d50                _edata = .

*also note that this map file was generated from mbed-cli and I found it in the build folder of my main app project.

The chunk of the .map file above is identical between .map file produced when the application includes trace init (fails) and omits it (succeeds).

DrynnBavis on 14 Aug 2019

To double-check - the crash is happening in the main image, right? As it tries to initialise its OS?

To verify -- indeed the crash occurs within the main image as it tries to initialise the OS (I think more specifically it's failing to initialise the Kernel)

Also, I'd like to see the lines from _sidata to _sdata in each image's map.

So this region in the main app is in my comment above. That same region in the bootloader_application.map is as follows (left works, right fails):

DrynnBavis on 14 Aug 2019

Are all those watchpoints from the main image execution - so the Reset_Handler is after your mbed_start_application?

I have both the bootloader and main application ELF files loaded into GDB. I've set a breakpoint at mbed_application.c:175 which is called from the bootloader - not the main applicaition - and I've recorded the watch points here for you with this breakpoint in place:

(gdb) monitor reset
Resetting target
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x200012a4 <os_idle_thread_stack+284>
New value = (osRtxThread_t *) 0x0
Reset_Handler ()
    at libs/mbed-os/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F413xH/device/TOOLCHAIN_GCC_ARM/startup_stm32f413xx.S:91
91    ldr  r0, =_sdata
(gdb)
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x20001144 <os_timer_thread_cb>
SVC_Handler () at irq_cm4f.S:102
102 irq_cm4f.S: No such file or directory.
(gdb)
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x20001144 <os_timer_thread_cb>
New value = (osRtxThread_t *) 0x20001038 <_main_obj>
SVC_Handler () at irq_cm4f.S:102
102 in irq_cm4f.S
(gdb)
Continuing.

Breakpoint 18, start_new_application (sp=0x20050000, pc=0x8026e0d <OS_Tick_Setup+52>)
    at libs/mbed-os/platform/mbed_application.c:175
175     __asm volatile(
(gdb)
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x20001038 <_main_obj>
New value = (osRtxThread_t *) 0x0
OS_Tick_Setup (freq=<error reading variable: Cannot access memory at address 0x20050004>,
    handler=<error reading variable: Cannot access memory at address 0x20050000>)
    at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/Source/os_systick.c:72
72    SysTick->VAL  =  0U;
(gdb)
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x10
0x0802421c in svcRtxKernelInitialize () at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:94
94    uint32_t offset = offsetof(osRtxInfo_t, kernel);
(gdb)
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08003880 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681         errno = EBADF;

So it seems the majority of the watch points are hit before entering the main app.

It looks like it is being set to zero, but I don't see where that 0x10 has come from. That doesn't correspond to the code you're stopping at. Except that I can see a nearby osRtxInfo.isr_queue.max = osRtxConfig.isr_queue.max;, which might be setting a value to 16.

So perhaps there's an offset here causing this 0x10 that was actually meant for the field osRtxConfig? Would this explain the bizarre values in os_id = 0x5 "v", as well?

So mismatch between loaded image and ELF in the debugger?

See my bottom response in this comment.

Is this is a custom bootloader you've created that has the RTOS in use? Our own bootloaders disable the RTOS, so you may be hitting some sort of problem with interrupts not being properly shut down that we haven't?

I've had a few people ask me about the bootloader. It's a managed mbed bootloader where the contents of main.cpp can be seen in my OP above. I've used target.restrict_size: 0x00020000 in the mbed_app.json and that's about it. I don't think I'm explicitly using any RTOS features (unless trace does?). And this bootloader works fine (with trace) on supported mbed targets. It's my own custom target that's causing this trouble. Is there a way I can explicitly check the RTOS is being disabled here?

The fact that SVC_Handler is being called before osKernelInitialize makes very little sense. As does being in OS_Tick_Setup. That shouldn't happen until osKernelStart, which should be after osKernelInitialize.

SVC_Handler is used for software interrupts right? Is there a way we can trace back to see who's calling these interrupts before osKernelInitialize gets called?

Are you sure you've got the right ELF file in the debugger?

So I was making the mistakes of assuming that when I typed > load into gdb, that it would flash the combined image of my main application with the bootloader. However, experimentally I've determined this not to be the case; instead it's loading both new versions of the ELF files, but only flashing the new image of the bootloader. I'm now using JFlashLite to flash the full main.bin. So I had the new bootloader flashed, with both the new elf files loaded into gdb, but the main application was never being updated. Not sure if this matters as much as I thought it may have since I was never really changing anything in my main.cpp, but there might be more in the binary that was changing other than the active app space that has to due with the bootloader from before it...

@kjbracey-arm What information would you like me to redo now that I'm properly flashing the binaries and loading the elfs to match?

DrynnBavis on 14 Aug 2019

Interesting update

Seeing above that there was some offset business I thought - at a high level here - perhaps this offset is coming from trace existing in the bootloader but not existing in the main application? So I initialised trace in my blinky (main) program as well and reflashed the image.

Sure enough, the bootloader successfully boots to the main app now and both are able to use trace messages.

[INFO][BOOT]: I'm tracing!
[INFO][MAIN]: Now in main!

I'm not sure what this tells us exactly, but I think this means for now I can have a working bootloader so long as the peripherals I use in it are also present in the main application as well. Surely this isn't intended behaviour?

EDIT: Something else that sucks is this problem exists backwards too. Meaning, if I have trace defined in my main app, but not in my bootloader, then it faults as well.

EDIT2: Another thing that may or may not be relevant -> The custom target we're using does not have an external low freq clock (no LSE). We have an 8 MHz external that we're using as for high freq stuff configured as the HSE, but for the low freq stuff it's being done with the LSI. I think I've configured my target correctly with this:

        "config": {
            "clock_source": {
                "help": "Mask value : USE_PLL_HSE_EXTC | USE_PLL_HSE_XTAL (need HW patch) | USE_PLL_HSI",
                "value": "USE_PLL_HSE_EXTC|USE_PLL_HSI",
                "macro_name": "CLOCK_SOURCE"
            },
            "lpticker_lptim": {
                "help": "This target supports LPTIM. Set value 1 to use LPTIM for LPTICKER, or 0 to use RTC wakeup timer",
                "value": 1
            }
        },
        "overrides": {
            "lpticker_delay_ticks": 4,
            "lse_available": 0
        },

But if I've made a mistake maybe that could be causing some troubles.

DrynnBavis on 14 Aug 2019

As @kjbracey-arm stated the bootloader is supposed to do a soft reset back into the reset vector which should avoid any of this weirdness.

40Grit on 14 Aug 2019

As @kjbracey-arm stated the bootloader is supposed to do a soft reset back into the reset vector which should avoid any of this weirdness.

How/Where would I confirm this?

DrynnBavis on 14 Aug 2019

I have both the bootloader and main application ELF files loaded into GDB.

This makes me nervous - I've no experience with loading two ELF files simultaneously into GDB, and I'm not sure how you disambiguate. Some of your output would be consistent with you printing the content of the bootloaders osRtxInfo (or the memory where it would have been) while actually running the main image.

Could you instead just load the ELF file for the main image into GDB?

I don't think I'm explicitly using any RTOS features

No, but it is an RTOS build, which means the RTOS is getting initialised, and you're running your bootloader main as an RTOS thread. As of 5.12/5.13, the RTOS can be excluded via a requires: [ "bare-metal" ] in your mbed_app.json, which would make it more like other bootloaders (and smaller). Still, the fact that your bootloader is RTOS is not _supposed_ to break anything.

As @kjbracey-arm stated the bootloader is supposed to do a soft reset back into the reset vector which should avoid any of this weirdness.

Well, it's not really a reset, it's just "manually safe hardware and jump to Reset_Handler". Where that fails is for any hardware that was disrupted by the bootloader and not restored, and assumed to be in reset state by the main image.

SVC_Handler is used for software interrupts right? Is there a way we can trace back to see who's calling these interrupts before osKernelInitialize gets called?

If using an up-to-date pyOCD for your connection, then a breakpoint on SVC_Handler should show you - the calling point would be shown as a different "thread" - there's 1 thread for handler mode, and 1 (or more) threads for process mode.

If using a more archaic tool, then you'd have to dump the 8 words from SP while stopped on a SVC_Handler breakpoint - the return address for the SVC would be in the seventh word at SP+24. Show all 8 words in a dump, and that will tell us exactly what the supervisor call was. (SP+16 should be the address of an RTX function - look it up in the map).

These really should not be happening.

Anyway, all your memory maps look correct, so don't need to worry about them any more. But we still need to figure out what's going on with that 0x10 pointer that makes it abort - can you go around again with correctly matched ELF (main image only) and run through the boot? You won't have source references then for the bootloader bit, but that doesn't matter.

I can have a working bootloader so long as the peripherals I use in it are also present in the main application as well. Surely this isn't intended behaviour?

There isn't a very rigorous cleanup, tbh. I can imagine a situation where a peripheral is left active, then after entering the main image there's no driver (or init) code for it. Its interrupt generation should at least be masked by the powerdown_nvic call in mbed_start_application, which should minimise the opportunity for it to actually immediately break things. Power wastage would be more likely.

There's no general system "shut everything down" notification to give things a chance to clean-up before the start_new_application.

kjbracey-arm on 15 Aug 2019

@kjbracey-arm

Off topic but need to ask:
Is there any case here for the mbed bootloader to be linked to the same mbed-os instance in flash as the application?

40Grit on 15 Aug 2019

In principle, no, there shouldn't be. In practice, maybe?

In bigger systems, main images (like the Linux kernel) tend to be very paranoid about what their bootloader may or may not have done, so they tend have init code that manually sets/reset pretty much everything on entry. The bootloader could be years behind the kernel, so they just do not trust "reset" hardware state.

That does take code space though. It would be more space-efficient to trust the bootloader to put everything back it knows it touched, and have the main image just assume everything is in reset state.

As we're currently in a situation where neither of those is really happening, then I guess having bootloader and main image as close as possible does in practice minimise the chances of errors - such as this case potentially is.

kjbracey-arm on 15 Aug 2019

An ideal would be to have a chip that did support "real reset into secondary image". Have a register that the bootloader could write to that did "reset, but jump into this handler".

Or maybe you could even do that in bootloader software - if you could reliably indicate "reset reason", then the bootloader itself could do NVIC_Reset, re-enter itself, detect the reset reason and jump straight to main image this time.

I don't know if that approach has been attempted in Mbed OS - I've only seen the manual simulated-reset approach in mbed_start_application.

kjbracey-arm on 15 Aug 2019

My earlier comment about soft b reset was under the assumption that Mbed-os was using reset-reason and potentially some other core register.

I see now that is not the case.

40Grit on 15 Aug 2019

Thinking further that would require some sort of industry standards between BL and application to use the core in that way.

I'll make any further enquires into this topic in a separate issue.

40Grit on 15 Aug 2019

@40Grit @kjbracey-arm Going to try the bare metal setting when I get back, but I'll be out until tuesday. Just FYI so you don't think I've given up haha

DrynnBavis on 16 Aug 2019

Sorry for the delay, but I've returned and tried adding "requires bare-metal" to my mbed_app.json which now looks like this:

        "CUSTOM_TARGET": {
            "target.features_add": ["STORAGE"],
            "target.components_add": ["SPIF"],
            "target.components": ["SPIF"],
            "target.requires" : [ "bare-metal" ],
            "spif-driver.SPI_MOSI": "FLASH_SPI_MOSI",
            "spif-driver.SPI_MISO": "FLASH_SPI_MISO",
            "spif-driver.SPI_CLK": "FLASH_SPI_SCK",
            "spif-driver.SPI_CS": "FLASH_SPI_CS",
            "spif-driver.SPI_FREQ": 40000000,
            "target.restrict_size": "0x00020000"
        },

I'm still crashing inside of svcRtxKernelInitialize unfortunately. I can follow the crash up until this line here inside of svcRtxKernelInitialize:

0x0802445e <+622>:  ldr.w   pc, [sp], #4

where pc = 0x802445e <svcRtxKernelInitialize+622> and sp = 0x2004ffac

After this if I try to go forward (stepping), I end up at 0x08004842 and the elf file is undefined for this ROM region (using the main app's elf file as requested per @kjbracey-arm). It's some point after this that the crash occurs.

I'm confused with what's going on here. Why is it accessing code at 0x08004842?

EDIT: After adding the bootloader elf again to see what's going on, it seems to be accessing SVC_Handler code and just crashing as before:

Breakpoint 17, 0x0802445e in svcRtxKernelInitialize ()
    at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:186
186 }
(gdb) stepi
SVC_Handler () at irq_cm4f.S:64
64  irq_cm4f.S: No such file or directory.

DrynnBavis on 20 Aug 2019

@kjbracey-arm @40Grit Just for the sake of sanity, could either of you two reiterate the requirements for defining a custom target? I'm worried I may have missed a very basic step here and perhaps this is the source of the trouble here. I've read over this section from you guys a number of times but I want to be sure here.

I'm specifically concerned about this section here:

MCUs are required to have, and Families and Subfamilies may have:
release_versions.
supported_toolchains.
default_lib.
public.
device_has.

Is my target definition for an MCU, Family, or Subfamily? Perhaps I'm missing one of these fields and that's what's causing this.

DrynnBavis on 21 Aug 2019

I have no direct affiliation with ARM. I just work for one of their partners (Embedded Planet)

Without watching the config, build, and debug session right in front if me I'm starting to fish for ideas.

Another sanity check could be to check the parts errata.

What linker file does your build end up using?

@kjbracey-arm might the image built for the nucleo board run in the processor that @drynnbavis is using since it really only differs by package?

40Grit on 21 Aug 2019

Another sanity check could be to check the parts errata.

I've been through this for the STM32F4xH already and nothing jumped out at me.

What linker file does your build end up using?

Pretty sure I'm using this one here

DrynnBavis on 21 Aug 2019

Some more poking around I found the the osKernelPreInit section of assembly code is simply two lines, first is a nop and the second is just a branch back to the lr. Off-topic here, but is there any point in this and is this OK?

DrynnBavis on 21 Aug 2019

@drynnbavis if your bootloader has no operational dependency on peripheral IO, I would see what happens if you flash the working binaries built for the nucleo board.

40Grit on 21 Aug 2019

@DrynnBavis if your bootloader has no operational dependency on peripheral IO, I would see what happens if you flash the working binaries built for the nucleo board.

So I set my target to NUCLEO_F413ZH and made sure the bootloader + main code was previously working on my old target. The debugger gets trapped within mbed_rtos_start() and never seems to boot to main.

EDIT: Recall that there's no LSE on my custom board so I need to use LSI (as defined in my custom_targets.json). I don't see this in the target definition of the NUCLEO_F413ZH so this is probably not going to work.

DrynnBavis on 21 Aug 2019

@DrynnBavis Is there a fully internal clock you can use for the time being to bring this up?

loverdeg on 21 Aug 2019

@loverdeg-ep @40Grit I added an override to the NUCLEO_F413ZH target to disable the LSE (and thus use the LSI) with:

        "overrides": { 
            "lpticker_delay_ticks": 4, 
            "tickless-from-us-ticker": true,
            "lse_available": 0
        },

and this works for the same setup that was working when I used my custom target. However, once I added trace back to only the bootloader, the same behaviour with the same fault location occurs. Good idea, but still no luck.

DrynnBavis on 21 Aug 2019

Do you have access to a debugger that supports SWO?
I'd like to see what happens if you redirect trace over SWO.

loverdeg on 21 Aug 2019

Do you have access to a debugger that supports SWO?
I'd like to see what happens if you redirect trace over SWO.

I do indeed, what test would you like me to redo over SWO?

DrynnBavis on 21 Aug 2019

I haven't used this interface personally yet but it should achieve trace through swo.

https://github.com/ARMmbed/mbed-os/blob/master/platform/mbed_retarget.h#L97

Use your custom target and a bare minimum but still failing config I guess?

Using SWO should eliminate the need for the UART and provide another datapoint; since trace enabled, in exclusive or fashion, between app and bootloader seems to cause a failure.

loverdeg on 21 Aug 2019

Use your custom target and a bare minimum but still failing config I guess?

Even with SWO, I'm crashing in the same spot. Explicitly, I'm failing within SVC_0 during osKernelInitialize of the main program (the second time osKernelInitalize gets called)

DrynnBavis on 22 Aug 2019

Hmm, so has the system failed to set up the vector table pointer?

Maybe the target doesn't actually have RAM vector tables properly configured?

Almost all Mbed OS targets arrange for the initial exception vectors to be copied from ROM to RAM, and then set VTOR. This then allows NVIC_SetVector to work. A very few targets do not do this, and keep vectors in ROM.

I've not thought about this before, but there are two ways this can not work with bootloader

If the vectors are not copied into RAM, VTOR will remain pointing at the bootloader's initial vector table, so any exceptions will make it jump into the bootloader.
The vector copy routine might make the mistake of copying from the bootloader's vector table (start of ROM) rather than its own table (base address of main image). It should be getting an address from the linker map. Again, the effect would be to jump into the bootloader.

It's possible you've always been jumping into the bootloader. If so you've managed something quite fascinating, and ultimately desirable as an optimisation - sharing code between main image and bootloader.

There's no reason the bootloader's task and SVC dispatcher can't work for the main image, as long as osRtxInfo is in the same place for both images, and they're both configured the same....

kjbracey-arm on 22 Aug 2019

@kjbracey-arm we led @DrynnBavis through the exercise of confirming vtor in a previous issue. I'm betting I didn't properly identify the failure to relocate or that the exercise needs to be done again.

11140 (Which I just realized is not linked here.)

@kjbracey-arm I do hope shared code has been achieved; been wanting to look into that for a while. I was going to take a position independent approach however.

40Grit on 22 Aug 2019

If the vectors are not copied into RAM, VTOR will remain pointing at the bootloader's initial vector table, so any exceptions will make it jump into the bootloader.

@kjbracey-arm I've actually noticed that when I load only the main code's ELF, that the pre main() reset initialization work is occurring at addresses lower than the start address of the main codes region of ROM. I can confirm that osKernelInitalise is indeed occurring in the main code region, but the SVC_handler as well as the Fault_handler are being called from within the pre main code ROM region. Are these assembly source files of the SVC_handler and the Fault_handler things that the vector table will be pointing to? Or rather, does it make sense that these two Source files are being called at pre main ROM regions?

Almost all Mbed OS targets arrange for the initial exception vectors to be copied from ROM to RAM, and then set VTOR. This then allows NVIC_SetVector to work. A very few targets do not do this, and keep vectors in ROM.

IIRC, the stack pointer was being assigned to 0x20050000, which to me seemed to be the start of RAM. So if I'm guessing correctly here, the vector table of the bootloader should be at the start of it's ROM correct (0x08000000)? And then the vector table of the main app should be at the start of RAM (0x20050000)? I'll try to confirm this when I get back to the office.

I'm betting I didn't properly identify the failure to relocate or that the exercise needs to be done again.

@40Grit I wouldn't mind running through these exercises again if you guys would like to see some results. I think I have a better understanding now of whats going on so I could probably give you more helpful data.

DrynnBavis on 22 Aug 2019

Stepping through the debugger from after mbed_app_start() in the bootloader, I've found this:

The fault is occurring within Reset_Handler, within the function _start(), on this line ->

0x08020244 <+44>:   blx r3 // r3 = 0x8023b91

Continuing over this line throws the crash, so let's step into it (note, something interesting here is that pc=0x8023b90 and NOT 0x8023b91 which is the actual value of r3 that I think we're supposed to branch into. Seems odd to me r3 doesn't even load a seemingly valid address in the first place!). From here it branches into this function:

software_init_hook () at libs/mbed-os/rtos/TARGET_CORTEX/TOOLCHAIN_GCC_ARM/mbed_boot_gcc_arm.c:46
46      mbed_stack_isr_start = (unsigned char *) &__StackLimit;

From here, I'm able to use nexti until this line:

=> 0x08023bb2 <+34>:    bl  0x8023c98 <mbed_init>

stepping into here I arrive at:

mbed_init () at libs/mbed-os/rtos/TARGET_CORTEX/mbed_boot.c:76
76      mbed_mpu_manager_init();

from here a couple of branches get called:

(gdb) disas
Dump of assembler code for function mbed_init:
   0x08023c98 <+0>: push    {r3, lr}
   0x08023c9a <+2>: bl  0x80220d8 <mbed_mpu_init>
   0x08023c9e <+6>: bl  0x8023ccc <mbed_cpy_nvic>
   0x08023ca2 <+10>:    bl  0x8029400 <mbed_sdk_init>
   0x08023ca6 <+14>:    bl  0x802a178 <us_ticker_init>
=> 0x08023caa <+18>:    bl  0x8023d14 <mbed_rtos_init>
   0x08023cae <+22>:    nop
   0x08023cb0 <+24>:    pop {r3, pc}
End of assembler dump.

I'm able to nexti until mbed_rtos_init, so again, stepping into it I get to:

=> 0x08023d16 <+2>: bl  0x80247c0 <osKernelInitialize>

In here, both IsIrqMode and IsIrqMasked return false, so we end up at this line:

__svcKernelInitialize () at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:488
488 SVC0_0 (KernelInitialize,       osStatus_t)
(gdb) disas
Dump of assembler code for function osKernelInitialize:
   0x080247c0 <+0>: push    {lr}
   0x080247c2 <+2>: sub sp, #12
   0x080247c4 <+4>: bl  0x80247bc <osRtxKernelPreInit>
   0x080247c8 <+8>: bl  0x8024208 <IsIrqMode>
   0x080247cc <+12>:    mov r3, r0
   0x080247ce <+14>:    cmp r3, #0
   0x080247d0 <+16>:    bne.n   0x80247dc <osKernelInitialize+28>
   0x080247d2 <+18>:    bl  0x8024222 <IsIrqMasked>
   0x080247d6 <+22>:    mov r3, r0
   0x080247d8 <+24>:    cmp r3, #0
   0x080247da <+26>:    beq.n   0x80247ec <osKernelInitialize+44>
   0x080247dc <+28>:    mvn.w   r0, #5
   0x080247e0 <+32>:    bl  0x8023f5c <EvrRtxKernelError>
   0x080247e4 <+36>:    mvn.w   r3, #5
   0x080247e8 <+40>:    str r3, [sp, #4]
   0x080247ea <+42>:    b.n 0x80247f6 <osKernelInitialize+54>
   0x080247ec <+44>:    ldr r3, [pc, #16]   ; (0x8024800 <osKernelInitialize+64>)
   0x080247ee <+46>:    mov r12, r3
=> 0x080247f0 <+48>:    svc 0
   0x080247f2 <+50>:    mov r3, r0
   0x080247f4 <+52>:    str r3, [sp, #4]
   0x080247f6 <+54>:    ldr r3, [sp, #4]
   0x080247f8 <+56>:    mov r0, r3
   0x080247fa <+58>:    add sp, #12
   0x080247fc <+60>:    ldr.w   pc, [sp], #4
   0x08024800 <+64>:    muls    r1, r0
   0x08024802 <+66>:    lsrs    r2, r0, #32
End of assembler dump.

Where finally, we get to line 93 of irq_cm4f.S before entering the HardFault_Handler().

So there it is... a complete trace through the hardfault I'm seeing in my main program from using trace in my bootloader and not my main application for a custom target. @kjbracey-arm @40Grit @loverdeg-ep if any of you would like to see register values or any other tests please don't hesitate to ask.

*note, all of this seems to be working in ROM regions > 0x8020000 (start address of main) EXCEPT for the SVC_Handler, which seems to start in this region:

Dump of assembler code for function SVC_Handler:
=> 0x08004974 <+0>: tst.w   lr, #4

DrynnBavis on 22 Aug 2019

I can see VTOR being set in system_clock.c. The Nucleo_F413ZH version looks like it sets it just to FLASH_BASE, which is wrong if you're the main image living above a bootloader.

https://github.com/ARMmbed/mbed-os/blob/2c3ce9629eefd3af63657c62a1e40c28f4a1facd/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F413xH/TARGET_NUCLEO_F413ZH/system_clock.c#L97

Nucleo F429ZI is setting it to NVIC_FLASH_VECTOR_ADDRESS, and that device does work with bootloaders.

https://github.com/ARMmbed/mbed-os/blob/2c3ce9629eefd3af63657c62a1e40c28f4a1facd/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F429xI/TARGET_NUCLEO_F429ZI/system_clock.c#L90

So if you were based on F413ZH, I think you need to follow through and copy bits from F429ZI to get that initial VTOR set correctly - figure out where that define comes from and get it set up.

After target-specific init, the generic mbed_copy_nvic copies from initial VTOR into RAM, then changes VTOR to point to that RAM, so it's relying on that initial VTOR pointing to the correct ROM table.

https://github.com/ARMmbed/mbed-os/blob/2c3ce9629eefd3af63657c62a1e40c28f4a1facd/platform/source/mbed_sdk_boot.c#L52-L68

kjbracey-arm on 23 Aug 2019

@LMESTM - could it be a task for you to make the start-up code for various STM devices consistent, if only some support bootloaders?

something interesting here is that pc=0x8023b90 and NOT 0x8023b91

The 1 bit at the bottom just indicates "Thumb" instruction set, rather than "ARM". All function pointers on a Cortex-M device will have their bottom bit set, but instructions are located at even addresses, and PC readouts are even.

kjbracey-arm on 23 Aug 2019

@kjbracey-arm I think @drynnbavis is not seeing the issue on nucleo_f413zh.

But is seeing the issue on a custom target using f413rh.

40Grit on 23 Aug 2019

But is seeing the issue on a custom target using f413rh.

Yes, but he must have copied that code from somewhere, so if he copied it from 413ZH, it would have inherited that issue.

kjbracey-arm on 23 Aug 2019

😕1

I'm missing something.

Apparently the bootloader and application work fine on the nucleo board. (f413zh)

40Grit on 23 Aug 2019

It's interesting that this target code feels the need to explicitly set VTOR for ROM anyway.

Given that we've just reset, we surely must have entered our Reset Handler, so VTOR must be correctly set?

The only issue is that the bootloader itself doesn't modify VTOR before jumping into the reset handler manually. That would be a reason to set VTOR in main image, but surely the bootloader should have done it itself...

Edit: actually it does, powerdown_scb does set VTOR to the address of the new image.

kjbracey-arm on 23 Aug 2019

Apparently the bootloader and application work fine on the nucleo board. (f413zh)

According to whom? Whose bootloader? @DrynnBavis's?

Looking at that code, I don't see how any bootloader would work - FLASH_BASE is just defined to 0x08000000U, so VTOR will not be right, and mbed_copy_nvic will copy from the wrong place.

Mind you, I'm just asserting that by code inspection. No idea who's using a Nucleo F413ZH.

kjbracey-arm on 23 Aug 2019

https://github.com/ARMmbed/mbed-os/issues/11140#issuecomment-519648999

40Grit on 23 Aug 2019

But was that working by jumping into the bootloader and all the data happening to be in the same place, as it does on the custom board sometimes, if the bootloader is RTOS-based?

kjbracey-arm on 23 Aug 2019

https://github.com/ARMmbed/mbed-os/issues/11140#issuecomment-520283248

40Grit on 23 Aug 2019

I see on that issue confirmation that VTOR was correct (0x08020000) entering the image, but by reading the code, SystemInit would then set it to 0x08000000, then mbed_copy_nvic would copy from 0x08000000 into RAM.

The moral of the story may be that SystemInit should not be setting VTOR.

kjbracey-arm on 23 Aug 2019

dumb luck that nucleo is working then?

Is this in ST's court now?

40Grit on 23 Aug 2019

If all this analysis is correct, @DrynnBavis can modify his own custom target's VTOR setup to match a working STM target. Or just take the VTOR setting out altogether - I don't think it's required.

ST should look at the inconsistency between their targets.

kjbracey-arm on 23 Aug 2019

🎉1

I have a very similar issue with my custom bootloader based on the SAML21J18. I can use mbed-trace just fine; however, when I use SDBlockDevice the application fails to start. Should I create a new issue before I go into any more detail here?

learytomlin on 23 Aug 2019

New issue please. Reference this one if you want.

loverdeg on 23 Aug 2019

👍1

If all this analysis is correct, @DrynnBavis can modify his own custom target's VTOR setup to match a working STM target. Or just take the VTOR setting out altogether - I don't think it's required.

@kjbracey-arm or @40Grit (your time zone matches mine a little better), what code am I modifying here? Can you give me a file name and line number? Thanks

DrynnBavis on 23 Aug 2019

@DrynnBavis https://github.com/ARMmbed/mbed-os/issues/11205#issuecomment-524220737

I the comment linked above should give a starting point.

loverdeg on 23 Aug 2019

Somehow missed this on first read, thanks @loverdeg-ep. I've got to leave for something right now but I'll try this when I return in a few hours.

DrynnBavis on 23 Aug 2019

@DrynnBavis and we actually are probably in the same timezone. I just get up super early and watch github.

It is the only chance I have to get in contact with the experts in Oulu.

loverdeg on 23 Aug 2019

👍1

YES. IT'S WORKING!!!

@kjbracey-arm I tried using NVIC_FLASH_VECTOR_ADDRESS but it was undefined, so I commented this out enitrely:

//     /* Configure the Vector Table location add offset address ------------------*/
// #ifdef VECT_TAB_SRAM
//     SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
// #else
//     SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */
// #endif

This seems to have fixed my troubles for now. Will having this commented out cause any problems in the future? Curious to know why this works on the ZH chip but not the RH chip...

To confirm: I did indeed copy system_clock.c from the nucleo board. Commenting out the Vector Table location and offset address section within SystemInit of system_clock.c fixed my issue.

DrynnBavis on 23 Aug 2019

🎉3

What's the best protocol here in terms of closing this issue? Is there a PR I can make to give a better target definition for the RH chips?

I'm also still interested in exactly why this solved the problem. So I uncommented that section (back to the faulty code) and set a break point right before this point:

#ifdef VECT_TAB_SRAM
    SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
    SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */
#endif

using info macro VECT_TAB_SRAM in GDB, I found it to be undefined
with the same tool, I found SRAM_BASE = SRAM1_BASE = 0x20000000U
and FLASH_BASE = 0x08000000U
and finally VECT_TAB_OFFSET = 0x00

So because VECT_TAB_SRAM is undefined, it would seem we are indeed using the bootloader's vector table in ROM rather than the main application's vector table in SRAM. But surely this exact code inside of system_clock.c is being used for the ZH targets? Why is this not a problem there?

Does this have anything to do with me using an LSI rather than an LSE?

DrynnBavis on 23 Aug 2019

I've been through this type of exercise multiple times myself and had the Cortex-M user-manual next to me when writing custom boot-loaders. I go step by step through the assembly and watch the status of the register. This usually gets me the type of understanding you are looking for with issues like these.

Even still I don't know the architecture well enough to diagnose this without spending a couple hours of analysis.

Figure out who is most active from ST in Mbed and copy them on this thread. Or open a ticket with ST and point them to this issue.

However It will probably fall to you to learn the "rule book" and definitively prove exactly why two parts which should? have the same memory map, programmed with the same binary behave differently.

The next step i would take is prove that both parts have all the same peripherals mapped to all the same places in memory.

If that is true I would think the binary should work the same in both as long as neither part were relying on any external signals.

loverdeg on 23 Aug 2019

Curious to know why this works on the ZH chip but not the RH chip...

There's no evidence there's any chip difference here is there? The code sets VTOR wrong, but gets away with it if (a) there is no bootloader (so "wrong" happens to be right), or (b) the bootloader has an RTOS and the memory layout is the same (or similar enough)

Presumably in all your ZH builds, you've ended up with images with matching RTOS memory layout. If adding/removing trace doesn't affect the ZH, then there's likely no deep meaningful significance - some difference in link order/padding means trace ends up not shifting osRtxInfo in that build. Feel free to investigate the maps, in case it reveals some inconsistency, but it may be nothing.

kjbracey-arm on 26 Aug 2019

Feel free to investigate the maps, in case it reveals some inconsistency, but it may be nothing.

Very behind on my work now since this issue came up, so first I'll have to work on that. But later this week, if I've the time then I'll definitely look into it.

DrynnBavis on 26 Aug 2019

@LMESTM - could it be a task for you to make the start-up code for various STM devices consistent, if only some support bootloaders?

@DrynnBavis @kjbracey-arm - sorry I was away when copied here and I missed it.
Most of the vector relocation mechanism has been mostly pushed by arm engineers and has evolved over the past year, but it was indeed not updated consistently over all STM32 targets but rather on a case by case when enabling bootloader feature.

Anyway I agree that we probably have to clean-up thinkgs a little bit now.

To make things clear, does you custom target boots ok if you're modifying system_clock.c file as below rather than commenting out those lines:

    /* Configure the Vector Table location add offset address ------------------*/
#ifdef VECT_TAB_SRAM
    SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
    SCB->VTOR = NVIC_FLASH_VECTOR_ADDRESS; /* Vector Table Relocation in Internal FLASH */
#endif

LMESTM on 27 Aug 2019

👍1

In an ideal world, you shouldn't need to set VTOR on boot, if intending to carry on using the ROM vectors. It seems reasonable to assume that as you've just booted, you came through the Reset_Handler, so VTOR must be correct.

The catch is that a bootloader might have entered your reset vector without adjusting VTOR. Our bootloader does set it, but others might not. So setting it seems reasonable.

kjbracey-arm on 27 Aug 2019

This implementation was introduced back in 2017 here: https://github.com/ARMmbed/mbed-os/pull/3798
but I'm also fine with removing completely those lines from all our STM targets if we consider it reliable in mbed-os scope.

LMESTM on 27 Aug 2019

I've just had a chat with someone who is looking at booting into Mbed OS from another bootloader altogether, and he believes that it doesn't set VTOR, so better safe than sorry. (Mind you, it doesn't even set MSP or CONTROL correctly either, so you'd probably need to add more...).

kjbracey-arm on 27 Aug 2019

@kjbracey-arm Maybe this is naive but it would be nice if there were some standards surrounding this stuff.

"CMSIS-boot"

40Grit on 27 Aug 2019

Well, if the bootloader is jumping into the vector table reset handler of a "raw" standalone image, then it's effectively avoiding having its own standard and using the "reset" standard. In which case it's kind of the bootloader's responsibility to have the chip in perfect reset state, right?

So by that standard, we shouldn't have to write VTOR. But in practice, the main image is the malleable one while bootloaders tend to be locked in, so it's the main image that has to cope with whatever bootloaders do.

kjbracey-arm on 27 Aug 2019

The fact that they are "locked in" I'd say is more reason to develop a standard.

That way an application could be better prepared to know what it is dealing with.

I'll keep thinking on this one and find the right place to converse further.

40Grit on 27 Aug 2019

To make things clear, does you custom target boots ok if you're modifying system_clock.c file as below rather than commenting out those lines:

    /* Configure the Vector Table location add offset address ------------------*/
#ifdef VECT_TAB_SRAM
    SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
    SCB->VTOR = NVIC_FLASH_VECTOR_ADDRESS; /* Vector Table Relocation in Internal FLASH */
#endif

@LMESTM To confirm, no that macro NVIC_FLASH_VECTOR_ADDRESS is undefined. I've commented that chunk completely to get my bootloader/main app working. I haven't the time yet to check where that macro should be defined and see if maybe I could fix that instead.

DrynnBavis on 30 Aug 2019

For future readers that have scrolled this far looking for a solution: the fix for me was to comment out that entire chunk of code in my comment just above this inside of the system_clock.c file that I had inside the folder of my custom target (I had copied this file from the folder nucleo_f413zh -- a mbed supported target with an STM32F413ZH chip while I was using an STM32F413RH).

Thank you everyone involved on debugging this. Quite the struggle but I'm really happy we actually came to a fix.

I'm going to close this issue now as the problem has a working solution. Though I'll still be following here (or in another thread if we do that instead)

DrynnBavis on 30 Aug 2019

Was this page helpful?

0 / 5 - 0 ratings