Mbed-os: wait() hangs if a 5.9 app is started with a 5.8 bootloader

Created on 1 Jul 2018 · 15 Comments · Source: ARMmbed/mbed-os

Tested on 2 boards:

A custom board with EFM32PG12: confirmed
EFM32GG_STK3700: confirmed

With a minimal blinky example, the application starts, the LED lights up, and then it stays on forever.

App:

#include "mbed.h"

DigitalOut led(LED0);

int main() {
    led = 1;

    while (true) {
        Thread::wait(1000); // stuck if started with a 5.8 bootloader, same thing with wait_ms()
        led = !led;
    }
}

Bootloader:

#include "mbed.h"

int main() {
    mbed_start_application(POST_APPLICATION_ADDR);
}

Here is a full minimal example: https://github.com/amq/bootloader-issue

Toolchain: GCC ARM 6-2017-q2-update

[ ] Question
[ ] Enhancement
[x] Bug

IOTOSM-2231 OPEN platform mirrored bug

amq

Most helpful comment

I am also using a bootloader in my application, but for the bootloader I am using mbed-dev (formerly mbed-os 2) instead of mbed-os 5.x.x. It works for me as expected.
Furthermore, with mbed-dev and some minor compiler tricks, I was able to reduce the size of the bootloader from ~50KB+ to 17KB. Nevertheless, I am a little concerned about this issue, because I fear it affects my application too.
If I have time (maybe tomorrow) I will look into this issue, to verify whether it also happens on STM boards.
[Mirrored to Jira]

DBS06 on 12 Jul 2018

👍2

All 15 comments

@ARMmbed/team-silabs Please review

0xc0170 on 2 Jul 2018

Results of more testing:

| bootloader | application | result |
|------------|-------------|--------|
| 5.8.3 | 5.8.3 | OK |
| 5.8.3 | 5.8.6 | OK |
| 5.8.3 | 5.9.0-rc2 | BAD |
| 5.8.3 | 5.9.0 | BAD |
| 5.8.3 | 5.9.2 | BAD |
| 5.8.6 | 5.8.6 | OK |
| 5.8.6 | 5.9.2 | BAD |
| 5.9.2 | 5.9.2 | OK |

amq on 11 Jul 2018

Is this a Silicon Labs specific issue, or a general mbed OS one? Do you have other mbed targets to test with, @amq?

stevew817 on 11 Jul 2018

@stevew817 I don't have other boards nearby.

@c1728p9 could you help?

amq on 11 Jul 2018

DBS06 on 12 Jul 2018

👍2

If I had to take an educated guess, I'd think it has something to do with the timebases...

Mbed OS switched to running tickless on EFM32 targets in 5.8.2 (#6475), and starting with 5.9.0 it also changed the implementation and default clock frequency of the low power tickers (#6471 changes the Silicon Labs HAL to be compatible with the mbed API changes).

It is possible that the mbed bootloader uses a low power timebase (haven't looked into this much), which would initialise that timebase the way it happened before 5.9, and then afterwards your 5.9-based application gets confused because of the already-initialised timebase.

IMO it would be the bootloader's job to put the system back in a reset-ish state before starting the bootloaded application, but mbed-os doesn't have a conclusive strategy on destructors... (Nor do I think their bootloader destructs all the resources it uses, though I haven't validated that claim.)

stevew817 on 12 Jul 2018

I am able to reproduce the issue on an EFM32PG12_STK3402, but sadly I wasn't able to test it on an STM Nucleo, because the NUCLEO_F401RE boards I have don't have bootloader support. 😞

DBS06 on 13 Jul 2018

It would be really helpful if someone could try to replicate this on STM. The issue is rather critical: we have already deployed devices in the field with 5.8.

amq on 20 Jul 2018

> It would be really helpful if someone could try to replicate this on STM, the issue is rather critical, we have already deployed devices in the field with 5.8

@ARMmbed/team-st-mcd Can you run the bootloader test shared above?

0xc0170 on 20 Jul 2018

@amq I tested your minimal example on an STM NUCLEO_F746ZG dev board, and it looks like this isn't a Silicon Labs specific issue. I get the same results you already reported:

| bootloader | application | result |
|------------|-------------|--------|
| 5.8.3 | 5.8.3 | OK |
| 5.8.3 | 5.8.6 | OK |
| 5.8.3 | 5.9.0-rc2 | BAD |
| 5.8.3 | 5.9.0 | BAD |
| 5.8.3 | 5.9.2 | BAD |
| 5.8.6 | 5.8.6 | OK |
| 5.8.6 | 5.9.2 | BAD |
| 5.9.2 | 5.9.2 | OK |


DBS06 on 23 Jul 2018

👍1

Thanks @DBS06, we will review this issue.

0xc0170 on 23 Jul 2018

👍1

@0xc0170 This looks like an Mbed OS issue, not a Silicon Labs one. I will remove the silicon-labs tag.

screamerbg on 10 Sep 2018

@amq and @DBS06 could you attach the binaries you built (bootloader, application and combined) to this Jira so I can test with them? I tried reproducing this problem with a NUCLEO_F746ZG, but so far I have been unable to recreate a crash or lockup. I was using mbed-os-5.8.3 for the bootloader and mbed-os-5.9.2 for the application, both taken from https://github.com/amq/bootloader-issue.

c1728p9 on 27 Oct 2018

Just tried bootloader 5.8.6 with application 5.12.1, and the problem persists.

Note that the application does not crash or lock up completely, just the clocks seem to be off. With 5.12.1 what I'm seeing is that Thread::wait(1000) takes about 7s in reality on EFM32GG_STK3700.

amq on 15 Apr 2019

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/IOTOSM-2231