Mbed-os: Timer issues with several targets

Created on 5 May 2020 · 21 comments · Source: ARMmbed/mbed-os

Description of defect

Timeouts have been inaccurate since d1ae0d570ceac567849881dd75639c57e93de05a was merged. This affects the Wi-SUN frequency hopping timer and potentially other Nanostack timers too.

Measured with an oscilloscope: a periodic timeout configured for 255 milliseconds initially works properly, but after a couple of minutes the period starts growing, and the error can eventually reach several hundred milliseconds. For example, 30 minutes after the test started, the 255 ms periodic timeout was actually varying somewhere between 400 and 500 milliseconds.
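To put numbers on an oscilloscope capture like this, the edge timestamps can be turned into per-period errors. The sketch below is a hypothetical post-processing helper (`period_errors_ms` and its input format are assumptions, not part of Mbed OS or the original measurement setup): it takes the time of each GPIO toggle in milliseconds and reports how far each observed period deviates from the configured one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: edges_ms holds the time (ms) of each GPIO toggle,
// i.e. each FHSS timer callback, in capture order.
// Returns one entry per consecutive pair: observed period minus configured
// period (positive means the callback came late).
std::vector<double> period_errors_ms(const std::vector<double> &edges_ms,
                                     double configured_ms)
{
    std::vector<double> errors;
    for (std::size_t i = 1; i < edges_ms.size(); ++i) {
        const double period = edges_ms[i] - edges_ms[i - 1];
        errors.push_back(period - configured_ms);
    }
    return errors;
}
```

For the case in the report, periods of 400 to 500 ms against a configured 255 ms correspond to errors of roughly 145 to 245 ms per period.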

Target(s) affected by this defect ?

It seems that at least K64F, K66F and Disco_F769NI don't work properly.

For some reason, this issue doesn't affect Nucleo_F429ZI.

Toolchain(s) (name and version) displaying this defect ?

GCC for Arm (gcc-arm-none-eabi-9-2019-q4-major)

What version of Mbed-os are you using (tag or sha) ?

d1ae0d570ceac567849881dd75639c57e93de05a

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

mbed-cli 1.2.2

How is this defect reproduced ?

Build a simple Wi-SUN network consisting of a Border router (nanostack-border-router) and a Router (mbed-os-example-mesh-minimal). mbed-mesh-test-application can also be used to configure a device as a BR or Router.

mbed-mesh-test-application commands for Border router:
ifconfig --extension Wi-Sun
ifconfig mesh0 --mode brouter
ifup

mbed-mesh-test-application commands for Router:
ifconfig --extension Wi-Sun
ifup

Wait until the Router joins the network. This issue is expected to prevent it from joining.

Status: closed

Reported by JarkkoPaso


All 21 comments

The PR that includes the SHA referenced above: https://github.com/ARMmbed/mbed-os/pull/12425 (Chrono update)

For some reason, this issue doesn't affect Nucleo_F429ZI.

One question raised: if this is the only target that works, and it is tickless, how do the others differ? Is this related to deep sleep locking or not?

cc @kjbracey-arm @ARMmbed/mbed-os-core

0xc0170 on 5 May 2020

Nucleo_F429ZI is the only non-tickless target in the list, from my reading of targets.json.

kjbracey-arm on 5 May 2020
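One quick way to confirm which of the affected builds are tickless is to check the macro at compile time. This is a sketch assuming, as for Mbed OS 5/6 targets that enable tickless through targets.json, that the build defines `MBED_TICKLESS`; `built_tickless` is a hypothetical helper name.

```cpp
// Returns true if this translation unit was compiled with MBED_TICKLESS
// defined (assumed to be how the target enables tickless mode).
constexpr bool built_tickless()
{
#if defined(MBED_TICKLESS)
    return true;
#else
    return false;
#endif
}
```

Printing this value at startup on each board would show whether the failing targets (K64F, K66F, Disco_F769NI) differ from Nucleo_F429ZI in this respect.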

To double-check what you've previously told me: the measurement comes purely from GPIO instrumentation of the point of call to Timeout::attach_us and of the routine called by Timeout::attach_us, yes?

That would rule out any factors from RTOS/event queue timing calculations or scheduling, so the cause would have to be the us_ticker-based Timeout calculation, or general system IRQ load, locking or wakeup problems?

kjbracey-arm on 5 May 2020

Yes, the GPIO is toggled in the callback from the FHSS timer driver: https://github.com/ARMmbed/mbed-os/blob/master/features/nanostack/nanostack-hal-mbed-cmsis-rtos/arm_hal_fhss_timer.cpp#L99

Every callback immediately starts a new timeout. No events are used there.

JarkkoPaso on 5 May 2020
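The re-arm pattern described above, where each callback immediately starts a new relative timeout, can be modelled on a host to show how per-callback latency accumulates. This is a simplified host-side model, not Mbed code: it assumes every callback runs a fixed `latency` after its deadline, and `relative_rearm_fire_times` is a hypothetical name. Ordinary latency of this kind would account for microseconds to milliseconds of error, not the hundreds of milliseconds reported.

```cpp
#include <chrono>
#include <vector>

// Host-side model of relative re-arming (not Mbed code).
// The first timeout is armed at t = 0. Each callback is assumed to run a
// fixed `latency` after its deadline and immediately arms the next timeout
// `period` after the time it ran, so every period is stretched by `latency`
// and the phase error grows by `latency` per callback.
std::vector<std::chrono::microseconds>
relative_rearm_fire_times(std::chrono::microseconds period,
                          std::chrono::microseconds latency, int count)
{
    std::vector<std::chrono::microseconds> fires;
    std::chrono::microseconds deadline = period;
    for (int i = 0; i < count; ++i) {
        const std::chrono::microseconds fired = deadline + latency;
        fires.push_back(fired);
        deadline = fired + period; // re-arm relative to "now"
    }
    return fires;
}
```

With a 255 ms period and 100 us of latency, the fourth callback lands 400 us later than the ideal 1020 ms schedule.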

I did look at the FHSS while doing the Chrono work - PR #12903 adapts it to use new APIs, including a Timeout::remaining_time() call I added specifically for it to remove the need for a separate Timer and local start_time and stop_time variables.

Because you start a Timer and leave it running, that effectively locks deep sleep forever, or at least it should. The new version does that explicitly. I'd like you to double-check the current version by putting in the explicit lock call there as well as the Timer start, to make absolutely sure.
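The deep sleep lock is essentially a reference count: deep sleep is only allowed while nobody holds a lock. The sketch below models that behaviour on a host, with an RAII guard that holds the lock for as long as, for example, the FHSS timer driver is active. `DeepSleepLockModel` and `ScopedDeepSleepLock` are hypothetical names; on target the corresponding calls are `sleep_manager_lock_deep_sleep()` and `sleep_manager_unlock_deep_sleep()`, and Mbed OS also provides its own `DeepSleepLock` RAII class.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model (not the real Mbed implementation) of the sleep
// manager's deep sleep lock: a reference count, where deep sleep is only
// permitted while the count is zero.
class DeepSleepLockModel {
public:
    void lock() { ++count_; }
    void unlock()
    {
        assert(count_ > 0); // unbalanced unlock is a bug
        --count_;
    }
    bool deep_sleep_allowed() const { return count_ == 0; }

private:
    std::uint16_t count_ = 0;
};

// RAII guard: holds the lock for its whole lifetime.
class ScopedDeepSleepLock {
public:
    explicit ScopedDeepSleepLock(DeepSleepLockModel &m) : m_(m) { m_.lock(); }
    ~ScopedDeepSleepLock() { m_.unlock(); }
    ScopedDeepSleepLock(const ScopedDeepSleepLock &) = delete;
    ScopedDeepSleepLock &operator=(const ScopedDeepSleepLock &) = delete;

private:
    DeepSleepLockModel &m_;
};
```

Because the count is shared, an extra explicit lock next to the Timer start is harmless: deep sleep stays blocked until every holder has released it.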

I'm struggling to think of any mechanism other than deep sleep wakeup problems that could make you a massive 100 ms late. That's not an interrupt timescale; that's a very bad event loop timescale.

Ultimately, to avoid any sort of drift accumulating, this code could use absolute time. All the framework is now in place: platform_fhss_timer_start_absolute(abs_time) could map onto Timeout::attach_absolute.
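For comparison with relative re-arming, absolute scheduling computes each deadline from the previous deadline rather than from the time the callback ran, so callback latency delays individual callbacks but does not accumulate. This is a host-side model under a fixed-latency assumption; `absolute_rearm_fire_times` is a hypothetical name, and on target the comment suggests the equivalent would be re-arming through `Timeout::attach_absolute`.

```cpp
#include <chrono>
#include <vector>

// Host-side model of absolute scheduling (not Mbed code).
// Deadlines follow a fixed grid t0 + k * period. Each callback is assumed
// to run a fixed `latency` after its deadline, but the next deadline is
// advanced from the schedule, not from the callback time.
std::vector<std::chrono::microseconds>
absolute_rearm_fire_times(std::chrono::microseconds period,
                          std::chrono::microseconds latency, int count)
{
    std::vector<std::chrono::microseconds> fires;
    std::chrono::microseconds deadline = period; // first deadline: t0 + period
    for (int i = 0; i < count; ++i) {
        fires.push_back(deadline + latency);
        deadline += period; // latency does not feed into the next deadline
    }
    return fires;
}
```

With the same 255 ms period and 100 us latency, every callback stays exactly 100 us behind its grid point, however long the system runs.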

kjbracey-arm on 5 May 2020

You've said that turning MBED_TICKLESS off makes K64F pass.

Another thing to try is passing an empty function to Kernel::attach_idle_hook. That will make it do nothing when idle rather than trying to enter full sleep.

If that works, then try passing a function that just does __WFE() to that - the lightest possible sleep.

Both of those make a tickless-built system semi-tickless: it retains the tickless infrastructure but keeps the ticker always running, rather than suspending the OS. That will help narrow it down.

kjbracey-arm on 5 May 2020
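The two idle hook experiments might look roughly like this. It is a sketch, assuming the Mbed build tools define `__MBED__` (so the same file also compiles on a host) and using `rtos::Kernel::attach_idle_hook`, which takes a plain `void (*)(void)`; `IdleExperiment`, `select_idle_hook` and `install_idle_hook` are hypothetical helpers.

```cpp
#if defined(__MBED__)
#include "mbed.h" // rtos::Kernel and the CMSIS __WFE() intrinsic
#endif

// Experiment 1: do nothing when idle, so the idle thread never sleeps.
void idle_hook_empty() {}

// Experiment 2: the lightest possible sleep, wait-for-event.
// On a host build the intrinsic is unavailable, so the body is empty.
void idle_hook_wfe()
{
#if defined(__MBED__)
    __WFE();
#endif
}

enum class IdleExperiment { Empty, Wfe };

using IdleHook = void (*)();

// Picks the hook for the requested experiment.
IdleHook select_idle_hook(IdleExperiment e)
{
    return e == IdleExperiment::Empty ? &idle_hook_empty : &idle_hook_wfe;
}

#if defined(__MBED__)
// Replaces the default idle behaviour (tickless sleep) with the chosen hook.
void install_idle_hook(IdleExperiment e)
{
    rtos::Kernel::attach_idle_hook(select_idle_hook(e));
}
#endif
```

On target, calling install_idle_hook(IdleExperiment::Empty) once at startup, before the network is brought up, runs the first experiment; IdleExperiment::Wfe runs the second.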

I'm afraid that tickless mode may cause timer drift, which is critical for FHSS. Why is tickless mode the default?

juhhei01 on 5 May 2020

It's the default because it saves power, at a latency cost. That latency cost should largely be dispelled by the fact that you've done a Timer->start() or a manual deep sleep lock in init, though. That should stop deep sleep ever being entered at runtime, which in turn makes tickless somewhat pointless. ~~Your systems would probably be better built with it off, if you will always have FHSS active. But if you ever stopped FHSS network operations, it would be different.~~

Edit: I'm going to take that back - there's still a benefit to tickless, even with deep sleep disabled. It stops you waking up from your shallow sleep every millisecond.

But there's clearly a massive regression here that needs to be investigated - those platforms have been tickless for a couple of years. Their performance shouldn't have gotten worse. And this isn't just "getting worse", it's going drastically wrong.

Software timer drift would not be an issue if absolute time was used - there's always a continuous monotonic timebase that can be used to trigger stuff on any strict schedule. But I guess it would still need adjustment for long-term hardware crystal drift.

kjbracey-arm on 5 May 2020
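Absolute scheduling removes software drift, but as noted above the local crystal can still run fast or slow relative to other nodes. A minimal sketch of a fixed correction, assuming the crystal error has been measured in parts per million (positive meaning the local clock runs fast) and that a simple linear correction is good enough; `to_local_ticks` is a hypothetical name.

```cpp
#include <chrono>
#include <cmath>

// Converts an interval in reference ("real") time into local timer time,
// given the local crystal's measured error in parts per million.
// Assumption: ppm > 0 means the local clock runs fast, so it counts
// real * (1 + ppm / 1e6) local microseconds over that real interval.
std::chrono::microseconds to_local_ticks(std::chrono::microseconds real,
                                         double ppm)
{
    const double local_us =
        static_cast<double>(real.count()) * (1.0 + ppm / 1e6);
    return std::chrono::microseconds(std::llround(local_us)); // round to nearest us
}
```

For example, with a +50 ppm crystal a 1 s absolute step would be programmed as 1,000,050 us of local time.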


@kjbracey-arm Actually K66F, but yes, turning off MBED_TICKLESS made it work.

The next test was to call sleep_manager_lock_deep_sleep() in the FHSS timer driver init after timer->start(), but it didn't help.

JarkkoPaso on 5 May 2020

Thanks - keep giving me info. Remaining requests are the idle hook suggestions, and a further Git bisection.

Still thinking it through, but I haven't got anything yet.

kjbracey-arm on 5 May 2020

OK, I tested the idle hook with an empty function and with a __WFE() call. Both seem to fix the timer issue.

JarkkoPaso on 5 May 2020


Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/MBOTRIAGE-2656

ciarmcom on 5 May 2020

Hi,
From my side, I noticed that tests-mbed_drivers-lp_timeout has been failing on the test case 'Timing drift (attach)' since #12425 was merged (several targets).

jeromecoutant on 6 May 2020

@jeromecoutant I'll browse the nightly tests now to check (how come we haven't seen it in PR testing? Will check).

0xc0170 on 6 May 2020

how come we haven't seen it in PR testing

Maybe you have the SKIP_TIME_DRIFT_TESTS macro defined in CI...

jeromecoutant on 6 May 2020
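If the drift cases are guarded by a compile-time switch, defining it in CI silently removes them from the build, so a regression can never fail the run. The sketch assumes the guard takes the form `#if !defined(SKIP_TIME_DRIFT_TESTS)`; `compiled_timeout_cases` and the placeholder case name are hypothetical, and only 'Timing drift (attach)' comes from this thread.

```cpp
#include <string>
#include <vector>

// Returns the names of the test cases that were actually compiled in.
// If CI defines SKIP_TIME_DRIFT_TESTS, the drift case disappears from the
// build entirely rather than showing up as a failure.
std::vector<std::string> compiled_timeout_cases()
{
    // Stand-in for the cases that are always built (hypothetical name).
    std::vector<std::string> cases = {"non-drift cases (placeholder)"};
#if !defined(SKIP_TIME_DRIFT_TESTS)
    cases.push_back("Timing drift (attach)");
#endif
    return cases;
}
```

Logging the compiled case list in CI would make it obvious when drift coverage has been switched off.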

I don't see them in the nightly results. @jeromecoutant, can you add details on how to reproduce? It would be great to have an easy-to-reproduce test case.

0xc0170 on 6 May 2020

I see, that could be it, but I would expect these time drift tests to run at least once in a while :/

@kjbracey-arm can you reproduce the TESTS/mbed_drivers/lp_timeout/main.cpp test locally?

0xc0170 on 6 May 2020

Only have a K64F here - I'll look. @jeromecoutant - do you have a list of platforms you have/haven't seen fails on? Any patterns, eg TICKLESS?

kjbracey-arm on 6 May 2020

@jeromecoutant - do you have a list of platforms you have/haven't seen fails on? Any patterns, eg TICKLESS?

Easy! All platforms, all toolchains.

jeromecoutant on 6 May 2020

You mean all ST platforms, I assume? Not tested any others?

kjbracey-arm on 6 May 2020

(If it turns out CI has not been testing timing stuff, I'm going to be a bit grumpy)

kjbracey-arm on 6 May 2020

