Nightly CI test fails on DISCO-L475VG-IOT01A on lp_timeout test only when compiled on IAR.
It started to fail after this commit: 9858b161d6199de25139914fea0cd5b41fa357a8
[1569184789.85][CONN][RXD] >>> Running case #1: 'Callback called once (attach)'...
[1569184789.85][CONN][INF] found KV pair in stream: {{__testcase_name;1 s delay during deepsleep (attach_us)}}, queued...
mbedgt: :85::FAIL: Expected TRUE Was FALSE
[1569184789.95][CONN][RXD] :85::FAIL: Expected TRUE Was FALSE
[1569184789.95][CONN][INF] found KV pair in stream: {{__testcase_start;Callback called once (attach)}}, queued...
[1569184790.06][CONN][INF] found KV pair in stream: {{__testcase_finish;Callback called once (attach);0;1}}, queued...
[1569184790.16][CONN][RXD] >>> 'Callback called once (attach)': 0 passed, 1 failed with reason 'Assertion Failed'
@maciejbocianski @jamesbeyond @LMESTM
[ ] Question
[ ] Enhancement
[x] Bug
cc @ARMmbed/team-st-mcd
@Tharazi97 since the commit 9858b16 the target will automatically enter Deep sleep when there is nothing to do. I don't think that the lp_timeout test cases takes this into account.
When deep sleep is entered, it will induce latencies of several ms which are not considered in the test cases. By adding some more margin to the tests cases, I can run the tests okay. An alternative way forward could be that deep sleep needs to be disabled.
Snapshot of the quick diffs I made to have the test pass OK:
$ git diff TESTS/mbed_drivers/timeout/timeout_tests.h
diff --git a/TESTS/mbed_drivers/timeout/timeout_tests.h b/TESTS/mbed_drivers/timeout/timeout_tests.h
index 534901c1fb..e676b33fb4 100644
--- a/TESTS/mbed_drivers/timeout/timeout_tests.h
+++ b/TESTS/mbed_drivers/timeout/timeout_tests.h
@@ -81,10 +81,10 @@ void test_single_call(void)
bool acquired = sem.try_acquire();
TEST_ASSERT_FALSE(acquired);
- acquired = sem.try_acquire_for(TEST_DELAY_MS + 2);
+ acquired = sem.try_acquire_for(TEST_DELAY_MS + 5);
TEST_ASSERT_TRUE(acquired);
- acquired = sem.try_acquire_for(TEST_DELAY_MS + 2);
+ acquired = sem.try_acquire_for(TEST_DELAY_MS + 5);
TEST_ASSERT_FALSE(acquired);
timeout.detach();
@@ -147,7 +147,7 @@ void test_override(void)
TEST_ASSERT_FALSE(acquired);
timeout.attach_callback(mbed::callback(sem_callback, &sem2), 2.0f * TEST_DELAY_US);
- acquired = sem2.try_acquire_for(2 * TEST_DELAY_MS + 2);
+ acquired = sem2.try_acquire_for(2 * TEST_DELAY_MS + 6);
TEST_ASSERT_TRUE(acquired);
acquired = sem1.try_acquire();
TEST_ASSERT_FALSE(acquired);
Team in charge of tests could propose the best way forward ...
3 ms delay is quite a big change in case of 10 ms delay. I think disabling deep sleep is a good way to go.
@Tharazi97
The deep sleep latency is currently estimated to 3ms.
The System Timer takes this into account and program wake-up time in advance to ensure the precision of the system.
Now I don't think that the Low Power ticker or timeout drivers have implemented a similar safety mechanism, which mean that they will not cope with the deep sleep latency and risk of being unprecise ... but maybe I'm worng here, maybe @kjbracey-arm can correct me if I'm wrong ....
Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1804
@LMESTM what's the deep sleep wake up time for this target (I recall some targets have it defined as 1-3ms, we set this to 10ms to be sufficient for a target to wake up and app should not expect anything lower than that).
We are testing lp ticker in this test case, should it test both - deep sleep and also shallow sleep? If deep sleep involved, timeout (default is 10 + 2 = 12ms - might not be enough for some). I can't find the time reference in the power management header file at the moment (https://os.mbed.com/docs/mbed-os/v5.13/apis/power-management-sleep.html - does not provide this detail).
@0xc0170 I think that the exit time is about 3ms as defined in "deep-sleep-latency", that was the value last time I made measurements. Neverthless I need to increase the test delay to TEST_DELAY_MS + 5 be able to PASS the test. TEST_DELAY_MS + 4 wouldn't work.
I don't know how the semaphore timeouts are managed ...
The HAL specification for sleep permits wakeup time from deep sleep to be up to 10ms.
So if testing the lpticker API, if the system is being allowed to enter deep sleep, then extra 10ms delays on the timer wakeup are to be expected.
If you're only intending to test the lpticker itself, and its IRQ generation accuracy, and you do not want the potential added delay of deep sleep wake interfering with that, then the test could lock deep sleep. That seems the thing to do here.
A broader test might try checking whether the delay exceeds the specified target.deep_sleep_latency. Although that would be also revealed by a test of rtos::ThisThread::sleep_for or similar - that should enter deep sleep but compensate for the latency so that you get woken at the correct time. (If the target has specified a latency - default is 0, meaning no compensation).
I don't know how the semaphore timeouts are managed ...
I can see in that test_single_call there is going to be some juggling of the HAL timer programming. The manual LowPowerTimeout is going to be setting a timer for start + 10ms, and the semaphore wait is going to be wanting start + 12ms - latency.
If latency > 2, that means a couple of back-to-back programmings happening on the way into the sleep for the semaphore wait.
Still, when the timers go off, the semaphore acquire should not return until either the sem_callback has been run (meaning the HAL timer indicates time > start + 10ms) or the RTOS thinks time is up (meaning the HAL timer indicates time > start + 12ms).
It's a struggle to see how you could get !acquired on the semaphore read. If the OS thinks the 12ms is up, surely the sem_callback must have run?
I've walked through the code, and think I can see a failure mode.
Consider the sequence:
sem_callback and RTX timeout seem simultaneoussem_callback runs, releases semaphoreisrRtxSemaphoreRelease)svcKernelResume)osRtxThreadDelayTick)So there's an effective race if due to wake latency issues, we see the 10ms IRQ and the 12ms wake simultaneously; the wake makes the semaphore return timeout, despite there being a semaphore available - the semaphore "release" is not fully processed until the next time we return to a thread, and at that point we're already returning "timeout".
I'm thinking that could potentially be called an RTX bug, although I'm not sure exactly how it might be resolved.
However, why does that situation arise here? What's the declared latency? How long might we be getting stuck inside timer reprogramming?
I guess that failure I describe above happens reliably if declared latency is 0, so no RTOS compensation, but the hardware actually takes >2ms to wake, so we program wake for "start + 10" (for the LowPowerTicker), and we only actually wake at "start + 13".
So if testing the
lptickerAPI, if the system is being allowed to enter deep sleep, then extra 10ms delays on the timer wakeup are to be expected.
That might be argued don't you think ? lpticker driver could be aware of deep-sleep latency and compensate for it, it could even lock deep-sleep in case the next ticker in the pipe is too close ... ?
or maybe not ... I'm not sure whether this makes sense or not - this is really an open question.
Yes, you could certainly put a pile of intelligence into the HAL driver, but I think it would ultimately make the middle layer's job more confusing. Even before considering variance in HAL implementations.
I would be open to maybe moving latency compensation down one layer into the mbed_ticker_api.c layer, which would make LowPowerTicker get it.
In this test though, there's apparently something weird happening with the semaphores and RTX, together with an apparent failure of the RTOS latency compensation - if that was working we should be out of deep sleep before the 12ms is up, so that the race I think I've deduced shouldn't occur.
Yes, you could certainly put a pile of intelligence into the HAL driver, but I think it would ultimately make the middle layer's job more confusing. Even before considering variance in HAL implementations.
I thought the variance would be managed through deep-sleep-latency
I would be open to maybe moving latency compensation down one layer into the
mbed_ticker_api.clayer, which would makeLowPowerTickerget it.
That's I guess the idea here ... If we're not going that way, the impact of deep sleep latency needs to be described in the LP Timer/Ticker/timeout APIs as I think it's not enough to document the power management / sleep part. As of now, users may program a LP timeout to fire after 150碌s and will end up with a 3ms delay or so ...
In this test though, there's apparently something weird happening with the semaphores and RTX, together with an apparent failure of the RTOS latency compensation - if that was working we should be out of deep sleep before the 12ms is up, so that the race I think I've deduced shouldn't occur.
I agree there seems to be something to figure out. I fully trust your analysis as I'm not familiar with RTX :-)
the impact of deep sleep latency needs to be described in the LP Timer/Ticker/timeout APIs as I think it's not enough to document the power management / sleep part. As of now, users may program a LP timeout to fire after 150碌s and will end up with a 3ms delay or so ...
I agree - the 3ms delay actually impacts many IRQs you could register with attach() (the ones that don't implicitly lock deep sleep) but the discrepancy is kind of more obvious with LowPowerTimeout.
I agree there seems to be something to figure out. I fully trust your analysis as I'm not familiar with RTX
What I'd be interested in from your end is why is the latency compensation not sufficient here? If it was, I think we should handle those semaphores fine. Is the JSON underestimating? Or is there more to it?
Aside from that, you could experiment - try making the RTX modification I'm proposing in the linked issue. Just add a call to osRtxPendSV_Handler() to the top of svcRtxKernelResume, just before incrementing the tick count.
@kjbracey-arm ok I take the point, I need to put in place a setup with debug / measurements ... but can't do it this week unfortunately.
@LMESTM any progress on this?
@TuomoHautamaki @kjbracey-arm I'm affraid I could not make any measurement yet (and can't in coming days). I agree this can be helpful but I don't think it would not bring the solution to this issue.
I understand that @kjbracey-arm pointed to possible issues around RTX or semaphores and proposed a few steps to test in https://github.com/ARMmbed/mbed-os/issues/11545#issuecomment-534528736, could someone from mbed-os team look at that ?
@TuomoHautamaki I just tried reproducing the issue again on my current working tree, master branch from Oct 23, and the test doesn't fail - are you still seeing the issue ?
@LMESTM I can see it's still failing but only on IAR
@TuomoHautamaki my mistake I ran timeout tests instead of lp_timeout tests ... I can still reproduce as well
@kjbracey-arm @TuomoHautamaki I found some time to make measurements and fount out that the total wake-up latency is actually around 3,7ms, this time is used on this board for restoring the complete clock tree: LSE, MSI clocks and PLLs. This may be possible to optimize, but for sake of simplicity and maintenance we go through all the clock settings scheme again.
I can increase the deep-sleep latency for this board to 4ms, which actually makes the tests PASS as far as I've seen. That doesn't explain everything, the test makes use of 2ms DELTA so it should be okay with the 1ms extra latency, but this is not the case.
Nevertheless the low power ticker fires 2ms later than expected and the below point is still valid :
the impact of deep sleep latency needs to be described in the LP Timer/Ticker/timeout APIs as I think it's not enough to document the power management / sleep part. As of now, users may program a LP timeout to fire after 150碌s and will end up with a 3ms delay or so ...
I agree - the 3ms delay actually impacts many IRQs you could register with attach() (the ones that don't implicitly lock deep sleep) but the discrepancy is kind of more obvious with LowPowerTimeout.
If that 3.7ms is a hardware measurement, maybe it doesn't include extra overhead for reading the timer, etc. (Embedded in SysTimer::update_and_get_tick()).
It might be worth (temporarily) adding a test assert in do_timed_sleep_relative that sleep_finish - sleep_start <= wake_delay. I think that if that ever triggered, it would be because the latency compensation is insufficient.
If that 3.7ms is a hardware measurement, maybe it doesn't include extra overhead for reading the timer, etc. (Embedded in
SysTimer::update_and_get_tick()).
The 3,7ms is the time it takes to mostly restore clocks, so this is the SW sequence of the deep sleep exit, from the HW interrupt (end of WFI) to the point where the interrupt handler is called. I'm not sure about your next statement.
Anyway, I still think that something is wrong with the LP timeout and the test.
The test is supposed to verify that the lp_timeout handler when programmed to fire after 10ms will be called within 10 to 12ms max. But in case of deep sleep, the handler will only be called after 13ms to 14ms as the timeout driver does not consider the deep sleep latency. And the test fails to detect it.
The only reason why the test is passed is that the system is woken-up early by the semaphore itself (12ms max - deep sleep latency) before the 10ms has expired. But the semaphore is supposed to be used as a test environment, not as the reason for waking-up the system.
Closed via #11767
You're right, the test is flawed, because it doesn't at all consider latency on the low power timer IRQ.
My description above was that when considering application timing correctness for things scheduled by the RTOS (eg ThisThread::sleep_until(XXX)), we need to also incorporate the time taken to get back to the application, not just the time taken to get out to interrupt handler after wake.
So the extra work versus your measurement is:
SysTimer::handler) and returnSysTimer::wake_time_passed, which looks at bool set by interrupt handler),osKernelResume(elapsed time)That latency number is currently used for RTOS scheduling, so notionally should incorporate those steps.
I note that the slowest step there is "read actual current time from HW". Conceptually, we could skip it if we see that wake_time_passed is true - assume that we did wake in time, so the time must be exactly the scheduled wake time. (As opposed to an early wake from another interrupt). Being "optimistic" about wake delays and assuming we're never late might make part of the problem go away?
@kjbracey-arm
Most of the wake-up tme is spent wiaiting for HW clocks (in particular PLLs) to be ready and stable
I've measured a few timings on NUCLEO_L073RZ which a pretty slow target, here they are:
So that's up to extra 0,3ms on a slow target for the full handling back to the main thread.
But the callback that was attached to fire after a given time seems only to be delayed by an extra 0.226碌s in this case compared to the "HW latency".
How shall we proceed with the test flaw and the fact that lp_ticker and related drivers do not take into account the deep-sleep-latency ?
FYI - what is looks like on NUCLEO_L073RZ when there is no semaphore to wake-up the target. (To do so, I extended the semaphore timeout to 20ms in the test ...)

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/IOTOSM-2271
Most helpful comment
The HAL specification for sleep permits wakeup time from deep sleep to be up to 10ms.
So if testing the
lptickerAPI, if the system is being allowed to enter deep sleep, then extra 10ms delays on the timer wakeup are to be expected.If you're only intending to test the
lptickeritself, and its IRQ generation accuracy, and you do not want the potential added delay of deep sleep wake interfering with that, then the test could lock deep sleep. That seems the thing to do here.A broader test might try checking whether the delay exceeds the specified
target.deep_sleep_latency. Although that would be also revealed by a test ofrtos::ThisThread::sleep_foror similar - that should enter deep sleep but compensate for the latency so that you get woken at the correct time. (If the target has specified a latency - default is 0, meaning no compensation).