Mbed-os: Lp_ticker problem

Created on 29 Aug 2019  路  7Comments  路  Source: ARMmbed/mbed-os

Description

DISCO_L475VG_IOT01A failed mbed_drivers-lp_ticker on nightly CI so I have looked into its lp_ticker implementation.
https://github.com/ARMmbed/mbed-os/blob/8ef742a49c1682f9ef3ba50148b871e38c3866cc/targets/TARGET_STM/lp_ticker.c#L204-L319

It is not protected for the situation where lp_cmpok is set to false, so the lp_delayed_prog is set and the timestamp saved in lp_delayed_counter will be sent to lp_ticker_set_interrupt function via LPTIM1_IRQHandler. This timestamp could get out of date while waiting for CMPOK flag. (I mean the value of timestamp would be less than a value got by lp_ticker_read()) In that situation lp_ticker_set_interrupt function would see this timestamp as an event that has to be handled after turning the clock counter and will set next interrupt to 0xFFFF rather than handling the event right then, or setting an interrupt for after LP_TIMER_SAFE_GUARD time. This could lead to some critical errors.

This implementation is also vulnerable to timestamps close to ticker roll over.

To reproduce I've changed the code of test_multi_ticker to this:

void test_multi_ticker(void)
{
    for(uint64_t k = 0; k<10000; k++)
    {
        ThisThread::sleep_for(750);

        LowPowerTicker ticker[TICKER_COUNT];
        const uint32_t extra_wait = 2; // extra 2ms wait time

        multi_counter = 0;

        for (int i = 0; i < TICKER_COUNT; i++) {
            ticker[i].attach_us(callback(increment_multi_counter), MULTI_TICKER_TIME_MS * 1000);
        }

        ThisThread::sleep_for(MULTI_TICKER_TIME_MS + extra_wait);

        TEST_ASSERT_EQUAL(TICKER_COUNT, multi_counter);

        for (int i = 0; i < TICKER_COUNT; i++) {
            ticker[i].detach();
        }
    }
}

and changed MULTI_TICKER_TIME_MS to 10 to reduce testing time. It should fail like 1 in 1000 times.

I'll probably raise a PR with a proposition of fix soon.

We should consider creating something like "golden stress tests" for crusial low level peripherals which would be run on targets from golden list over night or even manualy after changing implementation.

@maciejbocianski @mprse @fkjagodzinski @jamesbeyond @LMESTM

Issue request type


[ ] Question
[ ] Enhancement
[x] Bug

CLOSED mirrored bug

Most helpful comment

The problem is when we have many tickers set to close values. In the test scenario, we have 16 tickers which are set in the loop to fire at time NOW + 100ms. The common ticker layer operates on us and low-level driver on lp ticker ticks (lp ticker operates at 32 kHz). To simplify the problem I will focus on lp ticker ticks.
Lest assume that we have set 5 tickers to fire at NOW + 100 ticks. The ordered lp ticker event list may look like this (note that time is passing while further ticker events are being added to the list):

Ticker       Lp ticker count [tick]        Scheduled interrupt [tick]
1              0                                      100
2              0                                      100
3              1                                      101
4              1                                      101
5              2                                      102

Let's assume that 100 ticks have elapsed and Ticker 1 interrupt fired. The ticker common layer interrupt handler will go through the list and serve all expired events from the list. This means that the first two events are served and the next interrupt is scheduled to tick 101 (note that at this moment current tick is 100).
The low-level ticker wrapper on STM platforms should add 5 ticks (because next interrupt is too close - hardware limitation), so the interrupt will be scheduled to 106th tick.
But while setting the interrupt the Lp ticker counter increased to 101 and this was caught by the common layer, so interrupt has been fired immediately (by means of fire_interrupt() call). The next 2 tickers were served (3 and 4) by the interrupt handler and the next one is now scheduled to 102nd tick (current tick is 101).
Now we have already set the interrupt at 100th tick, so at 101 tick we are waiting for CMPOK and the 102 value to be set is buffered (note that last scheduled tick is now 106). Lest assume that CMPOK interrupt fires 3 ticks after writing to CMP reg. This means we have CMPOK interrupt at 103 tick and we want to set next interrupt to 102 (in the past). So the low-level ticker wrapper will write 0xFFFF to the compare reg and we're done.

All 7 comments

It would be good to post here debug failure logs with the description what has happened.

This is the data collected from lp_ticker_set_interrupt function. timestamp is the value sent to it, current tick is current tick measured by lp_ticker_read(), and third column is the value sent by __HAL_LPTIM_COMPARE_SET or stored in lp_delayed_counter

[1567155957.69][CONN][RXD] timestamp[0]:45062    current_tick:44744   IRQ_set_for_tick:45062
[1567155957.75][CONN][RXD] timestamp[1]:45065    current_tick:45064   IRQ_set_for_tick:45070
[1567155957.81][CONN][RXD] timestamp[2]:45066    current_tick:45065   timestamp_set_in_buffor:45066
[1567155957.88][CONN][RXD] timestamp[3]:45066    current_tick:45067   IRQ_set_for_tick:65535
[1567155957.94][CONN][RXD] timestamp[4]:498      current_tick:272     IRQ_set_for_tick:498
[1567155957.99][CONN][RXD] timestamp[5]:295      current_tick:273     timestamp_set_in_buffor:295

In row [0] at tick 44744 next interrupt is set to tick 45062 :white_check_mark:
In row [1] at tick 45064 next interrupt is set to tick 45070 because timestamp (45065) is too close to current tick :white_check_mark:
in row [2] at tick 45065 it wants to set next interrupt to 45066, but lp_cmpok is currenty set to false so it stores the value in buffor and sets lp_delayed_prog flag to true, so it can be set later :white_check_mark:
in row [3] at tick 45067 it received timestamp 45066 via LPTIM1_IRQHandler so it read it as time in past and set the next interrupt to the end of ticker counter (0xFFFF) :x:

The timestamp got outdated while waiting for lp_cmpok and was not handled correctly.

cc @ARMmbed/team-st-mcd

hi - the description is clear and I acknowledge that the problem is probably there.
Thanks for the detailed analysis ...
@Tharazi97 you mentioned a possible upcoming PR - do you confirm ?
ready for review

@LMESTM I confirm. I'll test my solution on other targets and after that I will rise a PR

The problem is when we have many tickers set to close values. In the test scenario, we have 16 tickers which are set in the loop to fire at time NOW + 100ms. The common ticker layer operates on us and low-level driver on lp ticker ticks (lp ticker operates at 32 kHz). To simplify the problem I will focus on lp ticker ticks.
Lest assume that we have set 5 tickers to fire at NOW + 100 ticks. The ordered lp ticker event list may look like this (note that time is passing while further ticker events are being added to the list):

Ticker       Lp ticker count [tick]        Scheduled interrupt [tick]
1              0                                      100
2              0                                      100
3              1                                      101
4              1                                      101
5              2                                      102

Let's assume that 100 ticks have elapsed and Ticker 1 interrupt fired. The ticker common layer interrupt handler will go through the list and serve all expired events from the list. This means that the first two events are served and the next interrupt is scheduled to tick 101 (note that at this moment current tick is 100).
The low-level ticker wrapper on STM platforms should add 5 ticks (because next interrupt is too close - hardware limitation), so the interrupt will be scheduled to 106th tick.
But while setting the interrupt the Lp ticker counter increased to 101 and this was caught by the common layer, so interrupt has been fired immediately (by means of fire_interrupt() call). The next 2 tickers were served (3 and 4) by the interrupt handler and the next one is now scheduled to 102nd tick (current tick is 101).
Now we have already set the interrupt at 100th tick, so at 101 tick we are waiting for CMPOK and the 102 value to be set is buffered (note that last scheduled tick is now 106). Lest assume that CMPOK interrupt fires 3 ticks after writing to CMP reg. This means we have CMPOK interrupt at 103 tick and we want to set next interrupt to 102 (in the past). So the low-level ticker wrapper will write 0xFFFF to the compare reg and we're done.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ashok-rao picture ashok-rao  路  4Comments

ccchang12 picture ccchang12  路  4Comments

chrissnow picture chrissnow  路  4Comments

MarceloSalazar picture MarceloSalazar  路  3Comments

DuyTrandeLion picture DuyTrandeLion  路  3Comments