Mbed-os: STM32L0xx: (tickless using lp_ticker) Becomes unresponsive after indeterminate time

Created on 22 Mar 2018  路  26Comments  路  Source: ARMmbed/mbed-os

Description

  • Type: Bug
  • Priority: Major

Bug

Target
STM32L072, possibly more

mbed-os sha:
5.7.7 16bac101a6b7b4724023dcf86ece1548e3a23cbf

Issue
We are experiencing some problems where the device becomes unresponsive after an indeterminate time. This can be anywhere between several minutes and a couple of hours.

The configuration we are using is the following:

// Configuration parameters
#define MBED_CONF_APP_THREAD_STACK_SIZE                   1024        
#define MBED_CONF_EVENTS_PRESENT                          1           
#define MBED_CONF_TARGET_LOWPOWERTIMER_LPTIM              0           
#define MBED_CONF_APP_MAIN_STACK_SIZE                     2048        
#define MBED_CONF_PLATFORM_STDIO_BAUD_RATE                115200      
#define CLOCK_SOURCE                                      USE_PLL_HSI 
#define MBED_CONF_EVENTS_SHARED_HIGHPRIO_STACKSIZE        1024        
#define MBED_CONF_PLATFORM_DEFAULT_SERIAL_BAUD_RATE       9600        
#define MBED_CONF_RTOS_PRESENT                            1           
#define MBED_CONF_EVENTS_SHARED_DISPATCH_FROM_APPLICATION 1           
#define MBED_CONF_EVENTS_SHARED_EVENTSIZE                 1024        
#define MBED_CONF_EVENTS_SHARED_STACKSIZE                 2048        
#define MBED_CONF_EVENTS_USE_LOWPOWER_TIMER_TICKER        1           
#define MBED_CONF_TARGET_LSE_AVAILABLE                    1           
#define MBED_CONF_EVENTS_SHARED_HIGHPRIO_EVENTSIZE        512         

// Macros
#define MBED_TICKLESS                                                 
#define NO_SYSTICK                                                    

After some testing using some gpio pins I have found out that the processor hangs in a continuous loop of the following functions:

rtc_irq
    lp_ticker_irq_handler()
        irq_handler() default -> ticker_irq_handler
            ticker_irq_handler()
                first loop
                    Set systick irq
                second loop
                    schedule_interrupt()
                        rtc_set_wake_up_timer()
exit_irq

systick_irq
    SysTick_Handler()
        osRtxTick_Handler()
            OS_Tick_AcknowledgeIRQ()
                os_timer->schedule_tick() with default delta = 1
                    insert_absolute()
                        ticker_insert_event_us()
                            update_present_time()
                            schedule_interrupt()
                                fire_interrupt()
exit_irq

The rtc_irq sets the systick_irq request and vice versa.

When the processor hangs it seems the in the os_timer->_tick is out of date. Whereas normally it is in sync with the lp_data->queue->present_time. This causes the interrupt to be scheduled in the systick irq handler to fire immediately because the event is requested in the past. The synchronization should happen in the critical section of the default_idle_hook but this seems to have stopped for some reason.

I will continue to investigate the problem but there is an issue here.

closed_in_jira st mirrored bug

Most helpful comment

We have been running 5.9.3 which includes #7365 for a while and I haven't encountered this problem since so this can be closed.

All 26 comments

A potential cause might be a ticker value that is smaller than the previous:

ticker_event_queue_t *queue = ticker->queue;
uint32_t ticker_time = ticker->interface->read();
if (ticker_time == ticker->queue->tick_last_read) {
    // No work to do
    return;
}

uint64_t elapsed_ticks = (ticker_time - queue->tick_last_read) & queue->bitmask;
queue->tick_last_read = ticker_time;

If ticker_time reads smaller than that what is stored in tick_last_read for whatever reason the result is a small negative number cast to a uint becoming a very large number which is then added to the present_time.

Edit:
After some more testing it does occur that the ticker_time is smaller than the tick_last_read. It seems the ticker_time is exactly 1s - 1us earlier than the tick_last_read. This might be an issue with the rtc synchronization.

In the datasheet it says we must wait for the RSF flag to be set to verify that the shadow registers have been set.

Edit2:
My previous edit seems to be false. It sometimes is 1s - 1us earlier but it can also be less than 1s off.

I now run into a breakpoint if tick_last_read is bigger than ticker_time. If i then compare the timestructs I have saved after reading out the rtc twice they are equal.

@c1728p9 Can you review please?

cc @ARMmbed/team-st-mcd

I have now bypassed the shadow registers and followed the instructions in the datasheet (reading twice and verifying the data) and this seems to solve the issues I am having. I will let this run for the weekend to see if it is indeed a suitable solution.

After a weekend of running two devices: one is still running without problems, one is stuck in the same problem as we had before.

Most likely it can still happen that the read time is earlier than the stored previous time. Might be an implementation issue on my side. I will see if I can improve on that.

Otherwise it might be an idea to make the update_present_time function a bit more robust to prevent this issue also.

@marcemmers thanks for sharing your inputs and findings. One question : why are you changing the default STM32L072 configuration like this;

define MBED_CONF_TARGET_LOWPOWERTIMER_LPTIM 0

Is there a specific reason for that ?

@LMESTM Yes there is, we are using the LPTIM as a PWM output for driving some leds during sleep. Hence we couldn't use it also as base clock for the OS.

@marcemmers that's interesting input - thanks. And what low power modes are you entering ? Would like to understand if you're using the LP TICKER as a very periodic wake-up source (which period ?) or more like an RTC alarm for waking-up after a rather long sleep period

@LMESTM We are entering stop mode via the mbed sleep api.

In the configuration we are using the operating system is using the wakeup event from the RTC to go into sleep until the next event needs to be activated. I am not actively controlling the RTC for this. The rtc date/time is used as the base time from which the us time is calculated.

I did not have to change anything except using the previous configuration.

@marcemmers
would you try out the commit here and let me know if this helps:
LMESTM/mbed@bbc01b8

@LMESTM I am sorry, I missed your last comment. I will try this in a few days. We have been running using bypassed shadow registers for a while now without random crashes.

There is however another problem that has come up by using the RTC as the main timer. The RTC function rtc_read_us() returns a uint32_t. A uint32_t can only contain 4294 seconds in us accuracy. This causes the result to overflow several times each day which is fine.

However, at the end of the day this does become a problem. 23:59:59 translates to 499154100us or 0x01DC07CB4 where the next second will result in 0us. This causes the same issues in the mbed_ticker_api layer as discussed before in:

ticker_event_queue_t *queue = ticker->queue;
uint32_t ticker_time = ticker->interface->read();
if (ticker_time == ticker->queue->tick_last_read) {
    // No work to do
    return;
}

uint64_t elapsed_ticks = (ticker_time - queue->tick_last_read) & queue->bitmask;
queue->tick_last_read = ticker_time;

The way I see it there are 2 possible fixes for this issue:

  • Also use the date struct to calculate the time in the rtc_read_us function.
  • Keep an internal timer in the rtc_api file that can be incremented with how much time has passed. This also fixes the issue that the RTC time cannot be set while using the rtc_read_us as a timebase because setting the time will mess up the readout.

@marcemmers note that I've pushed a PR to master which includes similar fix as I pointer in previous comment. You'd rather rebase on master to get the latest fix. Anyway from your analysis, this looks like a different matter than the one I fixed

@LMESTM Yeah I figured. I will make a PR for the fix of the shadow registers, we can then discuss if the fix i have made is acceptable.

Should i make an other issue for the RTC overflow problem or should we continue here?
@0xc0170 ?

Should i make an other issue for the RTC overflow problem or should we continue here?
@0xc0170 ?

This issue is still present on https://github.com/ARMmbed/mbed-os/tree/feature-hal-spec-rtc ?

cc @c1728p9 you might be interested in the above report

This is already based on the changes in commit 9fb865ae576f47aa924e445a800c1113f623a435 that is present in the feature branch and already merged to master.

There are no other changes to the rtc_read_us part so it seems the issue would still be present.

Edit:
A cleaner solution might be to calculate the time using _rtc_maketime() and then multiply it to create a us time in the rtc_read_us function.

Hi
Interested in the results with the latest implementation.
Thx

I am testing the newest feature-hal-spec-ticker now with #6788. I also had to undo the changes made in #6561 because my serial didn't work anymore.

what is the solution to this problem

I am testing the newest feature-hal-spec-ticker now

Any good news ? :-)

@jeromecoutant It seems to be running stable however I am still a bit wary about the RTC readout problems I have experienced before that don't seem to be fixed in the ticker branch.

ARM Internal Ref: MBOTRIAGE-209

@marcemmers Can you update this with the latest release - 5.9 ? Ticker and also RTC were updated.

Let's check with #7365
Thx

I will see if I can test this in the coming days. Quite busy at the moment.

@marcemmers bump :-)

Thx

We have been running 5.9.3 which includes #7365 for a while and I haven't encountered this problem since so this can be closed.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bulislaw picture bulislaw  路  3Comments

DuyTrandeLion picture DuyTrandeLion  路  3Comments

davidantaki picture davidantaki  路  3Comments

toyowata picture toyowata  路  4Comments

drahnr picture drahnr  路  4Comments