Priority: Blocker
Compiling the device management client example for the Nucleo F429ZI with the debug profile and the GCC_ARM compiler reliably results in a crash:
++ MbedOS Fault Handler ++
FaultType: HardFault
Context:
R0 : 00000000
R1 : 00000001
R2 : E000ED00
R3 : 08084F5F
R4 : 00000000
R5 : 00000000
R6 : 00000000
R7 : 00000000
R8 : 00000000
R9 : 00000000
R10 : 00000000
R11 : 00000000
R12 : 00000000
SP : 20012F68
LR : 0808D2DF
PC : 0808D2E0
xPSR : 61000000
PSP : 20012F48
MSP : 2002FFC0
CPUID: 410FC241
HFSR : 40000000
MMFSR: 00000000
BFSR : 00000000
UFSR : 00000008
DFSR : 00000000
AFSR : 00000000
Mode : Thread
Priv : Privileged
Stack: PSP
-- MbedOS Fault Handler --
++ MbedOS Error Info ++
Error Status: 0x80FF013D Code: 317 Module: 255
Error Message: Fault exception
Location: 0x808D2E0
Error Value: 0x200001B0
Current Thread: rtx_idle Id: 0x200129C8 Entry: 0x8085E7D StackSize: 0x280 StackMem: 0x20012D10 SP: 0x20012F68
For more info, visit: https://mbed.com/s/error?error=0x80FF013D&tgt=NUCLEO_F429ZI
-- MbedOS Error Info --
Two workarounds allow the target to boot up correctly:
Disabling sleep in the HAL code:
diff --git a/targets/TARGET_STM/TARGET_STM32F4/device/stm32f4xx_hal_pwr.c b/targets/TARGET_STM/TARGET_STM32F4/device/stm32f4xx_hal_pwr.c
index dffb78e25c..e699fd9fff 100644
--- a/targets/TARGET_STM/TARGET_STM32F4/device/stm32f4xx_hal_pwr.c
+++ b/targets/TARGET_STM/TARGET_STM32F4/device/stm32f4xx_hal_pwr.c
@@ -391,7 +391,7 @@ void HAL_PWR_EnterSLEEPMode(uint32_t Regulator, uint8_t SLEEPEntry)
if(SLEEPEntry == PWR_SLEEPENTRY_WFI)
{
/* Request Wait For Interrupt */
- __WFI();
+ __NOP();
}
else
{
or changing the flash cache settings at application start, at the beginning of main:
FLASH->ACR = 0x405;
FLASH->ACR = 0xC05;
or
FLASH->ACR = 0x405;
FLASH->ACR = 0x705;
Nucleo F429ZI
tested with both LWIP and WISUN configurations.
gcc-arm-none-eabi-9-2019-q4-major
This _does not_ reproduce with ARMC6 compiler.
Mbed OS 5.15.0
Mbed CLI 1.10.2
mbed import https://github.com/armmbed/mbed-cloud-client-example (4.2.1 version with Mbed OS 5.15.0).
mbed compile -m NUCLEO_F429ZI --profile debug
Crashes immediately on application start.
Also originally verified with internal test-tool which failed the same way.
@evedon @bulislaw can we get someone to help investigate this? Teemu has said this is a blocker for client team...
I've been working with Teemu - I think we need @ARMmbed/team-st-mcd input.
Symptoms are consistent with the flash cache being corrupted during sleep. When it occurs and doesn't occur isn't clear, but we're currently thinking certain alignments are probably a factor.
In 2 failing cases we've failed with PC = 0x......80 and 0x......E0 - same 32n+0 alignment - where the code looks like
xxxxxx7A BL HAL_PWR_EnterSLEEPMode
xxxxxx7E B backwards branch <-- apparently didn't take this branch
xxxxxx80 DCD not_an_instruction <-- crashes trying to execute this
HAL_PWR_EnterSLEEPMode finishes by executing
WFI
BX LR
I would assume the BX LR is in the CPU's pipeline, so isn't fetched from the I-bus after wake-up, but the code at 7E/DE after returning would be the first thing fetched.
@kjbracey-arm So it seems playing around with the FLASH->ACR values helps to work around the issue. Alternatively, adding enough __NOP() before the __WFI() also seems to help, so this could be an alignment issue as you suggested.
Crashing: 80568ea: bf30 wfi (default application)
Working: 8056902: bf30 wfi (disabling flash cache)
Working: 8056902: bf30 wfi (disable and enable)
Working: 8056900: bf30 wfi (adding 11 __NOP() before __WFI() ).
Internal Jira reference: https://jira.arm.com/browse/MBOTRIAGE-2515
playing around with the FLASH->ACR values help to workaround the issue
Do you have evidence of that? Can you make it work at the ea alignment with a different ACR value? Those results just show it always working at 00 or 02.
And it seems the address of the BL may not be relevant, only the WFI?
@ARMmbed/team-st-mcd can you please review and comment on this issue asap?
Have you seen this in other cases or know where the problem might be coming from?
The current workaround is to disable sleep as commented by @teetak01 .
Maybe issue could be raised in https://github.com/STMicroelectronics/STM32CubeF4
I came across this thread when facing a similar issue. I am using Mbed OS 5.15.0 with IAR EWB 8.42.1. It doesn't always crash, but when it does, it cannot be recovered easily. Previously, I just erased the whole flash. Now, @teetak01's method works great for me.
A typical error message is:
01-29 13:26:14 UART-RX DEBUG logStr=FaultType: HardFault
01-29 13:26:14 UART-RX DEBUG logStr=
01-29 13:26:14 UART-RX DEBUG logStr=Context:
01-29 13:26:14 UART-RX DEBUG logStr=R0 : E000ED10
01-29 13:26:14 UART-RX DEBUG logStr=R1 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=R2 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=R3 : 00000001
01-29 13:26:14 UART-RX DEBUG logStr=R4 : 200004F0
01-29 13:26:14 UART-RX DEBUG logStr=R5 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=R6 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=R7 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=R8 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=R9 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=R10 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=R11 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=R12 : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=SP : 2000CD64
01-29 13:26:14 UART-RX DEBUG logStr=LR : 0811ACEF
01-29 13:26:14 UART-RX DEBUG logStr=PC : 0811ACF6
01-29 13:26:14 UART-RX DEBUG logStr=xPSR : 61000200
01-29 13:26:14 UART-RX DEBUG logStr=PSP : 2000CD40
01-29 13:26:14 UART-RX DEBUG logStr=MSP : 20011050
01-29 13:26:14 UART-RX DEBUG logStr=CPUID: 410FC241
01-29 13:26:14 UART-RX DEBUG logStr=HFSR : 40000000
01-29 13:26:14 UART-RX DEBUG logStr=MMFSR: 00000000
01-29 13:26:14 UART-RX DEBUG logStr=BFSR : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=UFSR : 00000001
01-29 13:26:14 UART-RX DEBUG logStr=DFSR : 0000000B
01-29 13:26:14 UART-RX DEBUG logStr=AFSR : 00000000
01-29 13:26:14 UART-RX DEBUG logStr=Mode : Thread
01-29 13:26:14 UART-RX DEBUG logStr=Priv : Privileged
01-29 13:26:14 UART-RX DEBUG logStr=Stack: PSP
01-29 13:26:14 UART-RX DEBUG logStr=
01-29 13:26:14 UART-RX DEBUG logStr=-- MbedOS Fault Handler --
01-29 13:26:14 UART-RX DEBUG logStr=
01-29 13:26:14 UART-RX DEBUG logStr=
01-29 13:26:14 UART-RX DEBUG logStr=
01-29 13:26:14 UART-RX DEBUG logStr=++ MbedOS Error Info ++
01-29 13:26:24 UART-RX DEBUG logStr=Error Status: 0x80FF013D Code: 317 Module: 255
01-29 13:26:24 UART-RX DEBUG logStr=Error Message: Fault exception
01-29 13:26:24 UART-RX DEBUG logStr=Location: 0x811ACF6
01-29 13:26:24 UART-RX DEBUG logStr=Error Value: 0x2000F73C
01-29 13:26:24 UART-RX DEBUG logStr=Current Thread: rtx_idle Id: 0x2000FB1C Entry: 0x80FB945 StackSize: 0x280 StackMem: 0x2000CAF8 SP: 0x2000CD64
01-29 13:26:24 UART-RX DEBUG logStr=For more info, visit: https://mbed.com/s/error?error=0x80FF013D&tgt=UBLOX_EVK_ODIN_W2
01-29 13:26:24 UART-RX DEBUG logStr=-- MbedOS Error Info --
Maybe issue could be raised in https://github.com/STMicroelectronics/STM32CubeF4
@jeromecoutant Was this created there? Just checking if this can be fixed in near future.
@se7ensong Thanks for the report. Would you be able to share how to reproduce the issue? Your app might do different things but end with the same result as this issue, so it would be good to have the steps to reproduce it. Do you have a code snippet that would allow us to reproduce it locally?
@0xc0170 , I will try to get a minimal example for you as I cannot use the exact code I am currently working on. FYI, I am running this on ODIN-EVK-UBLOX-W2 and will try 5.15.0 instead of 5.15.1.
@ARMmbed/team-st-mcd has there been any update? Was this shared with the Cube4 team ?
Hi
How is this defect reproduced ?
mbed import https://github.com/armmbed/mbed-cloud-client-example (4.2.1 version with Mbed OS 5.15.0).
mbed compile -m NUCLEO_F429ZI --profile debug
Crashes immediately on application start.
I am sorry but I couldn't reproduce the crash...
mbed import https://github.com/armmbed/mbed-cloud-client-example -v
cd mbed-cloud-client-example
<update mbed_cloud_dev_credentials.c>
mbed compile -t GCC_ARM -m NUCLEO_F429ZI -v --profile debug -f
Hi @jeromecoutant,
Thanks for helping with the investigation; we are currently looking at this internally. Here are our findings so far:
But based on the team's observations, there seems to be another crash, which occurs with a develop profile build and happens randomly after the target has booted up.
It is not conclusive whether these 2 crashes have the same cause. We haven't been able to reproduce the 2nd type of crash reliably.
Hi,
I don't have too many details to disclose, but we are experiencing a very similar issue on our STM32L4 target. A few observations:
Please raise an issue in pyOCD
Any updates on this issue?
@se7ensong and @chopbo did you use pyocd when flashing?
@TuomoHautamaki, no, I use IAR. I still see the issue sometimes, but I cannot reproduce it 100% of the time yet.
Just answering on @chopbo behalf, as we work on the same project. We don't use pyOCD but flash via openOCD.
@jamesbeyond have we made any progress since your last comment?
Thanks everyone for the reports, useful to have multiple records - to see the scope of this (multiple toolchain/debug tools and targets).
Please raise an issue in pyOCD
openOCD and IAR also show this, so either all 3 share the same bug or, rather, the bug is in this codebase.
I am now trying to run my application on UBLOX_EVK_ODIN_W2 without the above workarounds. The error is not 100% reproducible, but once it has occurred, a reset does not clear it.
The latest finding is that I can get it working (without reprogramming etc.) by the following steps:
So far, these steps work for me 100%.
Thanks for the info @se7ensong. With further investigation we believe we found the cause:
It doesn't relate to flash, but reset is a trigger.
When pyOCD or another debug tool performs a reset of the target, it doesn't clear the DBGMCU_CR register. Sometimes, if the flashed image is built with the debug profile, it deliberately sets the DBGMCU_CR register to 0x7.
When this register is set, especially the DBG_SLEEP bit, the target crashes at the WFI instruction in sleep mode. For details, please see STM32F4 errata chapter 2.1.3 - Debugging Sleep/Stop mode with WFE/WFI entry.
I tried our image with DBGMCU_CR manually cleared, and the crash is gone.
Also, based on the errata, two other conditions must be met to see the crash:
(0x080xx_xxx4)
Maybe these add some of the randomness in when we see the crash.
BTW, we tried the workaround of adding 3x NOP after WFI, and that solution seems to work for us. If you can confirm whether it works for you or not, that would be great.
Thank you @jamesbeyond! I have now modified HAL_PWR_EnterSLEEPMode in "stm32f4xx_hal_pwr.c". I will let you know if the bug happens again.
Awesome work @jamesbeyond.
I have just tried to clear the DBGMCU_CR register as the first thing in our main (On a STM32L4 board), and now our application works with -Os and -flto.
I will try with the three NOP also.
Three times NOP in HAL_PWR_EnterSLEEPMode also works here.
Thank you @jamesbeyond! I have now modified HAL_PWR_EnterSLEEPMode in "stm32f4xx_hal_pwr.c". I will let you know if the bug happens again.
So far, during my development, it hasn't happened once. Thank you so much for the fix!
ST_INTERNAL_REF 83447
The errata also specifies "if the application software disables the Prefetch queue". Have we done that? (Are the conditions all supposed to be "and"?)
Fix is now on master, I'll close this as resolved
PR that fixes this: https://github.com/ARMmbed/mbed-os/pull/12717 (for tracing purposes). It should make Mbed OS 5.15.2.