Mbed-os: MCUXpresso exporter - bad assembler flags can cause HardFault on Thread exit

Created on 18 Apr 2018 · 31 Comments · Source: ARMmbed/mbed-os

Description

  • Type: Bug
  • Priority: Minor | Major

(EDIT: Original title: "exiting thread after printf() causes HardFault".)


Bug

Target
LPC4088 (EA QuickStart Board)

Toolchain:
GCC_ARM

Toolchain version:
MCUXpresso IDE v10.1.1 [Build 606] [2018-01-02]

mbed-cli version:
1.5.0

mbed-os sha:
f9ee4e849 (tag: mbed-os-5.8.2, origin/mbed-os-5.8, mbed-os-5.8) Merge pull request #6579 from ARMmbed/release-candidate

DAPLink version:

Expected behavior

Thread doesn't cause HardFault on exit. Can call join() on an mbed::Thread before or after it exits.

Actual behavior

Exiting thread causes HardFault.

I suspect it has something to do with the printf() not finishing before the thread exits.

Steps to reproduce

#include "mbed.h"

static UARTSerial SerialPort(USBTX, USBRX, 115200);

// redirect stdio to this serial port
FileHandle *mbed::mbed_override_console(int fd)
{
   return &SerialPort;
}

static void bgThreadTask()
{
   while (true)
   {
      osEvent ose = Thread::signal_wait(0);

      if (ose.value.v == 1)
      {
         printf("\r\nSignal received!\r\n");
         break;
      }
   }
};

int main()
{
   Thread bgThread;

   Callback<void()> bgThreadCB(bgThreadTask);

   bgThread.start(bgThreadCB);
   bgThread.signal_set(1);

   bgThread.join();

   // indicator LED if we get this far
   DigitalOut led2(LED2, 0); // on

   while (true)
   {
      sleep();
   }
}
IOTOSM-2116 DONE nxp mirrored bug

All 31 comments

I'm currently working around the issue by using a Semaphore and Thread::terminate() instead of Thread::join().

(That is, Semaphore::release() then while (true) { sleep(); } in the thread; Semaphore::wait() then Thread::terminate() in main().)
[Mirrored to Jira]

@bmcdonnell-ionx which instruction is causing the hardfault, can you trace it back ? Any more info for the fault?

@bulislaw Can you please review?
[Mirrored to Jira]

  • mbed new mbed-test-thread
  • Paste the code exactly from above into main.cpp
  • mbed config target LPC4088
  • mbed export -i mcuxpresso
  • Build in MCUXpresso, load, run → HardFault
  • Run crash_log_parser

main.cpp.txt (renamed b/c GitHub restricts file types)

crash-log.txt

Debug.zip

$ crash_log_parser.py crash-log.txt mbed-test-thread.axf mbed-test-thread.map

Crash Info:
        Crash location = thread_switch_helper [0x00004594] (based on PC value)
        Caller location = SVC_ContextSwitch [0x0000144D] (based on LR value)
        Stack Pointer at the time of crash = [1000FFB0]
        Target and Fault Info:
                Processor Arch: ARM-V7M or above
                Processor Variant: C24
                Forced exception, a fault with configurable priority has been escalated to HardFault
                Imprecise data access error has occurred

[Mirrored to Jira]
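For readers unfamiliar with the "Target and Fault Info" lines: they correspond to bits in the Cortex-M fault status registers. The following is a minimal sketch of that mapping, not the parser's actual implementation. The struct and function names are made up for illustration; the bit positions are the ARMv7-M ones (FORCED is bit 30 of HFSR; in CFSR, PRECISERR is bit 9, IMPRECISERR is bit 10 and BFARVALID is bit 15).

```cpp
#include <cstdint>

// Hypothetical helper type, for illustration only.
struct BusFaultInfo {
    bool forced;      // HFSR.FORCED: a configurable-priority fault escalated to HardFault
    bool precise;     // CFSR.PRECISERR: the stacked PC identifies the faulting instruction
    bool imprecise;   // CFSR.IMPRECISERR: the stacked PC may be past the faulting access
    bool bfar_valid;  // CFSR.BFARVALID: BFAR holds the faulting address
};

// Decode the bus-fault related bits from raw HFSR and CFSR values.
constexpr BusFaultInfo decode_bus_fault(std::uint32_t hfsr, std::uint32_t cfsr)
{
    return BusFaultInfo{
        (hfsr & (1u << 30)) != 0,
        (cfsr & (1u << 9)) != 0,
        (cfsr & (1u << 10)) != 0,
        (cfsr & (1u << 15)) != 0,
    };
}
```

With an imprecise fault, the reported PC is only a hint, so the crash location printed above should be read with that in mind.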

Don't see how the serial print is directly relevant - if the printf has returned, everything "thread-specific" in the UARTSerial/printf chain has completed. The text will now be in a static buffer inside the UARTSerial object and being pumped out over interrupt, but that doesn't care about threads.

How much of the text do you get on the output? None?

[Mirrored to Jira]

With this build, all I see is the S (so \r\nS).
[Mirrored to Jira]

Am I right in thinking that sleep() is a no-op in the RTOS? I believe that's the case. If it wasn't, maybe that's triggering some problem - not sure it should be used "raw" like that. Can you try taking it out?

If the crash dissection is accurate, a crash in thread_switch_helper should only be possible if there's a thread switch observer installed, and I don't think there should be.

Perhaps osEventObs has been corrupted?

Could you put something at the end of the hard fault handler to dump out in turn the things accessed by thread_switch_helper: osEventObs, osEventObs->thread_switch, osRtxInfo.thread.run.next and osRtxInfo.thread.run.next->context.

Does the problem go away if the console is left as default (unbuffered serial with platform.stdio-buffered-serial false)?
[Mirrored to Jira]

Am I right in thinking that sleep() is a no-op in the RTOS?

Apparently not.

Assuming my IDE is correctly showing me which code is #defined in/out, for my platform and build, sleep() calls sleep_manager_sleep_auto(), which calls hal_sleep() (which calls __WFI()) or hal_deepsleep() (which calls hal_sleep()).
[Mirrored to Jira]

Could you put something at the end of the hard fault handler to dump out in turn the things accessed by thread_switch_helper: osEventObs, osEventObs->thread_switch, osRtxInfo.thread.run.next and osRtxInfo.thread.run.next->context.

I'm not sure what exactly you're asking for here.

I see:

osEventObs is a pointer to a struct, which contains a version and function pointers. osEventObs->thread_switch is one of those function pointers. You want to know the addresses of the pointers?

...Anyway, since you're more familiar with these internals, could you provide a code snippet I can drop in to provide the info you want?
[Mirrored to Jira]

Does the problem go away if the console is left as default (unbuffered serial with platform.stdio-buffered-serial false)?

No, the problem persists.
[Mirrored to Jira]

Looking at the crash dump, this is an imprecise bus access error. That means the reported crash location may not be the actual crash location. See https://os.mbed.com/docs/v5.8/tutorials/analyzing-mbed-os-crash-dump.html for more info (the bottom section specifically). Is it possible to enable the DISDEFWBUF bit in the Auxiliary Control Register (ACTLR) and re-run the test?
[Mirrored to Jira]
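As a rough illustration of the request above: on Cortex-M3/M4, DISDEFWBUF is bit 1 of ACTLR, and setting it disables write buffering so that bus faults on stores are reported precisely (at some cost in performance). The sketch below only models the bit manipulation on a plain integer so it can run on a host; the helper name is hypothetical, and this is not necessarily what the commit linked in the reply does.

```cpp
#include <cstdint>

// DISDEFWBUF is bit 1 of the Auxiliary Control Register on Cortex-M3/M4.
constexpr std::uint32_t ACTLR_DISDEFWBUF = 1u << 1;

// Hypothetical helper: returns the given ACTLR value with DISDEFWBUF set,
// leaving all other bits unchanged.
constexpr std::uint32_t with_write_buffer_disabled(std::uint32_t actlr)
{
    return actlr | ACTLR_DISDEFWBUF;
}

// On the target, the equivalent CMSIS-Core statement, run early in main(), would be:
//     SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk;
```

This setting is meant for debugging only; it would normally be removed again once the faulting instruction has been identified.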

@SenRamakri

enable the DISDEFWBUF bit in the Auxiliary Control Register (ACTLR) and re-run the test?

Yes.

https://github.com/bmcdonnell-ionx/mbed-os-issue-6661/commit/3d764cd18e9ae785c0a83cb9a8c384fcb0eaeb68

Interestingly, as you can see from the attached PuTTY log, in this case it does print the whole string.

2018-04-20-140531_COM5_putty_log.txt

Crash Info:
        Crash location = SVC_ContextSave [0x00001154] (based on PC value)
        Caller location = __exidx_start [0xFFFFFFED] (based on LR value)
        Stack Pointer at the time of crash = [1000FFB0]
        Target and Fault Info:
                Processor Arch: ARM-V7M or above
                Processor Variant: C24
                Forced exception, a fault with configurable priority has been escalated to HardFault
                A precise data access error has occurred. Faulting address: 00000038

[Mirrored to Jira]

@bmcdonnell-ionx - Thanks for capturing the info. The latest crash dump tells me that the HardFault happened while processing an SVC call, and my guess is that the crash happens while executing the instruction below in SVC_ContextSave. For some reason, the value of R1 at this point is 0, which causes an invalid access.

STR R12,[R1,#TCB_SP_OFS] ; Store SP

A few other things came to my attention. The LPC4088 does have an FPU, and when I checked the build configuration I saw that we enable it in the Mbed build config (targets.json). Surprisingly, when I disassembled the mbed-test-thread.axf provided in your Debug.zip, I didn't see any indication of the FPU being enabled. We need special FPU context save/restore code (enabled by the __FPU_PRESENT flag) while processing SVC calls on an M4F as opposed to an M4, but your build looks as if it was compiled for the non-FPU version. So can you please double check your build options and whether you are building for the right target?

[Mirrored to Jira]
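The link between "R1 becomes 0" and the reported faulting address can be worked through as simple arithmetic. This sketch assumes TCB_SP_OFS is 0x38, which is consistent with the faulting address 00000038 in the crash dump; the function name is made up for illustration.

```cpp
#include <cstdint>

// Assumed offset of the saved stack pointer within the RTX thread control block.
constexpr std::uint32_t TCB_SP_OFS = 0x38;

// Address written by `STR R12,[R1,#TCB_SP_OFS]` for a given value of R1.
constexpr std::uint32_t store_address(std::uint32_t r1)
{
    return r1 + TCB_SP_OFS;
}

// With R1 == 0 (a null thread control block pointer), the store targets
// address 0x38, matching the faulting address reported above.
static_assert(store_address(0u) == 0x38u, "null TCB pointer yields 0x38");
```

In other words, the fault looks like a write through a null thread pointer rather than a corrupted stack pointer.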

please double check your build options and if you are building for right target.

Recall that I'm exporting to NXP MCUXpresso IDE. Here are the "defined symbols" listed in project properties → Tool Settings tab → MCU C++ Compiler → Preprocessor.

__MBED__=1
DEVICE_I2CSLAVE=1
TARGET_LIKE_MBED
TARGET_NXP
DEVICE_PORTINOUT=1
TARGET_RTOS_M4_M7
DEVICE_RTC=1
TOOLCHAIN_object
__CMSIS_RTOS
DEVICE_DEBUG_AWARENESS=1
TOOLCHAIN_GCC
DEVICE_CAN=1
TARGET_CORTEX_M
TARGET_DEBUG
TARGET_LIKE_CORTEX_M4
DEVICE_ANALOGOUT=1
TARGET_M4
TARGET_UVISOR_UNSUPPORTED
DEVICE_PWMOUT=1
DEVICE_INTERRUPTIN=1
TARGET_LPCTarget
TARGET_LPC408X
TARGET_CORTEX
DEVICE_I2C=1
DEVICE_PORTOUT=1
__CORTEX_M4
DEVICE_STDIO_MESSAGES=1
TARGET_MCU_LPC4088
__FPU_PRESENT=1
DEVICE_PORTIN=1
DEVICE_SERIAL=1
TARGET_LPC4088
__MBED_CMSIS_RTOS_CM
DEVICE_SLEEP=1
TOOLCHAIN_GCC_ARM
DEVICE_SPI=1
DEVICE_ETHERNET=1
DEVICE_SPISLAVE=1
DEVICE_ANALOGIN=1
MBED_BUILD_TIMESTAMP=1524240845.35
ARM_MATH_CM4
__NEWLIB__

[Mirrored to Jira]

Thanks for the info, @bmcdonnell-ionx. The above list does have all the right options, including the __FPU_PRESENT I was looking for. I wonder why I don't see the FPU context save code in your build. If you haven't tried it yet, I would suggest doing a clean build. And, if possible, please share the new .axf, .map and crash dump if you are still seeing the crash.
[Mirrored to Jira]

clean build. And ... new .axf, .map and the crash dump if you are still seeing the crash.

https://github.com/bmcdonnell-ionx/mbed-os-issue-6661/commit/3d764cd18e9ae785c0a83cb9a8c384fcb0eaeb68

Built w/ -std=gnu++98.

Debug build (-O0)

mbed-os-issue-6661--3d764cd-dbg.zip

2018-04-21-110004_COM4_putty_log.txt

Crash Info:
        Crash location = SVC_ContextSave [0x00001154] (based on PC value)
        Caller location = __exidx_start [0xFFFFFFED] (based on LR value)
        Stack Pointer at the time of crash = [1000FFB0]
        Target and Fault Info:
                Processor Arch: ARM-V7M or above
                Processor Variant: C24
                Forced exception, a fault with configurable priority has been escalated to HardFault
                A precise data access error has occurred. Faulting address: 00000038

Release build (-Os)

mbed-os-issue-6661--3d764cd-rls.zip

2018-04-21-110721_COM4_putty_log.txt

Crash Info:
        Crash location = SVC_ContextSave [0x00000A08] (based on PC value)
        Caller location = __exidx_start [0xFFFFFFED] (based on LR value)
        Stack Pointer at the time of crash = [1000FFD8]
        Target and Fault Info:
                Processor Arch: ARM-V7M or above
                Processor Variant: C24
                Forced exception, a fault with configurable priority has been escalated to HardFault
                A precise data access error has occurred. Faulting address: 00000038

[Mirrored to Jira]

@bmcdonnell-ionx - Thanks for capturing the data. It confirms that the crashing instruction is the one we inferred earlier. The thing I'm struggling with is that when I compile for LPC4088 using gcc, I get the below for SVC_ContextSave.

SVC_ContextSave
    0x000024fc:    e92c0ff0    ,...    STMDB    r12!,{r4-r11}
    0x00002500:    f01e0f10    ....    TST      lr,#0x10
    0x00002504:    bf08        ..      IT       EQ
    0x00002506:    ed2c8a10    ,...    VSTMDBEQ r12!,{s16-s31}
    0x0000250a:    f8c1c038    ..8.    STR      r12,[r1,#0x38]
    0x0000250e:    f881e022    ..".    STRB     lr,[r1,#0x22]

whereas, when I disassembled the axf you provided I get the below.

SVC_ContextSave
    0x00001150:    e92c0ff0    ,...    STMDB    r12!,{r4-r11}
    0x00001154:    f8c1c038    ..8.    STR      r12,[r1,#0x38]
    0x00001158:    f881e022    ..".    STRB     lr,[r1,#0x22]

See the difference where the FPU context is handled. I still don't know what's causing this difference. Even if it turns out not to be the root cause of the issue, it concerns me that we are potentially running different code from what is expected.
[Mirrored to Jira]
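The three extra instructions in the first listing (TST lr,#0x10 / IT EQ / VSTMDBEQ) test bit 4 of the EXC_RETURN value held in LR and save S16-S31 only when that bit is clear, which is how a Cortex-M4F marks an exception frame that includes floating-point state. A minimal sketch of that check follows; the function name is hypothetical.

```cpp
#include <cstdint>

// True when an EXC_RETURN value indicates an extended (floating-point) stack
// frame. On Cortex-M4F, bit 4 is CLEAR in that case, which is what
// `TST lr,#0x10` followed by `IT EQ` selects.
constexpr bool uses_fp_frame(std::uint32_t exc_return)
{
    return (exc_return & 0x10u) == 0;
}

// 0xFFFFFFED: return to Thread mode on PSP, with FP state.
static_assert(uses_fp_frame(0xFFFFFFEDu), "extended frame");
// 0xFFFFFFFD: return to Thread mode on PSP, without FP state.
static_assert(!uses_fp_frame(0xFFFFFFFDu), "basic frame");
```

The LR value reported in the precise crash dumps above is 0xFFFFFFED. If that value is the EXC_RETURN of the interrupted SVC handler, as it appears to be, then floating-point context was active at the time, which is the case the second listing does not handle.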

I just built with mbed compile -t GCC_ARM, and the crash did not occur. So I guess the problem could be in the exporter, the compiler options, the compiler version (less likely I think), library differences, or MCUXpresso IDE.

Attached are the following from both mbed compile and MCUXpresso:

  • Build logs
  • Build commands for a couple arbitrary files (serial_api.c, UARTSerial.cpp). One each from GCC and G++. The 00 files are as copied from the build log. The 01 files have line breaks inserted before each CLI option, and the 02 files are sorted, for easy diff-ing.

(I used the "Release" build config from MCUXpresso b/c that uses -Os, which is what mbed compile is doing.)

Does this reveal anything to you?

build-log--mbed-compile.txt
build-log--mcuxpresso.txt

build-cmd-serial_api.c--mbed-compile--00.txt
build-cmd-serial_api.c--mbed-compile--01-lines.txt
build-cmd-serial_api.c--mbed-compile--02-sorted.txt
build-cmd-serial_api.c--mcuxpresso--00.txt
build-cmd-serial_api.c--mcuxpresso--01-lines.txt
build-cmd-serial_api.c--mcuxpresso--02-sorted.txt
build-cmd-UARTSerial.cpp--mbed-compile--00.txt
build-cmd-UARTSerial.cpp--mbed-compile--01-lines.txt
build-cmd-UARTSerial.cpp--mbed-compile--02-sorted.txt
build-cmd-UARTSerial.cpp--mcuxpresso--00.txt
build-cmd-UARTSerial.cpp--mcuxpresso--01-lines.txt
build-cmd-UARTSerial.cpp--mcuxpresso--02-sorted.txt

Aside: oddly, the compiler options for the release build in MCUXpresso include both TARGET_DEBUG and NDEBUG.
[Mirrored to Jira]

Doesn't look like there are any compiler differences.

I'm using Windows 10 on my PC. My PATH environment variable includes C:\Program Files (x86)\GNU Tools ARM Embedded\6 2017-q2-update\bin, so I'm guessing that's where mbed compile is getting ARM_GCC. (Please correct me if I'm wrong.)

For MCUXpresso, the folder is C:\nxp\MCUXpressoIDE_10.1.1_606\ide\tools\bin. The contents of this folder are binary identical to the one above.
[Mirrored to Jira]

In your logs, I can see these options being passed to gcc for irq_cm4.S by mbed compile, but not by the IDE export:

-D__CMSIS_RTOS -D__FPU_PRESENT=1 -DARM_MATH_CM4 -D__CORTEX_M4 -D__MBED_CMSIS_RTOS_CM

Assembler is only visibly getting -DNDEBUG -D__NEWLIB__ in the IDE case, unless there's a via file I'm missing.
[Mirrored to Jira]
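The comparison described above (split each build command into options, sort them, and look for options present in one build but not the other) can be sketched in a few lines. This is only an illustration with hypothetical function names. It assumes no option contains spaces, so quoted paths would need real command-line parsing.

```cpp
#include <set>
#include <sstream>
#include <string>

// Split a command line on whitespace into a sorted set of options.
// Assumption: no option contains spaces.
std::set<std::string> split_flags(const std::string& command_line)
{
    std::istringstream in(command_line);
    std::set<std::string> flags;
    for (std::string flag; in >> flag;) {
        flags.insert(flag);
    }
    return flags;
}

// Options present in `reference` but absent from `actual`, in sorted order.
std::set<std::string> missing_flags(const std::string& reference,
                                    const std::string& actual)
{
    const std::set<std::string> have = split_flags(actual);
    std::set<std::string> missing;
    for (const std::string& flag : split_flags(reference)) {
        if (have.count(flag) == 0) {
            missing.insert(flag);
        }
    }
    return missing;
}
```

Calling missing_flags() with the mbed compile assembler command as the reference and the IDE's assembler command as the second argument would list the defines the export dropped, such as -D__FPU_PRESENT=1.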

Indeed, in project properties in the MCUXpresso IDE, "Assembler flags" had -c -DNDEBUG -D__NEWLIB__.

-D__CMSIS_RTOS -D__FPU_PRESENT=1 -DARM_MATH_CM4 -D__CORTEX_M4 -D__MBED_CMSIS_RTOS_CM

And when I add those options manually in the project properties via the IDE, it works!

So why isn't the exporter including them?

Hint: from mbed-os/tools/export/mcuxpresso/.cproject.tmpl, I see that the way that the mcuxpresso exporter looks for compiler options for the C (lines 107-111) and C++ compilers (lines 37-41) differs from how it looks for them for the assembler (lines 165-183).
[Mirrored to Jira]

In your logs, I can see these options being passed to gcc for irq_cm4.S by mbed compile, but not by the IDE export:

-D__CMSIS_RTOS -D__FPU_PRESENT=1 -DARM_MATH_CM4 -D__CORTEX_M4 -D__MBED_CMSIS_RTOS_CM

Yep, that explains it. Thanks @kjbracey-arm

@bmcdonnell-ionx - Hopefully this fixes your issue, since you have the root cause figured out as well. Do you need more help on this?

[Mirrored to Jira]

@SenRamakri,

Do you need more on this?

Yes.

you have root cause figured out

I think the solution is to patch the Mbed OS MCUXpresso exporter. I don't know what the fix is.
[Mirrored to Jira]

Adding @theotherjimmy and @cmonr to get more info on MCUXpresso exporters.
[Mirrored to Jira]

@ARMmbed/team-nxp Please review https://github.com/ARMmbed/mbed-os/issues/6661#issuecomment-383586747
[Mirrored to Jira]

@0xc0170, @theotherjimmy, @cmonr - bump.
[Mirrored to Jira]

Wasn't this already addressed @ARMmbed/team-nxp ?
[Mirrored to Jira]

Internal Jira reference: https://jira.arm.com/browse/IOTPART-6502

@mmahadevan108 Please review https://github.com/ARMmbed/mbed-os/issues/6661#issuecomment-383586747
Please help identify a fix to the exporters as soon as possible. I will be happy to help set up a call to discuss.

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/IOTOSM-2116

I am closing this now as the tools were frozen and the new tools will be released.
