Hi,
The zephyr/samples/boards/nrf52/mesh/onoff_level_lighting_vnd_app app from the latest master branch
works normally with Zephyr v1.12.99 (last commit ba6763a187a347cfc825a2bece78e7d1ef28772d).
But with the latest master branch, or anything from v1.13 onward, I intermittently get an MPU FAULT.
If we set "LIGHT_CTL_TT" in publisher.c and configure the buttons to publish Light CTL Set (acknowledged) messages, the fault is easy to trigger by playing with the on-board buttons on nRF52840_PDK boards.
Have you verified that this is not because of some too small thread stack? You should check all threads, but the two most likely culprits are the Bluetooth RX thread and the system workqueue thread. It's not really worth investigating this further until stack overflow has been excluded as a potential cause.
1)
CONFIG_MAIN_STACK_SIZE: raised from 512 to 1024
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE: raised from 2048 to 4096
Even after that I still get the MPU FAULT.
2)
If stack size were the issue, it should also reproduce on v1.12.99, but it does not.
3)
There are some MPU-related commits after ba6763a187a347cfc825a2bece78e7d1ef28772d which may be the cause.
4)
Maybe there is a bug in the app itself, but then how does it work with v1.12.99?
There are many aspects of the system that can cause increased stack consumption. You didn't mention the Bluetooth RX stack size. Why not? I'd recommend setting both it and the system workqueue to 4k. The Kconfig option for the RX stack is CONFIG_BT_RX_STACK_SIZE. It'd also be good to get the exact consumption numbers for all threads, for which you'll need to enable CONFIG_INIT_STACKS and CONFIG_THREAD_STACK_INFO.
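For readers unfamiliar with how per-thread consumption numbers of the form "unused N usage M / SIZE (P %)" are typically obtained: the common technique is stack painting, where the stack is filled with a known byte before the thread runs and the untouched bytes are counted later. The following is a host-side sketch of that idea only, not Zephyr's implementation; the fill byte and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill pattern for this sketch (assumed value, chosen for illustration). */
#define STACK_FILL_BYTE 0xAAu

/* Paint the whole stack buffer before the thread starts running. */
static void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/*
 * Count the bytes that still hold the fill pattern, scanning upward from
 * the lowest address. On a descending stack the lowest addresses are the
 * last to be written, so this count is the "unused" figure.
 */
static size_t stack_unused(const uint8_t *stack, size_t size)
{
    size_t unused = 0;

    while (unused < size && stack[unused] == STACK_FILL_BYTE) {
        unused++;
    }
    return unused;
}

/* Integer percentage of the stack that has been touched (rounded down). */
static unsigned int stack_usage_percent(size_t unused, size_t size)
{
    assert(size > 0 && unused <= size);
    return (unsigned int)(((size - unused) * 100u) / size);
}
```

One limitation of this approach is that it reports a high-water mark: a frame that reserves stack space without writing to it, or that happens to write the fill byte, can make the usage look smaller than it really was.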
Now I've set
CONFIG_INIT_STACKS=y
CONFIG_THREAD_STACK_INFO=y
CONFIG_MAIN_STACK_SIZE=2048
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096
For the rest of the configuration, please refer to
https://github.com/vikrant8051/zephyr/blob/fix_bugs10/samples/boards/nrf52/mesh/onoff_level_lighting_vnd_app/prj.conf
After that I got the following log:
prio recv thread stack (real size 448): unused 336 usage 112 / 448 (25 %)
recv thread stack (real size 4096): unused 3920 usage 176 / 4096 (4 %)
prio recv thread stack (real size 448): unused 152 usage 296 / 448 (66 %)
recv thread stack (real size 4096): unused 3844 usage 252 / 4096 (6 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 0320
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
power-> 100, color-> 10
power-> 100, color-> 20
power-> 100, color-> 30
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 19a0
Target CTL Lightness = 4000
Target CTL Temperature = 0320
Remaining Time = 45
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
power-> 92, color-> 27
power-> 0, color-> 27
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
power-> 100, color-> 27
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 1760
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 1760
Target CTL Lightness = 4000
Target CTL Temperature = 0320
Remaining Time = 45
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
prio recv thread stack (real size 448): unused 152 usage 296 / 448 (66 %)
recv thread stack (real size 4096): unused 3844 usage 252 / 4096 (6 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
power-> 92, color-> 24
power-> 0, color-> 24
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
power-> 100, color-> 24
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 155a
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 155a
Target CTL Lightness = 4000
Target CTL Temperature = 0320
Remaining Time = 45
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
power-> 92, color-> 21
power-> 0, color-> 21
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = 0000
Present CTL Temperature = 1388
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
power-> 100, color-> 21
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
***** MPU FAULT *****
Instruction Access Violation
***** Hardware exception *****
Current thread ID = 0x2000188c
Faulting instruction address = 0x20001ca0
Fatal fault in ISR! Spinning...
Is it because of the adv stack?
How can I increase its size?
@vikrant8051 this does look suspicious.
https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/bluetooth/host/mesh/adv.c#L53
Can you try increasing that one?
I simply copied this app from v1.13 into v1.12.99.
There too I got
adv stack (real size 512): unused 48 usage 464 / 512 (90 %)
But that doesn't cause any MPU FAULT.
@carlescufi I will increase it & re-check.
@carlescufi I set it to 1024, but it had no effect.
Remaining Time = 45
adv stack (real size 1024): unused 560 usage 464 / 1024 (45 %)
adv stack (real size 1024): unused 560 usage 464 / 1024 (45 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = 0000
Present CTL Temperature = 32fe
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 1024): unused 560 usage 464 / 1024 (45 %)
power-> 100, color-> 63
adv stack (real size 1024): unused 560 usage 464 / 1024 (45 %)
MPU FAULT
Instruction Access Violation
Hardware exception
Current thread ID = 0x2000188c
Faulting instruction address = 0x20001ca0
Fatal fault in ISR! Spinning...
@vikrant8051 @carlescufi
Faulting instruction address = 0x20001ca0
It looks like we are trying to execute from SRAM.
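This observation can be checked against the fixed ARMv7-M address map, in which 0x00000000-0x1FFFFFFF is the Code region (where flash normally resides) and 0x20000000-0x3FFFFFFF is the SRAM region. The sketch below classifies an address accordingly; the enum and function names are hypothetical and are not part of any Zephyr or CMSIS API.

```c
#include <assert.h>
#include <stdint.h>

/* Top-level regions of the ARMv7-M default memory map. */
enum cm_region {
    CM_CODE,        /* 0x00000000 - 0x1FFFFFFF */
    CM_SRAM,        /* 0x20000000 - 0x3FFFFFFF */
    CM_PERIPHERAL,  /* 0x40000000 - 0x5FFFFFFF */
    CM_EXT_RAM,     /* 0x60000000 - 0x9FFFFFFF */
    CM_EXT_DEVICE,  /* 0xA0000000 - 0xDFFFFFFF */
    CM_SYSTEM       /* 0xE0000000 - 0xFFFFFFFF */
};

/* Map an address to the region that contains it. */
static enum cm_region cm_region_of(uint32_t addr)
{
    if (addr < 0x20000000u) return CM_CODE;
    if (addr < 0x40000000u) return CM_SRAM;
    if (addr < 0x60000000u) return CM_PERIPHERAL;
    if (addr < 0xA0000000u) return CM_EXT_RAM;
    if (addr < 0xE0000000u) return CM_EXT_DEVICE;
    return CM_SYSTEM;
}

/* Human-readable name for a region, e.g. for printing next to a fault dump. */
static const char *cm_region_name(enum cm_region region)
{
    static const char *const names[] = {
        "Code", "SRAM", "Peripheral", "External RAM",
        "External device", "System"
    };

    assert((unsigned int)region < sizeof(names) / sizeof(names[0]));
    return names[region];
}
```

With the value from the dump, `cm_region_of(0x20001ca0)` yields `CM_SRAM`, whereas an ordinary code address such as the 0x1d9ec reported later in this thread yields `CM_CODE`. A program counter in SRAM is consistent with a corrupted return address or function pointer, though the dump alone does not prove which.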
Hello @ioannisg,
Is this due to a bug in the app itself?
I have no idea.
All I see by inspecting the fault dump is that:
@jhedberg @carlescufi some debugging, here, might be needed, IMHO.
@vikrant8051 could you enable CONFIG_MPU_STACK_GUARD and see if you get a stack overflow?
After enabling CONFIG_MPU_STACK_GUARD=y, I get the following fault on reset:
Booting Zephyr OS zephyr-v1.13.0-152-g6770919e7
power-> 100, color-> 0
Initializing...
Bluetooth initialized
MPU FAULT
Stacking error
Data Access Violation
MMFAR Address: 0x20003538
Hardware exception
Current thread ID = 0x200006ac
Faulting instruction address = 0x1d9ec
Fatal fault in thread 0x200006ac! Aborting.
Mesh initialized
ecc stack (real size 1024): unused 120 usage 904 / 1024 (88 %)
After some testing with the on-board buttons, I get the following log on the terminal (the buttons suddenly stop publishing):
power-> 100, color-> 18
power-> 0, color-> 18
power-> 100, color-> 18
bt_mesh_model_publish: err: -55
prio recv thread stack (real size 448): unused 120 usage 328 / 448 (73 %)
recv thread stack (real size 4096): unused 3768 usage 328 / 4096 (8 %)
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
prio recv thread stack (real size 448): unused 120 usage 328 / 448 (73 %)
recv thread stack (real size 4096): unused 3768 usage 328 / 4096 (8 %)
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
prio recv thread stack (real size 448): unused 120 usage 328 / 448 (73 %)
recv thread stack (real size 4096)
@vikrant8051 could you please try this for us:
git revert d8d5ec3f913e69b8b2d3bf46692da818567402d9
And run the test again.
@vikrant8051
Could you try to bisect?
$ git bisect start
$ git bisect good <commit sha that you know works>
$ git bisect bad HEAD
Then test each revision that Git checks out for you and type:
$ git bisect good    # if it works
$ git bisect bad     # if it doesn't work (you get an MPU fault)
until Git tells you which commit is responsible for the error.
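The bisect procedure above is, in essence, a binary search for the first bad commit between a known-good and a known-bad revision. The sketch below shows that search in isolation, assuming a strictly linear history indexed from oldest to newest; real Git history is a graph and Git's own algorithm is more involved. The function names and the `bad_from_threshold` stand-in (which replaces the manual "build, flash, press buttons, watch for an MPU fault" step) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Verdict callback: returns true if the commit at `index` shows the fault. */
typedef bool (*commit_is_bad_fn)(size_t index, void *ctx);

/*
 * Binary search over a linear history. `good` must be the index of a commit
 * known to work and `bad` the index of a later commit known to fail.
 * Returns the index of the first failing commit.
 */
static size_t bisect_first_bad(size_t good, size_t bad,
                               commit_is_bad_fn is_bad, void *ctx)
{
    assert(good < bad);

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx)) {
            bad = mid;   /* first bad commit is at mid or earlier */
        } else {
            good = mid;  /* first bad commit is after mid */
        }
    }
    return bad;
}

/* Stand-in tester: every commit at or after *ctx is treated as bad. */
static bool bad_from_threshold(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}
```

Because the search trusts every verdict, an intermittent fault like the one in this thread needs enough testing per revision before a commit is marked good; a single wrong "good" sends the search to the wrong half of the history.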
@carlescufi
After running the following command to revert d8d5ec3f913e69b8b2d3bf46692da818567402d9, I no longer see the MPU FAULT.
git revert d8d5ec3f913e69b8b2d3bf46692da818567402d9
Hooray, the bug has finally been found.
@vikrant8051 thank you for testing this.
@andyross seems like #9620 introduces MPU faults for some users
CC @nashif
Will investigate. Pretty sure the handling is correct now, but the problem was subtle and I might have messed something up...
There is already a possible fix in a PR.
@nashif @carlescufi @andyross
After merging https://github.com/zephyrproject-rtos/zephyr/pull/9724 into my latest local master branch, I no longer see the MPU FAULT while testing.