Describe the bug
When uploading an image using mcumgr over BLE, after 'testing' the image and rebooting the target device, the images are correctly swapped, however, the boot flags do not get updated. This causes the image management to end up in an incorrect state and never being able to upload a new again.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
After the succesfull swap in MCUBoot, I expect the boot flag on slot=0 to be active, but not confirmed, and I expect the boot flag on slot=1 to be confirmed.
Impact
Showstopper: I cannot remotely update the device anymore. No new images are accepted anymore.
Screenshots or console output
Images:
image=0 slot=0
version: 1.2.0
bootable: true
flags: active confirmed
hash: 8e2bcefe24213252eeb652037c12edb69f58540376559b67ee71d19e84edcfea
image=0 slot=1
version: 1.2.0
bootable: true
flags:
hash: 14af1ee2fa49366e37a988f64f249ef684b7f5accf968e9bad482706c9f9b361
Split status: N/A (0)
Images:
image=0 slot=0
version: 1.2.0
bootable: true
flags: active confirmed
hash: 8e2bcefe24213252eeb652037c12edb69f58540376559b67ee71d19e84edcfea
image=0 slot=1
version: 1.2.0
bootable: true
flags: pending
hash: 14af1ee2fa49366e37a988f64f249ef684b7f5accf968e9bad482706c9f9b361
Split status: N/A (0)
Images:
image=0 slot=0
version: 1.2.0
bootable: true
flags: active confirmed
hash: 14af1ee2fa49366e37a988f64f249ef684b7f5accf968e9bad482706c9f9b361
image=0 slot=1
version: 1.2.0
bootable: true
flags: pending
hash: 9643caaf5ec60ce187d179bc51a1660c501bd681c2dc5f8923a63ba940874792
Environment (please complete the following information):
Additional context
I am using slot1 on an external flash.
Follow-up from discussion from slack with @andrzej-kaczmarek / @carlescufi.
cc @utzig
Follow-up from discussion from slack with @andrzej-kaczmarek / @carlescufi.
What slack is that?
I did a basic Mynewt test on nrf52840_pca10056 with the secondary slot stored on the QSPI flash and seems to work. This is Mynewt master with MCUBoot v1.3.1.
$ newtmgr -t 60 --conn ttyACM0 image upload bin/targets/nrf52840_slinky/app/apps/slinky/slinky.img
68.05 KiB / 68.05 KiB [========================================] 100.00% 2.24 KiB/s 30s
Done
$ newtmgr -t 60 --conn ttyACM0 image list
Images:
image=0 slot=0
version: 1.0.0
bootable: true
flags: active confirmed
hash: 155de0afc7e000246107a8e145adb014af5f2741884929f7589e65e05b986d4f
image=0 slot=1
version: 1.1.0
bootable: true
flags:
hash: 5d8888d2208ef90ce76422a789a9c971f0378ec924eb7d779ab7b031ebf2c61c
Split status: N/A (0)
$ newtmgr -t 60 --conn ttyACM0 image test 5d8888d2208ef90ce76422a789a9c971f0378ec924eb7d779ab7b031ebf2c61c
Images:
image=0 slot=0
version: 1.0.0
bootable: true
flags: active confirmed
hash: 155de0afc7e000246107a8e145adb014af5f2741884929f7589e65e05b986d4f
image=0 slot=1
version: 1.1.0
bootable: true
flags: pending
hash: 5d8888d2208ef90ce76422a789a9c971f0378ec924eb7d779ab7b031ebf2c61c
Split status: N/A (0)
$ newtmgr -t 60 --conn ttyACM0 reset
Done
$ newtmgr -t 60 --conn ttyACM0 image list
Images:
image=0 slot=0
version: 1.1.0
bootable: true
flags: active
hash: 5d8888d2208ef90ce76422a789a9c971f0378ec924eb7d779ab7b031ebf2c61c
image=0 slot=1
version: 1.0.0
bootable: true
flags: confirmed
hash: 155de0afc7e000246107a8e145adb014af5f2741884929f7589e65e05b986d4f
Split status: N/A (0)
$ newtmgr -t 60 --conn ttyACM0 reset
Done
$ newtmgr -t 60 --conn ttyACM0 image list
Images:
image=0 slot=0
version: 1.0.0
bootable: true
flags: active confirmed
hash: 155de0afc7e000246107a8e145adb014af5f2741884929f7589e65e05b986d4f
image=0 slot=1
version: 1.1.0
bootable: true
flags:
hash: 5d8888d2208ef90ce76422a789a9c971f0378ec924eb7d779ab7b031ebf2c61c
Split status: N/A (0)
$ newt create-image nrf52840_slinky 1.2.0 key_rsa.pem -2
App image successfully generated: /home/utzig/src/mynewt/mynewt-core-clean2/bin/targets/nrf52840_slinky/app/apps/slinky/slinky.img
$ newtmgr -t 60 --conn ttyACM0 image upload bin/targets/nrf52840_slinky/app/apps/slinky/slinky.img
68.05 KiB / 68.05 KiB [=======================================] 100.00% 2.24 KiB/s 30s
Done
$ newtmgr -t 60 --conn ttyACM0 image list
Images:
image=0 slot=0
version: 1.0.0
bootable: true
flags: active confirmed
hash: 155de0afc7e000246107a8e145adb014af5f2741884929f7589e65e05b986d4f
image=0 slot=1
version: 1.2.0
bootable: true
flags:
hash: 9ab75e40feb6c3877485a77f4c9ae290071d0fcda5cbdaae82bf297c978398eb
Split status: N/A (0)
I am gonna try with Zephyr v2.0.0 and seen if it happens for me as well...
Ok, so I tested smp_srv over UART on the nrf52840_pca10056, and mcumgr seems to work correctly.
$ mcumgr --conn ttyACM0 image test 919c129684ee4af39900646bbcc86cc166de03ac6097b31e7015df9a7345eabb
Images:
image=0 slot=0
version: 1.0.0
bootable: true
flags: active confirmed
hash: da9638519a797bd85ff32a5f1bebfa4f6064516e733a73abd53f62f8969cdb8f
image=0 slot=1
version: 1.1.0
bootable: true
flags: pending
hash: 919c129684ee4af39900646bbcc86cc166de03ac6097b31e7015df9a7345eabb
Split status: N/A (0)
$ mcumgr --conn ttyACM0 reset
Done.
$ mcumgr --conn ttyACM0 image list
Images:
image=0 slot=0
version: 1.1.0
bootable: true
flags: active
hash: 919c129684ee4af39900646bbcc86cc166de03ac6097b31e7015df9a7345eabb
image=0 slot=1
version: 1.0.0
bootable: true
flags: confirmed
hash: da9638519a797bd85ff32a5f1bebfa4f6064516e733a73abd53f62f8969cdb8f
Split status: N/A (0)
$ mcumgr --conn ttyACM0 reset
Done
$ mcumgr --conn ttyACM0 image list
Images:
image=0 slot=0
version: 1.0.0
bootable: true
flags: active confirmed
hash: da9638519a797bd85ff32a5f1bebfa4f6064516e733a73abd53f62f8969cdb8f
image=0 slot=1
version: 1.1.0
bootable: true
flags:
hash: 919c129684ee4af39900646bbcc86cc166de03ac6097b31e7015df9a7345eabb
Split status: N/A (0)
Did it twice just to be sure. I have not uploaded my image with image upload, gonna try that as well. Also I am still using only the internal flash, so I'll check how to enable the QSPI flash on Zephyr and test with that.
Not necessary related, but what is the imgtool command line you are using?
image upload seems to work as well.
$ mcumgr --conn ttyACM0 image upload mcumgr2.img
46.25 KiB / 46.25 KiB [===============================================================================] 100.00% 2.13 KiB/s 21s
Done
$ mcumgr --conn ttyACM0 image list
Images:
image=0 slot=0
version: 1.0.0
bootable: true
flags: active confirmed
hash: da9638519a797bd85ff32a5f1bebfa4f6064516e733a73abd53f62f8969cdb8f
image=0 slot=1
version: 1.2.0
bootable: true
flags:
hash: 69a57a9898925acf5069734facb9098ba97e122de7c5822ffddaec99a6836548
Split status: N/A (0)
$ mcumgr --conn ttyACM0 image test 69a57a9898925acf5069734facb9098ba97e122de7c5822ffddaec99a6836548
Images:
image=0 slot=0
version: 1.0.0
bootable: true
flags: active confirmed
hash: da9638519a797bd85ff32a5f1bebfa4f6064516e733a73abd53f62f8969cdb8f
image=0 slot=1
version: 1.2.0
bootable: true
flags: pending
hash: 69a57a9898925acf5069734facb9098ba97e122de7c5822ffddaec99a6836548
Split status: N/A (0)
$ mcumgr --conn ttyACM0 reset
Done
$ mcumgr --conn ttyACM0 image list
Images:
image=0 slot=0
version: 1.2.0
bootable: true
flags: active
hash: 69a57a9898925acf5069734facb9098ba97e122de7c5822ffddaec99a6836548
image=0 slot=1
version: 1.0.0
bootable: true
flags: confirmed
hash: da9638519a797bd85ff32a5f1bebfa4f6064516e733a73abd53f62f8969cdb8f
Split status: N/A (0)
$ mcumgr --conn ttyACM0 reset
Done
$ mcumgr --conn ttyACM0 image list
Images:
image=0 slot=0
version: 1.0.0
bootable: true
flags: active confirmed
hash: da9638519a797bd85ff32a5f1bebfa4f6064516e733a73abd53f62f8969cdb8f
image=0 slot=1
version: 1.2.0
bootable: true
flags:
hash: 69a57a9898925acf5069734facb9098ba97e122de7c5822ffddaec99a6836548
Split status: N/A (0)
Some updates:
1) MCUBoot seems to do the right thing when swapping, and erasing the trailer:
[00:00:07.445,526] <dbg> mcuboot.boot_erase_trailer_sectors: erasing trailer; fa_id=4
[00:00:07.453,765] <dbg> mcuboot.boot_erase_trailer_sectors: slot=1 trailer_sz=816
[00:00:07.461,761] <dbg> mcuboot.boot_erase_trailer_sectors: erasing sector=12, off=0x000c0000, sz=65536
2) The SPI flash is using a block erase command. This device is seen by Zephyr/MCUBoot as having 64K sectors (which is not itself an issue but weird!). The block erase seen above was sent to the correct address and using the correct size but the data was not erased (this can be verified with nrfjprog --readqspi <file>).
3) The following config changes allow it to work correctly:
CONFIG_SPI_NOR_FLASH_LAYOUT_PAGE_SIZE=4096
CONFIG_SPI_NOR_PAGE_SIZE=256
CONFIG_SPI_NOR_SECTOR_SIZE=4096
@nvlsianpu, We assigned this issue to you in the hope that you can give some guidance on whether this is a real problem or not and what priority should be assigned to this item.
@utzig I was just investigating why it is set to 64K: this seems to be the default value in case CONFIG_SPI_NOR_FLASH_LAYOUT_PAGE_SIZE is not set in the project configuration. Because MCUBoot is still dependent on CONFIG_FLASH_PAGE_LAYOUT the 64K page default is imported into the spi_nor driver, which was recently rewritten to use hardcoded values (https://github.com/zephyrproject-rtos/zephyr/commit/2a590d3fa5e2b15b88df70a17cd6d2d630a2c933). I believe this is a bug in the SPI driver, I will open a PR for a proposed fix.
The SPI NOR driver is a bit of a mess due to its historical development and hard-coded values. Originally the driver used the erase block size (which was defaulted 64 KiBy) as the page size. Relatively recently this was cleaned up so that hardware sizes (such as 256-byte pages, 4 KiBy sectors, and 32 KiBy and/or 64 KiBy erase blocks) are no longer customizable.
However, because existing systems were using a configurable "page size" exposed through the Zephyr flash API the existing defaults were kept. We have to be careful about changing that, because file systems like littlefs become unusable if their geometry changes.
I'm reasonably certain that Zephyr's SPI NOR driver properly handles erase operations. Nothing for the SPI NOR driver should have changed any non-default value for CONFIG_SPI_NOR_FLASH_LAYOUT_PAGE_SIZE, but it may be that a default that had previously been provided through some other Kconfig option now needs to be set in that option as well.
In short: #19844 is not the right fix. Instead it's important to figure out how existing systems might be getting a non-default flash page size, and how to make new applications consistent.
@mfiumara , Can you retest based on the merged #19858?
@mfiumara , Can you retest based on the merged #19858?
^^ above patch have nothing to do witch this issue
Bug signaled here is caused by error in implementation SPI NOR flash driver (erased less than declared).
@nvlsianpu I understood that the bug here was that something was specifying lengths calculated from an incorrect block size, which resulted in less memory being erased than was intended.
If there's a situation where the spi_nor flash driver can fail to erase the length of data it was asked to erase, we need to get information on how to reproduce that, and fix it, quickly.
As I remember it was erasing 256 B instead of expected 4 KB, so everything which was after 256 B from aligned to 4 KB addresses is not erased by flash_erase call.
As best I can tell what happened in https://github.com/zephyrproject-rtos/zephyr/issues/19438#issuecomment-537556345 was:
mcuboot's boot_erase_region was invoked to erase 65536 bytes at offset 0xc0000. This was passed to flash_map's flash_area_erase(), which in turn passes it to flash_erase possibly adjusting the offset to the start of the corresponding area.
That comment also states that changing CONFIG_SPI_NOR_* parameters made it work.
A problem we have is that those Kconfig parameters were removed before 2.0 was released, so any misbehavior that is corrected by changing them may not exist in the current driver.
Specifically to the point: I've confirmed that both flash_erase and flash_area_erase of a 64 KiBy area at offset 0xC0000 is correctly cleared in current master Zephyr. So if there's still concern that the spi_nor driver is incorrect, I need more information.
In #20174 there's a patch that detects a misconfiguration that might have produced this behavior: unless CONFIG_SPI_NOR_FLASH_LAYOUT_PAGE_SIZE is a multiple of 4096 (the underlying serial flash sector size) flash pages can't be erased. A smaller size should have caused an error to be returned, which should have caused an assert to fail, unless the build disabled asserts, or the superseded code that failed had a different behavior.
With the PR the configuration will now be rejected at compile time.
@mfiumara can this be close?
I guess it can be closed: I updated to zephyr 2.1 and the warning seems to be working for me that was introduced in #20174.
Most helpful comment
Some updates:
1) MCUBoot seems to do the right thing when swapping, and erasing the trailer:
2) The SPI flash is using a block erase command. This device is seen by Zephyr/MCUBoot as having 64K sectors (which is not itself an issue but weird!). The block erase seen above was sent to the correct address and using the correct size but the data was not erased (this can be verified with
nrfjprog --readqspi <file>).3) The following config changes allow it to work correctly: