I am trying to perform an OTA update. I initially created a mix of the provided HTTPS client and OTA examples, and it works as expected.
When I merge the OTA part into the project and perform exactly the same OTA update using the same code, credentials, endpoints, ... the board gives me this error:
esp_image: Checksum failed. Calculated 0xb3 read 0xcd

Either right after the download (as in the screenshot) or at boot time after the restart, the checksum fails and the partition is not bootable.
The calculated checksum value is different every time; the read one is constant.
The following is the custom partition table we use:

I don't think the problem is related to the partition table, because the same partition table also works with the mix of examples. Maybe there are some flags I should enable in menuconfig, or is it something else?
How is ota_1 only 200K?
For debugging I am now using a "hello world" bin to save download time; it's only about 150K.
Hi @omarcartera!
Could you check your bootloader.bin size and CONFIG_PARTITION_TABLE_OFFSET?
It has to satisfy: (bootloader.bin_size + 0x1000) < CONFIG_PARTITION_TABLE_OFFSET.
Please attach your sdkconfig file, and the actual size of app.bin.
One more suggestion: try make erase_flash.
Could you check your bootloader.bin size and CONFIG_PARTITION_TABLE_OFFSET?
bootloader.bin size is: 25,328 bytes
CONFIG_PARTITION_TABLE_OFFSET is set to 0x8000 (default)
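Plugging these values into the rule (bootloader.bin_size + 0x1000) < CONFIG_PARTITION_TABLE_OFFSET as a quick worked check (on the ESP32 the bootloader is flashed at offset 0x1000, which is where that term comes from):

```latex
\underbrace{\mathtt{0x1000}}_{\text{bootloader offset}} + \underbrace{25\,328}_{\text{bootloader.bin}}
  = 29\,424 = \mathtt{0x72F0} < \mathtt{0x8000} = 32\,768
```

So the bootloader ends 32,768 - 29,424 = 3,344 bytes before the partition table, and the layout rule is satisfied.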
Please attach your sdkconfig file, and the actual size of app.bin.
The base bin file is: 2,758,624 bytes
The OTA bin file is: 149,824 bytes
One more suggestion: try make erase_flash.
Unfortunately, it doesn't help.
sdkconfig of the project (gives the error):
issue_sdkconfig.txt
sdkconfig of the examples mix (working):
working_sdkconfig.txt
@omarcartera Could you try to run the cmd: ~/esp/esp-idf/components/esptool_py/esptool/esptool.py --chip esp32 image_info build/app_name.bin and attach the log.
Sorry, this one was for the already installed app, not the OTA one. The next comment is the right one.
user@user:~$ ~/esp/esp-idf/components/esptool_py/esptool/esptool.py --chip esp32 image_info ~/project/build/gateway.bin
esptool.py v2.8-dev
Image version: 1
Entry point: 400813f0
6 segments
Segment 1: len 0x8c7a0 load 0x3f400020 file_offs 0x00000018
Segment 2: len 0x03848 load 0x3ffbdb60 file_offs 0x0008c7c0
Segment 3: len 0x1c9b34 load 0x400d0018 file_offs 0x00090010
Segment 4: len 0x00fa4 load 0x3ffc13a8 file_offs 0x00259b4c
Segment 5: len 0x00400 load 0x40080000 file_offs 0x0025aaf8
Segment 6: len 0x199d8 load 0x40080400 file_offs 0x0025af00
Checksum: dd (valid)
Validation Hash: 7f829c0c4cbb16fe6b9ee6bd8033609c737fa29b453f012220e16a112ecaeeae (valid)
The checksum value from this log (Checksum: dd (valid)) is the same as the one in the application log (esp_image: Checksum failed ...), right?

If you mean this one, it reads 0xcd and calculates another value; neither equals 0xdd.
Did you run esptool.py --chip esp32 image_info for the OTA bin file (149,824 bytes)? I expected 0xdd to match one of the values from the application log.
It turns out that neither the calculated nor the read value matches the real one, which is strange.
Did you run the cmd esptool.py --chip esp32 image_info for OTA bin file: 149.824 bytes?
omarcartera@omarcartera:~$ esptool --chip esp32 image_info ~/esp-idf/examples/get-started/hello_world/build/hello-world.bin
esptool.py v2.1
Image version: 1
Entry point: 40081054
6 segments
Segment 1: len 0x06640 load 0x3f400020 file_offs 0x00000018
Segment 2: len 0x0209c load 0x3ffb0000 file_offs 0x00006660
Segment 3: len 0x00400 load 0x40080000 file_offs 0x00008704
Segment 4: len 0x074fc load 0x40080400 file_offs 0x00008b0c
Segment 5: len 0x12a88 load 0x400d0018 file_offs 0x00010010
Segment 6: len 0x01e74 load 0x400878fc file_offs 0x00022aa0
Checksum: cd (valid)
Validation Hash: 3803acbdb04d7870fe210e98a3b4147de56c77be4dc08646a936d09b9a1a5172 (valid)
The checksum for the 149,824-byte binary is 0xcd (the same as the value read in the application log) and different from the calculated one.
In the first comment I ran it on the installed app, not on the OTA one.
But now, as shown in the previous comment, the checksum matches the one read in the app logs.
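For decoding the two numbers in "Checksum failed. Calculated 0xb3 read 0xcd": as I understand the app image format (from esptool and the bootloader's image loader), the stored checksum is the XOR of every byte of segment data, seeded with 0xEF, and it sits in the last byte of a 16-byte block after the final segment, followed by the 32-byte SHA-256 digest when hash_appended is set. That layout matches the hello-world log above: segment 6 ends at 0x22aa0 + 8 + 0x1e74 = 0x2491c, the checksum byte lands at 0x2491f, and 0x24920 + 32 = 0x24940 = 149,824 bytes. The sketch below is a hypothetical offline re-implementation (image_checksum and image_checksum_t are made-up names, not an ESP-IDF API). It shows why the read value stays constant while the calculated one changes: each byte that gets zeroed in flash XORs its original value out of the result, so if only data bytes were zeroed, 0xcd ^ 0xb3 = 0x7e would be the XOR of their original values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_MAGIC        0xE9u /* first byte of an ESP32 app image */
#define CHECKSUM_SEED      0xEFu /* initial value of the XOR checksum */
#define IMAGE_HEADER_LEN   24u   /* common header + extended header */
#define SEGMENT_HEADER_LEN 8u    /* 32-bit load address + 32-bit data length */

typedef struct {
    uint8_t segments;    /* segment count from header byte 1 */
    uint8_t calculated;  /* 0xEF XOR every byte of segment data */
    uint8_t stored;      /* checksum byte found in the image */
    size_t checksum_off; /* file offset of the stored checksum byte */
} image_checksum_t;

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walks the segments of an image held in memory. Returns false if the
 * buffer is truncated or does not start with the image magic byte. */
static bool image_checksum(const uint8_t *img, size_t len, image_checksum_t *out) {
    assert(img != NULL && out != NULL);
    if (len < IMAGE_HEADER_LEN || img[0] != IMAGE_MAGIC)
        return false;

    size_t pos = IMAGE_HEADER_LEN;
    uint8_t sum = CHECKSUM_SEED;
    for (unsigned s = 0; s < img[1]; s++) {
        if (len - pos < SEGMENT_HEADER_LEN)
            return false;
        uint32_t data_len = read_le32(img + pos + 4); /* skip the load address */
        pos += SEGMENT_HEADER_LEN;
        if (len - pos < data_len)
            return false;
        for (uint32_t i = 0; i < data_len; i++)
            sum ^= img[pos + i];
        pos += data_len;
    }

    /* Padding puts the checksum in the last byte of a 16-byte block. */
    pos += 15 - (pos % 16);
    if (pos >= len)
        return false;

    out->segments = img[1];
    out->calculated = sum;
    out->stored = img[pos];
    out->checksum_off = pos;
    return true;
}
```

Fed the original .bin, calculated and stored should agree (0xcd for hello-world.bin); fed a read_flash dump of the partition, a calculated value that differs from dump to dump would point at bytes changing after the download.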
Correct me if I'm wrong. You do this:
- Checksum failed when trying to download the small app.
- ota_data was not updated (ota_data has a link only to the big app).
Could you clarify this:
Either right after the download (as in the screenshot) or at boot time after the restart, the checksum fails and the partition is not bootable.
You wrote that ...or at boot time after the restart, the checksum fails. That shouldn't happen. If a small application has the Checksum error, the original OTA algorithm should reject it and not make it bootable.
Yes, exactly, but between steps 1 and 2 I run a script to flash the first two partitions (plantconfnvs and settingsnvs):
#!/usr/bin/env sh
HOMEKIT_PATH=${HOMEKIT_PATH-"$HOME/esp-homekit-sdk"}
IDF_PATH=${IDF_PATH-"$HOMEKIT_PATH/esp-idf"}
ESPPORT=${ESPPORT-"/dev/ttyUSB0"}
# chdir to project root (this script should reside in 'NVSsetup')
cd "$(dirname "$0")/.." || exit 1
# generate the two 32K NVS images
python "$IDF_PATH/components/nvs_flash/nvs_partition_generator/nvs_partition_gen.py" generate --version 2 NVSsetup/plantconfig_nvs_partition.csv build/plantconfig.bin 32768
python "$IDF_PATH/components/nvs_flash/nvs_partition_generator/nvs_partition_gen.py" generate --version 2 NVSsetup/settings_nvs_partition.csv build/settings.bin 32768
# flash them at the plantconfnvs (0x9000) and settingsnvs (0x11000) offsets
python "$IDF_PATH/components/esptool_py/esptool/esptool.py" --port "$ESPPORT" write_flash 0x9000 build/plantconfig.bin
python "$IDF_PATH/components/esptool_py/esptool/esptool.py" --port "$ESPPORT" write_flash 0x11000 build/settings.bin
I can't reproduce the boot-time checksum failure now, but yesterday the Checksum failed error either appeared right after downloading, or the download succeeded, the ESP32 restarted, and at boot time it tried to boot from ota_1 but failed with the Checksum failed error.
But we still have the issue shown in the screenshot above.
If your application needs to store data, please add a custom partition type in the range 0x40-0xFE.
I see this on the Espressif website, but I don't fully understand where to use custom types.
I will investigate this problem. It would help if you could provide an example. My guess is memory corruption, or something else. I am more concerned about the case where this error occurs in the bootloader after a reboot. That may mean the bug is somewhere inside image_load. I will dig in that direction.
I see this on the Espressif website, but I don't fully understand where to use custom types.
It means you can create your own type, like:
custom_partition, 0x40, [any_SubType0-255], , 32K
...
ota_data, data, ota, , 8K
ota_0, app, ota_0, , 2700K
ota_1, app, ota_1, , 200K
And get that partition with:
const esp_partition_t *data_partition = esp_partition_find_first(0x40, [any_SubType0-255], NULL);
OK, I found a piece of yesterday's log that shows the problem I am talking about; please take a look at it.
Here I was trying a different OTA image, so the correct checksum value is 0x5e.

esptool.py --port COM4 read_flash 0x2e0000 0x32000 small_app.bin
FYI, make partition_table will show the offsets of the partitions:
...
ota_1,app,ota_1,0x2e0000,200K,

plantconfnvs, data, nvs, 0x9000, 32K
settingsnvs, data, nvs, 0x11000, 32K
nvs, data, nvs, , 32K
phy_init, data, phy, , 4K
ota_data, data, ota, , 8K
ota_1, app, ota_1, , 2700K
ota_0, app, ota_0, , 200K
factory_nvs, data, nvs, , 16K
nvs_keys, data, nvs_keys, , 4K
- If you flash the small application directly to ota_0 over serial, does it work properly without a Checksum error at boot time?
Yes, it works properly. Even when I use the mix of the HTTP/OTA examples as the app in ota_0 and flash the small app over the air into ota_1, it also works properly.
- Could you read back the small app over serial and compare it with the original file?
esptool.py --port COM4 read_flash 0x2e0000 0x32000 small_app.bin
FYI, make partition_table will show the offsets of the partitions:
...
ota_1,app,ota_1,0x2e0000,200K,
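The 0x2e0000 and 0x32000 arguments can be reproduced by hand. The sketch below is a simplified, hypothetical model of how blank offsets get filled in (assign_offsets and partition_t are made-up names, not gen_esp32part.py itself): each partition starts where the previous one ended, app partitions are rounded up to a 64K (0x10000) boundary, and data partitions are assumed to be rounded up to 4K, which makes no difference here because every size in this table is a multiple of 4K. Placement starts at 0x9000, right after the 4K partition table at CONFIG_PARTITION_TABLE_OFFSET 0x8000. With ota_0 = 2700K (0x2A3000) placed before ota_1 = 200K, as in the original layout, ota_0 runs from 0x30000 to 0x2D3000, so ota_1 is pushed to 0x2E0000, and 200K = 0x32000 is exactly the length passed to read_flash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APP_ALIGN  0x10000u /* app partitions start on a 64K boundary */
#define DATA_ALIGN 0x1000u  /* assumption: data partitions on a 4K boundary */

typedef struct {
    const char *name;
    int is_app;      /* 1 for type "app", 0 for type "data" */
    uint32_t offset; /* 0 = left blank in the CSV, filled in below */
    uint32_t size;
} partition_t;

static uint32_t align_up(uint32_t value, uint32_t align) {
    return (value + align - 1) / align * align;
}

/* Fills in blank offsets: each partition follows the previous one,
 * rounded up to the alignment required by its type. */
static void assign_offsets(partition_t *parts, size_t n, uint32_t first_free) {
    assert(parts != NULL || n == 0);
    uint32_t next = first_free;
    for (size_t i = 0; i < n; i++) {
        if (parts[i].offset == 0)
            parts[i].offset = align_up(next, parts[i].is_app ? APP_ALIGN : DATA_ALIGN);
        next = parts[i].offset + parts[i].size;
    }
}
```

The same arithmetic also confirms that the 2,758,624-byte base image fits in the 2,764,800-byte (2700K) ota_0.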

- Another thought: could you swap the names ota_0 and ota_1? Just edit the partition table file and flash the way you usually do.
...
ota_1, app, ota_1, , 200K
ota_0, app, ota_0, , 2700K
This also gives exactly the same error: Checksum failed directly after downloading small_app.bin over the air.
Thanks. It's a pity that we cannot read back the small application. I just wanted to make sure that the image in ota_1 was written to flash without corruption; maybe something damaged the buffer at some point during the OTA update. We need to find a way to check it.
Upgrade esptool and also don't overwrite the original bin
Okay, it was my fault, sorry.
I upgraded esptool and now it works. Here is the original bin:
hello-world.txt
and the retrieved bin:
test.txt
I have compared original_app.bin and flash_app.bin.
The original file is 149824 bytes long.
Only the first 62840 bytes were written, and at address 36534 there are 2 differing bytes.
Something breaks the OTA update.




Here are three more trials. It's random: every address gets written (no 0xFF), but sometimes 0x00 is written instead of the right value, at random locations.
EDIT: some addresses repeat across multiple trials.
Could you try to simplify your code to narrow down where the problem could be? At which step do we get the corrupted image:
- When receiving part of the image into the OTA buffer.
- Or when writing it to flash. (To rule this out, you can add a read-back after the write.)
I printed the buffer in esp_https_ota_perform exactly after esp_http_client_read and the data seems to be complete (compared to a working example).
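Seeing a complete buffer right after esp_http_client_read does not yet show that the same bytes reach flash. One way to narrow it down is to fingerprint the stream as each chunk is received and fingerprint the partition again after the update (on target, for example, by reading it back with esp_partition_read()); if the two differ, the damage happened after the HTTP read. The sketch uses 32-bit FNV-1a purely as a small stand-in hash (a CRC32 or SHA-256 would serve the same purpose); the function names are hypothetical, where exactly to hook it into the OTA loop is an assumption, and the code itself is plain portable C rather than ESP-IDF code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a, used only as a small stand-in fingerprint. */
#define FNV1A_INIT  2166136261u
#define FNV1A_PRIME 16777619u

/* Folds `len` bytes into a running hash, so it can be fed chunk by chunk. */
static uint32_t fnv1a_update(uint32_t h, const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= FNV1A_PRIME;
    }
    return h;
}

/* `rx_hash` was built from every chunk as it arrived; `flash_copy` holds the
 * partition contents read back after the update. Returns 1 if they agree. */
static int image_reached_flash_intact(uint32_t rx_hash, const uint8_t *flash_copy,
                                      size_t image_len) {
    return fnv1a_update(FNV1A_INIT, flash_copy, image_len) == rx_hash;
}
```

Hashing the received chunks and the read-back copy independently also covers the case where the buffer is damaged after it was hashed but before it was written, which a verify that compares flash against that same buffer would not notice.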
- Try to disable the HTTPS part of the code.
Where exactly? I need HTTPS for the whole OTA process.
- Could you try something from https://docs.espressif.com/projects/esp-idf/en/latest/api-reference/system/heap_debug.html to catch memory corruption?
I used heap_caps_check_integrity_all(true); and it doesn't print any errors.
- Do you use your own high-level interrupts written in assembly?
no
@KonstantinKondrashov any updates, please?
I am still stuck on this point.
Hi @omarcartera!
Please try to set these options.
CONFIG_ESP32_REV_MIN_0=y
CONFIG_ESP32_REV_MIN=0
CONFIG_ESP32_DPORT_WORKAROUND=y
I don't know how to set CONFIG_ESP32_DPORT_WORKAROUND=y
The DPORT_WORKAROUND config is enabled automatically only if dual-core FreeRTOS is enabled, and we use one core in our setup:
config ESP32_DPORT_WORKAROUND
    bool
    default "y" if !FREERTOS_UNICORE && ESP32_REV_MIN < 2
Yes, I forgot that you use only one core. Sorry, my suggestion was wrong.
Have you resolved the issue where the new firmware was not uploaded completely (only 62840 of the 149824 bytes of the firmware were flashed)? https://github.com/espressif/esp-idf/issues/4120#issuecomment-535842262. Or is it still the case?
Which other tasks run during the OTA update task?
Have you resolved the issue where the new firmware was not uploaded completely (only 62840 of the 149824 bytes of the firmware were flashed)? #4120 (comment). Or is it still the case?
Yes, as in this comment. It now downloads and flashes all the bytes, except for some random 15~20 bytes that are set to 0x00.
Which other tasks run during the OTA update task?
BLE
BLE Mesh
MQTT
homekit
- Could you try to stop these tasks so that only the OTA task runs?
Done.
- Try updating your branch to master or the latest 4.0.
It's already on v4.0, and the same IDF version works in the other test project (which only has the OTA task).
- Try setting the SPI_FLASH_VERIFY_WRITE option.
any time SPI flash is written then the data will be read back and verified.
I set it, but nothing new is logged.
That spi guard line appears only when the OTA fails this way; in the working version it doesn't appear. Does it have something to do with our issue?


No, you can ignore that line; it is just decoding the address from the line above it.
I will propose a theory: if the OTA buffer is in PSRAM, the flash write has to copy it to internal RAM before writing, and if the PSRAM bug is triggered, it could corrupt the buffer but still pass the flash write verify. Maybe try forcing the OTA buffer into internal RAM?
I set CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=16384 instead of 128.
Now it works properly and I don't see the Checksum failed error anymore!
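For context on why this setting mattered: as I understand CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL, when PSRAM is added to the malloc() pool, requests up to the limit are tried in internal RAM first and larger ones in PSRAM first, each falling back to the other region if the preferred one is full. With 128, any buffer larger than 128 bytes (an OTA or HTTP receive buffer, for instance) prefers PSRAM; with 16384 it prefers internal RAM, which fits the PSRAM theory above. The model below is a hypothetical, portable illustration (the names are made up and the fallback is not modelled), and the buffer sizes in the usage are invented. On target, heap_caps_malloc(size, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT) requests internal RAM for a single buffer without changing the global threshold.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PREFER_INTERNAL, PREFER_PSRAM } heap_preference_t;

/* Hypothetical model of the ALWAYSINTERNAL threshold: the first heap region
 * a plain malloc() of `request` bytes would try. */
static heap_preference_t malloc_preference(size_t request, size_t always_internal_limit) {
    return request <= always_internal_limit ? PREFER_INTERNAL : PREFER_PSRAM;
}

/* Counts how many of the given allocation sizes would try PSRAM first. */
static size_t count_psram_first(const size_t *sizes, size_t n, size_t limit) {
    assert(sizes != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (malloc_preference(sizes[i], limit) == PREFER_PSRAM)
            count++;
    return count;
}
```

With hypothetical buffers of 64, 512, 4096 and 20000 bytes, a limit of 128 sends three of them to PSRAM first, while a limit of 16384 leaves only the 20000-byte one there.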
Thanks @KonstantinKondrashov @negativekelvin for your help.
OK, good. I'm just going to link the PSRAM issue: https://github.com/espressif/esp-idf/issues/2892