Using the bleprph NimBLE example app configured for secure connections, MITM, bonding and MAX_BONDS = 3, the device immediately disconnects with an iPhone running LightBlue after turning Bluetooth off and on the iPhone. The iPhone maintains the bond when toggling Bluetooth off and on, but the iPhone does change its BT address. This seems to be an iPhone BLE behavior. I believe that the failure might be related to the config store being full. When this happens, there are already three bond entries saved to NVS.
The device should detect that the peer address has changes, call the store status callback to make room, and proceed without disconnecting.
The ESP32 disconnects and the iPhone is unable to establish a connection until the iPhone deletes the bond.
bleprph app with the options described aboveThe bleprph sample app without any changes.
When I launch I see that three bonds are already stored:
D (1305) NIMBLE_NVS: ble_store_config_cccds restored 3 bonds
When I run without turning Bluetooth off on the iPhone, the log shows a successful connection with bonding:
mtu update event; conn_handle=0 cid=4 mtu=256
subscribe event; conn_handle=0 attr_handle=8 reason=1 prevn=0 curn=0 previ=0 curi=1
I (369071) NimBLE_BLE_PRPH: PASSKEY_ACTION_EVENT started
I (369081) NimBLE_BLE_PRPH: Passkey on device's display: 943244
I (369081) NimBLE_BLE_PRPH: Accept or reject the passkey through console in this format -> key Y or key N
I (374771) You entered: key Y
I (374771) NimBLE_BLE_PRPH: ble_sm_inject_io result: 0
encryption change event; status=0 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=54:00:de:56:50:12 peer_id_addr_type=1 peer_id_addr=54:00:de:56:50:12 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1
subscribe event; conn_handle=0 attr_handle=8 reason=2 prevn=0 curn=0 previ=1 curi=0
disconnect; reason=531 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=54:00:de:56:50:12 peer_id_addr_type=1 peer_id_addr=54:00:de:56:50:12 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1
After I turn Bluetooth off and on, a subsequent connection shows the failure. Note that the peer_id_addr has changed:
GAP procedure initiated: advertise; disc_mode=2 adv_channel_map=0 own_addr_type=0 adv_filter_policy=0 adv_itvl_min=0 adv_itvl_max=0
connection established; status=0 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=41:24:f7:d6:bd:10 peer_id_addr_type=1 peer_id_addr=41:24:f7:d6:bd:10 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0
disconnect; reason=531 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=41:24:f7:d6:bd:10 peer_id_addr_type=1 peer_id_addr=41:24:f7:d6:bd:10 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0
I've attached my sdkconfig file to this case.
At this point we are looking for a workaround. Ideally it would be helpful to know in advance at the app level when this failure might trigger so we can hedge against any issues by deleting all bonds or something similar.
Thank you!
Regards,
Brian
Thanks for reporting, we will look into. Thanks.
Hi @bfriedkin , iPhone/Android devices use RPA addresses by default, which change periodically or after Bluetooth on-off. Now once the iPhone changes address, then it's treated as a new device and hence you are observing pairing event for each Bluetooth on-off cycle. However the RPA address can be resolved using IRKs (Identity Resolving keys). For that you may need to add, BLE_SM_PAIR_KEY_DIST_ID in ble_hs_cfg.sm_our_key_dist & ble_hs_cfg.sm_their_key_dist. I am attaching a reference patch based on bleprph example, please try out the patch and let me know if the issue gets resolved : bleprph_IRK_distribute.txt. If you still face the issue please provide complete debug log. You can enable debug log option by make menuconfig ---> Component config ---> Log output ---> Default log verbosity to Debug.
@prasad-alatkar -
Thank you for the quick reply and patch. Unfortunately the patch seems to make the issue trigger right away. Perhaps that provides a clue? Here are my latest steps to reproduce:
That triggers the pairing sequence which succeeds. But then on my iPhone:
At this point LightBlue immediately displays a Disconnect alert. In the LightBlue log the error is "Peer removed pairing information". The attached irk-log.txt file contains the log.
I then reverted the patch and tried the same. Without the patch, I am able to back out of the service view, reconnect with the peripheral and read the characteristic. After the one-time bonding, each subsequent time I reconnect and read the characteristic there is no pairing dialog. This is what I would expect. Unfortunately if I then toggle Bluetooth off and on, I experience the same disconnect failure in LightBlue. The attached no-irk-log.txtfile contains the log.
I've also attached the bleprph app I used in the first trial. Note that I do have LE Secure Connections enabled. In the original bleprph app the key_dist settings were not set when ble_hs_cfg.sm_sc is set to 1.
Looking forward to next steps.
Regards,
Brian
Hi @bfriedkin Can you please confirm if IDF commit: 04e3cf44 is present in your git repo ? If not Can you please pull in latest changes from release/v3.3 branch ? The reason for this is PR!7 has already fixed similar issue. If this doesn't resolve your issue, Can you please try applying this small patch on NimBLE submodule i.e. on $IDF_PATH/components/nimble/nimble and provide me the debug logs ? :
NimBLE_RPA_not_resolved_Debug_logs.txt
In the original bleprph app the key_dist settings were not set when ble_hs_cfg.sm_sc is set to 1.
Thank you for pointing out, I will fix this.
@prasad-alatkar -
Thank you. I will confirm the commit and try the patches today.
Regarding LE Secure Connections, should I still be running with the prior bleprph_IRK_distribute.txt patch that sets the key_dist config fields?
Regards,
Brian
@prasad-alatkar -
I merged commit 04e3cf4 into ESP-IDF v3.3.2. I also applied your NimBLE_RPA_not_resolved_Debug_logs.txt patch. Your patch didn't originally build, but I was able to fix the problems and run the bleprph app. I have the bleprph app configured to use the bleprph_IRK_distribute.txt patch.
The good news is that PR!7 does seem to fix both the disconnect problem and subsequent failures in the ble key store. I was able to toggle Bluetooth off/on on my iPhone and the device reconnected without requiring another passkey entry. I was able to read the encrypted random number characteristic in LightBlue.
The bad news is that with this patch applied, there is a new problem: Failure reading encrypted characteristic. Steps to reproduce:
At this point the iPhone fails to read the characteristic. The IDF log at this point shows the following:
ble_hs_hci_evt_acl_process(): conn_handle=0 pb=2 len=7 data=0x03 0x00 0x04 0x00 0x0a 0x0c 0x00
rxed att command: read req; conn=0 handle=0x000c
looking up peer sec; peer_addr_type=0 peer_addr=0x50 0x83 0xd5 0x1a 0xab 0x84
txed att command: error rsp; conn=0 req_op=10 handle=0x000c error_code=5
host tx hci data; handle=0 length=9
ble_hs_hci_acl_tx(): 0x00 0x00 0x09 0x00 0x05 0x00 0x04 0x00 0x01 0x0a 0x0c 0x00 0x05
Number of Completed Packets: num_handles=1
handle:0 pkts:1
I've attached the complete log showing the successful re-pair and reconnects. The failure above can be found at the end of the log.
Regards,
Brian
In your log just before the error:
encryption change event; status=1035 handle=0 <...> encrypted=0 authenticated=0 bonded=0
status=1035 means: BLE_SM_ERR_DHKEY Indicates to the remote device that the DHKey Check value received doesn鈥檛 match the one calculated by the local device.
Which happened because you deleted the bond from your phone, which would trigger the repeat pairing event on the peripheral and in the example it deletes the bond locally.
After this you need to reconnect to re-pair the devices and it should work. Would be nice if that wasn't necessary but it seems to be at this point.
@h2zero -
Thank you for helping here and good catch on the encryption error. What you describe in terms of the DHKey Check value makes sense and is consistent with the error message I see in the LightBlue log on the phone. Unfortunately I still bump into problems even after trying to reconnect and re-pair from the phone.
I was also able to confirm that the BLE_GAP_EVENT_REPEAT_PAIRING event is triggered before the failure. In the attached log I added a trace in the event handler code:
ESP_LOGI(tag, "Got BLE_GAP_EVENT_REPEAT_PAIRING event");
That trace appears in the attached NimBLE log.
I spent a little more time today trying to further isolate when the failure happens. With the 04e3cf4 commit merged into ESP-IDF 3.3.2, I found that the encryption failure always occurs after you remove the bond on the phone and toggle Bluetooth off and on. I believe this is because the iPhone changes it's BT address.
In today's testing I started fresh, erasing the ESP32 flash and did the following:
The third try always fails with the BLE_SM_ERR_DHKEY error.
I then built the same service on top of the ESP32/Bluedroid and repeated the same steps above. No failure was reproduced. When running on top of Bluedroid, I notice that the pairing dialog displays on the iPhone _before_ I try to read the characteristic, as opposed to _when_ I read the characteristic using NimBLE. I have seen this type of behavior before and assume it is unrelated to the issue at hand.
Both the NimBLE and Bluedroid tests were run on the same ESP32 hardware connected to the same iPhone. I therefore believe the problem lies in the NimBLE implementation. A mobile client should not have to reconnect and/or delete bonds for this scenario to succeed.
I've attached both NimBLE and Bluedroid logs for reference. The Bluedroid log is a little thin... I can increase the verbosity if that would help at all.
@prasad-alatkar - I am hoping this additional information helps to better isolate the failure. Crossing my fingers for a fix...
Regards,
Brian
I can confirm and repro this, but also confirm that after the failure to pair/read, disconnecting and reconnecting will allow pairing to succeed. I'm unsure as to why this happens but it may be something that needs to be addressed upstream, @prasad-alatkar would know more about that though.
When running on top of Bluedroid, I notice that the pairing dialog displays on the iPhone before I try to read the characteristic, as opposed to when I read the characteristic using NimBLE.
This is how it works with NimBLE and is more inline with the BLE core spec as the database is allowed to be inspected but values are what is protected. Just thought I'd share that while I'm posting. If you want to have similar behavior to bluedroid you can call ble_gap_security_initiate() in the connect event.
Hi @bfriedkin Thank you for putting detailed description of the issue. Can you please try the attached patch and let me know if it resolves the issue ? Patch: NimBLE_RPA_fix_identity_change.txt
The patch needs to be applied on NimBLE submodule ($IDF_PATH/components/nimble/nimble). As already discussed above, the BLE_SM_ERR_DHKEY failure comes after deleting the existing bond so as to continue with pairing request received from peer. Since we have resolved identity of peer device (RPA address) , we change peer address to its identity address. Now once we delete the bond because of repeated pairing request, we are still using the "changed" peer address and that is the reason for this issue. The attached patch fixes this bahavior.
@prasad-alatkar -
Thank you for your continued focus here. I tried the additional patch. Unfortunately that leads to a crash when I reconnect the iPhone after deleting the bond. It looks like the crash occurs when the app calls ble_store_util_delete_peer() from the BLE_GAP_EVENT_REPEAT_PAIRING event handler. Here is the call stack:
Replace Identity addr with random addr received at start of the connection
Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x400f5597 PS : 0x00060b30 A0 : 0x800f5818 A1 : 0x3ffc7100
0x400f5597: ble_rpa_replace_id_with_rand_addr at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:204
A2 : 0x3ffc7130 A3 : 0x3ffc2fb1 A4 : 0x3ffc2f9c A5 : 0x3ffc2fa4
A6 : 0x00000006 A7 : 0x00000000 A8 : 0x00000000 A9 : 0x3ffc70e0
A10 : 0x00000000 A11 : 0x00000000 A12 : 0x00000006 A13 : 0x3ffc7107
A14 : 0x00000084 A15 : 0x00000084 SAR : 0x00000018 EXCCAUSE: 0x0000001d
EXCVADDR: 0x00000007 LBEG : 0x4000c28c LEND : 0x4000c296 LCOUNT : 0x00000000
ELF file SHA256: 3441d1da177dd7e8523373318e5b7c370af6a13068134c3a9998dacf354a1e68
Backtrace: 0x400f5597:0x3ffc7100 0x400f5815:0x3ffc7130 0x400f9daa:0x3ffc7160 0x400dc76b:0x3ffc71a0 0x400eeda9:0x3ffc7220 0x400eef33:0x3ffc7240 0x400f02c5:0x3ffc7270 0x400f724c:0x3ffc72d0 0x400f7842:0x3ffc7350 0x400f7c66:0x3ffc7390 0x400f4b37:0x3ffc73d0 0x400f2c39:0x3ffc7400 0x400f2c4b:0x3ffc7420 0x400fa0ee:0x3ffc7440 0x400dc422:0x3ffc7460 0x4008fe22:0x3ffc7480
0x400f5597: ble_rpa_replace_id_with_rand_addr at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:204
0x400f5815: ble_hs_resolv_list_rmv at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:619
0x400f9daa: ble_store_util_delete_peer at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_store_util.c:239
0x400dc76b: bleprph_gap_event at /Users/brianfriedkin/esp32/esp-idf-3.3.2/examples/bluetooth/nimble/bleprph/main/main.c:235
0x400eeda9: ble_gap_call_event_cb at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_gap.c:4893
0x400eef33: ble_gap_call_conn_event_cb at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_gap.c:4893
0x400f02c5: ble_gap_repeat_pairing_event at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_gap.c:4893
0x400f724c: ble_sm_chk_repeat_pairing at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_sm.c:2561
0x400f7842: ble_sm_pair_req_rx at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_sm.c:2561
0x400f7c66: ble_sm_rx at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_sm.c:2561
0x400f4b37: ble_hs_hci_evt_acl_process at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_hs_hci_evt.c:886
0x400f2c39: ble_hs_process_rx_data_queue at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:109
0x400f2c4b: ble_hs_event_rx_data at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:109
0x400fa0ee: ble_npl_event_run at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/porting/nimble/src/nimble_port.c:132
(inlined by) nimble_port_run at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/porting/nimble/src/nimble_port.c:82
0x400dc422: bleprph_host_task at /Users/brianfriedkin/esp32/esp-idf-3.3.2/examples/bluetooth/nimble/bleprph/main/main.c:337
0x4008fe22: vPortTaskWrapper at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/freertos/port.c:403
I've also attached the full log here as well as my copy of the bleprph app.
Regards,
Brian
@bfriedkin I am attaching modified patch to try, please try it out. I am a little afraid that this patch may not help you overcome the issue, but the crash. Please let me know your observations after trying out this patch, If the issue still persists I may need to try with iPhone instead of Android phone. Patch to try out:
NimBLE_RPA_fix_identity_change_2.txt
@prasad-alatkar -
Today I built with the following:
I ran the following tests:
Tests 1-3 succeeded. Unfortunately test 4 failed. Turning Bluetooth off and back on on the iPhone but keeping the bond to the ESP32 device triggered the immediate disconnect failure.
I've attached the log here. Note that I tried step 4 two times before terminating the device app and copying the log.
Good idea to try with an iPhone. Hoping you can do that and more easily reproduce/resolve the issue. This issue will need to be resolved on both iOS and Android anyway...
Regards,
Brian
I can confirm the issue with the patch on my iPhone as described above.
I made a small addition to the patch that seems to have resolved it.
@bfriedkin could you give this a try?
NimBLE_RPA_fix_identity_change_h2zero.txt
Edit: wrong file 馃槩
@h2zero -
Thank you for your continued support here. I removed the NimBLE_RPA_fix_identity_change_2.txt and applied the NimBLE_RPA_fix_identity_change_h2zero.txt patch. Unfortunately that didn't resolve issue (4) above. The connection fails on the phone. The LightBlue log shows _Error: Peer removed pairing information_.
I've attached the log corresponding to tests (1)-(4) above.
Regards,
Brian
@bfriedkin You're welcome, I'd like to see this fixed as well.
In your last 2 logs I see disconnect; reason=19 which is BLE_HS_ETIMEOUT_HCI : HCI request timed out; controller unresponsive. It seems the controller is not responding after sending it the LE Long Term Key Requested Negative Reply command.
I just realized you're on idf v3.3.2, I tested that patch on v4.0.1.
Thanks for testing and logging 馃槃
Hello -
Just to underscore what @h2zero pointed out, we need a fix for ESP-IDF v3.3.2. We have products built on that SDK version that we cannot currently migrate to v4.x. Furthermore, v3.3.2 is a Long Term Support release that will be supported until March 2022.
@h2zero - I appreciate you independently confirming the failures. Lots of moving parts here with the steps to reproduce and multiple patches.
@prasad-alatkar - Hoping today's discussion helps to get us closer to a solution.
Regards,
Brian
Hi @bfriedkin , I was able to reproduce the issue with iPhone, however I do see the issue is not iPhone specific, iPhone just disconnects continuously whereas Android device deletes the saved bond and re-pairs again. There have been lots of patches and trials here, so I will try to put conclusive solution here:
BLE_SM_PAIR_KEY_DIST_ID enabled in their & our key distribution. This is achievable from bleprph_IRK_distribute.txt patch shared earlier.04e3cf4 should be present in your code repo.Regarding
We have products built on that SDK version that we cannot currently migrate to v4.x. Furthermore, v3.3.2 is a Long Term Support release that will be supported until March 2022.
No need to migrate to v4.x, for that matter v4.x will also need this fix. The fix will be available in next release/v3.3.x so you may need to upgrade to next 3.3.x release or cherry-pick the relevant commits. Please let me know if it resolves your issue.
Regards,
Prasad
@prasad-alatkar -
I followed your steps (1) - (3) above, but I applied the 0001-NimBLE_RPA_fix_identity_change_repairing_failure.txt patch _after_ moving the files from IDF commit 04e3cf4 into the nimble component. I started from a fresh clone of v3.3.2:
git clone -b v3.3.2 --recursive https://github.com/espressif/esp-idf.git
Unfortunately I see the same failure on iPhone. The connection fails on the iPhone when the bond is saved on the iPhone but Bluetooth is toggled off and then back on.
I've attached a diff of my nimble component directory that shows the changed files. I am running the bleprph_IRK_distribute.txt patch in the bleprph app. I am running these tests on a 2020 iPhone SE running iOS 13.5.1. Please let me know if I missed any steps. In these tests I did not erase the flash before starting.
I've attached my copy of the bleprph example app and the log from this set of tests. Towards the end of the log I see a few potential failures:
LE LTK Req. handle=0 rand=00 encdiv=0
looking up our sec; peer_addr_type=1 peer_addr=0x9c 0x40 0xb7 0xe9 0x84 0x42 ediv=0x00 rand=0x0
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x001b len=2
0x1b 0x20 0x02 0x00 0x00
Command complete: cmd_pkts=4 ogf=0x8 ocf=0x1b status=0 handle=0
disconnect; reason=19 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=42:84:e9:b7:40:9c peer_id_addr_type=1 peer_id_addr=42:84:e9:b7:40:9c conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0
error setting advertisement data; rc=22
Resetting state; reason=19
looking up peer sec;
failed to configure restored IRK
looking up peer sec;
looking up peer sec;
Device Address: 24:6f:28:19:d2:52
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0007 len=0
0x07 0x20 0x00
Command complete: cmd_pkts=5 ogf=0x8 ocf=0x7 status=0
Regards,
Brian
Hi @bfriedkin , I have confirmed the issue gets resolved with the patch for release/v3.3.2 and release/v3.3. I would recommend you to follow below mentioned steps to be on the same page:
git checkout v3.3.2git submodule update --initbleprph_IRK_distribute.txt patch i.e. git apply bleprph_IRK_distribute.patchgit checkout nimble-1.1.0-idf-v3.3git pullgit apply 0001-NimBLE_RPA_fix_identity_change_repairing_failure.patch, this will bring nimble submodule pointer to a9239c6cmake -j8 erase_flash flash monitor@prasad-alatkar -
I have mostly good news. :-) Following you steps above I was able to successfully repeat the test steps 1 to 4 on my iPhone without any failures! Apparently I previously didn't apply the nimble-1.1.0-idf-v3.3 patch correctly. Thank you for your patience. The detailed steps really helped.
Unfortunately when I tested our BLE peripheral application on Android (not the bleprph example app), I am getting a reproducible crash related to the re-pairing operation. Our application has more services and characteristics, though that doesn't seem particularly relevant here. Steps to reproduce:
At this point the ESP32 app asserts and drops into GDB. Here is the relevant part of the log output:
Replace Identity addr with random addr received at start of the connection
assertion "ble_hs_locked_by_cur_task()" failed: file "/Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_conn.c", line 303, function: ble_hs_conn_find_by_addr
[android-remove-bond-crash.log](https://github.com/espressif/esp-idf/files/4907801/android-remove-bond-crash.log)
abort() was called at PC 0x400d355b on core 0
0x400d355b: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)
ELF file SHA256: 3e39da5ef1e1ec6667cac80ad346f7261f3f60bc1febcbbb6cfc0720d738f213
Backtrace: 0x4008ca24:0x3ffd0610 0x4008cc4d:0x3ffd0630 0x400d355b:0x3ffd0650 0x40114bba:0x3ffd0680 0x40111fe7:0x3ffd06a0 0x40112271:0x3ffd06d0 0x40105a7a:0x3ffd0700 0x400e2b0d:0x3ffd0740 0x4010d468:0x3ffd0790 0x4010d5f3:0x3ffd07b0 0x4010eae9:0x3ffd07e0 0x401067fc:0x3ffd0840 0x40106f16:0x3ffd08c0 0x401073ea:0x3ffd0900 0x40109248:0x3ffd0940 0x40108791:0x3ffd0970 0x401087a3:0x3ffd0990 0x40113bb6:0x3ffd09b0 0x400e4cb3:0x3ffd09d0 0x4008cf96:0x3ffd09f0
0x4008ca24: invoke_abort at /Users/brianfriedkin/esp32/esp-idf/components/esp32/panic.c:715
0x4008cc4d: abort at /Users/brianfriedkin/esp32/esp-idf/components/esp32/panic.c:715
0x400d355b: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)
0x40114bba: ble_hs_conn_find_by_addr at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_conn.c:93
0x40111fe7: ble_rpa_replace_id_with_rand_addr at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:200
0x40112271: ble_hs_resolv_list_rmv at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:617
0x40105a7a: ble_store_util_delete_peer at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_store_util.c:239
0x400e2b0d: nimble_gap_event at /Users/brianfriedkin/Projects/moddable/modules/network/ble/nimble/modBLEServer.c:739
0x4010d468: ble_gap_call_event_cb at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_gap.c:4632
0x4010d5f3: ble_gap_call_conn_event_cb at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_gap.c:4632
0x4010eae9: ble_gap_repeat_pairing_event at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_gap.c:4632
0x401067fc: ble_sm_chk_repeat_pairing at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_sm.c:196
0x40106f16: ble_sm_pair_req_rx at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_sm.c:1861
0x401073ea: ble_sm_rx at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_sm.c:2648
0x40109248: ble_hs_hci_evt_acl_process at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_hci_evt.c:883
0x40108791: ble_hs_process_rx_data_queue at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:109
0x401087a3: ble_hs_event_rx_data at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:109
Our repeat pairing event handler case looks like this:
case BLE_GAP_EVENT_REPEAT_PAIRING:
// delete old bond and accept new link
if (0 == ble_gap_conn_find(event->repeat_pairing.conn_handle, &desc))
ble_store_util_delete_peer(&desc.peer_id_addr);
return BLE_GAP_REPEAT_PAIRING_RETRY;
I've attached the complete log here. Hoping this is an easy fix. Perhaps there is a missing ble_hs_lock() and ble_hs_unlock() set of calls. Assuming that is the case, I would appreciate having a new patch to replace the 0001-NimBLE_RPA_fix_identity_change_repairing_failure.patch.
Regards,
Brian
@bfriedkin yes, you are right regarding ble_hs_lock. I am attaching reworked patch which should resolve the issue.
0002-NimBLE_RPA_fix_identity_change_repairing_failure.txt
@prasad-alatkar -
The 0002-NimBLE_RPA_fix_identity_change_repairing_failure.txt patch along with the nimble-1.1.0-idf-v3.3 "patch" seems to resolve the issue for us. I tested on both iOS and Android using the BLE peripheral setup from one of our products. Nice work. Glad we were able to get this handled together.
Because we have products that require this fix, I'd appreciate any information you can provide related to how and when these patches will be publicly available. Will these fixes be back-ported to ESP-IDF v3.3.2 or do we need to wait for the next v3.3.x release? Can I be notified in either case?
Regards,
Brian
Hi @bfriedkin , glad that the issue is resolved. So the fix will be first merged into master branch and subsequently it will be backported till release/v3.3 branch. The fix will be made available in the next v3.3.x release, I don't think it will be backported to release/v3.3.2(cc: @Alvin1Zhang ). I will notify you here once the fix is merged. Just FYI, the details of esp-idf releases are shared here: https://github.com/espressif/esp-idf/releases
@prasad-alatkar -
Unfortunately we've bumped into another issue that needs to be resolved. It seems possibly related to these bonding fixes, so I'm reporting the issue here.
The problem is that when the CCCD store (NVS on ESP32) fills up, the functions that purges the store, starting from the Storage Status callback, eventually fails. This is triggered by the client subscribing for notifications, when the client/server are bonded and the CCCD store is full. It can be a little challenging to reproduce, but here are the basic steps:
bleprph_cccd appI am using LightBlue as the client. The bleprph_cccd app is a lightly modified version of the bleprph app that simply adds the notify property to the static characteristic. The app sdkconfig specifies CONFIG_NIMBLE_MAX_CCCDS=3 to help trigger the problem more quickly. Basically, to trigger the failure, you want to fill the CCCD store by having the phone change its address and/or creating new bonds. After that happens the problem triggers fairly quickly. The bug leads to a failure in our product app when the client isn't able to successfully register for notifications.
I've attached a log that shows the problem. To help pinpoint the failure, I've also added log output to a few of the NimBLE modules, which I've also attached here. Recall I am running ESP-IDF v3.3.2 with the patches provided.
Looking at the log, the first failure occurs here at line 901:
subscribe event; conn_handle=0 attr_handle=8 reason=1 prevn=0 curn=0 previ=0 curi=1
error persisting cccd; too many entries (3)
In status cb, BLE_STORE_OBJ_TYPE_CCCD
in ble_gap_unpair_oldest_except
In ble_store_util_bonded_peers, max_peers = 3
looking up our sec;
looking up our sec;
too many peers, i = 1, num_peers = 1
In status cb, ble_gap_unpair_oldest_except result: 6
We are using the stock ble_store_util_status_rr() callback function. Notice the error 6 returned from the ble_gap_unpair_oldest_except() function.
At line 1447 we see a case that succeeds:
subscribe event; conn_handle=0 attr_handle=8 reason=1 prevn=0 curn=0 previ=0 curi=1
error persisting cccd; too many entries (3)
In status cb, BLE_STORE_OBJ_TYPE_CCCD
in ble_gap_unpair_oldest_except
In ble_store_util_bonded_peers, max_peers = 3
looking up our sec;
looking up our sec;
ble_gap_unpair
...
D (203136) NIMBLE_NVS: Deleting CCCD, nvs idx = 1
...
In status cb, ble_gap_unpair_oldest_except result: 0
...
D (203316) NIMBLE_NVS: Persisting CCCD value in NVS...
Finally at line 1588 there is another failure, but with a different trigger:
subscribe event; conn_handle=0 attr_handle=14 reason=1 prevn=0 curn=1 previ=0 curi=0
error persisting cccd; too many entries (3)
In status cb, BLE_STORE_OBJ_TYPE_CCCD
in ble_gap_unpair_oldest_except
In ble_store_util_bonded_peers, max_peers = 3
looking up our sec;
num_peers == 0
In status cb, ble_gap_unpair_oldest_except result: 5
I hope these notes, log and test app help you more quickly isolate and resolve the problem.
Regards,
Brian
Hi @bfriedkin, unfortunately I could not reproduce the issue. Either I am missing something here or we are not at same NimBLE pointer. Can you please confirm if your NimBLE submodule pointer is updated to nimble-1.1.0-idf-v3.3. Below are steps to follow to bring NimBLE submodule to latest:
components/nimble/nimblegit checkout nimble-1.1.0-idf-v3.3git pullNow nimble submodule pointer should point to a9239c6c or you can simply try to reproduce the issue on release/v3.3 (with patch 0002-NimBLE_RPA_fix_identity_change_repairing_failure.txt applied). Basically I want to make sure that the NimBLE submodule has these commits: 0d46b272, ca299e70 present. If you are already on latest NimBLE submodule pointer, Can you please provide more details on sequence of steps to follow to reproduce the issue ?
@prasad-alatkar -
Sorry to hear you are having trouble reproducing the issue.
Git seems to think that I am pointing at the correct submodule:
Brians-MacBook-Air:esp-idf brianfriedkin$ cd components/nimble/nimble/
Brians-MacBook-Air:nimble brianfriedkin$ git log
commit a9239c6ca5e8f6dc72792ac716a8fe3ffad08b1e (HEAD -> nimble-1.1.0-idf-v3.3, origin/nimble-1.1.0-idf-v3.3)
Merge: 5364a96f 69bdb87c
Author: Hrishikesh Dhayagude <[email protected]>
Date: Wed May 20 17:43:25 2020 +0800
In that same log, I see the two commits you reference above:
commit 0d46b2724e80591af7d09fcbff8230cff6109c64
Author: Prasad Alatkar <[email protected]>
Date: Tue Apr 28 23:40:40 2020 +0530
NimBLE : Change peer_addr type to PUBLIC in case of Host based privacy
- When Identity of peer is resolved and peer address is RPA then NimBLE stack
changes the peer_addr type to PUBLIC_ID(0x02); in Host based privacy it
should not be done as it can result in mismatch while comparing stored
addresses especially in case of CCCD storage.
commit ca299e7065ba9d11f5a9aed41b0adf3a9c65daf9
Author: Prasad Alatkar <[email protected]>
Date: Tue Apr 28 23:28:38 2020 +0530
NimBLE store: Fix bug in NVS while retrieving matching index in get_nvs_db_attribute
I am running with the 0002-NimBLE_RPA_fix_identity_change_repairing_failure.txt patch applied in addition to the bleprph_IRK_distribute.txt patch applied to the previously attached bleprph_cccd app in the esp-cccd.zip archive. Basically I am running the same set of patches you recommended leading up to my comment on 16 July. Again I am running the patches against ESP-IDF v3.3.2.
While problem is difficult to reproduce, it does occur. There are two different failures, which I've already documented above. Filling up the CCCD store along with the iPhone changing its address (RPA) seem to trigger the failure. I spent some time today trying to find a simple set of steps. While I wasn't able to find a simple way to reproduce the failure, I did reproduce the failure by doing the following in the previously attached bleprph_cccd app. I am using LightBlue as the mobile client on iOS:
bleprph_cccd app: make erase_flash flash monitorbrian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.5c3a659e-897e-45e1-b016-007107c96df7 and Listen for NotificationsForget brian-bleprphbrian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications5c3a659e-897e-45e1-b016-007107c96df7 and Listen for NotificationsForget brian-bleprph and toggle Bluetooth off/onbrian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notificationsbrian-bleprph and browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications(Note at this point two of the three CCCD slots are filled)
Forget brian-bleprphbrian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.5c3a659e-897e-45e1-b016-007107c96df7 and Listen for NotificationsForget brian-bleprph and toggle Bluetooth off/onbrian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.5c3a659e-897e-45e1-b016-007107c96df7 and Listen for NotificationsForget brian-bleprphbrian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications5c3a659e-897e-45e1-b016-007107c96df6brian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.5c3a659e-897e-45e1-b016-007107c96df7 and Listen for NotificationsAt this point, at line 3649 in the attached cccd-failure2.log file, you can see one of the failures:
subscribe event; conn_handle=0 attr_handle=14 reason=1 prevn=0 curn=1 previ=0 curi=0
error persisting cccd; too many entries (3)
In status cb, BLE_STORE_OBJ_TYPE_CCCD
in ble_gap_unpair_oldest_except
In ble_store_util_bonded_peers, max_peers = 3
looking up our sec;
looking up our sec;
too many peers, i = 1, num_peers = 1
In status cb, ble_gap_unpair_oldest_except result: 6
txed att command: error rsp; conn=0 req_op=18 handle=0x000f error_code=6
host tx hci data; handle=0 length=9
ble_hs_hci_acl_tx(): 0x00 0x00 0x09 0x00 0x05 0x00 0x04 0x00 0x01 0x12 0x0f 0x00 0x06
Number of Completed Packets: num_handles=1
handle:0 pkts:1
Also at this point in LightBlue, the Listen for Notifications text does not change to Stop Listening, presumably due to the error response.
This is one of the two errors I pointed out previously. The logs don't lie. :-) Certainly the problem is difficult to reproduce. I'm hoping given the additional details you can find a quicker way to reproduce the failure.
If you are still having trouble, I would recommend taking a look at the prior logs and modified sources I provided in my previous post. You can easily see where things are failing in the code.
Regards,
Brian
I haven't tested to confirm this but I was looking through the log and I can see the client takes up all 3 cccd storage slots and they are not all deleted with unpairing.
It happens like this:
ble_gap_unpair_oldest_except fails with result: 6, because the only bond is the current peer > disconnectble_gap_unpair_oldest_except succeeds (should not happen, maybe called with ota address?) > store cccd (full again) >ble_gap_unpair_oldest_except fails with result: 5 because the bond was incorrectly removed.Hi @bfriedkin , extremely sorry to make you go through pain of listing all possible steps, couldn't help as I was not able to reproduce the issue despite trying multiple sequence of steps. After following the steps you mentioned, I was able to reproduce the issue with "iPhone + LightBlue" combination. I mostly have good news here, please try attached patch
0003-NimBLE_RPA_fix_identity_change_repairing_failure.txt , to be applied on NimBLE submodule. This patch also includes changes from previous patch, so applying this patch alone would do. This issue occurs when connection address (ble_hs_conn) is not updated with the changed peer address in "LE connection complete event" along with RPA stuff. Because of this changed/wrong "peer address" there is appearance of extra CCCD in NVS. Most probably the issue should be resolved with the patch, but if you still face the issue then please provide me log with this additional patch :
CCCD_failure_RPA_log.txt
@prasad-alatkar -
Glad to hear that you were ultimately able to reproduce a problem! Thank you for the patch and instructions.
Based on your analysis thus far, do you anticipate the latest fixes address both of the error cases @h2zero and I outlined above?
Unfortunately I might not be able to give this new patch a try until Thursday (U.S. time). Apologies for the delay. I'll let you know then how our testing goes. If things still fail I'll get you a log built against the CCCD_failure_RPA_log.txt patch.
Regards,
Brian
Hi @bfriedkin, @h2zero has not pointed out any other/different issue, rather has tried to summarized the issue. Yes, He is right regarding this comment: "_should not happen, maybe called with ota address_". This is exactly what I try to fix with the latest patch. So I believe the latest fix addresses the issue, however I will wait for confirmation from your end for the same.
@prasad-alatkar btw - if it helps, I can confirm that 0003 patch worked nicely to fix RPA android clients connecting to my open-source project.
@prasad-alatkar -
I am having trouble applying the patch on top of the updated NimBLE submodule. I am applying these changes to ESP IDF v3.3.2 cloned today:
Brians-MacBook-Air:esp32 brianfriedkin$ git clone -b v3.3.2 --recursive https://github.com/espressif/esp-idf.git
Brians-MacBook-Air:esp32 brianfriedkin$ cd esp-idf
Brians-MacBook-Air:esp-idf brianfriedkin$ cd components/nimble/nimble/
Brians-MacBook-Air:nimble brianfriedkin$ git checkout nimble-1.1.0-idf-v3.3
Previous HEAD position was ecb1e9e5... NimBLE: Add Host based privacy (RPA) support (v3.3)
Branch nimble-1.1.0-idf-v3.3 set up to track remote branch nimble-1.1.0-idf-v3.3 from origin.
Switched to a new branch 'nimble-1.1.0-idf-v3.3'
Brians-MacBook-Air:nimble brianfriedkin$ git pull
Already up-to-date.
Brians-MacBook-Air:nimble brianfriedkin$ git apply 0003-NimBLE_RPA_fix_identity_change_repairing_failure.txt
error: patch failed: nimble/host/src/ble_hs_resolv.c:178
error: nimble/host/src/ble_hs_resolv.c: patch does not apply
error: patch failed: nimble/host/src/ble_hs_resolv_priv.h:53
error: nimble/host/src/ble_hs_resolv_priv.h: patch does not apply
What am I missing?
Thanks,
Brian
@geeksville Thank you for confirming the fix independently. @bfriedkin, looks like you pulled in latest NimBLE submodule, so fix is already present :) Can you please confirm if NimBLE submodule pointer is this: e7f0c530 ? If that is the case, you may not require to apply the patch.
@prasad-alatkar -
This is what I see before updating to nimble-1.1.0-idf-v3.3 :
Brians-MacBook-Air:nimble brianfriedkin$ git log
commit ecb1e9e5e3efb830d0b04dc3382be7b1012fdeaa (HEAD)
Author: Prasad Alatkar <[email protected]>
Date: Thu Mar 5 20:18:40 2020 +0800
NimBLE: Add Host based privacy (RPA) support (v3.3)
commit c876cbcbc2b1fcae257d8f921083eebf0964eef5
Author: Prasad Alatkar <[email protected]>
Date: Mon Mar 2 16:29:45 2020 +0530
nimble/host: Fix check for valid static random address
commit 4229791707c707a03309e33b0a54d1f1ac980e84
Author: Prasad Alatkar <[email protected]>
Date: Tue Feb 4 12:10:58 2020 +0530
And then after updating to nimble-1.1.0-idf-v3.3 :
Brians-MacBook-Air:nimble brianfriedkin$ git log
commit e7f0c530b6816e156292163f7d07b7375292a1af (HEAD -> nimble-1.1.0-idf-v3.3, origin/nimble-1.1.0-idf-v3.3)
Merge: a9239c6c 4f3e5e92
Author: Hrishikesh Dhayagude <[email protected]>
Date: Thu Jul 23 13:28:05 2020 +0800
Merge branch 'nimble/repeated_pairing_rpa_fix_v3.3' into 'nimble-1.1.0-idf-v3.3'
NimBLE: Add NRPA, Fix repeated pairing request failure in RPA feature and minor fix in `os_mempool.c` (Backport nimble-1.1.0-idf-v3.3)
See merge request espressif/esp-nimble!51
commit 4f3e5e923103d04c7849fec97f0a075b79f88dfc
Author: Prasad Alatkar <[email protected]>
Date: Tue Jul 14 12:07:10 2020 +0530
nimble/porting: Explicitly NULL terminate potential non NULL terminated string buffer
os_mempool.c: Using strncpy, buffer will not be NULL terminated if size of
source string is same as buffer size. Explicitly added NULL character at the
end.
commit 549dd34723ef7bf6764dc9d59953310e1f19a2c5
Author: Prasad Alatkar <[email protected]>
Date: Tue Jul 14 10:55:17 2020 +0530
NimBLE: Fix repeated pairing request failure in RPA feature
Fixes https://github.com/espressif/esp-idf/issues/5530
Do I need to add anything else? Please let me know. From what I can tell it looks like this is all I need.
Thanks,
Brian
Yes @bfriedkin , you don't need to do anything else. The fix is already present.
@prasad-alatkar -
I did some testing against the latest fix using LightBlue on iOS, Android and macOS. No problems found. :-) Thank you!
We didn't get a chance to test the products today. Hoping to do that tomorrow. Sorry for the delay.
I do have a few questions regarding the fixes. The app updates will be rolled out as a product update built against ESP-IDF v3.3.2 and the fixes. The products were previously built against Bluedroid. The ESP32 side will likely have bonds stored by Bluedroid and similarly the mobile client apps will have the associated bonds. Will the ESP32 device ignore the Bluedroid bond because the NimBLE store is different? Will the bond stored by the mobile client(s) be effectively ignored and replaced by a new bond after re-pairing completes?
Thanks,
Brian
Hi @bfriedkin
Will the ESP32 device ignore the Bluedroid bond because the NimBLE store is different?
Yes.
Will the bond stored by the mobile client(s) be effectively ignored and replaced by a new bond after re-pairing completes?
This totally depends on design of mobile client, I have observed few Android phones deleting the old bond if the peer has deleted the bond. However in case of iPhone it may not be the case. So to answer your question, the mobile client may require to delete the previous bond and reinitialize pairing again. However this will be only one time activity.
Most helpful comment
@prasad-alatkar btw - if it helps, I can confirm that 0003 patch worked nicely to fix RPA android clients connecting to my open-source project.