Zephyr: CONFIG_BT_SETTINGS writes bt/hash to storage twice

Created on 11 Feb 2020  路  8Comments  路  Source: zephyrproject-rtos/zephyr

Describe the bug
When CONFIG_BT_SETTINGS is enabled and the user calls settings_load(), initially bt/hash is stored as 0x0000. Later, the OS calles bt_hash_gen() and re-stores the hash (this time calculated). In combination with a flash back-end this causes excessive flash wear.

To Reproduce
Steps to reproduce the behavior:

  1. enable CONFIG_BT_SETTINGS in project, call settings_load in bt_ready callback (from `bt_init)
  2. west build -b nrf52_pca10040
  3. west flash
  4. Reset the device a couple of times and check the storage partition using nrfjprog --memrd flash_partition_offset --n 512

Expected behavior
bt/hash should not be stored immediately, but calculated once and stored once.

Impact
In combination with a flash backend causes excessive flash wear on every reboot of the device.

Screenshots or console output
memrd output:

0x0007B000: C0FFEEEE 0007FF01 2F746218 68736168   |.........bt/hash|
0x0007B010: 9243D53D 3B7FBAD2 2DD486F0 2AD9F5CC   |=.C....;...-...*|
0x0007B020: 6218905A 61682F74 003D6873 00000000   |Z..bt/hash=.....|
0x0007B030: 00000000 00000000 EF000000 2F746218   |.............bt/|
0x0007B040: 68736168 9243D53D 3B7FBAD2 2DD486F0   |hash=.C....;...-|
0x0007B050: 2AD9F5CC 6218905A 61682F74 003D6873   |...*Z..bt/hash=.|
0x0007B060: 00000000 00000000 00000000 EF000000   |................|
0x0007B070: 2F746218 68736168 9243D53D 3B7FBAD2   |.bt/hash=.C....;|
0x0007B080: 2DD486F0 2AD9F5CC 6218905A 61682F74   |...-...*Z..bt/ha|
0x0007B090: 003D6873 00000000 00000000 00000000   |sh=.............|
0x0007B0A0: EF000000 2F746218 68736168 9243D53D   |.....bt/hash=.C.|
0x0007B0B0: 3B7FBAD2 2DD486F0 2AD9F5CC 6218905A   |...;...-...*Z..b|
0x0007B0C0: 61682F74 003D6873 00000000 00000000   |t/hash=.........|
0x0007B0D0: 00000000 EF000000 2F746218 68736168   |.........bt/hash|
0x0007B0E0: 9243D53D 3B7FBAD2 2DD486F0 2AD9F5CC   |=.C....;...-...*|
0x0007B0F0: 6218905A 61682F74 003D6873 00000000   |Z..bt/hash=.....|
0x0007B100: 00000000 00000000 EF000000 2F746218   |.............bt/|
0x0007B110: 68736168 9243D53D 3B7FBAD2 2DD486F0   |hash=.C....;...-|
0x0007B120: 2AD9F5CC 6218905A 61682F74 003D6873   |...*Z..bt/hash=.|
0x0007B130: 00000000 00000000 00000000 EF000000   |................|
0x0007B140: 2F746218 68736168 9243D53D 3B7FBAD2   |.bt/hash=.C....;|
0x0007B150: 2DD486F0 2AD9F5CC 6218905A 61682F74   |...-...*Z..bt/ha|
0x0007B160: 003D6873 00000000 00000000 00000000   |sh=.............|
0x0007B170: EF000000 2F746218 68736168 9243D53D   |.....bt/hash=.C.|
0x0007B180: 3B7FBAD2 2DD486F0 2AD9F5CC 6218905A   |...;...-...*Z..b|
0x0007B190: 61682F74 003D6873 00000000 00000000   |t/hash=.........|
0x0007B1A0: 00000000 EF000000 2F746218 68736168   |.........bt/hash|
0x0007B1B0: 9243D53D 3B7FBAD2 2DD486F0 2AD9F5CC   |=.C....;...-...*|
0x0007B1C0: 6218905A 61682F74 003D6873 00000000   |Z..bt/hash=.....|
0x0007B1D0: 00000000 00000000 EF000000 2F746218   |.............bt/|
0x0007B1E0: 68736168 9243D53D 3B7FBAD2 2DD486F0   |hash=.C....;...-|
0x0007B1F0: 2AD9F5CC 6218905A 61682F74 003D6873   |...*Z..bt/hash=.|
0x0007B200: 00000000 00000000 00000000 EF000000   |................|
0x0007B210: 2F746218 68736168 9243D53D 3B7FBAD2   |.bt/hash=.C....;|

Environment (please complete the following information):

  • OS: Manjaro-Linux
  • Toolchain: Zephyr SDK
  • Commit SHA: zephyr 2.1.99

Additional context
Created from issue #20629

Bluetooth bug has-pr medium

Most helpful comment

I have discussed this one on slack with @mfiumara.

The issue appears to be this check.

static int db_hash_commit(void)
{
    /* Stop work and generate the hash */
    if (k_delayed_work_remaining_get(&db_hash_work)) {
        k_delayed_work_cancel(&db_hash_work);
        db_hash_gen(false);
    }

The check if (k_delayed_work_remaining) returns the amount of time until the work will execute, that timeout has run out. but the work has not been executed.
Since it is a single queue of work items, and settings_load is called from bt_ready which is a work-item then we get this situation, the bt_ready is executing, bt_hash_work should execute, but has no remaining time remaining.

I'm currently looking into reproducing this and fixing it.

All 8 comments

I have discussed this one on slack with @mfiumara.

The issue appears to be this check.

static int db_hash_commit(void)
{
    /* Stop work and generate the hash */
    if (k_delayed_work_remaining_get(&db_hash_work)) {
        k_delayed_work_cancel(&db_hash_work);
        db_hash_gen(false);
    }

The check if (k_delayed_work_remaining) returns the amount of time until the work will execute, that timeout has run out. but the work has not been executed.
Since it is a single queue of work items, and settings_load is called from bt_ready which is a work-item then we get this situation, the bt_ready is executing, bt_hash_work should execute, but has no remaining time remaining.

I'm currently looking into reproducing this and fixing it.

Reproduce:

Configurations:
CONFIG_BT_GATT_DYNAMIC_DB=y
CONFIG_BT_GATT_CACHING=y
CONFIG_BT_DEBUG_GATT=y

Modified (extracted) peripheral sample:

static void bt_ready(int err)
{
    k_busy_wait(10 * 1000); // DB hash delayed work is 10 MS

    if (IS_ENABLED(CONFIG_SETTINGS)) {
        settings_load();
    }
}

void main(void)
{
    int err;

    err = bt_enable(bt_ready);
    if (err) {
        printk("Bluetooth init failed (err %d)\n", err);
        return;
    }

       /* ... */
}

Then this will happen on every reset:

00:00:00.020,965] <dbg> bt_gatt: Stored Hash: 
57 b5 7c 4a f9 e1 e6 08  23 21 0e 89 c1 21 fc 92 |W.|J.... #!...!..
[00:00:00.021,453] <dbg> bt_gatt: New Hash: 
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |........ ........
[00:00:00.021,453] <dbg> bt_gatt.sc_indicate: start 0x0001 end 0xffff
[00:00:00.021,453] <dbg> bt_gatt.sc_indicate: indicate pending, waiting until complete...
[00:00:00.023,284] <dbg> bt_gatt.db_hash_store: Database Hash stored
....
[00:00:00.259,124] <dbg> bt_gatt: Hash: 
57 b5 7c 4a f9 e1 e6 08  23 21 0e 89 c1 21 fc 92 |W.|J.... #!...!..
[00:00:00.260,589] <dbg> bt_gatt.db_hash_store: Database Hash stored
[00:00:00.260,589] <dbg> bt_gatt.sc_process: start 0x0001 end 0xffff

@mfiumara I posted a possible fix for this issue. It solves the issue when I was reproducing it locally. Can you verify that it also works in your setup?

I just tried this. It seems the hash is now not stored anymore as zero, but it is calculated differently and the problem still persists. I tried by erasing all memory, then running my application twice. My storage partition output after this:

nrfjprog --memrd 499712 --n 256
0x0007A000: C0FFEEEE 0000FF01 2F746218 68736168   |.........bt/hash|
0x0007A010: AF4FA13D 51B45CE5 73EF1DAA 45C27DD2   |=.O..\.Q...s.}.E|
0x0007A020: 621856E6 61682F74 D53D6873 BAD29243   |.V.bt/hash=.C...|
0x0007A030: 86F03B7F F5CC2DD4 905A2AD9 2F746218   |.;...-...*Z..bt/|
0x0007A040: 68736168 AF4FA13D 51B45CE5 73EF1DAA   |hash=.O..\.Q...s|
0x0007A050: 45C27DD2 621856E6 61682F74 D53D6873   |.}.E.V.bt/hash=.|
0x0007A060: BAD29243 86F03B7F F5CC2DD4 905A2AD9   |C....;...-...*Z.|
0x0007A070: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A080: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A090: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A0A0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A0B0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A0C0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A0D0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A0E0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A0F0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|

So it is still swithcing between two hashes on every boot.

@mfiumara Can you check if you have any calls to bt_gatt_service_register, and if they are executed before or after settings_load?

I just checked using gdb. I can confirm bt_gatt_service_register is called _after_ settings_load. The piece of application initialization code I have:

    /* Enable Bluetooth. */
    rc = bt_enable(bt_ready);
    if (rc != 0) {
        LOG_ERR("Bluetooth init failed (err %d)", rc);
        return rc;
    }

    /* Register connection callbacks */
    bt_conn_cb_register(&conn_callbacks);

    /* Register Authentication callbacks */
    bt_conn_auth_cb_register(&conn_auth_callbacks);

    /* Initialize the Bluetooth mcumgr transport. */
    smp_bt_register();

The gdb trace:

(gdb) b bt_gatt_service_register 
Breakpoint 1 at 0x22d28: file /zephyr/subsys/bluetooth/host/gatt.c, line 1034.
(gdb) b settings_load
Breakpoint 2 at 0x36d66: file /zephyr/subsys/settings/src/settings_store.c, line 47.
(gdb) c
Continuing.

Breakpoint 2, settings_load () at /zephyr/subsys/settings/src/settings_store.c:47
47              return settings_load_subtree(NULL);
(gdb) 
Continuing.

Breakpoint 1, bt_gatt_service_register (svc=svc@entry=0x200095fc <smp_bt_svc>) at /zephyr/subsys/bluetooth/host/gatt.c:1034
1034            bt_gatt_init();

The settings_load is in the bt_ready callback.

Ok. then that explains it I think. Could you move the call to smp_bt_register before settings_load for now?

I documented this current requirement here: https://github.com/zephyrproject-rtos/zephyr/pull/22756/commits/a9e0c6c47016a1a480157eff3af3631d21cdfbc9
Also I asked the question in the PR if this is really the behaviour that we want. So let's hear what the others think about this.

@joerchan I retried by re-ordering my initialisation routine (calling smp_bt_register before settings_load) and this seems to work. I redid the same test (restarting my device twice) and the storage partition now looks as follows:

nrfjprog --memrd 499712 --n 128
0x0007A000: C0FFEEEE 0000FF01 2F746218 68736168   |.........bt/hash|
0x0007A010: 9243D53D 3B7FBAD2 2DD486F0 2AD9F5CC   |=.C....;...-...*|
0x0007A020: FFFF905A FFFFFFFF FFFFFFFF FFFFFFFF   |Z...............|
0x0007A030: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A040: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A050: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A060: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|
0x0007A070: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF   |................|

So can confirm this fix works as intended.

Was this page helpful?
0 / 5 - 0 ratings