Zephyr: bt_rand() is called over HCI when BT_HOST_CRYPTO=y, even if BT_CTLR_LE_ENC=n

Created on 25 Jan 2020 · 10Comments · Source: zephyrproject-rtos/zephyr

Describe the bug
If LE encryption is not supported (like the case for VEGA controllers), and host encryption is enabled, there will still be calls to the controller for obtaining the necessary 8 bytes of random data.
This is done in two places in subsys/bluetooth/host/crypto.c, in prng_reseed() and in prng_init().

The problem is that the sequence of calls is actually a kind of infinite recursion, as per below

prng_reseed() -> BT_HCI_OP_LE_RAND -> le_rand() -> bt_rand() (host) -> prng_reseed()

The call to bt_rand() in host is done because only one implementation of the bt_rand() function (either from controller or from host) is being linked at a time.

Although I have not tested this, I suppose that this could be reproducible on Nordic as well, if setting BT_CTLR_LE_ENC_SUPPORT to n and enabling host crypto. Will test later on, when I can get access to a Nordic board.

To Reproduce
Steps to reproduce the behavior:

mkdir build; cd build
cmake -DBOARD=rv32m1_vega_ri5cy -DBT_HOST_CRYPTO=y
make
run on board and notice an error similar with the one below:

[00:00:00.120,000] <err> os: Exception cause Illegal instruction (2)
[00:00:00.120,000] <err> os: Faulting instruction address = 0x0d8fe860
[00:00:00.120,000] <err> os:   ra: 0x832f304a  gp: 0xec7e3d7e  tp: 0xe25f7e36  t0: 0x306eda80
[00:00:00.120,000] <err> os:   t1: 0x2c6fde6f  t2: 0x30a507e6  t3: 0x6a09e667  t4: 0xbb67ae85
[00:00:00.120,000] <err> os:   t5: 0x3c6ef372  t6: 0xa54ff53a  a0: 0x510e527f  a1: 0x9b05688c
[00:00:00.120,000] <err> os:   a2: 0xb78b2e79  a3: 0xb9a2e204  a4: 0x2fb44915  a5: 0xd90c5060
[00:00:00.120,000] <err> os:   a6: 0xd4839fe3  a7: 0x579a9afc

This is due to the fact that eventually the stack will run out.

Expected behavior
The controller random function should not be called when BT_CTLR_LE_ENC_SUPPORT is not enabled. This is according the the BLE spec, vol 2, part E, 3.1 LE Controller Requirements: where the HCI_LE_Rand command is noted as C4; per the note "C4: Mandatory if LE Feature (LE Encryption) is supported, otherwise excluded.".

Environment (please complete the following information):

OS: Linux
Toolchain: VEGA Toolchain
Commit SHA or Version used: 7cb74655da83bf100eaacc3e00d96c94f1cfbcb3

Additional context
One quick-fix would be to re-implement prng_reseed() and prng_init() to call locally the entropy device. Another dirty fix would be to just use the kernel ticks (k_uptime_get) as the 8 bytes of 'entropy'.

Bluetooth bug has-pr medium

Source

alexandru-porosanu-nxp

All 10 comments

The correct way for the host to test LE Encryption support in the controller is:

if (BT_FEAT_LE_ENCR(bt_dev.le.features))) {
    ...
}

The reasons why the host is avoiding to use a local entropy source are mainly historical: in the early days of Zephyr (3-4 years ago) most boards did not have an implementation for it, which resulted in them easily falling back to using a "dummy" PRNG which was something similar to what you suggested with k_uptime_get(). Since this is extremely insecure and we didn't want to risk anyone ending up with a product with such a security vulnerability we always just used the controller-side entropy source since, until now, all controllers we've ever used have had this available.

I'm fine with falling back to the local entropy support if the controller indicates lack of encryption support, but we have to ensure that it's not a dummy one. In any other scenario the Bluetooth init (bt_enable) should fail. In fact, if there's a reliable and secure local entropy source it's probably more efficient to use that instead of using HCI.

jhedberg on 26 Jan 2020

On a combined build CONFIG_BT_CTLR_CRYPTO is selected by default and the corresponding crypto.c file that calls into ecb.c functions.

Switching to CONFIG_BT_HOST_CRYPTO (if there is no crypto hw acceleration in controller) on a combined build results in that deadlock

George-Stefan on 27 Jan 2020

@George-Stefan The controller shouldn't do calls into to the host. And currently bt_rand is used by the controller. For example to create the access address of the connection (ull_master.c).
I think what we should do is to make sure that the controller never calls bt_rand itself, but provides bt_rand as a function that the host can call.

@cvinayak @jhedberg So I would rename the implementation and all uses of bt_rand in the controlelr to bt_ctlr_rand(), and then introduce bt_rand which calls this function. That would solve the recursion problem.
Also since it appears that the controller already has a dependency on having a bt_rand function available I don't see the problem with implement the LE Rand command even though encryption is disabled.

For the host we should also handle the case where the controller does not provide the LE Rand function. That would mean that the host would possibly do a call to the entropy device. But if there is no entropy device in the host it appears that we have a problem. The host would need these random numbers for the Privacy feature. Which relies on secure random numbers. Using k_uptime would not be good enough for this.

Is there a non-secure rand function implement somewhere? We appear to use bt_rand most places, not all of those need secure random numbers I guess?

joerchan on 27 Jan 2020

I don't see the problem with implement the LE Rand command even though encryption is disabled.

We could kind of do that.. I suppose. The reason I hesitate is that the core spec states LE Encryption as the feature which makes this command available. That said, instead of checking for supported features the host could be checking for supported commands (which is what we do when the core spec doesn't clearly specify what feature enables support for an optional command)

jhedberg on 27 Jan 2020

...or then we just pretend ignorance and keep the host code presuming that this command is always available ;)

jhedberg on 27 Jan 2020

So the use case we need to consider is a privacy-enabled host build with a controller without encryption, and where the host does not have a access to secure random numbers. Using pseudo-random numbers with privacy would mean you could track it across RPA update simply because the rand part of the RPA would be known.

@jhedberg Yeah I meant that we could include the function and set the function as supported, in which case the host would have to read the supported commands instead of features. Is there any possible drawbacks to this?

joerchan on 27 Jan 2020

Maybe the discussion I started should be taken to it's own issue?

In any case I think what should be done to solve this problem is what I already mentioned:

So I would rename the implementation and all uses of bt_rand in the controlelr to bt_ctlr_rand(), and then introduce bt_rand which calls this function.

joerchan on 27 Jan 2020

in which case the host would have to read the supported commands instead of features

The host is already reading the supported commands and storing them into bt_dev.supported_commands. The only "drawback" would then just be an additional branch in the code.

jhedberg on 27 Jan 2020

prng_init checks for supported command, and returns -ENOTSUP in case it is not supported. prng_reseed does not do this, in this case we should get -EIO. The caller of bt_rand will have to handle this error in that case. The host will check this error code and should not start any roles that needed privacy. Same for other users of bt_rand.

In my opinion this is good enough. We just need to fix the recursion problem.

joerchan on 27 Jan 2020

👍1

Although I have not tested this, I suppose that this could be reproducible on Nordic as well, if setting BT_CTLR_LE_ENC_SUPPORT to n and enabling host crypto. Will test later on, when I can get access to a Nordic board.

@alexandru-porosanu-nxp Although I don't get the exact same crash as you do, setting BT_CTLR_CRYPTO=n is definitively not working on Nordic either.

joerchan on 30 Jan 2020

👍1

Was this page helpful?

0 / 5 - 0 ratings