Mbed-os: AES hardware acceleration not working for STM32F439xI

Created on 17 Aug 2017 · 32 Comments · Source: ARMmbed/mbed-os

Description

Type: Bug
Priority: Major

Bug

Our automated tests for the tls-client example in the mbed-os-example-tls repository fail with the following error message printed on the serial console (target UBLOX_EVK_ODIN_W2):

mbedtls_ssl_handshake() failed: -0x7780 (-30592): SSL - A fatal alert message was received from our peer
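
The two numbers in this message are the same value in hexadecimal and decimal. Mbed TLS return codes pack a high-level module error into bits 7 to 14 of the magnitude and a low-level error into bits 0 to 6. The following is a minimal sketch of splitting a code that way; `error_parts` and `split_mbedtls_error` are illustrative names, not Mbed TLS API.

```c
/* Illustrative helper, not part of Mbed TLS. Splits a return code into the
 * high-level (bits 7-14) and low-level (bits 0-6) fields described in the
 * Mbed TLS error.h header. */
typedef struct {
    int high; /* high-level module error magnitude, e.g. 0x7780 */
    int low;  /* low-level module error magnitude, 0 if none */
} error_parts;

error_parts split_mbedtls_error(int ret)
{
    error_parts parts;
    int magnitude = (ret < 0) ? -ret : ret;

    parts.high = magnitude & 0x7F80;
    parts.low  = magnitude & 0x007F;
    return parts;
}
```

For the failure above, `split_mbedtls_error(-30592)` gives `high == 0x7780` and `low == 0`, so the code carries an SSL-layer error with no low-level cipher error attached.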

With debug printing enabled, we observe that the TLS connection terminates prematurely: the server sends the tls-client a fatal alert because the MAC of a TLS record fails verification:

...
ssl_tls.c:3961: |2| got an alert message, type: [2:20]
ssl_tls.c:3969: |1| is a fatal alert message (msg 20)
ssl_tls.c:3744: |1| mbedtls_ssl_handle_message_type() returned -30592 (-0x7780)
ssl_cli.c:3184: |1| mbedtls_ssl_read_record() returned -30592 (-0x7780)
ssl_tls.c:6354: |2| <= handshake
mbedtls_ssl_handshake() failed: -0x7780 (-30592): SSL - A fatal alert message was received from our peer
...
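
The `[2:20]` pair in the log is the alert level followed by the alert description. A small lookup sketch, using the values defined in RFC 5246 section 7.2, makes the pair readable; the function names are hypothetical and only a handful of descriptions are listed.

```c
/* Alert level names per RFC 5246, section 7.2. */
const char *tls_alert_level_name(int level)
{
    switch (level) {
    case 1:  return "warning";
    case 2:  return "fatal";
    default: return "unknown";
    }
}

/* A few alert descriptions per RFC 5246, section 7.2 (not exhaustive). */
const char *tls_alert_description_name(int description)
{
    switch (description) {
    case 0:  return "close_notify";
    case 10: return "unexpected_message";
    case 20: return "bad_record_mac";
    case 40: return "handshake_failure";
    case 50: return "decode_error";
    case 51: return "decrypt_error";
    default: return "other";
    }
}
```

With these helpers, level 2 maps to "fatal" and description 20 maps to "bad_record_mac", which matches the MAC failure described above.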

We investigated the problem and found that disabling the AES hardware acceleration code fixes it. To test this, we used the following diff:

diff --git a/features/mbedtls/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/mbedtls_device.h b/features/mbedtls/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/mbedtls_device.h
index dfbc820..2c2fff8 100644
--- a/features/mbedtls/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/mbedtls_device.h
+++ b/features/mbedtls/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/mbedtls_device.h
@@ -20,8 +20,6 @@
 #ifndef MBEDTLS_DEVICE_H
 #define MBEDTLS_DEVICE_H

-#define MBEDTLS_AES_ALT
-
 #define MBEDTLS_SHA256_ALT

 #define MBEDTLS_SHA1_ALT
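
For context, defining `MBEDTLS_AES_ALT` tells Mbed TLS to leave out its built-in AES module and use the implementation supplied by the target, so removing the define falls back to software AES. The sketch below mimics that compile-time switch in isolation; `aes_uses_target_driver` is a hypothetical helper, not part of Mbed TLS.

```c
/* Leave this commented out to mimic the diff above, where the macro is
 * removed. Uncomment it to mimic a target that supplies its own AES driver. */
/* #define MBEDTLS_AES_ALT */

/* Hypothetical helper: returns 1 if this build would use the
 * target-supplied AES implementation, 0 for the built-in software one. */
int aes_uses_target_driver(void)
{
#if defined(MBEDTLS_AES_ALT)
    return 1;
#else
    return 0;
#endif
}
```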

Target
STM32F439xI family of devices with hardware acceleration enabled

Toolchain:
GCC_ARM

mbed-os sha:
Git tag mbed-os-5.5.5

Expected behavior
The tls-client example should succeed.

Actual behavior
The tls-client example fails with error:

mbedtls_ssl_handshake() failed: -0x7780 (-30592): SSL - A fatal alert message was received from our peer

Steps to reproduce
Run the tls-client example from the mbed-os-example-tls repository (with the mbed-os-5.5.4 tag) using the GCC_ARM toolchain on the UBLOX_EVK_ODIN_W2 target. The failure message can be observed in the serial output.

tls st ublox

Source

andresag01

All 32 comments

cc @RonEld @Patater @0xc0170

andresag01 on 17 Aug 2017

Hi @andresag01, thanks for raising this. I don't understand the description: is the target device STM32F439xI or UBLOX_EVK_ODIN_W2?
Also, do you happen to know whether the AES variant used here is AES-192?

RonEld on 17 Aug 2017

@RonEld UBLOX_EVK_ODIN_W2 is a TARGET_STM32F439xI, so it is affected.

Patater on 17 Aug 2017

I see, thanks.

RonEld on 17 Aug 2017

cc @andreaslarssonublox @andreaspeterssonublox

0xc0170 on 17 Aug 2017

cc @adustm

andresag01 on 17 Aug 2017

This issue isn't limited to u-blox targets. It affects at least all STM32F439xI-family targets that support AES hardware acceleration.

Patater on 17 Aug 2017

@RonEld: The ciphersuite used for this specific server and example is TLS-ECDHE-RSA-WITH-AES-128-GCM-SHA256, so I suppose it's AES-128.

andresag01 on 17 Aug 2017

Hello, thanks for reporting. I have reproduced the issue and will look into it.

adustm on 18 Aug 2017

It seems that using HW acceleration for crypto also breaks SD card initialization.

JanneKiiskila on 18 Aug 2017

Hello, I have a question. Is it possible that once the issue happens ('TLS handshake failure'), the server refuses new connections from my IP address for a while?
It seems difficult to reconnect after pressing the reset button several times in a row:

Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Starting the TLS handshake...
mbedtls_ssl_handshake() failed: -0x7780 (-30592): SSL - A fatal alert message was received from our peer
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009

adustm on 18 Aug 2017

> It seems that using HW acceleration for crypto also breaks SD card initialization.

@JanneKiiskila Why would you suggest this? AES and SD/SPI should be completely unrelated. Do you have an application example of this failure?

sg- on 18 Aug 2017

@adustm: I looked up the error number -3009 and found this in mbed-os/features/netsocket/nsapi_types.h:

NSAPI_ERROR_DNS_FAILURE         = -3009,     /*!< DNS failed to complete successfully */

Also, from the tls-client app error message you got, it seems the failure was in line tls-client/main.cpp:205:

        mbedtls_printf("Connecting with %s\r\n", _domain);
        ret = _tcpsocket->connect(_domain, _port);
        if (ret != NSAPI_ERROR_OK) {
            mbedtls_printf("Failed to connect\r\n");
            printf("MBED: Socket Error: %d\r\n", ret);
            _tcpsocket->close();
            return;
        }

It looks to me like the device is not able to resolve the hostname via DNS. Perhaps the device is not on the network, or the server is somehow unreachable? Or perhaps some network configuration causes your device to return this error when it is reset too many times in quick succession? Several functions in ./features/netsocket/nsapi_dns.h could return that specific error code, so you could try looking there.
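
To make the distinction concrete, here is a small sketch that separates a name-resolution failure from other connect failures. The only error value used is the -3009 quoted above from nsapi_types.h; the `SKETCH_` constants, the `connect_outcome` enum and `classify_connect_result` are hypothetical names introduced for illustration.

```c
/* Local copies for illustration; -3009 is the value quoted from
 * nsapi_types.h in this thread. */
enum {
    SKETCH_NSAPI_ERROR_OK          = 0,
    SKETCH_NSAPI_ERROR_DNS_FAILURE = -3009
};

typedef enum {
    CONNECT_OK,
    CONNECT_FAILED_NAME_RESOLUTION, /* hostname lookup never succeeded */
    CONNECT_FAILED_OTHER            /* any other socket-level error */
} connect_outcome;

/* Classify the return value of a socket connect() call. */
connect_outcome classify_connect_result(int ret)
{
    if (ret == SKETCH_NSAPI_ERROR_OK) {
        return CONNECT_OK;
    }
    if (ret == SKETCH_NSAPI_ERROR_DNS_FAILURE) {
        return CONNECT_FAILED_NAME_RESOLUTION;
    }
    return CONNECT_FAILED_OTHER;
}
```

A result of `CONNECT_FAILED_NAME_RESOLUTION` means the failure happened during the hostname lookup, before any TCP connection to the server was attempted.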

I suppose that it is also possible for servers to refuse connections from the same IP in quick succession, but I would expect the error to have a different value. Of course, I could be wrong...

andresag01 on 18 Aug 2017

I just wanted to quickly ask if there were any updates regarding this issue...

andresag01 on 22 Aug 2017

Dear all,
I've tried disabling and enabling interrupts during the HW processing, removing the AES_FORCE_RESET in the aes_free function, and a few other things. No clue at the moment...
Attached are the log files of the TLS handshake (teratermaes_sw.txt is the working version without AES HW acceleration, teraterm_aes_alt.txt is the failing version with AES HW acceleration). Both were captured with debug level 4.

Could someone take a look?
At line 762 of the log files, we can see that the failing version receives a message length of 2 instead of 202.

teratermaes_sw.txt
teraterm_aes_alt.txt

Kind regards
Armelle

adustm on 22 Aug 2017

Hi @adustm. As you can see, it's not only a different message length; it's a different message. The msgtype is 21 (alert message) instead of 22 (handshake message). The reason for the TLS failure is a fatal alert message received from the server. We need to investigate the reason for the alert message, and why the server rejects the connection when HW acceleration is used. I suggest you test AES-GCM with and without HW-accelerated AES; perhaps something is wrong with this part of the message.
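
For reference, the message type and length come from the 5-byte TLS record header. The sketch below parses that header using the content types from RFC 5246 section 6.2.1; the struct and function names are illustrative and not taken from Mbed TLS.

```c
#include <stddef.h>
#include <stdint.h>

/* TLS record content types (RFC 5246, section 6.2.1). */
enum {
    TLS_CT_CHANGE_CIPHER_SPEC = 20,
    TLS_CT_ALERT              = 21,
    TLS_CT_HANDSHAKE          = 22,
    TLS_CT_APPLICATION_DATA   = 23
};

typedef struct {
    uint8_t  content_type;
    uint8_t  version_major;
    uint8_t  version_minor;
    uint16_t length;        /* payload length in bytes */
} tls_record_header;

/* Parse the 5-byte TLS record header.
 * Returns 0 on success, -1 if an argument is NULL or the buffer is too short. */
int parse_tls_record_header(const uint8_t *buf, size_t len,
                            tls_record_header *out)
{
    if (buf == NULL || out == NULL || len < 5) {
        return -1;
    }
    out->content_type  = buf[0];
    out->version_major = buf[1];
    out->version_minor = buf[2];
    out->length        = (uint16_t)((buf[3] << 8) | buf[4]);
    return 0;
}
```

An unencrypted alert payload is exactly two bytes (level and description), which is consistent with the message length of 2 seen in the failing log.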

RonEld on 22 Aug 2017

For us, the SD card issue is related to the fact that we encrypt the SD card content, so it seems the HW crypto block doesn't work reliably. With the mbed-os-example-client we see the TLS failure.

JanneKiiskila on 22 Aug 2017

Hi @adustm,
Could you check whether HAL_CRYP_AESECB_Encrypt has failed on this device? Since mbedtls_aes_encrypt doesn't return an error, a driver error would not have been surfaced.
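
The concern is that an encrypt function with a `void` return type has no way to report a driver failure to its caller. A minimal, self-contained sketch of that pattern follows. `fake_hal_aes_ecb_encrypt` and both wrappers are hypothetical stand-ins, not the STM32 HAL or Mbed TLS functions, and the XOR transform is a placeholder rather than AES.

```c
/* Hypothetical driver status codes, for illustration only. */
typedef enum { FAKE_HAL_OK = 0, FAKE_HAL_ERROR = 1 } fake_hal_status;

/* Set to 1 to simulate a hardware failure. */
static int g_force_hw_failure = 0;

/* Stand-in for a hardware encrypt call that reports a status. */
static fake_hal_status fake_hal_aes_ecb_encrypt(const unsigned char in[16],
                                                unsigned char out[16])
{
    int i;
    if (g_force_hw_failure) {
        return FAKE_HAL_ERROR;
    }
    for (i = 0; i < 16; i++) {
        out[i] = (unsigned char)(in[i] ^ 0xAA); /* placeholder, not AES */
    }
    return FAKE_HAL_OK;
}

/* void wrapper: the driver status is dropped, so a failure is silent. */
void encrypt_block_void(const unsigned char in[16], unsigned char out[16])
{
    (void)fake_hal_aes_ecb_encrypt(in, out);
}

/* int wrapper: the driver status is propagated to the caller. */
int encrypt_block_checked(const unsigned char in[16], unsigned char out[16])
{
    return (fake_hal_aes_ecb_encrypt(in, out) == FAKE_HAL_OK) ? 0 : -1;
}
```

With `g_force_hw_failure` set, `encrypt_block_void` returns normally and leaves the output buffer untouched, while `encrypt_block_checked` returns -1. Logging or checking the driver status inside the wrapper is one way to answer the question above.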

RonEld on 22 Aug 2017

Hi @adustm,

Code freeze for Mbed OS 5.5.6 is tomorrow (2017-08-24). Will a fix be ready by then? If not, could you please review https://github.com/ARMmbed/mbed-os/pull/4934 ?

Thanks

Patater on 23 Aug 2017

Hello @RonEld

> Could you check whether HAL_CRYP_AESECB_Encrypt has failed on this device? Since mbedtls_aes_encrypt doesn't return an error, a driver error would not have been surfaced.

No error was returned by HAL_CRYP_AESECB_Encrypt.

The GCM self-test also passes (tested with both the master branch and the mbed-os-5.5 branch).
test case: 'mbedtls_gcm_self_test' ........................................................... OK in 2.58 sec

Would you like to suggest another test?
Kind regards
Armelle

adustm on 25 Aug 2017

I have modified gcm.c so that it can test 2 ctx instances in parallel, and it's all OK. It looks like the AES hardware handles context save and restore correctly.

Does anyone have a multi-threaded AES example that I could work on?

Kind regards
Armelle

adustm on 25 Aug 2017

Hi @adustm
The alert message that is received is MBEDTLS_SSL_ALERT_MSG_BAD_RECORD_MAC, so I am quite positive that the GCM result is not as expected. Probably the keys used on the two sides differ. Since GCM uses AES, I would focus on the AES part, as you are doing.
I think your multi-threading direction is correct.

Regards,
Ron

RonEld on 27 Aug 2017

Hello @RonEld
I have rewritten the gcm_selftest to launch 5 GCM threads in parallel (see the attached main.txt file; rename it to main.cpp if you want to test it).

main.txt

It's all OK.

| target                | platform_name | test suite           | result | elapsed_time (sec) | copy_method |
+-----------------------+---------------+----------------------+--------+--------------------+-------------+
| NUCLEO_F439ZI-GCC_ARM | NUCLEO_F439ZI | tests-mbedtls-thread | OK     | 19.89              | shell       |
+-----------------------+---------------+----------------------+--------+--------------------+-------------+

Any other ideas?

adustm on 28 Aug 2017

@JanneKiiskila could I get access to your program to test it?

adustm on 28 Aug 2017

Hi @adustm. At the moment, my best guess is that some preemption occurred, causing the HW to load a different key. Perhaps it's a matter of a GCM + AES multi-threading scenario.
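
To illustrate this hypothesis (it is a guess at this point in the thread, not a confirmed root cause), the sketch below simulates a peripheral with a single key register shared by two contexts. Everything here is hypothetical: the `sim_` names are invented, and the "cipher" is a one-byte XOR placeholder, not AES.

```c
#include <stdint.h>

/* Simulated peripheral: one key register shared by every context. */
static uint8_t hw_key_register;

typedef struct {
    uint8_t key; /* software copy of this context's key */
} sim_aes_context;

/* Store the key in the context and load it into the "hardware". */
void sim_setkey(sim_aes_context *ctx, uint8_t key)
{
    ctx->key = key;
    hw_key_register = key;
}

/* Fragile: trusts whatever key is currently loaded in the hardware. */
uint8_t sim_encrypt_trusting_hw(const sim_aes_context *ctx, uint8_t in)
{
    (void)ctx;
    return (uint8_t)(in ^ hw_key_register);
}

/* Robust: reloads this context's key before every operation. */
uint8_t sim_encrypt_reloading_key(const sim_aes_context *ctx, uint8_t in)
{
    hw_key_register = ctx->key;
    return (uint8_t)(in ^ hw_key_register);
}
```

If context A is keyed with 0x11 and context B is then keyed with 0x22, `sim_encrypt_trusting_hw(&a, 0x00)` returns 0x22 (B's key leaks into A's operation), while `sim_encrypt_reloading_key(&a, 0x00)` returns 0x11. On a real RTOS the reload-and-encrypt sequence would also need to be protected against preemption, for example with a mutex.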

RonEld on 28 Aug 2017

Hi,

@adustm - since STM is a member of mbed Cloud Partners, you should have access to the repositories that contain the SW we are running.

I have sent an email with a few more details.

JanneKiiskila on 28 Aug 2017

Can we raise this to blocker, please.

JanneKiiskila on 31 Aug 2017

> Can we raise this to blocker, please.

The fix will get CI once CI is back running.

0xc0170 on 31 Aug 2017

> The fix will get CI once CI is back running.

@0xc0170 can you add a reference here to the fixing PR?

RobMeades on 4 Sep 2017

I think this is the PR: https://github.com/ARMmbed/mbed-os/pull/4934

JanneKiiskila on 4 Sep 2017

That's not the fix, though, that's a workaround. I guess ST, maybe @adustm, is still fighting the problem?

RobMeades on 4 Sep 2017

Hello,
The fix is finally here, in PR #5018 (with explanations).
Kind regards
Armelle

adustm on 5 Sep 2017
