Esp-idf: WPA2: Incorrect initialisation behind slow WiFi connection times? (IDFGH-1520)

Created on 16 Jul 2019  路  21Comments  路  Source: espressif/esp-idf

Slow WPA2 connection times have long been an issue for many ESP32 users. I think I may have a small clue as to where to look for the problem, and a bandaid to fix it for now.
Today I was able to reduce connection time on my home WPA2 network from the usual ~1100ms down to ~350ms (240MHz, CONFIG_LWIP_DHCP_DOES_ARP_CHECK=0). Those times are using Arduino's millis(), the debug output below might have more meaningful timestamps.
All it took was this seemingly redundant fix:

wifi_config_t conf;
esp_wifi_get_config(WIFI_IF_STA, &conf);
esp_wifi_set_config(WIFI_IF_STA, &conf);

Often I get this additional debug output (the indicated line):
Without fix:

I (722) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1
D (1472) nvs: nvs_set_blob sta.apinfo 700[1B][0m           <-------
I (1522) wifi: state: init -> auth (b0)
I (1532) wifi: state: auth -> assoc (0)
I (1542) wifi: state: assoc -> run (10)
I (1582) wifi: connected with MYSSID, channel 11
D (1582) nvs: nvs_set sta.chan 1 11[1B][0m
D (1582) nvs: nvs_set_blob sta.apinfo 700[1B][0m
I (1592) wifi: pm start, type: 1

D (1592) event: SYSTEM_EVENT_STA_CONNECTED

With fix, for comparison:

I (722) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1
I (722) wifi: state: init -> auth (b0)
I (732) wifi: state: auth -> assoc (0)
I (732) wifi: state: assoc -> run (10)
I (782) wifi: connected with MYSSID, channel 11
D (782) nvs: nvs_set sta.chan 1 11[1B][0m
D (782) nvs: nvs_set_blob sta.apinfo 700[1B][0m
I (792) wifi: pm start, type: 1

D (792) event: SYSTEM_EVENT_STA_CONNECTED

That additional "nvs_set_blob sta.apinfo" doesn't seem to occur every time though, so maybe a red herring. The ~750ms mysterious delay unfortunately does.

In fact, I realised just now that I see this delay and 'apinfo' output when no NVS data exists for the current WiFi config. Indeed, if I erase the NVS partition, I'm sure to see this exact behaviour next boot, every time. Maybe WPA2 cryptographic data isn't being loaded properly on WiFi init so it goes through the whole process with the AP every time?

Whatever it is, this fixes it:
esp_wifi_set_config(WIFI_IF_STA, &conf);

Tested with esp-idf (PlatformIO) , esp-idf+Arduino component (PlatformIO), Arduino (Arduino IDE and PlatformIO).
IDF Test code:

#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_system.h"
#include "esp_wifi.h"
#include "esp_event_loop.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "lwip/err.h"
#include "lwip/sys.h"

/**************************************** */
#define WIFI_SSID        "MYSSID"
#define WIFI_PASSWORD    "MYPASSWORD"
//Comment/Uncomment to toggle fix
#define APPLY_FIX
//Suggest CONFIG_LWIP_DHCP_DOES_ARP_CHECK=0
/**************************************** */

static const char *TAG = "WPA2_FIX";

static esp_err_t event_handler(void *ctx, system_event_t *event)
{
    switch(event->event_id) {
    case SYSTEM_EVENT_STA_START:
        esp_wifi_connect();
        break;
    case SYSTEM_EVENT_STA_GOT_IP:
        ESP_LOGI(TAG, "WiFi Connected. IP: %s", ip4addr_ntoa(&event->event_info.got_ip.ip_info.ip));
        break;
    case SYSTEM_EVENT_STA_DISCONNECTED:
        ESP_LOGI(TAG, "Connection failed");
        break;
    default:
        break;
    }
    return ESP_OK;
}

void app_main()
{
    esp_err_t ret = nvs_flash_init();
    if (ret == ESP_ERR_NVS_NO_FREE_PAGES || ret == ESP_ERR_NVS_NEW_VERSION_FOUND) {
      ESP_ERROR_CHECK(nvs_flash_erase());
      ret = nvs_flash_init();
    }
    ESP_ERROR_CHECK(ret);

    tcpip_adapter_init();
    ESP_ERROR_CHECK(esp_event_loop_init(event_handler, NULL) );
    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&cfg));
    ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_STA) );

    wifi_config_t wifi_config;
    ESP_ERROR_CHECK(esp_wifi_get_config(ESP_IF_WIFI_STA, &wifi_config) );
    if(strcmp(WIFI_SSID,(char*)wifi_config.sta.ssid) || strcmp(WIFI_PASSWORD,(char*)wifi_config.sta.password)) {
        ESP_LOGI(TAG, "Setting new WiFi credentials");
        wifi_config_t new_config = {
            .sta = {
                .ssid = WIFI_SSID,
                .password = WIFI_PASSWORD,
            },
        };
        ESP_ERROR_CHECK(esp_wifi_set_config(ESP_IF_WIFI_STA, &new_config) );
    }
    #ifdef APPLY_FIX
    else {
        ESP_LOGI(TAG, "Testing fix");
        ESP_ERROR_CHECK(esp_wifi_set_config(ESP_IF_WIFI_STA, &wifi_config) );
    }
    #endif

    ESP_ERROR_CHECK(esp_wifi_start() );

    ESP_LOGI(TAG, "Connecting...");

    vTaskDelete(NULL);
}

All 21 comments

Hi @boarchuz,
The delay is most likely due to time required for scanning first time when AP information (channel, BSSID) is not available and not because of NVS. You can confirm the same by disabling NVS usage by WIFI (component config -> Wi-Fi -> Wi-Fi NVS flash). If the NVS usage is disabled, the delay will be seen every time, because scan will be triggered for every connect. NVS is used to store channel and bssid information upon successful connect, so that next time the connection is faster. Hope this helps!

@sagb2015 does this make sense because in both logs it is already on channel 11 at timestamp 722
I (722) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1

Thanks for the reply @sagb2015
I've tested with NVS disabled, and it takes longer to connect as expected. I don't think that's the issue here. I'm still convinced that this should be investigated.

With NVS enabled, that longer connection time should only occur once (or when AP channel changes, for example). This unusual delay with WPA2 connection that I'm trying to describe occurs every time, unless esp_wifi_set_config() is explicitly called beforehand.

I realise I rambled a lot in the first comment, so I'll try make it really clear.
This (pseudo)code has a SYSTEM_EVENT_STA_START to SYSTEM_EVENT_STA_CONNECTED interval of <300ms:

esp_wifi_init()
esp_wifi_start()
esp_wifi_connect()

Whereas this code has a SYSTEM_EVENT_STA_START to SYSTEM_EVENT_STA_CONNECTED interval of ~1100ms:

esp_wifi_init()
wifi_config_t conf;
esp_wifi_get_config(WIFI_IF_STA, &conf);
esp_wifi_set_config(WIFI_IF_STA, &conf);
esp_wifi_start()
esp_wifi_connect()

That's it. Identical code except for simply getting and setting the same config. It appears to bypass a hidden bug that is currently causing a long and inexplicable delay. Again, this bug does not occur with WEP or open WiFi security in my experience. WPA2 only.

There is definitely an inexplicable delay of about 750ms that occurs every time as seen here:

[1B][0;32mI (644) WPA2_FIX: Connecting...[1B][0m
I (764) wifi: n:6 0, o:1 0, ap:255 255, sta:6 0, prof:1
D (1604) nvs: nvs_set_blob sta.apinfo 700[1B][0m
I (1604) wifi: state: init -> auth (b0)

With fix:

[1B][0;32mI (664) WPA2_FIX: Connecting...[1B][0m
I (774) wifi: n:6 0, o:1 0, ap:255 255, sta:6 0, prof:1
I (774) wifi: state: init -> auth (b0)

It appears to me that there is some kind of bug that causes a significant interval before commencing the auth sequence, which is exclusive to WPA2.

One theory is the user has a mesh wifi system with multiple bssid on same ssid and channel. When client sends probe request as broadcast, first response has bssid not matching expected value so client panics and does full scan? But I don't know why esp_wifi_set_config fixes this.

@boarchuz can you read and print the full wifi_config before and after connecting with and without fix and also esp_wifi_sta_get_ap_info?

Actually, I was meaning to mention that. The wifi_config is a mess. BSSID is FF:FF:FF:FF:FF:FF, channel is 128 or 63 or something. I thought that was a clear problem, but then I tried applying the fix: same thing. I then tried WEP: same thing. So it appears to have no direct impact on connection time.
I concluded that the WiFi driver isn't dependent on these fields, they're only for passing new config to the driver. It must have some other struct with that data, or load it later. However, this may very well give a clue to the source of the problem if this data is meant to be loaded but is apparently not.

To test if that dodgy channel was the problem, I adjusted my router's WiFi channel between 1-11. You would expect channel 11 to take longer if it was defaulting to channel scan start at 0 every time, or for channel 1 to eliminate the delay (depending on start channel, of course). It has no effect though. The delay still exists.

If you look in the few log lines I quoted above, it seems to have no problem finding the AP very quickly. The issue comes later; it then takes an awfully long time to initiate auth, with that peculiar NVS save in there too.

Lastly, it's not a mesh system. There's only one instance of the SSID visible to the ESP32 (also one at 5GHz).

Able to reproduce here also

I (1269) wifi: connect_op: status=0, auth=5, cipher=3
I (1279) wifi: auth mode is not none
D (2259) nvs: nvs_set_blob sta.apinfo 700
I (2269) wifi: connect_bss: auth=1, reconnect=0
I (2269) wifi: state: init -> auth (b0)
I (1289) wifi: connect_op: status=0, auth=5, cipher=3
I (1289) wifi: auth mode is not none
I (1299) wifi: connect_bss: auth=1, reconnect=0
I (1299) wifi: state: init -> auth (b0)

Is the delay due to the nvs write itself? Why is the wifi driver saving the apinfo when nothing has changed? It thinks apinfo has changed but when you esp_wifi_set_config it to the same value it correctly identifies that it hasn't changed and doesn't do a nvs write?

@boarchuz - We are definitely investigating this. I missed the point that you are getting the delay every time with NVS enabled in WPA2 case. Also, can you confirm that the interval you mentioned in the two cases from SYSTEM_EVENT_STA_START to SYSTEM_EVENT_STA_CONNECTED are 1100ms in the first case and 300 ms in the second and not the other way around as you mentioned? This definitely has nothing to do with mesh.

That's right, sorry I mixed them up above.

@negativekelvin When you say your are able to reproduce the issue. Do you see the delay, every time without flash erase?

@sagb2015 yes, i see it every time if esp_wifi_set_config is not called. Also I don't think it is related to nvs_set_blob sta.apinfo because I patched nvs_api to ignore that key and it did not eliminate the delay. Also my logs are from full verbose level so there is nothing happening during that delay.

I updated to latest master and erased flash. I am now seeing the same delay both with and without calling esp_wifi_set_config. Here is the chronicle:

set log to VERBOSE
ad3b820e7 with esp_wifi_set_config D (1331) system_event: SYSTEM_EVENT_STA_CONNECTED
ad3b820e7 without esp_wifi_set_config D (2331) system_event: SYSTEM_EVENT_STA_CONNECTED
don't erase flash
master without esp_wifi_set_config D (3696) system_event: SYSTEM_EVENT_STA_CONNECTED
master with esp_wifi_set_config -> handshake timeout
erase flash
master with esp_wifi_set_config D (4816) system_event: SYSTEM_EVENT_STA_CONNECTED <-- this is due to scan
reboot
master with esp_wifi_set_config D (3456) system_event: SYSTEM_EVENT_STA_CONNECTED
master without esp_wifi_set_config D (3436) system_event: SYSTEM_EVENT_STA_CONNECTED
set log back to INFO
master with esp_wifi_set_config I (1864) wifi: connected
master without esp_wifi_set_config I (1834) wifi: connected
set to release build
master without esp_wifi_set_config I (1681) wifi: connected
master with esp_wifi_set_config I (1691) wifi: connected
ad3b820e7 without esp_wifi_set_config I (1446) wifi: connected
ad3b820e7 with esp_wifi_set_config I (495) wifi: connected

So there is a problem, we should be getting 500ms connection timestamps but they have jumped up to 1700ms in latest master

Yeah it's definitely an auth thing. I can see CPU0 is stuck doing "pbkdf2_sha1()" every time without the fix, whereas it only runs once on first connection with the fix. Even the comments mention that it is "very slow" - it's the direct cause of that additional ~800ms.
I see the WPA cryptography component has changed a lot in the last month or so with some significant commits, so I gave up on tracing it further back than that. I'm on 3.2.2 with PlatformIO, for reference.

After debugging, it was found that @adb820 commit mentioned by @negativekelvin, esp_wifi_set_config makes wifi library code to check if ap_info stored in NVS already has a valid PMK generated from previous connection. This is not the case, if you directly connect. As a result, PMK is always generated if you do wifi_init/start/connect sequence. PMK generation takes a lot of time. The corresponding latest code is here
https://github.com/espressif/esp-idf/blob/master/components/wpa_supplicant/src/crypto/sha1-pbkdf2.c#L81-L101
@adb820 The above code was in the wifi library. With c139683 a lot of wpa_supplicant is moved out and initialisation flow has also changed. There is also a bug related to PMK not stored properly which is getting fixed and is in the process of getting merged (Refer to this issue for more details). In both the cases, root cause for the delay seems to be same. We will try to get this fixed in both new as well as old version soon. Thanks!

Just to add, this also explains why issue is WPA2 specific.

@sagb2015 mbedtls version is significantly faster

I (350) wifi station: Calling mbedtls_pbkdf2
I (800) mbedtls_pkcs5_pbkdf2: 44 11 6e a8 81 53 19 96 d8 a2 3a f5 8b 37 6d 70
I (800) mbedtls_pkcs5_pbkdf2: f1 96 05 74 29 c2 58 f5 29 57 7a 26 e7 27 ec 1b
I (800) wifi station: Calling wpa_pbkdf2
I (1650) wpa_pbkdf2: 44 11 6e a8 81 53 19 96 d8 a2 3a f5 8b 37 6d 70
I (1650) wpa_pbkdf2: f1 96 05 74 29 c2 58 f5 29 57 7a 26 e7 27 ec 1b

Is there any plan to use mbedtls for all wpa crypto?

@negativekelvin Definitely. We have marked this as an action item to be done.

Workaround works again on latest commit. Timestamp behavior has changed

330ms

I (399) cpu_start: Pro cpu start user code
I (82) cpu_start: Starting scheduler on PRO CPU.
I (144) wifi: wifi driver task: 3ffc072c, prio:23, stack:3584, core=0
I (474) wifi: connected

310ms

I (402) cpu_start: Pro cpu start user code
I (421) cpu_start: Starting scheduler on PRO CPU.
I (511) wifi: wifi driver task: 3ffc09d4, prio:23, stack:3584, core=0
I (821) wifi: connected

Hi @negativekelvin, yes both the changes 1) using mbedtls_pbkdf2 & 2) fix for slow connection if set_config is not called, are merged in internal repository and should be available soon.

Working now, still 310ms but lost some ms in initialization stuff

I (419) cpu_start: Pro cpu start user code
I (437) spi_flash: detected chip: generic
I (438) spi_flash: flash io: dio
I (438) cpu_start: Chip Revision: 1
W (439) cpu_start: Chip revision is higher than the one configured in menuconfig. Suggest to upgrade it.
I (450) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (522) wifi station: ESP_WIFI_MODE_STA
I (532) wifi: wifi driver task: 3ffc0968, prio:23, stack:3584, core=0
I (842) wifi: connected 

Closing this issue now. Let us know if you face any other issues.

Was this page helpful?
0 / 5 - 0 ratings