WiFi.status() returns WL_CONNECTED after connection is lost when using WiFi.setAutoReconnect( false );
The included MCVE produces the following output:
SDK:3.0.0-dev(c0f7b44)/Core:2.5.0=20500000/lwIP:STABLE-2_1_2_RELEASE/glue:1.1/BearSSL:6778687
mode : sta(a0:20:a6:0a:a6:96) + softAP(a2:20:a6:0a:a6:96)
add if0
Looking for WiFi ....wlstatus:WL_IDLE_STATUS=0
WiFi.localIP():(IP unset)
.scandone
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt
connected with TEST, channel 3
dhcp client start...
ip:192.168.43.39,mask:255.255.255.0,gw:192.168.43.45
connected to TEST
wlstatus:WL_CONNECTED=3
WiFi.localIP():192.168.43.39
wlstatus:WL_CONNECTED=3
WiFi.localIP():192.168.43.39
state: 5 -> 2 (3a0)
rm 0
wlstatus:WL_CONNECTED=3
WiFi.localIP():(IP unset)
wlstatus:WL_CONNECTED=3
WiFi.localIP():(IP unset)
wlstatus:WL_CONNECTED=3
WiFi.localIP():(IP unset)
wlstatus:WL_CONNECTED=3
WiFi.localIP():(IP unset)
wlstatus:WL_CONNECTED=3
The AP was turned off shortly after the connection was established. The log shows the IP being lost (state: 5 -> 2, then "IP unset"), yet WiFi.status() keeps returning WL_CONNECTED. This only happens when using WiFi.setAutoReconnect( false ).
#include <ESP8266WiFi.h>
#include <ESP8266WiFiMulti.h>

ESP8266WiFiMulti wifiMulti;
boolean connectionWasAlive = true;

// Human-readable names, indexed by the value returned from WiFi.status()
const char* statuses[] = { "WL_IDLE_STATUS=0", "WL_NO_SSID_AVAIL=1", "WL_SCAN_COMPLETED=2", "WL_CONNECTED=3", "WL_CONNECT_FAILED=4", "WL_CONNECTION_LOST=5", "WL_DISCONNECTED=6"};

void setup()
{
  Serial.begin(115200);
  Serial.println();
  WiFi.setAutoReconnect( false );
  wifiMulti.addAP("TEST", "12345678");
}

unsigned long timestamp = millis();

void monitorWiFi()
{
  // Print status and IP every 2 seconds
  if (millis() - timestamp > 2000) {
    timestamp = millis();
    Serial.print("wlstatus:");
    Serial.println(statuses[WiFi.status()]);
    Serial.print("WiFi.localIP():");
    Serial.println(WiFi.localIP().toString());
  }
  if (wifiMulti.run() != WL_CONNECTED)
  {
    if (connectionWasAlive == true)
    {
      connectionWasAlive = false;
      Serial.print("Looking for WiFi ");
    }
    Serial.print(".");
    delay(500);
  }
  else if (connectionWasAlive == false)
  {
    connectionWasAlive = true;
    Serial.printf(" connected to %s\n", WiFi.SSID().c_str());
  }
}

void loop()
{
  monitorWiFi();
}
As a workaround, handle the WiFiEventStationModeDisconnected event (which does fire) and call WiFi.disconnect() from it, even though the connection is already down. This fixes the WiFi.status() problem.
void onStationModeDisconnectedEvent(const WiFiEventStationModeDisconnected& evt) {
  if (WiFi.status() == WL_CONNECTED) {
    // Status is stale: force the internal state to "disconnected"
    WiFi.disconnect();
  } else {
    Serial.println(" WiFi disconnected...");
  }
}
That's a nice workaround for core release 2.5.0.
This bug has disappeared in the git master version (soon to be release 2.5.1).
It is also not reproducible with fw 2.2.x (SDK:2.2.2-dev(c0eb301)), which is available for testing with the generic esp8266 board.
edit: If a release is needed, version 2.4.2 does not have this bug (it uses fw 2.2.1).
Could this also be a nice work-around for core 2.4.x issues?
@TD-er core-2.4.2 uses fw 2.2.1, same as with current master (by default).
I tried yesterday: when I rebooted my AP, the connection was shown as lost.
Do you see different behaviour?
Well I have not tried this code myself.
But I have noticed a few times (even running core 2.4.0 a while ago) that the reported wifi status is not always reflecting the true status.
So I was hoping this would be some help also, since I have really no clue how to truly detect the connection status.
It is not always happening, but it may happen that there is no connection and the wifi status still reports it is connected.
This then often leads to something waiting for data that will never arrive or starting to initiate a connection. Both end up in a hardware watchdog reset.
I use timeouts where possible, but still see a lot of HW watchdogs happen.
I would be hesitant to implement a change as you did in your commit. It is a patch that does not solve the root cause. In my experience, WDT resets are caused by other types of problems. One of the most useful tools is the recently introduced ESP.getHeapFragmentation(). It shows the state of memory better than just looking at the available memory from ESP.getFreeHeap(). You may have fragmentation problems. Also, if you are using char arrays, look out for overflows. Those will cause a lot of WDT resets.
In the builds with core 2.6.0 I do have the heap fragmentation shown in the sysinfo page and also have been running some tests to see if the fragmentation was an issue.
But these WDT resets do still happen, also on test nodes that hardly do any intensive String manipulation or other stuff regarding memory allocation. (not big parts at least)
These show a very strong correlation with wifi reception quality and the WDT resets happen more often when a node is performing more network IO (thus more chance of running into these issues).
I know it isn't going to solve the root cause of these WDT resets, but it has already taken 100's of hours for me to get some grip on these WDT resets that I really want to try anything.
I was not sure yet if I was going to merge that commit into the main branch. For now that PR has been built and is running on a few nodes as a test.
These WDT resets are bugging me since I think August last year and some of them were definitely related to some bugs in our code or simply running out of resources. So that's just a number of possible reasons for crash reports, but the ones that now still occur do have a very strong correlation with wifi stability, power management, active pings to the node and those kind of things.
I will now have a good look at parts that handle strings to see if there's something that may use char arrays since calling .length() or something like that may indeed take much longer than expected.
The inaccuracy of the wifi connected status is also bugging me in other ways.
When trying to send something, I do perform checks on the connection state. So if this state is incorrect, it will cause significant delays in code execution of other parts, which will cause communication to sensors to be out of sync in need of re-init etc.
That's happening on a node I have in my car (offline logging data), running ESP82xx Core 2.6.0-dev, NONOS SDK 2.2.2-dev(c0eb301), LWIP: 2.1.2 PUYA support
Build Time: Mar 20 2019 23:30:57
That SDK should not have this specific issue, right?
I guess not (sdk 2.2.2-dev has not been extensively used though)
I'm using 2.5.0 and WiFiEventStationModeDisconnected is not firing unless I do some network activity. The webserver sits behind a lost connection waiting for requests, with WiFi state 3.
The moment the esp8266 does a network request, such as getting NTP time, the event fires.
As a check, I keep pinging the router, and the event fires as soon as the router drops the connection. However, the ping is too intrusive and generates too much instability, especially when using AP mode.
Please advise.
I'm using this kind of ping to maintain network activity. It works as a test and might shed some light on the origin of the problem.
Is there a better way of getting wifi events to fire properly for web servers? Unlike sensors, we don't do any network activity unless requested.
extern "C" {
  #include <ping.h>
}

void tick_back(void *opt, void *resp) {
  constexpr int ticks_tolerance=3;//how many fails in a row will we tolerate (or something like that)
  static volatile int ticks_ok=0;
  ping_resp* ping_resp = reinterpret_cast<struct ping_resp*>(resp);
  if (ticks_ok>0&&ping_resp->ping_err==-1) ticks_ok--;
  else if (ticks_ok<ticks_tolerance) ticks_ok++;
  wifiConnected&=ticks_ok>0;//update my own state
}

//network tick, not using delay stuff and sending only one packet
void nw_tick(IPAddress dest) {
  static ping_option tick_options;
  memset(&tick_options, 0, sizeof(struct ping_option));
  tick_options.count = 1;
  tick_options.coarse_time = 0;
  tick_options.ip = dest;
  tick_options.sent_function = NULL;
  tick_options.recv_function = reinterpret_cast<ping_recv_function>(&tick_back);
  ping_start(&tick_options);
}
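The tolerance counter in tick_back can be tried off-device. Below is a minimal sketch in plain C++, not the exact code posted above: PingHealth and its members are hypothetical names, and unlike the posted version it leaves the counter at zero on a failed ping instead of falling through to the increment branch.

```cpp
// Hypothetical helper: tracks recent ping results with a small tolerance,
// so a single lost ping does not immediately mark the link as down.
struct PingHealth {
    static constexpr int kTolerance = 3;  // same value as ticks_tolerance above
    int credit = 0;                       // 0 means the link is considered down

    // Feed one ping result; returns true while the link is considered alive.
    bool update(bool pingOk) {
        if (pingOk) {
            if (credit < kTolerance) ++credit;  // saturate at the tolerance
        } else if (credit > 0) {
            --credit;                           // saturate at zero
        }
        return credit > 0;
    }
};
```

On the device, the ping receive callback would call update(ping_err != -1) and store the result in the sketch's own connection flag.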
@neu-rah See above
@d-a-v yes I saw it, but I'm already using the wifi events. Changed to the latest core version:
using platformio, with platformio.ini set to:
platform = https://github.com/platformio/platform-espressif8266.git (I guess that makes it the latest)
The problem persists. I'm recovering fine from a lost connection as soon as I get WiFiEventStationModeDisconnected; my problem is that it only fires when I do a network request (ping or NTP). Until then the connection stays at state 3 and the lost connection is not reported (the event does not fire).
my esp8266 is primarily a web server and it does not send network requests (unless set manually to do so)...
thanks for your reply :+1: I'm open to any suggestions
This ping approach is working for me. The instability was due to a bug on my side: I was requesting the ping (and debug printing) free-wheeling in loop(). Now that I have set a proper 500ms interval, the esp8266 seems stable and the events fire as soon as the connection is lost.
I'm still missing a WL_CONNECTING state in WiFi.status(). I can keep track of that myself of course, but it would be nice.
A look at the esp8266/arduino code gave me no clue on how to solve this, so I guess the cause is even deeper.
hope it helps
@neu-rah Just to be sure, you use that internal ping trick at a 500 msec interval?
And what do you usually ping? The gateway?
@TD-er yes, I'm pinging the gateway. Not sure what you mean by "internal", I'm using the pasted code to ping the gateway and yes 500ms, experimented with other values, but this one seems enough and esp8266 seems stable.
Most topics about this just suggested to let some host ping the node, which makes it "miraculously" work more stable.
You just let the node itself ping to somewhere.
So it was just to be sure that you were not just letting the nodes ping each other, but that letting a node perform the ping itself was also helping.
Now that it is becoming clearer (at least to me) that the internal wifi status is simply not updated unless there is some kind of network activity, it also makes sense that generating such activity helps to increase wifi connection stability.
@TD-er What are your findings ? #2330(comment)
@d-a-v
I am now using the following schema for sending the Gratuitous ARP:
WiFi.hostByName() fails
And on top of that, there is a setting to continuously send these ARP packets with an increasing interval.
This interval is reset to 100 msec on every occasion mentioned above. The max is 5000 msec.
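The interval scheme described here (reset to 100 msec on an event, grow up to a 5000 msec cap) could look roughly like the sketch below. The growth rule is not stated in the post, so doubling is an assumption, and ArpBackoff is a hypothetical name. It is plain C++ so the timing logic can be checked off-device.

```cpp
#include <cstdint>

// Hypothetical helper for the resend interval of the gratuitous ARP packets.
struct ArpBackoff {
    static constexpr std::uint32_t kMinMs = 100;   // value after a reset
    static constexpr std::uint32_t kMaxMs = 5000;  // upper bound
    std::uint32_t intervalMs = kMinMs;

    // Call on any of the events that should restart fast resending.
    void reset() { intervalMs = kMinMs; }

    // Returns the delay to wait before the next packet, then grows the
    // stored interval (doubling is an assumption), capped at kMaxMs.
    std::uint32_t next() {
        const std::uint32_t current = intervalMs;
        intervalMs = (intervalMs > kMaxMs / 2) ? kMaxMs : intervalMs * 2;
        return current;
    }
};
```

With doubling, successive calls to next() would yield 100, 200, 400, 800, 1600, 3200 and then stay at 5000 msec until reset() is called.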
To be honest, I don't see a lot of differences in connectivity between enabling/disabling this last setting.
Activating the "Eco" mode I recently added has a lot more impact on the connectivity.
When this "Eco" mode is active, the scheduler will call delay() when there is nothing to be run. This means the loop count will be (a lot) lower on idling nodes. (still around 300 - 400 loops/sec) This delay is not longer than 5 msec.
But as soon as the power consumption of the node drops, you will see it is missing packets, regardless of the gratuitous ARP packets.
A few pings to that node will "wake" it and it will receive all packets again until it is going to reduce its power again.
In short, I don't think it does help resolving all missing packets, but it is strangely not missing any ping packet. Even the first ping sent when it is in low power mode does get a reply even when it may need a few 100 msec for such a reply. (sometimes up to 700 msec)
So why it is missing other kinds of packets, I have no clue.
I'm facing this issue when I turn auto-reconnect off and use both WiFi modes:
WiFi.mode(WIFI_AP_STA);
WiFi.setAutoReconnect(false);
Currently the piece of code below is somehow working. I call it from loop() every 10 seconds.
void handleWiFiConnectivity(){
  Serial.println( F("\nHandling WiFi Connectivity") );
  // Treat a missing IP as "not connected", even if the status says otherwise
  if( !WiFi.localIP().isSet() || !WiFi.isConnected() ){
    Serial.println( F("Handling WiFi Reconnect Manually.") );
    WiFi.reconnect();
  }else{
    Serial.print(F("IP address: "));
    Serial.println(WiFi.localIP());
  }
}
Looking for any better suggestions/solutions.
Good day! neu-rah, can you please tell me in more detail how you solved the problem, or post the code that you used? My internet service is very unstable, and it often happens that I am connected to the router but there is no internet. The ESP8266 WDT is then triggered, but when the internet connection returns I have to manually reset my ESP8266! Thank you.
@javierferwolf all the code is above, but after some update I had to remove it; I guess it, or something with the same purpose, was added to the core. Feel free to experiment with it though.
@neu-rah thanks for replying. I don't have much programming experience, so I would like to know where in the code, or in which part of the ESP8266WiFi library, I have to put the mentioned code:
extern "C" {
  #include <ping.h>
}

void tick_back(void *opt, void *resp) {
  constexpr int ticks_tolerance=3;//how many fails in a row will we tolerate (or something like that)
  static volatile int ticks_ok=0;
  ping_resp* ping_resp = reinterpret_cast<struct ping_resp*>(resp);
  if (ticks_ok>0&&ping_resp->ping_err==-1) ticks_ok--;
  else if (ticks_ok<ticks_tolerance) ticks_ok++;
  wifiConnected&=ticks_ok>0;//update my own state
}

//network tick, not using delay stuff and sending only one packet
void nw_tick(IPAddress dest) {
  static ping_option tick_options;
  memset(&tick_options, 0, sizeof(struct ping_option));
  tick_options.count = 1;
  tick_options.coarse_time = 0;
  tick_options.ip = dest;
  tick_options.sent_function = NULL;
  tick_options.recv_function = reinterpret_cast<ping_recv_function>(&tick_back);
  ping_start(&tick_options);
}
@javierferwolf https://github.com/esp8266/Arduino/issues/5912#issuecomment-477597488
I have the include at the top of the sketch (with the others, if any),
then the functions tick_back and nw_tick somewhere at global scope,
and then I call nw_tick from the main loop, giving it the IP address of the gateway. I only call it every 500ms or so, but the timing was a matter of adjusting.
_if i recall it correctly_
hi @neu-rah Thank you for your answers! Sorry, but I can't work out how to implement your code in a sketch. As I understand it, I should call the nw_tick function only every 500ms, but what do I do with the result? For example, how could I implement it in this simple sketch?
#include <ESP8266WiFi.h>
extern "C" {
  #include <ping.h>
}

const char* ssid = "";
const char* password = "";
boolean wifiConnected = false;
unsigned long previousMillis = 0;
const long interval = 500;

void setup() {
  Serial.begin(115200);
  Serial.println();
  Serial.println();
  connectWifi();
}

void loop() {
  unsigned long currentMillis = millis();
  if (currentMillis - previousMillis >= interval) {
    previousMillis = currentMillis;
    nw_tick(IPAddress (192,168,1,1));
  }
  //here what is needed when the ESP8266 is connected
}

void tick_back(void *opt, void *resp) {
  constexpr int ticks_tolerance=3;//how many fails in a row will we tolerate (or something like that)
  static volatile int ticks_ok=0;
  ping_resp* ping_resp = reinterpret_cast<struct ping_resp*>(resp);
  if (ticks_ok>0&&ping_resp->ping_err==-1) ticks_ok--;
  else if (ticks_ok<ticks_tolerance) ticks_ok++;
  wifiConnected&=ticks_ok>0;//update my own state
}

//network tick, not using delay stuff and sending only one packet
void nw_tick(IPAddress dest) {
  static ping_option tick_options;
  memset(&tick_options, 0, sizeof(struct ping_option));
  tick_options.count = 1;
  tick_options.coarse_time = 0;
  tick_options.ip = dest;
  tick_options.sent_function = NULL;
  tick_options.recv_function = reinterpret_cast<ping_recv_function>(&tick_back);
  ping_start(&tick_options);
}

boolean connectWifi() {
  boolean state = true;
  int i = 0;
  WiFi.begin(ssid, password);
  Serial.println("");
  Serial.println("Connecting to WiFi");
  // Wait for connection
  Serial.print("Connecting");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
    if (i > 10) {
      state = false;
      break;
    }
    i++;
  }
  if (state) {
    Serial.println("");
    Serial.print("Connected to ");
    Serial.println(ssid);
    Serial.print("IP address: ");
    Serial.println(WiFi.localIP());
  }
  else {
    Serial.println("");
    Serial.println("Connection failed.");
  }
  return state;
}
thank you Rui!
I'm facing the same issue in the latest 2.5.1 SDK.
@yangyud-cn You should try again with latest release 2.7.1.
I just retried and the issue is gone.
@PurpleAir is it OK to close this issue ?
@d-a-v, I'm using PlatformIO 2.5.1 which is Arduino 2.7.1, according to https://github.com/platformio/platform-espressif8266/releases
Interestingly, enabling DEBUG with ESP8266WebServer actually lowers the chance of failure.
This makes me suspect that there could be a timing-related issue in the network stack code, possibly a race condition.
Here is the log I collected, I hope it helps. I have also included my monitor and reconnect code.
ping Gateway 192.168.8.67 => 192.168.8.1 ... OK
ping Gateway 192.168.8.67 => 192.168.8.1 ... Fail
Mode: STA
PHY mode: N
Channel: 4
AP id: 0
Status: 5
Auto connect: 1
SSID (10): ZZZZZZZZZZ
Passphrase (12): xxxxxxxxxxxx
BSSID set: 0
ping Gateway 192.168.8.67 => 192.168.8.1 ... Fail
-- Several minutes later The IP lost --
ping Gateway 192.168.8.67 => 192.168.8.1 ... Fail
ping Gateway 192.168.8.67 => 192.168.8.1 ... Wifi not connected : 3
Mode: STA
PHY mode: N
Channel: 4
AP id: 0
Status: 5
Auto connect: 1
SSID (10): ZZZZZZZZZZ
Passphrase (12): xxxxxxxxxxxx
BSSID set: 0
state: 2 -> 0 (0)
Reconn
scandone
state: 0 -> 2 (b0)
state: 2 -> 0 (2)
Wifi not connected : 6
Reconn
Wifi not connected : 6
Reconn
Wifi not connected : 6
Reconn
....
Then I have to fall back to Reboot to recover.
My Ping code:
extern "C" {
  #include <ping.h>
}

static bool _gPingCompleted = true;
static bool _gPingSucceed = false;
static uint32 _gPingRespTime = 0;
static ping_option _gPingOptions;

static void ping_recv_cb(void *opt, void *resp) {
  // Cast the parameters to get some usable info
  ping_resp* ping_resp = reinterpret_cast<struct ping_resp*>(resp);
  // Error or success?
  _gPingSucceed = ping_resp->ping_err != -1;
  _gPingRespTime = ping_resp->resp_time;
  _gPingCompleted = true;
}

bool startPing(IPAddress dest) {
  if (_gPingCompleted) {
    memset(&_gPingOptions, 0, sizeof(struct ping_option));
    // Repeat count (how many time send a ping message to destination)
    _gPingOptions.count = 1;
    // Time interval between two ping (seconds??)
    _gPingOptions.coarse_time = 1;
    // Destination machine
    _gPingOptions.ip = dest;
    // Callbacks
    _gPingOptions.recv_function = ping_recv_cb;
    _gPingOptions.sent_function = NULL; //reinterpret_cast<ping_sent_function>(&_ping_sent_cb);
    // Let's go!
    if(ping_start(&_gPingOptions)) {
      _gPingCompleted = false;
    }
    return !_gPingCompleted;
  }
  return false;
}

bool hasPingCompleted() {
  return _gPingCompleted;
}

bool isPingSuccessful() {
  return _gPingSucceed;
}
Fragment of my monitor code:
void netMonitorHandler() {
  DEBUGPRINT(E("Mon: "));
  char buf[32];
  if(!WiFi.localIP().isSet() || !WiFi.isConnected()) {
    onWiFiReconnect();
    DEBUGPRINTLN(E(" Reconn"));
    return;
  }
  else {
    onWifiConnected();
  }
  if(hasPingCompleted())
  {
    // Report the previous ping result (skipped for the first ping after a reconnect)
    if (_gPingCount) {
      if (isPingSuccessful()) {
        DEBUGPRINT(E(" OK"));
        _gPingFailCount = 0;
      }
      else {
        DEBUGPRINT(E(" Fail"));
        WiFi.printDiag(Serial);
        _gPingFailCount ++;
      }
    }
    // start a new ping
    IPAddress gwIP = WiFi.gatewayIP();
    DEBUGPRINT(E(" ping Gateway " ));
    DEBUGPRINT(WiFi.localIP());
    DEBUGPRINT(E(" => " ));
    DEBUGPRINT(gwIP);
    startPing(gwIP);
    DEBUGPRINT(E(" ... "));
    _gPingCount ++;
  }
}
void onWiFiReconnect() {
  if (!WiFi.localIP().isSet() || !WiFi.isConnected()) {
    char buf[32];
    DEBUGPRINT(E("Wifi not connected : "));
    DEBUGPRINTLN(WiFi.status());
    if (gWifiConnected) {
      gWifiConnected = false;
      gReconnectRetryCount = 0;
      WiFi.printDiag(Serial);
      Serial.setDebugOutput(true);
      gReconnectCount ++;
      _gPingCount = 0; // reset ping count to start over
    }
    else if (gReconnectRetryCount >= 5)
    {
      DEBUGPRINTLN(E("Wifi not connected after retrying, reboot system ..."));
      ESP.restart();
    }
    // looks like the auto reconnect handler is not reliable
    if(millis() >= gLastReconnectTime + 30*1000 || millis() < gLastReconnectTime) {
      gReconnectRetryCount ++;
      gLastReconnectTime = millis();
      // try reconnect each 30 seconds
      WiFi.reconnect();
    }
  }
}
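The retry timing above guards against millis() rollover with an explicit millis() < gLastReconnectTime test. A common alternative is to compare the unsigned difference, which stays correct across the 32-bit wrap. Below is a minimal off-device sketch in plain C++. ReconnectPolicy, Action and the nowMs parameter are hypothetical, and the 30 second / 5 retry values are taken from the code above.

```cpp
#include <cstdint>

enum class Action { Wait, Reconnect, Restart };

// Hypothetical helper that decides what to do while the link is down.
struct ReconnectPolicy {
    static constexpr std::uint32_t kRetryIntervalMs = 30u * 1000u;
    static constexpr int kMaxRetries = 5;

    std::uint32_t lastAttemptMs = 0;
    int retries = 0;

    // Call when the link is confirmed up again.
    void onConnected() { retries = 0; }

    // Call periodically while disconnected; nowMs would be millis() on the device.
    Action poll(std::uint32_t nowMs) {
        if (retries >= kMaxRetries) return Action::Restart;
        // Unsigned subtraction wraps modulo 2^32, so this holds across rollover.
        if (static_cast<std::uint32_t>(nowMs - lastAttemptMs) >= kRetryIntervalMs) {
            lastAttemptMs = nowMs;
            ++retries;
            return Action::Reconnect;
        }
        return Action::Wait;
    }
};
```

On the device, Action::Reconnect would map to WiFi.reconnect() and Action::Restart to ESP.restart(), as in the code above.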
BTW, I tried the older version 2.6.2 (https://github.com/esp8266/Arduino/releases/tag/2.6.2) and with DEBUG_ESP_HTTP_SERVER enabled, it seems much more stable. Without DEBUG_ESP_HTTP_SERVER it will fail with similar issue too. To reproduce the issue, I actually did the stress test with an infinite loop of wget to the board without delay in between.
I actually managed to crash it with a simultaneous "ping -f" while doing the wgets with a WDT crash. But that is already pretty stable.
WDT @ 0x40104185, which is located inside this:
.text.lmacEndFrameExchangeSequence
0x00000000401040bc 0x367 C:\users\yuyang.platformio\packages\framework-arduinoespressif8266\tools\sdk\lib\NONOSDK22x_190703\libpp.a(lmac.o)
0x45a (size before relaxing)
@yangyud-cn
Can you provide an MCVE and the way to make it fail, that we could recompile and test locally ?
Per internal discussions, pushing back to v3.
Just curious, what is the intended fix, @devyte ?
FYI, I had the suggested work-around of performing an explicit disconnect in my code, and WiFi.status() still occasionally reported the wrong state.
In my setup at least, it reported not connected although the node was connected and serving web pages just fine.
My work-around for now (which is far from ideal) is to detect the inconsistency of the wifi state and if it has not improved after some timeout (15 seconds in my setup) it will turn off the WiFi and try again.
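A minimal sketch of that kind of watchdog, in plain C++ so the logic can be checked off-device. What counts as "inconsistent" is an assumption here (the status reports connected while no IP is set, or the reverse). StateWatchdog and its members are hypothetical, and only the 15 second timeout comes from the description above.

```cpp
#include <cstdint>

// Hypothetical helper: flags a WiFi restart when the reported status and the
// presence of an IP address disagree for longer than a timeout.
struct StateWatchdog {
    static constexpr std::uint32_t kTimeoutMs = 15u * 1000u;
    bool inconsistent = false;
    std::uint32_t sinceMs = 0;

    // nowMs would be millis() on the device.
    // Returns true when the WiFi should be turned off and started again.
    bool check(bool statusConnected, bool hasIp, std::uint32_t nowMs) {
        if (statusConnected == hasIp) {  // both views agree: nothing to do
            inconsistent = false;
            return false;
        }
        if (!inconsistent) {             // mismatch just started: note the time
            inconsistent = true;
            sinceMs = nowMs;
            return false;
        }
        // Wrap-safe elapsed time since the mismatch was first seen
        return static_cast<std::uint32_t>(nowMs - sinceMs) >= kTimeoutMs;
    }
};
```

On the device, the two flags could come from WiFi.status() == WL_CONNECTED and WiFi.localIP().isSet().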
It does seem to be a timing issue, as it occurs way more often when I have the UDP syslog feature in my software enabled.
This suspicion of a timing issue could also explain why some nodes appear to suffer from this a lot more than other nodes, as they may be occupied more with other tasks and/or actually run faster or slower due to differences in the flash chips or access point brands used.
what is the intended fix
There isn't one. Per previous experience in such cases, this requires investigation, and that can take a while. We don't want to hold up v3 any further.
A workaround for me, when I detect such an anomaly, is to call WiFi.reconnect().
Also this still is reproducible in 2.6.2