Hardware: ESP-WROOM-02
Core Version: 2.4.0
WiFiClient causes exception 28 in connect function.
Module: Generic ESP8266 Module
Flash Size: 2M
CPU Frequency: 80Mhz
Flash Mode: DIO
Flash Frequency: 40Mhz
Upload Using: SERIAL
Reset Method: ck
#include <ESP8266WiFi.h>
#include <Esp.h>
const char *ssid = "MySSID";
const char *password = "MyPass";
const char *serverIP = "192.168.1.106";
const int serverPort = 1883;
WiFiClient client;
void toggleLED()
{
  pinMode(5, OUTPUT);
  digitalWrite(5, HIGH);
  delay(250);
  digitalWrite(5, LOW);
  delay(250);
}
void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  WiFi.setOutputPower(20.5);
  WiFi.mode(WIFI_STA);
  WiFi.begin(ssid, password);
}
void loop() {
  if(WiFi.status() == WL_CONNECTED)
  {
    Serial.println("Connected to WiFi");
    if(!client.connect(serverIP, serverPort))
    {
      Serial.println("Can't connect client.");
    }
    else
    {
      Serial.println("Client connected.");
    }
  }
  else
  {
    Serial.println("Not connected to WiFi");
  }
  toggleLED();
}
Exception (28):
epc1=0x40202af1 epc2=0x00000000 epc3=0x00000000 excvaddr=0x000000bb depc=0x00000000
ctx: cont
sp: 3ffefcd0 end: 3ffeff00 offset: 01a0
>>>stack>>>
3ffefe70: 3ffeec58 00000000 3fff112c 40202aea
3ffefe80: 6a01a8c0 00000010 3ffe8880 3ffeeecc
3ffefe90: 3fffdad0 00000011 3ffeeea0 3ffeeecc
3ffefea0: 3fffdad0 0000075b 3ffeec58 402026a8
3ffefeb0: 3ffe8bf0 6a01a8c0 3ffe8bf0 6a01a8c0
3ffefec0: 3ffe88c8 40202460 3ffeeea0 40203198
3ffefed0: 00000000 00000000 3ffeeea0 40202132
3ffefee0: 3fffdad0 00000000 3ffeeec4 40203404
3ffefef0: feefeffe feefeffe 3ffeeee0 40100710
<<<stack<<<<
ets Jan 8 2013,rst cause:1, boot mode:(3,6)
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
I'm trying to connect to an MQTT server. It connects to the server and works as expected. But when I turn off my router and the connection is lost, it crashes with exception 28. Am I using something wrong, or is this a bug?
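For readers new to these dumps: exception 28 is labelled LoadProhibitedCause in the logs later in this thread, and `excvaddr` is the address whose access faulted. A value as small as 0x000000bb is typical of a load through a null or garbage pointer plus a member offset, although that is an interpretation and not something the log states. Below is a minimal host-side C++ sketch that pulls fields out of the register line. `parseField` and `looksLikeNearNull` are hypothetical helper names, and the 64 KiB threshold is an arbitrary assumption.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Extract a hex field such as "excvaddr=0x000000bb" from a register dump line.
// Returns false when the field is absent or has no hex digits.
bool parseField(const std::string& line, const std::string& name, uint32_t& out) {
    const std::string key = name + "=0x";
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) return false;
    const char* start = line.c_str() + pos + key.size();
    char* end = nullptr;
    const unsigned long value = std::strtoul(start, &end, 16);
    if (end == start) return false;  // no hex digits followed the key
    out = static_cast<uint32_t>(value);
    return true;
}

// Heuristic (assumption): a faulting address below 64 KiB usually means a
// null or garbage pointer was dereferenced at some member offset.
bool looksLikeNearNull(uint32_t excvaddr) {
    return excvaddr < 0x10000u;
}
```

Fed the register line from the dump above, `parseField(line, "excvaddr", v)` yields 0xbb, which the heuristic flags as a near-null access.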
I can confirm this. Doesn't happen with LwIP v1.4 selected.
Log output with 1.4 when WiFi disconnects:
:ref 1
Client connected.
state: 5 -> 0 (0)
rm 0
:er -8 0x00000000
del if0
usl
mode : null
wifi evt: 1
STA disconnect: 8
:ur 1
:del
With 2.0, this results either in an exception or a hardware WDT (still wondering how that happens):
:ref 1
Client connected.
state: 5 -> 0 (0)
rm 0
del if0
usl
mode : null
wifi evt: 1
STA disconnect: 8
:ur 1
:close
ets Jan 8 2013,rst cause:4, boot mode:(3,6)
wdt reset
Seems that if TCP disconnect is triggered after WiFi goes down, this results in an invalid memory access somewhere.
With LwIP 1.4, the TCP PCB receives an error callback (the log line with :er -8) and closes the connection, while with LwIP 2 this error callback is not sent. When WiFiClient.connect is called again, the old connection is closed (tcp_close), but the network interface is already dead. Why this triggers a hardware WDT is something I don't entirely understand, though.
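The difference described above can be modelled off-target. The following is a toy C++ model, not lwIP or core code: `ToyStack`, `ToyClient`, and `interfaceDown` are hypothetical names, and PCBs are represented by integer handles so that a stale handle can be detected safely instead of crashing.

```cpp
#include <functional>
#include <set>
#include <utility>

// Toy stand-in for the TCP stack: it owns the connection handles ("PCBs").
class ToyStack {
public:
    using ErrCallback = std::function<void(int /*handle*/)>;

    int open(ErrCallback onError) {
        const int h = next_++;
        live_.insert(h);
        onError_ = std::move(onError);  // one client is enough for this model
        return h;
    }
    bool isLive(int h) const { return live_.count(h) != 0; }

    // The network interface goes away and every handle is destroyed.
    // notify == true  models the behaviour reported for LwIP 1.4 (":er -8").
    // notify == false models the behaviour reported for LwIP 2.
    void interfaceDown(bool notify) {
        const std::set<int> dead = live_;
        live_.clear();
        if (notify && onError_) {
            for (int h : dead) onError_(h);
        }
    }

private:
    int next_ = 1;
    std::set<int> live_;
    ErrCallback onError_;
};

// Toy client that remembers its handle, much as a client object keeps its PCB.
class ToyClient {
public:
    explicit ToyClient(ToyStack& s) : stack_(s) {}
    void connect() {
        // The error callback makes the client forget a handle the stack freed.
        handle_ = stack_.open([this](int) { handle_ = 0; });
    }
    // True when the client would go on to use a handle the stack has freed.
    bool holdsStaleHandle() const { return handle_ != 0 && !stack_.isLive(handle_); }

private:
    ToyStack& stack_;
    int handle_ = 0;
};
```

In this model, `interfaceDown(true)` leaves the client with no handle, while `interfaceDown(false)` leaves it holding a stale one, which is the situation a later close would run into.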
The "del if0" is treated differently in lwip2.
I'll check asap.
(Sent by email in reply to Ivan Grokhotkov's comment of January 4, 2018: https://github.com/esp8266/Arduino/issues/4078#issuecomment-355281312)
Thanks for reproducing!
That's the same behavior I described here a few days ago.
Thank you for your explanation! Is it possible to switch to LwIP v1.4 without changing the core version, and if so, where?
@pakokol In Arduino IDE you can change it in the Tools menu

@jp112sdl Thank you!
About lwip2:
I think I have fixed the WDT problem, but I still need to understand what is driving the result of wifi_station_get_connect_status(). It does not go back to STATION_GOT_IP even though DHCP is re-triggered. I will dig deeper.
@pakokol Please retest with the reference PR.
@devyte I have cloned and tested your commit, but it still crashes after a few connection drops.
Connected to WiFi
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
pm close 7
ip:0.0.0.0,mask:255.255.255.0,gw:192.168.1.1
Fatal exception 28(LoadProhibitedCause):
epc1=0x40202af9, epc2=0x00000000, epc3=0x00000000, excvaddr=0x000002cb, depc=0x00000000
Exception (28):
epc1=0x40202af9 epc2=0x00000000 epc3=0x00000000 excvaddr=0x000002cb depc=0x00000000
ctx: cont
sp: 3ffefcd0 end: 3ffeff00 offset: 01a0
>>>stack>>>
3ffefe70: 3ffeec58 00000000 3fff1044 40202af2
3ffefe80: 6a01a8c0 00000010 3ffe8880 3ffeeecc
3ffefe90: 3fffdad0 00000011 3ffeeea0 3ffeeecc
3ffefea0: 3fffdad0 0000075b 3ffeec58 402026b0
3ffefeb0: 3ffe8bf0 6a01a8c0 3ffe8bf0 6a01a8c0
3ffefec0: 3ffe88c8 40202468 3ffeeea0 402031dc
3ffefed0: 00000000 00000000 3ffeeea0 4020213a
3ffefee0: 3fffdad0 00000000 3ffeeec4 40203448
3ffefef0: feefeffe feefeffe 3ffeeee0 40100710
<<<stack<<<
ets Jan 8 2013,rst cause:1, boot mode:(3,7)
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v00000000
~ld
scandone
Not connected to WiFi
@pakokol
Can you retry with a clean installation from git (the referenced PR is merged), and run the stack decoder on your stack dump?
@d-a-v I have retested it with a clean installation from git and also removed my printouts from the above code. The exception occurred after the second connection drop. The result from stack decoder is:
0x40202ab6: WiFiClient::connect(IPAddress, unsigned short) at C:\Users\patrik.kokol\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.0\libraries\ESP8266WiFi\src/WiFiClient.cpp line 329
0x40202674: WiFiClient::connect(char const*, unsigned short) at C:\Users\patrik.kokol\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.0\libraries\ESP8266WiFi\src/WiFiClient.cpp line 329
0x4020242c: ESP8266WiFiSTAClass::status() at C:\Users\patrik.kokol\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.0\libraries\ESP8266WiFi\src/ESP8266WiFiSTA.cpp line 483
0x40202085: toggleLED() at C:\Users\patrik.kokol\Documents\Arduino\sketch_jan05a/sketch_jan05a.ino line 19
0x4020211d: loop at C:\Users\patrik.kokol\Documents\Arduino\sketch_jan05a/sketch_jan05a.ino line 48
0x402033c8: loop_wrapper at C:\Users\patrik.kokol\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266/core_esp8266_main.cpp line 57
0x40100710: cont_norm at C:\Users\patrik.kokol\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266/cont.S line 109
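The decoder output above lines up with the raw dump: each decoded address also appears as a word in the stack dump. Below is a minimal C++ sketch of the first step such a tool could take, scanning a dump line for words that fall in a code address range. `extractCodeAddresses` is a hypothetical helper, the range 0x40000000 to 0x402fffff is an assumption chosen to cover the addresses seen in this thread, and resolving addresses to source lines (which the real decoder does using the ELF file) is not shown.

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Scan one line of a stack dump ("3ffefe00: 3ffeebe8 00000000 3fff0fa4 40202ab6")
// and return the words that look like code addresses.
std::vector<uint32_t> extractCodeAddresses(const std::string& dumpLine) {
    std::vector<uint32_t> result;
    std::istringstream in(dumpLine);
    std::string token;
    bool first = true;
    while (in >> token) {
        if (first) { first = false; continue; }  // skip the "3ffefe00:" stack address
        if (token.size() != 8) continue;         // expect exactly 8 hex digits
        std::size_t used = 0;
        unsigned long value = 0;
        try {
            value = std::stoul(token, &used, 16);
        } catch (const std::exception&) {
            continue;                            // not a hex word
        }
        if (used != token.size()) continue;
        // Assumed code range; data and stack words such as 3ffeebe8 fall outside it.
        if (value >= 0x40000000ul && value <= 0x402ffffful) {
            result.push_back(static_cast<uint32_t>(value));
        }
    }
    return result;
}
```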
And the printouts:
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt
pm open,type:2 0
state: 5 -> 0 (2)
rm 0
pm close 7
reconnect
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt
connected with QuickEagle, channel 1
dhcp client start...
ip:192.168.1.107,mask:255.255.255.0,gw:192.168.1.1
pm open,type:2 0
bcn_timout,ap_probe_send_start
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
pm close 7
ip:0.0.0.0,mask:255.255.255.0,gw:192.168.1.1
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
no QuickEagle found, reconnect after 1s
reconnect
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt
pm open,type:2 0
state: 5 -> 0 (2)
rm 0
pm close 7
reconnect
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt
connected with QuickEagle, channel 1
dhcp client start...
ip:192.168.1.107,mask:255.255.255.0,gw:192.168.1.1
pm open,type:2 0
bcn_timout,ap_probe_send_start
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
pm close 7
ip:0.0.0.0,mask:255.255.255.0,gw:192.168.1.1
Fatal exception 28(LoadProhibitedCause):
epc1=0x40202abd, epc2=0x00000000, epc3=0x00000000, excvaddr=0x000001c1, depc=0x00000000
Exception (28):
epc1=0x40202abd epc2=0x00000000 epc3=0x00000000 excvaddr=0x000001c1 depc=0x00000000
ctx: cont
sp: 3ffefc60 end: 3ffefe90 offset: 01a0
>>>stack>>>
3ffefe00: 3ffeebe8 00000000 3fff0fa4 40202ab6
3ffefe10: 6a01a8c0 02f64e96 3ffeec50 00000000
3ffefe20: 3ffee610 3ffeec50 3ffeee70 3ffeee5c
3ffefe30: 3fffdad0 0000075b 3ffeebe8 40202674
3ffefe40: 3ffe8b98 6a01a8c0 3ffe8b98 6a01a8c0
3ffefe50: 3ffe8870 4020242c 3ffeee54 40202085
3ffefe60: 00000000 00000000 3ffeee54 4020211d
3ffefe70: 3fffdad0 00000000 3ffeee54 402033c8
3ffefe80: feefeffe feefeffe 3ffeee70 40100710
<<<stack<<<
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v00000000
~ld
I hope I did everything the right way, because I'm a little confused about what you meant by "the referenced PR is merged". Aren't PRs just printouts on the serial port, or did I miss something?
A PR is a "Pull Request", meaning a pending source code update which becomes part of the core once it is "Merged". Have a look at some documentation on git and GitHub for more information.
Could you please add client.println("hello"); client.stop(); right after your "Client connected" and retest?
It works for me (10 times AP off and on).
Double-check that you are using the latest master branch of the core from git (not the 2.4.0 release, which of course will not work, since that is why you created this issue), with
git pull origin master
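The retest suggested above (send one line, then close the connection right after connecting) can be sketched as a small helper. It is written as a template so that the same code compiles against `WiFiClient` on the device and against a stand-in on a PC. `probeServer` and `FakeClient` are hypothetical names that do not come from the core.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Connect, send one line, and close the connection again.
// Client is expected to offer connect(host, port), println(text) and stop(),
// as WiFiClient does.
template <class Client>
bool probeServer(Client& client, const char* host, uint16_t port) {
    if (!client.connect(host, port)) {
        return false;   // nothing was opened, so there is nothing to stop
    }
    client.println("hello");
    client.stop();      // release the connection before the next attempt
    return true;
}

// Stand-in used only to exercise probeServer off-target (hypothetical).
struct FakeClient {
    bool reachable = true;
    std::vector<std::string> calls;

    int connect(const char*, uint16_t) {
        calls.push_back("connect");
        return reachable ? 1 : 0;
    }
    void println(const char* text) { calls.push_back(std::string("println:") + text); }
    void stop() { calls.push_back("stop"); }
};
```

On the device, the call inside `loop()` would be `probeServer(client, serverIP, serverPort)` with the globals from the original sketch.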
Thanks for the explanation, I will look it up in the docs so that I won't be surprised next time. I added client.println("hello"); client.stop(); after the "Client connected" printout, and the output is:
```
Connected to WiFi
bcn_timout,ap_probe_send_start
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
pm close 7
ip:0.0.0.0,mask:255.255.255.0,gw:192.168.1.1
Fatal exception 28(LoadProhibitedCause):
epc1=0x40202bbd, epc2=0x00000000, epc3=0x00000000, excvaddr=0x002b00e3, depc=0x00000000
Exception (28):
epc1=0x40202bbd epc2=0x00000000 epc3=0x00000000 excvaddr=0x002b00e3 depc=0x00000000
ctx: cont
sp: 3ffefd80 end: 3ffeffb0 offset: 01a0
>>>stack>>>
3ffeff20: 3ffeed08 00000000 3fff0934 40202bb6
3ffeff30: 6a01a8c0 00000010 3ffe8880 3ffeef7c
3ffeff40: 3fffdad0 00000011 3ffeef50 3ffeef7c
3ffeff50: 3fffdad0 0000075b 3ffeed08 402026dc
3ffeff60: 3ffe8ca8 6a01a8c0 3ffe8ca8 6a01a8c0
3ffeff70: 3ffe88d0 4020248c 3ffeef50 40203364
3ffeff80: 00000000 3ffeed08 3ffeef50 40202144
3ffeff90: 3fffdad0 00000000 3ffeef74 402035f8
3ffeffa0: feefeffe feefeffe 3ffeef90 40100710
<<<stack<<<
ets Jan 8 2013,rst cause:1, boot mode:(1,6)
ets Jan 8 2013,rst cause:4, boot mode:(1,6)
wdt reset
```
I also checked with git that I'm on the latest master. Maybe my settings are wrong.

Later today I will also try to change my router.
Hi, sorry for the late response. I have changed my router and the exception is still triggered.
I have been able to reproduce, but it is honestly difficult to do.
I am currently using my phone's AP, and it reboots more often than the ESP fails :]
I will try with another AP.
I could capture interesting logs.
Since I managed to isolate the problem, I have always seen this behaviour, with either lwip v1 or v2.
I instrumented ClientContext as follows:
--- a/libraries/ESP8266WiFi/src/include/ClientContext.h
+++ b/libraries/ESP8266WiFi/src/include/ClientContext.h
@@ -129,13 +129,18 @@ public:
}
_connect_pending = 1;
_op_start_time = millis();
+os_printf(":x1 %p\n", _pcb);
// This delay will be interrupted by esp_schedule in the connect callback
delay(_timeout_ms);
+os_printf(":x2 %p\n", _pcb);
_connect_pending = 0;
if (state() != ESTABLISHED) {
+os_printf(":x3 %p\n", _pcb);
abort();
+os_printf(":x4 %p\n", _pcb);
return 0;
}
+os_printf(":x5 %p\n", _pcb);
return 1;
}
and the log:
Connected to WiFi
:x1 0x3fff1394
state: 5 -> 2 (3c0)
rm 0
:x2 0x017500ad
Fatal exception 28(LoadProhibitedCause):
We can see that ClientContext's members are modified (this is unchanged, but the _pcb address is corrupted) when WiFi is lost during the delay (and possibly when *_connected() callbacks run during that delay).
In my tests, this behaviour occurs very rarely. I will dig further.
Thanks to an added delay(), I can now more systematically trigger the issue.
AP disconnection must happen during the second delay.
sources:
--- a/libraries/ESP8266WiFi/src/include/ClientContext.h
+++ b/libraries/ESP8266WiFi/src/include/ClientContext.h
@@ -129,13 +129,26 @@ public:
}
_connect_pending = 1;
_op_start_time = millis();
+void* pcbsave = _pcb;
+os_printf(":x1 %p\n", _pcb);
// This delay will be interrupted by esp_schedule in the connect callback
delay(_timeout_ms);
+// this delay should not be interrupted if connection occured
+os_printf(":x1b %p rx=%p this=%p\n", _pcb, _rx_buf, this);
+malloc(0);//umm's integrity check
+delay(2000);
+malloc(0);//umm's integrity check
+os_printf(":x2 %p =? %p rx=%p this=%p\n", _pcb, pcbsave, _rx_buf, this);
+assert(_pcb == pcbsave);
+os_printf(":x2b %p\n", _pcb);
_connect_pending = 0;
if (state() != ESTABLISHED) {
+os_printf(":x3 %p\n", _pcb);
abort();
+os_printf(":x4 %p\n", _pcb);
return 0;
}
+os_printf(":x5 %p\n", _pcb);
return 1;
}
logs:
Connected to WiFi
:x1 0x3fff1604
:x1b 0x3fff1604 rx=0x00000000 this=0x3fff115c
:oom(0)@.../src/include/ClientContext.h:138
:oom(0)@.../src/include/ClientContext.h:140
:x2 0x3fff1604 =? 0x3fff1604 rx=0x3fff0e1c this=0x3fff115c
:x2b 0x3fff1604
:x5 0x3fff1604
Client connected.
Connected to WiFi
:x1 0x3fff16b4
:x1b 0x3fff16b4 rx=0x00000000 this=0x3fff115c
:oom(0)@.../src/include/ClientContext.h:138
:oom(0)@.../src/include/ClientContext.h:140
:x2 0x3fff16b4 =? 0x3fff16b4 rx=0x3fff0e1c this=0x3fff115c
:x2b 0x3fff16b4
:x5 0x3fff16b4
Client connected.
Connected to WiFi
:x1 0x3fff1764
:x1b 0x3fff1764 rx=0x00000000 this=0x3fff115c
:oom(0)@.../src/include/ClientContext.h:138
:oom(0)@.../src/include/ClientContext.h:140
:x2 0x3fff1764 =? 0x3fff1764 rx=0x3fff0e1c this=0x3fff115c
:x2b 0x3fff1764
:x5 0x3fff1764
Client connected.
Connected to WiFi
:x1 0x3fff1814
:x1b 0x3fff1814 rx=0x00000000 this=0x3fff115c
:oom(0)@.../src/include/ClientContext.h:138
<---------------------------------- disconnection here
state: 5 -> 2 (3c0)
rm 0
pm close 7
ip:0.0.0.0,mask:255.255.255.0,gw:192.168.43.1
:oom(0)@.../src/include/ClientContext.h:140
:x2 0x0a64253a =? 0x3fff1814 rx=0xa5a5a500 this=0x3fff115c
:x2b 0x0000008e
Fatal exception 28(LoadProhibitedCause):
epc1=0x40202d92, epc2=0x00000000, epc3=0x00000000, excvaddr=0x0a64254e, depc=0x00000000
Panic .../src/include/ClientContext.h:142 int ClientContext::connect(ip_addr_t*, uint16_t)
L142 is the assert line.
I have enabled umm's integrity check (I will make a PR for that).
umm's poison is also enabled (debug mode), and we can see that the _rx_buf member is overwritten with a5a5, which is the poison byte.
At this point, I have still not figured out why the _pcb member is modified, but in light of the poison byte, there could be a free() occurring at some point (maybe in WiFi callbacks on disconnection).
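The reasoning about the poison byte can be illustrated with a small check. This is a C++ sketch only: `countPoisonBytes` is a hypothetical helper, and the value 0xa5 is taken from the log above rather than from the umm_malloc sources.

```cpp
#include <cstdint>

// Count how many of the four bytes in a 32-bit word equal the poison value.
// A pointer-sized member made up mostly of poison bytes suggests that it
// overlaps memory the allocator has written poison into.
int countPoisonBytes(uint32_t word, uint8_t poison = 0xa5) {
    int count = 0;
    for (int i = 0; i < 4; ++i) {
        if (((word >> (8 * i)) & 0xffu) == poison) ++count;
    }
    return count;
}
```

For the rx value 0xa5a5a500 logged after the disconnection this returns 3, while the healthy value 0x3fff0e1c from the earlier iterations gives 0.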
Note also that there seems to be a memory leak in normal operation, judging by the _pcb address, which increases with each connection before the failure.
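The leak suspicion can be checked against the log: the _pcb addresses printed at :x1 are 0x3fff1604, 0x3fff16b4, 0x3fff1764 and 0x3fff1814. Below is a small C++ sketch that computes the step between successive addresses. `addressSteps` is a hypothetical helper, and a constant positive step is only a hint of a leak, not proof of one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the difference between each address and the one before it.
std::vector<int64_t> addressSteps(const std::vector<uint32_t>& addresses) {
    std::vector<int64_t> steps;
    for (std::size_t i = 1; i < addresses.size(); ++i) {
        steps.push_back(static_cast<int64_t>(addresses[i]) -
                        static_cast<int64_t>(addresses[i - 1]));
    }
    return steps;
}
```

With the four addresses from the log, every step is 0xb0 (176) bytes.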
@pakokol Do you think you could try the pull request ?
edit: ~You can simply replace your libraries/ESP8266WiFi/src/include/ClientContext.h by this one and recompile.~ try latest git master.
@d-a-v Hi, sorry for the late response, I was away for the weekend. I retested with your changes and I couldn't reproduce the exception. Thank you for your support on this issue!! :)
The cause has been found. The proposed fix is valid, but a better one is coming.
What are the build flags for lwip, using platformio?
I am having a problem with this fix. After 5 days of working well, communication with the access point is still OK and I can ping the device. The device connects to the server, but there is no communication with the server. The sketch stops when calling httpclient. The timeout does not work and the hardware watchdog does not restart the device. After rebooting the access point or the device, everything goes back to normal.
I'm connecting to an IIS 8.5 server and I get error 1232:
ERROR_HOST_UNREACHABLE
The network location cannot be reached.