I'm getting the following exception shortly (a few seconds) after connecting to WiFi:
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0x00
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3ffc0008,len:0
load:0x3ffc0008,len:1964
load:0x40078000,len:3668
load:0x40080000,len:260
entry 0x40080034
I (550) heap_alloc_caps: Initializing heap allocator:
I (550) heap_alloc_caps: Region 19: 3FFC0E9C len 0001F164 tag 0
I (552) heap_alloc_caps: Region 25: 3FFE8000 len 00018000 tag 1
I (561) cpu_start: Pro cpu up.
I (567) cpu_start: Single core mode
I (573) cpu_start: Pro cpu start user code
I (598) rtc: rtc v160 Nov 22 2016 19:00:05
I (609) rtc: XTAL 40M
W (696) cpu_start: failed to load RF calibration data, falling back to full calibration
I (2041) phy: phy_version: 258, Nov 29 2016, 15:51:07, 0, 2
I (2106) cpu_start: Starting scheduler on PRO CPU.
tcpip_task_hdlxxx : 3ffc4824, prio:18,stack:2048
I (2119) wifi: frc2_timer_task_hdl:3ffc635c, prio:22, stack:2048
I (2131) wifi: pp_task_hdl : 3ffc8bf4, prio:23, stack:8192
I (2132) wifi: mode : sta (24:0a:c4:00:2f:9a)
I (2133) foo: event: 2
I (3008) wifi: n:6 0, o:1 0, ap:255 255, sta:6 0, prof:1
I (3657) wifi: state: init -> auth (b0)
I (3658) foo: ping
I (3670) wifi: state: auth -> assoc (0)
I (3675) wifi: state: assoc -> run (10)
I (3686) wifi: connected with Cesanta, channel 6
I (3687) foo: event: 4
I (4658) foo: pong
I (5658) foo: ping
I (6658) foo: pong
I (6759) event: ip: 192.168.1.40, mask: 255.255.255.0, gw: 192.168.1.1
I (6759) foo: event: 7
I (7658) foo: ping
I (8658) foo: pong
I (9658) foo: ping
Guru Meditation Error of type InstrFetchProhibited occurred on core 0. Exception was unhandled.
Register dump:
PC : 00000001 PS : 00060e30 A0 : 800e9d8e A1 : 3ffc4700
A2 : 00000001 A3 : 3ffc9cc0 A4 : 00000001 A5 : 00000008
A6 : 00000000 A7 : 3ffbd594 A8 : 800e5a50 A9 : 00000002
A10 : 3ffc9cc0 A11 : 00000001 A12 : 3ffbf2a4 A13 : 00000000
A14 : 3ffc4008 A15 : 3ffbf07c SAR : 00000012 EXCCAUSE: 00000014
EXCVADDR: 00000000 LBEG : 4000c2e0 LEND : 4000c2f6 LCOUNT : ffffffff
Rebooting...
this is latest master esp-idf + esp-idf-template. i only changed ssid pass and later added a couple debug statements to log system event type and provide heartbeat to give an idea when the crash happens.
this only happens on successful wifi connection - if i specify nonexistent ssid or invalid pass, there's no crash. our office AP is nothing special, ESP8266 works with it just fine.
tried on 2 different devices, so it's not a hardware fault.
let me know if you need any info to help debug this.
When an exception like this is encountered, we can use gdb to determine where the exception occurred. I believe you are with Cesanta ... I am extremely interested in getting Mongoose supported on ESP32. If it would be useful, I will be delighted to partner with you ... you with your Cesanta code skills and me with some useful ESP32 skills. You can reach me on ESP32 IRC (kolban) or Skype at "neil.kolban".
yes, this is my first foray into ESP32 programming. i'm already setting up JTAG debug probe, a bit sooner than i expected :)
If you have trouble with JTAG, you can actually use gdb on your PC to examine the details of what failed through the serial port. Again, if you can use me, please do. I find that when approaching any new environment, I like to find folks who may have walked some of the path in the past that can offer tips on how to proceed.
in this case the problem is that we don't have useful PC to work with.
If needed, I can load your code on my PC ... via a screen share. Where are you compiling source?
On the manufacturer's forum for my board, I stumbled upon this.
If my laptop or my phone does a DHCP request, the board panics every time, which explains the regularity of the crashes and why it only happens when the board is associated with the access point.
Thanks for the reports. I have tried both with the latest master and the one which was available on the day this was originally reported. So far I wasn't able to reproduce this. Just to be sure, could any of you please upload the elf file which demonstrates the issue?
From the linked forum issue I got an impression that this may be related to some specific DHCP packet. I wonder if I may ask for a wireshark trace with DHCP packets logged?
/cc @abiliojr, I would like to keep discussion of this issue in one thread.
this must be triggered by some traffic, as for example it only repros on the office network for me and not at home. i'll try to correlate it with network activity today.
i ran tcpdump on my wlan0 interface at the same time as device is running and crashing, with an idea that it might be triggered by some broadcast traffic, which my laptop should then see as well.
unfortunately, i don't see any smoking gun in the logs, i could not identify a packet that precedes device crash.
here are some more findings. i went back in time to see if it is a recent regression - it's not.
the bug is present as far back as 9a3f9af2db534f993b22cd71cba8f13648fd2ee9, and that is as far as i can go because at previous commit (72422b32dc322b0d41f42a9da0cbaf7765359436) wifi won't even connect.
so it goes from being unable to connect at all to connect and crash, which is an improvement, of sorts :)
now, there is a minor change i noticed: before 9423390c2e88bb33be0b9064b3d8f0583f6aee7e PC had a different invalid value:
register dump at f394632319df343b1c1fc95e5d704589578a99f8:
PC : 002c002c PS : 00060e30 A0 : 800e9d8e A1 : 3ffc47a0
A2 : 00000001 A3 : 3ffc9c98 A4 : 002c002c A5 : 00000008
A6 : 00000000 A7 : 3ffbd588 A8 : 800e5a50 A9 : 00000002
A10 : 3ffc9c98 A11 : 00000001 A12 : 3ffbf294 A13 : 00000000
A14 : 3ffc2480 A15 : 3ffbf06c SAR : 0000001e EXCCAUSE: 00000014
EXCVADDR: 002c002c LBEG : 00000000 LEND : 00000000 LCOUNT : 00000000
and at 9423390c2e88bb33be0b9064b3d8f0583f6aee7e:
PC : 00000001 PS : 00060e30 A0 : 800e9d8e A1 : 3ffc47a0
A2 : 00000001 A3 : 3ffc9cc8 A4 : 00000001 A5 : 00000008
A6 : 00000000 A7 : 3ffbd594 A8 : 800e5a50 A9 : 00000002
A10 : 3ffc9cc8 A11 : 00000001 A12 : 3ffbf2a4 A13 : 00000000
A14 : 3ffc40b0 A15 : 3ffbf07c SAR : 0000001e EXCCAUSE: 00000014
EXCVADDR: 00000000 LBEG : 4000c2e0 LEND : 4000c2f6 LCOUNT : ffffffff
i can offer remote access to a machine with ESP32 module connected so you can try to debug it.
ping me on IRC if interested.
@rojer Thanks for the observation. I suppose I need to try a bunch of different routers.
The fact that PC contains garbage, and always the same kind of garbage (002c002c is also seen in the issue report on the Pycom forum) makes me think that this may be a stack overflow. Since it is triggered by some DHCP packet, it may be a stack overflow in LwIP task. Could you please do a test again, after increasing ESP_TASK_TCPIP_STACK from 2048 to some larger value in components/esp32/include/esp_task.h?
Also please enable "Run FreeRTOS on the first core only" option in menuconfig, Component config, FreeRTOS. LwIP task was moved to the second core in one of the recent changes, and from our internal testing it seems to be causing issues (although different from the one you have reported).
no, increasing ESP_TASK_TCPIP_STACK to 4096 didn't help. CONFIG_FREERTOS_UNICORE is set to y.
let me note that the device actually manages to acquire an IP address just fine, crash happens after that (usually a few seconds):
I (3615) wifi: state: auth -> assoc (0)
I (3625) wifi: state: assoc -> run (10)
I (3635) wifi: connected with Cesanta, channel 6
[0;32mI (3645) foo: event: 4 [0m
[0;32mI (4615) foo: pong [0m
[0;32mI (5615) foo: ping [0m
[0;32mI (6615) foo: pong [0m
[0;32mI (6715) event: ip: 192.168.1.34, mask: 255.255.255.0, gw: 192.168.1.1 [0m
[0;32mI (6715) foo: event: 7 [0m
^A^X [0;32mI (7615) foo: ping [0m
[0;32mI (8615) foo: pong [0m
[0;32mI (9615) foo: ping [0m
Guru Meditation Error of type InstrFetchProhibited occurred on core 0. Exception was unhandled.
Register dump:
PC : 00000001 PS : 00060e30 A0 : 800e9d8e A1 : 3ffc4fb0
A2 : 00000001 A3 : 3ffca4bc A4 : 00000001 A5 : 00000008
A6 : 00000000 A7 : 3ffbd594 A8 : 800e5a50 A9 : 00000002
A10 : 3ffca4bc A11 : 00000001 A12 : 3ffbf2a4 A13 : 00000000
A14 : 3ffc2490 A15 : 3ffbf07c SAR : 00000012 EXCCAUSE: 00000014
EXCVADDR: 00000000 LBEG : 4000c2e0 LEND : 4000c2f6 LCOUNT : ffffffff
Rebooting...
fwiw my router model is TP-Link TD-W9980
I did some tcpdump (during my previous attempt, before finding the thread talking of DHCP requests triggering the problem, I only tried to capture the 802.11 frames and saw some corrupt ones but replaying them didn't produce anything) and by elimination, I was left with only one packet: an ICMPv6 Router Solicitation.
Each tcpreplay of this packet provoked a panic.
I'm not sure it is useful I post the pcap here, as I can simply reproduce the issue with the rdisc6 command (could not compile it under OS X so I installed it on a laptop with Debian as it is available in the main APT repository via the ndisc6 package).
For the record, here is how to use it:
rdisc6 "$router_ip6" "$interface"
Where $router_ip6 may be found via:
ip -6 route
And $interface is simply your Wi-Fi interface (that you should also see in the ip's command output).
Unfortunately, #define LWIP_IPV6 0 in lwipopts.h will break the build and I'm not sure I can disable IPv6 support on the ESP32 (I'm new with this board & SDK, sorry) to fully assert there is a problem with the IPv6 stack.
Hope this helps.
fwiw, this is something i have in common with @ether42 - i also have ipv6 in my network.
Thanks @ether42, I have passed this way of reproducing the issue to the team, we should have a fix shortly.
i have similar issue on my beloved LoPy with ESP32. Changed IPv6 settings on my wifirouter from 6to4-tunnel to Disable and the crash has not happened again. Router Netgear R7000 i do not run any 脦Pv6 software. Not a fix but a workaround.
Kudus to @ether42 for the thorough report. Thanks!
I can confirm that running rdisc6 triggers the crash.
FYI, @ether42, I just had a look at rdisc on OS X, and made two fixes to rdisc. I mailed it's author just a minute ago.
struct netif's ipv6_addr_cb field was non-NULL but the pointer was invalid.
turns out, struct netif was not zeroed on allocation. #148 fixes that and possible other bugs due to uninitialized struct fields.
Most helpful comment
I did some tcpdump (during my previous attempt, before finding the thread talking of DHCP requests triggering the problem, I only tried to capture the 802.11 frames and saw some corrupt ones but replaying them didn't produce anything) and by elimination, I was left with only one packet: an ICMPv6 Router Solicitation.
Each tcpreplay of this packet provoked a panic.
I'm not sure it is useful I post the pcap here, as I can simply reproduce the issue with the rdisc6 command (could not compile it under OS X so I installed it on a laptop with Debian as it is available in the main APT repository via the
ndisc6package).For the record, here is how to use it:
Where
$router_ip6may be found via:And
$interfaceis simply your Wi-Fi interface (that you should also see in theip's command output).Unfortunately,
#define LWIP_IPV6 0inlwipopts.hwill break the build and I'm not sure I can disable IPv6 support on the ESP32 (I'm new with this board & SDK, sorry) to fully assert there is a problem with the IPv6 stack.Hope this helps.