Arduino: multiple ESP8266s on one WPA2 network -> no ARP

Created on 28 Mar 2017 · 26 Comments · Source: esp8266/Arduino

I had the issue so many others have had, where my ESP8266s would stop responding to ARP. In my case (and perhaps others'?), using ArduinoOTA, they also stopped responding to MDNS queries. However, if I add an ARP entry manually, I can reach them. Unlike with others, ARP/UDP broadcast reception seems to fail after a matter of a few hours, not 36.

Through a _lot_ of trial and error, I determined that the problem can be triggered by having more than one ESP8266 on the same WPA2 Personal AES network. I created a WEP network, moved one ESP8266 over, and to my surprise, _both_ started working stably for days. I moved the second over to the WEP network, and they're both still stable.

I wonder if this might have to do with group key renewal. Perhaps two ESP8266s confuse each other? I can't be certain, but I've never seen one fail while the other was stable. One thing is certain: they don't fail 100% after the first group key renewal, so perhaps it's partially random?

This is all with the latest master branch of this repo, so I definitely have a recent SDK.

In any event, I have a workaround, so I'm not begging for a fix. Perhaps this issue will also help others in the same boat.

All 26 comments

Could this be due to a quirk or bug with the access point you are using? I have multiple ESPs on WiFi WPA2 AES with no such issues, perfectly stable; the AP is a Mikrotik. Sometimes I would get that issue of missing ARP with other APs, like Ubiquiti.

@mtnbrit Thanks for the quick reply!

I suppose a router issue is possible. I have an Asus RT-AC66U running tomato shibby. I systematically modified every advanced wifi setting I could think of, to no avail. I upgraded the firmware to the latest. No dice.

I don't really like the idea that we might close this issue as "router issue". I've never had ARP or MDNS issues with any other device connected to it. I can clearly see in packet captures on other devices and on the router itself that the ESP8266 devices in this state simply do not send ARP or MDNS responses, even to the router itself.

I'm hardly going to go and buy another $100+ router on the chance that it fixes my problem, and I doubt others would either -- especially to make a <$5 part work.

Tomato shibby is rock solid, in my experience. If there _is_ some kind of lack of adherence to standards or whatever, it only seems to affect ESP8266s, so they should just work around the issue. It seems far more likely to me that the ESP8266, with its relatively new network stack, has some kind of edge-case bug or standards compliance issue.

Agreed.

When these devices would go unreachable on the Ubiquiti APs, I would "kick" them from the AP to disassociate them and have them reconnect; then they would start responding to ping again. Does kicking the dead ones work to get them back for you?

I didn't try that, but I doubt it would work. Not even resetting or power-cycling the ESP8266s would bring them back. They'd DHCP fine but still not respond to ARP and MDNS. Only a router reboot would do the job. Perhaps they somehow end up in some kind of temporary blacklist in the router for disobeying the WPA2 spec?

That pretty firmly points to the issue being with the AP, no?

Why not pick up a Mikrotik hAP lite for $24 on Amazon and give it a try? That will at least eliminate or incriminate your Asus.

If it's supported, you could try dd-wrt instead of tomato on your router. I
use it in my 4 TP Link Archer C7s, and I haven't had issues with the ESPs,
even when walking out of range of one and into another.
I suggest this because I've heard of really odd issues with tomato and Asus.

I've got an Asus which I've had odd behaviour with on stock, custom and
OpenWrt firmware. It's now just a switch in our LAN.
I've got Ubiquiti and don't think I've struck this issue.
I am slightly suspicious of Asus wifi now tho...

I'm having the same issue! Asus RT-N66U with Tomato Shibby 140. WPA2 Personal (PSK) + AES. No problems with a single ESP8266 connected. Within a week of adding the second, both quit responding to ARP. Power cycle on the ESP8266 would reconnect to WiFi and DHCP assignment appears successful.

But ESP8266 devices would no longer respond to ping commands, except from one PC that had a previous ARP entry still cached.

I can confirm that multiple ESPs connected to my ASUS RT-AC66U (running the latest rmerlin firmware) fail to respond to ARP requests after a few days. I can also confirm that:

  • resetting the ESPs doesn't help
  • rebooting the router does fix the issue
  • manually adding an ARP entry for the ESPs on any machine temporarily fixes the problem
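
The last point, pinning an ARP entry by hand, might look like the following on a Linux machine. This is only a sketch, not something posted by the commenter: the helper name, the interface name `eth0`, and the address pair (borrowed from a packet capture posted in this thread) are assumptions, and the script just prints the `ip neigh` command rather than running it, since adding the entry requires root.

```python
import shlex


def static_arp_command(ip: str, mac: str, iface: str) -> list:
    """Return the iproute2 argv that pins ip -> mac on iface (not executed)."""
    return ["ip", "neigh", "replace", ip, "lladdr", mac,
            "dev", iface, "nud", "permanent"]


# Example values only: substitute your ESP's address, MAC and your interface
cmd = static_arp_command("192.168.2.184", "5c:cf:7f:3c:d4:2d", "eth0")
print(shlex.join(cmd))
```

An entry added this way lives only in that one machine's neighbour table, which matches the observation that the fix is temporary and per-machine.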

Hi,
in my case the ESP responds correctly to an ARP request made by a PC, but doesn't respond to an ARP request made by another ESP.
ARP request made by a PC (packet dissection):

No.     Time           Source                Destination           Protocol Length Info
      2 94.496161000   HonHaiPr_e0:7a:ed     Broadcast             ARP      42     Who has 192.168.2.183?  Tell 192.168.2.14

Frame 2: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
Ethernet II, Src: HonHaiPr_e0:7a:ed (b0:10:41:e0:7a:ed), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
    Destination: Broadcast (ff:ff:ff:ff:ff:ff)
        Address: Broadcast (ff:ff:ff:ff:ff:ff)
        .... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
        .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
    Source: HonHaiPr_e0:7a:ed (b0:10:41:e0:7a:ed)
        Address: HonHaiPr_e0:7a:ed (b0:10:41:e0:7a:ed)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: ARP (0x0806)
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IP (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: HonHaiPr_e0:7a:ed (b0:10:41:e0:7a:ed)
    Sender IP address: 192.168.2.14 (192.168.2.14)
    Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Target IP address: 192.168.2.183 (192.168.2.183)

ARP request made by another ESP (failed):

No.     Time           Source                Destination           Protocol Length Info
      1 0.000000000    5c:cf:7f:3c:d4:2d     Broadcast             ARP      42     Who has 192.168.2.183?  Tell 192.168.2.184

Frame 1: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
Ethernet II, Src: 5c:cf:7f:3c:d4:2d (5c:cf:7f:3c:d4:2d), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
    Destination: Broadcast (ff:ff:ff:ff:ff:ff)
        Address: Broadcast (ff:ff:ff:ff:ff:ff)
        .... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
        .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
    Source: 5c:cf:7f:3c:d4:2d (5c:cf:7f:3c:d4:2d)
        Address: 5c:cf:7f:3c:d4:2d (5c:cf:7f:3c:d4:2d)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: ARP (0x0806)
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IP (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: 5c:cf:7f:3c:d4:2d (5c:cf:7f:3c:d4:2d)
    Sender IP address: 192.168.2.184 (192.168.2.184)
    Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Target IP address: 192.168.2.183 (192.168.2.183)

the only difference is in "source" data
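
To make the comparison concrete, here is a minimal Python sketch that rebuilds both 42-byte requests from the field values in the dissections above and lists the byte offsets where they differ. The helper names (`mac_bytes`, `arp_request`) are made up for illustration; the layout is the standard Ethernet II header (14 bytes) followed by an IPv4-over-Ethernet ARP payload (28 bytes).

```python
import socket
import struct

BROADCAST = b"\xff" * 6
ZERO_MAC = b"\x00" * 6


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build a broadcast ARP 'who-has' frame (hypothetical helper)."""
    src = mac_bytes(sender_mac)
    # Ethernet II header: destination, source, EtherType 0x0806 (ARP)
    eth = BROADCAST + src + struct.pack("!H", 0x0806)
    # ARP header: hardware type 1, protocol 0x0800, sizes 6 and 4, opcode 1
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src + socket.inet_aton(sender_ip)
    arp += ZERO_MAC + socket.inet_aton(target_ip)
    return eth + arp


pc_frame = arp_request("b0:10:41:e0:7a:ed", "192.168.2.14", "192.168.2.183")
esp_frame = arp_request("5c:cf:7f:3c:d4:2d", "192.168.2.184", "192.168.2.183")

# Byte offsets at which the two frames differ
diff = [i for i, (a, b) in enumerate(zip(pc_frame, esp_frame)) if a != b]
print(len(pc_frame), len(esp_frame), diff)
```

The differing offsets fall in the Ethernet source address, the ARP sender MAC, and the last byte of the sender IP, so the requests themselves are structurally identical.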

Me too. I lose the ability to ping my ESP modules after a few hours, and need to reboot to bring them back. I know it's the ARP issue because they are able to maintain a constant connection with a server and otherwise operate correctly; the ping failure only shows up from a machine which doesn't have them in its ARP cache.

FYI my router is a TP Link Archer C7, stock firmware.

I've dug out an ancient Linksys WRT54GL running dd-wrt and have moved them over to that today, to see if it helps.

Long discussion and workaround proposals are in #2330.
Testers are welcome.

@SupraJames here's something interesting: I use Archer C7s with dd-wrt, and I haven't encountered the arp issue.

@devyte before I brick my router and annoy my wife with lack of internet, is there a build of DD-WRT that you recommend?

I have an Archer C7 V2, which does seem to be supported, and I can see the latest build is from Jan 7th, but I'd be wary of flashing something that new.

I have the same v2. I don't have my build number at hand, but I updated beginning of December, and no issues. I'd say go for it :P

Just for info and referring to 2330, I am also experiencing issues with a TP-Link and the ESP.

Me too, with a TP-Link Archer C7 V1.0 on stock firmware.
I spent a lot of time looking for the issue in my code, but finally I solved it by using the router's DHCP to assign fixed IPs by MAC address and binding the IP and MAC in the ARP table. Now all my ESPs run without issue for days!
I plan to buy the Archer C7 V2.0 (I found one for 35€).
Can you confirm there is no issue with this one?

Having the same issue, especially when the ESP8266 has a static IP.

@riker65 no, or at least not directly.

@devyte,
Some not directly
Some never
Some when I ping them

This happens mainly when connected to UniFi APs

It does seem to be related to the power consumption of the ESP itself.
I have had a few ESPs here connected to a power supply that also displays its consumption.
The nodes I have do exchange some UDP packets every minute as a p2p protocol between nodes to exchange sensor data.
All nodes in the network are also displayed on a page with the last time one of these were seen (in minutes), so it is a very simple view on what nodes can reach others.

As soon as a node starts using less energy (some kind of "eco mode"), these packets are lost. Either missed by the receiving end or never reaching the receiving end. (ARP issue?)
This lower energy mode can be achieved by calling delay(...) as long as nothing has to be done.
But it may also happen to other nodes when not occupied full time. It may take a minute or sometimes even 10 minutes.
As soon as you're actively accessing several pages served by the node, or simply sending ping packets to it, it will jump back to 'normal' power consumption again.
Funny thing is, an ICMP packet is always answered (as long as the ARP entry is known), but it may take several hundred milliseconds for the first reply to arrive.

I do send Gratuitous ARP packets from the ESP to overcome the ARP issue, but that still doesn't help for missing packets when the node is in some kind of "eco mode".
But this "eco mode" can explain why it may not reply to ARP packets in the first place.
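
For readers unfamiliar with the term, a gratuitous ARP is a broadcast ARP frame in which the sender and target IP addresses are the same, so every host on the segment can refresh its cache entry without having asked. The Python sketch below only illustrates that frame layout (in its request form); it is not the code running on the ESP nodes described above, and the function name and the example address pair (taken from a capture earlier in this thread) are chosen for illustration.

```python
import socket
import struct


def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP request frame (illustrative helper)."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    # Ethernet II header: broadcast destination, own MAC, EtherType ARP
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), sizes 6 and 4, opcode 1
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender and target IP are identical; target MAC is left as zeros
    arp += hw + addr + b"\x00" * 6 + addr
    return eth + arp


frame = gratuitous_arp("5c:cf:7f:3c:d4:2d", "192.168.2.184")
sender_ip = frame[28:32]
target_ip = frame[38:42]
print(socket.inet_ntoa(sender_ip), socket.inet_ntoa(target_ip))
```

Because the announcement is unsolicited, it can keep other hosts' caches populated even when the node fails to answer their requests, but as noted above it does nothing for packets the node misses while in its low-power state.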

@TD-er your observations sound like the modem sleep thing that was addressed in sdk pre3, where there are 2 sorts of modem sleep states. In one of them, the esp could miss incoming packets due to desync of some beacon or something. In the other, all beacons were listened for, so no packets missed. I think the former case is the default and used in 2.2.x, while the latter is made available with a new api in pre3.
As a side note, the missed packets case has lower power usage, and I suspect is meant for deep sleep cases where on wakeup only transmission is done, while the other good case with higher power usage is meant for serving, i.e. when you need to access the ESP remotely. That's just my guess, though.

Is that the new "listeninterval" that can be set in SDK3?
Do you know if this is set in SDK2.2.x to some dynamic value?
Can we read its value somewhere? The wrapper in this repo is just returning 0 for SDK < 3

listenInterval is ignored in sdk2.

I have a whole bunch of ESPs on the same network. They have been running for several weeks without a single drop, all since #6484.
I'm closing this.
If anyone still encounters dropouts, please open a new issue and we can continue discussion there.
