Arduino: [BUG] ESP8266 Cannot open certain TCP connections

Created on 31 Mar 2018 · 39 Comments · Source: esp8266/Arduino

Basic Info

  • [x] This issue complies with the issue POLICY doc.
  • [x] I have read the documentation at readthedocs and the issue is not addressed there.
  • [x] I have tested that the issue is present in current master branch (aka latest git).
  • [x] I have searched the issue tracker for a similar issue.
  • [ ] If there is a stack dump, I have decoded it.
  • [x] I have filled out all fields below.

Platform

  • Hardware: NodeMCU V0.9 (ESP-12E Module) (Compatible)
  • Core Version: [2.4.1]
  • Development Env: [Arduino IDE 1.8.5]
  • Operating System: [Windows, OSX]

Settings in IDE

  • Board: NodeMCU V0.9 (ESP-12E Module)
  • Flash Size: 4MB (1M SPIFFS)
  • Debug Port: Serial
  • Debug Level: None
  • lwIP Variant: v2 Lower Memory
  • CPU Frequency: 80Mhz
  • Upload Speed: 115200
  • Erase Flash: Only Sketch
  • Programmer: AVRISP mkII

The Problem

I have this server: 185.205.210.197
(Check that it works: http://185.205.210.197/)

And I have this code in my ESP8266:

#include <ESP8266WiFi.h>

#define HOST "185.205.210.197"
#define PORT 80
#define WIFI_SSID "SSID"
#define WIFI_PSW "PASSWORD"

const char* ssid = WIFI_SSID;
const char* password = WIFI_PSW;

void setup () {
  Serial.begin(115200);
  WiFi.begin(ssid, password);

  while (WiFi.status() != WL_CONNECTED) {
    delay(5000);
    Serial.println("Connecting to Wifi...");
  }

  while(!pingServer()) {
    delay(3000); //Send a request every 3 seconds
  }
}


void loop() {}


bool pingServer() {
  WiFiClient client;

  if (!client.connect(HOST, PORT)) {
    Serial.println("connection failed");
    return false;
  }

  Serial.println("connection success!");
  return true;
}

Expected result: connection success. Actual result: connection failed.

Connecting from my PC works; only the ESP8266 fails.

Wireshark

After capturing the traffic with Wireshark, I suspect there is a problem with the ESP8266's TCP packets. Here they are for you to analyze:

Successful TCP Packet

PC TCP connection request (which works):

000af5f4e90c9801a7ad45b708004500004057b6400040066a7dc0a82b49b9cdd2c5dc6d1b3932f96fd800000000b002ffffc71e0000020405b4010303050101080a4ab5122c0000000004020000

(captured using basic ethernet)

Screenshot of an example of a TCP packet that is able to get a server's response.

ESP8266 TCP Packet (Unsuccessful)

ESP8266 TCP connection request (which receives no response from server):

000019006f080000794c5e9b00000000126c9e098004d4a10008013c00000af5f4e90c2c3ae80f137f000af5f4e90c9002aaaa0300000008004500002c00270000ff0642f9c0a82b70b9cdd2c5c00d1b3900001c77000000006002086022f9000002040218a9636c75

(captured using monitor mode in Wireshark - has radiotap headers)

Screenshot example of an ESP8266 packet.

What I've also checked:

  • The ESP connects fine to other servers.

  • There is no firewall blocking the connection between ESP and the server.

  • I've tried this in different networks and conditions, the error persists.

  • Other computers can connect to the server, so the ESP should be able to as well.

  • Tried with different ESP8266 boards to rule out hardware malfunctioning.

How to replicate:

  1. Using the latest Arduino IDE and ESP8266 library.

  2. Check that the test server works at http://185.205.210.197:80/ (just open the link and see that it is running nginx).

  3. Copy and upload the code above to your ESP8266 (NODEMCU or similar).

  4. Open the Arduino IDE console/monitor and confirm that no connection is established to port 80.

  5. Rage in admiration for such a strange bug.

Possible Workaround

@Pablo2048 suggested changing the Arduino IDE board option lwIP Variant from v2 Lower Memory to v2 Higher Bandwidth. This raises the TCP MSS (http://lwip.wikia.com/wiki/Tuning_TCP) from 536 bytes to 1460 bytes, which this server accepts.

With the larger MSS the server responds and a TCP connection is established, whereas with the low MSS the server does not even answer with an error packet (complete silence).
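
The MSS difference can be read straight out of the two SYN captures quoted in the Wireshark section. The following is a minimal C++ sketch, not part of the original report, that walks the TCP options of a captured frame and reports the advertised MSS and window. The names `hexToBytes`, `parseSyn`, and `SynInfo` are hypothetical helpers. The IPv4 offsets are assumptions based on a reading of the dumps: 14 bytes for the Ethernet capture, and 57 bytes for the monitor-mode capture (a 25-byte radiotap header, a 24-byte 802.11 data header, and an 8-byte LLC/SNAP header).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Convert a hex dump such as "02040218" into raw bytes.
// Assumes an even number of valid hex digits, as in the captures above.
std::vector<uint8_t> hexToBytes(const std::string& hex) {
    assert(hex.size() % 2 == 0);
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        out.push_back(static_cast<uint8_t>(
            std::stoul(hex.substr(i, 2), nullptr, 16)));
    }
    return out;
}

// Fields of interest from a TCP SYN; -1 means "not found".
struct SynInfo {
    int window = -1;
    int mss = -1;
};

// Parse the IPv4 + TCP headers starting at ipOffset (an assumption supplied
// by the caller) and walk the TCP options looking for kind 2 (MSS).
SynInfo parseSyn(const std::vector<uint8_t>& f, std::size_t ipOffset) {
    SynInfo info;
    if (f.size() < ipOffset + 20) return info;
    const std::size_t ipHeaderLen = (f[ipOffset] & 0x0F) * 4;   // IHL field
    const std::size_t tcp = ipOffset + ipHeaderLen;
    if (f.size() < tcp + 20) return info;

    const std::size_t tcpHeaderLen = (f[tcp + 12] >> 4) * 4;    // data offset
    info.window = (f[tcp + 14] << 8) | f[tcp + 15];

    std::size_t end = tcp + tcpHeaderLen;
    if (end > f.size()) end = f.size();

    std::size_t i = tcp + 20;                 // options follow the fixed header
    while (i < end) {
        const uint8_t kind = f[i];
        if (kind == 0) break;                 // end of option list
        if (kind == 1) { ++i; continue; }     // NOP padding, one byte
        if (i + 1 >= end) break;
        const uint8_t len = f[i + 1];
        if (len < 2 || i + len > end) break;  // malformed option
        if (kind == 2 && len == 4) {
            info.mss = (f[i + 2] << 8) | f[i + 3];
        }
        i += len;
    }
    return info;
}
```

Under those assumptions, calling `parseSyn(hexToBytes(pcHex), 14)` on the PC capture gives an MSS of 1460, and `parseSyn(hexToBytes(espHex), 57)` on the ESP8266 capture gives an MSS of 536 with a window of 2144.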

All 39 comments

PS. I learned from a comment in Freenode that telnet also appears not to work with that server, with the same error as ESP8266:

telnet 185.205.201.197 80

Telnet works, ip is mistyped in your command: 201 instead of 210.

Right... Ok, so ESP8266 is the only one who cannot access the server. This must indeed be a library bug.

Has anyone been able to reproduce this with the steps I wrote?

With no information about the core and lwIP version you are using (no, the questionnaire is not there to tease you) there will be little response ... if you are not willing to spend some time, why should others?

and no, this is not sufficient info:

Using the latest Arduino IDE and ESP8266 library.

@5chufti done.

Ok, so please try to:

  1. turn on debugging messages to see what the SDK/libraries are doing internally
  2. pass an IPAddress to the .connect method instead of a char * to rule out a gethostbyname error

@Pablo2048 ok, I just did that, and I think we can confirm that it is parsing the hostname properly.

I attach below:

  1. the sketch code with the modification to IPAddress you proposed
  2. a screenshot of the Wireshark capture of all packets the ESP8266 tried to send to the server while running that sketch (in this screenshot you see 6 packets, 3 per connection attempt)
  3. the Wireshark capture file
  4. the console prints of said sketch

Sketch

#include <ESP8266WiFi.h>

#define HOST IPAddress(185,205,210,197)
#define PORT 80
#define WIFI_SSID "redecomfios"
#define WIFI_PSW ""

const char* ssid = WIFI_SSID;
const char* password = WIFI_PSW;

void setup () {
  Serial.begin(115200);
  WiFi.begin(ssid, password);

  while (WiFi.status() != WL_CONNECTED) {
    delay(5000);
    Serial.println("Connecting to Wifi...");
  }

  while(!pingServer()) {
    delay(3000); //Send a request every 3 seconds
  }
}


void loop() {}


bool pingServer() {
  WiFiClient client;

  if (!client.connect(HOST, PORT)) {
    Serial.println("connection failed");
    return false;
  }

  Serial.println("connection success!");
  return true;
}

Console Prints

state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 5

connected with redecomfios, channel 11
dhcp client start...
cnt 
wifi evt: 0
ip:192.168.43.112,mask:255.255.255.0,gw:192.168.43.1
wifi evt: 3
Connecting to Wifi...
:ref 1
:ctmo
:abort
:ur 1
:del
connection failed
pm open,type:2 0
:ref 1
:ctmo
:abort
:ur 1
:del
connection failed
:ref 1
:ctmo
:abort
:ur 1
:del
connection failed
:ref 1
:ctmo
:abort
:ur 1
:del
connection failed
:ref 1
:ctmo
:abort
:ur 1
:del
connection failed

I was curious, so I tested your server and got the same results. I even tested from my own app (a Forth interpreter with network commands), and it also couldn't connect to your server. Interestingly, I run many nginx servers and have never encountered this problem before, so my first thought is that a router in front of your server may be mangling packets. It's not clear whether you tested on the same LAN with local addresses; if you did, that rules out the router. Does the nginx server's OS have any firewall rules enabled?

I'll look at the packet capture.

Hi @Eszartek thanks for testing.

What I have validated so far (99% sure):

  • It is not nginx's fault. I have tried netcat and my own TCP server written in Java running on the server, and the ESP8266 still does not connect. I put nginx there so you could try out this issue, but nginx is not the cause.

  • I have tested this using my home router, my work router, and my smartphone's AP (3G), and it happens no matter what. If something is blocking the connection, it is not on my side.

_Are the ESP8266 TCP packets being generated correctly? Everything else is able to connect to the server._

This is creepy as hell. This server is the only server having issues with ESP8266. However, ESP8266 is the only client having issues connecting to this server as well. It is like a romantic relationship that is never going to happen ahah.

According to your Wireshark screenshot it seems like nginx doesn't respond with a SYN ACK. I have no idea why :-( ...

@Pablo2048 yeah, nothing will respond. Every server I put there (nginx, netcat, my custom server) will not respond.

But this only happens with ESP8266, because if you try with your browser (or other tcp client), nginx (and the other servers) responds just fine.

It makes me wonder if ESP8266 packets are properly built.

@Pablo2048 here you can see a TCP packet directed the exact same way as the ESP8266 one (to that server, to port 80 - nginx), and it is able to get a SYN ACK answer: http://prntscr.com/iz9h3a

@igrr @mmiscool This is a weird one, someone with deep network knowledge should take a look at it.

Ok, there is one difference in MSS - can you try LWIP v2 but not the lower memory variant?

@Pablo2048 OMG... it connected.

 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
v614f7c32
~ld
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 2
cnt 

connected with redecomfios, channel 1
dhcp client start...
wifi evt: 0
ip:192.168.69.109,mask:255.255.255.0,gw:192.168.69.1
wifi evt: 3
Connecting to Wifi...
:ref 1
connection success!
:ur 1
:close
:del
pm open,type:2 0

I feel like crying.

Why was MSS limiting the connection? All I know about it comes from this article: http://lwip.wikia.com/wiki/Tuning_TCP

Well, it was just a blind shot from me ;-) ... It seems like nginx needs the MSS to be set to a bigger value (or there is something like a router in the way that needs the Don't Fragment flag to be set, which lwIP lower memory does not set)...
Edit: wait - you wrote that you have tested more servers and none of them worked, so it seems like something in the TCP stack configuration on the target machine...

Can you describe "there" in a little more detail? What is the server OS, and is there a router with the public IP? It would be interesting to know what equipment to stay away from :)

I doubt that the issue is Nginx, I'd think something else is handling the packet first and making the call to drop it. It could be the OS Nginx is running on.

@Eszartek agree with that...

@igrr that definitely fits this scenario. I'm really curious to know if it is a router or an ISP-level system in place that's causing the drop.

Hey everyone,

this is not a problem with NGINX

practically, any server I put there gives this same error.

The issue must be with the VPS itself, or some firewall in front of it filtering TCP packets. I have no idea but I will try to find out, since I am not the owner providing that VPS.

All I know is that the VPS is running Ubuntu 16.

Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-87-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
Last login: Sat Mar 31 19:01:16 2018 from 109.48.194.56
$ uname -a
Linux unassigned-hostname 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

@Eszartek @igrr strange that in that article you linked, their example at least receives a SYN ACK, in this scenario ESP receives nothing from the server.

So basically, for some reason ESP8266 MSS is 536 bytes by default, but the server requires around 1400 bytes to work properly.

It seems the server answers back with MSS = 1420 bytes.

I edited the OP with this workaround. I'll try to learn more about this server.
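
The MSS values quoted in this thread (536, 1460, and the server's 1420) follow from simple header arithmetic: for IPv4 with no IP or TCP options, the MSS is the MTU minus 40 bytes. Below is a small C++ sketch of that relationship, using the hypothetical helper names `mssForMtu` and `mtuForMss`. It only illustrates where the numbers come from and says nothing about why this particular server ignored the SYN. The 1460-byte MTU implied by the server's 1420 is an inference, not something confirmed in the thread.

```cpp
// IPv4 and TCP headers are 20 bytes each when neither carries options.
constexpr int kIpv4HeaderBytes = 20;
constexpr int kTcpHeaderBytes = 20;

// Largest TCP payload that fits in one IPv4 packet of the given MTU.
constexpr int mssForMtu(int mtu) {
    return mtu - kIpv4HeaderBytes - kTcpHeaderBytes;
}

// Inverse: the MTU that a given advertised MSS corresponds to.
constexpr int mtuForMss(int mss) {
    return mss + kIpv4HeaderBytes + kTcpHeaderBytes;
}

// Standard Ethernet MTU gives the "Higher Bandwidth" value.
static_assert(mssForMtu(1500) == 1460, "Ethernet MTU 1500 -> MSS 1460");
// 576 is the minimum datagram size every IPv4 host must accept,
// which gives the classic default MSS used by the "Lower Memory" build.
static_assert(mssForMtu(576) == 536, "IPv4 minimum 576 -> MSS 536");
// The server's advertised 1420 would correspond to a 1460-byte MTU.
static_assert(mtuForMss(1420) == 1460, "MSS 1420 -> MTU 1460");
```

Both values the ESP8266 can advertise are therefore legitimate. 536 is just the conservative default, which is why a filter that treats small MSS values as suspicious can drop it.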

Does the Ubuntu server have the public IP address 185.205.210.197 and if so, can you list the iptables rules loaded on the server?

@Eszartek in theory yes, that public IP is from that Ubuntu server.

$ iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
LOG        tcp  --  anywhere             anywhere             tcp dpt:9193 state NEW LOG level alert prefix "New Connection "
LOG        tcp  --  anywhere             anywhere             tcp dpt:5901 state NEW LOG level alert prefix "New Connection "

Chain FORWARD (policy DROP)
target     prot opt source               destination
DOCKER-ISOLATION  all  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (2 references)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             172.18.0.2           tcp dpt:postgresql
ACCEPT     tcp  --  anywhere             172.18.0.3           tcp dpt:6379
ACCEPT     tcp  --  anywhere             172.18.0.4           tcp dpt:9042
ACCEPT     tcp  --  anywhere             172.18.0.4           tcp dpt:afs3-fileserver

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere
$ iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION
-A INPUT -p tcp -m tcp --dport 9193 -m state --state NEW -j LOG --log-prefix "New Connection " --log-level 1
-A INPUT -p tcp -m tcp --dport 5901 -m state --state NEW -j LOG --log-prefix "New Connection " --log-level 1
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o br-ec854a8fb357 -j DOCKER
-A FORWARD -o br-ec854a8fb357 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i br-ec854a8fb357 ! -o br-ec854a8fb357 -j ACCEPT
-A FORWARD -i br-ec854a8fb357 -o br-ec854a8fb357 -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 172.18.0.2/32 ! -i br-ec854a8fb357 -o br-ec854a8fb357 -p tcp -m tcp --dport 5432 -j ACCEPT
-A DOCKER -d 172.18.0.3/32 ! -i br-ec854a8fb357 -o br-ec854a8fb357 -p tcp -m tcp --dport 6379 -j ACCEPT
-A DOCKER -d 172.18.0.4/32 ! -i br-ec854a8fb357 -o br-ec854a8fb357 -p tcp -m tcp --dport 9042 -j ACCEPT
-A DOCKER -d 172.18.0.4/32 ! -i br-ec854a8fb357 -o br-ec854a8fb357 -p tcp -m tcp --dport 7000 -j ACCEPT
-A DOCKER-ISOLATION -i docker0 -o br-ec854a8fb357 -j DROP
-A DOCKER-ISOLATION -i br-ec854a8fb357 -o docker0 -j DROP
-A DOCKER-ISOLATION -j RETURN
$ ifconfig -a
(...)
eth0      Link encap:Ethernet  HWaddr 96:20:1b:82:ba:c4
          inet addr:185.205.210.197  Bcast:185.205.210.255  Mask:255.255.255.0
          inet6 addr: fe80::9420:1bff:fe82:bac4/64 Scope:Link
          inet6 addr: 2a07:5741:0:1160::1/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:98858771 errors:11229226 dropped:0 overruns:0 frame:11229226
          TX packets:1564862 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7330778108 (7.3 GB)  TX bytes:180059437 (180.0 MB)
(...)

I was expecting to see something along the lines of:

iptables -t mangle -A PREROUTING -p tcp -m conntrack --ctstate NEW -m tcpmss ! --mss 1420:65535 -j DROP

Possibly related: https://github.com/moby/moby/issues/26473 .
Looks like it may be inside the container.

Even if the issue is not caused by the ESP/Arduino core, for maximum compatibility
the 1460 MSS could be restored as the default, leaving the 536 MSS as an option.
@igrr @devyte @earlephilhower what do you think ?

I haven't had any issues with MSS @536, so I'm inclined to suspect something in OP's particular setup as the culprit. Unless this becomes widely reported, I'd rather not up the default heap usage for everyone else just due to this case.

The default (small MSS) seems to work for a large majority, there is a simple change to allow it to work in this exceptional instance (menu option during build).

I see no need to change anything here, it's actually all working like a champ, no? No bug in sight, just some specific combination at a VPS hoster...

Hey everyone, I have news from the VPS hoster:

Dear Customer,

Hello,

Your conclusion is right. We have custom firewall which filter lower tcp mss flag than 500. If you check RFC 6691 by the calculations we had never expect lower value. Most of packets generated and filled with lower mtu is scanning and flooding scripts. We also check all other tcp header flags like seq/ack number, win flag, valid timestamp. This is related with the tcp protocol, doesn't matter what software you use for connections (curl, nc, wget, etc).
I can advise you to set bigger values on the packets that your OS is generating, otherwise you will have same problems with many other hosts which have similar filters.

What do you people think?

filter lower tcp mss flag than 500

MSS for memory lwip is 536.

Whatever else, it's not an issue with the core. Closing.

I'm experiencing nearly the same symptoms:

  • Connections to my server fail nearly every time.
  • I tried another server (httpbin.org) and it works fine
  • Other diverse clients (including dozens of ESP8266s running older firmware) connect to my server with no problem.
  • On the problematic connections, I can see a SYN and then numerous retries from the ESP→Server, but never anything back.
  • Occasionally one gets through and the connection works as expected. Maybe 1 attempt in 30.
  • I can see this traffic if I run tcpdump on my wifi router, and also if I run it on the destination server
  • The server is dockerized. I did not try running tcpdump from within the destination container; only the host.
  • The SYN showed a MSS=536 and a window size of 2144
  • I switched lwip from 2.0 low memory to 2.0 high bandwidth. This changed the MSS to 1460 and the window size to 5840. Everything else about the SYN packet seems identical. Symptoms persist.
  • I switched lwip to 1.4 high bandwidth. This makes every connection work perfectly. I cannot see any difference between this SYN and the previous one, so I'm at a loss for why this works.

When using two ESP8266-01s, one set up as the example HTTP server and the other running basic HTTPClient code, the client can only talk to the server when using lwip 1.4 high bandwidth.
I have tried v2 higher bandwidth and that didn't work either.
These are two devices sitting next to each other on the bench, connecting to the same wifi service as STA.
The version of Arduino is 1.89.
The version of ESP8266 is 2.6.3
The version of xtensa-lx106-elf-gcc\2.5.0-4-b40a506
So it seems to go against the grain of this being purely an MSS configuration issue: the behaviour changes with the lwip version too.

Has anyone found a solution?

My only suggestion is to use the LWIP 1.4 higher bandwidth version.
With other LWIP settings it only works while the server is making outbound connections (e.g. maintaining an open MQTT connection), and even that is sporadic and seemingly linked to the server refreshing its client connection to an external service.

This is a closed issue. Please open a new one.

@pimby it was suggested to test with the higher bandwidth lwIP variant. Did you try it?

@skybadger I suggest you open a new issue with your data. This issue is closed and the problem was identified: it was caused by a server not accepting a low MSS. Your issue is different.

Just add this limitation to the ESP documentation. No need for a fix, but documentation is good.
