Ubuntu 16.04 LTS
Flask==1.0.2
Flask-SocketIO==3.0.1
I already applied Flask-socketio to our production environment. However, it happens sometimes (20% chance) that websocket fails connecting. From what I can see from web network:

The last "GET" pends forever then gives me a 400 or 502 or retry.
Here is my nginx recipe:
location /api/socket.io {
proxy_http_version 1.1;
proxy_buffering off;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_pass http://socketio_nodes/socket.io;
access_log /home/xkool/volume/log/access.log;
}
The weird part is that the last GET request always gets stuck.
I tried packet-capture and it shows that nginx didn't proxy the last request to the upstream when failing.
Question 1:
why does websocket takes 5 requests (prefix with EIO blablabla...) before connection?
Question 2:
Do you have any idea what to do next to dig more into the problem?
Can I see your socketio_nodes upstream configuration?
Sure. It is as easy as it shows on your documentation page.
upstream socketio_nodes {
ip_hash;
server 10.0.0.131:5010;
}
this is the enginio log if it may help
[2019-02-26 15:36:25,734][INFO] [thread:139716351444296 process:386]{socket.py:94} - 366b9fc32e2a42e595b703f4d02075bd: Sending packet OPEN data {'sid': '366b9fc32e2a42e595b703f4d02075bd', 'upgrades': ['websocket'], 'pingTimeout': 60000, 'pingInterval': 25000}
[2019-02-26 15:36:25,735][INFO] [thread:139716351444296 process:386]{socket.py:94} - 366b9fc32e2a42e595b703f4d02075bd: Sending packet MESSAGE data 0
[2019-02-26 15:36:25,765][INFO] [thread:139716351444296 process:386]{socket.py:52} - 366b9fc32e2a42e595b703f4d02075bd: Received packet MESSAGE data 0/pptx_generation,
[2019-02-26 15:36:25,766][INFO] [thread:139716351444296 process:386]{pptx_event.py:23} - session 366b9fc32e2a42e595b703f4d02075bd connected. ++++++++++++++++++++++++++++
[2019-02-26 15:36:25,766][INFO] [thread:139716351444296 process:386]{socket.py:94} - 366b9fc32e2a42e595b703f4d02075bd: Sending packet MESSAGE data 0/pptx_generation
[2019-02-26 15:36:25,798][INFO] [thread:139716351445576 process:386]{socket.py:104} - 366b9fc32e2a42e595b703f4d02075bd: Received request to upgrade to websocket
[2019-02-26 15:37:25,866][INFO] [thread:139716351444296 process:386]{pptx_event.py:32} - client 366b9fc32e2a42e595b703f4d02075bd disconnected. room disconnected 366b9fc32e2a42e595b703f4d02075bd. ---------------------------
[2019-02-26 15:37:25,867][INFO] [thread:139716351444296 process:386]{socket.py:75} - 366b9fc32e2a42e595b703f4d02075bd: Client is gone, closing socket
[2019-02-26 15:37:25,889][INFO] [thread:139716351445576 process:386]{socket.py:179} - 366b9fc32e2a42e595b703f4d02075bd: Failed websocket upgrade, expected UPGRADE packet, received None instead.
@chenyuan53618 it seems your client is not connecting through websocket explicitly, correct? That answers your first question. The default connection handshake involves a HTTP connection first, and then an upgrade to WebSocket once the HTTP connection is established. That is the few requests that you see at connection time.
The client is gone message indicates that the client is not sending the required ping packets at the correct interval, or else that the server is blocked or too busy to handle those ping packets.
@miguelgrinberg what do you mean by not connecting through websocket explicitly? Our frontend is using angular and the socket-io version is "socket.io-client": "2.1.1". Does this matter?
And yes I checked our network traffic with tcpdump and found out our nginx doesn't proxy the last request(the pending one) to the server in time. That's why our connection fails. What should I do?
As I said above, the default Socket.IO connection goes through two phases. First a HTTP connection is made, and then it is upgraded to WebSocket. Unless you are explicitly requesting that the connection is made directly via WebSocket, you will get this HTTP traffic every time there is a connection. See the transports option for the client if you want to request WebSocket only.
What do you mean by "nginx doesn't proxy the last request(the pending one) to the server in time"? In time for what?
Normally, Nginx would send the request to upstream nodes right after it receives from clients. However, in my case, Nginx kind of #hold# the request for 60 seconds.
What is in the uwsgi log? What does uwsgi configuration look like?
@maxim-s-barabash . When failing, this is exactly what my uwsgi log looks like.
[2019-03-02 16:57:22,130][INFO] [thread:139958441512264 process:527]{socket.py:94} - d5f74799dba140eab8087bde94949323: Sending packet OPEN data {'sid': 'd5f74799dba140eab8087bde94949323', 'upgrades': ['websocket'], 'pingTimeout': 60000, 'pingInterval': 25000}
[2019-03-02 16:57:22,131][INFO] [thread:139958441512264 process:527]{socket.py:94} - d5f74799dba140eab8087bde94949323: Sending packet MESSAGE data 0
[2019-03-02 16:57:22,164][INFO] [thread:139958441512264 process:527]{socket.py:52} - d5f74799dba140eab8087bde94949323: Received packet MESSAGE data 0/pptx_generation,
[2019-03-02 16:57:22,165][INFO] [thread:139958441512264 process:527]{pptx_event.py:23} - session d5f74799dba140eab8087bde94949323 connected. ++++++++++++++++++++++++++++
[2019-03-02 16:57:22,165][INFO] [thread:139958441512264 process:527]{socket.py:94} - d5f74799dba140eab8087bde94949323: Sending packet MESSAGE data 0/pptx_generation
{address space usage: 1833799680 bytes/1748MB} {rss usage: 196460544 bytes/187MB} [pid: 527|app: 0|req: 546/545] 10.0.0.131 () {50 vars in 1090 bytes} [Sat Mar 2 16:57:22 2019] POST /socket.io/?EIO=3&transport=polling&t=MazdqMC&sid=d5f74799dba140eab8087bde94949323 => generated 2 bytes in 1 msecs (HTTP/1.1 200) 3 headers in 136 bytes (3 switches on core 2996)
{address space usage: 1833799680 bytes/1748MB} {rss usage: 196460544 bytes/187MB} [pid: 527|app: 0|req: 547/546] 10.0.0.131 () {44 vars in 994 bytes} [Sat Mar 2 16:57:22 2019] GET /socket.io/?EIO=3&transport=polling&t=MazdqML&sid=d5f74799dba140eab8087bde94949323 => generated 22 bytes in 0 msecs (HTTP/1.1 200) 3 headers in 131 bytes (3 switches on core 2996)[2019-03-02 16:57:22,192][INFO] [thread:139958441509448 process:527]{socket.py:104} - d5f74799dba140eab8087bde94949323: Received request to upgrade to websocket
{address space usage: 1833799680 bytes/1748MB} {rss usage: 196460544 bytes/187MB} [pid: 527|app: 0|req: 549/547] 10.0.0.131 () {44 vars in 994 bytes} [Sat Mar 2 16:57:22 2019] GET /socket.io/?EIO=3&transport=polling&t=MazdqMf&sid=d5f74799dba140eab8087bde94949323 => generated 4 bytes in 57 msecs (HTTP/1.1 200) 3 headers in 131 bytes (3 switches on core 2996)[2019-03-02 16:58:22,257][INFO] [thread:139958441512264 process:527]{pptx_event.py:32} - client d5f74799dba140eab8087bde94949323 disconnected. room disconnected d5f74799dba140eab8087bde94949323. ---------------------------
[2019-03-02 16:58:22,257][INFO] [thread:139958441512264 process:527]{socket.py:75} - d5f74799dba140eab8087bde94949323: Client is gone, closing socket
{address space usage: 1833799680 bytes/1748MB} {rss usage: 196460544 bytes/187MB} [pid: 527|app: 0|req: 550/548] 10.0.0.131 () {44 vars in 994 bytes} [Sat Mar 2 16:57:22 2019] GET /socket.io/?EIO=3&transport=polling&t=MazdqNl&sid=d5f74799dba140eab8087bde94949323 => generated 11 bytes in 60002 msecs (HTTP/1.1 400) 3 headers in 126 bytes (3 switches on core 2996)
[2019-03-02 16:58:22,276][INFO] [thread:139958441509448 process:527]{socket.py:179} - d5f74799dba140eab8087bde94949323: Failed websocket upgrade, expected UPGRADE packet, received None instead.
{address space usage: 1833799680 bytes/1748MB} {rss usage: 196460544 bytes/187MB} [pid: 527|app: 0|req: 550/549] 10.0.0.131 () {50 vars in 1128 bytes} [Sat Mar 2 16:57:22 2019] GET /socket.io/?EIO=3&transport=websocket&sid=d5f74799dba140eab8087bde94949323 => generated 14 bytes in 60084 msecs (HTTP/1.1 101) 4 headers in 183 bytes (2 switches on core 2995)
Are you using gevent in combination with uWSGI?
@maxim-s-barabash yes
Nginx isn't the root cause in this case, I'm using Traefik instead and see exactly the same problem. I also use Gevent and uWSGI combination.
@chenyuan53618, did you try other frameworks like eventlet? I think I will give it a try...
I also see such an error from time to time (not sure if it's related to this problem, probably not):
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 766, in gevent._greenlet.Greenlet.run
File "/usr/local/lib/python3.7/site-packages/engineio/async_drivers/gevent_uwsgi.py", line 57, in select_greenlet_runner
select([fd], [], [])[0]
File "/usr/local/lib/python3.7/site-packages/gevent/select.py", line 145, in select
sel_results = _original_select(rlist, wlist, xlist, 0)
OSError: [Errno 9] Bad file descriptor
2019-03-10T18:14:34Z <Greenlet at 0x7f67d17d8e18: select_greenlet_runner(13, <gevent._event.Event object at 0x7f67b090e108>)> failed with OSError
I already asked. What does uwsgi configuration look like?
I run the uWSGI with such command without any configuration file:
PYTHONPATH=`pwd` uwsgi --http :51001 --gevent 1000 --http-websockets --master --wsgi-file my_app/api/websocket.py --callable app
It's the same as described in the documentation here.
@jpowie01
From what I see, the last request doesn't get proxied to the upstream (upstream sees no request from nginx). Why do you think it not related to Nginx?
@maxim-s-barabash
Mine is:
[uwsgi]
http = :5000
;http-socket = :5000
master = true
;workers = 10
;threads = 8
max-requests = 600
reload-on-rss = 800
mount = /api=./wsgi.py
callable = app
manage-script-name = true
gevent = 3000
http-websockets = true
http-timeout = 100
listen = 1024
worker-reload-mercy = 10
stats = :5001
stats-http = true
memory-report = true
chmod-socket = 666
buffer-size = 65536
logto = /volume/log/uwsgi.log
@jpowie01 did it work using eventlet?
the error occurred in my application many times too:
File "src/gevent/greenlet.py", line 716, in gevent._greenlet.Greenlet.run
File "/usr/local/lib/python2.7/site-packages/engineio/async_drivers/gevent_uwsgi.py", line 57, in select_greenlet_runner
select([fd], [], [])[0]
File "/usr/local/lib/python2.7/site-packages/gevent/select.py", line 145, in select
sel_results = _original_select(rlist, wlist, xlist, 0)
error: (9, 'Bad file descriptor')
2019-03-14T11:32:55Z <Greenlet "Greenlet-0" at 0x7f78def1c578: select_greenlet_runner(109, <gevent._event.Event object at 0x7f78de850ef0>)> failed with error
I can also reproduce this error socket.io + uWSGI (gevent) when client connects to a room.
1 out of 6 times this error gets triggered.
GET 101 127.0.0.1 1490kb 60138s /socket.io/?EIO=3&transport=websocket&sid=6b04ab210e184660844d6a4a5f66a019 Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0 -
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 766, in gevent._greenlet.Greenlet.run
File "/home/foo/.local/share/virtualenvs/bar-2O4ywwgW/lib/python3.7/site-packages/engineio/async_drivers/gevent_uwsgi.py", line 57, in select_greenlet_runner
select([fd], [], [])[0]
File "/home/foo/.local/share/virtualenvs/bar-2O4ywwgW/lib/python3.7/site-packages/gevent/select.py", line 145, in select
sel_results = _original_select(rlist, wlist, xlist, 0)
OSError: [Errno 9] Bad file descriptor
2019-03-20T08:37:01Z <Greenlet at 0x7f75d54b8378: select_greenlet_runner(11, <gevent._event.Event object at 0x7f75d5461948>)> failed with OSError
uWSGI
http = :8000
module = server:app
callable = wsgi_app
py-autoreload = 3
# web sockets
gevent = 1000
http-websockets = true
master = true
http-auto-chunked = True
http-keepalive = True
lazy-apps = True
buffer-size = 65535
stats-http = True
stats = :8081
vacuum = True
close-on-exec = True
max-requests = 3000
post-buffering = 8192
socket-timeout = 120
Deps
Flask==1.0.2
Flask-SocketIO==3.3.2
uWSGI==2.0.18
gevent==1.4.0
greenlet==0.4.15
anybody?
try this uwsgi.ini configuration:
[uwsgi]
socket = :10630
pp = /home/app/
module = app:app
pidfile = /tmp/uwsgi.10630.pid
stats = /tmp/uwsgi_stats.10630.socket
master = true
workers = 2;
enable-threads = true
reload-mercy = 16
vacuum = true
reload-on-rss = 200
procname-prefix-spaced = uwsgi_10630
gevent = 1000
lazy-apps = true
uid = app
gid = app
listen = 1000
my error disappeared after using this configuration.
@iaen7 your configuration is not right. First of all, you have two workers, but one worker is the only supported configuration at this time (unless there is a way to enable session affinity in uwsgi that I don't know about?).
@chenyuan53618 unfortunately this is a configuration that I don't really use myself so I have nothing more to offer besides what's in the documentation, which is the configuration that I tested at the time. This was a contribution that I accepted at some point. I just tested it again today and it appears to work, but as I said, I don't have any long running services running on this stack so there may be problems I'm not seeing.
Going back to the original problem, let me repeat one of my initial comments:
The client is gone message indicates that the client is not sending the required ping packets at the correct interval, or else that the server is blocked or too busy to handle those ping packets.
If you have anything that is blocking or CPU consuming in your application, then that could be your problem.
@miguelgrinberg Thanks for help! That is the reason my app support about 1000 ws clients at most. But my app also servers http requests, can two workers work?
@miguelgrinberg
My server is a local instance which only has a load I generate - minimal, so this argument is false in my case
the server is blocked or too busy to handle those ping packets.
For client I use official SocketIO library which is pretty battle tested and all I do is connecting to a room, therefore I would also question this argument
client is not sending the required ping packets at the correct interval
but I might be wrong as I'm not familiar with SocketIO implementation.
@iaen7 there is an option in uWSGI called attach-deamon this lets you attach additional process to uWSGI instance. If you have your socketIO endpoints in one file and your REST endpoints in another file you can run both servers at the same time.
pip install eventlet
# check the server is working
python path_to_socketIO_endponts.py
* Restarting with stat
* Debugger is active!
* Debugger PIN: 180-756-981
(6354) wsgi starting up on http://127.0.0.1:5000
uwsgi.ini
http = :8000
module = server:app # adjust to your setup
callable = wsgi_app
py-autoreload = 3
workers = 10
attach-daemon = python path_to_socketIO_endpoints.py
This will run REST server on :8000 and socketIO endpoints on :5000 (using eventlet). Every time you restart uWSGI both servers will restart. Now you can run as many workers as you need for REST endpoints.
@zzart Thanks for your help! Have you tried to use socket instead of http withouthttp-websockets in uwsgi.ini configuration? It works for me.
@iaen7 either option will be fine.
Most helpful comment
I can also reproduce this error socket.io + uWSGI (gevent) when client connects to a room.
1 out of 6 times this error gets triggered.
uWSGI
Deps