Client: Chrome 70.0.3538.110
Ubuntu 16.04 DigitalOcean
Python 3.6 / pip3
Gunicorn 19.9.0 (and tested with 19.7.1) with Flask backend
Command I'm using to run gunicorn: gunicorn -b 0.0.0.0:5000 --log-level debug --access-logfile - manage:app
(I tried multiple workers to no avail)
traceback.txt - this is after starting the server and making a request to https://SERVER_IP:5000
I have my backend running on port 5000 for my server, but right now it's only set up for handling HTTP requests. When an HTTPS request comes in on port 5000, for example, by accessing https://SERVER_IP:5000, gunicorn hangs and about 20 seconds later logs:
[DEBUG] Ignored premature client disconnection. No more data after: b'\x16\x03...
(see above traceback)
I understand why the "no more data after..." is happening (HTTPS traffic when expecting HTTP), but if I were to just run a python development server, it will throw the exception (code 400, message Bad request syntax ('\x16\x03...
) and immediately respond to other requests. With gunicorn, it stalls for about 20 seconds, blocking any other requests made to the backend before. Nothing else is logged in gunicorn until the request times out, at which point the debug log shows the Ignored premature client disconnection
error and then immediately responds to the other queued requests.
who is hanging the connection ? Looking at the code gunicorn is displaying the error and close the client socket before returning in accept loop:
Your client is probably still expecting some data to come before timeouting. Maybe you can setup its connection to return faster?
The odd part is that it seems like gunicorn is hanging rather than the actual Flask backend implementation. I think this because when I run a dev server, for example python3 manage.py runserver
, when the endpoint gets hit with an HTTPS request, flask errors out but then immediately responds to other requests. So there's no hanging that occurs. This is not the case with gunicorn. With what you're saying, I believe gunicorn is waiting for a response from flask but since it errors out, it the request times out鈥攈owever, I don't see why this functionality would differ from the python dev server. i.e. shouldn't gunicorn not hang?
Although it's for node, this might be relevant.
I can confirm that a single sync
worker gets blocked as described by the OP, by the https
request when expecting http
, but I find that additional workers are not blocked, so provided you have multiple workers, normal requests can be concurrently handled.
Still a problem though from a DOS perspective if nothing else.
Here's a frame when the sync
worker aborts:
File "/usr/local/bin/gunicorn", line 11, in <module>
sys.exit(run())
File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 61, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 223, in run
super(Application, self).run()
File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 203, in run
self.manage_workers()
File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 545, in manage_workers
self.spawn_workers()
File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 616, in spawn_workers
self.spawn_worker()
File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.run()
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 124, in run
self.run_for_one(timeout)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 68, in run_for_one
self.accept(listener)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 30, in accept
self.handle(listener, client, addr)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
req = six.next(parser)
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 181, in __init__
super(Request, self).__init__(cfg, unreader)
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
unused = self.parse(self.unreader)
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 196, in parse
line, rbuf = self.read_line(unreader, buf, self.limit_request_line)
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 248, in read_line
self.get_data(unreader, buf)
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/message.py", line 184, in get_data
data = unreader.read()
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 38, in read
d = self.chunk()
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 65, in chunk
return self.sock.recv(self.mxchunk)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 195, in handle_abort
self.cfg.worker_abort(self)
File "config.py", line 7, in worker_abort
traceback.print_stack(frame)
I used _gunicorn_ latest running in Docker and wget
to send the unexpected https://
request.
wget
(or browser) sends an SSL ClientHello
, but the sync
worker seems to not be able to detect that bad message (it's not expecting SSL), and seems blocked on read per above. All you see after the hello is a bunch of TCP keep alives sent by Gunicorn, which are ACKed until eventually the worker is timed-out.
As an additional data-point - if you set a client timeout e.g. wget -v -T1 https://localhost:5000
, then of course the client gives-up waiting for the ServerHello
and closes the connection. In that case the server will log:
[2018-12-03 02:47:03 +0000] [13] [DEBUG] Ignored premature client disconnection. No more data after: b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03\xf00v\xd6\xe9\xf8\x0b\xdfyS\x9a\xbc:%\x0c\x87\x99p\xdat\xe6S\xa4 \\\x97\xd7\xc3\x12A\x12C \x9ae\x03\xa8\xe0B\xd0\xea\x16\x9e\xde\xcf\x9dFx\xe1\xaa"\xf3\xca\xef\xc4u\x81\xacy\xd6\x0b\xcb13\xf7\x00"\xba\xba\x13\x01\x13\x02\x13\x03\xc0+\xc0/\xc0,\xc00\xcc\xa9\xcc\xa8\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00\n\x01\x00\x01\x91**\x00\x00\xff\x01\x00\x01\x00\x00\x00\x00\x0e\x00\x0c\x00\x00\tlocalhost\x00\x17\x00\x00\x00#\x00\x00\x00\r\x00\x14\x00\x12\x04\x03\x08\x04\x04\x01\x05\x03\x08\x05\x05\x01\x08\x06\x06\x01\x02\x01\x00\x05\x00\x05\x01\x00\x00\x00\x00\x00\x12\x00\x00\x00\x10\x00\x0e\x00\x0c\x02h2\x08http/1.1uP\x00\x00\x00\x0b\x00\x02\x01\x00\x003\x00+\x00)ZZ\x00\x01\x00\x00\x1d\x00 J\xeb\x91\xa3\xf8\xa0z\xfb`\xff\xed_v\xf4\xf0\xc7\xb6\xe9\xb6\xd8\xc2\xf5\x11\xc1\xf2\xe7\xb8uC/\xc2R\x00-\x00\x02\x01\x01\x00+\x00\x0b\n\xca\xca\x03\x04\x03\x03\x03\x02\x03\x01\x00\n\x00\n\x00\x08ZZ\x00\x1d\x00\x17\x00\x18\x00\x1b\x00\x03\x02\x00\x02ZZ\x00\x01\x00\x00\x15\x00\xcb\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
... Noting that b'\x16\x03\x01\x02\x00\x01\x00\x01
is recognisable as the start of an SSL ClientHello
.
I haven't searched the above, but I suppose the unreader is looking for line-data and hasn't found any terminal yet, so is still trying to read more bytes into its buffer.
I imagine possible fixes for this are:
ClientHello
or other bad message detection. The above is clearly not text, but Gunicorn is yet to decide on that, and a timeout occurs first, in the meantime the worker is blocked.Makes sense. I wonder if this could be causing some other errors that have been reported with Gunicorn hanging.
We could investigate using a well-known HTTP parser again. @benoitc what happened to your branch that used http-parser?
I wasn't aware of this issue until now (in the context of https://github.com/benoitc/gunicorn/issues/1466 ). I second what @javabrett wrote above. Any new ideas how to handle that (except for spawning more workers which doesn't help in case of (D)DoS).
Most helpful comment
I can confirm that a single
sync
worker gets blocked as described by the OP, by thehttps
request when expectinghttp
, but I find that additional workers are not blocked, so provided you have multiple workers, normal requests can be concurrently handled.Still a problem though from a DOS perspective if nothing else.
Here's a frame when the
sync
worker aborts:I used _gunicorn_ latest running in Docker and
wget
to send the unexpectedhttps://
request.wget
(or browser) sends an SSLClientHello
, but thesync
worker seems to not be able to detect that bad message (it's not expecting SSL), and seems blocked on read per above. All you see after the hello is a bunch of TCP keep alives sent by Gunicorn, which are ACKed until eventually the worker is timed-out.As an additional data-point - if you set a client timeout e.g.
wget -v -T1 https://localhost:5000
, then of course the client gives-up waiting for theServerHello
and closes the connection. In that case the server will log:... Noting that
b'\x16\x03\x01\x02\x00\x01\x00\x01
is recognisable as the start of an SSLClientHello
.I haven't searched the above, but I suppose the unreader is looking for line-data and hasn't found any terminal yet, so is still trying to read more bytes into its buffer.
I imagine possible fixes for this are:
ClientHello
or other bad message detection. The above is clearly not text, but Gunicorn is yet to decide on that, and a timeout occurs first, in the meantime the worker is blocked.