Gunicorn: worker closes the connection instead of returning a 500 error.

Created on 25 Apr 2017 · 15 comments · Source: benoitc/gunicorn

I ran into an issue where Gunicorn closes the connection without returning an HTTP response code when my code hits an error from the requests library.

Running this app:

def app(environ, start_response):
    import requests
    # The very short timeout forces requests to raise before any response
    # has been started.
    requests.get('http://google.com', timeout=0.001)

When I curl localhost:8000 I get "curl: (52) Empty reply from server" instead of the expected "Internal Server Error" HTML.

Labels: Investigation, Bugs

All 15 comments

I had the same issue:

INFO:sqlalchemy.engine.base.Engine:{'hubot_id_1': 249}
DEBUG:sqlalchemy.engine.base.Engine:Col ('dashboard_hubot_vitals_id', 'dashboard_hubot_vitals_vital_id', 'dashboard_hubot_vitals_hubot_id')
[2017-05-31 03:01:57 +0000] [1220] [DEBUG] Closing connection.
[2017-05-31 03:02:57 +0000] [1220] [DEBUG] Closing connection.
[2017-05-31 03:03:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:04:57 +0000] [1220] [DEBUG] Closing connection.
[2017-05-31 03:05:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:06:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:07:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:08:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:09:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:10:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:11:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:12:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:13:57 +0000] [1239] [DEBUG] Closing connection.
[2017-05-31 03:14:57 +0000] [1220] [DEBUG] Closing connection.
[2017-05-31 03:15:57 +0000] [1220] [DEBUG] Closing connection.
[2017-05-31 03:16:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:17:57 +0000] [1220] [DEBUG] Closing connection.
[2017-05-31 03:18:57 +0000] [1220] [DEBUG] Closing connection.
[2017-05-31 03:19:57 +0000] [1239] [DEBUG] Closing connection.
[2017-05-31 03:20:57 +0000] [1220] [DEBUG] Closing connection.
[2017-05-31 03:21:57 +0000] [1220] [DEBUG] Closing connection.
[2017-05-31 03:22:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:23:57 +0000] [1258] [DEBUG] Closing connection.
[2017-05-31 03:24:57 +0000] [1239] [DEBUG] Closing connection.
[2017-05-31 03:25:54 +0000] [1239] [DEBUG] POST /graphql
[2017-05-31 03:25:54 +0000] [1258] [DEBUG] POST /graphql

The problem here is that Gunicorn sees the exception bubble out of your handler and thinks it's a socket error on the client side.

Gunicorn already has code in place to try to send an error message to a client, and to log a message when that fails. It also has code to ignore cases that seem like client disconnects, and it knows how to log the request URI that caused an exception. I think the following should be a safe change that might fix this issue:

diff --git a/gunicorn/workers/async.py b/gunicorn/workers/async.py
index a3a0f91..ac4998b 100644
--- a/gunicorn/workers/async.py
+++ b/gunicorn/workers/async.py
@@ -75,7 +75,7 @@ class AsyncWorker(base.Worker):
                 self.handle_error(req, client, addr, e)
         except EnvironmentError as e:
             if e.errno not in (errno.EPIPE, errno.ECONNRESET):
-                self.log.exception("Socket error processing request.")
+                self.handle_error(req, client, addr, e)
             else:
                 if e.errno == errno.ECONNRESET:
                     self.log.debug("Ignoring connection reset")
diff --git a/gunicorn/workers/sync.py b/gunicorn/workers/sync.py
index 1d2ce2f..020bcf0 100644
--- a/gunicorn/workers/sync.py
+++ b/gunicorn/workers/sync.py
@@ -146,7 +146,7 @@ class SyncWorker(base.Worker):
                 self.handle_error(req, client, addr, e)
         except EnvironmentError as e:
             if e.errno not in (errno.EPIPE, errno.ECONNRESET):
-                self.log.exception("Socket error processing request.")
+                self.handle_error(req, client, addr, e)
             else:
                 if e.errno == errno.ECONNRESET:
                     self.log.debug("Ignoring connection reset")

@skonakov @bochuxt were either of you able to test this patch?

The bug here appears to be that Gunicorn is seeing an upstream disconnection and thinking it's a downstream (client) disconnect, and therefore not trying to send a response.

Maybe there is a better way to distinguish our errors from application errors? If we only catch our own issues then any application error could indeed trigger that handler. Thoughts?

Hey @tilgovi @benoitc, I am experiencing a somewhat similar issue:
[2017-12-15 19:18:27 +0000] [19] [DEBUG] Ignoring connection reset
[2017-12-15 19:18:27 +0000] [14] [DEBUG] Ignoring connection reset

As seen, gunicorn ignores the connection reset but doesn't send a response. I receive curl: (52) Empty reply from server for both requests. Can you help me understand what is happening here? Thanks.

Requests should probably also handle its own exceptions. Is there any ticket for it?

On our side, I wonder if we could be more granular in the way we catch our own exceptions, by wrapping all sockets and other usage related to our calls. The general idea is to distinguish the app's errors from ours.

Thoughts?
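A rough sketch of that idea (not gunicorn code; ClientSocketError and sock_send are hypothetical names): re-raise failures on our own sockets as a private exception type and catch only that, so anything else is treated as an application error.

class ClientSocketError(Exception):
    """Raised only for errors on the socket the worker itself owns."""


def sock_send(sock, data):
    try:
        sock.sendall(data)
    except OSError as exc:
        # Re-tag the failure so a plain EnvironmentError escaping the
        # application can never be mistaken for one of ours.
        raise ClientSocketError(str(exc)) from exc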

Any updates/workaround for that?

I haven't had any news from the requests project ;)

Anyway, I will have a closer look once the merge of the Python 3 changes is done sometime this month.

Actually, I don't think it should be handled on the requests project's side.

Here is a simple Gunicorn app.

file g_app.py

import requests


def app1(environ, start_response):
    # A plain application error: ValueError is not an EnvironmentError.
    raise ValueError(1)


def app2(environ, start_response):
    # requests raises HTTPError here, which is a RequestException and
    # therefore an IOError subclass.
    url = 'http://google.com/test'
    r = requests.get(url)
    r.raise_for_status()


def app3(environ, start_response):
    # FileNotFoundError is an OSError (EnvironmentError) subclass.
    with open("test") as f:
        f.read()

gunicorn g_app:app1 -> curl localhost:8000  -> 500 error as expected
gunicorn g_app:app2 -> curl localhost:8000  -> Empty response
gunicorn g_app:app3 -> curl localhost:8000  -> Empty response

Any exception based on EnvironmentError gives the same result.
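To make the common thread explicit, a quick check (assuming requests 2.x, run in a plain Python shell):

import requests

# requests' exceptions derive from IOError, and on Python 3 IOError,
# OSError and EnvironmentError are the same class, so the worker's
# "except EnvironmentError" clause swallows them.
print(issubclass(requests.exceptions.RequestException, IOError))  # True
print(issubclass(FileNotFoundError, EnvironmentError))            # True
print(issubclass(ValueError, EnvironmentError))                   # False

That is why app1 returns a 500 while app2 and app3 produce an empty reply: only the latter two raise exceptions that land in the worker's except EnvironmentError clause.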

Well, what I meant is that to handle it we need to know where the error is coming from.

The fact that requests doesn't identify its own errors, and just lets the Python socket exception propagate, obliges us to find a way to track what comes from our sockets vs. the sockets used by the application and requests.

Next version will improve it.

I think this issue can be solved by my patch. Here's a new version that applies to both sync and async worker base classes:

--- a/gunicorn/workers/base_async.py
+++ b/gunicorn/workers/base_async.py
@@ -81,6 +81,7 @@ class AsyncWorker(base.Worker):
                     self.log.debug("Ignoring connection reset")
                 else:
                     self.log.debug("Ignoring EPIPE")
+            self.handle_error(req, client, addr, e)
         except Exception as e:
             self.handle_error(req, client, addr, e)
         finally:
diff --git a/gunicorn/workers/sync.py b/gunicorn/workers/sync.py
index 1d2ce2f..14df4da 100644
--- a/gunicorn/workers/sync.py
+++ b/gunicorn/workers/sync.py
@@ -152,6 +152,7 @@ class SyncWorker(base.Worker):
                     self.log.debug("Ignoring connection reset")
                 else:
                     self.log.debug("Ignoring EPIPE")
+            self.handle_error(req, client, addr, e)
         except Exception as e:
             self.handle_error(req, client, addr, e)
         finally:

It's not necessary to distinguish errors with the client socket from errors caused by the application code. Regardless of the cause, handle_error can make a best-effort attempt to return an error to the client. For a case like this, where the application raises an unhandled EnvironmentError but the client socket is still good, the result should be a 500 instead of an empty response. The patch above produces that result.
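To illustrate what "best effort" means here (a sketch only, not gunicorn's actual handle_error; best_effort_500 is a hypothetical helper): try to write a minimal 500 response, and swallow the failure if the client socket itself turns out to be broken.

import socket


def best_effort_500(client_sock: socket.socket) -> None:
    # Best effort only: the client socket may itself be dead, so any
    # failure while writing the error response is ignored.
    try:
        client_sock.sendall(
            b"HTTP/1.1 500 Internal Server Error\r\n"
            b"Connection: close\r\n"
            b"Content-Length: 21\r\n"
            b"\r\n"
            b"Internal Server Error"
        )
    except OSError:
        pass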

(I edited my previous comment because I had other changes to example code in the diff.)

I seem to be hitting this issue as well; I get it with a StopIteration from next(). This is with Gunicorn 19.9.0 and gevent 1.4.0. I'm guessing this is because Gunicorn itself catches StopIteration.

empty = []
next(item for item in empty)

$ curl http://localhost:4000/test
curl: (52) Empty reply from server
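One way to keep StopIteration from escaping the handler in this case (a small sketch; the surrounding handler code is assumed) is to give next() a default and handle the "not found" case explicitly:

empty = []

# With a default, next() returns None instead of raising StopIteration,
# so the application can decide what response to send.
item = next((item for item in empty), None)
if item is None:
    pass  # e.g. return a proper 404/500 response here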

Same issue on our end.

We ran into a weird issue where our nginx returns hundreds of 502 errors for several seconds once in a while.

We eventually tracked it down to gunicorn workers closing the connection when some exceptions were not caught in our code and propagated out of it.
That led nginx to believe our server was down (max_fails=1), and even to believe our whole infrastructure was temporarily down when retrying the request on other web servers, leading to spikes of 502s when nginx thinks there is no upstream up and running.

We are using gunicorn 19.7.0 in production.
I also tried with the latest gunicorn version (19.9.0) in development, but the problem is the same.
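As a stopgap at the application level, a catch-all WSGI wrapper can turn unhandled exceptions into a real 500 before they ever reach the worker's EnvironmentError handling (a sketch; inner_app stands in for the actual application callable):

import sys


def catch_all(inner_app):
    """Wrap a WSGI callable so unhandled exceptions become a real 500."""
    def wrapped(environ, start_response):
        try:
            return inner_app(environ, start_response)
        except Exception:
            # Passing exc_info keeps this legal even if the inner app
            # already called start_response before failing.
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")],
                           sys.exc_info())
            return [b"Internal Server Error"]
    return wrapped

With that in place, nginx sees a 500 from the upstream instead of a closed connection, so it does not mark the backend as failed.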
