gunicorn 19.x causing problems with django-1.7.1 and gevent

Created on 4 Nov 2014  Â·  53Comments  Â·  Source: benoitc/gunicorn

Hello,

I'm seeing this error with gunicorn 19.x (tested with 19.0 and 19.1.1) using the gevent worker. I'm using django-1.7.1.

this is the exception:

Traceback:
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  87.                 response = middleware_method(request)
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py" in process_request
  34.         if user and hasattr(user, 'get_session_auth_hash'):
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/utils/functional.py" in inner
  224.             self._setup()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/utils/functional.py" in _setup
  357.         self._wrapped = self._setupfunc()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py" in <lambda>
  23.         request.user = SimpleLazyObject(lambda: get_user(request))
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py" in get_user
  11.         request._cached_user = auth.get_user(request)
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/auth/__init__.py" in get_user
  151.         user_id = request.session[SESSION_KEY]
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/sessions/backends/base.py" in __getitem__
  49.         return self._session[key]
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/sessions/backends/base.py" in _get_session
  175.                 self._session_cache = self.load()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/sessions/backends/db.py" in load
  21.                 expire_date__gt=timezone.now()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/manager.py" in manager_method
  92.                 return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/query.py" in get
  351.         num = len(clone)
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/query.py" in __len__
  122.         self._fetch_all()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/query.py" in _fetch_all
  966.             self._result_cache = list(self.iterator())
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/query.py" in iterator
  265.         for row in compiler.results_iter():
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py" in results_iter
  700.         for rows in self.execute_sql(MULTI):
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py" in execute_sql
  784.         cursor = self.connection.cursor()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/backends/__init__.py" in cursor
  163.         self.validate_thread_sharing()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/backends/__init__.py" in validate_thread_sharing
  515.                 % (self.alias, self._thread_ident, thread.get_ident()))

Exception Type: DatabaseError at /
Exception Value: DatabaseWrapper objects created in a thread can only be used in that same thread. The object with alias 'default' was created in thread id 140049505195808 and this is thread id 61130224.

Here is my gunicorn.conf.py:

##
# Gunicorn config at 
# Managed by Chef - Local Changes will be Nuked from Orbit (just to be sure)
##

# What ports/sockets to listen on, and what options for them.
bind = "127.0.0.1:9300"

# The maximum number of pending connections
backlog = 2048

# What the timeout for killing busy workers is, in seconds
timeout = 300

# How long to wait for requests on a Keep-Alive connection, in seconds
keepalive = 2

# The maxium number of requests a worker will process before restarting
max_requests = 1024

# Whether the app should be pre-loaded
preload_app = True

# How many worker processes
workers = 16

# Type of worker to use
worker_class = "gevent"

# The granularity of error log outputs.
loglevel = "error"

The error occurs every time, just accessing any URL of the django app (that depends on any model, of course) triggers this error. Any ideas?
Since changing to gunicorn-18.0 solves the problem, I'm assuming that gunicorn-19.x is doing something different.

If you need any additional information, let me know and I will post it here.

For now I'm using 18.0.

Thanks a lot.

( FeaturWorker FeaturCore ThirdpartGevent Deploy Investigation - Bugs -

Most helpful comment

@gukoff proposed fix didn't work for me, I still get the same error 😞

All 53 comments

Just a question first: does taking out the preload option help?
I think I remember another similar issue being reported.
If changing that option helps, then we must have changed something about how Django applications are loaded and it will be easier for us to track it down.

I _think_ we import the django wsgi application in the master process, which at least on django 1.7, initializes all of the django models on import.

Hello @tilgovi, thank you for your time. I cant confirm that using preload_app = False I'm able to use gunicorn-19.1.1 and django-1.7.1.

Do you have any idea what gunicorn cloud be doing differently about this in versions 19.x?

I have now a second problem. I use Raven[1] in my django app. And accorging to the Raven docs, when using a WSGU middleware[1] it is possible to use Raven as a WSGI object (wrapping the original django WSGI app), and when using gunicorn (as an external command, not the run_gunicorn django command) we must add a hook to gunicorn[2]. When this hook is enabled, gunicorn only boots if preload_app = True, but then gevent does not work. When I use preload_app = True with the hook enabled, gunicorn does not even boot, giving me and ImportError saying it cannot import my "DJANGO_SETTINGS_MODULE".

Raven is 4.2.3

For now I will get back to gunicorn 18.x, since it works with both configurations (gevent and raven as a WSGI middleware)
Should I open another Issue about this?

[1] http://raven.readthedocs.org/en/latest/config/django.html#wsgi-middleware
[2] http://raven.readthedocs.org/en/latest/config/django.html#gunicorn

@daltonmatos the way databases are setup in django 1.7.1 has probably changed are apparently not threadsafe. I am not sure what to do at that point.

One more thing, just found out today,
session handling in django 1.7 with gunicorn 19.3.0 expires within seconds.

Had to switch back to 18 to make it work.

FYI.

This seems to be the same issue as https://github.com/benoitc/gunicorn/issues/879, which was closed but not fixed. It makes gunicorn 19.x. unusable with Django. Perhaps only for the gevent worker though? I haven't tested for other worker types.

@tilgovi i wonder if it's not due to the change done in the environ. in 19.x we set wsgi.multithread to true where it was false in 18.x :

https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L99

This change may be handledd differently in django. And a greenlet is not really a thread. Thoughts? At least maybe we could try to revert it and check if it solves the issue. Do we have a test or a reproducible way for this issue?

Oh, I bet that is the issue. Good memory, @benoitc.

I still don't know why yet. As I said at the time to justify the change, I can't imagine that it would make anything break because it would mean that the framework would only take more precautions about sharing data. Evidently, I may be wrong.

I pushed the branch fix/927 to check if the fix above works. Please test!

I'm still able to reproduce this error using fix/927 with django 1.8.1 (and 1.7.1) and gevent 1.0.1

That's reassuring to me, even if we still don't know the cause.

So I tried using the example application and didn't reproduce the issue running the following command line:

gunicorn --env DJANGO_SETTINGS_MODULE=testing.settings -k gevent -w3 wsgi:application

Can anyone provides me a reproducible example?

I can still reproduce this issue invoking gunicorn like so:

    gunicorn --env DJANGO_SETTINGS_MODULE=settings -w 3 --max-requests=1000 -b 0.0.0.0:8000 -k gevent --config ogs/gunicorn.py  --pythonpath ogs wsgi:application

Is there any relevant details you need?

@brockhaywood should be project.settings, but yes i need to have a way to reproduce it, do you have any minimal code to share.

Looking at the trace above, it could also be the monkey patching. Maybe threads shouldn't be monkey patched. Can you try the following patch?

--- a/gunicorn/workers/ggevent.py
+++ b/gunicorn/workers/ggevent.py
@@ -60,9 +60,9 @@ class GeventWorker(AsyncWorker):

         # if the new version is used make sure to patch subprocess
         if gevent.version_info[0] == 0:
-            monkey.patch_all()
+            monkey.patch_all(thread=False)
         else:
-            monkey.patch_all(subprocess=True)
+            monkey.patch_all(subprocess=True, thread=False)

         # monkey patch sendfile to make it none blocking
         patch_sendfile()

Using the above patch with the following wsgi no longer produces the specified error:

import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings")

from gevent import monkey
monkey.patch_all(thread=False)

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()

from django.core.cache.backends.memcached import BaseMemcachedCache
BaseMemcachedCache.close = lambda self, **kwargs: None

I'm now seeing a different and new error but have no idea if it's related:

ProgrammingError: close cannot be used while an asynchronous query is underway

This is occurring, 1in 10/15 requests is doing a count on a model. I suspect this is an error in my application which is now uncovered.

@brockhaywood cool! btw ahat's your version of gevent? For your latest error it dies not seem related to gunicorn indeed.

Yah, good news! On version 1.0.1 of gevent.

@tilgovi so I wonder if we shouldn't patch gunicorn removing the thread monkey patching. Maybe we should also set wsgi.multithread to False? Thoughts?

@benoitc I'm able to reproduce the new error that I referenced above with a very simple Django application that has just one view and does a simple model query and count if I run multiple concurrent requests.

Do you have any suggestions as to which component is the most likely culprit? Is this likely a gevent problem or psycogreen?

More likely psycogreen. Are you using such hook to setup it?

def def_post_fork(server, worker):
    from psycogreen.gevent import psyco_gevent
    psyco_gevent.make_psycopg_green()
    worker.log.info("Made Psycopg Green")

Oh, I'm actually using

def post_fork(server, worker):    
    from psycogreen.gevent import patch_psycopg
    patch_psycopg()

I'll try your suggestion.

Ok, I'll try opening a ticket with psycogreen

can you try to use the hook post_worker_init instead of post_fork?

Still occurs for me when patching psycogreen in post_worker_init.

I've submitted a ticket to psycogreen as well:
https://bitbucket.org/dvarrazzo/psycogreen/issue/2/databaseerror-used-with-asynchronous-query

@brockhaywood OK :/ Also found this link: https://bitbucket.org/dvarrazzo/psycogreen-hg/issue/1/databaseerror-execute-used-with . I am not sure if it could help you though.

I did take a look at that and it seems like the same problem but I didn't see a resolution, unfortunately.

@brockhaywood can you share any simple example of code that could help me to reproduce the issue? I will have a look on it tomorrow.

Here the simple project that I'm using to reproduce:
https://github.com/brockhaywood/gunicorn_gevent_issue

thanks!

On Sun, May 17, 2015 at 10:34 PM brockhaywood [email protected]
wrote:

Here the simple project that I'm using to reproduce:
https://github.com/brockhaywood/gunicorn_gevent_issue

—
Reply to this email directly or view it on GitHub
https://github.com/benoitc/gunicorn/issues/927#issuecomment-102852497.

@benoitc any thoughts on this?

@brockhaywood sorry i didn't have time to test it. Since I'm locked in a flight tomorrow, I will anyway :)

Thanks
On 26 May 2015 1:17 pm, "Benoit Chesneau" [email protected] wrote:

@brockhaywood https://github.com/brockhaywood sorry i didn't have time
to test it. Since I'm locked in a flight tomorrow, I will anyway :)

—
Reply to this email directly or view it on GitHub
https://github.com/benoitc/gunicorn/issues/927#issuecomment-105652947.

Was just experiencing the same DatabaseWrapper issue when I saw this thread.

I can confirm that removing the preload_app option from my gunicorn config file fixes the problem. I'm using:

  • Django==1.7.7
  • gevent==1.0.2
  • gunicorn==19.3.0
  • psycopg2==2.5.2
  • psycogreen==1.0
  • gunicorn wrapped by PgBouncer

(jumping in since we've been seeing the same issues as described in this thread)

It looks like these issues are being caused by the fix for issue #478 (don't monkey patch the master).

By not monkey patching the master, app preload fundamentally creates a mismatch if you're using gevent workers. Any app code which runs as part of the preload runs in an environment that is not monkey patched. Any app code which runs in the worker runs in a monkey patched environment.

For us, just restoring the setup method to the gevent worker class as it's defined in 18.x, so that the master is monkey patched and it is monkey patched prior to app preload, fixed all our issues. This includes the DatabaseWrapper bug, as well as another issue we were seeing where gevent workers would hang and timeout. No other changes were necessary (we do not have to disable monkey patching threads, for example).

I would still be very hesitant to monkey-patch the master process, but if you want to do it you may be able to use the server hooks instead of patching gunicorn itself. If you can do that, I would welcome any way we can document that somewhere that's better than here.

I think the short term plan for us is to create an (internal) gunicorn fork with this patch applied.

I'll investigate moving a monkey patch step into our wsgi code, but this still seems like something which should be done inside the gunicorn code (even as a configurable option). Patching in preloaded app code will affect the master one way or another, and the whole point would be to keep the monkey patched state of the workers in sync with the master. Applying the master monkey patch in app code means that app developers who depend on this behavior will be always playing catchup to whatever changes gunicorn makes to when/how it monkey patches the worker.

If it works to patch early, maybe that should be considered as a change.

I can't remember if that was ever the case and if that caused problems, but it's always tricky with monkey patching.

Our own production gunicorn application requires preloading and so also requires monkey-patching in the master (we use Pyramid's/paste's pserve script to launch the app and load gunicorn from an ini file; we have a wrapper around it to be sure that monkey-patching happens ASAP).

I'm a current gevent maintainer. I'm happy to look at any issues in gevent required to make this scenario work better. We're expecting to drop 1.1 soon-ish and it would be great if this was a supported option.

I tried applying the same monkey patch logic at the top of our wsgi implementation, and we started seeing the DatabaseWrapper issue. Not entirely sure why though - looking at the wsgi invocation in arbiter.py, that happens a few lines after the point where GeventWorker.setup will be called (as a side effect of the worker_class assignment. If I'm reading the code right.

I'll followup if I have a eureka or get lucky with regards to the app code, but so far for us, patching earlier in the gunicorn master code is the only fix that solves all our issues.

@jamadden thanks so much.

@jzupko it's usually important that the patching happens before any imports that might be affected. That's why I suggested to do it in a server hook, like by putting a post_fork function in a gunicorn.conf.py file. However, I just realized that the application preloading happens before any of the server hooks and it's not clear where one would do that.

@benoitc do we have any mechanism for worker-specific config?

I'm adding this to the release 20 milestone and suggesting that we consider monkey patching the master, maybe, if it can be done without re-introducing problems like #478.

@tilgovi you have the setup method of the worker that can be used for that I think.

Does anyone have a temporary fix for this until 20.0 is out? Or is the only solution to choose between disabling preload_app or downgrading to 18.0?

@fletom temporary fix for? Did you try the latest master? Did it works for you?

@benoitc I mean a fix for the "DatabaseWrapper objects" error.

Current gunicorn master doesn't work at all for me actually. On regular 19.6.0 I get this (normal):

[2017-01-05 15:46:32 -0500] [3222] [INFO] Starting gunicorn 19.6.0
[2017-01-05 15:46:32 -0500] [3222] [INFO] Listening at: http://127.0.0.1:8000 (3222)
[2017-01-05 15:46:32 -0500] [3222] [INFO] Using worker: gevent
[2017-01-05 15:46:32 -0500] [3226] [INFO] Booting worker with pid: 3226
[2017-01-05 15:46:32 -0500] [3227] [INFO] Booting worker with pid: 3227
[2017-01-05 15:46:32 -0500] [3228] [INFO] Booting worker with pid: 3228
[2017-01-05 15:46:32 -0500] [3229] [INFO] Booting worker with pid: 3229

And on latest master with the 100% exact same configuration I get this, and nothing listening on any port as far as I can tell:

[2017-01-05 15:47:19 -0500] [3308] [INFO] Starting gunicorn 19.6.0
[2017-01-05 15:47:19 -0500] [3308] [INFO] Listening at:  (3308)
[2017-01-05 15:47:19 -0500] [3308] [INFO] Using worker: gevent
[2017-01-05 15:47:19 -0500] [3312] [INFO] Booting worker with pid: 3312
[2017-01-05 15:47:19 -0500] [3313] [INFO] Booting worker with pid: 3313
[2017-01-05 15:47:19 -0500] [3314] [INFO] Booting worker with pid: 3314
[2017-01-05 15:47:19 -0500] [3315] [INFO] Booting worker with pid: 3315

Seems bugged?

@fletom how do you launch gunicorn?

@benoitc In this case, it's gunicorn -c gunicorn_conf.py <appname>.wsgi .

My gunicorn_conf.py is:

workers = 4

worker_class = 'gevent'

preload_app = False

I am stumbling upon this issue with gunicorn==18.0. In the gunicorn_conf.py I have worker_class = 'gevent', preload_app = True and custom pre_fork, on_starting and on_reload functions.

Putting these lines on top of the gunicorn_conf.py file seems to solve the issue for me:

import gevent.monkey
gevent.monkey.patch_thread()

Update:
I've had these lines in my gunicorn.conf. Getting rid of them solved the issue without monkey-patching.

import django
django.setup()

Because of them django initiated the connection before gunicorn's monkey-patching.

@gukoff proposed fix didn't work for me, I still get the same error 😞

I'm going to close this and open a discussion with the mailing list label on the R20 milestone for monkey-patching the arbiter in gevent. If you are having problems related to anything in this discussion and you think it might be a bug in Gunicorn, please open an issue. You can mention this one by number, or link to comments, if it helps add context, but at this point this is an old ticket with conversations in parallel about several versions of Gunicorn and different symptoms.

Was this page helpful?
0 / 5 - 0 ratings