Hi!
The documentation doesn't state clearly enough how the async workers differ from the sync ones, or what a programmer should do to make use of the difference.
_I assume that asynchronous workers are spawned in separate processes based on the pre-fork model._
Say, we want to see the difference between the sync and gevent worker classes using a simple application as an example. Here are four scenarios:
**Scenario 1.** The application accepts a request, makes 10 external calls using the requests library and returns an array of the 10 responses.
_My assumption_: there is no difference; the worker class doesn't matter.
**Scenario 2.** The application calls gevent.monkey.patch_all() in the pre_fork() hook of the master process. Then the first scenario takes place: the app accepts a request, makes 10 external calls using the requests library and returns an array of the 10 responses.
_My assumption_: the synchronous workers implicitly turn into gevent workers.
**Scenario 3.** The same as Scenario 2, but the monkey-patch is called in a worker.
_My assumptions_: gevent.monkey.patch_all() doesn't affect the way workers listen on the socket. Synchronous workers don't turn into gevent workers and don't accept new calls until the previous ones are handled; only cooperative scheduling of the requests calls occurs. The number of concurrently handled calls is capped by the worker_connections setting. That's the only difference.
**Scenario 4.** The application accepts a request, spawns 5 gevent jobs and joins them; their 5 responses will be the result. After that it spawns another 10 jobs, doesn't join them and returns immediately.
_My assumptions_: the 5 joined jobs run when gevent.joinall(...) is called; the 10 unjoined jobs might be scheduled to execute during later context switches (for example, while another request's gevent.joinall(...) is waiting, after any of the jobs is done, etc.).
I feel many of these assumptions must be wrong. Could you please correct me and expand on this in the documentation accordingly?
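To make Scenario 2 concrete, here is a minimal gunicorn config sketch. The hook signature and setting names are real gunicorn ones, but the file name and values are illustrative, and whether patching in pre_fork actually turns sync workers into gevent-like workers is exactly the open question above.

```python
# gunicorn_conf.py -- a hypothetical config for Scenario 2:
# sync workers, with gevent.monkey.patch_all() invoked in the master
# just before each worker is forked.
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "sync"

def pre_fork(server, worker):
    # Runs in the MASTER process just before a worker is forked;
    # anything patched here is inherited by the child via fork().
    from gevent import monkey
    monkey.patch_all()
```

Run with `gunicorn -c gunicorn_conf.py app:application` to reproduce the scenario.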
This page describes the design and gives some information about the workers:
http://docs.gunicorn.org/en/stable/design.html
I will answer in a generic manner, if that's OK with you. Hopefully it will give you enough hints to answer the scenarios above yourself.
If you run gunicorn behind a proxy that buffers the connection, the key point is not the number of connections gunicorn can accept, but rather the number of slow connections (a worker doing a huge task, connecting to an API, a database, ...) or connections that will be used for a long time (long-polling, ...). In such cases an async worker is advised. When you return almost immediately, a sync worker is enough; in most cases, when the database is local or on the same network with low latency, it can also be used.
If you run gunicorn standalone, then you will need a threaded or an async worker if you expect a lot of concurrent connections. Increasing the number of workers with the sync worker class is also sometimes enough, when you don't expect a large number of connections or can tolerate some latency in the responses.
I will also add that monkey patching adds some side effects to your application, which may or may not be an issue. The other async workers don't suffer such side effects; at least the tornado and threaded workers don't.
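The threaded worker mentioned above is gunicorn's `gthread` class; it uses real OS threads instead of monkey-patching, so it avoids gevent's side effects. A hypothetical config sketch (the setting names are real, the numbers are made up):

```python
# Hypothetical gunicorn config using the threaded worker instead of gevent.
# No monkey patching is involved; concurrency comes from OS threads.
import multiprocessing

workers = multiprocessing.cpu_count()
worker_class = "gthread"   # threaded worker class
threads = 4                # threads per worker process
```

Total concurrency here is `workers * threads`; tune both to your workload.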
@benoitc thanks for your answer!
I've already read the docs. Essentially, my point is that the docs are way too short. There are important implementation details which aren't mentioned yet.
Firstly, it came as a surprise to me that gevent workers implicitly call gevent.monkey.patch_all(). It is quite a blunt strategy, unacceptable in many cases. There should be another type of gevent worker, which simply listens on a gevent socket and doesn't monkey-patch anything. This behaviour isn't explicitly documented. It's also important to know whether the main process gets monkey-patched as well as the worker processes.
Secondly, it's not very clear how the max-requests option works. Say, if given, does it use the graceful_timeout option? If so, how does the graceful_timeout option work? Does it make a worker stop accepting new requests, or is that up to the developer?
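For reference, these are the three related settings in a gunicorn config file. The setting names are real gunicorn options; the values below are purely illustrative:

```python
# Hypothetical recycling settings for the max-requests question above.
max_requests = 1000        # restart a worker after it has served this many requests
max_requests_jitter = 50   # randomize the threshold so workers don't all restart at once
graceful_timeout = 30      # seconds a worker gets to finish in-flight requests on shutdown
```

How exactly graceful_timeout interacts with max_requests is the documentation gap being pointed out here.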
Thirdly, how exactly does gunicorn restart after the HUP signal? The documentation states as follows:
HUP: Reload the configuration, start the new worker processes with a new configuration and gracefully shutdown older workers
So, in case I have a server with 30 workers, a long-running pre_fork function (1 minute) and a graceful timeout of 20 seconds, what are the actions after the HUP? I suppose they are:
1) Reload the application and configuration in the master process;
2) run the pre_fork function in the master process. Wait a minute for it to finish. Don't touch the workers;
3) fork 30 new workers. Let them work together with the older ones. In other words, for a short period of time consume double RAM and let 60 workers run on the same socket;
4) gracefully shutdown the older workers. Give them 20 seconds to handle the pending queries and terminate.
Am I right?
Fourthly, what happens if the master process is sent two HUP signals at the same time? Are they put in some kind of signal queue and handled consecutively? What about other signals?
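The queue-and-drain pattern the question hints at can be sketched as follows. This is a simplified illustration of the design, not gunicorn's actual code: the signal handler only appends to a queue, and the main loop drains it one signal per iteration.

```python
# Simplified sketch of an arbiter-style signal queue (not gunicorn's real code).
import signal

SIG_QUEUE = []

def handler(sig, frame):
    # Handlers should do almost nothing: just record the signal.
    if len(SIG_QUEUE) < 5:          # cap the queue so a signal storm can't grow it
        SIG_QUEUE.append(sig)

# Simulate two HUPs arriving "at the same time": both get queued...
handler(signal.SIGHUP, None)
handler(signal.SIGHUP, None)

# ...and the main loop then handles them one after the other.
handled = []
while SIG_QUEUE:
    handled.append(SIG_QUEUE.pop(0))   # one signal per loop iteration

print(len(handled))  # 2
```

Under this design, two simultaneous HUPs result in two consecutive reload cycles, not one concurrent one.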
Fifthly, does the recommendation of 2*CORES + 1 workers have anything to do with asynchronous workers? I think the gevent workers are expected to utilise the CPU to the limit and never wait on any IO-bound tasks, so ~CORES workers are OK. Otherwise the load isn't high enough and the number of workers can be even lower.
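The two heuristics being contrasted can be computed directly. The 2*CORES + 1 rule comes from the gunicorn docs for sync workers; the ~CORES figure for async workers is the asker's hypothesis, not a documented recommendation:

```python
# The documented heuristic vs. the asker's hypothesis for async workers.
import multiprocessing

cores = multiprocessing.cpu_count()
sync_workers = 2 * cores + 1    # gunicorn's rule of thumb for sync workers
gevent_workers = cores          # hypothesis: IO waits are handled by greenlets,
                                # so ~one worker per core may already saturate CPU
print(sync_workers, gevent_workers)
```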
And so on.
@gukoff
> Firstly, it came as a surprise to me that gevent workers implicitly call gevent.monkey.patch_all(). It is quite a blunt strategy, unacceptable in many cases. There should be another type of gevent worker, which simply listens on a gevent socket and doesn't monkey-patch anything. This behaviour isn't explicitly documented. It's also important to know whether the main process gets monkey-patched as well as the worker processes.
Can you open a ticket about it?
To answer your last questions: max_requests is used when you know you will have to recycle (kill the current worker) at some point. It's useful when you know that your worker is leaking memory, or needs to be reset at some point.
Hooks must be processed fast. If not, you may block either the worker or the arbiter, preventing any other scheduled actions.
The arbiter queues the signals, so 2 HUPs will be handled consecutively.
2*CORES + 1 is a generic rule to load-balance the sockets between the workers across CPUs/cores; it's especially useful for the sync worker.
Actually, I stumbled into this line of questioning recently at work and was rather badly bitten by mixing gevent with gunicorn without considering that it's all still Python.
@benoitc my simple question is this: assuming I'm using SyncWorker as my worker, and somewhere in the code serving a request I call monkey.patch_all, how far up the component tree will this patch_all go? Will it patch the SyncWorker for other requests too, effectively making it a gevent worker?
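The leak happens because Python modules are process-wide singletons: replacing an attribute on one of them changes it for every importer in the same process, including the worker's own accept loop. A gevent-free sketch of the mechanism (the fake_socket stand-in is hypothetical, not what gevent actually installs):

```python
# Demonstration of why monkey-patching cannot be scoped to one function:
# the patched module object is shared by everything in the process.
import socket

original = socket.socket

def fake_socket(*args, **kwargs):      # stand-in for gevent's cooperative socket
    raise RuntimeError("patched!")

socket.socket = fake_socket            # roughly what monkey.patch_all() does

import socket as elsewhere             # "another part" of the app sees the patch,
assert elsewhere.socket is fake_socket # because sys.modules caches one module object

socket.socket = original               # restoring by hand is the only way back
```

So yes: once patch_all runs inside a SyncWorker process, that whole worker (and all later requests it serves) is patched.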
@mdomans In what way were you bitten by mixing gevent and gunicorn?
(I'm curious because we've been using gunicorn+gevent successfully for 2 years now.)
I used the gevent-based grequests library, which calls monkey.patch_all. This in turn resulted in a lot of socket errors for other requests.
Important note: we use SyncWorkers, and I needed gevent to be very precisely scoped to only one function. As it turns out, the patching somehow leaked out.
@RonRothman curious to talk about your architecture :)
closing the issue, superseded by #1746