Gunicorn: Gunicorn doesn't shutdown immediately on receiving SIGTERM

Created on 9 May 2016 · 16Comments · Source: benoitc/gunicorn

I am seeing a weird behavior where Gunicorn does not shutdown immediately when sent a SIGTERM. I see this with Python 3.5 on Ubuntu 16.04. The following environments (that I have access to) _do not_ exhibit this issue:

Python 2.7 on Ubuntu 16.04
Python 2.7 on Ubuntu 14.04
Python 3.4 on Ubuntu 14.04

Refer to https://gist.github.com/suriya/1d9a11ea9ad33a5b1df0b2ba7ed0cd0e on how to reproduce this issue.

After debugging a while, I found that the Gunicorn arbiter properly signals the child worker. The child workers handle the signal properly as well (on all versions). However, something prevents the loop in run_for_one function https://github.com/benoitc/gunicorn/blob/9d158bec94c91c62cd5cc28a27268a5b8804d795/gunicorn/workers/sync.py#L58 from exiting right away with Python 3.5.

Could Python 3.5's change in the behavior of select.select() https://docs.python.org/3/library/select.html#select.select explain this issue.

( FeaturWorker help wanted

Source

suriya

Most helpful comment

For those who arrive here and wonder in what version in gunicorn this was fixed, it's 19.5.0

confiq on 19 Jun 2018

👍3

All 16 comments

Do you have open connections when this happens? If so, that is expected. SIGTERM is a graceful shutdown and subject to the --graceful-timeout setting (default 30s).

At present, at least on the master branch, I observe a worker exiting within 1s of sending SIGTERM to the arbiter, tested on Python 3.5.

tilgovi on 9 May 2016

There is no open connection. As shown in the gist, All that the script does is start gunicorn and send a SIGTERM.

suriya on 10 May 2016

Happens to me with the master branch as well. This is with Python 3.5 on Ubuntu 16.04. Maybe, this happens because of Ubuntu.

$ python --version
Python 3.5.1+
$ gunicorn --version
gunicorn (version 19.5.0)
$ time ./run-and-stop-gunicorn.sh 
[2016-05-10 01:32:59 +0000] [8537] [INFO] Starting gunicorn 19.5.0
[2016-05-10 01:32:59 +0000] [8537] [INFO] Listening at: http://127.0.0.1:8000 (8537)
[2016-05-10 01:32:59 +0000] [8537] [INFO] Using worker: sync
[2016-05-10 01:32:59 +0000] [8541] [INFO] Booting worker with pid: 8541
[2016-05-10 01:33:00 +0000] [8537] [INFO] Handling signal: term
[2016-05-10 01:33:30 +0000] [8537] [INFO] Shutting down: Master

real    0m31.102s
user    0m0.320s
sys 0m0.012s

suriya on 10 May 2016

The same behavior happens with Python 3.5, Gunicorn (master, commit 9d158bec94c91c62cd5cc28a27268a5b8804d795), Centos 7.

$ time ./run-and-stop-gunicorn.sh 
CentOS Linux release 7.2.1511 (Core) 
Python 3.5.1
[2016-05-10 04:25:12 +0000] [1448] [INFO] Starting gunicorn 19.5.0
[2016-05-10 04:25:12 +0000] [1448] [INFO] Listening at: http://127.0.0.1:8000 (1448)
[2016-05-10 04:25:12 +0000] [1448] [INFO] Using worker: sync
[2016-05-10 04:25:12 +0000] [1452] [INFO] Booting worker with pid: 1452
[2016-05-10 04:25:13 +0000] [1448] [INFO] Handling signal: term
[2016-05-10 04:25:43 +0000] [1448] [INFO] Shutting down: Master

real    0m31.109s
user    0m0.231s
sys 0m0.053s

Python 3.4 does not have this issue.

$ time ./run-and-stop-gunicorn.sh 
CentOS Linux release 7.2.1511 (Core) 
Python 3.4.4
[2016-05-10 04:24:09 +0000] [1386] [INFO] Starting gunicorn 19.5.0
[2016-05-10 04:24:09 +0000] [1386] [INFO] Listening at: http://127.0.0.1:8000 (1386)
[2016-05-10 04:24:09 +0000] [1386] [INFO] Using worker: sync
[2016-05-10 04:24:09 +0000] [1390] [INFO] Booting worker with pid: 1390
[2016-05-10 04:24:09 +0000] [1386] [INFO] Handling signal: term
[2016-05-10 04:24:09 +0000] [1390] [INFO] Worker exiting (pid: 1390)
[2016-05-10 04:24:09 +0000] [1386] [INFO] Shutting down: Master

real    0m1.103s
user    0m0.240s
sys 0m0.058s

suriya on 10 May 2016

You're right. I do observe this on Python 3. Perhaps I made a mistake earlier when I was testing.

tilgovi on 10 May 2016

I also agree with the diagnosis.

tilgovi on 10 May 2016

I don't believe this affects any of the async workers or the threaded worker.

tilgovi on 10 May 2016

PEP 475 https://www.python.org/dev/peps/pep-0475/#backward-compatibility has some information how this should be addressed.

Applications relying on the fact that system calls are interrupted with InterruptedError will hang. The authors of this PEP don't think that such applications exist, since they would be exposed to other issues such as race conditions (there is an opportunity for deadlock if the signal comes before the system call). Besides, such code would be non-portable.

In any case, those applications must be fixed to handle signals differently, to have a reliable behaviour on all platforms and all Python versions. A possible strategy is to set up a signal handler raising a well-defined exception, or use a wakeup file descriptor.

For applications using event loops, signal.set_wakeup_fd() is the recommanded option to handle signals. Python's low-level signal handler will write signal numbers into the file descriptor and the event loop will be awaken to read them. The event loop can handle those signals without the restriction of signal handlers (for example, the loop can be woken up in any thread, not just the main thread).

suriya on 10 May 2016

Just tested it and was able to reproduce it. At least for me gunicorn exit is delayed. Removing the --timeout arguments makes the problem gone.

After debugging a little, it appears that after receiving the signal, the select is indeed blocking there:
https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/sync.py#L35

I am not sure yet how to solve the issue, I will reread the PEP 475 carefully this morning to see if it can give a solution, thanks for the link!

benoitc on 10 May 2016

@suriya @tilgovi please test #1257 and let me know :)

benoitc on 10 May 2016

The test script run-and-stop-gunicorn.sh behaves identically with Python 3.5 and Python 3.4. The pull request addresses the SIGTERM issue.

@benoitc Are there other parts of the gunicorn that break because of PEP 475?

suriya on 10 May 2016

@suriya not to my knowledge, I tested with other workers and it was fine.

benoitc on 10 May 2016

@suriya so the patch is fixing your issue right?

benoitc on 10 May 2016

@benoitc Yes, absolutely fixing the issue. Thank you.

I was just wondering if there would be other places that might need to be fixed since the effects of PEP 475 seem to be pervasive.

suriya on 10 May 2016

@suriya normally not, but will keep an eye on it :-)

benoitc on 10 May 2016

For those who arrive here and wonder in what version in gunicorn this was fixed, it's 19.5.0

confiq on 19 Jun 2018

👍3

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Memory increase after service run a long time

haolujun · 3Comments

Gunicorn conf file, environment variables syntax

leonardbinet · 4Comments

Default SSL ciphers setting disables the most secure ciphers

lordmauve · 3Comments

disable health check logging in access logging

thomasjungblut · 3Comments

How do I pass an arg to the application that gunicorn runs?

mw44118 · 4Comments