I am having an issue very similar to #291. The application processes (gunicorn-based) are managed by supervisor, and after a fresh deploy the supervised processes are told to restart.
In most cases this works as expected, but for one specific application it always fails and prevents the application from restarting properly. The old gunicorn processes hang, blocking the ports for the new ones. After a few minutes they finally die, but this is still very inconvenient because the app is unavailable for too long.
The supervisor config for the app is the following:
[program:iw2_admin]
command=/srv/fragaria/iw2/bin/gunicorn --name=gunicorn_iw2_admin --bind=10.0.0.50:13000 --workers=2 --max-requests=5000 --timeout=500 --user=www-data --group=www-data --worker-class sync --worker-connections 1000 iw2.wsgi:application
environment=DJANGO_SETTINGS_MODULE='iw2.admin.settings',LANG='cs_CZ.utf8',LC_ALL='cs_CZ.UTF-8',LC_LANG='cs_CZ.UTF-8'
redirect_stderr=True
stdout_logfile=/var/log/supervisor/iw2_admin.log
Gunicorn's log doesn't show anything interesting except this:
2013-04-26 11:35:16 [30352] [INFO] Starting gunicorn 0.15.0
2013-04-26 11:35:16 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:16 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:17 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:17 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:18 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:18 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:19 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:19 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:20 [30352] [INFO] Listening at: http://10.0.0.50:13000 (30352)
2013-04-26 11:35:20 [30352] [INFO] Using worker: sync
2013-04-26 11:35:20 [30355] [INFO] Booting worker with pid: 30355
2013-04-26 11:35:20 [30356] [INFO] Booting worker with pid: 30356
The ERRORs are repeated for quite a while, as mentioned above.
Env:
Python 2.6.6
Debian squeeze
Gunicorn 0.15.0
I might try to fix it by upgrading to a newer version of gunicorn. Do you think that might be the solution? I don't want to risk upgrading if it won't help anyway.
Any progress on this?
Same issue. Bump!
@a2 what is your command line and version?
@xaralis using the 0.17.x version is always better, yes. That is actually the supported version. Anyway, what happens if you add the setting stopsignal = QUIT in your program section?
@benoitc
supervisor config:
[program:gunicorn]
command=/srv/example.com/www/start.sh
process_name=%(program_name)s
directory=/srv/example.com/www
user=web
autostart=true
autorestart=true
redirect_stderr=true
stopsignal=KILL
[program:watchmedo]
command=/usr/local/bin/watchmedo shell-command --patterns "*.py;*.txt;*.scss" --recursive --command='/usr/local/bin/supervisorctl restart gunicorn' /srv/example.com/www
process_name=%(program_name)s
directory=/srv/example.com/www
autostart=true
autorestart=true
redirect_stderr=true
start.sh:
#!/bin/bash
/usr/local/bin/compass compile --boring --trace
source venv/bin/activate
pip install -r requirements.txt
if [ -e "env.sh" ]
then
source env.sh
fi
gunicorn app:app -c gunicorn.conf.py
I've gathered that the problem is that some of the worker processes are still using port 7200 but aren't killed by supervisor when the process restarts? I really have no idea. I'm sort of a noob but I'm trying to learn quickly.
Thanks so much for your speedy response, Benoit.
@a2 can you replace the line stopsignal=KILL with stopsignal=QUIT in your config and let me know about the results?
@benoitc Changing that line makes no difference. If I touch a file monitored by watchmedo, I get the same errors because the gunicorn process was restarted. I have to stop supervisord, pkill gunicorn, and then start supervisord to stop the errors.
It sounds like you have long-lived connections, and the high timeout combined with graceful restart is causing workers to exit slowly.
Try TERM or INT instead of QUIT.
@tilgovi Same result.
What do you mean by restarting supervisor? Sending a HUP? In that case, I remember there is a setting to send a HUP signal on reload. Or maybe that is just in gaffer.
Also, if you can, I would use systemd, which can pass a socket to gunicorn in the latest version.
Signals have been switched in 81241907ffcf94517ffa14b8427205906b61b540. Closing this issue; thanks for the feedback!
@a2 it's because your bash script (via the bash shell process) gets supervised, not gunicorn. Try using exec gunicorn ... in your script.
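To illustrate what exec changes, here is a minimal sketch (not from the thread) that compares the PID of a wrapper shell with the PID of the command it finally runs. It uses sh -c as a stand-in for gunicorn, so the commands are only placeholders.

```shell
#!/bin/bash
# Each run prints "<wrapper PID> <PID of the final command>".

# Without exec: the wrapper forks a new process for the command, so the PIDs differ.
# (The trailing "true" keeps bash from turning the last command into an implicit exec.)
without_exec=$(bash -c 'printf "%s " "$$"; sh -c "echo \$\$"; true')

# With exec: the command replaces the wrapper process and keeps its PID.
with_exec=$(bash -c 'printf "%s " "$$"; exec sh -c "echo \$\$"')

echo "without exec: $without_exec"
echo "with exec:    $with_exec"
```

Applied to the start.sh posted above, this would mean changing the last line to exec gunicorn app:app -c gunicorn.conf.py, so that the process supervisor started and signals is gunicorn itself and not the bash wrapper.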
I know this is an old thread, but for anyone else who has landed here and is using make + gunicorn + supervisor, the above comment is the solution. Supervisor needs to run the gunicorn command directly to be able to kill the process; giving it a make command that runs gunicorn will not kill the process. Something to do with make having its own shell, maybe.
@fillest Thanks for the exec trick.
@tilgovi the long-running connections were my issue after running svc -d on my process.
@fillest is also correct that an exec is required.
I haven't looked at this in a while. If the documentation or examples for supervisord need any changes, please make a PR. Thanks!
using exec is the key
excellent suggestion!
@abhijeetsangwan if something needs to change in examples/supervisor.conf please make a pull request.