Supervisor: Running `supervisorctl restart <name>` causes xmlrpclib.Fault

Created on 9 Nov 2011 · 20 Comments · Source: Supervisor/supervisor

Sometimes, but not every time, running sudo supervisorctl restart someservice returns:

<class 'xmlrpclib.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib/python2.6/xmlrpclib.py line: 838

In some odd cases this kills supervisord completely, printing "unix:///var/run/supervisor.sock no such file" whenever I run a command in supervisorctl manually. Running /etc/init.d/supervisor restart doesn't fix it; I need /etc/init.d/supervisor stop && /etc/init.d/supervisor start to get it running again.

I'm using supervisor 3.0a8 on Ubuntu 10.04.3 LTS (Kernel 2.6.32-21-server) 64bit.

Any idea how I can make the restart command not kill my supervisor and actually work?



The SHUTDOWN_STATE fault will be returned to supervisorctl when any command is issued but supervisord is no longer accepting commands because it is in the process of either shutting itself down or restarting itself. It is normal to encounter this after the supervisorctl shutdown or supervisorctl reload commands are issued.

In some odd cases this kills supervisord completely, printing "unix:///var/run/supervisor.sock no such file" whenever
I run a command in supervisorctl manually

It is normal to see this in supervisorctl if you have configured it to communicate with supervisord via a domain socket and the file has gone away. This usually means that supervisord has exited.

Since you first see the SHUTDOWN_STATE fault and then supervisorctl can't connect at all, it sounds like supervisord was in the process of shutting itself down and then finally was shut down.

Any idea how i can make the restart command not kill my supervisor ... ?

I'm not aware of any code path where restarting a subprocess with supervisorctl restart <name> would trigger supervisord to start shutting itself down.

Could it be that the supervisorctl shutdown or supervisorctl reload commands were issued?

Yes indeed, just before calling supervisorctl restart I issue supervisorctl reload. This is all part of our automated deployment. I've added a 3 second sleep between the two, and that works most of the time, but not all the time.

Has there been any development on this issue? I currently have the same problem.

When I run the command manually it works every time. When it's my deployment script it faults every time.

Just in case anybody stumbles onto this: I got the same error, but it was because the filesystem was mounted read-only after a filesystem error.

solution, kill process!

ps -aux|grep supervisor
kill -9 the_pid
supervisord
supervisorctl

Can we get a less-useless error message reported by supervisorctl?

Perhaps something like:

Supervisor is restarting, please wait a few minutes or find and kill the supervisor process manually.

solution, kill process!

ps -aux|grep supervisor
kill -9 the_pid

kill -9 will orphan any child processes running under supervisord and is not recommended. If you have issued the supervisorctl shutdown command but supervisord has not exited, it is almost always because it is waiting for its child processes to exit. The log will have messages during shutdown.

Could you put that explanation in the error log so it's clear why supervisorctl won't work as expected?

It's already there. If supervisord is blocking shutdown waiting for child processes to die, it prints "waiting for processname to die" in the log at regular intervals.
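A minimal sketch of how those log messages could be pulled out of the supervisord log with a script. The sample lines, timestamps, and process names below are made up for illustration, and treating the text between "waiting for" and "to die" as a comma-separated list of names is an assumption, not something confirmed in this thread.

```python
import re

# Matches the shutdown message quoted above. The text between
# "waiting for" and "to die" is assumed to be a comma-separated name list.
WAITING_RE = re.compile(r"waiting for (.+?) to die")


def processes_blocking_shutdown(log_lines):
    """Return process names seen in 'waiting for ... to die' lines, in first-seen order."""
    seen = []
    for line in log_lines:
        match = WAITING_RE.search(line)
        if not match:
            continue
        for name in match.group(1).split(","):
            name = name.strip()
            if name and name not in seen:
                seen.append(name)
    return seen


# Illustrative log lines only; real output depends on your configuration.
sample_log = [
    "2011-11-09 10:00:01,000 INFO waiting for gunicorn to die",
    "2011-11-09 10:00:04,000 INFO waiting for gunicorn, worker to die",
    "2011-11-09 10:00:05,000 INFO stopped: worker (terminated by SIGTERM)",
]
print(processes_blocking_shutdown(sample_log))
```

In practice the lines would come from the supervisord log file, whose location depends on the logfile setting in your config.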

It's there, in the logs, but not in the console output from supervisorctl.

Here's what it looks like:

# supervisorctl status
<class 'xmlrpclib.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib/python2.6/xmlrpclib.py line: 838

I'm saying that it should print something informative instead of just a cryptic error code and code reference. You could check the log for more information, but the tool itself should say things that make sense to the user.

That sounds like a pretty reasonable addition to supervisorctl to me. I'll leave this issue open, feel free to open a pull request.
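For anyone scripting against supervisord in the meantime, here is a rough sketch of how a client could translate the fault into something readable. It is written for Python 3, where xmlrpclib is named xmlrpc.client. explain_fault is a hypothetical helper, not part of supervisorctl, and the fault code 6 is taken from the error output quoted above.

```python
from xmlrpc.client import Fault  # xmlrpclib.Fault on the Python 2 shown in the traceback

SHUTDOWN_STATE = 6  # fault code taken from the error output in this thread


def explain_fault(fault):
    """Turn an XML-RPC fault into a message a user can act on (hypothetical helper)."""
    if fault.faultCode == SHUTDOWN_STATE:
        return (
            "supervisord is shutting down or restarting and is not "
            "accepting commands. Wait for it to finish, or check its "
            "log for 'waiting for ... to die' messages."
        )
    # Fall back to the raw code and string for faults this sketch does not know.
    return "supervisord returned fault %d: %s" % (fault.faultCode, fault.faultString)


print(explain_fault(Fault(6, "SHUTDOWN_STATE")))
```

A caller would wrap its XML-RPC calls in try/except Fault and print the result of explain_fault instead of the raw traceback.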

I hit this issue just now. It occurs after running supervisorctl reload, and I have to kill supervisord to solve the problem.

I hit this issue just now. It occurs after running supervisorctl reload,
and I have to kill supervisord to solve the problem.

Instead of killing supervisord, look at its log for information about why it is blocking shutdown and address that root problem.

If you see messages in the log like waiting for processname to die, you may need to change the signal sent to stop the process (stopsignal), or the amount of time supervisord waits before resorting to sending SIGKILL to the process (stopwaitsecs).

Example: if stopsignal=TERM but the process doesn't exit after it receives SIGTERM, and stopwaitsecs=90, then supervisord is going to block shutdown for a full 90 seconds while it waits for the process to exit.
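To see which programs could hold up shutdown and for how long, something like the following could list each program's stop settings. This is only a sketch: the config fragment, program names, and command paths are hypothetical, and the fallbacks (TERM and 10 seconds) are Supervisor's documented defaults for stopsignal and stopwaitsecs.

```python
import configparser

DEFAULT_STOPSIGNAL = "TERM"  # documented default when stopsignal is omitted
DEFAULT_STOPWAITSECS = 10    # documented default when stopwaitsecs is omitted

# Hypothetical config fragment; names, paths, and values are examples only.
SAMPLE_CONF = """
[program:web]
command=/usr/bin/gunicorn app:app
stopsignal=TERM
stopwaitsecs=90

[program:worker]
command=/usr/bin/worker
"""


def stop_settings(conf_text):
    """Map each [program:x] section to its (stopsignal, stopwaitsecs) pair."""
    # RawConfigParser avoids interpolating '%' sequences found in real configs.
    parser = configparser.RawConfigParser()
    parser.read_string(conf_text)
    settings = {}
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        name = section.split(":", 1)[1]
        signal_name = parser.get(section, "stopsignal", fallback=DEFAULT_STOPSIGNAL)
        wait = parser.getint(section, "stopwaitsecs", fallback=DEFAULT_STOPWAITSECS)
        settings[name] = (signal_name, wait)
    return settings


settings = stop_settings(SAMPLE_CONF)
print(settings)
# Longest time a single program can block shutdown before SIGKILL is sent.
print(max(wait for _, wait in settings.values()))
```

The largest stopwaitsecs value gives a rough upper bound on how long one stubborn process can delay a shutdown or reload.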

Killing supervisord and restarting it works for me.

Hi guys, I noticed a similar issue and filed it here: https://github.com/Supervisor/supervisor/issues/1041

I guess that issue was closed as a duplicate, so let me repeat what I think are the relevant parts here:

I'm pretty sure this is what is happening, though I don't follow enough of the XMLRPC code to verify it. Can anyone else?

  1. Supervisord process shuts down
  2. Meanwhile, a new supervisord process is started, with a new XMLRPC server
  3. Shutdown signal from (1) is sent to the new XMLRPC server, shutting down the new supervisord process

This seems like a race condition that supervisor should guard against.

supervisorctl shutdown commands supervisord to shut down and it then returns immediately. It doesn't currently have a way to wait until supervisord has actually exited. This issue is still open because it probably should.

As described above, the SHUTDOWN_STATE error is returned if supervisord receives any more commands after it has already been commanded to shut down. It is refusing to process any more commands because it is in the middle of shutting down.

A workaround is to insert a delay after running supervisorctl shutdown (or supervisorctl reload) before you try another command. It may take supervisord several seconds or even minutes to shut down, depending on the largest value of stopwaitsecs in the config file. You must wait longer than that to ensure it has fully shut down or reloaded.
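Instead of a fixed sleep, a deployment script could poll until supervisord answers again. The sketch below assumes a callable get_state that wraps supervisord's supervisor.getState XML-RPC method and returns a dict with a statename key; wait_until_ready and the fake used in the demo are hypothetical names for illustration. It treats fault 6 (SHUTDOWN_STATE) and socket errors as "not ready yet".

```python
import time
from xmlrpc.client import Fault  # xmlrpclib.Fault on Python 2

SHUTDOWN_STATE = 6  # fault code from the error output in this thread


def wait_until_ready(get_state, timeout, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it reports RUNNING; return False on timeout."""
    deadline = clock() + timeout
    while True:
        try:
            if get_state().get("statename") == "RUNNING":
                return True
        except Fault as fault:
            if fault.faultCode != SHUTDOWN_STATE:
                raise  # an unrelated fault should not be hidden
        except OSError:
            pass  # socket missing or refusing connections during a restart
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo with a fake that fails twice before reporting RUNNING.
_responses = iter([
    Fault(6, "SHUTDOWN_STATE"),
    ConnectionRefusedError(),
    {"statecode": 1, "statename": "RUNNING"},
])


def fake_get_state():
    item = next(_responses)
    if isinstance(item, Exception):
        raise item
    return item


print(wait_until_ready(fake_get_state, timeout=5, sleep=lambda seconds: None))
```

The timeout should be longer than the largest stopwaitsecs in the config, for the reason given above.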

FYI: I get this without a shutdown or reload :(

After exploring a bit, I can see that it only occurs immediately after sending a HUP to one of the managed programs (gunicorn). Very strange.

systemctl restart supervisord.service
supervisorctl status