Supervisor: exit code always zero

Created on 26 Jun 2011 · 33 Comments · Source: Supervisor/supervisor

I want to use the exit code for Nagios, but it's always 0.

example 1:
zope@plone:~> ./bin/supervisorctl status
http://127.0.0.1:9001 refused connection
zope@plone:~> echo $?
0

Could you make the exit codes LSB-conformant?
http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

example 2:
./bin/supervisorctl status
balancer RUNNING pid 23753, uptime 0:00:50
instance0 RUNNING pid 23751, uptime 0:00:50
instance1 EXITED Jun 26 05:40 PM
varnish RUNNING pid 23754, uptime 0:00:50
zeo RUNNING pid 23750, uptime 0:00:50
zope@plone:~> echo $?
0
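
For reference, the LSB page linked above defines a small set of exit statuses for the status action of an init script. The sketch below lists them and shows one possible way a supervisor process state could be mapped onto them. The mapping and the names `LSBStatus` and `lsb_status_for` are assumptions made for illustration; nothing in this issue says supervisorctl uses them.

```python
from enum import IntEnum


class LSBStatus(IntEnum):
    """Exit statuses the LSB defines for an init script's status action."""
    RUNNING = 0        # program is running or service is OK
    DEAD_PIDFILE = 1   # program is dead and /var/run pid file exists
    DEAD_LOCKFILE = 2  # program is dead and /var/lock lock file exists
    NOT_RUNNING = 3    # program is not running
    UNKNOWN = 4        # program or service status is unknown


# Assumption: these supervisor process states count as "not running".
_NOT_RUNNING_STATES = {"STOPPED", "EXITED", "FATAL", "BACKOFF"}


def lsb_status_for(state):
    """Map a supervisor process state name to an LSB status (hypothetical)."""
    if state == "RUNNING":
        return LSBStatus.RUNNING
    if state in _NOT_RUNNING_STATES:
        return LSBStatus.NOT_RUNNING
    # Transitional or unrecognized states are reported as unknown.
    return LSBStatus.UNKNOWN
```

With this mapping, the instance1 process from example 2 (EXITED) would yield status 3 rather than 0.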

supervisorctl

All 33 comments

See also issue #116.

FWIW supervisorctl pid used to exit with return code 2 when the daemon was not running in supervisor 3.0a8 (as packaged for Ubuntu 12.10), but this regressed back to return code 0 with supervisor 3.0b1 (latest on PyPI).

I looked into that one specifically. The pid command in supervisorctl never intentionally set an exit code in any version. The other commands do not set exit codes either. The only condition that causes exit code 2 is when an unhandled exception occurs inside supervisorctl.

A bug in the pid command caused an unhandled exception, and that's why you saw pid return exit code 2 when the daemon was not running. The bug in pid has since been fixed, and that's why it no longer behaves this way (instead it now behaves like the other commands).

I agree we should set meaningful exit codes for the commands in a future version.

Any word on accepting the pull request @GotenXiao submitted? Always exiting with a 0 means automating service bounces with tools like fabric will have no way to tell that the bounce attempt failed.

+1

I think you can work around this issue using a plugin. Something like this: https://gist.github.com/dnephin/f61646745bd004afba8b

I took a stab at this starting where the previous patch left off. I turned every error message into an exit. It's a bit crude, but thought I might share. I didn't attempt to rewrite the unittests because I figure someone might take a different approach anyways.

https://github.com/lukeweber/supervisor/commit/04d47d019198287af4c4f09bc201b52dd1a149cf

Why is this marked as "enhancement"? It's a bug. Supervisor should return a nonzero exit code when the operation requested by the user fails.

For example when the daemon is not running (so when the socket is not even there, if you are using a socket), supervisorctl still returns 0.

@lukeweber I am going to try your patch, thanks

+1 - this is a bug

This ticket is years old and concerns a serious shortcoming. Could we bump its priority or provide some kind of sensible workaround?

@exi I have been using @lukeweber's patch, but obviously already switching to runit for the long term/production.

Labeling this as an enhancement makes absolutely no sense.

+1 - definitely a bug

+1

This issue makes us cry at least once a week. As supervisord is not returning anything but 0, ansible has to parse the "human" strings to check the state.

+1 — this is a serious issue that makes it very difficult to integrate supervisor with our deployment infrastructure

+1 - That's a serious bug. This shouldn't be seen as a low priority enhancement

It makes integrating supervisord with other automation tools difficult (rather than checking the return code, the only generic way to know whether a command failed is to parse what supervisorctl writes to stdout).
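
The output parsing described here might look roughly like the following. This is a minimal sketch, assuming status lines have the form "name STATE ..." as in the examples at the top of this issue. The function name `nagios_code_from_status` is hypothetical, and the return values follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Nagios plugin return codes (standard convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_code_from_status(output):
    """Derive a Nagios return code from `supervisorctl status` text."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return UNKNOWN
    code = OK
    for line in lines:
        fields = line.split()
        # Assumption: a process line is "<name> <STATE> ...", STATE uppercase.
        if len(fields) < 2 or not fields[1].isupper():
            # Anything else, e.g. "... refused connection", is treated as unknown.
            return UNKNOWN
        if fields[1] != "RUNNING":
            code = CRITICAL
    return code
```

A check script would pass the captured stdout of supervisorctl to this function and call `sys.exit()` with the result. It is brittle by nature, since it depends on the human-readable output format staying the same.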

+1, this is a bug

+1 for getting this fixed. It's pretty unusable for production for me without the patch. I've been using my patched version for a year now in production. @mnaberez would you merge this if there were tests or is there some other holdup? At 4 years pending I'm a bit curious why it's still not done. I recall it looking like a bit of a rewrite would be in order to test exit codes adequately.

@mnaberez would you merge this if there were tests or is there some other holdup?

I'm not sure what you are referring to. This is an issue not a pull request. As for tests, this project doesn't typically merge patches without tests.

@lukeweber has a commit with a patch (lukeweber@04d47d0), but he has not filed a pull request I think.

I know of at least two PRs - mine (https://github.com/Supervisor/supervisor/pull/620) and one from @GotenXiao (https://github.com/Supervisor/supervisor/issues/292).

Mine needs tests which I have not gotten to yet. I have not looked at the changes from @GotenXiao or @lukeweber. I guess I'm curious which of the 3 is closest to what is desired and then I'll focus on the best one if I get some time.

At first glance, lukeweber@04d47d0 looks comprehensive and seems to have tests, so maybe it's a better starting point than my own PR.

@lukeweber: Do you want to submit a PR?

@mnaberez -

  1. Is the supervisorctl.py code used for anything besides single commands or interactive mode (two states), or is it reused in libraries (a third state: not interactive, raise exceptions and don't exit on errors)?

An implementation along the following lines would, I think, work for both:

  1. implement handle_error(message=None, fatal=False)
  2. Wrap any output that contains "Error" or "ERROR" in supervisorctl.py with a handle_error(message=...) method.
  3. Wrap any blank "raise" with handle_error(fatal=True)
  4. Add an option exit_on_error in main() in supervisorctl.py defined as !interactive
  5. Create duplicate tests for all tests that check for different Exceptions in test_supervisorctl.py to check for SystemExit when options.exit_on_error = True

Handle error example code:

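    def handle_error(self, message=None, fatal=False):
        # Always show the error text to the user first.
        if message:
            self.ctl.output(message)

        if self.ctl.options.exit_on_error:
            # One-shot (non-interactive) mode: exit with a nonzero status.
            raise SystemExit(2)
        elif fatal:
            # Interactive mode: re-raise the exception currently being handled.
            raise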
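
To see how the two modes would behave, here is a self-contained sketch built around the `handle_error` method above. The `Options`, `Controller`, and `Plugin` classes and the `do_status` method are stand-ins invented for this illustration. They are not the real supervisorctl classes, and the merged patch may look different.

```python
class Options:
    def __init__(self, interactive):
        self.interactive = interactive
        # Step 4 of the proposal: exit on error only when not interactive.
        self.exit_on_error = not interactive


class Controller:
    """Stand-in for the supervisorctl controller (not the real class)."""

    def __init__(self, options):
        self.options = options
        self.lines = []

    def output(self, message):
        self.lines.append(message)


class Plugin:
    def __init__(self, ctl):
        self.ctl = ctl

    def handle_error(self, message=None, fatal=False):
        if message:
            self.ctl.output(message)
        if self.ctl.options.exit_on_error:
            raise SystemExit(2)
        elif fatal:
            raise  # re-raises the exception currently being handled

    def do_status(self, fetch):
        # Hypothetical command: `fetch` stands in for the call to the daemon.
        try:
            return fetch()
        except ConnectionRefusedError:
            self.handle_error("ERROR: refused connection", fatal=True)


def refuse():
    raise ConnectionRefusedError("refused connection")
```

Calling `Plugin(Controller(Options(interactive=False))).do_status(refuse)` raises `SystemExit(2)`, which gives the shell a nonzero exit code. With `interactive=True` the original `ConnectionRefusedError` propagates instead, so an interactive session can handle it without exiting.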

Would be cool if anyone wanted to test out the latest code. It's such a large chunk that I would feel better if a few people ran through some real-world cases.

@lukeweber what about adding some script-based integration tests? I can see that a simple command like sh -c 'sleep 3 && true' and sh -c 'sleep 3 && false' could do the job
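
The two commands suggested here can be driven from a small harness. The sketch below only shows that they produce distinguishable exit statuses a test can assert on. How they would be wired into a supervisord configuration is not specified in this thread, and the helper name `exit_code` is hypothetical. The sleep is shortened to one second to keep the run quick.

```python
import subprocess


def exit_code(shell_command):
    """Run a command through sh and return its exit status."""
    return subprocess.run(["sh", "-c", shell_command]).returncode


# A command that succeeds after a delay, and one that fails after a delay.
succeeded = exit_code("sleep 1 && true")
failed = exit_code("sleep 1 && false")
```

A script-based test would then assert that `succeeded` is 0 and `failed` is nonzero, and make the same kind of assertion on the exit status of supervisorctl itself.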

also very surprised that such a basic thing isn't working yet. Ping https://github.com/cockroachdb/cockroach-prod/issues/59

+1 for a severe bug of supervisorctl

From 2011 until now, 2016, why hasn't it been fixed yet?

+1 in which version of supervisor will this be available?
pretty shocking this took 5 years to be fixed...

Also curious what version of supervisor this bug will be fixed in.

Thanks supervisor contributors for making such a useful tool!

This has been merged into the master branch which will be Supervisor 4.0.

Is there any chance of backporting this to 3.x version? I'd be happy to help.

I have been using supervisorctl status cron | grep RUNNING as a workaround on 3.x for a while. It returns 1 when the process is in the FATAL state, because grep finds no match.

Hi, the current docs say the following:

supervisorctl status all would return non-zero if any single process was not running

However, this doesn't seem to be the case when I try it. Are the docs incorrect, or is there something else going on? Is there any version of supervisor where supervisorctl returns non-zero exit codes?
