I'm running 3.0b1 and noticed the following error. With the program:
[program:foo]
command=ls
autorestart=False
The process restarts repeatedly:
2013-01-29 18:21:10,434 INFO RPC interface 'supervisor' initialized
2013-01-29 18:21:10,434 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2013-01-29 18:21:10,435 INFO supervisord started with pid 18680
2013-01-29 18:21:11,437 INFO spawned: 'foo' with pid 18941
2013-01-29 18:21:11,447 INFO exited: foo (exit status 0; not expected)
2013-01-29 18:21:12,449 INFO spawned: 'foo' with pid 18943
2013-01-29 18:21:12,461 INFO exited: foo (exit status 0; not expected)
2013-01-29 18:21:14,464 INFO spawned: 'foo' with pid 18966
2013-01-29 18:21:14,473 INFO exited: foo (exit status 0; not expected)
2013-01-29 18:21:17,476 INFO spawned: 'foo' with pid 18969
2013-01-29 18:21:17,484 INFO exited: foo (exit status 0; not expected)
2013-01-29 18:21:18,485 INFO gave up: foo entered FATAL state, too many start retries too quickly
Setting autorestart=unexpected with exitcodes=0 seems to fail too:
[program:foo]
command=ls
autorestart=unexpected
exitcodes=0
2013-01-29 18:22:48,026 INFO RPC interface 'supervisor' initialized
2013-01-29 18:22:48,027 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2013-01-29 18:22:48,027 INFO supervisord started with pid 19955
2013-01-29 18:22:49,029 INFO spawned: 'foo' with pid 19960
2013-01-29 18:22:49,038 INFO exited: foo (exit status 0; not expected)
2013-01-29 18:22:50,041 INFO spawned: 'foo' with pid 19961
2013-01-29 18:22:50,048 INFO exited: foo (exit status 0; not expected)
2013-01-29 18:22:52,050 INFO spawned: 'foo' with pid 19962
2013-01-29 18:22:52,058 INFO exited: foo (exit status 0; not expected)
2013-01-29 18:22:55,063 INFO spawned: 'foo' with pid 19972
2013-01-29 18:22:55,072 INFO exited: foo (exit status 0; not expected)
2013-01-29 18:22:56,073 INFO gave up: foo entered FATAL state, too many start retries too quickly
I believe it's because of the startretries setting. Once the number of retries exceeds startretries, the process enters the FATAL state.
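The spawn times in the first log (18:21:11, :12, :14 and :17, i.e. 0, 1, 3 and 6 seconds after the first spawn) fit a linear backoff under the default startretries=3. Each failed start waits one second longer than the previous one, and the fourth failure exceeds startretries. Below is a minimal sketch of that model. It assumes this is how the delays are computed and is a simplification, not Supervisor's own code. The name spawn_times is hypothetical.

```python
def spawn_times(startretries: int = 3) -> list[int]:
    """Seconds (relative to the first spawn) at which a program that always
    exits before startsecs is spawned, until it is marked FATAL.

    Simplified model (an assumption, not Supervisor's source): each failed
    start increments a backoff counter, the next spawn waits `backoff`
    seconds, and the program goes FATAL once the counter exceeds
    startretries. The program's own run time is ignored.
    """
    times = [0]   # the initial spawn
    backoff = 0   # failed starts so far
    t = 0
    while True:
        backoff += 1                # exited too quickly: STARTING -> BACKOFF
        if backoff > startretries:  # retries exhausted: BACKOFF -> FATAL
            return times
        t += backoff                # wait `backoff` seconds, then respawn
        times.append(t)


if __name__ == "__main__":
    print(spawn_times())   # [0, 1, 3, 6]
    print(spawn_times(0))  # [0]
```

In the log, the "gave up" line appears about a second after the last exit rather than immediately; the sketch ignores that delay. With startretries=0 the model spawns the program once and never retries it.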
http://supervisord.org/configuration.html#program-x-section-settings
Yes, setting startretries to 0 means the process is never restarted. This is the workaround I am using. But autorestart and exitcodes should work as one would expect; otherwise, what is the point of having them?
Same bug here: exitcodes is completely ignored (it defaults to 0).
[program:myworker]
command=...
autorestart=true
...
supervisord.log: exited: myworker (exit status 0; not expected)
supervisord.log: gave up: myworker entered FATAL state, too many start retries too quickly
Exit status 0 shouldn't be reported as "not expected", since 0 is an expected exit code by default (http://supervisord.org/configuration.html#program-x-section-settings).
Just curious: does this happen with processes that daemonize and exit in less than a second (the default value is startsecs=1)?
@smschauhan It happens for me when my worker does indeed start and exit in less than 1s.
What the worker does is: start, wait for a task, process it, exit. If there are several tasks queued, the worker might restart 3 times, each run taking less than 1 second. So Supervisor marks it as "failed" and does not restart it again.
(My current workaround is a wait(1) in my worker so that each run lasts at least 1 second, which is not really a good solution.)
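A rough way to see why several quick tasks in a row push the worker into FATAL, and why the 1-second wait helps. In the simplified model below (an assumption based on the startsecs and startretries documentation, not Supervisor's code), every run shorter than startsecs counts as a failed start, a run that lasts at least startsecs resets the count, and the program goes FATAL once the count exceeds startretries. The function name and the run durations are hypothetical.

```python
def worker_goes_fatal(runtimes, startsecs=1.0, startretries=3):
    """Return True if a worker with these consecutive run durations (seconds)
    would end up FATAL in this simplified model.

    Assumes autorestart=true, so every exit leads to another start.
    """
    failed_starts = 0
    for runtime in runtimes:
        if runtime < startsecs:
            # Exited before startsecs: counts as a failed start.
            failed_starts += 1
            if failed_starts > startretries:
                return True
        else:
            # Reached RUNNING: the failed-start count starts over.
            failed_starts = 0
    return False


if __name__ == "__main__":
    quick_runs = [0.2, 0.3, 0.2, 0.1]  # four queued tasks, each under 1 s
    print(worker_goes_fatal(quick_runs))                    # True
    print(worker_goes_fatal([r + 1 for r in quick_runs]))   # False: the wait(1) workaround
    print(worker_goes_fatal(quick_runs, startsecs=0))       # False
```

Setting startsecs=0 has the same effect in this model as padding every run, without the artificial delay.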
+1
Having the same issue:
[program:configure]
command=python /root/configure.py
autorestart=unexpected
priority=0
exitcodes=0
When I run it, I get this, and it attempts to restart the process:
INFO exited: configure (exit status 0; not expected)
So it seems that exitcodes is completely ignored.
+1
You need to make sure startsecs = 0:
[program:foo]
command = ls
startsecs = 0
autorestart = false
http://supervisord.org/configuration.html
startsecs
The total number of seconds which the program needs to stay running after a startup to consider the start successful. If the program does not stay up for this many seconds after it has started, even if it exits with an “expected” exit code (see exitcodes), the startup will be considered a failure. Set to 0 to indicate that the program needn’t stay running for any particular amount of time.
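To make the rule concrete, here is a minimal sketch of how the "expected"/"not expected" suffix in the log would be chosen, assuming the defaults startsecs=1 and exitcodes=0. ProgramConfig and classify_exit are hypothetical names; this models the documented behaviour rather than reproducing Supervisor's source. The key point is that the startsecs check comes first. A command like ls exits about 10 ms after being spawned, judging by the log timestamps, so it is "not expected" even with status 0.

```python
from dataclasses import dataclass


@dataclass
class ProgramConfig:
    # Hypothetical stand-in for a [program:x] section, using the defaults
    # assumed here: startsecs=1 and exitcodes=0.
    startsecs: float = 1.0
    exitcodes: tuple = (0,)


def classify_exit(cfg: ProgramConfig, runtime: float, status: int) -> str:
    """Return the suffix logged after "exit status N;" in this simplified model.

    runtime is how long the process ran, in seconds; status is its exit code.
    """
    if runtime < cfg.startsecs:
        # Never reached RUNNING: the start counts as a failure (STARTING -> BACKOFF)
        # and is logged as "not expected" whatever the exit code was.
        return "not expected"
    if status in cfg.exitcodes:
        return "expected"      # RUNNING -> EXITED with a code listed in exitcodes
    return "not expected"      # RUNNING -> EXITED with any other code


if __name__ == "__main__":
    # ls from the first report: exit status 0 about 10 ms after spawning.
    print(classify_exit(ProgramConfig(), 0.010, 0))             # not expected
    print(classify_exit(ProgramConfig(startsecs=0), 0.010, 0))  # expected
```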
I also got hit by this. The output is confusing! It should say that it exited too soon, not that 0 was unexpected!
I had to set startsecs=0 in Docker to get anything to start successfully with Supervisor. Otherwise, it would seemingly retry very quickly and endlessly, even though the original processes were actually running. I'm running 3.0b2-1 on Ubuntu. I can't find any evidence that setting autorestart=false did anything at all.
+1
+1. Same issue here in Docker.
The exitcodes list is completely ignored when using autorestart=unexpected.
+1
_EDIT: even though the exit code is 0, the program is daemonizing itself._
This is not how Supervisor expects a program to behave; it should run in the foreground.
See http://supervisord.org/subprocess.html#nondaemonizing-of-subprocesses
END EDIT
+1 on Docker.
Below is an example for rsyslog:
2015-06-30T09:45:28.622083575Z 2015-06-30 09:45:28,621 INFO exited: rsyslogd (exit status 0; not expected)
2015-06-30T09:45:29.623299820Z 2015-06-30 09:45:29,623 INFO gave up: rsyslogd entered FATAL state, too many start retries too quickly
With this Supervisor config:
[supervisord]
nodaemon=true
[program:rsyslogd]
command=/usr/sbin/rsyslogd
autostart=true
autorestart=false
startretries=0
And the running processes in the Docker container:
root@5d8962193676:/var/log# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 47296 12676 ? Ss+ 09:45 0:00 /usr/bin/python /usr/bin/super
root 5 0.0 0.0 20228 1988 ? Ss 09:45 0:00 /bin/bash
root 11 0.0 0.0 91164 4460 ? S 09:45 0:00 nginx: master process /usr/sbi
root 14 0.0 0.0 184924 1196 ? Ssl 09:45 0:00 /usr/sbin/rsyslogd
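The ps output shows why Supervisor reports an exit even though an rsyslogd process (PID 14) is still running. As the EDIT above notes, rsyslogd daemonizes itself, so the process Supervisor spawned exits with status 0 while the daemon carries on outside Supervisor's control. Below is a small sketch, assuming a POSIX system with sh and sleep, that measures how long the directly spawned process lives; that is the duration startsecs is compared against. The sh -c "sleep 2 &" command is a hypothetical stand-in for a self-daemonizing program, and top_level_runtime and survives_startsecs are hypothetical names.

```python
import subprocess
import time


def top_level_runtime(argv: list[str]) -> float:
    """Spawn argv, wait for that process (not its children) to exit, and
    return how many seconds it stayed alive."""
    start = time.monotonic()
    subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # keep a backgrounded child from holding our output
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.monotonic() - start


def survives_startsecs(argv: list[str], startsecs: float = 1.0) -> bool:
    """True if the directly spawned process outlives startsecs."""
    return top_level_runtime(argv) >= startsecs


# Hypothetical stand-in for a self-daemonizing program: the shell starts the
# "real" work in the background and exits straight away with status 0.
daemonizing = ["sh", "-c", "sleep 2 &"]
# Foreground equivalent: the spawned process is the one doing the work.
foreground = ["sh", "-c", "sleep 2"]

if __name__ == "__main__":
    print(survives_startsecs(daemonizing))  # False
    print(survives_startsecs(foreground))   # True
```

Only the foreground variant gives Supervisor a process it can actually watch, which is why the usual fix is to run the program with whatever option keeps it in the foreground.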
Clearly a bug. I've just experienced the same thing:
web_1 | 2015-07-22 23:25:39,985 INFO exited: tmpcreator (exit status 0; not expected)
despite 0 being the default exit code and works exactly the same if I exclusively set exitcodes=0.
My workaround was also setting startsecs=0.
> Clearly a bug.
You're not giving the Supervisor developers much to work with here (you're definitively saying you have found a bug, but you have only provided one line of log output, few configuration details, and no explanation of what happened or didn't happen).
> web_1 | 2015-07-22 23:25:39,985 INFO exited: tmpcreator (exit status 0; not expected)
> despite 0 being the default exit code and works exactly the same if I exclusively set exitcodes=0
My guess is that you set exitcodes=0 but were surprised by the log message saying the exit status was 0 and that the exit was not expected.
https://github.com/Supervisor/supervisor/blame/3.1.3/docs/configuration.rst#L664-L671
``startsecs``
The total number of seconds which the program needs to stay running
after a startup to consider the start successful. If the program
does not stay up for this many seconds after it has started, even if
it exits with an "expected" exit code (see ``exitcodes``), the
startup will be considered a failure. Set to ``0`` to indicate that
the program needn't stay running for any particular amount of time.
Note: _even if it exits with an "expected" exit code (see exitcodes), the startup will be considered a failure._ It sounds like your process exited with status 0 but didn't stay up for startsecs, so it was considered a failure, as described. The log probably also contains the message "Exited too quickly (process log may have details)".
> My workaround was also setting startsecs=0.
This is also suggested in the documentation quoted above.
I'm still getting this error with the newest Supervisor update.
My only workaround is to set startsecs=0 for my Elasticsearch program.
Is it correct to set this value, or is there another solution?
My concern is that /var/log/supervisor/supervisor.log is being filled with excessive Elasticsearch spawn messages:
2017-01-24 15:02:05,780 INFO success: elastic_search_1 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2017-01-24 15:02:05,797 INFO exited: elastic_search_1 (exit status 0; expected)
2017-01-24 15:02:06,800 INFO spawned: 'elastic_search_1' with pid 10002
2017-01-24 15:02:06,805 INFO success: elastic_search_1 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2017-01-24 15:02:06,819 INFO stopped: elastic_search_1 (terminated by SIGQUIT)
[program:elastic_search_1]
command=/usr/sbin/service elasticsearch start
autostart=true
autorestart=true
stopsignal=QUIT
exitcodes=0
numprocs=1
stdout_logfile=/var/log/supervisor/%(program_name)s-stdout.log
stderr_logfile=/var/log/supervisor/%(program_name)s-stderr.log
startsecs=0
Would love more input on this. Thanks!
@rlam3 - startsecs = 0 is correct.
If it restarts repeatedly, you might want to look at the command you're using and make sure it runs the process in the foreground.
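The restart loop in the Elasticsearch log fits how autorestart is documented: it only applies to a process that exits while in the RUNNING state. With startsecs=0, "service elasticsearch start" reaches RUNNING and exits with status 0, which is expected. However, autorestart=true restarts a program whatever its exit code, so the wrapper is spawned again. Below is a minimal sketch of the three documented autorestart values. should_restart is a hypothetical name, and other boolean spellings that Supervisor may accept are not modelled here.

```python
def should_restart(autorestart: str, exit_expected: bool) -> bool:
    """Decide whether a program that exited while RUNNING is started again.

    exit_expected: whether the exit code was listed in exitcodes.
    Only the three documented spellings are modelled in this sketch.
    """
    policy = autorestart.strip().lower()
    if policy == "false":
        return False              # never restart
    if policy == "unexpected":
        return not exit_expected  # restart only on an exit code not in exitcodes
    if policy == "true":
        return True               # restart whatever the exit code
    raise ValueError(f"autorestart value not modelled by this sketch: {autorestart!r}")


if __name__ == "__main__":
    # "service elasticsearch start" with startsecs=0: RUNNING, then exit status 0 (expected).
    print(should_restart("true", exit_expected=True))        # True  -> spawned again
    print(should_restart("unexpected", exit_expected=True))  # False -> stays EXITED
```

Starts that fail before startsecs never reach RUNNING, so they go through the startretries backoff whatever autorestart says. That would also explain why autorestart=false seemed to do nothing earlier in this thread.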
@jdeathe Should we not be using the service command to start Elasticsearch? Would you recommend some other way of starting it so that Supervisor can monitor it?
I would love to see a smarter, best-practice way of monitoring Elasticsearch. Thanks!
Really appreciate your help here.
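I found a better way of running Elasticsearch that doesn't need startsecs=0 and won't bloat supervisor.log: run the elasticsearch binary directly, in the foreground, instead of going through the service command: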
[program:elasticsearch_node_1]
command=/usr/share/elasticsearch/bin/elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch
user=elasticsearch
autostart=true
autorestart=true
redirect_stderr=true
numprocs=1
stdout_logfile=/var/log/supervisor/%(program_name)s-stdout.log
stderr_logfile=/var/log/supervisor/%(program_name)s-stderr.log
+1
2018-12-23 16:20:41,100 CRIT Supervisor running as root (no user in config file)
2018-12-23 16:20:41,100 INFO Included extra file "/etc/supervisor/conf.d/tomcat.conf" during parsing
2018-12-23 16:20:41,105 INFO RPC interface 'supervisor' initialized
2018-12-23 16:20:41,105 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-12-23 16:20:41,105 INFO supervisord started with pid 20564
2018-12-23 16:20:42,108 INFO spawned: 'tomcat' with pid 20602
2018-12-23 16:20:42,145 INFO exited: tomcat (exit status 0; not expected)
2018-12-23 16:20:43,147 INFO spawned: 'tomcat' with pid 20679
2018-12-23 16:20:43,159 INFO exited: tomcat (exit status 0; not expected)
2018-12-23 16:20:45,162 INFO spawned: 'tomcat' with pid 20815
2018-12-23 16:20:45,192 INFO exited: tomcat (exit status 0; not expected)
2018-12-23 16:20:48,196 INFO spawned: 'tomcat' with pid 21005
2018-12-23 16:20:48,222 INFO exited: tomcat (exit status 0; not expected)
2018-12-23 16:20:48,223 INFO gave up: tomcat entered FATAL state, too many start retries too quickly