Supervisor: Upgrading Supervisor causes services to be stopped (or restarted)

Created on 22 May 2018 · 7 Comments · Source: Supervisor/supervisor

I am running Supervisord on some Ubuntu 16.04 servers to run mainly Java processes. Some of these services have autostart=false in their supervisor/conf.d conf file to avoid starting them automatically on server restarts.

This morning Ubuntu's unattended-upgrade automatically upgraded Supervisor to 3.2.0-2ubuntu0.2 on some of the servers. After installing the new version, the upgrade process restarted Supervisor. The restart also stopped all Supervisor services running on the servers, and those services that had autostart=false were not restarted. Even if autostart had been true and the services had been automatically restarted, the upgrade would have caused some downtime for them.

Should upgrading Supervisor cause the restart of all the services Supervisor is running? Is there a way to upgrade Supervisor without causing downtime for the services? Is there a way to configure a service to not start on server restart, but start when Supervisor is restarted because it has been upgraded?

The issue might not be with Supervisor, but with the Ubuntu packaging, but I wasn't sure so I posted here first. Apologies if this has been discussed before.

packaging question

jkytomaki

👍5

All 7 comments

Ha! You just helped me figure out why 50% of my infrastructure restarted today!

vvucetic on 22 May 2018

👍3

Mystery solved. We were ripping our hair out for the last 2 hours.

yegors on 22 May 2018

👍1

I also experienced interruptions due to this brazen unattended upgrade from Ubuntu. My resolution is simply to disable the unattended upgrades config file, which is read by a cronjob.

Ansible Role snippet
```
# Overwrite the auto-upgrades config with the packaged "disabled" variant.
- name: disable auto upgrades
  copy:
    src: /usr/share/unattended-upgrades/20auto-upgrades-disabled
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    remote_src: True
```

cryptmin on 22 May 2018

I've hit this problem too, and have made some progress on it.

1. Workaround

You can disable unattended upgrade only for supervisor by editing /etc/apt/apt.conf.d/50unattended-upgrades like this:

// List of packages to not update (regexp are supported)
Unattended-Upgrade::Package-Blacklist {
    "supervisor";
};

However, this workaround is not recommended, because supervisor will then miss security updates (although it is somewhat better than disabling unattended upgrades entirely, as @cryptmin suggested).

2. Root cause

As far as I can tell, the root causes are:

- supervisor may take too long to die after receiving SIGTERM.
  - This depends on the services managed by supervisor.
- unattended upgrade executes the "post installation script" regardless of the state of the existing supervisor process.
  - Simply because package installation and service management are separate things.
- supervisor cannot "start" if it's already started (= running).

So my proposal to fix the problem is to modify the "post installation script" so that it _restarts_ the supervisor service _if it's already running_.

I couldn't find the "post installation script" (presumably named supervisor.postinst?).
How can I fix this? Is there a maintainer of the apt package?
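
To make the proposal concrete, here is a minimal sketch of the "stop, wait until it has really exited, then start" logic. It is not the package's actual post installation script (which would be a shell script); it is written in Python only for illustration, and `restart_after_exit` and its `stop`, `is_running` and `start` parameters are hypothetical names standing in for whatever the init script really does.

```python
import time


def restart_after_exit(stop, is_running, start, timeout=60.0,
                       poll_interval=0.5, clock=time.monotonic,
                       sleep=time.sleep):
    """Stop the service, wait until it has really exited, then start it.

    stop, is_running and start are caller-supplied callables (hypothetical
    stand-ins for the init script's actions). Returns True if start() was
    called, or False if the service was still running when the timeout
    expired, in which case start() is not called.
    """
    if is_running():
        stop()
    deadline = clock() + timeout
    # Poll until the old process is gone, instead of assuming that
    # "stop" returning means the process has already exited.
    while is_running():
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    start()
    return True
```

The point of the design is that `start` is never attempted while the old supervisord is still shutting down its children, which is the situation that appears to make the "start" action fail in the logs below.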

Detail of investigation

In some cases, unattended upgrade upgrades supervisor successfully (with some downtime, of course, but only a few seconds). See the excerpts from my /var/log/supervisor/supervisord.log files.

Successful case:

...
2018-05-22 06:34:09,914 WARN received SIGTERM indicating exit request
2018-05-22 06:34:09,922 INFO waiting for nginx to die
2018-05-22 06:34:09,923 INFO stopped: nginx (exit status 0)
2018-05-22 06:34:11,429 CRIT Supervisor running as root (no user in config file)
...

In this case, unattended upgrade first sent SIGTERM, and launched supervisor about 1.5s later.

Failed case (different server):

...
2018-05-22 06:32:24,505 WARN received SIGTERM indicating exit request
2018-05-22 06:32:24,506 INFO waiting for cloudsql, airflow-worker to die
2018-05-22 06:32:27,551 INFO waiting for cloudsql, airflow-worker to die
2018-05-22 06:32:30,551 INFO waiting for cloudsql, airflow-worker to die
2018-05-22 06:32:34,393 INFO waiting for cloudsql, airflow-worker to die
2018-05-22 06:32:36,326 WARN killing 'airflow-worker' (6392) with SIGKILL
2018-05-22 06:32:36,332 INFO stopped: airflow-worker (terminated by SIGKILL)
2018-05-22 06:32:36,334 INFO stopped: cloudsql (terminated by SIGTERM)

(no log after this)

In this case, supervisor took too long (more than 10s) to die.
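
The shutdown time can also be read off the log programmatically. The following is a rough sketch, assuming the timestamp layout shown in the excerpts above (`YYYY-MM-DD HH:MM:SS,mmm` at the start of each line); `parse_timestamp` and `shutdown_duration` are hypothetical helper names, not part of Supervisor.

```python
from datetime import datetime

# Timestamp layout as seen in the supervisord.log excerpts above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_timestamp(line):
    """Return the datetime at the start of a log line, or None."""
    # The timestamp occupies the first 23 characters of each log line.
    try:
        return datetime.strptime(line[:23], LOG_TIME_FORMAT)
    except ValueError:
        return None


def shutdown_duration(lines):
    """Seconds from 'received SIGTERM' to the last 'stopped:' line after it.

    Returns None if the log contains no complete shutdown sequence.
    """
    sigterm_at = None
    last_stopped_at = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # skip "..." and other non-log lines
        if "received SIGTERM" in line:
            sigterm_at = ts
            last_stopped_at = None
        elif sigterm_at is not None and "INFO stopped:" in line:
            last_stopped_at = ts
    if sigterm_at is None or last_stopped_at is None:
        return None
    return (last_stopped_at - sigterm_at).total_seconds()


failed_case = [
    "2018-05-22 06:32:24,505 WARN received SIGTERM indicating exit request",
    "2018-05-22 06:32:36,326 WARN killing 'airflow-worker' (6392) with SIGKILL",
    "2018-05-22 06:32:36,332 INFO stopped: airflow-worker (terminated by SIGKILL)",
    "2018-05-22 06:32:36,334 INFO stopped: cloudsql (terminated by SIGTERM)",
]
print(shutdown_duration(failed_case))
```

For the failed case this gives a little under 12 seconds between the SIGTERM and the last child being stopped, consistent with the "more than 10s" estimate.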

In the meantime, unattended upgrade complained that it failed to run the "post installation" script (see /var/log/unattended-upgrades/unattended-upgrades-dpkg_2018-05-22_*.log).

(Reading database ... 105879 files and directories currently installed.)
Preparing to unpack .../supervisor_3.0b2-1ubuntu0.1_all.deb ...
Stopping supervisor: supervisord.
Unpacking supervisor (3.0b2-1ubuntu0.1) over (3.0b2-1) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up supervisor (3.0b2-1ubuntu0.1) ...
Starting supervisor: invoke-rc.d: initscript supervisor, action "start" failed.
dpkg: error processing package supervisor (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 supervisor
Error in function:

You can find the "post installation" script at /var/lib/dpkg/info/supervisor.postinst, but its core part is /etc/init.d/supervisor start.

Some possibly related links

Installation halts if Supervisor is already running, Ubuntu 14.04 · Issue #3099 · certbot/certbot
- https://github.com/certbot/certbot/issues/3099
#877086 - restart in init script only stops supervisor - Debian Bug report logs
- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877086

tmshn on 23 May 2018

@tmshn Thanks for the workaround! Regarding the root cause, you write:

> In some cases, unattended upgrade upgrades supervisor successfully (of course with some downtime, but it's few seconds)

Still, causing a "few seconds" of downtime for what may be production services, just because Supervisor was updated, seems unexpected to me. Does updating systemd/Upstart/other similar systems cause them to restart all services they are running? Is it necessary?

jkytomaki on 23 May 2018

I don't know whether it actually does, or whether it is necessary, but it doesn't seem so strange for a service manager to respawn its child processes on an upgrade.
IMHO, a single long-running process does not give you high availability; as a better-designed architecture, it's worth considering running multiple nodes (which also makes the deploy process clearer).

tmshn on 24 May 2018

> This morning Ubuntu's unattended-upgrade automatically upgraded Supervisor to 3.2.0-2ubuntu0.2 on some of the servers. After the upgrade the upgrade process restarted Supervisor.

This project, Supervisor, only publishes Python packages to PyPI. Our packages do not include any operating system integration such as init scripts. The packages installed by distribution package managers like apt and rpm are created by other people that are not involved with the Supervisor project. Whatever mechanism causes supervisord to automatically receive the shutdown signal (or supervisorctl shutdown command) is part of the distribution package. Please report this issue to the authors of the package you are installing.

> I don't know if it actually does or is necessary, but seems not so weird for service managers to respawn child processes on an upgrade.

It is not possible to upgrade Supervisor without stopping the subprocesses running under supervisord. Changing this would require major architectural changes to supervisord and is out of scope for this project.