Salt: Disabled (but running) systemd service is not restarted with mod_watch

Created on 27 Apr 2018  ·  16 Comments  ·  Source: saltstack/salt

Description of Issue/Question

When a systemd service is created and started, but not enabled, Salt does not restart the service on mod_watch:

----------
          ID: /etc/zabbix/zabbix_agentd.conf
    Function: file.managed
      Result: True
     Comment: File /etc/zabbix/zabbix_agentd.conf updated
     Started: 10:14:52.621831
    Duration: 4298.499 ms
     Changes:
              ----------
              diff:
                  ---
                  <snip>
----------
          ID: zabbix-agent-service
    Function: service.running
        Name: zabbix-agent
      Result: True
     Comment: Service zabbix-agent has been enabled, and is in the desired state
     Started: 10:14:56.922787
    Duration: 1427.353 ms
     Changes:
              ----------
              zabbix-agent:
                  True

Some packages (zabbix, specifically) don't enable their services when installed, so Salt only enables the service instead of enabling and restarting it.
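The reported behavior can be modeled with a small sketch. This is a simplified, hypothetical model of the decision the service.running state appears to make on a watch trigger, not Salt's actual code in salt/states/service.py; it only illustrates the outcomes seen in the outputs above.

```python
# Hypothetical model of service.running (with enable: True) on a watch
# trigger. NOT Salt's actual implementation; it mirrors the observed outputs.

def reconcile(running, enabled, watch_triggered):
    """Return the list of actions the state appears to take."""
    actions = []
    if not enabled:
        # Observed behavior: once the enable branch fires, the state reports
        # "has been enabled, and is in the desired state" and never restarts.
        actions.append("enable")
        return actions
    if watch_triggered and running:
        actions.append("restart")
    return actions

# Disabled-but-running service with a config change: only enabled.
print(reconcile(running=True, enabled=False, watch_triggered=True))  # ['enable']
# Already-enabled service with a config change: restarted as expected.
print(reconcile(running=True, enabled=True, watch_triggered=True))   # ['restart']
```

In this model, the early return from the enable branch is what swallows the restart that the watch requisite should have caused.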

This issue is discussed more thoroughly at this PR:
https://github.com/saltstack-formulas/zabbix-formula/pull/77

Setup

Formula: https://github.com/edusperoni/zabbix-formula
states:

  • zabbix-agent.repo
  • zabbix-agent.conf

Steps to Reproduce Issue

On a machine that has never had zabbix-agent installed, install it using the formula above.

On a machine that has zabbix-agent installed, run systemctl disable zabbix-agent && apt-get purge zabbix-agent -y and then apply the state.

Versions Report

Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 0.8.6
       cherrypy: 3.5.0
       dateutil: 2.2
      docker-py: Not Installed
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: 1.2.3
      pycparser: 2.10
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.9 (default, Jun 29 2016, 13:08:31)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.10
         locale: UTF-8
        machine: x86_64
        release: 3.16.0-4-amd64
         system: Linux
        version: debian 8.10
Labels: Bug, Confirmed, severity-medium


All 16 comments

I'm having a hard time replicating this with a small use case as below:

root@localhost:~# cat /srv/salt/test.sls 
test:
  file.managed:
    - name: /tmp/test
    - watch_in:
      - service: apache2

apache2:
  service.running:
    - enable: True

When running the state it restarts the service:

----------
          ID: apache2
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 14:31:23.317472
    Duration: 2242.205 ms
     Changes:   
              ----------
              apache2:
                  True

Summary for local

I tried capturing what I found in the states you pointed out and applying it to this service. Is the use case the same?

Try running the following commands:

systemctl disable apache2
service apache2 start

I'm not near my computer at this moment, but it should do the trick.

The idea is that systemctl status should report the service as disabled AND running

I just triple-checked that I ran those commands before running the state, and I am still seeing the service restart. I even changed it to zabbix-agent and cannot replicate.

Here's a detailed use case:

root@localhost:~# systemctl disable zabbix-agent
Synchronizing state for zabbix-agent.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d zabbix-agent defaults
insserv: warning: current start runlevel(s) (empty) of script `zabbix-agent' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `zabbix-agent' overrides LSB defaults (0 1 6).
Executing /usr/sbin/update-rc.d zabbix-agent disable
insserv: warning: current start runlevel(s) (empty) of script `zabbix-agent' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `zabbix-agent' overrides LSB defaults (0 1 6).
root@localhost:~# service zabbix-agent start
root@localhost:~# sudo salt-call --local state.sls test
[WARNING ] State for file: /tmp/test - Neither 'source' nor 'contents' nor 'contents_pillar' nor 'contents_grains' was defined, yet 'replace' was set to 'True'. As there is no source to replace the file with, 'replace' has been set to 'False' to avoid reading the file unnecessarily.
local:
----------
          ID: test
    Function: file.managed
        Name: /tmp/test
      Result: True
     Comment: Empty file
     Started: 14:01:50.789805
    Duration: 36.405 ms
     Changes:   
              ----------
              new:
                  file /tmp/test created
----------
          ID: zabbix-agent
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 14:01:51.760442
    Duration: 109.101 ms
     Changes:   
              ----------
              zabbix-agent:
                  True

Summary for local
------------
Succeeded: 2 (changed=2)
Failed:    0
------------
Total states run:     2
Total run time: 145.506 ms

I'm wondering if my state is just not set up in the same way as your use case.

Oh, I see now: you're using sysvinit, not systemd. Try it on either Ubuntu 16.04 or Debian 8 (or newer).

My understanding is that Debian keeps the services synchronized between sysvinit and systemd. I even confirmed on the minion that it was running the salt/modules/systemd.py module.

Also, to note: I tested on both Debian 8 and Ubuntu 16.

That's weird, it happened to both me and another commenter on the issue I linked, and we were able to replicate it by running:

systemctl disable zabbix-agent
apt-get purge zabbix-agent -y

and then applying the formula. (or applying the formula on a clean system)

Can you try with the formula I linked (my fork is on the correct commit, before the workaround)? Just use zabbix.agent.repo and zabbix.agent.conf and this pillar (just in case):

zabbix:
  lookup:
     version_repo: 3.4

Although it should work the same on Debian 8, you could try it on Debian 9, which is the exact version I'm using.

I can confirm the problem exists. I just checked one more time.
Disable and remove agent:
systemctl disable zabbix-agent.service; systemctl stop zabbix-agent.service; apt-get -y purge zabbix-agent

Install it again:
salt 'testvm*' state.sls zabbix.agent.repo,zabbix.agent.conf

  Name: deb http://repo.zabbix.com/zabbix/3.0/debian jessie main - Function: pkgrepo.managed - Result: Clean Started: - 22:00:40.098412 Duration: 30.905 ms
----------
          ID: zabbix-agent
    Function: pkg.installed
      Result: True
     Comment: The following packages were installed/updated: zabbix-agent
              The following packages were already installed: zabbix-sender
     Started: 22:00:40.129466
    Duration: 7967.602 ms
     Changes:
              ----------
              zabbix-agent:
                  ----------
                  new:
                      1:3.0.16-1+jessie
                  old:
  Name: zabbix - Function: group.present - Result: Clean Started: - 22:00:48.098539 Duration: 1.68 ms
  Name: zabbix - Function: user.present - Result: Clean Started: - 22:00:48.100595 Duration: 14.298 ms
  Name: /var/log/zabbix - Function: file.directory - Result: Clean Started: - 22:00:48.469282 Duration: 0.6 ms
  Name: /var/run/zabbix - Function: file.directory - Result: Clean Started: - 22:00:48.470001 Duration: 0.519 ms
----------
          ID: /etc/zabbix/zabbix_agentd.conf
    Function: file.managed
      Result: True
     Comment: File /etc/zabbix/zabbix_agentd.conf updated
     Started: 22:00:48.470621
    Duration: 99.676 ms
     Changes:
              ----------
              diff:
                  ---
                  +++
                  @@ -1,3 +1,4 @@
                  +# Managed by saltstack. Do not edit this file.
                  ... snip ...
                   # TLSPSKFile=
                  +
----------
          ID: zabbix-agent
    Function: service.running
      Result: True
     Comment: Service zabbix-agent has been enabled, and is in the desired state
     Started: 22:00:48.570671
    Duration: 173.452 ms
     Changes:
              ----------
              zabbix-agent:
                  True

zabbix-agent log file

 12498:20180502:220047.523 agent #2 started [listener #1]
 12501:20180502:220047.525 active check configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)

It's still using 127.0.0.1 as the server; the restart was not performed.

I will try to replicate this issue with a simple test state later. Maybe there is some race condition.

Something weird is going on with this state.
From what I observed, when the service is dead and needs to be started AND enabled, it only gets started. The next run enables the service.

Are any of you seeing this with any other services, or only zabbix? Also, are you certain this was working on previous Salt versions?

I'm seeing this with docker.service; not sure if it's a Salt, Docker, or systemd problem :)

and do you know if it was working on previous versions?

Hello, @Ch3LL!

I'd like to bring this up again. The issue is reproducible and can be simplified to the description below.

Situation 1

Assume we have an .sls with:

  1. config file's state:
/etc/unbound/unbound.conf:
  file.managed:
    - require:
      - pkg: unbound
    - user: unbound
    - contents: |
        include: "/etc/unbound/unbound.conf.d/*.conf"
        #test file changes
  2. service state with a watch requisite:
unbound.service:
  service.running:
    - enable: True
    - require:
      - cmd: unbound-systemctl-daemon-reload
      - pkg: unbound
    - watch:
      - file: /etc/systemd/system/unbound.service
      - file: /etc/unbound/unbound.conf
      - file: /etc/unbound/unbound.conf.d
  3. config file is going to have changes
  4. systemd service being currently disabled, but running

In this case on applying state we get:

----------
          ID: /etc/unbound/unbound.conf
    Function: file.managed
      Result: True
     Comment: File /etc/unbound/unbound.conf updated
     Started: 20:18:22.972941
    Duration: 14.977 ms
     Changes:   
              ----------
              diff:
                  --- 
                  +++ 
                  @@ -1 +1,2 @@
                   include: "/etc/unbound/unbound.conf.d/*.conf"
                  +#test file changes
...
----------
          ID: unbound.service
    Function: service.running
        Name: unbound
      Result: True
     Comment: Service unbound has been enabled, and is in the desired state
     Started: 20:18:23.218767
    Duration: 531.541 ms
     Changes:   
              ----------
              unbound:
                  True

--------------------------------

$ systemctl status unbound
● unbound.service - Unbound recursive Domain Name Server
   Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/unbound.service.d
           └─50-insserv.conf-$named.conf, 50-unbound-$named.conf
   Active: active (running) since Thu 2019-08-08 16:56:43 MSK; 3h 21min ago
 Main PID: 1133 (unbound)
   CGroup: /system.slice/unbound.service
           └─1133 /usr/sbin/unbound -c /etc/unbound/unbound.conf

and the service hasn't been restarted (note the _active since_ timestamp), despite the watch on the config.

Situation 2

When the service is already enabled but we still have changes in the watched config file, applying the state correctly restarts the service:

----------
          ID: /etc/unbound/unbound.conf
    Function: file.managed
      Result: True
     Comment: File /etc/unbound/unbound.conf updated
     Started: 20:29:21.331920
    Duration: 6.818 ms
     Changes:   
              ----------
              diff:
                  --- 
                  +++ 
                  @@ -1,2 +1 @@
                   include: "/etc/unbound/unbound.conf.d/*.conf"
                  -#test file changes
...
----------
          ID: unbound.service
    Function: service.running
        Name: unbound
      Result: True
     Comment: Service restarted
     Started: 20:29:21.638848
    Duration: 377.89 ms
     Changes:   
              ----------
              unbound:
                  True

--------------------------------

$ systemctl status unbound
● unbound.service - Unbound recursive Domain Name Server
   Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/unbound.service.d
           └─50-insserv.conf-$named.conf, 50-unbound-$named.conf
   Active: active (running) since Thu 2019-08-08 20:31:09 MSK; 37s ago
  Process: 13367 ExecStart=/usr/sbin/unbound -c /etc/unbound/unbound.conf (code=exited, status=0/SUCCESS)
  Process: 13360 ExecStartPre=/usr/sbin/unbound-checkconf (code=exited, status=0/SUCCESS)
  Process: 13356 ExecStartPre=/usr/sbin/unbound-anchor -a /var/lib/unbound/root.key -v (code=exited, status=0/SUCCESS)
 Main PID: 13369 (unbound)
    Tasks: 1
   Memory: 5.6M
      CPU: 36ms
   CGroup: /system.slice/unbound.service
           └─13369 /usr/sbin/unbound -c /etc/unbound/unbound.conf

I believe the service state should enable and restart (or start, if it was inactive) the systemd service in Situation 1. It also seems to me that the issue is not service-specific. We're currently using version 2018.3.2.
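The expected behavior can be sketched the same way: enabling and restarting are independent concerns, so one run should do both. This is a hypothetical sketch under that assumption, not actual Salt code.

```python
# Hypothetical sketch of the expected reconciliation: enable AND act on the
# watch trigger in a single run. Not Salt's actual implementation.

def reconcile_expected(running, enabled, watch_triggered):
    actions = []
    if not enabled:
        actions.append("enable")
    if watch_triggered:
        # Restart a running service; start a dead one.
        actions.append("restart" if running else "start")
    return actions

# Situation 1: disabled but running, config changed -> enable AND restart.
print(reconcile_expected(running=True, enabled=False, watch_triggered=True))
# ['enable', 'restart']
```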

Could you take another look at it, please?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

ping @bandikoot, thanks for the example and sorry for the late reply. Thanks to stalebot, I saw this issue today.

I was finally able to replicate this, and I'm just going to paste my exact steps so I don't have trouble replicating it in the future:

Using this vagrant file:

Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"
end
  1. install salt
  2. install unbound - apt-get install unbound
  3. disable the service - systemctl disable unbound
  4. run this state:
/etc/unbound/unbound.conf:
  file.managed:
    - user: unbound
    - contents: |
        include: "/etc/unbound/unbound.conf.d/*.conf"
        #test file changes

unbound.service:
  service.running:
    - enable: True
    - watch:
      - file: /etc/unbound/unbound.conf

salt-call --local state.sls test -ldebug

and now I see the issue. I will label this as a bug for us to get fixed. Thanks!

Thank you for updating this issue. It is no longer marked as stale.
