Here is a simple Salt state to enable and start the jenkins service:
# jenkins.sls
activate_jenkins_service:
  service.running:
    - name: jenkins
    - enable: True
The official Jenkins installation on RedHat/CentOS/Fedora uses init.d/sysv scripts.
Manually enabling and starting the service through init.d/sysv scripts works perfectly even on systemd-based Fedora 20:
systemctl enable jenkins
jenkins.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig jenkins on
systemctl start jenkins
On the other hand, Salt fails to execute the state:
salt-call -l all state.sls jenkins
...
ID: activate_jenkins_service
Function: service.running
Name: jenkins
Result: False
Comment: The named service jenkins is not available
Changes:
...
The problem stems from the fact that Salt runs the systemctl list-unit-files command, which lists only native systemd unit files and excludes init.d/sysv scripts:
...
[INFO ] Executing state service.running for jenkins
[INFO ] Executing command 'systemctl --full list-unit-files | col -b' in directory '/root'
...
Because Salt doesn't see the required jenkins service in the list of unit files, it never passes the next step to systemctl for enabling/starting/... the service, and it doesn't let systemctl tell "authoritatively" whether the service actually exists.
This issue is very closely related to issue #8444 (as far as the proposed solution is concerned) and is described in this comment.
Rather than executing any pre-validation logic (i.e. looking up the service name somewhere), Salt should rely on systemd (and its systemctl command) to determine whether states to enable/start/... the service failed or succeeded. In other words, Salt should optimistically execute systemctl with any arbitrary service name and report the result of that execution instead of trying to predict its outcome.
Again, see issue #8444.
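Purely as an illustration of that idea, here is a hypothetical shell sketch (not actual Salt code) of what "optimistic" execution means here:

svc=jenkins    # same flow for native units and init.d/sysv scripts
# No pre-scan of 'systemctl list-unit-files' - just run the commands and trust their exit codes.
if systemctl enable "$svc" && systemctl start "$svc"; then
    echo "service $svc enabled and started"
else
    echo "service $svc could not be enabled/started" >&2
fi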
The master and minion are the same host, running Fedora 20 x86_64:
salt --versions-report
Salt: 2014.1.1
Python: 2.7.5 (default, Feb 19 2014, 13:47:28)
Jinja2: 2.7.1
M2Crypto: 0.21.1
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.6.1
PyYAML: 3.10
PyZMQ: 13.0.2
ZMQ: 3.2.4
I agree with what you're saying here. Let's try and get this in.
@mtorromeo says this should be fixed by #11921. @uvsmtid can you verify?
@cachedout and @mtorromeo Thanks for the updates!
I cherry-picked both 90bece1 and 9617d33 on top of 2014.1 (latest develop had some unrelated issues) in my virtualenv.
I used a state similar to the one mentioned there in its example:
activate_vpn_service:
  service.running:
    - name: [email protected]
    - enable: True
Indeed, commit 9617d33 handles @ in systemd unit names to make it work.
And while it still uses the systemctl --full list-units command (see the problems with init.d/sysv services next), parameterized services were listed in all my tries.
See the example of the Jenkins service state at the beginning of this issue.
After trying variations of enable/disable and start/stop, I can conclude that it does not work in the general case. And here is why...
The code after commit 90bece1 still uses the command systemctl --full list-units, which simply does not list init.d/sysv services until they are started on the system (only starting matters: enable/disable won't change the listing).
For example, start jenkins service manually and try to list it:
sudo systemctl start jenkins
systemctl --full list-units | grep jenkins
jenkins.service
# OK
Then stop jenkins service manually and execute:
sudo systemctl stop jenkins
systemctl --full list-units | grep jenkins
# ERROR: no output captured by grep
Although this looks more like an issue with systemd (I have even updated it here), the fastest fix is still possible through Salt alone. The argument is that systemctl --full list-units is not required to manage a service.
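As a rough illustration of that argument (output is approximate and varies between systemd versions), systemctl status already distinguishes a stopped-but-present init.d/sysv service from a truly missing unit through its Loaded: line, so no unit listing is needed:

sudo systemctl stop jenkins
systemctl status jenkins | grep Loaded
# Loaded: loaded (/etc/rc.d/init.d/jenkins)                 <- present, just stopped
systemctl status no-such-service | grep Loaded
# Loaded: not-found (Reason: No such file or directory)     <- really missing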
@uvsmtid This is great feedback, thank you! I'll go ahead and close #8444 then and we'll keep working on this one.
This is still broken in 2014.1.10 on Fedora-20. While there is a kludgy workaround, this really does need to be fixed. The workaround, for those in a CI/CD environment who need to clear out any blocks in their pipeline, is ugly but works (this example is for Centrify, which also uses sysv init-style files but is manageable with systemd under FC20):
centrify-service:
  service.running:
    - name: centrifydc
    - enable: True
    - reload: True
    - watch:
      - file: /etc/centrifydc/centrifydc.conf
    - require:
      - pkg: centrify-packages
      - file: centrify-config
      - cmd: centrify-adjoin
{%- if salt['grains.get']('osfinger', 'undefined') == 'Fedora-20' %}
    - provider: service
{%- endif %}
Hello,
A little update on a strange thing:
salt-call service.available registrator.service
[INFO ] Executing command 'systemctl --all --full --no-legend --no-pager list-units | col -b' in directory '/root'
[INFO ] Executing command 'systemctl --full --no-legend --no-pager list-unit-files | col -b' in directory '/root'
[INFO ] Legacy init script: "README".
[INFO ] Legacy init script: "functions".
[INFO ] Legacy init script: "netconsole".
[INFO ] Legacy init script: "network".
local:
False
But:
salt-call service.available registrator
[INFO ] Executing command 'systemctl --all --full --no-legend --no-pager list-units | col -b' in directory '/root'
[INFO ] Executing command 'systemctl --full --no-legend --no-pager list-unit-files | col -b' in directory '/root'
[INFO ] Legacy init script: "README".
[INFO ] Legacy init script: "functions".
[INFO ] Legacy init script: "netconsole".
[INFO ] Legacy init script: "network".
local:
True
Why isn't the ".service" suffix supported? On systemd both forms work :/
(and it gave me a bit of a headache to track this down...)
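For reference (assuming the registrator unit from above), systemctl itself treats the two spellings as the same unit and simply appends the .service suffix when it is omitted, roughly:

systemctl status registrator | head -n 1
systemctl status registrator.service | head -n 1
# both print the same unit header ("registrator.service - ...")

so it would be reasonable for service.available to accept both forms as well.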
This happens to me when using hadoop-formula's hadoop.hdfs state. It starts three different services. The first service started by the highstate during a fresh run is not found. The rest of the services are found and function as normal. A second highstate run proceeds normally. This possibly indicates that Salt is reloading systemd later in the process than needed.
State:
{% if hdfs.is_namenode or hdfs.is_datanode %}
hdfs-services:
  service.running:
    - enable: True
    - names:
      {% if hdfs.is_namenode %}
      - hadoop-secondarynamenode
      - hadoop-namenode
      {% endif %}
      {% if hdfs.is_datanode %}
      - hadoop-datanode
      {% endif %}
{% endif %}
I also extend hdfs-services with provider: debian_service. I've tried it with the default for Debian Jessie (provider: systemd) with the same results.
/var/log/salt/minion:
[INFO ] Executing command 'service hadoop-namenode status' in directory '/root'
[ERROR ] Command 'service hadoop-namenode status' failed with return code: 3
[ERROR ] output: * hadoop-namenode.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)
[INFO ] Executing command 'service hadoop-namenode start' in directory '/root'
[ERROR ] Command 'service hadoop-namenode start' failed with return code: 6
[ERROR ] output: Failed to start hadoop-namenode.service: Unit hadoop-namenode.service failed to load: No such file or directory.
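One hypothetical way to check the reload-timing guess above: since a second highstate finds the services, the units clearly exist after the first run, so forcing the reload by hand and re-checking should show whether a daemon-reload at the right moment is all that is missing:

sudo systemctl daemon-reload          # re-generate units from the freshly installed init scripts
sudo service hadoop-namenode status   # should now report the unit as found instead of not-found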
Versions report:
Salt: 2015.5.0
Python: 2.7.9 (default, Mar 1 2015, 12:57:24)
Jinja2: 2.7.3
M2Crypto: 0.21.1
msgpack-python: 0.4.2
msgpack-pure: Not Installed
pycrypto: 2.6.1
libnacl: Not Installed
PyYAML: 3.11
ioflo: Not Installed
PyZMQ: 14.4.0
RAET: Not Installed
ZMQ: 4.0.5
Mako: 1.0.0
Debian source package: 2015.5.0+ds-1~bpo8+1
I would like to debug this further but haven't debugged Salt much since I switched from Salt SSH to a Master/Minion setup. Suggestions?
Running CentOS 7, Salt version 2015.8.8.2. Cassandra is affected by this as well. As a workaround, running this kludge works:
cassandra_kludge:
  cmd.run:
    - name: systemctl enable cassandra
    - unless: systemctl -a | grep cassandra

cassandra_service:
  service.running:
    - name: cassandra
    - init_delay: 10
    - require:
      - cmd: cassandra_kludge
This even made me update the bug in systemd again.
My tests still confirm that systemd offers no known way to list disabled services based on init.d/sysv scripts. The best current solution would be to enable/start/stop/disable the service and check the error code returned by systemctl - it will fail if there is no such service and succeed if there is one, without any need to know about it upfront.
I have discovered a somewhat similar problem on Debian Jessie when I deploy a new sysv script and try to use the service.running state. I get:
2016-11-28 13:59:10,206 [salt.state ][INFO ][1092] Running state [pgbouncer-web-login] at time 13:59:10.205771
2016-11-28 13:59:10,207 [salt.state ][INFO ][1092] Executing state service.running for pgbouncer-web-login
2016-11-28 13:59:10,209 [salt.loaded.int.module.cmdmod][INFO ][1092] Executing command ['systemctl', 'status', 'pgbouncer-web-login.service', '-n', '0'] in directory '/root'
2016-11-28 13:59:10,229 [salt.loaded.int.module.cmdmod][DEBUG ][1092] output: * pgbouncer-web-login.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)
2016-11-28 13:59:10,230 [salt.state ][ERROR ][1092] The named service pgbouncer-web-login is not available
The whole idea is to create an /etc/init.d/pgbouncer-web-login daemon which is a modified copy of /etc/init.d/pgbouncer (pgbouncer does not yet support systemd) with different ports, configs, etc., because of the need to run multiple pgbouncer pools - but those are details.
I had no problem on Wheezy, but on Jessie with systemd it seems that I have to execute systemctl daemon-reload (using module.wait -> cmd.run) to make the new init.d script "visible" and service.running work.
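The rough manual sequence (paths from above, simplified) is:

sudo cp /etc/init.d/pgbouncer /etc/init.d/pgbouncer-web-login   # deploy the new sysv script
systemctl status pgbouncer-web-login                            # Loaded: not-found, so service.running refuses to act
sudo systemctl daemon-reload                                    # let systemd (re)generate units from /etc/init.d
systemctl status pgbouncer-web-login                            # now loaded, and service.running works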
But does that mean that service.running should always reload the systemd configuration? Would that be... "bad" in any case?
Still see this same issue:
Salt Version:
Salt: 2016.3.5
Dependency Versions:
cffi: 0.8.6
cherrypy: Not Installed
dateutil: 1.5
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: 0.21.1
Mako: 0.8.1
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: 1.2.5
pycparser: 2.14
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.5 (default, Sep 15 2016, 22:37:39)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4
System Versions:
dist: centos 7.2.1511 Core
machine: x86_64
release: 4.4.52-2.el7.centos.x86_64
system: Linux
version: CentOS Linux 7.2.1511 Core
Using a very simple file.managed + service.running/enable:
/etc/init.d/vxlan:
  file.managed:
    - source: salt://services/vxlan/vxlan
    - user: root
    - group: root
    - mode: 755
    - require_in:
      - service: vxlan

vxlan:
  service.running:
    - enable: True
If I run chkconfig --add vxlan and then re-run these states, there is no problem. BTW, this appears to be a regression, as I don't recall having this issue in 2016.3.4. I haven't tested 2016.11.3, which came out today, as we're not quite ready to move to that yet. Although, I'm inclined to just change this to a systemd service, given that I have full control over this one, regardless of the bug in Salt.
I ran into the same issue with the Cassandra init service on CentOS 7. @gtmanfred suggested using the provider option for service.running, which fixed the issue for me.
https://docs.saltstack.com/en/latest/ref/states/providers.html
start cassandra:
  service.running:
    - name: cassandra
    - provider: rh_service
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.