salt-minion can't restart itself

Created on 25 Jun 2013 · 47 Comments · Source: saltstack/salt

Hi,

When I use the Salt service module, or a watch statement on the minion configuration file, to restart the salt-minion service, it ends up running two instances, which breaks communication between master and minion. (I noticed that the PID is different after the restart command.)

This is required in order to be able to do mass changes on minions like editing mine configuration.

Below you can see how it behaves:

[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root     21507     1  7 16:43 ?        00:00:00 /usr/bin/python /usr/bin/salt-minion -d

root@salt01:# salt mar-pre-ord-web-03 service.restart salt-minion

[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root     21583     1  1 16:44 ?        00:00:00 /usr/bin/python /usr/bin/salt-minion -d
root     21601     1 25 16:44 ?        00:00:00 /usr/bin/python /usr/bin/salt-minion -d

This is the SLS formula, which also doesn't work:

salt-minion:
  pkg:
    - installed
    - name: salt-minion
  file:
    - managed
    - name: /etc/salt/minion
    - template: jinja
    - source: salt://common/files/minion.jinja
    - require:
      - pkg: salt-minion
  service:
    - running
    - enable: True
    - require:
      - pkg: salt-minion
    - watch:
      - file: salt-minion

Test with salt-call:

[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root     22690     1  2 16:59 ?        00:00:00 /usr/bin/python /usr/bin/salt-minion -d
root     22755  2229  0 17:00 pts/0    00:00:00 grep salt
[root@mar-pre-ord-web-03 marconi]# salt-call  service.restart salt-minion
[INFO    ] Configuration file path: /etc/salt/minion
[INFO    ] Executing command '/sbin/runlevel' in directory '/root'
[INFO    ] Executing command '/sbin/chkconfig --list' in directory '/root'
[INFO    ] Executing command '/sbin/service salt-minion restart' in directory '/root'

^C (hangs here)
Exiting gracefully on Ctrl-c
[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root     22788     1 19 17:00 ?        00:00:00 /usr/bin/python /usr/bin/salt-minion -d
root     22821  2229  0 17:00 pts/0    00:00:00 grep salt

If I restart salt-minion on the server it works fine.

[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root     22253     1 18 16:56 ?        00:00:00 /usr/bin/python /usr/bin/salt-minion -d

[root@mar-pre-ord-web-03 marconi]# service salt-minion restart
Stopping salt-minion daemon:                               [  OK  ]
Starting salt-minion daemon:                               [  OK  ]

[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root     22302     1 38 16:56 ?        00:00:00 /usr/bin/python /usr/bin/salt-minion -d
Bug Execution Module P1 Platform severity-medium
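
For anyone reproducing this, the duplicate-daemon symptom can be checked mechanically instead of by eye. The following is a minimal sketch and not part of Salt: `count_salt_minions` is a hypothetical helper that counts salt-minion lines in `ps -fe` output. It assumes a release like the 0.15.x ones in this report, where a healthy minion shows up as a single daemon process; later releases can legitimately run several minion processes, so a count above one is not necessarily a fault there.

```shell
#!/bin/sh
# count_salt_minions (hypothetical helper): read `ps -fe` output on stdin
# and print how many lines mention salt-minion, skipping any grep line.
count_salt_minions() {
    awk '/salt-minion/ && !/grep/ { n++ } END { print n + 0 }'
}

# Usage on a live minion (not executed here):
#   ps -fe | count_salt_minions
```

A result of 2 right after `service.restart` on such a release matches the broken state shown in the report above.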

All 47 comments

Can you show me the results of state.highstate or state.sls with that sls formula? Your test with salt-call was just a service.restart call again.

We definitely want the salt-minion to be able to restart itself, so thanks for the report. What's the output of salt --versions-report on the master, and salt-call --versions-report on the minion?

[root@mar-pre-ord-web-03 tmp]# salt-call --versions-report
           Salt: 0.15.3
         Python: 2.6.6 (r266:84292, Feb 22 2013, 00:00:18)
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 2.2.0.1
            ZMQ: 3.2.3
root@salt01:/srv/salt/marconi/base/pillar# salt --versions-report
           Salt: 0.15.3
         Python: 2.7.4 (default, Apr 19 2013, 18:28:01)
         Jinja2: 2.6
       M2Crypto: 0.21.1
 msgpack-python: 0.2.0
   msgpack-pure: Not Installed
       pycrypto: 2.6
         PyYAML: 3.10
          PyZMQ: 2.2.0.1
            ZMQ: 2.2.0

Right off the bat, I would recommend you update your master's ZMQ to ZMQ 3+. We've had a _lot_ of problems come from ZMQ 2, and they tend to manifest in weird ways. In this case, it's unlikely that it's a ZMQ issue, as you have updated ZMQ on the minion, but if you could upgrade and then test again it would make me feel better. =)

Sure. Unfortunately, same result:

root@salt01:/srv/salt# salt --versions-report
           Salt: 0.15.3
         Python: 2.7.4 (default, Apr 19 2013, 18:28:01)
         Jinja2: 2.6
       M2Crypto: 0.21.1
 msgpack-python: 0.2.0
   msgpack-pure: Not Installed
       pycrypto: 2.6
         PyYAML: 3.10
          PyZMQ: 13.0.0
            ZMQ: 3.2.2
[root@mar-pre-ord-web-03 tmp]# salt-call --versions-report
           Salt: 0.15.3
         Python: 2.6.6 (r266:84292, Feb 22 2013, 00:00:18)
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 2.2.0.1
            ZMQ: 3.2.3

Executing the SLS:

root@salt01:~# salt mar-pre-ord-web-03 state.sls minion -v
Executing job with jid 20130626043517452244
-------------------------------------------

mar-pre-ord-web-03:
    Minion did not return
[root@mar-pre-ord-web-03 tmp]# ps -fe|grep salt
root     25855     1  0 04:35 ?        00:00:01 /usr/bin/python /usr/bin/salt-minion -d
root     25939     1  0 04:35 ?        00:00:00 /usr/bin/python /usr/bin/salt-minion -d

minion/init.sls

salt-minion:
  pkg:
    - installed
    - name: salt-minion
  file:
    - managed
    - name: /etc/salt/minion
    - template: jinja
    - source: salt://minion/files/minion.jinja
    - require:
      - pkg: salt-minion
  service:
    - running
    - enable: True
    - require:
      - pkg: salt-minion
    - watch:
      - file: salt-minion

Thanks. I will try to find time to test this, see if I can reproduce it.

Since salt-minion is restarted, the master loses the connection too, and reports no progress on other states applied later.

Logically, I don't know if you can ask the minion to make sure it is installed and running. What happens if you drop 'running' from the sls and try to restart?

Hey all,

I have run into the same issue. However, my situation may be slightly different. I have a formula that ends with installing a number of packages from a repo that was previously added by a state. The installation fails 100% of the time until the salt-minion service is restarted. Unfortunately, the minion must be restarted from the CLI.

Here's the breakdown:

OS: CentOS 6.4
Master:

[root@master salt]# salt --versions-report
           Salt: 0.16.4
         Python: 2.6.6 (r266:84292, Jul 10 2013, 22:48:45)
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 13.1.0
            ZMQ: 3.2.2

Minion:

[root@minion-3 ~]# salt-call --versions-report
           Salt: 0.16.4
         Python: 2.6.6 (r266:84292, Jul 10 2013, 22:48:45)
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 13.1.0
            ZMQ: 3.2.2

Restarting the minion:

[root@master salt]# salt 'minion-3' service.restart salt-minion

Salt processes before restart:

[root@minion-3 ~]# pgrep salt-minion
24686

Processes on minion while waiting for return on master:

[root@minion-3 ~]# pgrep salt-minion
25018
25035

Restarting minion:

[root@minion-3 ~]# service salt-minion restart
Stopping salt-minion daemon:                               [  OK  ]
Starting salt-minion daemon:                               [  OK  ]
[root@minion-3 ~]# 

And finally a return on the master:

minion-3:
    True
[root@master salt]# 

I upgraded PyZMQ based on the discussion here, but I'm still unable to restart the minion.

In my case, a restart after upgrading the minion from 0.16.4-1 to 0.17.1-1 stops the minion process and leaves no minion running (CentOS 6.4).

Master:

           Salt: 0.17.1
         Python: 2.6.6 (r266:84292, Jul 10 2013, 22:48:45)
         Jinja2: unknown
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 2.2.0.1
            ZMQ: 3.2.3
$ salt 'test*' pkg.install salt-minion
test.vm.xyz.com:
    ----------
    salt:
        ----------
        new:
            0.17.1-1.el6
        old:
            0.16.4-1.el6
    salt-minion:
        ----------
        new:
            0.17.1-1.el6
        old:
            0.16.4-1.el6
    sshpass:
        ----------
        new:
            1.05-1.el6
        old:
$ salt 'test*' pkg.version salt-minion
test.vm.xyz.com:
    0.17.1-1.el6
$ salt 'test*' service.restart salt-minion
test.vm.xyz.com:
    True

I use this technique to restart salt-minion.
http://www.rackeroz.com/2013/10/how-to-restart-salt-minion.html

mickep76 came up with a genius way of installing a new salt-minion or salt-master and restarting salt from salt. This can also be used for when a new schedule is applied to the master. I think this should be in the documentation and/or FAQ.
https://github.com/saltstack/salt/issues/7997

I know this topic is from 2013, but the bug still doesn't seem to be fixed: my minions won't restart either when told to by the master.

master

[root@master]# salt-call --versions-report
           Salt: 2015.2.0rc1
         Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
         Jinja2: unknown
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
        libnacl: Not Installed
         PyYAML: 3.10
          ioflo: Not Installed
          PyZMQ: 14.3.1
           RAET: Not Installed
            ZMQ: 3.2.4
           Mako: Not Installed

minion

[root@minion]# salt-call --versions-report
           Salt: 2014.1.10
         Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
         Jinja2: unknown
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 14.3.1
            ZMQ: 3.2.4

As you can see, the master's and the minion's ZMQ versions are both 3+. I just get the message

    Minion did not return. [No response]

@I3olle What command did you run to restart the minion?

I ran

salt 'minion_id' cmd.run "service salt-minion restart"

Is there a better method that is built into Salt?

You can use the service module.

Thank you @thedrow. I ran

root@master_id ~ $ salt 'minion_id' service.restart salt-minion
minion_id:
    Minion did not return. [No response]

It doesn't seem to change the outcome though

@I3olle using salt to restart a minion's salt-minion service is probably not returning because, well, it restarted the salt-minion, so it dropped the connections. I heard that this can make minions stop connecting to the master.

So it is better to do something like for i in $(sudo salt-key | grep -); do echo ${i};ssh ${i} "sudo service salt-minion restart"; done.
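
The SSH loop in this comment can be wrapped in a small function with a dry-run switch, so the commands can be reviewed before anything is restarted. This is only a sketch under assumptions: `restart_minions_over_ssh` and the `DRY_RUN` variable are hypothetical names, the host list is passed in explicitly rather than parsed from salt-key output, and it presumes key-based SSH access with passwordless sudo on every host.

```shell
#!/bin/sh
# restart_minions_over_ssh HOST... (hypothetical helper): restart the
# salt-minion service on each host over SSH, bypassing Salt entirely.
# With DRY_RUN=1 the ssh commands are printed instead of executed.
restart_minions_over_ssh() {
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh $host sudo service salt-minion restart"
        else
            # Report a failed host on stderr and continue with the rest.
            ssh "$host" "sudo service salt-minion restart" || echo "failed: $host" >&2
        fi
    done
}

# Usage (host names are placeholders, not executed here):
#   DRY_RUN=1 restart_minions_over_ssh web-01 web-02
```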

I'm hitting this "bug" too. So I have some thoughts on this.

What is the current expected behavior?

What about adding a specific state/module to trigger a salt-minion restart after the entire job execution?
It would just output "changes: salt-minion will be restarted after this job execution", continue the job execution, return the job status to the master, and only then would the minion restart itself.

Or salt-minion could work as sshd does: a new process is created to execute the current job, and it kills itself only at the end of the job. A salt-minion restart would not touch existing job processes and would not break minion-master communication. I think it's more difficult to do, but more elegant.

@dr4Ke I found that doing it like this:

salt '*' at.at 'now + 1 minute' 'service salt-minion restart' tag=salt-restart

is the most reliable way to restart minions. But only when you have (or want to have) atd installed everywhere.

I had success using a supervisor like systemd or supervisord. It starts salt up after an upgrade with the added benefit of restarting after an unexpected crash, if that were to occur.

Using atd or anacron or a sleep 300 is a good way to do it on Linux, and on Windows using a scheduled task works pretty well. I also like somebody's suggestion of using salt-call --local service.restart salt-minion as that uses salt but salt-call doesn't require the service or communication with the master.
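
The sleep-based variants mentioned here all rely on the same idea: the restart has to run in a process that is detached from the minion job that launched it, and only after a delay long enough for the job to return. A minimal shell sketch of that idea follows. `run_detached_after` is a hypothetical helper name, not a Salt function, and whether the detached process really survives the service stop depends on the init system (on systemd hosts the service manager may kill everything in the service's control group), so treat that as an assumption to verify on your platform.

```shell
#!/bin/sh
# run_detached_after DELAY CMD... (hypothetical helper): run CMD after
# DELAY seconds in a background process that ignores hangups, so it can
# keep running after the process that scheduled it has exited.
run_detached_after() {
    delay=$1
    shift
    # Inside the inner shell, $0 is the delay and "$@" is the command.
    nohup sh -c 'sleep "$0"; exec "$@"' "$delay" "$@" >/dev/null 2>&1 &
}

# Usage (not executed here):
#   run_detached_after 10 service salt-minion restart
```

The function returns immediately, which is what lets the calling job report back to the master before the restart happens.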

This is a long-lived issue, but it's also a well-known limitation of Salt.
We got it covered in documentation: https://docs.saltstack.com/en/latest/faq.html#what-is-the-best-way-to-restart-a-salt-daemon-using-salt
@ozgurakan @basepi I believe this issue could be closed.

The reason that we have never closed this is because it would be useful if we could harden the restart routines such that a minion could restart itself without at or similar.

@basepi I see that we have at least a couple of duplicates of this issue: #18835 #7997.
Maybe it would be nice to keep just a single issue relevant for the topic?

@vutny #18835 has a different underlying issue. The restart is just a workaround. I commented on #7997 to see if anyone thinks it should remain open.

This should be doable easily enough now with cmd.run_bg from #6691.

Tested this on CentOS 6 and worked fine:

salt myminion cmd.run_bg 'sleep 10; service salt-minion restart'

Take a look at something we just merged a few minutes ago: https://github.com/saltstack/salt/pull/32593

@cachedout - nice! Will that make it in time for Boron or is that going to have to wait for a point release?

That's in develop right now, so it's scheduled for Carbon. I'd be willing to consider a backport. Is it something important for you?

@cachedout - With the workaround via cmd.run_bg, which is in Boron, we won't need a backport.

@sjmh Sounds good. Thanks.

@ozgurakan have you experimented with the minion config parameter master_tries? If you set it to -1 it should retry connecting infinitely rather than giving up on first attempt and becoming a dead minion (if I've understood the documentation properly).
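
If you want to try the master_tries suggestion, the setting can go into a drop-in file instead of editing /etc/salt/minion directly. The sketch below is an illustration under assumptions: `set_minion_option` is a hypothetical helper, the file name `reconnect.conf` is made up, and it relies on the minion's default include of `minion.d/*.conf`. The helper only handles simple top-level `key: value` lines, and the minion still has to be restarted for the change to take effect.

```shell
#!/bin/sh
# set_minion_option FILE KEY VALUE (hypothetical helper): write
# "KEY: VALUE" into a YAML config file, replacing an existing top-level
# KEY line instead of adding a duplicate.
set_minion_option() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Copy every line except an existing top-level "KEY:" entry.
        grep -v "^${key}:" "$file" > "$tmp" || true
    fi
    printf '%s: %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so an existing file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (path is an assumption, not executed here):
#   set_minion_option /etc/salt/minion.d/reconnect.conf master_tries -1
```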

Hi, I had been trying to restart a Windows minion for ages, and finally worked out a way to get it to work every time:

salt '*' cmd.run_bg 'Restart-Service salt-minion' shell=powershell

@Trouble123 But that does not work on Linux; it returns:
Minion did not return. [Not connected]

There is a universal recipe for all platforms supported by Salt:

salt '*' cmd.run_bg 'salt-call --local service.restart salt-minion'

It should work on Windows too even without shell=powershell parameter, as I figured out from the docs.
Could somebody confirm?

If that does work, we should definitely wrap it up into its own exec module function.

@cachedout What about the changes in PR #32593? I think it's possible to specify salt-call --local service.restart salt-minion as the minion_restart_command and make it the default. But again, I'm not completely sure that will work on Windows.

At least for now, we can update the FAQ for 2016.3 with a better recipe.

@vutny I can confirm that your solution works fine on Ubuntu 16.04 LTS and on Windows Server 2012 R2 with one small exception: On Windows, the complete path to salt-call has to be specified. This is strange because C:\salt is in the path grain, but maybe cmd.run_bg does not inherit this path.

Anyway, I think that the solution proposed by @vutny is the one which works best on most platforms and thus the FAQ should be updated accordingly.

I use the following logic for triggering a restart from an SLS file:

minion_config:
  # The logic for updating the minion's configuration goes here...

minion_restart:
  module.run:
    - name: cmd.run_bg
{% if grains['os'] == 'Windows' %}
    - cmd: 'C:\salt\salt-call.bat --local service.restart salt-minion'
{% else %}
    - cmd: 'salt-call --local service.restart salt-minion'
{% endif %}
    - onchanges:
      - file: minion_config

@smarsching Thanks for the feedback! This is valuable information.
Just to be more precise, could you please tell with which exact Salt version you have tested your SLS?

In the meanwhile I will try to come up with PR to update FAQ according to your comment.

@vutny Sure, I tested it with version 2016.11.3.

@smarsching Thanks, one more question: do you use some sleep command before actually restarting the Minion? On Linux, Salt sometimes performs too fast (:smile:) and restarts the service before the Minion is able to report back to the Master. That's why I put sleep 10; before doing salt-call .... I wonder how to put something like this into the Windows command.

@vutny I did not have to add a sleep command (it simply worked correctly without it), but if you want to do something similar to sleep 10; salt-call --local service.restart salt-minion on Linux, the closest thing on Windows is ping -n 10 127.0.0.1 >nul: & C:\salt\salt-call.bat --local service.restart salt-minion.

Ah, never mind, I figured out that we don't need any kind of "sleep" or "timeout". It appears that on some systems (I've tested on CentOS 7) wrapping cmd.run_bg with module.run doesn't work the way we want.
Instead, I've used the cmd.run state function with the bg argument. That works like a charm.

All the tricks have been assembled in PR #39952. But it would be great to have a single function which just hides all the differences between systems. SaltStack is good at this :smile:

I will try this later on:
https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.minion.html#salt.modules.minion.restart
It should support custom installations, virtualenvs and looks system-agnostic.

@smarsching If you could evaluate it on Win platform, that would be much appreciated.

@vutny Sure, if you provide an updated version of minion.restart, I can certainly give it a try on Windows.

I have tested Salt Minion "self-restarting" with 2016.11.8 and it works quite nicely even without the --local option:

$ sudo salt-call -l debug service.restart salt-minion
...
[DEBUG   ] LazyLoaded service.restart
[DEBUG   ] LazyLoaded cmd.run_all
[INFO    ] Executing command ['systemctl', 'status', 'salt-minion.service', '-n', '0'] in directory '/root'
[DEBUG   ] stdout: * salt-minion.service - The Salt Minion
   Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-01-23 09:08:19 UTC; 1 months 26 days ago
     Docs: man:salt-minion(1)
           file:///usr/share/doc/salt/html/contents.html
           https://docs.saltstack.com/en/latest/contents.html
 Main PID: 31 (salt-minion)
   CGroup: /docker/7661046ac38e9413cb86a6d0c1292c9f6e58e00bfc6c7bdaf9e5b084676547db/system.slice/salt-minion.service
           |-  31 /usr/bin/python /usr/bin/salt-minion
           |-1441 /usr/bin/python /usr/bin/salt-minion
           `-1543 /usr/bin/python /usr/bin/salt-minion
[INFO    ] Executing command ['systemctl', 'is-enabled', 'salt-minion.service'] in directory '/root'
[DEBUG   ] output: enabled
[DEBUG   ] Service 'salt-minion' is not masked
[INFO    ] Executing command ['systemd-run', '--scope', 'systemctl', 'restart', 'salt-minion.service'] in directory '/root'
[DEBUG   ] output: Running scope as unit run-7123.scope.
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'c7-minion', 'tcp://172.20.0.2:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'c7-minion', 'tcp://172.20.0.2:4506')
[DEBUG   ] LazyLoaded nested.output
local:
    True

This SLS always succeeds:

test:
  test.succeed_with_changes

salt-minion:
  cmd.run:
    - name: salt-call service.restart salt-minion
    - bg: True
    - onchanges:
      - test: test

The minion.restart function seems to do a slightly different job and is not applicable to the system init service definitions that come with SaltStack's packages.

As well, salt-minion is able to restart the salt-master the same way!
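
To confirm that a restart like the ones in this thread really replaced the daemon, the PID comparison done by hand in the original report can be scripted. The following is a rough sketch, not a Salt feature: `wait_for_pid_change` is a hypothetical helper, and the pidfile path in the usage note (`/var/run/salt-minion.pid`) is an assumption that should be checked against the `pidfile` setting on your minions.

```shell
#!/bin/sh
# wait_for_pid_change PIDFILE OLD_PID [TIMEOUT] (hypothetical helper):
# poll PIDFILE once a second until it holds a PID different from OLD_PID.
# Prints the new PID on success; returns 1 after TIMEOUT seconds (default 30).
wait_for_pid_change() {
    pidfile=$1
    old_pid=$2
    timeout=${3:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # A missing or half-written pidfile is treated as "no PID yet".
        new_pid=$(cat "$pidfile" 2>/dev/null) || new_pid=""
        if [ -n "$new_pid" ] && [ "$new_pid" != "$old_pid" ]; then
            echo "$new_pid"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Usage (pidfile path is an assumption, not executed here):
#   old=$(cat /var/run/salt-minion.pid)
#   salt-call service.restart salt-minion
#   wait_for_pid_change /var/run/salt-minion.pid "$old" 60
```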

With the most recent doc changes explaining the conversation above, I think this is now completely resolved. I'll close this. Thanks all!
