Hi,
When I use the salt service module or a watch statement on the minion configuration file to restart the salt-minion service, it ends up running two instances, which breaks communication between the master and the minion. (I noticed that the PID is different after the restart command.)
This is required in order to be able to do mass changes on minions, like editing the mine configuration.
Below you can see how it behaves:
[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root 21507 1 7 16:43 ? 00:00:00 /usr/bin/python /usr/bin/salt-minion -d
root@salt01:# salt mar-pre-ord-web-03 service.restart salt-minion
[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root 21583 1 1 16:44 ? 00:00:00 /usr/bin/python /usr/bin/salt-minion -d
root 21601 1 25 16:44 ? 00:00:00 /usr/bin/python /usr/bin/salt-minion -d
This was the SLS formula, which also doesn't work:
salt-minion:
  pkg:
    - installed
    - name: salt-minion
  file:
    - managed
    - name: /etc/salt/minion
    - template: jinja
    - source: salt://common/files/minion.jinja
    - require:
      - pkg: salt-minion
  service:
    - running
    - enable: True
    - require:
      - pkg: salt-minion
    - watch:
      - file: salt-minion
Test with salt-call:
[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root 22690 1 2 16:59 ? 00:00:00 /usr/bin/python /usr/bin/salt-minion -d
root 22755 2229 0 17:00 pts/0 00:00:00 grep salt
[root@mar-pre-ord-web-03 marconi]# salt-call service.restart salt-minion
[INFO ] Configuration file path: /etc/salt/minion
[INFO ] Executing command '/sbin/runlevel' in directory '/root'
[INFO ] Executing command '/sbin/chkconfig --list' in directory '/root'
[INFO ] Executing command '/sbin/service salt-minion restart' in directory '/root'
^C (hangs here)
Exiting gracefully on Ctrl-c
[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root 22788 1 19 17:00 ? 00:00:00 /usr/bin/python /usr/bin/salt-minion -d
root 22821 2229 0 17:00 pts/0 00:00:00 grep salt
If I restart salt-minion on the server it works fine.
[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root 22253 1 18 16:56 ? 00:00:00 /usr/bin/python /usr/bin/salt-minion -d
[root@mar-pre-ord-web-03 marconi]# service salt-minion restart
Stopping salt-minion daemon: [ OK ]
Starting salt-minion daemon: [ OK ]
[root@mar-pre-ord-web-03 marconi]# ps -fe|grep salt
root 22302 1 38 16:56 ? 00:00:00 /usr/bin/python /usr/bin/salt-minion -d
Can you show me the results of state.highstate or state.sls with that SLS formula? Your test with salt-call was just a service.restart call again.
We definitely want the salt-minion to be able to restart itself, so thanks for the report. What's the output of salt --versions-report on the master, and salt-call --versions-report on the minion?
[root@mar-pre-ord-web-03 tmp]# salt-call --versions-report
Salt: 0.15.3
Python: 2.6.6 (r266:84292, Feb 22 2013, 00:00:18)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 2.2.0.1
ZMQ: 3.2.3
root@salt01:/srv/salt/marconi/base/pillar# salt --versions-report
Salt: 0.15.3
Python: 2.7.4 (default, Apr 19 2013, 18:28:01)
Jinja2: 2.6
M2Crypto: 0.21.1
msgpack-python: 0.2.0
msgpack-pure: Not Installed
pycrypto: 2.6
PyYAML: 3.10
PyZMQ: 2.2.0.1
ZMQ: 2.2.0
Right off the bat, I would recommend you update your master's ZMQ to ZMQ 3+. We've had a _lot_ of problems come from ZMQ 2, and they tend to manifest in weird ways. In this case, it's unlikely that it's a ZMQ issue, since you have updated ZMQ on the minion, but if you could upgrade and then test again it would make me feel better. =)
Sure; unfortunately, same result:
root@salt01:/srv/salt# salt --versions-report
Salt: 0.15.3
Python: 2.7.4 (default, Apr 19 2013, 18:28:01)
Jinja2: 2.6
M2Crypto: 0.21.1
msgpack-python: 0.2.0
msgpack-pure: Not Installed
pycrypto: 2.6
PyYAML: 3.10
PyZMQ: 13.0.0
ZMQ: 3.2.2
[root@mar-pre-ord-web-03 tmp]# salt-call --versions-report
Salt: 0.15.3
Python: 2.6.6 (r266:84292, Feb 22 2013, 00:00:18)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 2.2.0.1
ZMQ: 3.2.3
Executing the SLS:
root@salt01:~# salt mar-pre-ord-web-03 state.sls minion -v
Executing job with jid 20130626043517452244
-------------------------------------------
mar-pre-ord-web-03:
Minion did not return
[root@mar-pre-ord-web-03 tmp]# ps -fe|grep salt
root 25855 1 0 04:35 ? 00:00:01 /usr/bin/python /usr/bin/salt-minion -d
root 25939 1 0 04:35 ? 00:00:00 /usr/bin/python /usr/bin/salt-minion -d
minion/init.sls
salt-minion:
  pkg:
    - installed
    - name: salt-minion
  file:
    - managed
    - name: /etc/salt/minion
    - template: jinja
    - source: salt://minion/files/minion.jinja
    - require:
      - pkg: salt-minion
  service:
    - running
    - enable: True
    - require:
      - pkg: salt-minion
    - watch:
      - file: salt-minion
Thanks. I will try to find time to test this and see if I can reproduce it.
Since salt-minion is restarted, the master loses the connection too, reporting no progress on other states applied later.
Logically, I don't know if you can ask the minion to make sure it is installed and running. What happens if you drop 'running' from the SLS and try to restart?
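Dropping 'running' might look something like the following (just a sketch of the suggestion, using the service.enabled state function so the minion still manages enablement without trying to restart itself; untested):

```yaml
# Sketch: keep the service enabled, but don't ask the minion to
# keep itself 'running' (and thus restart itself mid-job).
salt-minion:
  service:
    - enabled
    - require:
      - pkg: salt-minion
```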
Hey all,
I have run into the same issue. However, my situation may be slightly different. I have a formula that ends with installing a number of packages from a repo that was previously added by a state. The installation fails 100% of the time until the salt-minion service is restarted. Unfortunately, the minion must be restarted from the CLI.
Here's the breakdown:
OS: CentOS 6.4
Master:
[root@master salt]# salt --versions-report
Salt: 0.16.4
Python: 2.6.6 (r266:84292, Jul 10 2013, 22:48:45)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 13.1.0
ZMQ: 3.2.2
Minion:
[root@minion-3 ~]# salt-call --versions-report
Salt: 0.16.4
Python: 2.6.6 (r266:84292, Jul 10 2013, 22:48:45)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 13.1.0
ZMQ: 3.2.2
Restarting the minion:
[root@master salt]# salt 'minion-3' service.restart salt-minion
Salt processes before restart:
[root@minion-3 ~]# pgrep salt-minion
24686
Processes on minion while waiting for return on master:
[root@minion-3 ~]# pgrep salt-minion
25018
25035
Restarting minion:
[root@minion-3 ~]# service salt-minion restart
Stopping salt-minion daemon: [ OK ]
Starting salt-minion daemon: [ OK ]
[root@minion-3 ~]#
And finally a return on the master:
minion-3:
True
[root@master salt]#
I upgraded PyZMQ based on the discussion here, but I'm still unable to restart the minion.
In my case, a restart after a minion upgrade from 0.16.4-1 to 0.17.1-1 stops the minion process and no minion is left running (CentOS 6.4).
Master:
Salt: 0.17.1
Python: 2.6.6 (r266:84292, Jul 10 2013, 22:48:45)
Jinja2: unknown
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 2.2.0.1
ZMQ: 3.2.3
$ salt 'test*' pkg.install salt-minion
test.vm.xyz.com:
    ----------
    salt:
        ----------
        new:
            0.17.1-1.el6
        old:
            0.16.4-1.el6
    salt-minion:
        ----------
        new:
            0.17.1-1.el6
        old:
            0.16.4-1.el6
    sshpass:
        ----------
        new:
            1.05-1.el6
        old:
$ salt 'test*' pkg.version salt-minion
test.vm.xyz.com:
0.17.1-1.el6
$ salt 'test*' service.restart salt-minion
test.vm.xyz.com:
True
I use this technique to restart salt-minion.
http://www.rackeroz.com/2013/10/how-to-restart-salt-minion.html
mickep76 came up with a genius way of installing a new salt-minion or salt-master and restarting salt from salt. This can also be used for when a new schedule is applied to the master. I think this should be in the documentation and/or FAQ.
https://github.com/saltstack/salt/issues/7997
I know this topic is from 2013, but the bug still doesn't seem to be fixed? My minions won't restart either when told to by the master.
master
[root@master]# salt-call --versions-report
Salt: 2015.2.0rc1
Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
Jinja2: unknown
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
libnacl: Not Installed
PyYAML: 3.10
ioflo: Not Installed
PyZMQ: 14.3.1
RAET: Not Installed
ZMQ: 3.2.4
Mako: Not Installed
minion
[root@minion]# salt-call --versions-report
Salt: 2014.1.10
Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
Jinja2: unknown
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 14.3.1
ZMQ: 3.2.4
As you can see, the master's and the minion's ZMQ are both 3+. I just get the message:
Minion did not return. [No response]
@I3olle What command did you run to restart the minion?
I ran
salt 'minion_id' cmd.run "service salt-minion restart"
Is there a better method implemented by Salt?
You can use the service module.
Thank you @thedrow. I ran
root@master_id ~ $ salt 'minion_id' service.restart salt-minion
minion_id:
Minion did not return. [No response]
It doesn't seem to change the outcome, though.
@I3olle using salt to restart a minion's salt-minion service probably doesn't return because, well, it restarted the salt-minion, so it dropped the connection. I've heard that this can make minions stop connecting to the master.
So it's better to do something like for i in $(sudo salt-key | grep -); do echo ${i}; ssh ${i} "sudo service salt-minion restart"; done.
I'm hitting this "bug" too, so I have some thoughts on this.
What is the current expected behavior?
What about adding a specific state/module to trigger a salt-minion restart after the entire job execution?
It would just output "changes: salt-minion will be restarted after this job execution", continue the job execution, return the job status to the master, and only then would the minion restart itself.
Or salt-minion could work the way sshd does: a new process is created to execute the current job, and it kills itself only at the end of the job. A salt-minion restart would not touch existing job processes and would not break minion-master communication. I think it's more difficult to do, but more elegant.
@dr4Ke I found that doing it like this:
salt '*' at.at 'now + 1 minute' 'service salt-minion restart' tag=salt-restart
is the most reliable way to restart minions, but only if you have (or want to have) atd installed everywhere.
I had success using a supervisor like systemd or supervisord. It starts Salt back up after an upgrade, with the added benefit of restarting after an unexpected crash, if that were to occur.
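As a sketch of the supervisor approach, assuming a systemd-based distribution (the unit name salt-minion.service matches SaltStack's own packages; the drop-in path follows standard systemd conventions):

```ini
# /etc/systemd/system/salt-minion.service.d/restart.conf
# Have systemd bring the minion back up if it stops unexpectedly.
[Service]
Restart=on-failure
RestartSec=5
```

After adding the drop-in, run systemctl daemon-reload so systemd picks it up.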
Using atd, anacron, or a sleep 300 is a good way to do it on Linux, and on Windows a scheduled task works pretty well. I also like somebody's suggestion of using salt-call --local service.restart salt-minion, since that uses Salt, but salt-call doesn't require the service or communication with the master.
This is a long-lived issue, but it's also a well-known limitation of Salt.
We got it covered in documentation: https://docs.saltstack.com/en/latest/faq.html#what-is-the-best-way-to-restart-a-salt-daemon-using-salt
@ozgurakan @basepi I believe this issue could be closed.
The reason that we have never closed this is because it would be useful if we could harden the restart routines such that a minion could restart itself without at or similar.
@basepi I see that we have at least a couple of duplicates of this issue: #18835 #7997.
Maybe it would be nice to keep just a single issue open for the topic?
@vutny #18835 has a different underlying issue. The restart is just a workaround. I commented on #7997 to see if anyone thinks it should remain open.
This should be doable easily enough now with cmd.run_bg from #6691.
Tested this on CentOS 6 and it worked fine:
salt myminion cmd.run_bg 'sleep 10; service salt-minion restart'
Take a look at something we just merged a few minutes ago: https://github.com/saltstack/salt/pull/32593
@cachedout - nice! Will that make it in time for Boron, or is that going to have to wait for a point release?
That's in develop right now, so it's scheduled for Carbon. I'd be willing to consider a backport. Is it something important for you?
@cachedout - With the workaround via cmd.run_bg, which is in Boron, we won't need a backport.
@sjmh Sounds good. Thanks.
@ozgurakan have you experimented with the minion config parameter master_tries? If you set it to -1, it should retry connecting indefinitely rather than giving up on the first attempt and becoming a dead minion (if I've understood the documentation properly).
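If I've read the docs right, that is a one-line change in the minion config (master_tries is the documented option name; -1 means retry forever):

```yaml
# /etc/salt/minion
# Keep retrying the master connection instead of giving up
# after the first failed attempt.
master_tries: -1
```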
Hi, I had been trying to restart a Windows minion for ages, and I finally worked out a way to get it to work every time:
salt '*' cmd.run_bg 'Restart-Service salt-minion' shell=powershell
@Trouble123 but that doesn't work for Linux; it returns
Minion did not return. [Not connected]
There is a universal recipe for all platforms supported by Salt:
salt '*' cmd.run_bg 'salt-call --local service.restart salt-minion'
It should work on Windows too, even without the shell=powershell parameter, as I figured out from the docs.
Could somebody confirm?
If that does work, we should definitely wrap it up into its own exec module function.
@cachedout What about the changes in PR #32593? I think it's possible to specify salt-call --local service.restart salt-minion as the minion_restart_command and make it the default. But again, I'm not completely sure that will work on Windows.
At least for now we can update the FAQ for 2016.3 with a better recipe.
@vutny I can confirm that your solution works fine on Ubuntu 16.04 LTS and on Windows Server 2012 R2, with one small exception: on Windows, the complete path to salt-call has to be specified. This is strange because C:\salt is in the path grain, but maybe cmd.run_bg does not inherit this path.
Anyway, I think that the solution proposed by @vutny is the one which works best on most platforms and thus the FAQ should be updated accordingly.
I use the following logic to trigger a restart from an SLS file:
minion_config:
  # The logic for updating the minion's configuration goes here...

minion_restart:
  module.run:
    - name: cmd.run_bg
    {% if grains['os'] == 'Windows' %}
    - cmd: 'C:\salt\salt-call.bat --local service.restart salt-minion'
    {% else %}
    - cmd: 'salt-call --local service.restart salt-minion'
    {% endif %}
    - onchanges:
      - file: minion_config
@smarsching Thanks for the feedback! This is valuable information.
Just to be more precise, could you please tell me with which exact Salt version you have tested your SLS?
In the meantime, I will try to come up with a PR to update the FAQ according to your comment.
@vutny Sure, I tested it with version 2016.11.3.
@smarsching Thanks, one more question: do you use some sleep command before actually restarting the minion? On Linux, Salt sometimes performs too fast (:smile:) and restarts the service before the minion is able to report back to the master. That's why I put sleep 10; before doing salt-call .... I wonder how to do something like this in the Windows command.
@vutny I did not have to add a sleep command (it simply worked correctly without it), but if you want to do something similar to sleep 10; salt-call --local service.restart salt-minion on Linux, the closest thing on Windows is ping -c 10 127.0.0.1 >nul: & C:\salt\salt-call.bat --local service.restart salt-minion.
Ah, never mind, I figured out that we don't need any kind of "sleep" or "timeout". It appears that on some systems (I've tested on CentOS 7) wrapping cmd.run_bg with module.run doesn't work the way we want. Instead, I've used the cmd.run state function with the bg argument. That works like a charm.
All the tricks have been assembled in PR #39952. But it would be great to have a single function which just hides all the complexity across systems; SaltStack is good at this :smile:
I will try this later on:
https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.minion.html#salt.modules.minion.restart
It should support custom installations and virtualenvs, and it looks system-agnostic.
@smarsching If you could evaluate it on Win platform, that would be much appreciated.
@vutny Sure, if you provide an updated version of minion.restart, I can certainly give it a try on Windows.
I have tested Salt Minion "self-restarting" with 2016.11.8 and it works quite nicely even without the --local option:
$ sudo salt-call -l debug service.restart salt-minion
...
[DEBUG ] LazyLoaded service.restart
[DEBUG ] LazyLoaded cmd.run_all
[INFO ] Executing command ['systemctl', 'status', 'salt-minion.service', '-n', '0'] in directory '/root'
[DEBUG ] stdout: * salt-minion.service - The Salt Minion
Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2018-01-23 09:08:19 UTC; 1 months 26 days ago
Docs: man:salt-minion(1)
file:///usr/share/doc/salt/html/contents.html
https://docs.saltstack.com/en/latest/contents.html
Main PID: 31 (salt-minion)
CGroup: /docker/7661046ac38e9413cb86a6d0c1292c9f6e58e00bfc6c7bdaf9e5b084676547db/system.slice/salt-minion.service
|- 31 /usr/bin/python /usr/bin/salt-minion
|-1441 /usr/bin/python /usr/bin/salt-minion
`-1543 /usr/bin/python /usr/bin/salt-minion
[INFO ] Executing command ['systemctl', 'is-enabled', 'salt-minion.service'] in directory '/root'
[DEBUG ] output: enabled
[DEBUG ] Service 'salt-minion' is not masked
[INFO ] Executing command ['systemd-run', '--scope', 'systemctl', 'restart', 'salt-minion.service'] in directory '/root'
[DEBUG ] output: Running scope as unit run-7123.scope.
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'c7-minion', 'tcp://172.20.0.2:4506', 'aes')
[DEBUG ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'c7-minion', 'tcp://172.20.0.2:4506')
[DEBUG ] LazyLoaded nested.output
local:
True
This SLS always gives success:
test:
  test.succeed_with_changes

salt-minion:
  cmd.run:
    - name: salt-call service.restart salt-minion
    - bg: True
    - onchanges:
      - test: test
The minion.restart function seems to do a slightly different job and is not applicable to the system init service definitions that come from SaltStack's packages.
As well, salt-minion is able to restart the salt-master the same way!
With the most recent doc changes explaining the conversation above, I think this is now completely resolved. I'll close this. Thanks all!