Salt: Proper way to upgrade salt-minions / salt-master packages without losing minion connectivity

Created on 21 Oct 2013  ·  71 Comments  ·  Source: saltstack/salt

We ran through the right way of doing this in salt training with Seth, but I think I'm still missing something. I'm not sure if this is a bug or if I've missed a step. I tried to follow the upgrade-the-master-first / use-salt-to-upgrade-the-minions procedure to go from 0.17 to 0.17.1 of salt and ended up losing access to most of my minions.

Long story short, I need a reliable way of upgrading all of the salt-minion and salt-master packages without losing access to the minions. From what I can tell, every time I perform such an upgrade I lose access to some, if not all, of my minions and need to log in to each host/VM and restart the salt-minion service. This is doable in test/dev where we have 30 nodes being managed, but not when I move this infrastructure to prod where I have over 200 nodes to manage. I need the upgrade path not to break the remote execution framework established between minions and master.

So without further ado here's what I did:

Update the master.

[root@salt-master ~]# yum list updates
Loaded plugins: security 
epel                                                                                                | 3.0 kB     00:00
epel/primary_db                                                                                     | 6.2 MB     00:00
epel-testing                                                                                        | 2.9 kB     00:00
epel-testing/primary_db                                                                             | 2.2 MB     00:00
rhel-localrepo                                                                                      | 3.0 kB     00:00
rhel-localrepo/primary_db                                                                           |  26 MB     00:00
Updated Packages         
glibc.x86_64                                         2.12-1.107.el6_4.5                                      rhel-localrepo
glibc-common.x86_64                                  2.12-1.107.el6_4.5                                      rhel-localrepo
glibc-devel.x86_64                                   2.12-1.107.el6_4.5                                      rhel-localrepo
glibc-headers.x86_64                                 2.12-1.107.el6_4.5                                      rhel-localrepo
java-1.6.0-openjdk.x86_64                            1:1.6.0.0-1.65.1.11.13.el6_4                            rhel-localrepo
kernel.x86_64                                        2.6.32-358.23.2.el6                                     rhel-localrepo
kernel-firmware.noarch                               2.6.32-358.23.2.el6                                     rhel-localrepo
kernel-headers.x86_64                                2.6.32-358.23.2.el6                                     rhel-localrepo
libtar.x86_64                                        1.2.11-17.el6_4.1                                       rhel-localrepo
nscd.x86_64                                          2.12-1.107.el6_4.5                                      rhel-localrepo
perf.x86_64                                          2.6.32-358.23.2.el6                                     rhel-localrepo
salt.noarch                                          0.17.1-1.el6                                            epel-testing
salt-master.noarch                                   0.17.1-1.el6                                            epel-testing
salt-minion.noarch                                   0.17.1-1.el6                                            epel-testing
setup.noarch                                         2.8.14-20.el6_4.1                                       rhel-localrepo
tzdata.noarch                                        2013g-1.el6                                             rhel-localrepo
tzdata-java.noarch                                   2013g-1.el6                                             rhel-localrepo
You have new mail in /var/spool/mail/root
[root@salt-master ~]# yum update -y

I restart the master and minion on my master VM.

[root@salt-master ~]# service salt-master restart
Stopping salt-master daemon:                               [  OK  ]
Starting salt-master daemon:                               [  OK  ]
[root@salt-master ~]# service salt-minion restart
Stopping salt-minion daemon:                               [  OK  ]
Starting salt-minion daemon:                               [  OK  ]

Try to upgrade some of my test minion VMs.

[root@salt-master ~]# salt 'salt-minion*' pkg.upgrade
[root@salt-master ~]# salt 'salt-minion*' pkg.list_upgrades

[root@salt-master ~]# salt -v 'salt-minion*' test.ping
Executing job with jid 20131021102016190263
-------------------------------------------

salt-minion-00:
    Minion did not return
salt-minion-01:
    Minion did not return

I login to each minion VM and restart the salt-minion service.

[root@salt-minion-01 ~]# service salt-minion restart
Stopping salt-minion daemon:                               [FAILED]
Starting salt-minion daemon:                               [  OK  ]
[root@salt-minion-01 ~]# chkconfig --list | grep salt-minion
salt-minion     0:off   1:off   2:on    3:on    4:on    5:on    6:off

Now I can ping the VMs again.

[root@salt-master ~]# salt -v 'salt-minion*' test.ping
Executing job with jid 20131021102229314417
-------------------------------------------

salt-minion-01:
    True                 
salt-minion-00:
    True  

Versions reports:

[root@salt-master ~]# salt --versions-report
           Salt: 0.17.1
         Python: 2.6.6 (r266:84292, May 27 2013, 05:35:12)
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 2.2.0.1
            ZMQ: 3.2.4

[root@salt-minion-00 ~]# salt-call --versions-report
           Salt: 0.17.1
         Python: 2.6.6 (r266:84292, May 27 2013, 05:35:12)
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 2.2.0.1
            ZMQ: 3.2.4

[root@salt-minion-01 ~]# salt-call --versions-report
           Salt: 0.17.1
         Python: 2.6.8 (unknown, Nov  7 2012, 14:47:45)
         Jinja2: unknown
       M2Crypto: 0.21.1
 msgpack-python: 0.1.12
   msgpack-pure: Not Installed
       pycrypto: 2.3
         PyYAML: 3.08
          PyZMQ: 2.1.9
            ZMQ: 2.2.0

You'll notice that the package upgrade itself proceeded correctly. The packages were upgraded, but the salt-minion services were not restarted as part of the upgrade process (on both minion VMs; one is RHEL5 and the other RHEL6). Unfortunately, I didn't think to run the package upgrade command in verbose mode at the time.

Do I need to find some external remote-execution method to restart all of the minions post-upgrade (mussh/omnitty, etc...)? This is probably not a bug but it's still very frustrating... I'm unlikely to upgrade again until I can figure out how to do this properly.

Labels: Documentation, P1, help-wanted

Most helpful comment

Restarting the minion process from inside a state running against that minion process (e.g. with a service / watch type state rule) will always fail. It's hard to imagine getting it to work without either a moderate re-architecture of the salt workflow or some cumbersome custom code inside the state machine to handle this special case.

That said, the following has been allowing flawless upgrades for me since I started deploying minions around 0.16.1, up to 0.17.4 we're running now:

salt-minion-reload:
  cmd.wait:
    - name: echo service salt-minion restart | at now + 5 minutes
    - watch:
      - file: /etc/salt/minion
      - pkg: salt-minion-pkgs

Obviously, you can set the wait time to whatever you want -- just make sure it's long enough that the minion proc doesn't get whacked during the current run...

All 71 comments

you may need my fix (already on git/develop) for #7987

Was this always a problem, or is it something specific to the 0.17 -> 0.17.1 upgrade? I've personally never gotten it to work reliably across all minions with past version upgrades, but I had assumed I was going about it the wrong way.

I just tried an alternative upgrade approach. Unfortunately, it didn't work either.

[root@salt-master ~]# salt 'test*' cmd.run "yum update -y"
[root@salt-master ~]# salt 'test*' service.restart salt-minion
[root@salt-master ~]# salt 'test*' -v test.ping
Executing job with jid 20131021110335741192
-------------------------------------------

test:
    Minion did not return

[root@test ~]# service salt-minion status
salt-minion is stopped
[root@test ~]# service salt-minion start
Starting salt-minion daemon:                               [  OK  ]

[root@salt-master ~]# salt 'test*' -v test.ping
Executing job with jid 20131021110634961121
-------------------------------------------

test:
    True

The package upgrade seems to stop the running service.

This process has never been as consistent or stable as we would like it to be. But the first thing you need to do is make sure all of your minions have ZMQ 3.2 or higher. That minion that you listed above with ZMQ 2 is definitely going to cause problems with keeping the connection alive or reconnecting.

The rest of the process tends to depend on the init system in question and a lot of other factors. As soon as we get the general bug count under control, we want to dedicate some resources to solving this upgrade problem for good.

I don't have much of a choice here unless I deviate from the repos. ZMQ3 is only available in epel for RHEL6. I have the latest version of ZMQ offered for RHEL5 which happens to be the 2.2 version listed above.

Minion01 replicates our legacy RHEL5 nodes (which we have a lot of) and I'm using it as a control for testing issues.

Minion00 is running RHEL6 and mimics the configuration of our latest app/service deployments (and where we'd like to transition everything once I have enough cycles to complete migrations).

I should also say, you're right. Most of my woes with upgrades, and with losing minions when the salt-master restarts, have been with RHEL5 and the legacy ZMQ. That said, this latest upgrade to 0.17.1 hasn't worked cleanly for any client. Every single upgrade has stopped the salt-minion service and required me to log in and restart the service.

Can I get around this by using salt-ssh to restart the minion services? I could use an example of how to use salt-ssh. I'm not entirely clear what user / keys / password it uses (or if this is all hidden under the covers with salt's key management system) or how to issue commands. The docs I've run across don't have too many examples of executing shell commands. I'm presuming using the -r flag (raw) works much like "ssh -t"?
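(For illustration, a minimal salt-ssh invocation might look like the sketch below; the roster path is the default, while the hostname, user, and key path are assumptions, and -r/--raw just runs the given string as a shell command over SSH.)

# /etc/salt/roster entry for one target (hypothetical values)
cat >> /etc/salt/roster <<'EOF'
salt-minion-01:
  host: salt-minion-01.example.org
  user: root
  priv: /root/.ssh/id_rsa
EOF

# Raw mode runs an arbitrary shell command over SSH, no running minion required
salt-ssh -r 'salt-minion-01' 'service salt-minion restart'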

Hi,

I have a workaround for this:

For my Linux minions I set up a cron job to restart the minion every night.

{% if grains['kernel'] == 'Linux' %}
/etc/cron.daily/restart-minion.sh:
  file:                                     # state declaration
    - managed                               # function
    - mode: 755
    - source: salt://tools/restart-minion/restart-minion.sh   # function arg
{% endif %}
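The restart-minion.sh source isn't shown above; a hypothetical version, assuming it does nothing more than bounce the service, could be as small as:

#!/bin/bash
# hypothetical restart-minion.sh: bounce the minion nightly, start it if it was down
/sbin/service salt-minion restart >/dev/null 2>&1 || /sbin/service salt-minion start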

On Windows, same thing: I add a scheduled task to restart the minion.

{% if grains['os'] == 'Windows' %}
addtask:
  cmd.run:
    - name: schtasks /f /create /tn "restart_salt" /ru System /tr "c:\salt\salt-call.exe service.restart salt-minion" /sc daily /st 02:00
{% endif %}

Regards

That's actually quite clever and I'm going to steal that idea.

That said, it would not have helped with the 0.17 to 0.17.1 upgrade via the epel-testing packages. From what I'm seeing, the package upgrade itself stops the running minion service (whether via yum or from within salt's framework, which makes sense since salt calls yum's methods), which is very peculiar behavior; I don't recall seeing this happen before with any daemon installed from rpm packages, but I've only done 4-5 upgrades so far.

I'm going to run the upgrade in verbose mode and see if I can find any other artifacts but need to troubleshoot some 10GbE networking problems we're having first.

Maybe I dismissed the idea too early... I could extend your example and poll to see if the service is running (every 5 minutes or something) and if it is not, start the salt-minion.

That seems a little overkill but would resolve this specific issue.

I should be able to use salt-ssh to log in to all of the minions and start the minion service manually, though, right?

@shantanub, I looked at the spec file. There is some code to stop salt-minion before the upgrade and, if it is an upgrade, to restart the minion with service salt-minion condrestart.

I don't know why this part of the code doesn't work. Before the 0.17 release, the minion wasn't stopped during an upgrade.
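For context, the conventional RHEL scriptlet pattern being described looks roughly like the shell below (a sketch of the usual %preun/%postun logic, not the actual salt spec file):

# %preun-style scriptlet: $1 is the count of versions that will remain installed
if [ "$1" -eq 0 ]; then
    # 0 means full uninstall: stop and deregister the service
    /sbin/service salt-minion stop >/dev/null 2>&1 || :
    /sbin/chkconfig --del salt-minion
fi

# %postun-style scriptlet: $1 >= 1 means this was an upgrade, so conditionally restart
if [ "$1" -ge 1 ]; then
    /sbin/service salt-minion condrestart >/dev/null 2>&1 || :
fi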

Strange. I wonder if we changed something between 0.17.0 and 0.17.1 that would have caused this change.

@equinoxefr I can't see any recent changes that affected whether the salt process was stopped or not. It looks like the stop of the salt-minion has been in there for a while (at least since the 0.16 release, I didn't go earlier). Just wondering where you were looking to see that change.

@shantanub You've had successful upgrades from epel before, then? But not 0.17.0-0.17.1?

@basepi I didn't see any change in the code; I have seen a change in the behavior of the minion upgrade (on Linux RPM).
Before 0.17.1, the salt-minion process wasn't restarted after an upgrade. If you upgraded from 0.16.3 to 0.17 and ran test.version just afterwards, salt-minion would report 0.16.3, not 0.17. After a manual restart it works and says 0.17...

Now with 0.17.1, salt-minion is stopped and not restarted. Perhaps the piece of code that uses the state of the RPM operation doesn't work.

0.16.0 -> 0.16.3 = Minion not restarted
0.16.3 -> 0.17 = Minion not restarted
0.17 -> 0.17.1 = Minion stopped

I did my tests on centos 6.4.

I don't know why but something has changed ;-)

Hrm, well, it doesn't _appear_ to be in the spec file, so it must be somewhere else. Maybe the init script?

@basepi That's correct. I've always upgraded and used the epel/epel-testing rpms to install salt so far. This is the first time I've noticed the minion stopped after/during the upgrade (this is something that would have been obvious since I would have to login to every minion and restart the service). The upgrade itself seems to have executed fine in every other regard that I can tell (no errors, etc...).

Now, I have in the past lost all of the rhel5 minions when the salt-master service is restarted. I haven't had a problem with that in a few versions, but as I mentioned above, I very much would like to depart from rhel5 as soon as possible.

Upon upgrading to 0.17.2, it looks like the minion restarts as a part of the upgrade just fine.

Has this issue with the package upgrade been resolved?

As a fail-safe, I "start" my minions every 5 minutes via cron just in case they're down for some reason or another. I'll need to add the windows specific scheduler task as well.
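That fail-safe is nothing fancier than a cron entry along these lines (hypothetical file name; a service start is harmless when the minion is already running):

# /etc/cron.d/salt-minion-failsafe
*/5 * * * * root /sbin/service salt-minion start >/dev/null 2>&1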

I would still like a definitive guide for how to upgrade minions and the salt-master. We're moving salt to production once 0.17 is available in epel (as opposed to epel-testing), and I'd very much like upgrades to go smoothly.

In general, the upgrades themselves tend to go swimmingly. The problem is the restart after the upgrade. We have an open issue specifically for the restarting of the minion: #5721

The issue also varies in severity from system to system (specifically between different init systems). Making it so the salt minion can restart itself consistently is high on our priority list.

Are you sure the minion upgrade/restart problem isn't just a bug in the rpm post install script? That's where it should be restarted ... did you have a look at the source rpm?

Restarting the minion process from inside a state running against that minion process (e.g. with a service / watch type state rule) will always fail. It's hard to imagine getting it to work without either a moderate re-architecture of the salt workflow or some cumbersome custom code inside the state machine to handle this special case.

That said, the following has been allowing flawless upgrades for me since I started deploying minions around 0.16.1, up to 0.17.4 we're running now:

salt-minion-reload:
  cmd.wait:
    - name: echo service salt-minion restart | at now + 5 minutes
    - watch:
      - file: /etc/salt/minion
      - pkg: salt-minion-pkgs

Obviously, you can set the wait time to whatever you want -- just make sure it's long enough that the minion proc doesn't get whacked during the current run...

Using at is a great idea. And if you put order: last onto the state, then you can put it at + 1 minute or something without much fear of it cutting the process off.

Ah, 'order: last' is a damned fine idea. Ashamed I didn't think of it myself...

Well, I can't believe I've never thought of using at! Thanks for your awesome workaround (until we can get this working properly without it, of course)

@tkwilliams: Woah cool. That's really helpful. Now you mentioned you guys watch your minion file as well. Does that mean you specify the contents of the minion file for every host/vm? Are you just pulling the fqdn from the environment for the contents of that file or doing something else?

I'm having a lot of trouble with renaming minions (I have the master copy the keys and change the contents of the minion file on the minion, but a simple restart of the salt-minion service doesn't seem to be sufficient to get the master talking to the minion under the new hostname).

This is an artifact of our kickstart setup. All of our hosts start up with a name that looks like "preconfig-macaddr.domain.org", where macaddr is the MAC address of the primary kickstarted interface (aa-bb-cc-dd-ee-ff). We then set the hostname and role of each host via script, but this has been a little painful since salt doesn't seem to readily want to move to the new hostname. Rebooting the host/VM after the change seems to work, but I'd prefer not to have to do that.

I'll experiment with this little wrinkle and see if it helps with renaming minions.

The "at now" trick didn't work for renaming minions on rhel6 unfortunately. Something is caching the old hostname even though I've changed it just about everywhere I can imagine.

Oh well, back to restarting vms/hosts when I rename them.

Are you also deleting the /etc/salt/minion_id file so that the minion isn't caching the old name?

Nope. I'm putting the new hostname in that file and it doesn't seem to be doing anything without a reboot.

I've actually passed along to Seth the actual scripts I'm using to perform the name change. Feel free to see if I'm doing something silly. He thinks there may be a timing issue I've glossed over.

You do need to restart the minion to change the minion ID. Don't know if "without a reboot" meant system reboot or minion restart.

@basepi: I mean a system reboot/restart of the minion host/vm is required. As I mentioned, restarting the minion service after the name change (even with 'at now + 1 minutes', several different sleep lengths, etc.) doesn't get the minion to show up on the master as up.

An interesting factoid: if I revert the keys on the master back to the original minion hostname and restart the minion, the minion shows up on the master just fine (this is without changing the contents of the minion_id file or the actual hostname of the minion, which now point to the new hostname). So something is being cached somewhere and I'm not sure why/what. I don't run nscd, if that matters.

Just so we're all on the same page here's what I do: I have 2 hostname change scripts. One that resides on and is called from the master and one on each minion. The master calls the minion script as a part of its script via salt's remote execution framework:

rename-minion.sh script run on salt-master:

#!/bin/bash

DOMAIN={{ domainsuffix }} 

die () {
    /bin/echo >&2 "$@"
    exit 1
}

[ "$#" -eq 2 ] || die "2 arguments required, $# provided"

/bin/echo $1
/bin/echo $2

orig_hostname="$1.${DOMAIN}"
new_hostname="$2.${DOMAIN}"

/bin/echo $orig_hostname
/bin/echo $new_hostname

path_to_keys="/etc/salt/pki/master/minions"

if [ -f "$path_to_keys/$orig_hostname" ]; then
    /bin/cp -a $path_to_keys/$orig_hostname $path_to_keys/$new_hostname

    # change name on minion
    /usr/bin/salt -v $orig_hostname cmd.run "/managed/scripts/set_hostname.sh $2 ${DOMAIN}"

    # /bin/sleep 10s
    /bin/rm -f $path_to_keys/$orig_hostname
fi

set_hostname.sh script called on minion:

#!/bin/bash

die () {
    /bin/echo >&2 "$@"
    exit 1
}

[ "$#" -eq 2 ] || die "2 arguments required, $# provided"

/bin/echo $1
/bin/echo $2

DOMAIN="$2"
hostname="$1.${DOMAIN}"

/bin/echo $hostname
/bin/cp -a /etc/sysconfig/network /etc/sysconfig/network.bak 
/usr/bin/chattr -i -V /etc/sysconfig/network 
/bin/sed "s/HOSTNAME=.*/HOSTNAME=${hostname}/" /etc/sysconfig/network.bak > /etc/sysconfig/network
/bin/hostname $hostname 
/usr/bin/chattr +i -V /etc/sysconfig/network 
/sbin/service salt-minion stop

/bin/echo $hostname > /etc/salt/minion_id

/sbin/reboot

# /bin/sleep 5
# /sbin/service salt-minion start

I've commented out the sleeps and the minion service start since they weren't doing anything post name change (in favor of a full host/vm restart of the minion), but I did experiment with a number of combinations of sleep times, calls to restart, and stop/start of the minion service, to no avail.

I _think_ I see what's going on here. I think what you need to do is delete the keys on the master _before_ the minion is restarted with the new ID.

Nevermind, though, that particular issue should be resolved with a minion restart, not a minion system restart. Still, would be something to try.

Umm.. how exactly do I target the minion if I delete its key before restarting it?

That nested remote-execution call to the minion will never execute.

Are you implying this can't be done without an out-of-band restart of the salt minion (via salt-ssh or some other method?)?

Well, since a minion restart doesn't help (from what I understand you're having to issue a _system_ reboot to get the name change to work) then it's a moot point, and it wasn't the issue I was thinking it was.

Whoops, misused "moot", it's a pet peeve of mine. I meant that it's irrelevant.

What we need to figure out is why it's requiring a system reboot to get the new name to show up. Something odd is going on.

It's peculiar. I have it working - reboot isn't too big a requirement for VMs, but I'll be renaming about 200 nodes we're deploying hadoop on soon, and these things take forever to reboot with ram checks, etc...

It's a small gripe since the minion_id file isn't working the way I thought it would, but I'm also mucking with keys on the master which isn't what you guys had intended.

I probably should have started another thread. Sorry about the thread hijack. I was really hopeful the 'at now' trick would work here too.

Ya, definitely a strange one. If you do end up with further problems, then let's do as you suggested and create a new thread.

Seth has resolved the issue btw. The stop in my minion script prior to updating the minion_id appears to have been the problem. I'm not sure why it matters but it is why communication to the minion via the new hostname didn't work. I haven't had a chance to create a new thread yet but since the issue is resolved, I'm considering being lazy and avoiding it? =)
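In other words, the corrected ordering on the minion side is simply to write the new minion_id before the minion process goes down, roughly like this (a sketch, assuming the out-of-band restart from earlier in the thread is still used):

/bin/hostname $hostname
/bin/echo $hostname > /etc/salt/minion_id                        # write the new ID first
echo "/sbin/service salt-minion restart" | at now + 1 minute     # then restart out-of-band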

I could get on board with the "being lazy" plan. =)

Hi @shantanub, sorry I just noticed your question earlier in the thread about watching /etc/salt/minion. We in fact use a completely generic minion file (only non-stock setting is 'master') across all our nodes, but I do manage it via salt and set up a simple watch on it in case I ever wanted to push config changes that way :)

As an aside, did you ever think about NOT starting the salt minion until you've successfully renamed the host? That would bypass the whole situation, assuming your scripts are ordered such as to allow that sort of thing of course...

@tkwilliams: I could but isn't changing the hostname using an out-of-band approach painful?

I'm spinning up 200 physical hosts and dozens of VMs at a time, and they all end up with names derived from their kickstart interface MAC address (I chose that over localhost.localdomain). I need some way to group hosts and rename them. I was planning on just assigning names with a simple for loop once I could rename one minion correctly, which looks like it's working now.

How do you guys name your hosts/vms before installing salt?

Hi @shantanub. From reading back through this issue, it looks like @whiteinge may have solved this for you. Is that right? Should this issue remain open or should it be closed and marked as resolved? If you wouldn't mind letting us know if there's more work to do here, it would be much appreciated. Thanks!

Yup @whiteinge fixed the renaming issue with the '@ +1 mins' trick.

I've performed several minion upgrades as well, and they've gone much better (i.e. I didn't lose my minions when upgrading the master, or when I upgraded salt-minion on each vm/host, etc.). I think you guys have fixed the issues in the rhel packages. My minions are now straddling 0.17.5 and 2014.1.0 (I still have to upgrade production).

Similar to the at state that restarts linux minions, here is one for Windows minions:

schedule-start:
  cmd.run:
    - name: at (Get-Date).AddMinutes(1).ToString("HH:mm") cmd /c "net start salt-minion"
    - shell: powershell
  service.dead:
    - name: salt-minion
    - require:
        - cmd: schedule-start

@csakoda Very handy! Thanks for leaving that note on this issue where it might help others. Much appreciated!

For updates we use the following since using Salt to update itself will break.

First, make sure not to update packages; just install them pinned to the current version, like when building a master.

salt-master:
  pkg.installed:
    - version: {{ pillar.salt.cur_pkg_version }}

Pillar for salt.cur_pkg_version would look something like: {{ grains.saltversion }}-1.fc19

This would then launch after the Salt highstate has finished. The advantage of using "nohup" and "sleep" instead of "at" is that it works inside the root jail that Cobbler uses during kickstart.

{%- if grains.saltversion < pillar.salt.version %}
salt-update:
  cmd:
    - run
{%- if 'salt.master' in pillar.roles %}
    - name: /usr/bin/nohup /usr/bin/bash -c 'set -x && sleep 30 && yum -y install salt-{{ pillar.salt.version }} && systemctl restart salt-master salt-minion' >>/var/log/salt/update 2>&1 &
{%- else %}
    - name: /usr/bin/nohup /usr/bin/bash -c 'set -x && sleep 30 && yum -y install salt-{{ pillar.salt.version }} && systemctl restart salt-minion' >>/var/log/salt/update 2>&1 &
{%- endif %}
    - order: last
{%- endif %}

I tried mickep76's way of restarting the salt-minion and I believe it is far superior to the method that is currently documented on the website. I believe this way should replace the current documentation. Can we re-open this or make a new issue? I'm new to Salt and GitHub.

@hal58th Are you willing to try to generate a pull request even though you are new to GitHub? Generate it against the 2014.7 branch. If not, that's cool, we can take care of it.

The example above is systemctl-specific, so we should probably create equivalents for the other service providers and post them as well.

@cro I would love to do this pull request if I had more experience. I'll try to look up some more tutorials on how to use github this weekend when I actually have time (we use git at work but it's a much simpler setup). Best to assume I won't do it at this current time.

I'm re-labeling this as documentation and re-opening this issue so that documentation can be updated, as discussed above by @hal58th .

The at command mentioned in this issue as a resolution for Windows is deprecated. I'm running Windows Server 2012 and I'm trying to get a version similar to @equinoxefr's solution, using schtasks, to work.

Here is my attempt:

schedule-start:
  cmd:
    - run
    - name: schtasks /create /sc once /tn restartsalt /tr "cmd /c net start salt-minion" /st (Get-Date).AddMinutes(1).ToString("HH:mm") /sd (Get-Date).ToString("MM/dd/yyyy")
    - shell: powershell
    - order: last
  service:
    - dead
    - name: salt-minion
    - require:
        - cmd: schedule-start

But using 2015.2.0rc2 this returns an error:

          ID: schedule-start
    Function: cmd.run
        Name: schtasks /create /sc once /tn restartsalt /tr "cmd /c net start salt-minion" /st (Get-Date).AddMinutes(1).ToString("HH:mm") /sd (Get-Date).ToString("MM/dd/yyyy")
      Result: False
     Comment: Command "schtasks /create /sc once /tn restartsalt /tr "cmd /c net start salt-minion" /st (Get-Date).AddMinutes(1).ToString("HH:mm") /sd (Get-Date).ToString("MM/dd/yyyy")" run
     Started: 09:34:36.998000
    Duration: 1535.0 ms
     Changes:
              ----------
              pid:
                  3856
              retcode:
                  1
              stderr:
                  ERROR: No mapping between account names and security IDs was done.
                  (40,4):UserId:
              stdout:

Apparently schtasks doesn't seem to be happy when executed from the Local System account.

Does anyone have a suggestion? Maybe @equinoxefr? Did your solution work on 2012 or are you running earlier versions?

Maybe my only solution is to supply the /ru and /rp passwords and specify a user. But I would like to avoid that since all machines use different credentials and they are not in a domain (cloud based machines).

EDIT:
Specifying /ru "SYSTEM" seems to have worked. But I still don't understand how this would work in an upgrade. The installer will still kill the current process, right? And that leaves jobs hanging! Wouldn't it be better to create two scheduled tasks? One that installs the minion in a minute, and another that restarts the service in two minutes? That way the salt command would return cleanly (creation of the tasks would return nicely) and no "dirty jobs" would be left hanging.

@andrejohansson Do any of the workarounds in this fellow's repo work on 2012? https://github.com/markuskramerIgitt/LearnSalt/blob/master/learn-run-as.sls

We ended up creating a batch file to uninstall and reinstall the salt minion, since simply upgrading in place had weird behaviors going from 2014.7.x to newer 2014.7.x. We then schedule the batch file using the method above, which with /ru "SYSTEM" works quite well. The one thing we added to the batch was backing up the minion.pem and minion.pub (could also add the minion [.conf]) so that when it talks back to the master it isn't colliding with its old key; it is reusing it. We also had to trigger a service start after we installed the new version, otherwise we were unable to connect to it from the salt master.

@dragon788 yes, I ended up doing something similar, but I haven't saved the key files yet. Smart!
What I've done is the following:

  1. Made a salt state that does the following:

    • Downloads the new minion and places in a temp folder

    • Downloads a scheduled task with multiple actions (xml file) to the same temp folder

    • Schedules the task (xml file) to run once on boot

    • Disable the minion so it doesn't autostart

    • Schedules a reboot

  2. Once the computer reboots the scheduled tasks

    • Uninstalls salt-minion

    • Removes the c:\salt dir completely (we don't want leftover cache stuff or other things)

    • Installs the new minion

    • Waits 10 minutes

    • Starts the minion

The reboot I've found necessary because even in the newest 2015.5.X releases sometimes nssm.exe won't be deleted by the uninstaller and remains active in c:\salt. This can prevent successful installs and startups later.

The wait 10 minutes I've found necessary because sometimes the installer won't start the minion after exit.

Under systemd systems, this upgrade of salt-minion via salt is really a pain in the neck.
On Debian Jessie, not only does the salt call never return (which is not really a problem), but it leaves the salt-minion package half upgraded and the salt-minion stopped...
This is because the salt-minion.service file uses the default control-group mode for KillMode.

Setting this to KillMode=process helps there. I guess the salt-minion.service should be modified in this way. Meanwhile, I deploy the file /etc/systemd/system/salt-minion.service.d/killmode.conf with

[Service]
KillMode=process

It allows me to properly run

root@pw01:/srv/salt# salt 'pw02' service.restart salt-minion 
pw02:
    True
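For anyone applying the same workaround by hand, deploying that drop-in is just standard systemd mechanics (the daemon-reload is needed for systemd to pick it up):

mkdir -p /etc/systemd/system/salt-minion.service.d
cat > /etc/systemd/system/salt-minion.service.d/killmode.conf <<'EOF'
[Service]
KillMode=process
EOF
systemctl daemon-reload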

@douardda Thanks for the update. Looks like your pull request is merged, and I'm working on merging it forward today.

Having this issue too on SmartOS/Solaris.
My solution for now is also using at, but I use this command:
salt-call --local service.restart salt-minion, which works on both SmartOS and my Ubuntu test VM.

There should probably be a delay option for service.restart that uses at or the Windows scheduler, where available, to perform the action after X seconds. Or, better yet, a way for the salt-minion to survive a restart.
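Putting those two pieces together, the at invocation I mean is something like this (the delay is arbitrary):

echo 'salt-call --local service.restart salt-minion' | at now + 1 minute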

Hm, this issue is labeled "documentation", but the discussion isn't really about documentation. Is this properly labeled?

This is the relevant comment: https://github.com/saltstack/salt/issues/7997#issuecomment-51981491

Ideally there would be a 'reload' command instead that would properly reload everything for the salt-master/salt-minion without actually stopping and starting the process.

@sjorge That would be nice but that's a pretty tall order. We'd have to refresh the opts dict everywhere and that's non-trivial.

Can't the bigger brush be used? Close the sockets on the current process and fork a new copy, then exit. Since all listening sockets are closed, the new minion process should start fine while the old one can still send a reply to the master that all is OK?

The reload sounds really similar to how nginx handles config changes, it lets the sockets from the old config survive and spawns new ones with the new config until the old ones all expire.

One thing we've noticed is that upgrading the package on a Debian-based system forces a service restart due to how Debian derivatives handle services in general, i.e. "you requested this be installed so we are enabling/starting the service NOW!". We have worked around this when preseeding salt-minion on machines by using the policy-rc.d trick, which basically prevents a service from starting during apt-get operations, though it shouldn't affect running services. This could then possibly be followed up by a salt-call --local service.restart as mentioned above to flip from the old version (in memory) to the new version (on disk).

This is completely untested, I'm just going through my watched issues and seeing if I've found any new creative ways to fix them.
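To make that concrete, the trick run locally on a Debian-family minion would look roughly like this (a hypothetical sketch; the file name and exit code 101 are the policy-rc.d convention):

# deny service start/stop requests made by maintainer scripts during the upgrade
printf '#!/bin/sh\nexit 101\n' > /usr/sbin/policy-rc.d
chmod 755 /usr/sbin/policy-rc.d

# the package upgrade no longer stops/restarts the running minion
apt-get install -y salt-minion

# restore normal behaviour, then flip from the old version (in memory) to the new one (on disk)
rm -f /usr/sbin/policy-rc.d
salt-call --local service.restart salt-minion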

I'm wondering if we should close this in favor of #5721?

I need to update out-of-date (0.17.5) Ubuntu Minions from an up-to-date (2016.3.2) salt-master; Minion-ID must remain the same, minion key must remain the same, minion cache should remain the same.
saltutil.update fails (see below)
What shall I do?

  • Install Esky on each Minion?
  • Uninstall each Minion and re-install it with pip?

Documentation at https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.saltutil.html#salt.modules.saltutil.update seems not updated since 2014.
Is what mickep76 commented on Jul 7, 2014 documented somewhere else?
update_url for UNIX is missing.
dragon788 commented on May 29, 2015 that an undocumented script is needed to retain the minion key.

Does saltutil.update work?
If yes, what are the requirements?

mkramer@mgmt-bn-051:~$ sudo salt --version
salt 2016.3.2 (Boron)

mkramer@mgmt-bn-051:~$ sudo salt PC* test.versions_report
PC-LIN-01:
               Salt: 0.17.5
             Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
             Jinja2: 2.7.2
           M2Crypto: 0.21.1
     msgpack-python: 0.3.0
       msgpack-pure: Not Installed
           pycrypto: 2.6.1
             PyYAML: 3.10
              PyZMQ: 14.0.1
                ZMQ: 4.0.4
PC-LIN-02:
               Salt: 0.17.5
             Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
             Jinja2: 2.7.2
           M2Crypto: 0.21.1
     msgpack-python: 0.3.0
       msgpack-pure: Not Installed
           pycrypto: 2.6.1
             PyYAML: 3.10
              PyZMQ: 14.0.1
                ZMQ: 4.0.4
mkramer@mgmt-bn-051:~$ sudo salt PC* saltutil.update
PC-LIN-01:
    Esky not available as import
PC-LIN-02:
    Esky not available as import

The relatively new minion config option master_tries should help out in this area - I've recently set all my minions to a value of -1 (unlimited retries) which seems to be really helping to keep the minions connected up to the master ... but I may be talking nonsense as I haven't had enough time to be definite about that.
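For reference, that's a single line in the minion config (either /etc/salt/minion or a file under /etc/salt/minion.d/) followed by a minion restart, e.g.:

# -1 means retry forever instead of giving up after the default single attempt
echo 'master_tries: -1' >> /etc/salt/minion
service salt-minion restart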

I've solved this by having salt simply fork off an upgrade script that lets the minion return instantly and has the service restart in the background. The script is Ubuntu 14.04 specific but could easily be adapted.

minion-upgrade.sls

/tmp/salt-minion-upgrade-deb.sh:
  cmd.script:
    - source: salt://salt/upgrade-minion-deb.sh

minion-upgrade-deb.sh

#!/bin/bash

# This script forks off and runs in the background so salt can continue

{
    DEBIAN_FRONTEND=noninteractive apt-get install -y -o Dpkg::Options::=--force-confold salt-minion
    service salt-minion restart
} >>/var/log/salt/minion-upgrade.log 2>&1 &

disown

Just to complete the ideas:
Last week I found a write-up of a good way to do this (the "at" or "nohup" approach):
https://docs.saltstack.com/en/latest/faq.html#what-is-the-best-way-to-restart-a-salt-daemon-using-salt

Speaking only about restarting the Minion, I really like the solution from #5721.
Being nice and tiny, it works with Salt 2016.3 on most Linux distros:

salt '*' cmd.run_bg 'sleep 10; service salt-minion restart'

As for Debian 8 and Ubuntu 16 with systemd on board, to prevent the salt-minion service from being restarted automatically after the upgrade, you need to mask it first.
So, the "upgrade" procedure is as follows:

salt -C 'G@init:systemd and G@os_family:Debian' service.mask salt-minion
salt -C 'G@init:systemd and G@os_family:Debian' pkg.install salt-minion refresh=True
salt -C 'G@init:systemd and G@os_family:Debian' service.unmask salt-minion

There is another solution for _Upstart_ and _SystemV_ init: using the policy-rc.d method. You need to temporarily deny runlevel operations:

salt -C '( G@init:upstart or G@init:sysvinit ) and G@os_family:Debian' file.manage_file \
/usr/sbin/policy-rc.d '' '{}' '' '{}' root root '755' base '' contents=''
salt -C '( G@init:upstart or G@init:sysvinit ) and G@os_family:Debian' file.append \
/usr/sbin/policy-rc.d '#!/bin/sh' 'exit 101'
salt -C '( G@init:upstart or G@init:sysvinit ) and G@os_family:Debian' pkg.install \
salt-minion refresh=True
salt -C '( G@init:upstart or G@init:sysvinit ) and G@os_family:Debian' file.remove \
/usr/sbin/policy-rc.d

I've found that it's the most reliable way to get Salt Minion upgraded properly.
Now you can safely restart Minions using the nohup method from the FAQ.

Also I've discovered that restarting Minions just with:

salt '*' service.restart salt-minion

works like a charm with recent Salt versions from the 2015.8 and 2016.3 branches, even after I did the upgrade. I believe this is because the systemd units were patched and Salt does the trick by forking itself to run the restart command while keeping the connection to the Master.

Need to do more testing, but I think using at or nohup is only required when scripting upgrades from very old versions in Salt States.

Hi, I had been trying to restart a Windows minion for ages, and finally worked out a way to get it to work every time:

salt '*' cmd.run_bg 'Restart-Service salt-minion' shell=powershell

Thanks @Trouble123 👍 That worked.

What about saltutil.update? Looks like it's a Windows-specific function.
https://docs.saltstack.com/en/latest/topics/tutorials/esky.html

And more, I see that esky has become unmaintained (again?): https://github.com/cloudmatrix/esky
Anyone from the Windows camp, does the thing still work?
