What is the proper way to re-authenticate a minion to an upgraded master (a master whose keys have changed)?
I have limited access to the minions; some of them are Windows minions.
Regular salt master - minion setup.
New master keys are generated with salt-key --gen-keys master and live in /etc/salt/pki/master. It doesn't matter if you previously deleted the minion from the master via salt-key -d.
As long as the minion keeps the master's key cached somewhere under salt/pki/minion/ as minion_master.pub, it seems that changing the keys on the master is unsupported. Am I right?
This is a problem for me because I have the salt master deployed in a container.
Sometimes I want to upgrade salt by simply replacing the container.
Is the only solution to manually remove minion_master.pub from the minion?
Maybe there is some configuration option on the minion to relax this requirement?
Any version, AFAIR.
I also have this question
I also have this question. Is there any solution for it?
The "old" public master key on the minion prevents connection the (any) master with a "new" key.
This is exaclty what it must do, and there can be no way to relax that.
We must prepare the clients for a master key change before we change the master key.
The only way I know of is to switch to a new master:
1) on the minion, delete the old key of the old master,
2) on the minion, set the new master and store the new public key (a rough sketch of both steps follows).
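A minimal sketch of those two steps on a Linux minion, assuming default paths and systemd; the master hostname is a placeholder and a plain shell session would do the same job:

import os
import subprocess

MINION_PKI = "/etc/salt/pki/minion"          # default Linux location; Windows differs
CACHED_MASTER_KEY = os.path.join(MINION_PKI, "minion_master.pub")

# 1) forget the old master's public key
if os.path.exists(CACHED_MASTER_KEY):
    os.remove(CACHED_MASTER_KEY)

# 2) point the minion at the (new) master; "salt.example.com" is a placeholder
with open("/etc/salt/minion.d/master.conf", "w") as conf:
    conf.write("master: salt.example.com\n")

# Restart so the minion re-requests and caches the new master public key.
subprocess.run(["systemctl", "restart", "salt-minion"], check=True)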
Is there really a second master needed to renew the master key?
I gave it some thought: to handle a master key change, the minion must know both keys (old and new).
I think this is a feature request; there is no "proper way" (yet) that I know of.
Because @kiemlicz said "Sometimes I want to upgrade", I understand this is the use case:
the minion should try the second master key if the first one does not work (any more).
I could also think of a second use case, where the master will change its key at a specific time:
the minion, at that specific time, shall replace the master key with the second one.
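A sketch of what the first idea could look like on the minion side - purely illustrative, none of these names exist in salt; the assumption is that an operator stages the new key next to the cached one before the master is re-keyed:

import os

def load_key(path):
    with open(path) as f:
        return f.read()

def verify_master_pub(received_pub, pki_dir):
    # Accept the currently cached master key, or a pre-staged "next" key.
    # minion_master_next.pub is a hypothetical file, not a real salt feature.
    current = os.path.join(pki_dir, "minion_master.pub")
    staged_next = os.path.join(pki_dir, "minion_master_next.pub")

    if os.path.exists(current) and load_key(current) == received_pub:
        return True
    if os.path.exists(staged_next) and load_key(staged_next) == received_pub:
        # The second key matched: promote it and drop the old one.
        os.replace(staged_next, current)
        return True
    return False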
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Still a desired feature
Thank you for updating this issue. It is no longer marked as stale.
Also from our part.
This problem bit me, so I added the following code to work around it. @shangxdy @kiemlicz
os.remove(self.opts['pki_dir'] + "/minion_master.pub")
at this location:
/usr/lib/python3.6/site-packages/salt/crypt.py
(near the message "The master key has changed, the salt master could have been subverted, verify salt master's public key")
I wish SaltStack would add an option to automatically delete the minion_master.pub file.
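For orientation, a rough sketch of the pattern that workaround produces - this is not the actual salt/crypt.py source; the function name and surrounding logic are made up for illustration:

import logging
import os

log = logging.getLogger(__name__)

def check_cached_master_pub(opts, received_master_pub):
    # Hypothetical stand-in for the check in salt/crypt.py that compares the
    # cached minion_master.pub with the key the master just presented.
    cached_path = os.path.join(opts['pki_dir'], 'minion_master.pub')
    if not os.path.exists(cached_path):
        return True                      # nothing cached yet: first contact
    with open(cached_path) as f:
        cached = f.read()
    if cached == received_master_pub:
        return True                      # same master key as before
    log.error("The master key has changed, the salt master could have been "
              "subverted, verify salt master's public key")
    # The workaround: drop the stale cached key so the next auth attempt
    # re-caches whatever key the master presents. Note this also silently
    # accepts a genuinely hostile key swap, which is why salt refuses by default.
    os.remove(cached_path)
    return False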
You remove the master key from the minion; I know this will work.
However, I don't understand why this is designed in a way that doesn't allow "changing" the master (it's actually the same master, only its keys have changed).
Maybe there are security reasons for that, yet I would love this to be configurable.
We have a similar method: an AWS autoscaling group of 1 master that recreates it if it dies (or when we need a fresh version). The userdata build of the master copies the master keys (master.pem and master.pub) from /etc/salt/pki/master into a secure S3 bucket if they don't exist there, and downloads them from there if they do (I have the bash if anyone wants it). We also have the master set to use autosign_grains to auto-register minions that match the special grains, so we don't have to accept the keys again.
However - the minions aren't attempting to reconnect at all unless they are restarted (which you can't do via salt at that point!). There seems to be no "poll" of the master happening in the /var/log/salt/minion log either - certainly no warnings about the master not being present (we considered a cron job to check for a lost salt connection and run systemctl restart salt-minion).
If I can just solve this one part then we'll have a self-healing salt infrastructure. Otherwise we need to either manually log onto every minion and restart it, or fully rebuild the platform, which seems overkill for just the salt master being replaced.
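For reference, the userdata key sync described above might look roughly like this in Python (the poster has it in bash); the bucket name is a placeholder and credentials are assumed to come from the instance role:

import os

import boto3
from botocore.exceptions import ClientError

BUCKET = "example-salt-master-keys"        # placeholder bucket name
PKI_DIR = "/etc/salt/pki/master"

s3 = boto3.client("s3")

def sync_master_key(name):
    # Upload the key on first boot, reuse the stored copy on every rebuild.
    local_path = os.path.join(PKI_DIR, name)
    try:
        s3.head_object(Bucket=BUCKET, Key=name)
    except ClientError:
        # Not in the bucket yet: this is the first master, persist its keys.
        s3.upload_file(local_path, BUCKET, name)
    else:
        # Already stored: a replacement master downloads the same keypair,
        # so minions keep trusting their cached minion_master.pub.
        s3.download_file(BUCKET, name, local_path)

for key_name in ("master.pem", "master.pub"):
    sync_master_key(key_name)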
Ahh - just found this to investigate: https://github.com/saltstack/salt/issues/44038#issuecomment-342761567
Just want to clarify
> We have a similar method: an AWS autoscaling group of 1 master that recreates it if it dies (or when we need a fresh version). ...
I'm not very familiar with AWS; does that mean that your Salt Master instances always have the same keypair? (If so, then you should not have a problem.)
> However - the minions aren't attempting to reconnect at all unless they are restarted (which you can't do via salt at that point!). ...
By default, minions die if the connection to the master(s) won't succeed.
You can change that (if the reason is a key rejection: https://docs.saltstack.com/en/latest/ref/configuration/minion.html#rejected-retry).
However, I don't know if this is your case.
> I'm not very familiar with AWS; does that mean that your Salt Master instances always have the same keypair? (If so, then you should not have a problem.)
Yes - we keep the same keypair, but the minions weren't reconnecting (also the master IP address is changing, but DNS should hopefully find it).
> By default, minions die if the connection to the master(s) won't succeed. You can change that (if the reason is a key rejection: https://docs.saltstack.com/en/latest/ref/configuration/minion.html#rejected-retry). However, I don't know if this is your case.
Actually it looks like the salt-minion process (2019.2.0 code) is just sitting there doing nothing. Maybe this is changed in later versions (we got badly bitten by the 2019.2.1 performance bug so we're very cautious now).
Looks like this might work in the salt-minion conf file:
# If authentication fails due to SaltReqTimeoutError during a ping_interval,
# cause sub minion process to restart.
auth_safemode: False
# Ping Master to ensure connection is alive (minutes).
ping_interval: 2
# Number of consecutive SaltReqTimeoutError that are acceptable when trying to
# authenticate
auth_tries: 2
# The number of attempts to connect to a master before giving up.
# Set this to -1 for unlimited attempts. This allows for a master to have
# downtime and the minion to reconnect to it later when it comes back up.
# In 'failover' mode, it is the number of attempts for each set of masters.
# In this mode, it will cycle through the list of masters for each attempt.
#
# This is different than auth_tries because auth_tries attempts to
# retry auth attempts with a single master. auth_tries is under the
# assumption that you can connect to the master but not gain
# authorization from it. master_tries will still cycle through all
# the masters in a given try, so it is appropriate if you expect
# occasional downtime from the master(s).
master_tries: -1
And it doesn't work (at least on version 2019.2.0). The minion does detect that the master is gone now:
2020-03-10 15:10:23,288 [tornado.application:611 ][ERROR ][2789] Exception in callback <functools.partial object at 0x7fdc204d2f18>
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/tornado/ioloop.py", line 591, in _run_callback
ret = callback()
File "/usr/lib64/python2.7/site-packages/tornado/stack_context.py", line 342, in wrapped
raise_exc_info(exc)
File "/usr/lib64/python2.7/site-packages/tornado/stack_context.py", line 313, in wrapped
ret = fn(*args, **kwargs)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 212, in <lambda>
future, lambda future: callback(future.result()))
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/minion.py", line 1409, in _send_req_async
ret = yield channel.send(load, timeout=timeout)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 373, in send
ret = yield self._crypted_transfer(load, tries=tries, timeout=timeout, raw=raw)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 341, in _crypted_transfer
ret = yield _do_transfer()
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 325, in _do_transfer
tries=tries,
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
SaltReqTimeoutError: Message timed out
However, it just logs that every 2 minutes and doesn't restart and pick up the new master.
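A crude external watchdog along the lines of the cron idea mentioned earlier could paper over this; it is only a sketch (run it from cron), the timeout value is arbitrary, and it assumes systemd and that salt-call failing or timing out means the master connection is gone:

import subprocess

def master_reachable(timeout_seconds=60):
    # salt-call (without --local) needs to reach the master, so a timeout or
    # a non-zero exit code is treated here as a lost connection.
    try:
        result = subprocess.run(
            ["salt-call", "test.ping"],
            capture_output=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

if __name__ == "__main__":
    if not master_reachable():
        # Restart so the minion re-resolves the master and re-authenticates.
        subprocess.run(["systemctl", "restart", "salt-minion"], check=False)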
To me it looks like you're having some other issue (possibly network related).
> This problem bit me, so I added the following code to work around it:
> os.remove(self.opts['pki_dir'] + "/minion_master.pub")
> at this location: /usr/lib/python3.6/site-packages/salt/crypt.py
> (near the message "The master key has changed, the salt master could have been subverted, verify salt master's public key")
On what line number or in which function did you put that, @rico256-cn?
@Slamoth
grep "The master key has changed, the salt master could have been subverted" /usr/lib/python3.6/site-packages/salt/crypt.py
Thanks.