A method is needed to delete keys from old minions that haven't connected to the master in X minutes/seconds. This could run automatically as part of the /etc/salt/master configuration, or be exposed as a flag in salt-key.
We do have something like that -- salt-run manage.down removekeys=True
The difference is that this removes keys from any minions which are not currently connected.
The difficulty with removing keys for minions that have not connected to the master for a certain amount of time is that we don't currently track how long minions have been disconnected. So that would be the first step: adding some sort of functionality in that area.
It's exactly that time from last connection that I would want.
+1 - This would be useful in my environment, where I use boot2docker/debian2docker without installing to disk. Every reboot generates a new key and produces the following message. Having a way to clean up old keys for minions that haven't connected in a while seems better than removing whatever isn't currently connected.
[CRITICAL] The Salt Master has rejected this minion's public key!
To repair this issue, delete the public key for this minion on the Salt Master and restart this minion.
Or restart the Salt Master in open mode to clean out the keys. The Salt Minion will now exit.
Or, if there's another way to accomplish this, I'd like to know.
> The difficulty with removing keys for minions that have not connected to the master for a certain amount of time is that we don't currently track how long minions have been disconnected. So that would be the first step: adding some sort of functionality in that area.
@basepi what would the preferred place be to store this? Do minion grains persist anywhere other than cache? Or is there any sort of persistent storage used by salt-master? I'd be happy to try to take a crack at this if someone can point me in the right direction.
My not-so-great workaround was to have a script run manage.up or test.ping every once in a while and then touch the public key file for each responding minion so that its mtime changes. Then I can use find to locate old keys and delete them.
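A minimal sketch of that workaround, meant to run from cron. The key directory (/etc/salt/pki/master/minions), the 14-day threshold, and the parsing of the --out txt ping output are assumptions; adjust them for your environment.

#!/bin/bash
KEYDIR=/etc/salt/pki/master/minions

# Refresh the mtime on the accepted key of every minion that answers right now
salt --out txt '*' test.ping 2>/dev/null | grep -v 'Not connected' | cut -d: -f1 |
while read -r minion; do
    touch "$KEYDIR/$minion"
done

# Keys untouched for more than 14 days belong to minions that have not
# answered in that long; delete them through salt-key
find "$KEYDIR" -type f -mtime +14 -printf '%f\n' |
while read -r minion; do
    salt-key -y -d "$minion"
done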
That's a decent workaround, though it's obviously a bit hacky.
We could potentially use the new manage.present runner on a schedule to keep this information. Maybe store timeout information in a msgpack dump in the master cache.
All - has any work related to managing old, not-seen keys been done in 2015.5.x? This is a significant issue for us, as we have a salt-minion plumbed into every single cloud OS image we deploy. We follow a "cattle not pets" philosophy and recycle nodes/minions regularly, which means dead keys build up quickly on the master, and we currently go through and purge them manually. Sadly, there is no lifecycle management in the images when they're decommissioned; since someone can nuke an image in the cloud with a single command, we can't catch that it has been decommissioned.
I was thinking of doing something as simple as running salt --out txt '*' test.ping once an hour, storing the returned nodes in a simple DB with a "last seen" timestamp, and then harvesting keys for minions that haven't been seen in X amount of time.
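A hedged sketch of that approach, using a flat file under /var/cache/salt/master as a stand-in for a real database. The file path, the 'Not connected' output parsing, and the 7-day threshold are all assumptions.

#!/bin/bash
SEEN_DB=/var/cache/salt/master/last_seen.txt
MAX_AGE=$((7 * 24 * 3600))
NOW=$(date +%s)
touch "$SEEN_DB"

# Refresh the "last seen" timestamp of every minion that answers a ping right now
salt --out txt '*' test.ping 2>/dev/null | grep -v 'Not connected' | cut -d: -f1 |
while read -r minion; do
    grep -v "^$minion " "$SEEN_DB" > "$SEEN_DB.tmp"
    echo "$minion $NOW" >> "$SEEN_DB.tmp"
    mv "$SEEN_DB.tmp" "$SEEN_DB"
done

# Delete keys for minions whose last-seen timestamp is older than MAX_AGE ...
awk -v now="$NOW" -v max="$MAX_AGE" 'now - $2 > max { print $1 }' "$SEEN_DB" |
while read -r minion; do
    salt-key -y -d "$minion"
done

# ... and drop them from the record so they are not processed again next run
awk -v now="$NOW" -v max="$MAX_AGE" 'now - $2 <= max' "$SEEN_DB" > "$SEEN_DB.tmp" &&
    mv "$SEEN_DB.tmp" "$SEEN_DB"

Note that minions which never respond after you start recording get no entry in the file, so their keys still need a one-time manual cleanup.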
+1
+1
+1
+1
+1
Meanwhile you can use the following cron to clean old keys:
salt --out txt '*' test.ping | grep "Not connected" | cut -d ":" -f 1 | xargs -I dead_minion salt-key -y -d dead_minion
+1
omolto - thanks for providing that snippet. It's great if you just want to nuke keys for whatever isn't connected at this very second. But if you have a proper key management process in place, it means any minion that isn't connected at that moment, whether it's unreachable, down, or otherwise unavailable for whatever reason (like the hundred or so VMs I shut down when I don't need them but still want Salt on when they're running), will have its key removed, and then you have to go through and re-authorize all those keys.
If you're running in a terrifyingly insecure mode where you accept any minion key that connects, then sure, go for it.
Again, thank you for the snippet; it certainly is useful in some situations.
+1
+1
My salt_clean_script.sh:
#!/bin/bash
echo "Cleaning keys for minions that did not return"

# Ping everything; unresponsive minions show up with a "did not return" message
salt '*' test.ping > all_hosts

# The line before each "did not return" message is the minion ID
# (the extra grep drops the "--" group separators that grep -B inserts)
grep -B 1 'return' all_hosts | grep -v 'return' | grep -v '^--' | cut -d':' -f1 > salt_error_hosts

echo "Deleting keys for the unresponsive hosts"
for i in $(cat salt_error_hosts); do
    # Delete the minion's key, whether it is accepted or still pending
    salt-key -d "$i" -y
    # Remove the cached public key file if anything is left behind
    if [ -f "/etc/salt/pki/master/minions/$i" ]; then
        rm -f "/etc/salt/pki/master/minions/$i"
    fi
done
echo "done"
As @basepi said, and as an alternative to the command I posted earlier, we actually use the following cron entry to remove old keys:
# SALT_CRON_IDENTIFIER:Clean old minion keys
*/5 * * * * salt-run manage.down removekeys=True
I think that using the new Thorium system we should be able to write a system that makes this very clean and event driven. For reference, I'm posting the design I have in my head here so I don't forget it when I or someone else comes back to it.
+1
+99999
+1 this would be really awesome to have
+1
CAN HAZ!
I can verify that this works well, although I hope more people can also verify it:
https://docs.saltstack.com/en/develop/topics/thorium/index.html
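For anyone picking this up, the Thorium docs linked above describe a key-cleaning reactor along these lines. The sketch below is an illustration only: the thorium_roots path, the file names, and the 86400-second threshold are assumptions, the exact state and requisite syntax may differ by release, and minions need to be reporting in (e.g. via the status beacon) for the register to be populated.

# /etc/salt/master -- enable Thorium (path is an assumption)
thorium_roots:
  base:
    - /srv/thorium

# /srv/thorium/top.sls
base:
  '*':
    - key_clean

# /srv/thorium/key_clean.sls
statreg:
  status.reg                # register of when each minion last reported in

keydel:
  key.timeout:
    - delete: 86400         # delete keys for minions not seen for 24 hours
    - require:
      - status: statreg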
I'll give this a whirl and report back. Should work fine with 2016.3.1?
Thanks for getting this in!
This code is only in the develop branch, so you need a master and minion on develop. It is a feature addition.