If the salt-master moves to a different machine (say, to remove cruft, or because the machine died) _the minions will not reconnect._
Currently, the only way to reconnect minions appears to be to SSH into each machine and run service salt-minion stop && service salt-minion start.
tcpdump shows that the minions keep trying to connect to the old master's IP address (they're sending SYN packets to that IP) long past the DNS TTL. They do not appear to re-query DNS for the master's address at any point after startup, and thus can never reconnect.
This is true. Currently, the minions do not attempt to re-resolve DNS at any point or initiate a new connection. As announced at SaltConf, we're currently working on rewriting and modularizing the transport layer of Salt, and in the process we should be able to fix problems like this.
+1 on this. We're trying to put our salt service into Consul and have it flip the DNS to an alternate master if the first fails, but we're running into the same issue: the minion never attempts to re-resolve the master IP.
It looks like there are some items in the config that should cause the minion to restart itself after a number of failed pings to the master (and thus re-resolve the master's DNS name), but that doesn't appear to be working.
Looks like you can work around this issue by setting master_type to failover and making your master list a single-element list:
master:
- salt.service.consul
master_type: failover
master_alive_interval: 30
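(For context: master_alive_interval is, per the docs, how often in seconds the minion verifies that its current master is still alive, and in failover mode a failed check triggers a switch; with a single-entry list, "failing over" lands back on the same DNS name, which forces a fresh resolution.)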
@sjmh Your workaround does not work for me unfortunately.
Any updates on that?
Is this still planned?
This still appears to be happening in 2016.11.3
2 years and counting guys...
This is truly surprising. Respecting the DNS TTL is pretty basic stuff, and not doing so ends up being a logistical nightmare for IT providers managing multiple locations. I'm using a CNAME entry, and now that we migrated our master we have almost 1000 minions that are not connected. Now we have to find a way to restart the service or reboot the machines ;(
@Xaelias @pstatho I ran into this issue as well today when running a salt-master and salt-minion inside Kubernetes where there is no way to assign a static IP to the salt master easily. Every time the salt-master pod was recreated the IP was changed and the salt-minions within the cluster were unable to be contacted by the salt-master.
I fixed this after finding the following notes within the salt-minion configuration file:
# To auto recover minions if master changes IP address (DDNS)
# auth_tries: 10
# auth_safemode: False
# ping_interval: 90
#
# Minions won't know the master is missing until a ping fails. After the ping
# fails, the minion will attempt authentication, likely fail, and restart.
# When the minion restarts it will re-resolve the master's IP and attempt to reconnect.
This caused my salt-minions to fail their authentication attempts and successfully reconnect after a ping interval. Note that ping_interval is in minutes, not seconds.
Hope this helps!
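For reference, a minimal sketch of what those settings look like uncommented in /etc/salt/minion (the values are just the ones from the notes above):
auth_tries: 10        # number of authentication attempts before giving up
auth_safemode: False  # per the DDNS recovery notes above
ping_interval: 90     # minutes (not seconds) between master pings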
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Just ran into this issue with ~20 minions behind NAT.
I ran into the issue as well, having a master with a dynamic IP.
6 years after the bug was reported :-(
Not respecting the TTL is really a bug and should be fixed.
Also, I wonder whether the suggested workaround of faking a multi-master list actually works here. The reason I ask: I see the word 'ping' in the workaround, and the previous master IP may well still answer standard PING packets. My minion errors out with a timeout, not with 'master unreachable'.
@234d - ack!
I also ended up using https://github.com/saltstack/salt/issues/10032#issuecomment-119296425
and... sadly, I just found out that the workaround does NOT work.
I see in the logs that, for several days at least, the minion has been crashing with
salt.exceptions.SaltReqTimeoutError: Message timed out
and keeps using the wrong IP for the master when reconnecting :-(
We are running the Master in Kubernetes behind a LB Service with a static IP.
Unfortunately, we had to completely recreate the Kubernetes cluster today, which meant claiming a new IP address.
Now our ~180 heterogeneous minions can't connect to the master. Honestly, I have better things to do than connect to every server and restart the salt-minion.
Is there an ETA when Salt will respect the DNS TTL?
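For anyone else running the master in Kubernetes: next time we'll reserve the external IP outside the cluster so it survives a rebuild. A rough sketch, assuming the cloud provider honors loadBalancerIP (the name, label, and address here are made up):
apiVersion: v1
kind: Service
metadata:
  name: salt-master              # hypothetical name
spec:
  type: LoadBalancer
  loadBalancerIP: 203.0.113.10   # pre-reserved static address, survives cluster recreation
  selector:
    app: salt-master             # hypothetical label on the master pod
  ports:
    - name: publish
      port: 4505                 # Salt publish (PUB) port
    - name: request
      port: 4506                 # Salt request/return (REQ) port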
ZD-6017
I know there are some reports here that the workaround reported back in https://github.com/saltstack/salt/issues/10032#issuecomment-119296425 does not work, but it did appear to work for me with 3002.2, so it's worth trying if you need this before it is supported more officially. To refresh, it's something like:
master:
- my.master.dns
master_type: failover
master_alive_interval: 30
retry_dns: 0
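If I'm reading the docs right, retry_dns: 0 is the key addition over the 2015 recipe: with the default (30 seconds) the minion just keeps retrying DNS resolution for the current master instead of failing over, and it's the failover, even back to the same name, that forces a fresh lookup.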