If the salt-master moves to a different machine (say, to remove cruft, or because the machine died) _the minions will not reconnect._
Currently, the only way to reconnect minions appears to be to SSH into each machine and run service salt-minion stop && service salt-minion start.
tcpdump shows that the minions keep trying to connect to the old master's IP address (they're sending SYN packets to that IP) long past the DNS TTL. They do not appear to re-query DNS for the master's address at any point after startup, and thus can never reconnect.
This is true. Currently, the minions do not attempt to re-resolve DNS at any point or initiate a new connection. As announced at SaltConf, we're currently working on rewriting and modularizing the transport layer of Salt, and in the process we should be able to fix problems like this.
+1 on this. We're trying to put our salt service into Consul and have it flip the DNS to an alternate master if the first fails, but we're running into the same issue: the minion never attempts to re-resolve the master IP.
It looks like there are some items in the config that should cause the minion to restart itself after a number of failed pings to the master (and thus re-resolve the master's DNS name), but that doesn't appear to be working.
Looks like you can work around this issue by setting master_type to failover and making your master list a single-element list:
master:
- salt.service.consul
master_type: failover
master_alive_interval: 30
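(For context: master_alive_interval is, per the docs, how often in seconds the minion verifies that its current master is still alive, and in failover mode a failed check triggers a switch; with a single-entry list, "failing over" lands back on the same DNS name, which forces a fresh resolution.)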
@sjmh Your workaround does not work for me unfortunately.
Any updates on that?
Is this still planned?
This still appears to be happening in 2016.11.3
2 years and counting guys...
This is truly surprising. Respecting the DNS TTL is pretty basic stuff, and not doing so ends up being a logistical nightmare for IT providers managing multiple locations. I'm using a CNAME entry, and now that we migrated our master we have almost 1000 minions that are not connected. Now we have to find a way to restart the service or reboot the machines ;(
@Xaelias @pstatho I ran into this issue as well today when running a salt-master and salt-minion inside Kubernetes where there is no way to assign a static IP to the salt master easily. Every time the salt-master pod was recreated the IP was changed and the salt-minions within the cluster were unable to be contacted by the salt-master.
I fixed this after finding the following notes within the salt-minion configuration file:
# To auto recover minions if master changes IP address (DDNS)
# auth_tries: 10
# auth_safemode: False
# ping_interval: 90
#
# Minions won't know the master is missing until a ping fails. After the ping
# fails, the minion will attempt authentication, likely fail, and restart.
# When the minion restarts it will re-resolve the master's IP and attempt to reconnect.
This caused my salt-minions to fail their authentication attempts and successfully reconnect after a ping interval. Note that ping_interval is in minutes, not seconds.
Hope this helps!
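For reference, a minimal sketch of what those settings look like uncommented in /etc/salt/minion (the values are just the ones from the notes above):
auth_tries: 10        # number of authentication attempts before giving up
auth_safemode: False  # per the DDNS recovery notes above
ping_interval: 90     # minutes (not seconds) between master pings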
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Just ran into this issue with ~20 minions behind NAT.
I ran into the issue as well, having a master with a dynamic IP.
6 years after the bug was reported :-(
Not respecting the TTL is really a bug and should be fixed.
Also, I wonder whether the suggested workaround of faking a multi-master list actually works here. The reason I ask: I see the word 'ping' in the workaround, and the previous master IP may well still answer standard PING packets. My minion errors out with a timeout, not with 'master unreachable'.
@234d - ack!
I also ended up using https://github.com/saltstack/salt/issues/10032#issuecomment-119296425
and... sadly, I just found out that the workaround does NOT work.
I see in the logs that, for several days at least, the minion has been crashing with
salt.exceptions.SaltReqTimeoutError: Message timed out
and keeps using the wrong IP for the master when reconnecting :-(
We are running the Master in Kubernetes behind a LB Service with a static IP.
Unfortunately, we had to completely recreate the Kubernetes cluster today, which meant claiming a new IP address.
Now our ~180 heterogeneous minions can't connect to the master. Honestly, I have better things to do than connect to every server and restart the salt-minion.
Is there an ETA when Salt will respect the DNS TTL?
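For anyone else running the master in Kubernetes: next time we'll reserve the external IP outside the cluster so it survives a rebuild. A rough sketch, assuming the cloud provider honors loadBalancerIP (the name, label, and address here are made up):
apiVersion: v1
kind: Service
metadata:
  name: salt-master              # hypothetical name
spec:
  type: LoadBalancer
  loadBalancerIP: 203.0.113.10   # pre-reserved static address, survives cluster recreation
  selector:
    app: salt-master             # hypothetical label on the master pod
  ports:
    - name: publish
      port: 4505                 # Salt publish (PUB) port
    - name: request
      port: 4506                 # Salt request/return (REQ) port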
ZD-6017
I know there are some reports here that the workaround reported back in https://github.com/saltstack/salt/issues/10032#issuecomment-119296425 does not work, but it did appear to work for me with 3002.2, so it's worth trying if you need this before it is supported more officially. To refresh, it's something like:
master:
- my.master.dns
master_type: failover
master_alive_interval: 30
retry_dns: 0
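If I'm reading the docs right, retry_dns: 0 is the key addition over the 2015 recipe: with the default (30 seconds) the minion just keeps retrying DNS resolution for the current master instead of failing over, and it's the failover, even back to the same name, that forces a fresh lookup.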