Salt: The minion loses connectivity to the master after changing IP address

Created on 5 Jun 2017 · 12 comments · Source: saltstack/salt

Description of Issue/Question

Setup

The Salt master is at 10.17.15.194
One of the Salt minions is at 10.17.15.127

Steps to Reproduce Issue

sh-4.3# salt "*" test.ping
[DEBUG   ] Configuration file path: /etc/salt/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Including configuration from '/etc/salt/master.d/api.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/api.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/debug.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/debug.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/master.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/master.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/peer.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/peer.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/reactor.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/reactor.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/returner.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/returner.conf
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'dashboard.k8s.local_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /var/cache/salt/master/jobs/86/a48fad8b8da521c6db8a12184ce28e039ca4fd3ea7f3afb4bfbfb67a6bcc65/.minions.p
[DEBUG   ] get_iter_returns for jid 20170605101957391229 sent to set(['04efe4d7f00940a799492844ed29ab57', '94b96abeaa41445c8688c454ceaac726', 'ca', '1740fea923c842508012be4aea15d857']) will timeout at 10:20:02.412050
[DEBUG   ] jid 20170605101957391229 return from ca
[DEBUG   ] LazyLoaded nested.output
ca:
    True
[DEBUG   ] jid 20170605101957391229 return from 04efe4d7f00940a799492844ed29ab57
[DEBUG   ] LazyLoaded nested.output
04efe4d7f00940a799492844ed29ab57:
    True
[DEBUG   ] jid 20170605101957391229 return from 1740fea923c842508012be4aea15d857
[DEBUG   ] LazyLoaded nested.output
1740fea923c842508012be4aea15d857:
    True
[DEBUG   ] jid 20170605101957391229 return from 94b96abeaa41445c8688c454ceaac726
[DEBUG   ] LazyLoaded nested.output
94b96abeaa41445c8688c454ceaac726:
    True
[DEBUG   ] jid 20170605101957391229 found all minions set(['ca', '94b96abeaa41445c8688c454ceaac726', '04efe4d7f00940a799492844ed29ab57', '1740fea923c842508012be4aea15d857'])

Then if I do

ip addr del 10.17.15.127 dev eth0
ip addr add 10.17.15.128 dev eth0

Then running test.ping results in "Minion did not return. [No response]" for the minion whose IP address changed.

sh-4.3# salt "*" test.ping
[DEBUG   ] Configuration file path: /etc/salt/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Including configuration from '/etc/salt/master.d/api.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/api.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/debug.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/debug.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/master.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/master.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/peer.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/peer.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/reactor.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/reactor.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/returner.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/returner.conf
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'dashboard.k8s.local_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /var/cache/salt/master/jobs/63/5811ecfd3f2050ff4d461c7d8b230080d061f5834c343305b2eb098dabf489/.minions.p
[DEBUG   ] get_iter_returns for jid 20170605102344190820 sent to set(['04efe4d7f00940a799492844ed29ab57', '94b96abeaa41445c8688c454ceaac726', 'ca', '1740fea923c842508012be4aea15d857']) will timeout at 10:23:49.238166
[DEBUG   ] jid 20170605102344190820 return from 94b96abeaa41445c8688c454ceaac726
[DEBUG   ] LazyLoaded nested.output
94b96abeaa41445c8688c454ceaac726:
    True
[DEBUG   ] jid 20170605102344190820 return from 1740fea923c842508012be4aea15d857
[DEBUG   ] LazyLoaded nested.output
1740fea923c842508012be4aea15d857:
    True
[DEBUG   ] jid 20170605102344190820 return from ca
[DEBUG   ] LazyLoaded nested.output
ca:
    True
[DEBUG   ] Checking whether jid 20170605102344190820 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'dashboard.k8s.local_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] Passing on saltutil error. This may be an error in saltclient. 'retcode'
[DEBUG   ] LazyLoaded localfs.init_kwargs
[DEBUG   ] LazyLoaded localfs.init_kwargs
[DEBUG   ] LazyLoaded no_return.output
04efe4d7f00940a799492844ed29ab57:
    Minion did not return. [No response]

I can see the master is still trying to contact the minion:

...
12:35:15.225523 ARP, Request who-has 10.17.15.127 tell 10.17.15.194, length 28
12:35:16.225528 ARP, Request who-has 10.17.15.127 tell 10.17.15.194, length 28
12:35:17.225462 ARP, Request who-has 10.17.15.127 tell 10.17.15.194, length 28
12:35:18.225459 ARP, Request who-has 10.17.15.127 tell 10.17.15.194, length 28
12:35:19.225418 ARP, Request who-has 10.17.15.127 tell 10.17.15.194, length 28
12:35:20.225482 ARP, Request who-has 10.17.15.127 tell 10.17.15.194, length 28
12:35:21.225519 ARP, Request who-has 10.17.15.127 tell 10.17.15.194, length 28
12:35:22.225477 ARP, Request who-has 10.17.15.127 tell 10.17.15.194, length 28
12:35:23.225477 ARP, Request who-has 10.17.15.127 tell 10.17.15.194, length 28
...

Versions Report

sh-4.3# salt --versions-report
Salt Version:
           Salt: 2016.11.4

Dependency Versions:
           cffi: 1.5.2
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: 1.2.5
      pycparser: 2.10
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.13 (default, Jan 11 2017, 10:56:06) [GCC]
   python-gnupg: Not Installed
         PyYAML: 3.10
          PyZMQ: 14.0.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.4

System Versions:
           dist: SuSE 12 x86_64
        machine: x86_64
        release: 4.4.59-92.17-default
         system: Linux
        version: SUSE Linux Enterprise Server  12 x86_64
Labels: Aluminium, Bug, Community, phase-plan, severity-medium, status-in-prog

All 12 comments

What do the logs show on the minion side? Also do you still see this behavior after restarting the minion service?

Nothing shows up in the logs on the minion: I guess it doesn't realize anything has changed. Connectivity is restored after running systemctl restart salt-minion on the minion, and I can run salt "*" test.ping on the master successfully again.

+1

2017-10-01 21:59:25,498 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to B.B.B.B
2017-10-02 00:49:27,683 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to B.B.B.B
2017-10-02 00:49:27,707 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to B.B.B.B
2017-10-02 08:22:15,266 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to A.A.A.A
2017-10-02 09:20:17,538 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to A.A.A.A
2017-10-02 11:40:20,125 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to A.A.A.A
2017-10-02 12:44:22,485 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to A.A.A.A
2017-10-02 12:50:45,471 [salt.utils ][ERROR ][22935] DNS lookup or connection check of 'saltmaster1' failed.
2017-10-02 12:50:45,490 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to 127.0.0.1
2017-10-02 12:50:50,500 [salt.utils ][ERROR ][22935] DNS lookup or connection check of 'saltmaster2' failed.
2017-10-02 12:50:50,500 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to 127.0.0.1
2017-10-02 12:55:11,299 [salt.utils ][ERROR ][22935] DNS lookup or connection check of 'saltmaster2' failed.
2017-10-02 12:55:11,299 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to 127.0.0.1
2017-10-02 12:55:11,304 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to A.A.A.A
2017-10-02 18:13:15,018 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to B.B.B.B
2017-10-02 20:57:42,191 [salt.utils ][ERROR ][22935] DNS lookup or connection check of 'saltmaster1' failed.
2017-10-02 20:57:50,375 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to 127.0.0.1
2017-10-02 20:58:00,385 [salt.utils ][ERROR ][22935] DNS lookup or connection check of 'saltmaster2' failed.
2017-10-02 20:58:00,386 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to 127.0.0.1
2017-10-02 20:58:01,389 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to B.B.B.B
2017-10-03 00:15:31,060 [salt.utils ][ERROR ][22935] DNS lookup or connection check of 'saltmaster2' failed.
2017-10-03 00:15:31,245 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to 127.0.0.1
2017-10-03 00:15:34,255 [salt.minion ][WARNING ][22935] Master ip address changed from B.B.B.B to A.A.A.A
2017-10-03 00:19:51,984 [salt.utils ][ERROR ][22935] DNS lookup or connection check of 'saltmaster1' failed.
2017-10-03 00:19:51,985 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to 127.0.0.1
2017-10-03 00:20:01,995 [salt.utils ][ERROR ][22935] DNS lookup or connection check of 'saltmaster2' failed.
2017-10-03 00:20:01,996 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to 127.0.0.1
2017-10-03 00:20:12,005 [salt.utils ][ERROR ][22935] DNS lookup or connection check of 'saltmaster1' failed.
2017-10-03 00:20:12,006 [salt.minion ][WARNING ][22935] Master ip address changed from A.A.A.A to 127.0.0.1
2017-10-03 00:20:21,397 [salt.crypt ][WARNING ][22935] SaltReqTimeoutError: Message timed out
2017-10-03 00:20:36,403 [salt.crypt ][WARNING ][22935] SaltReqTimeoutError: Message timed out
2017-10-03 00:20:51,408 [salt.crypt ][WARNING ][22935] SaltReqTimeoutError: Message timed out
2017-10-03 00:21:06,414 [salt.crypt ][WARNING ][22935] SaltReqTimeoutError: Message timed out
2017-10-03 00:21:21,419 [salt.crypt ][WARNING ][22935] SaltReqTimeoutError: Message timed out
2017-10-03 00:21:36,424 [salt.crypt ][WARNING ][22935] SaltReqTimeoutError: Message timed out
2017-10-03 00:21:51,430 [salt.crypt ][WARNING ][22935] SaltReqTimeoutError: Message timed out
2017-10-03 00:22:06,436 [salt.crypt ][WARNING ][22935] SaltReqTimeoutError: Message timed out

There is also no connection to saltmaster:4505 found by netstat.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

This is apparently a regression of #2358.

I know this would only be a workaround, but do any of the parameters that start here

https://github.com/saltstack/salt/blob/master/conf/minion#L244

in the example minion config file help?

I tried these, but it made no difference.

auth_tries: 10
auth_safemode: False
ping_interval: 2

With or without those settings, the minions are reconnected after ~15 minutes. No restart required.
I guess that counts as working as designed?

versions report

Salt Version:
           Salt: 3000.3

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.6.1
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.10
        libgit2: 0.26.0
       M2Crypto: 0.32.0
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: 0.26.2
         Python: 3.6.9 (default, Apr 18 2020, 01:56:04)
   python-gnupg: 0.4.1
         PyYAML: 3.12
          PyZMQ: 16.0.2
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.2.5

System Versions:
           dist: Ubuntu 18.04 bionic
         locale: ISO-8859-1
        machine: x86_64
        release: 4.15.0-106-generic
         system: Linux
        version: Ubuntu 18.04 bionic

Putting this in the Magnesium release to at least have a discussion, and assigning Dwoz to help facilitate figuring out whether this is working as designed or merely "falling back" to the TCP keepalive default and isn't reliable.
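For anyone who wants to experiment with that, these are the keepalive and reconnect knobs from the example minion config that the comments above point at. The values shown are only illustrative defaults or examples; I have not verified that any of them change the reconnect behaviour described in this issue:

tcp_keepalive: True
tcp_keepalive_idle: 300       # seconds of idle time before keepalive probes are sent
tcp_keepalive_cnt: 3          # unanswered probes before the connection is considered dead
tcp_keepalive_intvl: 10       # seconds between keepalive probes
recon_default: 1000           # milliseconds to wait before the first ZeroMQ reconnect attempt
recon_max: 10000              # upper bound for the reconnect back-off
recon_randomize: True
ping_interval: 2              # minutes; periodically ping the master over the existing connection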

@OrangeDog You write that after 15 minutes, your minions do reconnect.

We have a similar problem to the one you are reporting, but in our case the minions do not reconnect, not even after hours, so I would like to understand what might be different about our setup compared to yours.

In our case, the minions connect to the master, not the other way around, and the master occasionally changes its IP address. The DNS name is updated to reflect the new IP address, so I am under the impression that the minions still try to connect to the old IP address instead of resolving the DNS name again and using the new one, but I haven't found any setting in the Salt minion that could be used to tell it to resolve names again.
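(For reference, the closest option I could find in the example minion config is master_alive_interval, shown below with an illustrative value. As far as I can tell from the docs it is mainly intended for master_type: failover setups, so whether it makes a minion re-resolve the master's DNS name in a single-master setup is unverified.)

master_alive_interval: 30     # seconds; periodically check that the current master is still reachable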

@OrangeDog @smarsching If it would help, myself and/or @cassandrafaris can set up a time to discuss this in a group. We have engineers across the Pacific, in EU timezones, and in US timezones, so we could set up a meeting at whatever time works for everyone involved. If that helps to clarify what may be happening here and thus leads to a possible solution, that seems like a win-win, no?

Moving this to non-committed work in Magnesium -- but it is still on the table to discuss in a small group setting if that helps. @DmitryKuzmenko, assigning you in case a discussion happens in your timezone rather than a US one, plus your skills are welcome here: let's see whether this is working as designed or merely "falling back" to the TCP keepalive default and isn't reliable, please.

@sagetherage I am very interested in a discussion of this issue. I am located in the EU, but I am somewhat flexible regarding the time of the discussion (as long as it isn't before 8 am or past 10 pm local time), so I am sure we can figure something out, regardless of where your engineers are located.

Today, when we had the problem again, I made an interesting observation. The minion does not seem to lose the connection to the master completely. Executing modules that communicate with the master (like cp.get_file_str or saltutil.refresh_pillar) still works when running them locally with salt-call.
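(Concretely, commands like the following still succeed when run directly on the affected minion; the salt:// path is only an example:)

salt-call cp.get_file_str salt://top.sls
salt-call saltutil.refresh_pillar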

But triggering the execution from the master (even something as simple as test.ping) fails. So I guess it is the registration with one of the event buses that doesn't reconnect correctly. In case it helps, this is the associated stack trace in the minion log (both master and minion are on version 3001):

salt.exceptions.SaltReqTimeoutError: Message timed out
2020-08-12 18:41:57,043 [tornado.application:640 ][ERROR   ][13860] Exception in callback functools.partial(<function wrap.<locals>.wrapped at 0x7f6672a57ea0>, <salt.ext.tornado.concurrent.Future object at 0x7f6672a845c0>)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
    ret = callback()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 346, in wrapped
    raise_exc_info(exc)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 317, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 291, in <lambda>
    future, lambda future: callback(future.result()))
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3/dist-packages/salt/minion.py", line 1564, in _send_req_async
    ret = yield channel.send(load, timeout=timeout)
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 442, in send
    load, tries=tries, timeout=timeout, raw=raw
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 411, in _crypted_transfer
    ret = yield _do_transfer()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 394, in _do_transfer
    tries=tries,
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
salt.exceptions.SaltReqTimeoutError: Message timed out

The logfile of the master doesn't contain any associated messages. As expected, restarting the master doesn't fix the problem, but restarting the minion does.

As a workaround, I thought about having a cronjob on the minion that periodically checks whether the minion is still connected and restarts it if not, but as all modules seem to execute fine, I am not sure yet how I could detect such a situation on the minion side (without parsing the minion log, which seems too error-prone).
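A rough sketch of what such a cronjob could look like, based on the netstat observation earlier in this thread that an affected minion has no established connection to the master's publish port 4505. The hostname, port, and restart-on-first-failure behaviour are assumptions for illustration, and it is not confirmed that the 4505 connection is actually missing in this particular failure mode:

#!/bin/sh
# Hypothetical watchdog: restart salt-minion if there is no established
# TCP connection to the master's ZeroMQ publish port (4505).
MASTER="saltmaster"   # assumption: replace with the real master hostname or IP
PORT=4505             # publish port the minion subscribes to

# ss -tn lists established TCP connections; match the peer port column.
if ! ss -tn | grep -Eq "[:.]${PORT}([[:space:]]|$)"; then
    logger -t salt-minion-watchdog "no connection to ${MASTER}:${PORT}, restarting salt-minion"
    systemctl restart salt-minion
fi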
