Salt: Minion restarting and erroring when cannot reach the master

Created on 12 Apr 2016 · 6 comments · Source: saltstack/salt

Description of Issue/Question

Similar to #32517, but in this instance the minion is pointing to only one master and it's a different error. At the head of 2015.8, when a minion cannot reach a master that is down, there is an error in the logs and the minion shows that it is restarting.

Setup

/etc/salt/minion:

master: ipaddress

Steps to Reproduce Issue

  1. Make sure the minion has been accepted on the master and the two can communicate.
  2. Stop both the master and the minion.
  3. Start the minion and watch it attempt to authenticate against the master. When it cannot authenticate with the master, it restarts the salt-minion process and I get an error:
[root@ch3ll-cent7 ~]# salt-minion -ldebug
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: ch3ll-cent7.c7.saltstack.net
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[INFO    ] Setting up the Salt Minion "ch3ll-cent7.c7.saltstack.net"
[DEBUG   ] Created pidfile: /var/run/salt-minion.pid
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[WARNING ] IMPORTANT: Do not use md5 hashing algorithm! Please set "hash_type" to sha256 in Salt Minion config!
[INFO    ] The Salt Minion is starting up
[INFO    ] Minion is starting as user 'root'
[DEBUG   ] AsyncEventPublisher PUB socket URI: ipc:///var/run/salt/minion/minion_event_9126913ec3_pub.ipc
[DEBUG   ] AsyncEventPublisher PULL socket URI: ipc:///var/run/salt/minion/minion_event_9126913ec3_pull.ipc
[INFO    ] Starting pub socket on ipc:///var/run/salt/minion/minion_event_9126913ec3_pub.ipc
[INFO    ] Starting pull socket on ipc:///var/run/salt/minion/minion_event_9126913ec3_pull.ipc
[DEBUG   ] Minion 'ch3ll-cent7.c7.saltstack.net' trying to tune in
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'ch3ll-cent7.c7.saltstack.net', 'tcp://54.159.114.180:4506')
[DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (2882)
[DEBUG   ] Setting zmq_reconnect_ivl to '2882ms'
[DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'ch3ll-cent7.c7.saltstack.net', 'tcp://54.159.114.180:4506', 'clear')
[INFO    ] The Salt Minion is shut down
[WARNING ] /usr/lib/python2.7/site-packages/salt/scripts.py:83: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  log.error('Minion failed to start: {0}'.format(exc.message), exc_info=True)

[ERROR   ] Minion failed to start: Attempt to authenticate with the salt master failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/salt/scripts.py", line 81, in minion_process
    minion.start()
  File "/usr/lib/python2.7/site-packages/salt/cli/daemons.py", line 320, in start
    self.minion.tune_in()
  File "/usr/lib/python2.7/site-packages/salt/minion.py", line 1699, in tune_in
    self.sync_connect_master()
  File "/usr/lib/python2.7/site-packages/salt/minion.py", line 760, in sync_connect_master
    raise six.reraise(*future_exception)
  File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python2.7/site-packages/salt/minion.py", line 767, in connect_master
    master, self.pub_channel = yield self.eval_master(self.opts, self.timeout, self.safe)
  File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
    value = future.result()
  File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
    raise_exc_info(self._exc_info)
  File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python2.7/site-packages/salt/minion.py", line 503, in eval_master
    yield pub_channel.connect()
  File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
    value = future.result()
  File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
    raise_exc_info(self._exc_info)
  File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 876, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 338, in connect
    yield self.auth.authenticate()
  File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 870, in run
    value = future.result()
  File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 214, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
SaltClientError: Attempt to authenticate with the salt master failed
[WARNING ] ** Restarting minion **
[INFO    ] Sleeping random_reauth_delay of 6 seconds
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'

The minion does restart, and when the master comes back online it is able to connect just fine; the issue is the error and the minion restarting.

Also to note: I edited scripts.py to pass exc_info=True instead of False so I could get the stack trace above:

log.error('Minion failed to start: {0}'.format(exc.message), exc_info=True)
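
For context, exc_info=True here is plain Python logging behavior rather than anything Salt-specific: it makes the logger append the active exception's traceback to the message. A minimal standalone sketch of the effect (the SaltClientError class below is a stand-in for illustration, not the real salt.exceptions import):

import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

class SaltClientError(Exception):
    """Stand-in for salt.exceptions.SaltClientError."""

try:
    raise SaltClientError('Attempt to authenticate with the salt master failed')
except SaltClientError as exc:
    # With exc_info=True the full traceback is appended to the log record;
    # with exc_info=False (what scripts.py shipped with) only the one-line
    # message appears. Formatting exc itself instead of exc.message also
    # avoids the DeprecationWarning shown above.
    log.error('Minion failed to start: {0}'.format(exc), exc_info=True)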

Versions Report

[root@ch3ll-cent7 ~]# salt --versions-report
Salt Version:
           Salt: 2015.8.8-324-g492ebfc

Dependency Versions:
         Jinja2: 2.7.2
       M2Crypto: 0.21.1
           Mako: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.7.0
         Python: 2.7.5 (default, Nov 20 2015, 02:00:19)
           RAET: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
        libgit2: Not Installed
        libnacl: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.7
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
   python-gnupg: Not Installed
          smmap: Not Installed
        timelib: Not Installed

System Versions:
           dist: centos 7.2.1511 Core
        machine: x86_64
        release: 3.10.0-327.4.5.el7.x86_64
         system: CentOS Linux 7.2.1511 Core
Labels: Bug, Core, P2, fixed-pending-your-verification, severity-medium

All 6 comments

Committed for 2015.8.9

@Ch3LL there are two points:

  1. Deprecation warning:
[WARNING ] /usr/lib/python2.7/site-packages/salt/scripts.py:83: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  log.error('Minion failed to start: {0}'.format(exc.message), exc_info=True)

I've fixed it in #32556

  2. Error message:
[ERROR   ] Minion failed to start: Attempt to authenticate with the salt master failed

This message appears because the master is down. The minion tries to authenticate a number of times with a timeout and reports this. After that, the minion restarts itself.
I see no problem here. Am I wrong?
There is also a config option, auth_tries. A user can set it to a large value to avoid minion restarts in single-master mode. I'm not sure we should set it to a large value automatically in single-master mode, because that could break some use cases: for example, re-reading the config on restart.
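
For illustration, a minimal /etc/salt/minion sketch raising auth_tries (the value is arbitrary, chosen only to be large):

master: ipaddress
# Illustrative value only; any sufficiently large number of auth attempts
# delays the restart-on-failure behavior described above.
auth_tries: 100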

I've updated the code. Now, if the single master is unreachable, the minion will report:

Attempt to authenticate with the salt master failed with timeout error

There was an actual logic mistake where we were hiding the original exception reason and replacing it with a constant string.
@Ch3LL thank you for the catch!
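
To make that mistake concrete, here is a hedged sketch of the general anti-pattern and its fix, not the literal salt/minion.py change (all names below are stand-ins for illustration):

# Hypothetical names for illustration; not the actual Salt source.

class SaltClientError(Exception):
    """Stand-in for salt.exceptions.SaltClientError."""

def authenticate_masked():
    try:
        raise IOError('connection timed out')  # the real underlying reason
    except IOError:
        # Anti-pattern: a constant string hides why authentication failed.
        raise SaltClientError(
            'Attempt to authenticate with the salt master failed')

def authenticate_preserved():
    try:
        raise IOError('connection timed out')
    except IOError as exc:
        # Fix: surface the underlying reason in the re-raised error,
        # matching the new "failed with timeout error" style of message.
        raise SaltClientError(
            'Attempt to authenticate with the salt master '
            'failed with {0}'.format(exc))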

@DmitryKuzmenko oh, that error makes a lot more sense and is not as confusing. Thanks! I tested this and I'm now seeing the new timeout error. I'm just waiting for #32556 to be merged, and then I will test and close this issue.

@Ch3LL I will try to get that fix in this morning. You should be able to re-test and close this by the end of the day. Thanks.

@cachedout thanks, I went ahead and tested on the head of 2015.8 and I'm not seeing the error anymore. Once again, thanks @DmitryKuzmenko!!
