Salt: 2019.2.1: Breaks Minion-Master Communication

Created on 26 Sep 2019  路  10Comments  路  Source: saltstack/salt

Description of Issue

After our minions were upgraded from 2019.2.0 to 2019.2.1, the communication (IPv6 only) with the master (either 2019.2.0 or 2019.2.1) is broken.
Minion is stuck at:
SaltReqTimeoutError, retrying. (2/7)

Via tcpdump communication - but with length "0" - is observable.

Master says:
Sep 26 11:59:36 salt salt-master[9266]: [WARNING ] /usr/lib/python3/dist-packages/zmq/eventloop/ioloop.py:211: RuntimeWarning: IOLoop.current expected instance of <class 'zmq.eventloop.ioloop.ZMQIOLoop'>, got <tornado.platform.epoll.EPollIOLoop object at 0x7fd9e798feb8> Sep 26 11:59:36 salt salt-master[9266]: ioloop.IOLoop.instance() is IOLoop.instance(), "tornado IOLoop already initialized"

Setup

IPv6-only communication, no special configuration.

Steps to Reproduce Issue

After minion is started:
Sep 26 12:33:52 salt-minion[15491]: [DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'minion', 'tcp://<master-ip>', 'clear') Sep 26 12:33:52 salt-minion[15491]: [DEBUG ] Connecting the Minion to the Master URI (for the return server): tcp://<master-ip> Sep 26 12:33:52 salt-minion[15491]: [DEBUG ] Trying to connect to: tcp://<master-ip> Sep 26 12:33:52 salt-minion[15491]: [DEBUG ] salt.crypt.get_rsa_pub_key: Loading public key Sep 26 12:33:57 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (1/7) Sep 26 12:34:02 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (2/7) Sep 26 12:34:07 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (3/7) Sep 26 12:34:12 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (4/7) Sep 26 12:34:17 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (5/7) Sep 26 12:34:22 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (6/7) Sep 26 12:34:27 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (7/7) Sep 26 12:34:32 salt-minion[15491]: [DEBUG ] Re-init ZMQ socket: Message timed out Sep 26 12:34:32 salt-minion[15491]: [DEBUG ] Trying to connect to: tcp://<master-ip> Sep 26 12:34:32 salt-minion[15491]: [DEBUG ] Closing AsyncZeroMQReqChannel instance Sep 26 12:34:32 salt-minion[15491]: [ERROR ] Error while bringing up minion for multi-master. Is master at <master-hostname> responding?

Master:
Sep 26 11:59:36 salt salt-master[9266]: [WARNING ] /usr/lib/python3/dist-packages/zmq/eventloop/ioloop.py:211: RuntimeWarning: IOLoop.current expected instance of <class 'zmq.eventloop.ioloop.ZMQIOLoop'>, got <tornado.platform.epoll.EPollIOLoop object at 0x7fd9e798feb8> Sep 26 11:59:36 salt salt-master[9266]: ioloop.IOLoop.instance() is IOLoop.instance(), "tornado IOLoop already initialized"

With downgrade to 2019.2.0 of the minion everything works fine.

Versions Report

Master:
```Salt Version:
Salt: 2019.2.1

Dependency Versions:
cffi: 1.12.3
cherrypy: Not Installed
dateutil: 2.5.3
docker-py: Not Installed
gitdb: 2.0.3
gitpython: 2.1.7
ioflo: Not Installed
Jinja2: 2.9.4
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: Not Installed
pycparser: 2.19
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
python-gnupg: Not Installed
PyYAML: 3.12
PyZMQ: 16.0.2
RAET: Not Installed
smmap: 2.0.3
timelib: Not Installed
Tornado: 4.4.3
ZMQ: 4.2.1

System Versions:
dist: debian 9.11
locale: UTF-8
machine: x86_64
release: 4.19.0-0.bpo.5-amd64
system: Linux
version: debian 9.11


Minion:

Salt Version:
Salt: 2019.2.1

Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.5.3
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.9.4
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
python-gnupg: Not Installed
PyYAML: 3.12
PyZMQ: 16.0.2
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.4.3
ZMQ: 4.2.1

System Versions:
dist: debian 9.11
locale: UTF-8
machine: x86_64
release: 4.19.0-0.bpo.5-amd64
system: Linux
version: debian 9.11
```

Bug P2 severity-medium

Most helpful comment

All 10 comments

This is affecting our systems as well. We have a hybrid ipv4 and ipv6 setup, which no longer works. I haven't been able to link it to any of the items posted in the changelog yet though.

Only our ipv4 minions are connected. Thankfully, we don't heavily depend on ipv6 yet, so removing the ipv6 record to the master resolves it for now.

All our minions broke because of this issue ... switching to v4 is not an option here as not all hosts even have a v4 address.

I suspect an issue in file:
https://github.com/saltstack/salt/blob/develop/salt/utils/zeromq.py#L81
line 81

Or the method is not called at all ...

Since the connect string on old minions looks like this:
And on new minions like this:
[DEBUG ] Trying to connect to: tcp://2001:608:a01:ffff::28:4506

So the brackets are missing which leads to a none connectable address.

IPV6 is broken in 2019.2.1

Thanks everyone for the clear reports - check out #54784 and let me know if you see any issues :)

Went to ansible for a quick fix. Inventory came from salt-key. For us we enabled IPv6 in /etc/salt/minion.d/ipv6.conf, so killing that file and restarting restored connection.

Adapt as needed.
YMMV.

- name: salt_disable_ipv6
  hosts: salt
  become: yes
  become_method: sudo
  handlers:
    - name: restart salt-minion
      service:
        name: salt-minion
        state: restarted
  tasks:
    - name: "Remove /etc/salt/minion.d/ipv6.conf"
      file:
        path: /etc/salt/minion.d/ipv6.conf
        state: absent
      notify: restart salt-minion

With #54784 applied, either over the release version of 2019.2.1, or just using the head of branch 2019.2.1, errors still triggered as below.

2019-09-29 12:59:40,157 [salt.minion      :1049][CRITICAL][2819594] Unexpected error while connecting to HOSTNAME.DOMAIN
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/salt/minion.py", line 1026, in _connect_minion
    yield minion.connect_master(failed=failed)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 261, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/salt/minion.py", line 1224, in connect_master
    master, self.pub_channel = yield self.eval_master(self.opts, self.timeout, self.safe, failed)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 261, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/salt/minion.py", line 749, in eval_master
    yield pub_channel.connect()
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 261, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 509, in connect
    log.debug('Connecting the Minion to the Master publish port, using the URI: %s', self.master_pub)
  File "/usr/local/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 520, in master_pub
    source_port=self.opts.get('source_publish_port'))
  File "/usr/local/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 84, in _get_master_uri
    master_ip=ip_bracket(master_ip), master_port=master_port)
  File "/usr/local/lib/python2.7/dist-packages/salt/utils/zeromq.py", line 86, in ip_bracket
    addr = ipaddress.ip_address(addr)
  File "/usr/local/lib/python2.7/dist-packages/salt/ext/ipaddress.py", line 120, in ip_address
    address)
ValueError: u'[xxxx:xxxx::xxxx:xxxx]' does not appear to be an IPv4 or IPv6 address

Thanks for that stack trace!

modules/status.py:ping_master
minion.py:eval_master

The above functions have a side effect of updating opts['master_ip'] to the resolved ip address.
opts['master_ip'] is later passed to _get_master_uri, which assumes a bare ip.

connect_master and reconnect are two of several functions that call eval_master.

We'll need to do make sure we're consistent in doing the conversion, and then we can assume we're dealing with URI-formatted ips everywhere.

Shouldn't this be closed now that https://github.com/saltstack/salt/pull/54823 is merged?

Was this page helpful?
0 / 5 - 0 ratings