Salt: 2017.7.4 to 2018.3.0 upgrade issue: Salt request timed out. The master is not responding

Created on 4 Apr 2018 · 98 comments · Source: saltstack/salt

Description of Issue/Question

I have a standby salt master on SaltStack 2017.7.4 and I am trying to upgrade it to the just-released 2018.3.0.
After the 2018.3.0 upgrade, I am getting the following issue.

[root@salt01 ~]# sudo salt minion01 test.version
Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.
[root@salt01 ~]#

Setup


Steps to Reproduce Issue

  • snapshot salt01 VM (VMware image) for backup.
  • disable /etc/yum.repos.d/salt-latest.repo
  • yum update -y && reboot
  • enable /etc/yum.repos.d/salt-latest.repo
  • yum update -y salt && reboot
  • run test.ping on all minions or just one minion
[root@salt01 ~]# sudo salt minion01 test.version
Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.
[root@salt01 ~]#
  • If I revert salt01's VM image back to the SaltStack 2017.7.4 version, this problem disappears.

Versions Report

  • errors in systemctl status -l salt-master.
[root@salt01 ~]# systemctl status -l salt-master
● salt-master.service - The Salt Master Server
   Loaded: loaded (/usr/lib/systemd/system/salt-master.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-04-04 09:08:33 CDT; 19min ago
     Docs: man:salt-master(1)
           file:///usr/share/doc/salt/html/contents.html
           https://docs.saltstack.com/en/latest/contents.html
 Main PID: 1090 (salt-master)
   CGroup: /system.slice/salt-master.service
           ├─1090 /usr/bin/python /usr/bin/salt-master
           ├─1480 /usr/bin/python /usr/bin/salt-master
           ├─1707 /usr/bin/python /usr/bin/salt-master
           ├─1708 /usr/bin/python /usr/bin/salt-master
           ├─1715 /usr/bin/python /usr/bin/salt-master
           ├─1716 /usr/bin/python /usr/bin/salt-master
           ├─1717 /usr/bin/python /usr/bin/salt-master
           ├─1718 /usr/bin/python /usr/bin/salt-master
           ├─1724 /usr/bin/python /usr/bin/salt-master
           ├─1726 /usr/bin/python /usr/bin/salt-master
           ├─1727 /usr/bin/python /usr/bin/salt-master
           ├─1728 /usr/bin/python /usr/bin/salt-master
           ├─1729 /usr/bin/python /usr/bin/salt-master
           └─1730 /usr/bin/python /usr/bin/salt-master

Apr 04 09:12:15 salt01 salt-master[1090]: pub = salt.crypt.get_rsa_pub_key(pubfn)
Apr 04 09:12:15 salt01 salt-master[1090]: File "/usr/lib/python2.7/site-packages/salt/crypt.py", line 210, in get_rsa_pub_key
Apr 04 09:12:15 salt01 salt-master[1090]: key = RSA.load_pub_key(path)
Apr 04 09:12:15 salt01 salt-master[1090]: File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 406, in load_pub_key
Apr 04 09:12:15 salt01 salt-master[1090]: return load_pub_key_bio(bio)
Apr 04 09:12:15 salt01 salt-master[1090]: File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 422, in load_pub_key_bio
Apr 04 09:12:15 salt01 salt-master[1090]: rsa_error()
Apr 04 09:12:15 salt01 salt-master[1090]: File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 302, in rsa_error
Apr 04 09:12:15 salt01 salt-master[1090]: raise RSAError, m2.err_reason_error_string(m2.err_get_error())
Apr 04 09:12:15 salt01 salt-master[1090]: RSAError: no start line
[root@salt01 ~]#
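For context, "no start line" is OpenSSL reporting that it never found the PEM header it expected: M2Crypto's RSA.load_pub_key wants a `-----BEGIN PUBLIC KEY-----` header, and a key written with the PKCS#1 `-----BEGIN RSA PUBLIC KEY-----` header (which comments further down in this thread trace to older pycryptodome versions) trips exactly this error. A plain-Python sketch of that header distinction (the helper is illustrative, not Salt or M2Crypto code):

```python
# Header that M2Crypto's RSA.load_pub_key can parse (SubjectPublicKeyInfo PEM):
EXPECTED_HEADER = "-----BEGIN PUBLIC KEY-----"


def header_ok(pem_text):
    """Illustrative check: True if the key starts with the expected PEM header."""
    return pem_text.lstrip().startswith(EXPECTED_HEADER)
```

For example, `header_ok("-----BEGIN RSA PUBLIC KEY-----\n...")` is False, which corresponds to the failure mode in the traceback above.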

  • version report
[root@salt01 ~]# salt --versions-report
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: unknown
       dateutil: 1.5
      docker-py: Not Installed
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: 1.3.8
         Jinja2: 2.7.2
        libgit2: 0.24.6
        libnacl: 1.4.3
       M2Crypto: 0.21.1
           Mako: 0.8.1
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: 1.2.5
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: 0.24.2
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: 0.2.4
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.21.1.el7.x86_64
         system: Linux
        version: CentOS Linux 7.4.1708 Core

[root@salt01 ~]#

Labels: Bug · P1 · ZD · ZRELEASED - 2018.3.1 · fixed-pending-your-verification · severity-medium

Most helpful comment

I'm having this same issue with 2018.3.0. My salt-minion is also on 2018.3.0.

This is what I'm getting in /var/log/salt/master:

2018-04-04 15:40:43,095 [tornado.application:123 ][ERROR   ][2361] Future exception was never retrieved: Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 230, in wrapper
    yielded = next(result)
  File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 676, in handle_message
    stream.send(self.serial.dumps(self._auth(payload['load'])))
  File "/usr/lib/python2.7/site-packages/salt/transport/mixins/auth.py", line 436, in _auth
    pub = salt.crypt.get_rsa_pub_key(pubfn)
  File "/usr/lib/python2.7/site-packages/salt/crypt.py", line 210, in get_rsa_pub_key
    key = RSA.load_pub_key(path)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 426, in load_pub_key
    return load_pub_key_bio(bio)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 441, in load_pub_key_bio
    rsa_error()
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 330, in rsa_error
    raise RSAError(Err.get_error_message())
RSAError: no start line
# salt --versions-report
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.28.2
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: Not Installed
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: ANSI_X3.4-1968
        machine: x86_64
        release: 4.13.13-6-pve
         system: Linux
        version: CentOS Linux 7.4.1708 Core

All 98 comments

@tjyang So we have all the information and can accurately attempt to replicate the situation, was the minion upgraded to 2018.3.0 as well or is it still running 2017.7.4?

Hi @garethgreenaway , Thanks for the quick reply.

Yes, all minions are still using 2017.7.4 (some are even older).

I'm having this same issue with 2018.3.0. My salt-minion is also on 2018.3.0.

This is what I'm getting in /var/log/salt/master:

2018-04-04 15:40:43,095 [tornado.application:123 ][ERROR   ][2361] Future exception was never retrieved: Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 230, in wrapper
    yielded = next(result)
  File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 676, in handle_message
    stream.send(self.serial.dumps(self._auth(payload['load'])))
  File "/usr/lib/python2.7/site-packages/salt/transport/mixins/auth.py", line 436, in _auth
    pub = salt.crypt.get_rsa_pub_key(pubfn)
  File "/usr/lib/python2.7/site-packages/salt/crypt.py", line 210, in get_rsa_pub_key
    key = RSA.load_pub_key(path)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 426, in load_pub_key
    return load_pub_key_bio(bio)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 441, in load_pub_key_bio
    rsa_error()
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 330, in rsa_error
    raise RSAError(Err.get_error_message())
RSAError: no start line
# salt --versions-report
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.28.2
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: Not Installed
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: ANSI_X3.4-1968
        machine: x86_64
        release: 4.13.13-6-pve
         system: Linux
        version: CentOS Linux 7.4.1708 Core

Hitting me as well, with the same message @DeviantEng reported in the log.

> salt --versions-report
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: unknown
       dateutil: 1.5
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.28.2
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: Not Installed
         Python: 2.7.5 (default, May  3 2017, 07:55:04)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: redhat 7.4 Maipo
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.21.1.el7.x86_64
         system: Linux
        version: Red Hat Enterprise Linux Server 7.4 Maipo

edit: Minions are anywhere from 2016.11.3 (Carbon) to 2018.3.0 (Oxygen).

We came in this morning to this problem. Our salt master updated, and about half our minions have been updated from 2017.7.2 to 2018.3.0 via automatic update. Most of the 2018.3.0 minions appear to be off the air from the master's point of view, and we are seeing lots of errors in the logs:

2018-04-04 15:58:12,052 [salt.payload     :154 ][CRITICAL][6782] Could not deserialize msgpack message. This often happens when trying to read a file not in binary mode. To see message payload, enable debug logging and retry. Exception:
 unpackb() got an unexpected keyword argument 'ext_hook'
2018-04-04 15:58:12,095 [tornado.general  :410 ][ERROR   ][6782] Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 407, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 1045, in mark_future
    data = self.serial.loads(msg[0])
  File "/usr/lib/python2.7/dist-packages/salt/payload.py", line 146, in loads
    ret = msgpack.loads(msg, use_list=True, ext_hook=ext_type_decoder)
  File "_unpacker.pyx", line 80, in msgpack._unpacker.unpackb (msgpack/_unpacker.cpp:82)
TypeError: unpackb() got an unexpected keyword argument 'ext_hook'
2018-04-04 15:58:12,096 [tornado.general  :445 ][ERROR   ][6782] Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 433, in _handle_events
    self._handle_recv()
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 465, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 407, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 1045, in mark_future
    data = self.serial.loads(msg[0])
  File "/usr/lib/python2.7/dist-packages/salt/payload.py", line 146, in loads
    ret = msgpack.loads(msg, use_list=True, ext_hook=ext_type_decoder)
  File "_unpacker.pyx", line 80, in msgpack._unpacker.unpackb (msgpack/_unpacker.cpp:82)
TypeError: unpackb() got an unexpected keyword argument 'ext_hook'
2018-04-04 15:58:12,097 [tornado.application:612 ][ERROR   ][6782] Exception in callback None
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/tornado/ioloop.py", line 866, in start
    handler_func(fd_obj, events)
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 433, in _handle_events
    self._handle_recv()
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 465, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 407, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 1045, in mark_future
    data = self.serial.loads(msg[0])
  File "/usr/lib/python2.7/dist-packages/salt/payload.py", line 146, in loads
    ret = msgpack.loads(msg, use_list=True, ext_hook=ext_type_decoder)
  File "_unpacker.pyx", line 80, in msgpack._unpacker.unpackb (msgpack/_unpacker.cpp:82)
TypeError: unpackb() got an unexpected keyword argument 'ext_hook'
2018-04-04 15:58:51,928 [salt.minion      :998 ][ERROR   ][6782] Error while bringing up minion for multi-master. Is master at saltmaster.ourdomain.com responding?

For the time being, I manually downgraded our salt-master back to 2017.7.5 (Nitrogen) and we're back in business.
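The `unpackb() got an unexpected keyword argument 'ext_hook'` error above means that master's msgpack-python predates the `ext_hook` keyword. A minimal feature-detection sketch of how a caller could fall back (the function and the two stand-in loaders are hypothetical, mimicking a new and an old msgpack API; this is not Salt's actual code):

```python
def safe_unpackb(loads, data, ext_hook=None):
    """Call a msgpack-style loader, dropping ext_hook if it is unsupported."""
    try:
        return loads(data, use_list=True, ext_hook=ext_hook)
    except TypeError:
        # Older msgpack-python: retry without the unsupported keyword.
        return loads(data, use_list=True)


def new_style_loads(data, use_list=True, ext_hook=None):
    return list(data)  # pretend-deserialize


def old_style_loads(data, use_list=True):  # no ext_hook parameter at all
    return list(data)  # pretend-deserialize
```

Passing `ext_hook` directly to the old-style loader raises the TypeError seen in the log; the fallback path absorbs it. The practical remedy, as a later comment notes, was making sure a recent python-msgpack was installed on both master and minions.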

I experienced the same issue after automatically updating to 2018.3.0

I was able to get everything connecting again by making sure python-msgpack was installed on both the master and minions.

Thanks everyone. Going to see if I can duplicate the issue and dive into what the root cause is.

Hey guys,

We are also experiencing the issue after the master was upgraded to 2018.3.0.

Regards,

Hanoz

Unfortunately I haven't been able to replicate the issue. If someone could provide a VM snapshot or container of an affected system that would be great. Thanks!

Had the same issue and downgraded the master to 2017.7.5

@garethgreenaway , I can provide a VMware snapshot.
Since this is a production system, please advise how I can strip out my sensitive info.

@tjyang That would be greatly appreciated. Please remove any personal or company proprietary information. Also send the snapshot info. along to [email protected], then it's not publicly available in the issue. Thanks!

@garethgreenaway , Let me see if I can reproduce this issue in my homelab, which only has a few minions connected. Sending the production VM will be a last resort. For the homelab, I can even open up ssh access to you if I can reproduce the issue; will keep you posted.

@tjyang Sounds good!

So I was able to fix the issue by upgrading the m2crypto and zeromq libraries, and it is working afterwards. I don't know if that helps.

@Auha Every bit of information helps! Do you know what versions of both you had installed before the upgrade?

  • 2018.3.0 with issue, before the m2crypto/zeromq upgrade
[root@salt01 ~]# rpm -qa |egrep 'm2crypto|zeromq'
zeromq-4.1.4-5.el7.x86_64
m2crypto-0.21.1-17.el7.x86_64
[root@salt01 ~]#
  • upgrade per @Auha's pointer
[root@va32lsalt01 ~]# yum update -y
Loaded plugins: changelog, fastestmirror, langpacks, show-leaves
Loading mirror speeds from cached hostfile
 * base: mirrors.advancedhosters.com
 * elrepo: repos.ord.lax-noc.com
 * epel: mirror.cogentco.com
 * extras: mirror.clarkson.edu
 * updates: mirror.cogentco.com
Resolving Dependencies
--> Running transaction check
---> Package m2crypto.x86_64 0:0.21.1-17.el7 will be updated
---> Package m2crypto.x86_64 0:0.28.2-3.el7 will be an update
--> Processing Dependency: python2-typing for package: m2crypto-0.28.2-3.el7.x86_64
---> Package zeromq.x86_64 0:4.1.4-5.el7 will be updated
---> Package zeromq.x86_64 0:4.1.4-6.el7 will be an update
--> Running transaction check
---> Package python2-typing.noarch 0:3.5.2.2-3.el7 will be installed
--> Finished Dependency Resolution

<snipped>
  • salt-call --versions-report
[root@salt01 ~]# salt-call --versions-report
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: 5.6.0
       dateutil: 1.5
      docker-py: Not Installed
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: 1.3.8
         Jinja2: 2.7.2
        libgit2: 0.24.6
        libnacl: 1.4.3
       M2Crypto: 0.28.2
           Mako: 0.8.1
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: 1.2.5
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: 0.24.2
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: 0.2.4
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.21.1.el7.x86_64
         system: Linux
        version: CentOS Linux 7.4.1708 Core

[root@salt01 ~]#

  • salt-master start up ok.
[root@salt01 ~]# systemctl status -l  salt-master
โ— salt-master.service - The Salt Master Server
   Loaded: loaded (/usr/lib/systemd/system/salt-master.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-04-04 18:23:51 CDT; 2min 46s ago
     Docs: man:salt-master(1)
           file:///usr/share/doc/salt/html/contents.html
           https://docs.saltstack.com/en/latest/contents.html
 Main PID: 10232 (salt-master)
   CGroup: /system.slice/salt-master.service
           ├─10232 /usr/bin/python /usr/bin/salt-master
           ├─10246 /usr/bin/python /usr/bin/salt-master
           ├─10249 /usr/bin/python /usr/bin/salt-master
           ├─10252 /usr/bin/python /usr/bin/salt-master
           ├─10253 /usr/bin/python /usr/bin/salt-master
           ├─10254 /usr/bin/python /usr/bin/salt-master
           ├─10255 /usr/bin/python /usr/bin/salt-master
           ├─10256 /usr/bin/python /usr/bin/salt-master
           ├─10257 /usr/bin/python /usr/bin/salt-master
           ├─10264 /usr/bin/python /usr/bin/salt-master
           ├─10265 /usr/bin/python /usr/bin/salt-master
           ├─10266 /usr/bin/python /usr/bin/salt-master
           ├─10267 /usr/bin/python /usr/bin/salt-master
           └─10268 /usr/bin/python /usr/bin/salt-master

Apr 04 18:23:49 va32lsalt01 systemd[1]: Starting The Salt Master Server...
Apr 04 18:23:51 va32lsalt01 systemd[1]: Started The Salt Master Server.
[root@salt01 ~]#

I don't know if this helps, but I ran just a normal yum update when I previously just upgraded only the salt-master and its dependencies.

--> Running transaction check
---> Package libsodium.x86_64 0:1.0.5-1.el7 will be updated
---> Package libsodium.x86_64 0:1.0.16-1.el7 will be an update
---> Package m2crypto.x86_64 0:0.21.1-17.el7 will be updated
---> Package m2crypto.x86_64 0:0.28.2-3.el7 will be an update
--> Processing Dependency: python2-typing for package: m2crypto-0.28.2-3.el7.x86_64
---> Package python-simplejson.x86_64 0:3.2.0-1.el7 will be obsoleted
---> Package python2-msgpack.x86_64 0:0.4.8-1.el7 will be updated
---> Package python2-msgpack.x86_64 0:0.5.1-1.el7 will be an update
---> Package python2-simplejson.x86_64 0:3.10.0-1.el7 will be obsoleting
---> Package zeromq.x86_64 0:4.1.4-5.el7 will be updated
---> Package zeromq.x86_64 0:4.1.4-6.el7 will be an update
--> Running transaction check
---> Package python2-typing.noarch 0:3.5.2.2-3.el7 will be installed
--> Finished Dependency Resolution

Definitely helps. Looks like we might be able to narrow this down to m2crypto version 0.21.1-17.el7.

  • 1st run of test.ping on all nodes passed, with most minions returning [no response]
  • following is the output of the 2nd try.
[root@salt01 ~]# salt -t 20 \* test.ping
Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.
[root@salt01 ~]# tail /var/log/salt/master
    pub = salt.crypt.get_rsa_pub_key(pubfn)
  File "/usr/lib/python2.7/site-packages/salt/crypt.py", line 210, in get_rsa_pub_key
    key = RSA.load_pub_key(path)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 426, in load_pub_key
    return load_pub_key_bio(bio)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 441, in load_pub_key_bio
    rsa_error()
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 330, in rsa_error
    raise RSAError(Err.get_error_message())
RSAError: no start line
[root@salt01 ~]#

[root@salt01 ~]# systemctl status -l  salt-master
<snipped>
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: pub = salt.crypt.get_rsa_pub_key(pubfn)
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: File "/usr/lib/python2.7/site-packages/salt/crypt.py", line 210, in get_rsa_pub_key
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: key = RSA.load_pub_key(path)
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 426, in load_pub_key
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: return load_pub_key_bio(bio)
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 441, in load_pub_key_bio
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: rsa_error()
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 330, in rsa_error
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: raise RSAError(Err.get_error_message())
Apr 04 18:27:42 va32lsalt01 salt-master[10232]: RSAError: no start line
[root@va32lsalt01 ~]#

@tjyang both of these are after upgrading m2crypto?

@garethgreenaway
Yes.

[root@salt01 ~]# rpm -qa |egrep 'm2crypto|zeromq'
zeromq-4.1.4-6.el7.x86_64
m2crypto-0.28.2-3.el7.x86_64
[root@salt01 ~]#
  • PS: it will take a while for me to download my prod VM to my home lab.

Sounds good. I have a theory I'm going to explore in the meantime.

@garethgreenaway
Not-so-good news to report here: I can't reproduce the test.ping failure across all nodes.

  • download the salt01 VMware VM into my homelab (192.168.1.0)
  • change the IP of salt01 to 192.168.1.9 (salt01.test.lan)
  • yum update -y
  • existing salt-master info
[root@salt01 ~]# salt-call --versions-report ;date
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: 5.6.0
       dateutil: 1.5
      docker-py: Not Installed
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: 1.3.8
         Jinja2: 2.7.2
        libgit2: 0.24.6
        libnacl: 1.4.3
       M2Crypto: 0.28.2
           Mako: 0.8.1
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: 1.2.5
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: 0.24.2
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: 0.2.4
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.21.1.el7.x86_64
         system: Linux
        version: CentOS Linux 7.4.1708 Core

Thu Apr  5 02:40:05 CDT 2018
[root@salt01 ~]#
  • remove all existing salt-minion keys (close to 1,000)
  • log in to 3 minions to remove the old master's public key and restart the minion on those 3 nodes
  • run test.version on all connected minions (only 3 nodes).
[root@salt01 ~]# salt -t 20 \* test.version
nagios01.test.lan:
    2017.7.4
keycloak01.test.lan:
    2017.7.2
ipa1.test.lan:
    2017.7.1
[root@salt01 ~]#

  • Looks like it is related to the number of connected nodes; I only have a few test nodes in the 192.168.1.0 subnet to test with.
  • Let me know if you are still interested in ssh access to salt01@homelab.

@garethgreenaway
From my above test case in the homelab (192.168.1.0), it looks like "too many connected minions will fail 2018.3.0 with a timeout".

I went back to my production master (2018.3.0) and did the following.

  • delete all existing minion keys
  • accept only a few minions back in by hostname pattern (23 nodes)
  • run test.ping or test.version on all 23 connected nodes
[root@salt01 ~]# sudo salt \* test.version |wc;date
     46      46     574
Thu Apr  5 03:00:48 CDT 2018
[root@salt01 ~]#

  • I will add hosts back in small batches, one at a time, to see when test.version fails.
  • Maybe it is not about the number of nodes connected but about older minion versions.

For us, what appears to be happening is that things are fine until some older clients connect, we get a few of those RSA errors, and things slowly degrade until we can no longer contact the salt master. Restarting the master fixes it for a few minutes, but it doesn't last long.

Same error as everyone else on Arch Linux, which is still officially at 2017.7.4, but I manually updated to 2018.3 along with upgrading m2crypto to 0.29.0.

I have ~120 minions which are a mixture of version 2018.3 and 2017.x and if I do a salt * test.ping ~20 of them respond (again mixed 2018 and 2017) and then I get the "RSAError: no start line" on the master.

I discovered that if I completely remove python2-m2crypto and install python2-pycryptodomex (which is used as a fallback alternative by salt) everything works fine.
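Salt can use more than one RSA backend, which is why removing m2crypto makes it fall back to pycryptodome as described above. A small probe of which candidate backends are importable (illustrative only; the module list and its order are an assumption, not Salt's documented selection logic):

```python
import importlib


def available_crypto_backends():
    """Return which candidate RSA backend modules can be imported on this host."""
    found = []
    for name in ("M2Crypto", "Cryptodome", "Crypto"):
        try:
            importlib.import_module(name)
            found.append(name)
        except ImportError:
            pass
    return found
```

On a master exhibiting this bug, seeing `M2Crypto` first in the result would suggest the M2Crypto code path (and its stricter PEM parsing) is in play.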

@mtorromeo

Thanks for the tip.
For my CentOS 7.4 master on 2018.3.0 with about 1,000 connected minions,
I ran "rpm -e m2crypto-0.28.2-3.el7.x86_64" to resolve the RSA/timeout errors.

@mtorromeo

Unfortunately, after a few runs of applying simple actions to all 1,000 minions,
the RSA/timeout error messages came back, so removing m2crypto is not the workaround.
But restarting salt-master seems to get "salt -t 20 * test.ping" working again.

@tjyang are you sure the error is the same? If there is still /usr/lib64/python2.7/site-packages/M2Crypto/RSA.py in the stack trace of the error you are still using it. Otherwise the error would at least be a little different than before.

My salt-master is still running fine.

Here's my versions report. Notice how M2Crypto is reported as Not Installed:

```
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.11.5
       cherrypy: unknown
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: 1.2.5
      pycparser: 2.18
       pycrypto: 2.6.1
   pycryptodome: 3.5.1
         pygit2: Not Installed
         Python: 2.7.14 (default, Jan 5 2018, 10:41:29)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.3
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.2.2

System Versions:
           dist:
         locale: UTF-8
        machine: x86_64
        release: 4.15.8-1-ARCH
         system: Linux
        version: Not Installed
```

@mtorromeo

You are correct; the error message on my production master should be different since it now shows "M2Crypto: Not Installed".
So far "salt-run -t 60 manage.versions" has worked every time across 3 more runs after I restarted salt-master. But the first time I ran it, it caused an issue; I didn't catch the error log.

@garethgreenaway , hope you have enough information to debug the root cause.

We're chasing a theory on this one. If you're still seeing this under 2018.3.0, can you include the first line (the header) of your Salt master's public key? It should be found in /etc/salt/pki/master/master.pub. Also include the last modified date: ls -l /etc/salt/pki/master/master.pub.
Thanks!

@garethgreenaway

So far my production 2018.3.0 works fine.
Following is the info you requested.
Hope it is helpful for your debugging.

[root@salt01 ~]# head -2 /etc/salt/pki/master/master.pub
-----BEGIN PUBLIC KEY-----
MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAtjsvEo7PHlQoj7wHBgYA
[root@salt01 ~]# ls -l /etc/salt/pki/master/master.pub
-rw-rw-r-- 1 saltadm saltadm 800 Jan 22  2015 /etc/salt/pki/master/master.pub
[root@salt01 ~]#

@tjyang So things appear to be working without m2crypto installed?

@garethgreenaway
Yes, it works for me without m2crypto installed, but I only did some basic checks like the following.

salt -t 20 \* test.ping 
salt -t 20 \* test.version
salt -t 20 \* cmd.run 'uname -a'

@tjyang thanks. would you mind emailing me a copy of that master public key? gareth at saltstack dot com.

@garethgreenaway Sent.

@tjyang That was definitely helpful. I believe we've got this one figured out. Two last things to check. Can you check the first couple lines of the minion public key for one of the minions that was having problems prior to m2crypto being removed? And see if python2-pycryptodomex is installed and if so, what version? Thanks!

  • Can you check the first couple lines of the minion public key for one of the minions that was having problems prior to m2crypto being removed?
A.

[root@miq01t ~]# head -3 /etc/salt/pki/minion/minion.pub
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAhd/LQys5XBpJPIXzWyQl
GZwa6+/9avNIPO4MEsVhufuCoYBv0XQp0Q64LXGluGSrORhrZfqt1cce+zYCjTay
[root@miq01t ~]#


  • And see if python2-pycryptodomex is installed and if so, what version?
A.
No, python2-pycryptodomex is not installed on miq01t.
[root@miq01t ~]# rpm -qa |grep python2-py
python2-pyasn1-modules-0.1.9-7.el7.noarch
python2-pyasn1-0.1.9-7.el7.noarch
[root@miq01t ~]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
[root@miq01t ~]#

Same problem here

But we can't uninstall m2crypto because our server runs Katello/SaltStack; some Pulp packages need m2crypto, and it seems even the Git fileserver backend does not work well without m2crypto.

Just want to confirm with everyone who is seeing this issue: whatever components were upgraded to 2018.3, either minion or master, were they restarted following the upgrade?

For me, the procedure for upgrading the salt-master is always: 1. stop salt-master, 2. do the upgrade, 3. start salt-master. No restart of remote salt-minions.

I installed salt from the salt repo (EL7) and updated via yum -y update, and normally have no issues. The first thing I did after discovering this issue was reboot the system, but I still had the same issue.

As a workaround for now, I have removed m2crypto from my salt-master and I am working again. I haven't seen any more errors related to this, but I only have ~40 minions, so it's not a super busy environment.

[root@salt ~]# salt-master --versions-report
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: Not Installed
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: ANSI_X3.4-1968
        machine: x86_64
        release: 4.13.13-6-pve
         system: Linux
        version: CentOS Linux 7.4.1708 Core

ZD-2406

Was able to recreate this issue by installing pycryptodome on a pre-2018.3 minion and forcing a rekey for that minion.

My test case was a 2017.7.1 minion against a 2018.3.0 master. The master did have m2crypto installed; when I uninstalled it and restarted the master, the issue went away.

@whytewolf We were able to duplicate this with pycryptodome installed, and the telltale sign of the problem can be spotted in the header of the minion public key: it will contain RSA when it should not. This is a known bug in earlier versions of pycryptodome. The PR submitted by @dwoz (https://github.com/saltstack/salt/pull/46930) should resolve that. There seem to be some issues not related to having pycryptodome installed, though, that we're still unable to duplicate.
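The header check described above can be automated. A minimal sketch of such a scan (the function names are mine, and the default path assumes a standard master install; adjust for your setup):

```python
import glob

BAD_HEADER = "-----BEGIN RSA PUBLIC KEY-----"

def has_bad_header(pem_text):
    """True if the PEM starts with the RSA-specific (PKCS#1-style) label
    that M2Crypto's load_pub_key rejects with 'no start line'."""
    return pem_text.lstrip().startswith(BAD_HEADER)

def scan_minion_keys(keydir="/etc/salt/pki/master/minions"):
    """Return the paths of accepted minion keys carrying the bad header."""
    bad = []
    for path in glob.glob(keydir + "/*"):
        with open(path) as fh:
            if has_bad_header(fh.read()):
                bad.append(path)
    return bad
```

Running the scan on the master lists exactly the minions whose keys would trip the `RSAError: no start line` traceback.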

I was seeing the same behavior as @cheribral. I had m2crypto (0.28.2-3.el7) installed and had restarted all of my processes. The master was responsive for a single test.ping, and then nothing. I removed m2crypto and everything seems to be fine now. I verified I don't have pycryptodome on my production master.

There is a potential fix (https://github.com/saltstack/salt/pull/46930). If someone is able to re-install M2Crypto and try it out, that would be greatly appreciated.

@garethgreenaway

If I've understood correctly, this is not the fix, because I've checked every minion key on our system and none of them had the RSA header, but we have the problem anyway.

@Cobra1978 Thanks. This was something we were looking to confirm. If possible would you be able to send me the private & public keys for the master and those for a minion which are affected by this issue when running 2018.3? Email would work best, gareth at saltstack dot com

@brianeclow can you paste your --versions-report output? I'm interested in msgpack-python in particular. Thanks.

@Cobra1978 Thanks for the keys. When you experience the issue with 2018.3 is there any sort of traceback or exceptions that you could provide? And if you could provide the output from:
salt --versions-report.

@garethgreenaway

Trace:

2018-04-11 08:57:26,171 [tornado.application:123 ][ERROR   ][3738] Future exception was never retrieved: Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 230, in wrapper
    yielded = next(result)
  File "/usr/lib/python2.7/site-packages/salt/transport/zeromq.py", line 676, in handle_message
    stream.send(self.serial.dumps(self._auth(payload['load'])))
  File "/usr/lib/python2.7/site-packages/salt/transport/mixins/auth.py", line 436, in _auth
    pub = salt.crypt.get_rsa_pub_key(pubfn)
  File "/usr/lib/python2.7/site-packages/salt/crypt.py", line 210, in get_rsa_pub_key
    key = RSA.load_pub_key(path)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 426, in load_pub_key
    return load_pub_key_bio(bio)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 441, in load_pub_key_bio
    rsa_error()
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 330, in rsa_error
    raise RSAError(Err.get_error_message())
RSAError: no start line

version report on the server:

 salt --versions-report
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: unknown
       dateutil: 1.5
      docker-py: 1.10.6
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: 0.24.6
        libnacl: Not Installed
       M2Crypto: 0.28.2
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: 0.24.2
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.17.1.el7.x86_64
         system: Linux
        version: CentOS Linux 7.4.1708 Core

I checked all the minion public keys with

salt-run survey.hash '*' cmd.run 'head -1 /etc/salt/pki/minion/minion.pub'

and found that 10 minions had the wrong -----BEGIN RSA PUBLIC KEY----- line while all the others were fine.

I fixed all the keys and restarted the minions.
I also had to re-accept the keys that salt-master now denied.
Then I re-installed m2crypto and restarted the master and it seems that the problem is gone.
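For keys with only the mislabeled header, a minimal repair sketch is just rewriting the label lines: the buggy pycryptodome versions wrote the correct SubjectPublicKeyInfo body (as the sample keys elsewhere in this thread show) under the wrong PKCS#1-style labels. The helper name here is hypothetical:

```python
def fix_pub_key_header(pem_text):
    """Rewrite the PKCS#1-style PEM labels written by buggy pycryptodome
    (<= 3.4.3) back to the generic labels M2Crypto expects. The base64
    key body itself is left unchanged -- only the labels were wrong."""
    return (pem_text
            .replace("-----BEGIN RSA PUBLIC KEY-----",
                     "-----BEGIN PUBLIC KEY-----")
            .replace("-----END RSA PUBLIC KEY-----",
                     "-----END PUBLIC KEY-----"))
```

After rewriting a minion's minion.pub (and the master's cached copy), restart the minion and re-accept the key if the master now denies it, as described above.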

Hmm, but I've encountered the problem even with a minion that had -----BEGIN PUBLIC KEY-----, not -----BEGIN RSA PUBLIC KEY-----, in the header.

@Cobra1978 you might have a slightly different issue but I think that fixing -----BEGIN RSA PUBLIC KEY----- lines or applying the suggested patch can resolve the problem for most of the users here.

@mtorromeo
Wait: from what I've seen, having some keys with the wrong header may cause the issue for all minions, so for now I'll try to fix all the keys in my environment and see whether that resolves it.

Does anyone know how the wrong "-----BEGIN RSA PUBLIC KEY-----" header got generated?
I have 8 of these among my minions.

There is a known bug in some versions of pycryptodomex that would result in that invalid header being in place, versions 3.4.3 and under I believe. It is fixed in later versions.

For posterity, the issue was fixed in pycryptodome version 3.4.7:

http://pycryptodome.readthedocs.io/en/latest/src/changelog.html#id16

@mattp- I just got back from a few days off, but before I left I pinned our version from the latest apt repo to a known working release, in our case 2017.7.5. Here is the current report on the salt-master:

$ sudo /usr/bin/salt --versions-report
Salt Version:
           Salt: 2017.7.5

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.6.1
      docker-py: Not Installed
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.10
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 0.9.1
   msgpack-pure: Not Installed
 msgpack-python: 0.4.7
   mysql-python: 1.2.3
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.6 (default, Nov 23 2017, 15:49:48)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 14.0.1
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.4

System Versions:
           dist: Ubuntu 14.04 trusty
         locale: UTF-8
        machine: x86_64
        release: 3.13.0-74-generic
         system: Linux
        version: Ubuntu 14.04 trusty

Just hit the aforementioned unpackb errors during our upgrade on some Ubuntu 14.04 minions; manually upgrading python-msgpack did the trick.

@b1naryth1ef Do you happen to know the version of python-msgpack that was installed before you upgraded the package?

@dwoz Based on the apt logs, most of the servers that broke with this update were on 0.3.0-1ubuntu3; upgrading put them on 0.4.6-1build1.

I can confirm that this issue also exists on Raspbian Jessie. After the upgrade, the salt-master throws an exception, which I can view when I run "systemctl status salt-master.service"; however, the master itself appears to be communicating with the minion machines that are still on the previous version. The salt-minion I upgraded on Raspbian Jessie is now broken: it doesn't respond to the master, but shows as "active" in systemd. The logs are even weirder...

[ERROR ] Got a bad pillar from master, type str, expecting dict:
[ERROR ] Error while bringing up minion for multi-master. Is master at localhost responding?

Firstly, this is weird because the pillar for the master was working perfectly before the minion upgrade, and secondly it's weird because I am absolutely not running in multi-master mode, so why does the minion think I am? Luckily I still have the 2017.7.4 DEB packages in my /var/apt/archives, so I am going to downgrade both my salt-master and salt-minion until this issue is resolved.

I realize there has been some talk about python-msgpack being a possible culprit. I can safely say that during my apt-get upgrade, msgpack wasn't updated to a newer version. The version of python-msgpack I currently have installed, which worked on 2017.7.4+ds-1, is 0.4.2-1_armhf.

Okay, so my salt-master exception is much different than I see from other users.

Apr 13 14:55:18 pi salt-master[10403]: ret = getattr(self, func)(load)
Apr 13 14:55:18 pi salt-master[10403]: File "/usr/lib/python2.7/dist-packages/salt/master.py", line 1487, in _pillar
Apr 13 14:55:18 pi salt-master[10403]: 'pillar': data})
Apr 13 14:55:18 pi salt-master[10403]: File "/usr/lib/python2.7/dist-packages/salt/cache/__init__.py", line 156, in store
Apr 13 14:55:18 pi salt-master[10403]: return self.modules[fun](bank, key, data, **self._kwargs)
Apr 13 14:55:18 pi salt-master[10403]: File "/usr/lib/python2.7/dist-packages/salt/cache/redis_cache.py", line 346, in store
Apr 13 14:55:18 pi salt-master[10403]: key=key, bank=bank, redis_key=redis_key
Apr 13 14:55:18 pi salt-master[10403]: File "/usr/lib/python2.7/logging/__init__.py", line 1148, in debug
Apr 13 14:55:18 pi salt-master[10403]: self._log(DEBUG, msg, args, **kwargs)
Apr 13 14:55:18 pi salt-master[10403]: TypeError: _log() got an unexpected keyword argument 'redis_key'

So this may explain why the minion on my master has a pillar error, but this worked before the update...

So I removed the line in my master config that says to use "redis" as my cache, and everything works again. So I guess there is a bug in the redis cache module; should I open a separate issue for this?

@kveroneau Yes, Please open a separate issue and thanks for reporting the problem.

@kveroneau that issue is already fixed in the 2018.3 repo, and will be in the 2018.3.1 branch

Same on Red Hat RHEL 7.

I am also unable to yum downgrade to 2017.7.4-1.el7

Reading through the case:

[root@vuwunicorhsat01 ~]# rpm -qa |egrep 'm2crypto|zeromq'
m2crypto-0.21.1-17.el7.x86_64
zeromq-4.1.4-6.el7.x86_64
[root@vuwunicorhsat01 ~]#

[root@vuwunicorhsat01 ~]# salt --versions-report
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: Not Installed
       dateutil: 1.5
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.21.1
           Mako: 0.8.1
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: Not Installed
         Python: 2.7.5 (default, May 3 2017, 07:55:04)
   python-gnupg: 0.3.7
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: redhat 7.4 Maipo
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.17.1.el7.x86_64
         system: Linux
        version: Red Hat Enterprise Linux Server 7.4 Maipo

[root@vuwunicorhsat01 ~]#

Are there any old rpms for earlier salt-master somewhere? We are crippled without salt and are in the middle of a major upgrade that has just come to a screaming halt.

They are all on the repo website, just pin to the old major release.


Installed the latest release for both master and minion on one server:

Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.11.5
       cherrypy: 3.2.2
       dateutil: 1.5
      docker-py: Not Installed
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 0.9.1
   msgpack-pure: Not Installed
 msgpack-python: 0.3.0
   mysql-python: 1.2.3
      pycparser: 2.18
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.6 (default, Nov 23 2017, 15:49:48)
   python-gnupg: Not Installed
         PyYAML: 3.10
          PyZMQ: 14.0.1
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.4

System Versions:
           dist: Ubuntu 14.04 trusty
         locale: UTF-8
        machine: x86_64
        release: 4.4.0-93-generic
         system: Linux
        version: Ubuntu 14.04 trusty

Master:
2018-04-18 14:17:29,326 [salt.payload :154 ][CRITICAL][23097] Could not deserialize msgpack message. This often happens when trying to read a file not in binary mode. To see message payload, enable debug logging and retry. Exception: unpackb() got an unexpected keyword argument 'ext_hook'
2018-04-18 14:17:29,346 [salt.transport.zeromq:652 ][ERROR ][23097] Bad load from minion: TypeError: unpackb() got an unexpected keyword argument 'ext_hook'

Minion:
2018-04-18 14:16:48,195 [salt.minion :998 ][ERROR ][24310] Error while bringing up minion for multi-master. Is master at 172.16.140.3 responding?
2018-04-18 14:16:58,220 [salt.payload :154 ][CRITICAL][24310] Could not deserialize msgpack message. This often happens when trying to read a file not in binary mode. To see message payload, enable debug logging and retry. Exception: unpackb() got an unexpected keyword argument 'ext_hook'
2018-04-18 14:16:58,237 [tornado.general :410 ][ERROR ][24310] Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 407, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 1045, in mark_future
    data = self.serial.loads(msg[0])
  File "/usr/lib/python2.7/dist-packages/salt/payload.py", line 146, in loads
    ret = msgpack.loads(msg, use_list=True, ext_hook=ext_type_decoder)
  File "_unpacker.pyx", line 80, in msgpack._unpacker.unpackb (msgpack/_unpacker.cpp:82)
TypeError: unpackb() got an unexpected keyword argument 'ext_hook'
2018-04-18 14:16:58,237 [tornado.general :445 ][ERROR ][24310] Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 433, in _handle_events
    self._handle_recv()
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 465, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 407, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 1045, in mark_future
    data = self.serial.loads(msg[0])
  File "/usr/lib/python2.7/dist-packages/salt/payload.py", line 146, in loads
    ret = msgpack.loads(msg, use_list=True, ext_hook=ext_type_decoder)
  File "_unpacker.pyx", line 80, in msgpack._unpacker.unpackb (msgpack/_unpacker.cpp:82)
TypeError: unpackb() got an unexpected keyword argument 'ext_hook'
2018-04-18 14:16:58,238 [tornado.application:612 ][ERROR ][24310] Exception in callback None
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/tornado/ioloop.py", line 866, in start
    handler_func(fd_obj, events)
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 433, in _handle_events
    self._handle_recv()
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 465, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 407, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 1045, in mark_future
    data = self.serial.loads(msg[0])
  File "/usr/lib/python2.7/dist-packages/salt/payload.py", line 146, in loads
    ret = msgpack.loads(msg, use_list=True, ext_hook=ext_type_decoder)
  File "_unpacker.pyx", line 80, in msgpack._unpacker.unpackb (msgpack/_unpacker.cpp:82)
TypeError: unpackb() got an unexpected keyword argument 'ext_hook'

@wongchao in your case that is definitely because of your old version of msgpack-python (0.3.0). Can you upgrade?
@brianeclow can you check if msgpack-python on your minion is also 0.4.7?
@garethgreenaway ext_hook doesn't exist in the msgpack-python version listed in requirements.txt; that is probably causing at least that class of error people are seeing.

@mattp- A randomly selected salt minion's versions-report shows the msgpack-python version as

 msgpack-python: 0.4.6

Did a salt command to pull that from all the salt versions-reports, and it looks like a handful of the minions have msgpack-python: 0.3.0 instead.

@brianeclow Those minions reporting msgpack-python: 0.3.0, what OS platform and salt-minion versions are they running, and is it possible to retrieve a few versions reports from them?

All minions are running 2017.7.5, and they are Ubuntu 14.04. They are some of the oldest launched instances in the infrastructure. The configuration management, like most of them, is famous for not updating anything that it is not explicitly told to update. In other words, if the package doesn't require a dependent package to have a version bump, then the CM will not implicitly bump the package even if there is one available.

It sounds like this needs to be added to the list of "latest", and/or the built packages need to have their dependencies updated to require a minimum version.

Quite a few issues are flying around this thread.
FWIW, it looks like msgpack-python's ext_hook was introduced in 0.3.1, so I'm willing to wager that those ext_hook stack traces are strictly from minions running 0.3.0. Those who have seen it, could you confirm whether that assumption is correct?

@brianeclow Thanks for the information. Yes, just because the SaltStack repo provides 0.4.x doesn't mean that it gets used. I shall update the dependencies for all platforms to require a minimum of >= 0.4, which should help address this problem.

@mattp-
Thanks for your suggestion; it works for me now that msgpack-python has been upgraded.
Now msgpack-python: 0.5.6

Just to confirm the provided workaround: I found this problem on Ubuntu 14.04 when upgrading from 2017.7.4 to 2018.3.0, and solved it by upgrading the python-msgpack package from 0.3.0-1ubuntu3 to 0.4.6-1build1.

Having the same problem on CentOS.

salt --versions-report
Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: Not Installed
       cherrypy: unknown
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.28.2
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: Not Installed
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.21.1.el7.x86_64
         system: Linux
        version: CentOS Linux 7.4.1708 Core

We are having this issue running 2018.3.0 on the masters and python27-msgpack-0.4.6-2 on masters and minions. Are you sure that msgpack is the sole cause of the problem?

@mwtzzz @ianare Are you specifically referring to the ext_hook msgpack issue, or the RSAError 'no start line'?

For those experiencing 'no start line': could you please paste the contents of the private key for the minion being loaded that is specifically causing the 'no start line' errors? (Scrub the key data itself, of course, or change the key afterwards.) I'd like to see if there was a change in the RSA header style/format being generated or accepted.

Sorry, yes the problem is with the start line:

    pub = salt.crypt.get_rsa_pub_key(pubfn)
  File "/usr/lib/python2.7/site-packages/salt/crypt.py", line 210, in get_rsa_pub_key
    key = RSA.load_pub_key(path)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 426, in load_pub_key
    return load_pub_key_bio(bio)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 441, in load_pub_key_bio
    rsa_error()
  File "/usr/lib64/python2.7/site-packages/M2Crypto/RSA.py", line 330, in rsa_error
    raise RSAError(Err.get_error_message())
RSAError: no start line

Here are the keys:

-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAz3R1LdzUpF7axUtVYUwpfYIeXJzQrChvSmpfFXxhPOZOGGbs
Ak1E/lyaYDg3g5gO83GyRIYjGm4iwx0UTOvdmjjAxe5FIjPaMc50N8nhf1Lylt9g
Kt+qRtnXYXVzvDvVRPepUh06ygRMwzD/foMup6GiHIKN4nfAgsex/SBX6B9+MWPk
9Xu1xFp5Nr58s2UqbnxlThBnyZ4sIsfLV0qGHkD+JsHiLmFVqCzrmhB0n4qca9n5
kOULER60y/RzciqBzwdaZRNATQvAwC128LJ5nsrlACpjIarBA7TDA6c9lZH+qjEV
pO7ID7GrFylK6o2vuksdaKwQgUX+4/sY7Rvc2QIDAQABAoIBAAZgkXoWkeRYJKBA
QbiHGXbp3TJSYUFLkGlAlhnwXAJd34x1PH/RHYP2t4mp7L3zOpFt/zGHUJSvSksG
bbF0Jjinkjh7a71ii+xwSOcOoguSkpHD/d4viDnbmDj/He7NRY/0T4yOSLh9U5lD
/DSoxnJBgBzhgAhUT1MN3bk26XDhmck0o4VB1GBL1o1FzKWiMCjdpXkdBPy0qlBj
j6hcnIf8Bx/7BhBLnssrkWGzMZn1aYPOadrn9Mz+1M6kPoKsFG542dXcFVmSz2Yf
HW8ADygcsV8c+PW3Nlky3fh0XP7f94BjHf8NW1HsqK/0fiCooeAP/Gj8m7gpfx7A
Y+8jWnECgYEA4qNE1dOO6O7A2rgl1AbHkUqt1vB6Gvdo8PvwlXzHpJfxegglws13
TxRj+VZlpCNZUzx9kNuiu7jaJ/9TJebUpomu/o0ybN0QFEqc7majisEktl1sXK9B
SJK6JbhkG/fDxLFFtgT/nHmT4saKG5IyGKSkZbW+QVCx+6o2PjxzNnECgYEA6lT3
BNpfFVgTFhoJ57Gr97SYz0wp2Sh4OgRfaJCJqTxpVE7gvDT40XDPd9A+4IpLsC4N
PUHCf4LBlLqUWqi5k/pCEQDqoaM0PVQgl8Xs6/QET9uL/2cAW+Cpphm1uUbk2EoS
IQln40uHSOFVI8tS5buHLP0evLbZQxhI01D8UOkCgYEAjmkjju1CBrJH8uf1tTaR
XoW7mK1BPQKsKYlWUsi2toFXlmPaPn9XchSRAZM9iRdBQOFo0YaByk3Mn4d72vwi
MYegs4lL30ZFnUqJM/DA4SaeiA1KUy846X5xg9RK65wTotv1vvuDFvL6b2xsSgji
e8OxHcxHAcnQHSCGtzLK/rECgYAMVBwK7VJDVTbr2sLapcZeaeoAPgmVm0N4CfLb
FX+on70K6nt4a3EL09uWeiZUufHxevrHVkZdXwvAP3C+tFHPL0N6kYnGp0eAUoTE
d21XnKb9bJeZAFpMey1xx7h/3vL7TipWAqCsnLSpaPGJyJC79XnPLCIJ27TiisyK
EaC0wQKBgQDeOK2+Ukd3L7XVQMJhfph/ETGuJtfAq6gsKl9ekttGUXqy2zDEiebG
umtMfxe9yH3cSifa0nSKxfhkmFhMPlx5hwwyANdgqH7Q/zlR1RCx8lCpVCzZcMzJ
z4rn0nCqiuLTjw6rTs9moSStO7AOoIBdTklm+CB5InS7W8bnRhrxfw==
-----END RSA PRIVATE KEY-----

-----BEGIN RSA PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAz3R1LdzUpF7axUtVYUwp
fYIeXJzQrChvSmpfFXxhPOZOGGbsAk1E/lyaYDg3g5gO83GyRIYjGm4iwx0UTOvd
mjjAxe5FIjPaMc50N8nhf1Lylt9gKt+qRtnXYXVzvDvVRPepUh06ygRMwzD/foMu
p6GiHIKN4nfAgsex/SBX6B9+MWPk9Xu1xFp5Nr58s2UqbnxlThBnyZ4sIsfLV0qG
HkD+JsHiLmFVqCzrmhB0n4qca9n5kOULER60y/RzciqBzwdaZRNATQvAwC128LJ5
nsrlACpjIarBA7TDA6c9lZH+qjEVpO7ID7GrFylK6o2vuksdaKwQgUX+4/sY7Rvc
2QIDAQAB
-----END RSA PUBLIC KEY-----

I've found a "workaround", which isn't really practical with many, many machines:

  • delete all keys on the master: salt-key -D
  • on each minion, stop the minion, delete the minion key, and start it again:
    systemctl stop salt-minion
    rm -f /etc/salt/pki/minion/minion.*
    systemctl start salt-minion
  • on the master, accept all the minion keys: salt-key -A

@mattp- We ended up reverting back to 2017.7.5-1 on our salt masters in order to make it work.

FYI for those still seeing issues: I believe the current fix in develop for this is https://github.com/saltstack/salt/pull/46930, if you are looking to patch rather than downgrade.

@mattp- thanks for that matt, we will test that out

As @mattp- mentioned above, there are several issues going on in this thread. The key issue is fixed at the head of the 2018.3 branch in #46930 and will be available in the 2018.3.1 release. The msgpack version will be fixed in the packages, as @dmurphy18 mentioned above, also for the next release.

If you are seeing issues that are not related to either of these two problems and find yourself here, please file a new issue with all of the relevant information so we can track that problem separately.

Thank you everyone for your help in tracking down these bugs and for your help with reproducing and testing. It is very much appreciated.

When will 2018.3.1 be released? I applied the latest release (2018.3) for minions and master and now I cannot run even 'cmd.run uptime' or 'grains.get saltversion'. I have tried restarting the master. Thanks!

# salt b* grains.get saltversion
Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.

David

@ddellingeroxy We're in the process of finalizing the 2018.3.1 release so it should be pretty close now.

Fantastic! Thanks!

I still see this issue with Salt 2018.3.2 (with master and minion using the same version)

Can you open a new issue with steps to replicate?

Also make sure you have completely restarted the master process after upgrading to 2018.3.2.


I was hit by the same problem with the following configuration:

Salt Version:
           Salt: 3000.3

Dependency Versions:
           cffi: 1.12.2
       cherrypy: unknown
       dateutil: 2.7.3
      docker-py: 3.4.1
          gitdb: 2.0.5
      gitpython: 2.1.11
         Jinja2: 2.10
        libgit2: 0.27.7
       M2Crypto: 0.31.0
           Mako: 1.0.7
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: 1.3.10
      pycparser: 2.19
       pycrypto: 2.6.1
   pycryptodome: 3.6.1
         pygit2: 0.27.4
         Python: 3.7.3 (default, Dec 20 2019, 18:57:59)
   python-gnupg: 0.4.4
         PyYAML: 3.13
          PyZMQ: 17.1.2
          smmap: 2.0.5
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.1

System Versions:
           dist: debian 10.2
         locale: UTF-8
        machine: x86_64
        release: 4.15.0-99-generic
         system: Linux
        version: debian 10.2

UPD: I have investigated the issue, and it looks like it is somehow connected with one of the minions having sent a zero-length key to the master. I got the following error in the logs:

2020-07-09 09:53:03,784 [tornado.application:345 ][ERROR   ][35] Future exception was never retrieved: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 309, in wrapper
    yielded = next(result)
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 766, in handle_message
    stream.send(self.serial.dumps(self._auth(payload['load'])))
  File "/usr/lib/python3/dist-packages/salt/transport/mixins/auth.py", line 436, in _auth
    pub = salt.crypt.get_rsa_pub_key(pubfn)
  File "/usr/lib/python3/dist-packages/salt/crypt.py", line 213, in get_rsa_pub_key
    key = RSA.load_pub_key_bio(bio)
  File "/usr/lib/python3/dist-packages/M2Crypto/RSA.py", line 441, in load_pub_key_bio
    rsa_error()
  File "/usr/lib/python3/dist-packages/M2Crypto/RSA.py", line 330, in rsa_error
    raise RSAError(Err.get_error_message())
M2Crypto.RSA.RSAError: no start line

I believe there should be a double-check in salt-minion, salt-key, and the salt-master for the correctness of the key, so that the described situation becomes impossible.
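A minimal sketch of what such a sanity check could look like before a key is accepted or loaded (the helper name and exact rules are mine, not Salt's; a real check would also parse the base64 body):

```python
def looks_like_valid_pub_key(pem_text):
    """Reject empty or obviously malformed public keys: require the generic
    PEM labels M2Crypto expects on the first and last lines, with at least
    one base64 body line in between."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines() if ln.strip()]
    return (len(lines) >= 3
            and lines[0] == "-----BEGIN PUBLIC KEY-----"
            and lines[-1] == "-----END PUBLIC KEY-----")
```

A check like this would catch both the zero-length key described above and the mislabeled PKCS#1-style headers from earlier in the thread before they ever reach M2Crypto.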
