Salt: [BUG] Salt Minion won't reconnect, 25+ GB minion log file

Created on 15 Apr 2020  路  5Comments  路  Source: saltstack/salt

Description
Salt minions deployed across Windows environment, many Windows 7 systems. All have been fine for months except this one. Won't reconnect to salt master after key was accepted, "Minion did not return".

Minion log file has since filled up and is nearly 30GB of logs.

Setup

Minion config:

master: $master
id: $hostname
master_port: $port
publish_port: $port
hash_type: sha256

Steps to Reproduce the behavior

  • Deploy Salt minion
  • Accept key
  • Minion did not return, can't connect, errors accumulate in log

Expected behavior

  • Minion should remain connected

Screenshots
Screenshot 2020-04-15 09 32 58

Versions Report
Master:

Salt Version:
           Salt: 2019.2.0

Dependency Versions:
           cffi: 1.12.3
       cherrypy: 3.2.3
       dateutil: 2.4.2
      docker-py: Not Installed
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 1.0.3
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: 1.2.5
      pycparser: 2.19
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.12 (default, Oct  8 2019, 14:14:10)
   python-gnupg: 0.3.8
         PyYAML: 3.11
          PyZMQ: 15.2.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: Ubuntu 16.04 xenial
         locale: UTF-8
        machine: x86_64
        release: 4.4.0-1099-aws
         system: Linux
        version: Ubuntu 16.04 xenial

Minion:

    Salt Version:
               Salt: 2019.2.0

    Dependency Versions:
               cffi: 1.11.5
           cherrypy: 17.4.1
           dateutil: 2.7.5
          docker-py: Not Installed
              gitdb: 2.0.5
          gitpython: Not Installed
              ioflo: Not Installed
             Jinja2: 2.10
            libgit2: Not Installed
            libnacl: 1.6.1
           M2Crypto: Not Installed
               Mako: 1.0.7
       msgpack-pure: Not Installed
     msgpack-python: 0.5.6
       mysql-python: Not Installed
          pycparser: 2.19
           pycrypto: 2.6.1
       pycryptodome: Not Installed
             pygit2: Not Installed
             Python: 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]
       python-gnupg: 0.4.3
             PyYAML: 3.13
              PyZMQ: 17.1.2
               RAET: Not Installed
              smmap: 2.0.5
            timelib: 0.2.4
            Tornado: 4.5.3
                ZMQ: 4.2.5

    System Versions:
               dist:
             locale: cp1252
            machine: AMD64
            release: 7
             system: Windows
            version: 7 6.1.7601 SP1 Multiprocessor Free

Additional context

Sample logs from minion:

File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 215, in _get_singleton
_get_singleton().reinit()
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 159, in __init__
_singleton = _LockingUserFriendlyRNG()
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 224, in reinit
_UserFriendlyRNG.__init__(self)
self.auth = salt.crypt.AsyncAuth(self.opts, io_loop=self.io_loop)
  salt.utils.crypt.reinit_crypto()
File "c:\salt\bin\lib\site-packages\salt\crypt.py", line 468, in __new__
auth.__singleton_init__(opts, io_loop=io_loop)
File "c:\salt\bin\lib\site-packages\salt\crypt.py", line 511, in __singleton_init__
File "c:\salt\bin\lib\site-packages\salt\utils\crypt.py", line 114, in reinit_crypto
Crypto.Random.atfork()
File "c:\salt\bin\lib\site-packages\Crypto\Random\__init__.py", line 37, in atfork
_UserFriendlyRNG.reinit()
yielded = next(result)
pub_channel = salt.transport.client.AsyncPubChannel.factory(self.opts, **factory_kwargs)
File "c:\salt\bin\lib\site-packages\salt\transport\client.py", line 162, in factory
return salt.transport.zeromq.AsyncZeroMQPubChannel(opts, **kwargs)
File "c:\salt\bin\lib\site-packages\salt\transport\zeromq.py", line 394, in __init__
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 307, in wrapper
File "c:\salt\bin\lib\site-packages\salt\minion.py", line 709, in eval_master
_UserFriendlyRNG.reinit()
File "c:\salt\bin\lib\site-packages\Crypto\Random\__init__.py", line 37, in atfork
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 224, in reinit
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 86, in __init__
self._osrng = OSRNG.new()
self._ec = _EntropyCollector(self._fa)
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 53, in __init__
AttributeError: 'module' object has no attribute 'new'
File "c:\salt\bin\lib\site-packages\salt\minion.py", line 1018, in _connect_minion
Traceback (most recent call last):
yield minion.connect_master(failed=failed)
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 1055, in run
File "c:\salt\bin\lib\site-packages\tornado\concurrent.py", line 238, in result
value = future.result()
master, self.pub_channel = yield self.eval_master(self.opts, self.timeout, self.safe, failed)
File "c:\salt\bin\lib\site-packages\salt\minion.py", line 1215, in connect_master
yielded = self.gen.throw(*exc_info)
value = future.result()
File "c:\salt\bin\lib\site-packages\tornado\concurrent.py", line 238, in result
_get_singleton().reinit()
_singleton = _LockingUserFriendlyRNG()
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 159, in __init__
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 215, in _get_singleton
_UserFriendlyRNG.__init__(self)
2020-04-09 10:28:21,816 [salt.minion      :1041][CRITICAL][3312] Unexpected error while connecting to REDACTED
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 1063, in run
raise_exc_info(self._exc_info)
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 1055, in run
raise_exc_info(self._exc_info)
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 86, in __init__
self._osrng = OSRNG.new()
AttributeError: 'module' object has no attribute 'new'
yield minion.connect_master(failed=failed)
File "c:\salt\bin\lib\site-packages\salt\minion.py", line 1018, in _connect_minion
value = future.result()
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 1063, in run
File "c:\salt\bin\lib\site-packages\tornado\concurrent.py", line 238, in result
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 1055, in run
raise_exc_info(self._exc_info)
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 307, in wrapper
yielded = next(result)
File "c:\salt\bin\lib\site-packages\salt\transport\client.py", line 162, in factory
File "c:\salt\bin\lib\site-packages\salt\transport\zeromq.py", line 394, in __init__
self.auth = salt.crypt.AsyncAuth(self.opts, io_loop=self.io_loop)
File "c:\salt\bin\lib\site-packages\salt\crypt.py", line 468, in __new__
salt.utils.crypt.reinit_crypto()
auth.__singleton_init__(opts, io_loop=io_loop)
File "c:\salt\bin\lib\site-packages\salt\crypt.py", line 511, in __singleton_init__
Crypto.Random.atfork()
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 53, in __init__
self._ec = _EntropyCollector(self._fa)
2020-04-09 10:28:21,780 [salt.minion      :1041][CRITICAL][3312] Unexpected error while connecting to REDACTED
Traceback (most recent call last):
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 1055, in run
File "c:\salt\bin\lib\site-packages\tornado\concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
yielded = self.gen.throw(*exc_info)
File "c:\salt\bin\lib\site-packages\salt\minion.py", line 1215, in connect_master
value = future.result()
master, self.pub_channel = yield self.eval_master(self.opts, self.timeout, self.safe, failed)
File "c:\salt\bin\lib\site-packages\salt\minion.py", line 709, in eval_master
pub_channel = salt.transport.client.AsyncPubChannel.factory(self.opts, **factory_kwargs)
return salt.transport.zeromq.AsyncZeroMQPubChannel(opts, **kwargs)
File "c:\salt\bin\lib\site-packages\salt\utils\crypt.py", line 114, in reinit_crypto
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 1063, in run
master, self.pub_channel = yield self.eval_master(self.opts, self.timeout, self.safe, failed)
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 1055, in run
value = future.result()
File "c:\salt\bin\lib\site-packages\tornado\concurrent.py", line 238, in result
yielded = next(result)
return salt.transport.zeromq.AsyncZeroMQPubChannel(opts, **kwargs)
auth.__singleton_init__(opts, io_loop=io_loop)
self.auth = salt.crypt.AsyncAuth(self.opts, io_loop=self.io_loop)
File "c:\salt\bin\lib\site-packages\salt\crypt.py", line 511, in __singleton_init__
salt.utils.crypt.reinit_crypto()
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 224, in reinit
_get_singleton().reinit()
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 215, in _get_singleton
_UserFriendlyRNG.__init__(self)
File "c:\salt\bin\lib\site-packages\tornado\concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
yielded = self.gen.throw(*exc_info)
File "c:\salt\bin\lib\site-packages\salt\minion.py", line 1215, in connect_master
raise_exc_info(self._exc_info)
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 307, in wrapper
File "c:\salt\bin\lib\site-packages\salt\minion.py", line 709, in eval_master
pub_channel = salt.transport.client.AsyncPubChannel.factory(self.opts, **factory_kwargs)
File "c:\salt\bin\lib\site-packages\salt\transport\client.py", line 162, in factory
File "c:\salt\bin\lib\site-packages\salt\transport\zeromq.py", line 394, in __init__
File "c:\salt\bin\lib\site-packages\salt\crypt.py", line 468, in __new__
File "c:\salt\bin\lib\site-packages\salt\utils\crypt.py", line 114, in reinit_crypto
Crypto.Random.atfork()
_UserFriendlyRNG.reinit()
File "c:\salt\bin\lib\site-packages\Crypto\Random\__init__.py", line 37, in atfork
_singleton = _LockingUserFriendlyRNG()
File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 159, in __init__
yielded = next(result)
File "c:\salt\bin\lib\site-packages\tornado\gen.py", line 307, in wrapper
pub_channel = salt.transport.client.AsyncPubChannel.factory(self.opts, **factory_kwargs)
File "c:\salt\bin\lib\site-packages\salt\transport\client.py", line 162, in factory
self.auth = salt.crypt.AsyncAuth(self.opts, io_loop=self.io_loop)
File "c:\salt\bin\lib\site-packages\salt\utils\crypt.py", line 114, in reinit_crypto
Crypto.Random.atfork()
_singleton = _LockingUserFriendlyRNG()
self._ec = _EntropyCollector(self._fa)
self._osrng = OSRNG.new()
2020-04-09 10:28:21,744 [salt.minion      :1041][CRITICAL][3312] Unexpected error while connecting to REDACTED
Pending Discussion

All 5 comments

@shortstack Thanks for submitting this issue! I'm sure you've tried it, what happened when you purged the minion log? Was it able to connect temporarily?

Also, can you give me any details about why the problem minion is different from the minions that are connecting properly?

i am currently waiting on someone on the other end to do that

Also, can you give me any details about why the problem minion is different from the minions that are connecting properly?

in this case, it's the same Windows build as the others in the environment. digging in to find any glaring differences but so far not finding anything.

is there some mechanism in place that should keep the log from getting larger over time or force it to roll over if this does happen again?

re deployed and it is working, but it was working for months before it became disconnected and started filling up the logs with stack traces also.

One thought: File "c:\salt\bin\lib\site-packages\salt\utils\crypt.py", line 114, in reinit_crypto Crypto.Random.atfork() looks suspect, in particular this line: File "c:\salt\bin\lib\site-packages\Crypto\Random\_UserFriendlyRNG.py", line 53, in __init__ AttributeError: 'module' object has no attribute 'new'.

It's possible that somehow your crypto library got borked. I don't know what/how, but that would definitely cause the minion to stop responding. There's no information in the docs about rolling over the logs, probably because most people just use rsyslog which manages that kind of thing for you.

Was this page helpful?
0 / 5 - 0 ratings