salt-minion.pid boot issue leads to high CPU load (Windows)

Created on 28 Mar 2018 · 7 Comments · Source: saltstack/salt

Description of Issue/Question

After several weeks of successful use, the salt-minion service started crashing at startup:

2018-03-28 13:17:15,009 [salt.log.setup   :1133][ERROR   ][20736] An un-handled exception was caught by salt's global exception handler:
TypeError: unorderable types: NoneType() < int()
Traceback (most recent call last):
  File "c:\salt\bin\Scripts\salt-minion", line 26, in <module>
    salt_minion()
  File "c:\salt\bin\lib\site-packages\salt\scripts.py", line 168, in salt_minion
    minion.start()
  File "c:\salt\bin\lib\site-packages\salt\cli\daemons.py", line 342, in start
    super(Minion, self).start()
  File "c:\salt\bin\lib\site-packages\salt\utils\parsers.py", line 1039, in start
    self.prepare()
  File "c:\salt\bin\lib\site-packages\salt\cli\daemons.py", line 299, in prepare
    if self.check_running():
  File "c:\salt\bin\lib\site-packages\salt\utils\parsers.py", line 1022, in check_running
    if self.check_pidfile() and self.is_daemonized(pid):
  File "c:\salt\bin\lib\site-packages\salt\utils\parsers.py", line 1028, in is_daemonized
    return os_is_running(pid)
  File "c:\salt\bin\lib\site-packages\salt\utils\process.py", line 187, in os_is_running
    return psutil.pid_exists(pid)
  File "c:\salt\bin\lib\site-packages\psutil\__init__.py", line 1438, in pid_exists
    if pid < 0:
TypeError: unorderable types: NoneType() < int()
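The last frame fails because Python 3, unlike Python 2, refuses to order None against an int. The standalone sketch below reproduces only that comparison. It does not use Salt or psutil, and the exact error text depends on the Python version (3.5 prints the "unorderable types" message shown above).

```python
def pid_is_negative(pid):
    # Same comparison as the failing line in psutil.pid_exists: `if pid < 0:`
    return pid < 0


try:
    pid_is_negative(None)  # a PID of None, as in the traceback above
except TypeError as exc:
    print("TypeError:", exc)
```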

For some reason, C:\salt\var\run\salt-minion.pid contains exactly 4 null bytes.

I suspect a couple of hard (cold) reboots left the file in this state. As a result, salt-minion cannot start and crashes at this point.
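A minimal sketch of a more defensive check, assuming the pidfile should hold a single decimal PID: a missing, empty, or NUL-padded file is treated as stale instead of handing an unusable value to psutil. `read_pid` and `minion_is_running` are hypothetical helpers, not Salt's actual code, and the liveness test is passed in (for example `psutil.pid_exists`) so the sketch runs without psutil installed.

```python
import os


def read_pid(path):
    """Return the PID stored in `path`, or None if the file is missing or invalid.

    Hypothetical helper for illustration only.
    """
    try:
        with open(path, "rb") as fh:
            raw = fh.read()
    except OSError:
        return None  # no pidfile, or it cannot be read
    # Drop NUL padding and whitespace, e.g. left behind by an interrupted write.
    text = raw.decode("ascii", errors="ignore").strip("\x00 \t\r\n")
    try:
        pid = int(text)
    except ValueError:
        return None  # empty or non-numeric content
    return pid if pid > 0 else None


def minion_is_running(path, is_alive):
    """Report whether the process named in the pidfile is alive.

    `is_alive` is any callable taking an int PID, e.g. psutil.pid_exists.
    """
    pid = read_pid(path)
    if pid is None:
        return False  # stale or corrupt pidfile: assume nothing is running
    return is_alive(pid)
```

With this shape, a corrupt pidfile simply lets the service start and overwrite it, rather than raising inside the startup check.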

But the issue itself is not about salt-minion failing to start.
I noticed the problem only after some time, and during all that time the salt-minion Windows service kept trying to start the minion in an endless loop, which caused continuously high CPU usage.

Over a long uptime, this drove CPU usage to 100% and degraded overall system performance.
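The CPU load comes from retrying a start that fails immediately, over and over. A common generic mitigation (not something this report says the Salt service wrapper does) is capped exponential backoff between restart attempts. The sketch assumes a hypothetical `start` callable that raises on failure; `sleep` is injectable so the delays can be checked without waiting.

```python
import time


def run_with_backoff(start, max_attempts=5, base_delay=1.0, max_delay=60.0,
                     sleep=time.sleep):
    """Call start() until it succeeds, waiting longer after each failure.

    Delays grow as base_delay, 2*base_delay, 4*base_delay, ... capped at
    max_delay. The last failure is re-raised after max_attempts tries.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return start()
        except Exception:  # any startup failure counts as a failed attempt
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The cap keeps a persistently broken minion from being retried in a tight loop while still allowing recovery once the cause (here, the corrupt pidfile) is fixed.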

Setup

Windows 10 Pro x64

Steps to Reproduce Issue

  • Stop the salt-minion Windows service
  • Create an empty C:\salt\var\run\salt-minion.pid (or fill it with null bytes)
  • Start the salt-minion Windows service
  • salt-minion fails to start and keeps consuming CPU
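Step 2 can be scripted. A minimal sketch, assuming the default pidfile path from this report and that stopping and starting the service (steps 1 and 3) are done separately; `corrupt_pidfile` is a hypothetical helper.

```python
from pathlib import Path

# Default location reported in this issue; adjust for other installs.
DEFAULT_PIDFILE = Path(r"C:\salt\var\run\salt-minion.pid")


def corrupt_pidfile(path=DEFAULT_PIDFILE, nul_bytes=4):
    """Overwrite the pidfile with NUL bytes; nul_bytes=0 leaves it empty."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(b"\x00" * nul_bytes)
    return path
```

For example, `corrupt_pidfile()` writes 4 NUL bytes to the default path, matching the file described above, and `corrupt_pidfile(nul_bytes=0)` leaves an empty file.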

Versions Report

Salt Version:
           Salt: 2017.7.4

Dependency Versions:
           cffi: 1.10.0
       cherrypy: unknown
       dateutil: 2.6.0
      docker-py: Not Installed
          gitdb: 2.0.3
      gitpython: 2.1.3
          ioflo: Not Installed
         Jinja2: 2.9.6
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 1.0.6
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: 2.17
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)]
   python-gnupg: 0.4.0
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: 2.0.3
        timelib: 0.2.4
        Tornado: 4.5.1
            ZMQ: 4.1.6

System Versions:
           dist:
         locale: cp1251
        machine: AMD64
        release: 10
         system: Windows
        version: 10 10.0.16299 SP0 Multiprocessor Free
Pending Discussion fixed-pending-your-verification

All 7 comments

@dwoz or @twangboy can one of yall take a look at this?

Thanks,
Daniel

I am able to reproduce this. The key is that Salt is running as a service.

It's not a permissions issue. SYSTEM has full control of all directories in the tree, including salt-minion.pid.

@landergate The above PR will fix the issue. We'll see whether the reviewers think this is the best fix.

@landergate Does #46786 fix this issue for you?

@rallytime Absolutely. Many thanks =)

No more CPU peaks, and the *.pid file is filled with the proper new PID even if it previously contained invalid data. Things are going smoothly.
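The behavior reported after the fix, a fresh PID replacing invalid data, can be sketched as follows. This is an illustration under stated assumptions, not the code from #46786: `write_pidfile` is a hypothetical helper that writes to a temporary file in the same directory, flushes it with `os.fsync`, and then swaps it in with `os.replace`, which makes a half-written, NUL-filled pidfile after a hard reboot less likely. On Windows, `os.replace` can still fail if another process holds the target open.

```python
import os
import tempfile


def write_pidfile(path, pid=None):
    """Replace any existing pidfile content with `pid` (default: this process)."""
    pid = os.getpid() if pid is None else pid
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Temp file in the same directory so the final rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".pid-")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write("{}\n".format(pid))
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before the swap
        os.replace(tmp_path, path)  # overwrites old or corrupt content
    except BaseException:
        try:
            os.unlink(tmp_path)  # clean up the temp file on failure
        except OSError:
            pass
        raise
```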

@landergate That's wonderful news! :D
