Salt: using clean=True parameter in file.recurse causes python process to spin out of control

Created on 5 Oct 2016 · 17 comments · Source: saltstack/salt

Description of Issue/Question

As the title says, I was managing the salt-minion config files with file.recurse. When I added the clean=True flag, the Python process would spin out of control and consume all available CPU.

After starting the salt-minion-debug service, I would see messages like these repeated over and over:

[DEBUG   ] In saltenv 'base', ** considering ** path
'c:\salt\var\cache\salt\minion\files\base\salt-minion-config\minion.d\win.conf' to resolve
'salt://salt-minion-config/minion.d/win.conf'
[DEBUG   ] Fetching file from saltenv 'base', ** attempting ** 'salt://salt-minion-config/minion.d/win.conf'
[DEBUG   ] No dest file found
[INFO    ] Fetching file from saltenv 'base', ** done ** 'salt-minion-config/minion.d/win.conf'
[DEBUG   ] Jinja search path: '['c:\\salt\\var\\cache\\salt\\minion\\files\\base']'
[DEBUG   ] In saltenv 'base', looking at rel_path 'salt-minion-config/minion.d/win.conf' to resolve
'salt://salt-minion-config/minion.d/win.conf'

The files in question were present on the minion (both in the cached files and in the target directory) and on the master. Without the clean=True flag the Python process behaves normally and there are no messages like the above.

Setup

#!pyobjects

File.recurse(
  'salt-minion-config',
  name     = 'C:/salt/conf/minion.d',
  source   = 'salt://salt-minion-config/minion.d',
  defaults = pillar('minion:defaults'),
  template = 'jinja',
  clean    = True
)

Steps to Reproduce Issue

Add or remove the clean=True argument from the file.recurse state above and re-apply the state.

Versions Report

Salt Version:
           Salt: 2016.3.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: 5.3.0
       dateutil: 2.5.3
          gitdb: 0.6.4
      gitpython: 2.0.2
          ioflo: 1.5.3
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 1.0.4
   msgpack-pure: Not Installed
 msgpack-python: 0.4.7
   mysql-python: 0.7.2.None
      pycparser: 2.14
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.11 (v2.7.11:6d1b6a68f775, Dec  5 2015, 20:40:30) [MSC v.1500 64 bit (AMD64)]
   python-gnupg: 0.3.8
         PyYAML: 3.11
          PyZMQ: 15.2.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: 0.2.4
        Tornado: 4.3
            ZMQ: 4.1.2

System Versions:
           dist:
        machine: AMD64
        release: 2012ServerR2
         system: Windows
        version: 2012ServerR2 6.3.9600  Multiprocessor Free
Labels: Bug, P3, Platform, State Module, TEAM Platform, fixed-pending-your-verification, severity-high

All 17 comments

Thanks for reporting this.

I have confirmed this problem on 2016.3.3 and 2015.8.12

This is also a problem in 2016.11.1

Salt Version:
           Salt: 2016.11.1

Dependency Versions:
           cffi: 1.7.0
       cherrypy: 7.1.0
       dateutil: 2.5.3
          gitdb: 0.6.4
      gitpython: 2.0.8
          ioflo: 1.5.5
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: 1.4.5
       M2Crypto: Not Installed
           Mako: 1.0.4
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)]
   python-gnupg: 0.3.8
         PyYAML: 3.11
          PyZMQ: 16.0.1
           RAET: Not Installed
          smmap: 0.9.0
        timelib: 0.2.4
        Tornado: 4.4.1
            ZMQ: 4.1.6

System Versions:
           dist:
        machine: AMD64
        release: 2008ServerR2
         system: Windows
        version: 2008ServerR2 6.1.7601 SP1 Multiprocessor Free

Any movement here? 5 months "critical" - we (unfortunately) have a primarily Windows environment and have had to freeze a bunch of stuff manually to keep our SaltStack even usable.

I'm hitting this also. Thankfully I don't _need_ to use clean: True in my state; it would just be a nice-to-have.

Hitting this problem as well!

+1

+1
Tried to use this today and ran into this problem.

Me too.

Hit this problem today and it breaks the idempotency of one of my states.

@twangboy can you take a look at this?

Thanks,
Daniel

@gtmanfred I can only replicate this when I remove inherited permissions from the target directory. And I get the following on 2016.11:

[INFO    ] User root Executing command saltutil.find_job with jid 20171221161036392913
[DEBUG   ] Command details {'tgt_type': 'list', 'jid': '20171221161036392913', 'tgt': ['gringo'], 'ret': '', 'user': 'root', 'arg': ['20171221160930356221'], 'fun': 'saltutil.find_job'}
[DEBUG   ] Multiprocessing queue logging configured for the process running under PID: 728
[DEBUG   ] Reading configuration from c:\salt\conf\minion
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\_schedule.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\_schedule.conf
[DEBUG   ] Traceback (most recent call last):
  File "c:\salt-dev\salt\salt\states\file.py", line 1977, in managed
    **kwargs)
  File "c:\salt-dev\salt\salt\modules\file.py", line 4961, in manage_file
    __opts__['cachedir'])
  File "c:\salt-dev\salt\salt\utils\files.py", line 87, in copyfile
    tgt = salt.utils.mkstemp(prefix=bname, dir=dname)
  File "c:\salt-dev\salt\salt\utils\__init__.py", line 1623, in mkstemp
    fd_, fpath = tempfile.mkstemp(*args, **kwargs)
  File "c:\python27\lib\tempfile.py", line 318, in mkstemp
    return _mkstemp_inner(dir, prefix, suffix, flags)
  File "c:\python27\lib\tempfile.py", line 257, in _mkstemp_inner
    raise IOError, (_errno.EEXIST, "No usable temporary file name found")
IOError: [Errno 17] No usable temporary file name found

[DEBUG   ] In saltenv 'base', looking at rel_path 'test/chocoinstallun.sls' to resolve 'salt://test/chocoinstallun.sls'
[DEBUG   ] In saltenv 'base', ** considering ** path 'c:\salt\var\cache\salt\minion\files\base\test\chocoinstallun.sls' to resolve 'salt://test/chocoinstallun.sls'

The temp file name is in the directory where it's trying to drop the files... in my case C:\Temp\Test
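
For context, a minimal standalone sketch (hypothetical paths, not Salt code) of the call shown in the traceback: copyfile() stages its temporary copy with tempfile.mkstemp(prefix=bname, dir=dname), where bname/dname are assumed here to be the destination's basename and dirname, so the temp file is created in the target directory rather than the system temp directory, matching the observation above.

import os
import tempfile

# Stand-in for C:\Temp\Test\win.conf from the comment above
dest = os.path.join(tempfile.gettempdir(), 'Test', 'win.conf')
if not os.path.isdir(os.path.dirname(dest)):
    os.makedirs(os.path.dirname(dest))

bname = os.path.basename(dest)   # prefix for the temp file
dname = os.path.dirname(dest)    # temp file is created in the destination directory

fd, tmp_path = tempfile.mkstemp(prefix=bname, dir=dname)
os.close(fd)
print(tmp_path)                  # e.g. ...\Test\win.conf<random suffix>, alongside the destination
os.remove(tmp_path)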

I've been playing with this, and it doesn't seem to break when using \\ in my Windows paths instead of /.

Previously, when using C:/opt/sensu/plugins as the destination, I just got the following looping infinitely (and I mean that: I left the state running for over a week while testing and had to force-kill it when I came back).

[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171227064115677618 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171227064115677618 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171227064115677618 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171227064115677618 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return

Now, using C:\\\\opt\\\\sensu\\\\plugin in my map.jinja (which is really ugly), it seems to be okay.

@twangboy Based on our conversation today, I think the root of this issue is a call to minionfs's file_list method, which runs on the master and returns unix-style paths, and which Windows minions call while generating the keep list. This happens in the recurse method on 2016.11 and in the _gen_recurse_managed_files method in more recent versions. Based on what I'm seeing, the keep list can end up populated with a mix of Windows-style and unix-style paths.
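
To illustrate the kind of mismatch described here, a small hypothetical sketch (made-up paths, not Salt code): if the keep list holds a unix-style path from the master while the minion walks the target directory with backslashes, the membership test never matches until the paths are normalized, so the clean pass keeps treating the file as "not kept".

import ntpath

# What a master-side file_list() might contribute to "keep" (forward slashes)
keep = {'C:/salt/conf/minion.d/win.conf'}

# What walking the target directory yields on a Windows minion (backslashes)
found = ntpath.join('C:\\salt\\conf\\minion.d', 'win.conf')

print(found in keep)   # False - same file, different separators
print(ntpath.normpath(found) in set(ntpath.normpath(p) for p in keep))   # True once normalized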

With all this I think I am fairly close to being able to pin this down with a regression test.

Upon further investigation, it looks like this was fixed in 2016.11.8.

@absolutejam Are you able to provide me the version of salt you saw this in?

Using the setup @morganwillcock reported on #39021, I was able to reproduce the behavior described in this ticket on versions before 2016.11.8. Prior to the patch in 2016.11.8 the _clean_dir method is checking:

'c:\' in ['/' or 'C:\\'] (after escaping)

After the patch in 2016.11.8 it's checking for:

'c:\' in ['\', 'c:\', 'c:\\']

However, this still seems a bit buggy. For instance, if someone happens to put C:\\\\\\test\\\\\\ in their state file it would still cause an infinite loop. We need to make sure the loop always exits. So, now I am back to calling this an active bug. Here is the patch I have in mind to fully resolve this:

diff --git a/salt/states/file.py b/salt/states/file.py
index 128fc7165a..f42fa510d5 100644
--- a/salt/states/file.py
+++ b/salt/states/file.py
@@ -446,11 +446,8 @@ def _clean_dir(root, keep, exclude_pat):
             while True:
                 fn_ = os.path.dirname(fn_)
                 real_keep.add(fn_)
-                if fn_ in [
-                           os.sep,
-                           ''.join([os.path.splitdrive(fn_)[0], os.sep]),
-                           ''.join([os.path.splitdrive(fn_)[0], os.sep, os.sep])
-                          ]:
+                drive, path = os.path.splitdrive(fn_)
+                if not path.lstrip(os.sep):
                     break

     def _delete_not_kept(nfn):
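
To see the difference between the two checks without running Salt, here is a standalone sketch (hypothetical input path, using ntpath so it behaves the same on any platform; it is not part of the proposed patch) that walks a path upward with dirname() and reports whether each termination test ever fires:

import ntpath

def reaches_root(path, use_new_check):
    """Repeatedly take dirname(); True if the termination test fires,
    False if dirname() stops changing without matching (the real loop would spin forever)."""
    fn_ = path
    seen = set()
    while True:
        fn_ = ntpath.dirname(fn_)
        if use_new_check:
            # Check added by the diff above
            drive, rest = ntpath.splitdrive(fn_)
            if not rest.lstrip(ntpath.sep):
                return True
        else:
            # Check removed by the diff above
            drive = ntpath.splitdrive(fn_)[0]
            if fn_ in [ntpath.sep, drive + ntpath.sep, drive + ntpath.sep * 2]:
                return True
        if fn_ in seen:      # dirname() has stabilized without ever matching
            return False
        seen.add(fn_)

bad = 'C:\\\\\\test\\\\\\file.txt'             # repeated separators, like the example above
print(reaches_root(bad, use_new_check=False))   # False: the old check never matches
print(reaches_root(bad, use_new_check=True))    # True: the splitdrive/lstrip check exits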

Pretty sure all this needs now is a regression test and this patch.

This should be fixed with #46632.
