As the title says, I was managing the salt-minion config files with file.recurse. When I added the clean=True flag, the python process would spin indefinitely and eat up all the CPU.
After starting the salt-minion-debug service, I would see messages like these repeated over and over:
[DEBUG ] In saltenv 'base', ** considering ** path
'c:\salt\var\cache\salt\minion\files\base\salt-minion-config\minion.d\win.conf' to resolve
'salt://salt-minion-config/minion.d/win.conf'
[DEBUG ] Fetching file from saltenv 'base', ** attempting ** 'salt://salt-minion-config/minion.d/win.conf'
[DEBUG ] No dest file found
[INFO ] Fetching file from saltenv 'base', ** done ** 'salt-minion-config/minion.d/win.conf'
[DEBUG ] Jinja search path: '['c:\\salt\\var\\cache\\salt\\minion\\files\\base']'
[DEBUG ] In saltenv 'base', looking at rel_path 'salt-minion-config/minion.d/win.conf' to resolve
'salt://salt-minion-config/minion.d/win.conf'
The files in question were present on the minion (both in the cached files and in the target directory) and on the master. Without the clean=True flag the python process behaved normally and there were no messages like the above. Here is the state in question:
#!pyobjects
File.recurse(
    'salt-minion-config',
    name='C:/salt/conf/minion.d',
    source='salt://salt-minion-config/minion.d',
    defaults=pillar('minion:defaults'),
    template='jinja',
    clean=True,
)
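For reference, clean=True tells file.recurse to delete anything under name that is not part of the managed file set. A simplified sketch of that removal walk (my own illustration, not Salt's actual implementation) follows:

# Simplified illustration (not Salt's actual code): with clean=True,
# anything under the destination root that is not in the computed
# 'keep' set gets deleted.
import os

def clean_dir(root, keep):
    removed = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if path not in keep:
                removed.append(path)  # Salt would delete this path
    return removed

The CPU spinning reported above happens while Salt computes the 'keep' side of this walk, as the analysis later in this thread pins down.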
To reproduce, add/remove the "clean=True" argument.
Salt Version:
Salt: 2016.3.0
Dependency Versions:
cffi: 1.6.0
cherrypy: 5.3.0
dateutil: 2.5.3
gitdb: 0.6.4
gitpython: 2.0.2
ioflo: 1.5.3
Jinja2: 2.8
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 1.0.4
msgpack-pure: Not Installed
msgpack-python: 0.4.7
mysql-python: 0.7.2.None
pycparser: 2.14
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:40:30) [MSC v.1500 64 bit (AMD64)]
python-gnupg: 0.3.8
PyYAML: 3.11
PyZMQ: 15.2.0
RAET: Not Installed
smmap: 0.9.0
timelib: 0.2.4
Tornado: 4.3
ZMQ: 4.1.2
System Versions:
dist:
machine: AMD64
release: 2012ServerR2
system: Windows
version: 2012ServerR2 6.3.9600 Multiprocessor Free
Thanks for reporting this.
I have confirmed this problem on 2016.3.3 and 2015.8.12
This is also a problem in 2016.11.1
Salt Version:
Salt: 2016.11.1
Dependency Versions:
cffi: 1.7.0
cherrypy: 7.1.0
dateutil: 2.5.3
gitdb: 0.6.4
gitpython: 2.0.8
ioflo: 1.5.5
Jinja2: 2.8
libgit2: Not Installed
libnacl: 1.4.5
M2Crypto: Not Installed
Mako: 1.0.4
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: Not Installed
pycparser: 2.14
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)]
python-gnupg: 0.3.8
PyYAML: 3.11
PyZMQ: 16.0.1
RAET: Not Installed
smmap: 0.9.0
timelib: 0.2.4
Tornado: 4.4.1
ZMQ: 4.1.6
System Versions:
dist:
machine: AMD64
release: 2008ServerR2
system: Windows
version: 2008ServerR2 6.1.7601 SP1 Multiprocessor Free
Any movement here? Five months at "critical" - we (unfortunately) have a primarily Windows environment and have had to freeze a bunch of stuff manually to keep our SaltStack deployment even usable.
I'm hitting this also. Thankfully I don't _need_ to use clean: True in my state; it would just be a nice-to-have.
Hitting this problem as well!
+1
+1
Tried to use this today and ran into this problem.
Me too.
Hit this problem today and it breaks the idempotency of one of my states 😢
@twangboy can you take a look at this?
Thanks,
Daniel
@gtmanfred I can only replicate this when I remove inherited permissions from the target directory. And I get the following on 2016.11:
[INFO ] User root Executing command saltutil.find_job with jid 20171221161036392913
[DEBUG ] Command details {'tgt_type': 'list', 'jid': '20171221161036392913', 'tgt': ['gringo'], 'ret': '', 'user': 'root', 'arg': ['20171221160930356221'], 'fun': 'saltutil.find_job'}
[DEBUG ] Multiprocessing queue logging configured for the process running under PID: 728
[DEBUG ] Reading configuration from c:\salt\conf\minion
[DEBUG ] Including configuration from 'c:\salt\conf\minion.d\_schedule.conf'
[DEBUG ] Reading configuration from c:\salt\conf\minion.d\_schedule.conf
[DEBUG ] Traceback (most recent call last):
File "c:\salt-dev\salt\salt\states\file.py", line 1977, in managed
**kwargs)
File "c:\salt-dev\salt\salt\modules\file.py", line 4961, in manage_file
__opts__['cachedir'])
File "c:\salt-dev\salt\salt\utils\files.py", line 87, in copyfile
tgt = salt.utils.mkstemp(prefix=bname, dir=dname)
File "c:\salt-dev\salt\salt\utils\__init__.py", line 1623, in mkstemp
fd_, fpath = tempfile.mkstemp(*args, **kwargs)
File "c:\python27\lib\tempfile.py", line 318, in mkstemp
return _mkstemp_inner(dir, prefix, suffix, flags)
File "c:\python27\lib\tempfile.py", line 257, in _mkstemp_inner
raise IOError, (_errno.EEXIST, "No usable temporary file name found")
IOError: [Errno 17] No usable temporary file name found
[DEBUG ] In saltenv 'base', looking at rel_path 'test/chocoinstallun.sls' to resolve 'salt://test/chocoinstallun.sls'
[DEBUG ] In saltenv 'base', ** considering ** path 'c:\salt\var\cache\salt\minion\files\base\test\chocoinstallun.sls' to resolve 'salt://test/chocoinstallun.sls'
The temp file is created in the directory where it's trying to drop the files... in my case C:\Temp\Test.
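For illustration, here is a minimal standalone sketch of the failing call from the traceback: salt.utils.files.copyfile creates its temp file next to the destination via tempfile.mkstemp(prefix=bname, dir=dname), so the target directory's permissions govern whether that succeeds (the destination path below is hypothetical):

# Minimal sketch of the mkstemp call in the traceback above.
# The destination path is hypothetical; point it at a directory with
# inherited permissions removed to reproduce.
import os
import tempfile

dest = r'C:\Temp\Test\win.conf'
bname = os.path.basename(dest)  # used as the temp file prefix
dname = os.path.dirname(dest)   # temp file is created beside the target

try:
    fd, tmp_path = tempfile.mkstemp(prefix=bname, dir=dname)
    os.close(fd)
    os.remove(tmp_path)
    print('temp file would have been', tmp_path)
except (IOError, OSError) as exc:
    # Python 2's tempfile gives up with IOError(EEXIST, 'No usable
    # temporary file name found') after exhausting its candidate names,
    # matching the traceback above.
    print('mkstemp failed:', exc)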
I've been playing with this, and it doesn't seem to break when using \\ in my Windows paths instead of /.
Previously, when using C:/opt/sensu/plugins as the destination, I just got the following looping infinitely (and I mean that: I left the state running for over a week while testing and had to force-kill it when I came back).
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG ] retcode missing from client return
[DEBUG ] Checking whether jid 20171227064115677618 is still running
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG ] retcode missing from client return
[DEBUG ] Checking whether jid 20171227064115677618 is still running
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG ] retcode missing from client return
[DEBUG ] Checking whether jid 20171227064115677618 is still running
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG ] retcode missing from client return
[DEBUG ] Checking whether jid 20171227064115677618 is still running
[DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'ce8ab3d03566_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG ] retcode missing from client return
Now, using C:\\\\opt\\\\sensu\\\\plugin in my map.jinja (which is really ugly), it seems to be okay.
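To illustrate what seems to be happening with the forward-slash destination, here is a small standalone sketch; ntpath is used explicitly so it runs on any platform, and the sentinel list mirrors the pre-fix _clean_dir check quoted later in this thread:

# With a forward-slash destination, ntpath.dirname() plateaus at 'C:/',
# which never equals any of the backslash-based root forms the old
# _clean_dir termination check compares against.
import ntpath

fn_ = 'C:/opt/sensu/plugins'
drive = ntpath.splitdrive(fn_)[0]
sentinels = [ntpath.sep, drive + ntpath.sep, drive + ntpath.sep * 2]

for _ in range(6):
    fn_ = ntpath.dirname(fn_)
    print(repr(fn_), '-> loop would break:', fn_ in sentinels)
# Once fn_ reaches 'C:/', dirname('C:/') returns 'C:/' again, and since
# 'C:/' is never in sentinels, the real while-loop spins forever.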
@twangboy Based on our conversation today, I think the root of this issue is a call to minionfs's file_list method, which runs on the master and returns unix-style paths, and which Windows minions call while generating the keep list. This happens in the recurse method on 2016.11 and in the _gen_recurse_managed_files method in more recent versions. From what I'm seeing, the keep list can get populated with a mix of windows-style and unix-style paths.
With all this I think I am fairly close to being able to pin this down with a regression test.
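As a tiny illustration of the mixed keep list (hypothetical paths, not Salt code): a plain string membership test fails across separator styles, while normalizing both sides makes it succeed:

# Hypothetical keep entries: the master-side fileserver hands back a
# unix-style path while the minion builds a windows-style one.
import ntpath

keep = ['salt-minion-config/minion.d/win.conf',
        'C:\\salt\\conf\\minion.d\\win.conf']

walked = 'C:/salt/conf/minion.d/win.conf'  # same file, other separators
print(walked in keep)                                       # False
print(ntpath.normpath(walked) in
      [ntpath.normpath(k) for k in keep])                   # True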
Upon further investigation, it looks like this was fixed in 2016.11.8.
@absolutejam Are you able to tell me the version of Salt you saw this in?
Using the setup @morganwillcock reported on #39021, I was able to reproduce the behavior described in this ticket on versions before 2016.11.8. Prior to the patch in 2016.11.8, the _clean_dir method is checking:
'c:\' in ['/', 'C:\\'] (after escaping)
After the patch in 2016.11.8 it's checking for:
'c:\' in ['\', 'c:\', 'c:\\']
However, this still seems a bit buggy. For instance, if someone happens to put C:\\\\\\test\\\\\\ in their state file, it would still cause an infinite loop. We need to make sure the loop always exits, so now I am back to calling this an active bug. Here is the patch I have in mind to fully resolve this:
diff --git a/salt/states/file.py b/salt/states/file.py
index 128fc7165a..f42fa510d5 100644
--- a/salt/states/file.py
+++ b/salt/states/file.py
@@ -446,11 +446,8 @@ def _clean_dir(root, keep, exclude_pat):
         while True:
             fn_ = os.path.dirname(fn_)
             real_keep.add(fn_)
-            if fn_ in [
-                    os.sep,
-                    ''.join([os.path.splitdrive(fn_)[0], os.sep]),
-                    ''.join([os.path.splitdrive(fn_)[0], os.sep, os.sep])
-            ]:
+            drive, path = os.path.splitdrive(fn_)
+            if not path.lstrip(os.sep):
                 break
 
     def _delete_not_kept(nfn):
Pretty sure all this needs now is a regression test and this patch.
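As a sanity check on that patch, here is a rough standalone version of the patched termination condition (ntpath used directly so it runs on any platform; this is not the actual Salt regression test):

# Walk dirname() upward using the patched break condition and confirm
# it always exits, including for the repeated-backslash path above.
import ntpath

def walk_terminates(fn_, max_steps=50):
    for _ in range(max_steps):
        fn_ = ntpath.dirname(fn_)
        drive, path = ntpath.splitdrive(fn_)
        if not path.lstrip(ntpath.sep):
            return True
    return False

print(walk_terminates('C:\\\\\\test\\\\\\'))        # True with the patch
print(walk_terminates('C:\\salt\\conf\\minion.d'))  # True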
Could be related to https://github.com/saltstack/salt/issues/32128
This should be fixed with #46632.