I recently upgraded the master and minion from 2014.1.0 to 2014.1.10, and now I'm hitting this:
2014-08-22 11:11:25,697 [salt.minion ][WARNING ] The minion function caused an exception
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 797, in _thread_return
    return_data = func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.7/salt/modules/state.py", line 394, in sls
    ret = st_.state.call_high(high_)
  File "/usr/lib/pymodules/python2.7/salt/state.py", line 1715, in call_high
    ret = self.call_chunks(chunks)
  File "/usr/lib/pymodules/python2.7/salt/state.py", line 1437, in call_chunks
    running = self.call_chunk(low, running, chunks)
  File "/usr/lib/pymodules/python2.7/salt/state.py", line 1652, in call_chunk
    running = self.call_chunk(low, running, chunks)
  File "/usr/lib/pymodules/python2.7/salt/state.py", line 1652, in call_chunk
  ...
  File "/usr/lib/pymodules/python2.7/salt/state.py", line 1652, in call_chunk
    running = self.call_chunk(low, running, chunks)
  File "/usr/lib/pymodules/python2.7/salt/state.py", line 1562, in call_chunk
    status = self.check_requisite(low, running, chunks, True)
  File "/usr/lib/pymodules/python2.7/salt/state.py", line 1482, in check_requisite
    req = trim_req(req)
  File "/usr/lib/pymodules/python2.7/salt/state.py", line 94, in trim_req
    reqfirst = next(iter(req))
RuntimeError: maximum recursion depth exceeded while calling a Python object
I've reduced this down to:
my-application:
  file.touch:
    - name: /tmp/file-c

graceful-shutdown:
  file.touch:
    - name: /tmp/file-a
    - prereq:
      - test: configuration-file

configuration-file:
  test.touch:
    - name: /tmp/file-b
    - require:
      - test: my-application
Executing it using the state.sls module triggers the exception.
I know there have been other instances of prereq issues (#8210, #8785), but I don't think either of them covers this case, which I believe is a higher-priority bug than those (since they define an explicit recursion, and I don't).
In production I seem to be hitting this when a state fails. I have one that intermittently fails; when it succeeds, I don't get this stack trace. I can't really explain why it only triggers on failure, but I'm noting it here, maybe more for my own sake.
Master version:
$ salt --versions-report
Salt: 2014.1.10
Python: 2.7.3 (default, Feb 27 2014, 19:58:35)
Jinja2: 2.6
M2Crypto: 0.21.1
msgpack-python: 0.1.10
msgpack-pure: Not Installed
pycrypto: 2.4.1
PyYAML: 3.10
PyZMQ: 13.0.0
ZMQ: 3.2.2
Minion version:
salt-minion --versions-report
Salt: 2014.1.10
Python: 2.7.3 (default, Feb 27 2014, 19:58:35)
Jinja2: 2.6
M2Crypto: 0.21.1
msgpack-python: 0.1.10
msgpack-pure: Not Installed
pycrypto: 2.4.1
PyYAML: 3.10
PyZMQ: 13.0.0
ZMQ: 3.2.2
Very interesting. I thought we had stamped out most of those but this looks like it might be another one. Thanks for the very clean test case. That should help make it much easier to find and fix this one!
Verified this problem persists in 2014.7.
@cachedout We recently upgraded to 2015.5.0 and I'm still seeing these recursions happen. It's very annoying and doesn't spread a culture of "things working" within our company. :-) I tested my minimal test case from the original issue description and can recreate the recursion limit every time using state.sls test=True.
I just hit this (or something like #20143) again and spent three days of my life trying various constellations of state configurations to avoid the infinite recursion. In my case I had quite a few newly added states, so it was really hard to drill down to what triggered this.
Out of every bug I've hit in Salt (and I've hit a few), this is definitely the most frustrating, as it's really, really hard to debug. Running the salt minion in non-daemon mode with the DEBUG log level enabled did not help at all. I think this pull request really could be split into two different steps:
# salt-minion --versions-report
Salt Version:
Salt: 2015.8.8.2
Dependency Versions:
Jinja2: 2.6
M2Crypto: 0.21.1
Mako: Not Installed
PyYAML: 3.10
PyZMQ: 14.0.1
Python: 2.7.3 (default, Jun 22 2015, 19:33:41)
RAET: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.4
cffi: Not Installed
cherrypy: Not Installed
dateutil: 1.5
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
libgit2: Not Installed
libnacl: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.3.0
mysql-python: 1.2.3
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
python-gnupg: Not Installed
smmap: Not Installed
timelib: Not Installed
System Versions:
dist: Ubuntu 12.04 precise
machine: x86_64
release: 3.2.0-99-virtual
system: Ubuntu 12.04 precise
Some general remarks regarding my issue:
- Running state.highstate a second time makes my execution pass.
- call_chunk isn't done with identical parameters.
Let me know if you have any input on how I can debug this further?
Can I make logging even more verbose than DEBUG?
Workaround: Alright, I figured out what went wrong on my end. Here's what I did: I patched /usr/lib/pymodules/python2.7/salt/state.py to log the arguments to the recursive call_chunk call, on the line just above it:
log.trace("low=%s running=%s chunks=%s", low, running, chunks)
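# The log.trace line above sits immediately before the recursive call
# that the traceback points at (salt/state.py line 1652 in this
# 2014.1.x install):
#     running = self.call_chunk(low, running, chunks)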
I then changed my minion configuration (both log_level and log_level_logfile) to trace. I copied the output for low (I think!), started ipython, and ran a=<pasted copied text>;import pprint;pprint.pprint(a). In the output I could see that I had a state with a requisite that didn't exist. I corrected the requisite and voilà, things are working.
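In concrete terms, that inspection step is just pasting the logged structure into a Python shell. A minimal sketch, with an illustrative chunk in Salt's low-data shape standing in for the pasted text:

import pprint

# Placeholder standing in for the structure copied from the trace log;
# a real low chunk carries keys like these.
low = {
    '__id__': 'graceful-shutdown',
    'state': 'file',
    'fun': 'touch',
    'name': '/tmp/file-a',
    'prereq': [{'test': 'configuration-file'}],
}

# Pretty-print it, then check every requisite against the state IDs
# that actually exist in your sls files.
pprint.pprint(low)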
I hope I'll never have to do this again!
I've hit this on 2016.11.3 with a very simple state. This is nginx/init.sls:
nginx_install:
  pkg.installed:
    - name: nginx

nginx_service:
  service.running:
    - name: nginx
    - prereq:
      - nginx_install

#nginx_config:
#  file.managed:
#    - name: /etc/nginx/nginx.conf
#    - mode:
This is my topfile:
base:
  davinci-staging:
    - nginx
The traceback below results from salt davinci-staging state.apply test=True, salt davinci-staging state.apply nginx test=True, and salt davinci-staging state.apply nginx. Setting test=True does not matter in my case; the tracebacks are identical. Also, running the highstate several times did not make a difference.
davinci-staging:
The minion function caused an exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1420, in _thread_return
    return_data = executor.execute()
  File "/usr/lib/python2.7/dist-packages/salt/executors/direct_call.py", line 28, in execute
    return self.func(*self.args, **self.kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/modules/state.py", line 543, in apply_
    return sls(mods, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/modules/state.py", line 1039, in sls
    ret = st_.state.call_high(high_, orchestration_jid)
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 2299, in call_high
    ret = dict(list(disabled.items()) + list(self.call_chunks(chunks).items()))
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1815, in call_chunks
    running = self.call_chunk(low, running, chunks)
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 2100, in call_chunk
    running = self.call_chunk(low, running, chunks)
  <SNIP>
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 2100, in call_chunk
    running = self.call_chunk(low, running, chunks)
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 2100, in call_chunk
    running = self.call_chunk(low, running, chunks)
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1997, in call_chunk
    low = self._mod_aggregate(low, running, chunks)
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 762, in _mod_aggregate
    agg_opt = self.functions['config.option']('state_aggregate')
  File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1086, in __getitem__
    func = super(LazyLoader, self).__getitem__(item)
  File "/usr/lib/python2.7/dist-packages/salt/utils/lazy.py", line 91, in __getitem__
    if self._missing(key):
RuntimeError: maximum recursion depth exceeded
Versions:
root@davinci-staging:~# salt-minion --versions-report
Salt Version:
Salt: 2016.11.3
Dependency Versions:
cffi: 0.8.6
cherrypy: Not Installed
dateutil: 2.2
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.9.4
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.2
mysql-python: 1.2.3
pycparser: 2.10
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.9 (default, Jun 29 2016, 13:08:31)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 14.4.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.5
System Versions:
dist: debian 8.7
machine: x86_64
release: 3.16.0-4-amd64
system: Linux
version: debian 8.7
I was able to work around my issue by using prereq_in. This state works fine:
nginx_install:
  pkg.installed:
    - name: nginx
    - prereq_in:
      - nginx_service

nginx_service:
  service.running:
    - name: nginx

#nginx_config:
#  file.managed:
#    - name: /etc/nginx/nginx.conf
#    - mode:
@cdunklau The problem with your sls is here:
- prereq:
  - nginx_install
AFAIK that is invalid. It should be written as:
- prereq:
  - pkg: nginx_install
Do that, and it works for me.
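With that fix applied, the full sls from the earlier comment would read like this sketch (only the prereq target changes, gaining its pkg prefix):

nginx_install:
  pkg.installed:
    - name: nginx

nginx_service:
  service.running:
    - name: nginx
    - prereq:
      - pkg: nginx_install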
I've spent a couple of days pulling my hair out over this bug. Following @JensRantil's lead, I dumped the low, running, and chunks variables. In my case, I couldn't see any broken references, but there was so much data; just one line from the log was 67 KB, so I was unable to verify everything manually. Eventually I wrote a quick hacky script to see where the problem might be.
#!/usr/bin/env python
import sys

import yaml

# Paste the logged 'chunks' structure here in place of the placeholder.
chunks = {...}

def check_exists(state, name):
    # A requisite resolves if some chunk matches its state module and
    # either its __id__ or its name.
    for block in chunks:
        if block['state'] == state and name in (block['__id__'], block['name']):
            return True
    return False

def print_yaml(chunks):
    print(yaml.dump(chunks, default_flow_style=False, explicit_start=True))

def main(chunks):
    try:
        if sys.argv[1] == 'yaml':
            print_yaml(chunks)
            sys.exit(0)
    except IndexError:
        pass
    for block in chunks:
        for requisite_type in ('require', 'watch', 'prereq', 'use', 'onchanges', 'onfail'):
            for requisite in (requisite_type, requisite_type + '_in'):
                for req in block.get(requisite, []):
                    if not isinstance(req, dict):
                        continue  # bare-name requisites (no module prefix) aren't handled
                    state, name = next(iter(req.items()))
                    if not check_exists(state, name):
                        print("MISSING! '{0}: {1}'".format(state, name))
                    #else:
                    #    print("FOUND! '{0}: {1}'".format(state, name))

if __name__ == "__main__":
    main(chunks)
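To use it: paste the logged chunks structure over the {...} placeholder, then run the script with no arguments to list missing requisites, or pass yaml as the first argument to dump the whole chunks structure for manual inspection instead.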
Initially it pointed out a large number of sls: somefile references which I didn't handle in the script. I got rid of those just to be sure they weren't somehow related to the recursion. There were a few other minor things it picked up on too, but fixing them all proved to be in vain. Nothing seemed to help me avoid the infinite recursion of doom.
At this point, I decided to start deleting everything bit by bit in our staging environment until I got something working. This proved difficult: there were various things generated automatically via pillar, lots of state files including lots of other state files that were required to get things working, and so on, and each test took some time to run. However, after a lot of effort, I reduced everything down to the minimum.
Here is my minimum valid example (to the best of my knowledge anyway) which triggers the infinite recursion exception:
Install blog:
  pkg.installed:
    - name: blog

Catch-all for blog configuration settings:
  test.succeed_without_changes:
    - require:
      - pkg: Install blog

{% if True %}
Some specific blog configuration setting:
  test.succeed_without_changes:
    - require_in:
      - test: Catch-all for blog configuration settings
    - require:
      - pkg: Install blog
{% endif %}

Do this first:
  test.succeed_without_changes:
    - prereq:
      - pkg: Install blog
      - test: Catch-all for blog configuration settings

Do this last:
  test.succeed_without_changes:
    - onchanges:
      - test: Do this first
    - require:
      - pkg: Install blog
      - test: Catch-all for blog configuration settings
This somewhat describes the core of one of our environments. We have a package or some other deployment mechanism (the package blog in this example) which is installed or upgraded.
We then have a bunch of configuration blocks in our sls file to verify permissions and ownership, update configuration files included with the blog package with the correct settings, etc. In this example, I have included one such block, called Some specific blog configuration setting. Some of these settings will only render under specific conditions, hence the {% if True %} Jinja2 block representing the possibility that the block may not exist.
When a block somewhere needs to depend on the blog package being completely installed, we depend on the Catch-all for blog configuration settings block (as opposed to the entire sls: state file, since there could be other things in there for other packages or whatever). All tests are set to require_in the Catch-all - we can't have Catch-all depend on tests since those tests may not render (and we certainly don't want to duplicate all of our other Jinja2 tests just for this block). We have to use require_in instead.
So just these three blocks work all well and good, but then we decide that we need to put our application behind a load balancer. We don't want to be serving traffic from the blog application on the host while it is in the middle of an upgrade, so we have a custom execution module, a module.run call to publish.publish in a prereq block to remove the instance from the load balancer during the upgrade, and finally a similar block to add it back, using onchanges against the aforementioned block. The Do this first and Do this last blocks represent these steps.
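A rough sketch of what such a removal/re-add pair could look like (the loadbalancer* peer target and the lb.remove_backend/lb.add_backend custom module functions are invented for illustration; module.run and publish.publish are real Salt functions):

Remove from load balancer:
  module.run:
    - name: publish.publish
    - tgt: 'loadbalancer*'        # hypothetical peer target
    - fun: 'lb.remove_backend'    # hypothetical custom execution module function
    - prereq:
      - pkg: Install blog

Add back to load balancer:
  module.run:
    - name: publish.publish
    - tgt: 'loadbalancer*'
    - fun: 'lb.add_backend'       # hypothetical
    - onchanges:
      - module: Remove from load balancer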
So to clarify:
- Do this first has a prereq on the installation of blog, as well as on any configuration changes (although the latter won't work because we're just using test.succeed_without_changes... but it's just an example).
- Do this last has an onchanges against Do this first, so if the host was removed from the load balancer prior to an upgrade or configuration change, this ensures it is re-added. This block requires Catch-all for blog configuration settings to have completed first, since we don't want it running before then. I also depend on blog just to be sure (in case it was ever removed from the Catch-all block), but we hit an infinite recursion here regardless.

So what is the work-around? There isn't one that I can see, so there is only a bad option and an ugly option:
1. Remove the test: Catch-all for blog configuration settings requisite from the Do this first block (see the sketch after this list). I guess this will cause any configuration changes to be made without triggering the Do this first/Do this last blocks. Not ideal, but better than nothing...
2. Remove the pkg: Install blog requisite from the Some specific blog configuration setting block - but this is very dangerous! In my testing, this will cause the block to always trigger before the blog package is installed. Even adding - order: 1 to the Install blog block and - order: last to the Some specific blog configuration setting block doesn't seem to actually change the order.
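That first, bad option would reduce Do this first to something like this sketch:

Do this first:
  test.succeed_without_changes:
    - prereq:
      - pkg: Install blog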
I'm seeing this in 2016.11.4.
Closed via #42985
@cachedout Is https://github.com/saltstack/salt/issues/15171#issuecomment-323872701 also supposed to close this issue? It's still open...
Eh? This is #15171. Which comment are you linking to?
@cachedout The one that states "Closed via #42985".
FYI, I just open-sourced a small utility that might help people find circular dependencies between states: https://github.com/JensRantil/salt-circular. Might help someone else.