Description
Running salt-call slsutil.renderer causes salt-call to hang after printing its output.
Setup
Install Salt and render a file with slsutil.renderer (salt-call slsutil.renderer); watch it hang, requiring Ctrl+C to get a prompt back.
Steps to Reproduce the behavior
salt-call slsutil.renderer salt://test/init.sls (the file doesn't need to exist).
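A quick way to confirm the hang non-interactively (not part of the original report; assumes coreutils timeout is available):
# Wrap the call in coreutils `timeout` so the hang shows up as exit code 124
# instead of needing Ctrl+C. 60 seconds is an arbitrary ceiling; the render
# itself finishes in a few seconds.
timeout 60 salt-call slsutil.renderer salt://test/init.sls
if [ $? -eq 124 ]; then
    echo "salt-call did not exit within 60s - it hung after printing its output"
fi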
Expected behavior
salt-call exits after the module returns.
Versions Report
salt-call test.versions
local:
Salt Version:
Salt: 3001rc1
Dependency Versions:
cffi: Not Installed
cherrypy: 8.9.1
dateutil: 2.7.3
docker-py: Not Installed
gitdb: 2.0.6
gitpython: 3.0.7
Jinja2: 2.10.1
libgit2: 0.28.3
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.6.2
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: 3.6.1
pygit2: 1.0.3
Python: 3.8.2 (default, Apr 27 2020, 15:53:34)
python-gnupg: 0.4.5
PyYAML: 5.3.1
PyZMQ: 18.1.1
smmap: 2.0.5
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.2
System Versions:
dist: ubuntu 20.04 focal
locale: utf-8
machine: x86_64
release: 5.4.0-29-generic
system: Linux
version: Ubuntu 20.04 focal
As we rely heavily on slsutil.renderer for syntax validation of sls files in our Git hooks, this looks pretty bad to me.
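For illustration only (not from the report), a minimal pre-commit hook of the kind described above might look like this; it assumes the repository layout maps directly to salt:// URLs under the local file_roots, and the error handling is a guess, since a render failure may only show up in the output rather than the exit code:
#!/bin/sh
# Sketch of a .git/hooks/pre-commit that renders every staged .sls file.
# Assumes the checkout is served as file_roots, so "foo/init.sls" in the
# repo is reachable as salt://foo/init.sls when run with --local.
rc=0
for f in $(git diff --cached --name-only --diff-filter=ACM | grep '\.sls$'); do
    out=$(salt-call --local slsutil.renderer "salt://$f" 2>&1) || rc=1
    echo "$out" | grep -q '\[ERROR' && { echo "render failed: $f"; rc=1; }
done
exit $rc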
@whytewolf Thanks for reporting this issue.
Sounds like the same thing as #57456
Changing the severity to high - the definitions are subjective, but my gut tells me this is high rather than medium.
I guess this bug was silently fixed by other changes.
I tried to reproduce this using the 2020-06-09 git code with a non-existent file:
root@marvin:~# salt-call slsutil.renderer salt://test/init.sls
[ERROR ] Unable to fetch file salt://test/init.sls from saltenv base.
[ERROR ] Template was specified incorrectly: False
local:
----------
And with an existent one:
root@marvin:~# salt-call slsutil.renderer salt://lvmtst/init.sls
local:
----------
lv_opt:
----------
lvm.lv_present:
|_
----------
name:
lala
|_
----------
vgname:
marvinvg01
|_
----------
size:
512
salt --versions-report
Salt Version:
Salt: 3001
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.11.2
libgit2: Not Installed
M2Crypto: 0.35.2
Mako: 1.1.3
msgpack-pure: Not Installed
msgpack-python: 0.5.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: 3.9.7
pygit2: Not Installed
Python: 3.8.3 (default, May 15 2020, 05:51:00)
python-gnupg: Not Installed
PyYAML: 3.13
PyZMQ: 18.1.1
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.2
System Versions:
dist: slackware 14.2 current
locale: utf-8
machine: i686
release: 5.4.45
system: Linux
version: Slackware 14.2 current
@piterpunk or it's not reproducible on Slackware. Do you see the problem using the RC instead?
The failure to reproduce doesn't seem to be Slackware-related. I just created a CentOS 8 machine at Linode and got the same results, with no hangs, using the code from git:
# salt-call slsutil.renderer salt://test/non-existent-file.sls
[ERROR ] Unable to fetch file salt://test/non-existent-file.sls from saltenv base.
[ERROR ] Template was specified incorrectly: False
local:
----------
# salt-call slsutil.renderer salt://othertst/init.sls
local:
----------
sshd:
service.running
salt --versions-report
Salt Version:
Salt: 3001rc1-70-gb95213ec90
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.11.2
libgit2: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.6.2
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: 3.9.7
pygit2: Not Installed
Python: 3.8.0 (default, May 7 2020, 02:49:39)
python-gnupg: Not Installed
PyYAML: 5.3.1
PyZMQ: 19.0.1
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.2
System Versions:
dist: centos 8 Core
locale: utf-8
machine: x86_64
release: 4.18.0-147.8.1.el8_1.x86_64
system: Linux
version: CentOS Linux 8 Core
@piterpunk is that the RC release or a later version?
The people who have reported it have all been using Ubuntu 20.04.
It's a later version. I usually check a bug against the latest code to see if it's already fixed, so I don't start writing code for nothing.
I see now that I need to understand the development dynamics here better.
Should I try the current code on Ubuntu 20.04 to see if this issue is gone there too and, if it's solved, bisect to find the commit that fixed the problem?
@piterpunk or it's not reproducible on Slackware. Do you see the problem using the RC instead?
You should check this. First try the RC code and confirm the issue is there. Then if the current code on the same system doesn't have the issue then it is fixed.
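For reference, the bisect flow suggested here could look roughly like this (a sketch, not something run in this thread; the tag name and the reinstall step are assumptions):
# git bisect with swapped terms, since we are hunting the commit that *fixed*
# the hang rather than the one that introduced a bug.
git bisect start --term-old=broken --term-new=fixed
git bisect broken v3001rc1        # tag name illustrative: the RC that hangs
git bisect fixed HEAD             # current code that no longer hangs
# The run command must exit 0 for "broken" commits and non-zero for "fixed"
# ones, so invert coreutils `timeout` (124 = timed out = still hangs).
# You may also need to reinstall Salt (e.g. pip install -e .) at each step.
git bisect run sh -c '! timeout 60 salt-call slsutil.renderer salt://test/init.sls'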
Tested 3001rc1 on a Slackware machine and the issue was present there:
[TRACE ] data = {'local': OrderedDict([('lv_opt', OrderedDict([('lvm.lv_present', [OrderedDict([('name', 'lala')]), OrderedDict([('vgname', 'marvinvg01')]), OrderedDict([('size', 512)])])])), ('pv_fail', OrderedDict([('lvm.pv_present', [OrderedDict([('name', '/dev/vdc')])])]))])}
local:
----------
lv_opt:
----------
lvm.lv_present:
|_
----------
name:
lala
|_
----------
vgname:
marvinvg01
|_
----------
size:
512
pv_fail:
----------
lvm.pv_present:
|_
----------
name:
/dev/vdc
[DEBUG ] Closing AsyncZeroMQReqChannel instance
The execution waits forever at this last line, as described by the OP.
salt --versions-report
Salt Version:
Salt: 3001
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.11.2
libgit2: Not Installed
M2Crypto: 0.35.2
Mako: 1.1.3
msgpack-pure: Not Installed
msgpack-python: 0.5.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: 3.9.7
pygit2: Not Installed
Python: 3.8.3 (default, May 15 2020, 02:05:39)
python-gnupg: Not Installed
PyYAML: 3.13
PyZMQ: 19.0.1
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.2
System Versions:
dist: slackware 14.2 current
locale: utf-8
machine: x86_64
release: 5.4.43
system: Linux
version: Slackware 14.2 current
Trying something on this issue with the title: the version the bug is reported in goes in the brackets. We may all hate it, so I'm only trying it on this one issue at the moment. This and the referenced issue point to more issues, and we have the Open Core Team looking at what we can fix in the point release 3001.1; there is likely more to fix in Magnesium.
@sagetherage usually that's what the labels are for, and you use the milestone to show the planned fix release
yes, we abuse labels, though
and right now I can't give community members the ability to apply labels, working on it.
Also tested it on Ubuntu 20.04:
Salt Version:
Salt: 3001
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.7.3
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.10.1
libgit2: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.6.2
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: Not Installed
pycryptodome: 3.6.1
pygit2: Not Installed
Python: 3.8.2 (default, Apr 27 2020, 15:53:34)
python-gnupg: 0.4.5
PyYAML: 5.3.1
PyZMQ: 18.1.1
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.2
System Versions:
dist: ubuntu 20.04 focal
locale: utf-8
machine: x86_64
release: 5.4.0-1015-aws
system: Linux
version: Ubuntu 20.04 focal
In our case it seems to fail when there is more than one file in the definitions. For instance, a simple sls with a single file:
local:
----------
ID: vim_rc_file
Function: file.managed
Name: /root/.vimrc
Result: True
Comment: File /root/.vimrc is in the correct state
Started: 13:17:07.210639
Duration: 21.916 ms
Changes:
Summary for local
------------
Succeeded: 1
Failed: 0
------------
Total states run: 1
Total run time: 21.916 ms
was ok. However:
local:
----------
ID: syslog
Function: pkg.latest
Name: rsyslog
Result: True
Comment: Package rsyslog is already up-to-date
Started: 13:17:42.593908
Duration: 1531.023 ms
Changes:
----------
ID: authlogs_to_logserver_config
Function: file.managed
Name: /etc/rsyslog.d/50-authlog.conf
Result: True
Comment: File /etc/rsyslog.d/50-authlog.conf is in the correct state
Started: 13:17:44.129623
Duration: 17.261 ms
Changes:
----------
ID: syslogs_to_logserver_config
Function: file.managed
Name: /etc/rsyslog.d/50-syslogs.conf
Result: True
Comment: File /etc/rsyslog.d/50-syslogs.conf is in the correct state
Started: 13:17:44.147013
Duration: 8.082 ms
Changes:
----------
ID: syslog
Function: service.running
Name: rsyslog
Result: True
Comment: The service rsyslog is already running
Started: 13:17:44.155312
Duration: 31.784 ms
Changes:
Summary for local
------------
Succeeded: 4
Failed: 0
------------
Total states run: 4
Total run time: 1.588 s
^C
did not exit (thus the Ctrl+C).
In our tests it fails to exit for states containing service.running, cmd.wait, or archive.extracted, so maybe it has something to do with the cleanup.
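For completeness, here is a hypothetical reconstruction of the multi-state sls from the output above; the source: and watch: lines are guesses, since the output only shows the state IDs, functions and names:
# Rebuilt from the state output above; placed under the default file_roots.
mkdir -p /srv/salt/rsyslog
cat > /srv/salt/rsyslog/init.sls <<'EOF'
syslog:
  pkg.latest:
    - name: rsyslog
  service.running:
    - name: rsyslog
    - watch:
      - file: authlogs_to_logserver_config
      - file: syslogs_to_logserver_config

authlogs_to_logserver_config:
  file.managed:
    - name: /etc/rsyslog.d/50-authlog.conf
    - source: salt://rsyslog/files/50-authlog.conf

syslogs_to_logserver_config:
  file.managed:
    - name: /etc/rsyslog.d/50-syslogs.conf
    - source: salt://rsyslog/files/50-syslogs.conf
EOF
# Applying it prints the summary and then hangs, while the single
# file.managed state above exits normally.
salt-call state.apply rsyslog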
Yeah, it seems every salt-call invocation on FreeBSD 13 with Salt 3001 has this problem.
@CostelLupoaie pointed out a bug when applying states with more than one file.
@krionbsd reported that every salt-call invocation on FreeBSD 13 with Salt 3001 hangs.
The guess is that they are all related to the original slsutil.renderer bug?
When I was working on #57669, I had the same issue of the never-ending salt-call. There it was related to modules/disks.py and the optional loading.
It was solved with commit 8018b2a; maybe something similar is happening here.
Can confirm this happens in some of my states in 3001 on Ubuntu 20.04. The same states running on a Ubuntu 16.04 system don't hang (at least, not as far as I've seen). This is in my bento test environment so it's pretty vanilla. I can provide more details if you think it would be helpful.
@xcorvis the more of an MCVE, the better. I haven't been able to repro personally - either on 20.04, or FreeBSD 12.1. So I'm sure that there is some essential difference between my setup and everyone else seeing this problem.
@waynew Sure. This state proved reproducible:
nginx:
  pkg.installed
On the master (salt '*' state.sls teststate) it executed normally. On minion-xenial (salt-call state.sls teststate) this worked fine, no issues. On minion-focal (same command) it executed but hung on the "Closing AsyncZeroMQReqChannel instance" line. It also hung with test=true, and whether or not nginx was already installed.
The only odd settings on my master server might be top_file_merging_strategy: same and default_top: base, otherwise it's a pretty vanilla setup. Minions had no special config. These were fresh VMs made from the most recent bento virtualbox images. I used vagrant with salt-bootstrap and installed ifupdown and virtualbox guest additions.
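Put together, the reproduction above amounts to the following (paths assume the default /srv/salt file_roots on the master; salt-call on each minion then pulls and applies the state):
mkdir -p /srv/salt
cat > /srv/salt/teststate.sls <<'EOF'
nginx:
  pkg.installed
EOF
# On the focal minion this prints the state result and then hangs at
# "Closing AsyncZeroMQReqChannel instance"; on the xenial minion it exits.
salt-call state.sls teststate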
master salt -V:
Salt Version:
Salt: 3001
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.7.3
docker-py: Not Installed
gitdb: 2.0.6
gitpython: 3.0.7
Jinja2: 2.10.1
libgit2: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.6.2
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: Not Installed
pycryptodome: 3.6.1
pygit2: Not Installed
Python: 3.8.2 (default, Apr 27 2020, 15:53:34)
python-gnupg: 0.4.5
PyYAML: 5.3.1
PyZMQ: 18.1.1
smmap: 2.0.5
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.2
System Versions:
dist: ubuntu 20.04 focal
locale: utf-8
machine: x86_64
release: 5.4.0-31-generic
system: Linux
version: Ubuntu 20.04 focal
minion-xenial salt-call -V:
Salt Version:
Salt: 3001
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.4.2
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.8
libgit2: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.6.2
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: Not Installed
pycryptodome: 3.4.7
pygit2: Not Installed
Python: 3.5.2 (default, Apr 16 2020, 17:47:17)
python-gnupg: 0.3.8
PyYAML: 3.11
PyZMQ: 17.1.2
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.1.4
System Versions:
dist: ubuntu 16.04 Xenial Xerus
locale: UTF-8
machine: x86_64
release: 4.4.0-179-generic
system: Linux
version: Ubuntu 16.04 Xenial Xerus
minion-focal salt-call -V:
Salt Version:
Salt: 3001
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.7.3
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.10.1
libgit2: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.6.2
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: Not Installed
pycryptodome: 3.6.1
pygit2: Not Installed
Python: 3.8.2 (default, Apr 27 2020, 15:53:34)
python-gnupg: 0.4.5
PyYAML: 5.3.1
PyZMQ: 18.1.1
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.2
System Versions:
dist: ubuntu 20.04 focal
locale: utf-8
machine: x86_64
release: 5.4.0-31-generic
system: Linux
version: Ubuntu 20.04 focal
I'm experiencing the same issue, with a pretty similar setup, master is Ubuntu 18.04, minion is 20.04.
Definitely something screwy going on here.
Thanks to @xcorvis I was finally able to repro this:
Running salt-call with strace -y, we get this:
poll([{fd=10<anon_inode:[eventfd]>, events=POLLIN}], 1, -1
That's where it hangs indefinitely. From the poll manpage we can see:
poll(struct pollfd fds[], nfds_t nfds, int timeout);
...
If the value of timeout is -1, the poll blocks indefinitely.
So whatever we're polling for here, we're doing it for-ev-errrr. I'll post updates as I find them.
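To reproduce the diagnosis above, a trace limited to the polling syscalls can be captured like this (commands reconstructed, not copied from the comment):
# -y resolves file descriptors to paths, -f follows child threads/processes.
strace -y -f -e trace=poll,epoll_wait -o /tmp/salt-call.strace \
    salt-call slsutil.renderer salt://test/init.sls
# After hitting Ctrl+C, the tail of the trace shows the blocking call; the
# final -1 is the timeout argument, i.e. "wait forever for the eventfd".
tail -n 5 /tmp/salt-call.strace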
@whytewolf we didn't get this into the point release as we don't have the fix yet, so moving to Magnesium.
Does transport: tcp help? The issue seems to be related to ZMQ: https://github.com/saltstack/salt/issues/57456#issuecomment-663153437
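If anyone wants to test that suggestion, a minimal way to switch a test minion to the TCP transport might be the following (assumes the stock /etc/salt/minion.d include directory; note the master has to be configured for transport: tcp as well):
cat > /etc/salt/minion.d/transport.conf <<'EOF'
transport: tcp
EOF
systemctl restart salt-minion
# salt-call reads the minion config directly, so the renderer call itself
# can be retried right away:
salt-call slsutil.renderer salt://test/init.sls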
@whytewolf I believe I have the fix for your hang. It works for my environment. I would appreciate it if you tested it in your environment to make sure it works for you so you don't need to wait for another release.