The salt minion frequently hangs when restarting some services, some more often than others.
One that I can regularly reproduce from the minion (but not when I run from the command line) is icinga2:
```
[INFO ] Loading fresh modules for state activity
[INFO ] Completed state [icinga2] at time 20:03:14.168931 (duration_in_ms=27004.632)
[INFO ] Running state [icinga2] at time 20:03:14.172583
[INFO ] Executing state service.running for [icinga2]
[INFO ] Executing command '/usr/sbin/service -l' in directory '/root'
[INFO ] Executing command '/usr/sbin/service icinga2 onestatus' in directory '/root'
[INFO ] Executing command '/usr/sbin/service -l' in directory '/root'
[INFO ] Executing command '/usr/sbin/service icinga2 rcvar' in directory '/root'
[INFO ] Executing command '/usr/sbin/service icinga2 onestart' in directory '/root'
<hangs forever>
load: 0.19 cmd: icinga2 70106 [sbwait] 511.95r 0.04u 0.04s 0% 19504k
```
Like I mentioned, I can run `service icinga2 onestart`, `onerestart`, `onestop`, and other options from the command line until I'm blue in the face. They never hang. It's only when salt runs them on my behalf.
```
# icinga2.sls
icinga2:
  pkg.installed: []
  service.running:
    - enable: true
```

```
salt-call -l info state.sls icinga2
```
```
Salt Version:
           Salt: 2019.2.2

Dependency Versions:
           cffi: 1.13.2
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: 2.19
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.7.6 (default, Jan 30 2020, 01:18:54)
   python-gnupg: Not Installed
         PyYAML: 5.2
          PyZMQ: 18.1.1
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.1

System Versions:
           dist:
         locale: UTF-8
        machine: amd64
        release: 12.1-RELEASE-p2
         system: FreeBSD
        version: Not Installed
```
This also affected FreeBSD 11.
It looks like `icinga2` isn't detaching from the terminal, but the rc.d script shows it being started with the `-d` flag.
Also, hitting CTRL+C to make `salt-minion` exit doesn't kill `icinga2`, as it would if `icinga2` were really not daemonized. It keeps running with no controlling terminal:

```
icinga 33375 0.0 0.5 50672 20220 - Ss 20:18 0:00.03 /usr/local/lib/icinga2/sbin/icinga2 daemon -d -e /var/log/icinga2/error.log -c /usr/local/etc/icinga2/icinga2.conf
```
Same issue with PostgreSQL
Confirmed with FreeBSD 12 and Salt 3000.
Those labels indicate when the fix will be released, not where the bug was found. I've updated them. Can you give me a severity and priority, please? I still need those until we update our process in an open SEP. @cmcmarrow
Severity: apocalyptic
Priority: 11
;)
But seriously, thanks for looking into this.
The following command also hangs:

```
salt-call -l debug --local cmd.retcode '/usr/sbin/service icinga2 onerestart'
```

That is the command executed by the `freebsdservice.restart` function.
Looks like it hangs at the `proc.run` call in the `cmdmod._run` function, which is a `TimedProc.run` call.
So, I tracked it down to the Python call. It's hanging at `self.process.communicate` in the following function in `salt/utils/timed_subprocess.py`, `TimedProc.run.receive()`:

```python
def receive():
    if self.with_communicate:
        self.stdout, self.stderr = self.process.communicate(input=self.stdin)  # <== hangs here
    elif self.wait:
        self.process.wait()
```

That's where it's hanging... probably something in the setup of that class.
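For context, here is a simplified sketch of the thread-plus-join pattern that `TimedProc.run` appears to wrap around `receive()`. This is my reading of the Salt source, so treat it as an assumption; `timed_run` and `SketchTimeout` are hypothetical names, not Salt APIs. It shows why a stuck `communicate()` becomes a hang with no end when no timeout is set: `join(None)` simply keeps waiting.

```python
import subprocess
import threading


class SketchTimeout(Exception):
    """Raised when the reader thread is still blocked after the timeout."""


def timed_run(cmd, timeout=None):
    """Hypothetical, simplified imitation of a thread-plus-join timed run.

    With timeout=None, join() waits indefinitely, so a communicate()
    that never returns turns into a call that never returns.
    """
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    result = {}

    def receive():
        # communicate() returns only at EOF on stdout, i.e. once every
        # process holding the pipe's write end has closed it.
        result["stdout"], _ = proc.communicate()

    reader = threading.Thread(target=receive, daemon=True)
    reader.start()
    reader.join(timeout)
    if reader.is_alive():
        raise SketchTimeout("no EOF on stdout after {} seconds".format(timeout))
    return proc.returncode, result["stdout"]
```

`timed_run("echo hi", timeout=5)` returns `(0, b"hi\n")`, while `timed_run("sleep 3 &", timeout=1)` raises `SketchTimeout` even though the shell exits immediately, because the backgrounded `sleep` still holds the pipe open.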
Interestingly, if I run the same command with the `cron` service, everything functions normally. Possibly a problem with how the `icinga2` service is configured?
The following works:

```
>>> import subprocess
>>> import salt.utils.timed_subprocess
>>> proc = salt.utils.timed_subprocess.TimedProc('service cron onerestart', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
>>> proc.run()
0
>>> proc.stdout
b'Stopping cron.\nStarting cron.\n'
```

The following hangs:

```
>>> import subprocess
>>> import salt.utils.timed_subprocess
>>> proc = salt.utils.timed_subprocess.TimedProc('service icinga2 onerestart', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
>>> proc.run()
```
The following, not using salt utils, works:

```
>>> import subprocess
>>> cmd = subprocess.Popen('service cron onerestart', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT).communicate()[0]
>>> cmd
b'Stopping cron.\nStarting cron.\n'
```

The following, not using salt utils, hangs:

```
>>> import subprocess
>>> cmd = subprocess.Popen('service icinga2 onerestart', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT).communicate()[0]
```
So it's probably an issue with either Python's `subprocess` library or icinga2.
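One plausible explanation, not confirmed anywhere in this thread: `communicate()` reads until EOF on the stdout pipe, and EOF only arrives once every process holding the pipe's write end has closed it. If the daemon started by the rc script inherits that pipe instead of pointing its stdout/stderr at `/dev/null`, the shell exits but the read never finishes. The sketch below reproduces that with `sleep` standing in for the daemon. `run_with_pipes` is a hypothetical helper, and the `timeout` exists only so the demo can't hang.

```python
import subprocess
import time


def run_with_pipes(cmd, timeout):
    """Hypothetical helper: run cmd like the REPL sessions above (shell,
    stdout piped, stderr merged) but give up after `timeout` seconds.

    Returns (shell_returncode, elapsed_seconds, timed_out).
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        proc.communicate(timeout=timeout)
        return proc.returncode, time.monotonic() - start, False
    except subprocess.TimeoutExpired:
        rc = proc.poll()     # the shell itself has normally exited already
        proc.stdout.close()  # stop waiting for an EOF the orphan withholds
        proc.wait()
        return rc, time.monotonic() - start, True


# The backgrounded sleep inherits the pipe, so EOF is withheld: timed_out is True.
print(run_with_pipes("sleep 3 &", timeout=1))
# Same command with the child's output redirected: returns almost at once.
print(run_with_pipes("sleep 3 >/dev/null 2>&1 &", timeout=1))
```

If this is what's happening, the difference between `cron` and `icinga2` would come down to whether the daemon closes or redirects the descriptors it inherited from the `service` invocation.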
I don't know enough about FreeBSD to troubleshoot this further. @dwoz @dmurphy18 ?
I've had issues in the past with `communicate`. I believe it usually happens when the program running in the subprocess doesn't handle its stdout/stdin properly.
Interestingly, I could do a `subprocess.call` and it worked fine, but `Popen` with `communicate` would fail.
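That would be consistent with `communicate()` waiting for EOF on a pipe: `subprocess.call()` only waits for the process itself to exit and, with no `stdout` argument, doesn't create a pipe at all. If the output is still needed, one workaround (a sketch, not what Salt does) is to send it to an unnamed temporary file and `wait()` on the shell. `run_to_tempfile` is a hypothetical helper.

```python
import subprocess
import tempfile


def run_to_tempfile(cmd, timeout=None):
    """Hypothetical helper: capture output in an unnamed temp file, not a pipe.

    wait() returns as soon as the shell exits; a daemon that inherits the
    file descriptor cannot hold the call open the way it holds a pipe.
    Anything the daemon writes after the file is read is not captured.
    """
    with tempfile.TemporaryFile() as out:
        proc = subprocess.Popen(cmd, shell=True, stdout=out,
                                stderr=subprocess.STDOUT)
        rc = proc.wait(timeout=timeout)  # waits on the process, not on EOF
        out.seek(0)
        return rc, out.read()
```

For example, `run_to_tempfile("echo started; sleep 3 &", timeout=2)` returns `(0, b"started\n")` right away, even though the backgrounded `sleep` still has the file open.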
Just for reference -> https://github.com/saltstack/salt/issues/44848
I can't reproduce this failure on Salt 3000.3, which I'm going to port to FreeBSD in the next few days. 2019.2.5 is still affected, though. Stay tuned, we're working on it.
@darkpixel, since we upgraded `sysutils/py-salt` to 3000.3 yesterday, it would be awesome if you could test this issue on that version. I expect the official FreeBSD 3000.3 package to be available on 12-STABLE and HEAD tomorrow. I did not commit it to the quarterly builds, so it won't be available on the security branches.
Building on my salt master now... :)
Salt master upgraded without issues. Just upgraded 21 FreeBSD minions, no issues. Preparing to upgrade ~65 other Linux and BSD machines.
Problems solved