Salt: Salt 2016.3.0 (Boron) clean_old_jobs fails

Created on 26 May 2016 · 8 Comments · Source: saltstack/salt

Description of Issue/Question

Master log fills with these errors:

2016-05-26 12:41:44,647 [salt.utils.process][ERROR   ][2706] An un-handled exception from the multiprocessing process 'Maintenance-11' was caught:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/utils/process.py", line 613, in _run
    return self._original_run()
  File "/usr/lib/python2.7/dist-packages/salt/master.py", line 236, in run
    salt.daemons.masterapi.clean_old_jobs(self.opts)
  File "/usr/lib/python2.7/dist-packages/salt/daemons/masterapi.py", line 187, in clean_old_jobs
    mminion.returners[fstr]()
  File "/usr/lib/python2.7/dist-packages/salt/returners/local_cache.py", line 413, in clean_old_jobs
    shutil.rmtree(t_path)
  File "/usr/lib/python2.7/shutil.py", line 239, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/usr/lib/python2.7/shutil.py", line 237, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: '/var/cache/salt/master/jobs/a9'
2016-05-26 12:41:44,652 [salt.utils.process][INFO    ][31591] Process <class 'salt.master.Maintenance'> (2706) died with exit status None, restarting...

Setup

master config:

# Set the number of hours to keep old job information in the job cache:
keep_jobs: 6

Steps to Reproduce Issue

Update to 2016.3.0. Since then, salt.master.Maintenance tries to clean up files/dirs that are not in /var/cache/salt/master/jobs/.

Versions Report

root@salt:~# salt --versions-report
Salt Version:
Salt: 2016.3.0

Dependency Versions:
cffi: Not Installed
cherrypy: 3.2.2
dateutil: 1.5
gitdb: 0.5.4
gitpython: 0.3.2 RC1
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 0.9.1
msgpack-pure: Not Installed
msgpack-python: 0.3.0
mysql-python: 1.2.3
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 14.4.0
RAET: Not Installed
smmap: 0.8.2
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.4

System Versions:
dist: Ubuntu 14.04 trusty
machine: x86_64
release: 3.13.0-83-generic
system: Linux
version: Ubuntu 14.04 trusty

Bug Core P1 ZRELEASED - Boron fixed-pending-your-verification severity-critical severity-high

All 8 comments

I'm seeing the same thing.

That's REALLY odd. You don't somehow have more than one Maintenance process running, do you?

Good guess. I stopped the salt-master service and checked that no other salt-master process was lingering.
Then I did an ls -l of /var/cache/salt/master/jobs and I see:

<snip>
drwxr-xr-x 10 root root 4096 May 26 17:37 33
drwxr-xr-x  3 root root 4096 May 26 15:37 34
drwxr-xr-x 15 root root 4096 May 26 17:27 35
drwxr-xr-x  6 root root 4096 May 26 15:37 36

After that, the first OSError appeared in the log:

2016-05-26 18:28:28,862 [salt.utils.process][ERROR   ][24373] An un-handled exception from the multiprocessing process 'Maintenance-4' was caught:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/utils/process.py", line 613, in _run
    return self._original_run()
  File "/usr/lib/python2.7/dist-packages/salt/master.py", line 236, in run
    salt.daemons.masterapi.clean_old_jobs(self.opts)
  File "/usr/lib/python2.7/dist-packages/salt/daemons/masterapi.py", line 187, in clean_old_jobs
    mminion.returners[fstr]()
  File "/usr/lib/python2.7/dist-packages/salt/returners/local_cache.py", line 413, in clean_old_jobs
    shutil.rmtree(t_path)
  File "/usr/lib/python2.7/shutil.py", line 239, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/usr/lib/python2.7/shutil.py", line 237, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: '/var/cache/salt/master/jobs/35'
2016-05-26 18:28:28,867 [salt.utils.process][INFO    ][24360] Process <class 'salt.master.Maintenance'> (24373) died with exit status None, restarting...

Looks like 2 processes trying to do the same cleanup?
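
For illustration only, here is a minimal sketch of how a cleanup step could tolerate a directory that has already disappeared, which is the failure the tracebacks above show (`shutil.rmtree` raising `OSError: [Errno 2]`). This is not the patch from #33555 and not Salt's actual code; the helper name `remove_tree_if_present` and the temporary paths are hypothetical.

```python
import errno
import os
import shutil
import tempfile


def remove_tree_if_present(path):
    """Remove a directory tree, tolerating a path that has already vanished.

    Hypothetical helper, not part of Salt. Returns True if this call
    completed the removal, False if an ENOENT error was hit because the
    path (or something beneath it) was already gone.
    """
    try:
        shutil.rmtree(path)
    except OSError as exc:
        # Only swallow "No such file or directory"; re-raise anything else
        # (permissions, I/O errors) so real problems stay visible.
        if exc.errno != errno.ENOENT:
            raise
        return False
    return True


if __name__ == "__main__":
    # Simulate two cleaners targeting the same job cache bucket.
    base = tempfile.mkdtemp()
    job_dir = os.path.join(base, "a9")
    os.makedirs(os.path.join(job_dir, "some_jid"))

    print(remove_tree_if_present(job_dir))  # first cleaner removes the tree
    print(remove_tree_if_present(job_dir))  # second cleaner finds it gone

    shutil.rmtree(base)
```

Ignoring only `ENOENT` keeps the process from dying on a harmless race while still surfacing other errors. It would not explain why the directory went missing in the first place.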

Please see #33555 for a patch that resolves this issue.

After applying the patch and restarting the salt-master the error did not show up anymore.
Thanks for your quick fix!

@tjuup You're welcome. Apologies for the breakage. :]

Thanks, that patch seems to have fixed this issue.

@cachedout It seems this patch is not in the latest deb release, because I'm still having this issue. When will it be released?
