After upgrading salt-master and salt-minion to 2017.7.0, file.managed runs approximately 4 times slower than with the old 2016.11.0 version.
State duration before upgrade:
Name: /etc/vim/vimrc - Function: file.managed - Result: Clean Started: - 09:30:49.446082 Duration: 44.77 ms
State duration after upgrade:
Name: /etc/vim/vimrc - Function: file.managed - Result: Clean Started: - 14:16:04.989995 Duration: 164.882 ms
Managing hundreds of servers, each with multiple file.managed states, just got significantly worse. The current performance in a large environment is not acceptable.
Master:
salt-master 2017.7.0 (Nitrogen)
Minion:
Salt Version:
Salt: 2017.7.0
Dependency Versions:
cffi: 0.8.6
cherrypy: Not Installed
dateutil: 2.2
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.9.4
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.2
mysql-python: 1.2.3
pycparser: 2.10
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.9 (default, Jun 29 2016, 13:08:31)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 14.4.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.5
System Versions:
dist: debian 8.8
locale: UTF-8
machine: x86_64
release: 3.16.0-4-amd64
system: Linux
version: debian 8.8
Looks like I'm able to replicate this:
[root@8fc82f7a57a0 /]# salt --version; time salt '*' state.apply test
salt 2016.11.6 (Carbon)
95e0100f10a0:
Name: /tmp/test - Function: file.managed - Result: Clean Started: - 13:52:34.278113 Duration: 18.636 ms
Summary for 95e0100f10a0
------------
Succeeded: 1
Failed: 0
------------
Total states run: 1
Total run time: 18.636 ms
real 0m1.000s
user 0m0.517s
sys 0m0.327s
[root@8fc82f7a57a0 /]# salt --version; time salt '*' state.apply test
salt 2017.7.0 (Nitrogen)
95e0100f10a0:
Name: /tmp/test - Function: file.managed - Result: Clean Started: - 13:53:59.575765 Duration: 53.349 ms
Summary for 95e0100f10a0
------------
Succeeded: 1
Failed: 0
------------
Total states run: 1
Total run time: 53.349 ms
real 0m1.048s
user 0m0.563s
sys 0m0.300s
Here is my state:
test:
  file.managed:
    - name: /tmp/test
    - source: salt://test.txt
And my output is terse:
[root@8fc82f7a57a0 /]# grep -E -v '^$|#' /etc/salt/master
state_output: terse
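For what it's worth, the per-state timing can also be measured directly on the minion with salt-call, which takes the master-side orchestration and transport out of the measurement (a minimal sketch, assuming the same test state is reachable from the minion's configured master):
time salt-call state.apply test
salt-call -l debug state.apply test
The debug run should show roughly where the time goes (file requests, rendering, the state call itself), which helps separate fileserver latency from the state module.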
Thanks, we will look into this.
I can't reproduce this. I need to know more about how you came to the determination that this is a Salt bug. How did you install Salt? Did you clear the minion file cache as well as the master fileserver cache before testing each release?
The only way that I am able to reproduce this is when I install a version _from source_, and then _only the first time_. I can stop the master/minion, clear the caches, start the minion again, and it runs just as fast as it normally does. This slower first run can be reproduced on _both_ the 2016.11 and 2017.7 releases, and is likely explained by Python byte-compiling modules the first time they are imported.
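If byte-compilation is suspected, one way to take it out of the equation (a hedged suggestion; the path below is a guess for a Debian-style package install and will differ for source installs) is to pre-compile the Salt tree before the first run:
python -m compileall /usr/lib/python2.7/dist-packages/salt
After that, the first state run should no longer pay the one-time .pyc generation cost.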
I installed Salt on the master node, which runs Ubuntu 14 LTS, via apt from the official SaltStack repository at https://repo.saltstack.com. On all Linux clients it is installed the same way, via this repository and apt; some of those hosts run Debian, others Ubuntu, in all kinds of supported versions. Since I have seen that a 2017.7.1 version is available in the meantime, I was wondering whether you used that or 2017.7.0?
I did not explicitly clear the minion file cache or the master fileserver cache. Are there any simple commands to do this?
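(For reference, something along these lines should clear both caches; this is a sketch assuming the default cache locations and the standard runner/module names:
salt-run fileserver.clear_cache   # on the master: clears the master fileserver cache
salt '*' saltutil.clear_cache     # clears the minion-side caches, including cached files
Alternatively, stop the minion and remove /var/cache/salt/minion/files before starting it again.)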
In my tests I ran the state multiple times (not just once), and the timings were almost the same on every run (about 4x slower).
As this was a total showstopper for me, I downgraded to the 2016.11.x branch in the meantime.
@do3meli I used 2017.7.0, and the head of the 2017.7 branch, and could not reproduce in either case.
I think I'm hitting the same issue with 2017.7.0. I don't have precise timings since I updated a production stack, but highstate is now very slow and I have a lot of file.managed states.
I cleared the master/minion caches, without any change. It's as if the local cache is not used (the first run is as slow as subsequent runs).
Files are coming from gitfs on the master.
I watched the minion file cache build up, and that was also very slow (about one file cached per second).
Using Salt from the official repositories, on Ubuntu.
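One rough way to watch the cache-fill rate, assuming the default minion cache directory, is to count cached files while the highstate runs:
watch -n1 'find /var/cache/salt/minion/files -type f | wc -l'
If the count grows by roughly one file per second, that matches the behaviour described above.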
OK, until someone provides a use case I can reproduce, there's not much I can do. I'm not saying there's no problem, but if I can't reproduce, I can't identify what needs to be changed.
Yes, I understand that makes it hard to fix ;)
Is there some debugging/logging I can do on my stack to try to help? I cannot downgrade, but I can run diagnostics on the current version.
@the-glu If you set log_level_logfile: trace on the master, you will get a bunch of logging saying things like Master function call X took Y seconds. Ideally, it'd be great to get comparisons of this between 2016.11 and 2017.7 so we can see if there is any noticeable difference on the master side.
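Concretely, that would look something like this (the log path is the Debian/Ubuntu default and is an assumption here): add the setting to /etc/salt/master, restart the master, run the state, and then compare the reported timings on both versions.
log_level_logfile: trace
grep 'Master function call' /var/log/salt/master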
@terminalmage In fact I'm running into #42989; installing pygit2 fixed the issue on my side (probably, since I don't have a comparison against 2016.11 with pygit2 ;))
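For anyone else hitting this: based on this thread (not independently verified here), the fix amounts to installing the pygit2 bindings so gitfs no longer falls back to a slower provider, e.g. on Debian/Ubuntu something like:
apt-get install python-pygit2
and optionally pinning the provider in /etc/salt/master:
gitfs_provider: pygit2
followed by a salt-master restart.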
Since @terminalmage is unable to reproduce this bug, I am removing the Blocker label from this issue. Once we have a better understanding of the issue and can reproduce it reliably, @terminalmage can take another look and come back to this.
I am closing this, as I was able to upgrade to 2017.7.x in the meantime without any slowdown issues.