Salt: file.managed slow down in 2017.7.0

Created on 19 Jul 2017 · 11 comments · Source: saltstack/salt

Description of Issue/Question

After upgrading salt-master and salt-minion to 2017.7.0, file.managed runs approximately 4 times slower than with the previous 2016.11.0 version.

state duration before upgrade:

Name: /etc/vim/vimrc - Function: file.managed - Result: Clean Started: - 09:30:49.446082 Duration: 44.77 ms

state duration after upgrade:

Name: /etc/vim/vimrc - Function: file.managed - Result: Clean Started: - 14:16:04.989995 Duration: 164.882 ms

Managing hundreds of servers, each with multiple file.managed states, just got noticeably worse. The current performance in a large environment is not acceptable.

Steps to Reproduce Issue

  • Run a file.managed state before and after upgrading salt-minion and salt-master, and compare the timings (see the example sketch just below).
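
For reference, here is a minimal sketch of the comparison; the state name, the /tmp/test path and the salt://test.txt source are only illustrative examples (they mirror the test state used in the comments below):

# /srv/salt/test.sls (example path; any small static file will do)
test:
  file.managed:
    - name: /tmp/test
    - source: salt://test.txt

# run the same state against both releases and compare the reported Duration
salt --version; time salt '*' state.apply test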

Versions Report

Master:

salt-master 2017.7.0 (Nitrogen)

Minion:

Salt Version:
           Salt: 2017.7.0

Dependency Versions:
           cffi: 0.8.6
       cherrypy: Not Installed
       dateutil: 2.2
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: 1.2.3
      pycparser: 2.10
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.9 (default, Jun 29 2016, 13:08:31)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.8 
         locale: UTF-8
        machine: x86_64
        release: 3.16.0-4-amd64
         system: Linux
        version: debian 8.8 
Labels: Bug, Core, P1, cannot-reproduce, severity-high

All 11 comments

Looks like I'm able to replicate this:

[root@8fc82f7a57a0 /]# salt --version; time salt '*' state.apply test
salt 2016.11.6 (Carbon)
95e0100f10a0:
  Name: /tmp/test - Function: file.managed - Result: Clean Started: - 13:52:34.278113 Duration: 18.636 ms

Summary for 95e0100f10a0
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:  18.636 ms

real    0m1.000s
user    0m0.517s
sys     0m0.327s
[root@8fc82f7a57a0 /]# salt --version; time salt '*' state.apply test 
salt 2017.7.0 (Nitrogen)
95e0100f10a0:
  Name: /tmp/test - Function: file.managed - Result: Clean Started: - 13:53:59.575765 Duration: 53.349 ms

Summary for 95e0100f10a0
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:  53.349 ms

real    0m1.048s
user    0m0.563s
sys     0m0.300s

Here is my state:

test:
  file.managed:
    - name: /tmp/test
    - source: salt://test.txt

And my output is terse:

[root@8fc82f7a57a0 /]# grep -E -v '^$|#' /etc/salt/master
state_output: terse

Thanks, we will look into this.

I can't reproduce this. I need to know more about how you came to the determination that this is a Salt bug. How did you install Salt? Did you clear the minion file cache as well as the master fileserver cache before testing each release?

The only way that I am able to reproduce this is when I install a version _from source_, and then _only the first time_. I can stop the master/minion, clear the caches, start the minion again, and it runs just as fast as it normally does. This slower first run can be reproduced on _both_ the 2016.11 and 2017.7 releases, and is likely explained by Python byte-compiling modules the first time they are imported.
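
For anyone wanting to repeat that procedure, here is a rough sketch of clearing both caches; it assumes the default cache locations, so adjust paths to your install:

# on the master: clear the fileserver cache
salt-run fileserver.clear_cache

# on the minions: clear the minion-side caches
salt '*' saltutil.clear_cache

# alternatively, with the services stopped, remove the minion file cache by hand
rm -rf /var/cache/salt/minion/files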

I installed Salt on the master node, which runs Ubuntu 14 LTS, via apt through the official SaltStack repo at https://repo.saltstack.com. On all Linux clients it is basically installed via this repository and apt as well; some of those hosts run Debian, others Ubuntu, across all kinds of supported versions. Since I've seen that a 2017.7.1 version is available in the meantime, I was wondering whether you used that or 2017.7.0?

I did not explicitly clear the minion file cache or the master fileserver cache. Are there any simple commands to do this?

In my tests I ran the state multiple times (not just once), and the timings were almost the same each time (about 4x slower).

As this was a total show-stopper for me, I have downgraded to the 2016.11.x branch in the meantime.

@do3meli I used 2017.7.0, and the head of the 2017.7 branch, and could not reproduce in either case.

I think I'm getting the same issue with 2017.7.0. I don't have precise timings, since I upgraded a production stack, but highstate is now very slow and I have a lot of file.managed states.

I cleared the master/minion caches, without any change. It's as if the local cache is not being used (the first run is as slow as subsequent runs).

Files are coming from gitfs on the master.

I watched the minion file cache build up and it was also very slow (roughly one file cached per second).

Using Salt from the official repositories, on Ubuntu.

OK, until someone provides a use case I can reproduce, there's not much I can do. I'm not saying there's no problem, but if I can't reproduce, I can't identify what needs to be changed.

Yes, I understand it's problematic to fix ;)

Is there some debugging/logging I can do on my stack to try to help? I cannot downgrade it, but I can do that on the current version.

@the-glu If you set log_level_logfile: trace on the master, you will get a bunch of logging saying things like Master function call X took Y seconds. Ideally, it'd be great to get comparisons of this between 2016.11 and 2017.7 so we can see if there is any noticeable difference on the master side.
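
For reference, a minimal sketch of what that could look like on the master (log_file shown with its default value; restart the master afterwards so the setting takes effect):

# /etc/salt/master
log_file: /var/log/salt/master
log_level_logfile: trace

# after re-running the state, look for the timing lines mentioned above
grep 'Master function call' /var/log/salt/master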

@terminalmage In fact I'm running into #42989: installing pygit2 fixed the issue on my side (probably, since I don't have a comparison between 2016.11 and pygit2 ;))
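
For others hitting the same symptom, a hedged sketch of the relevant master configuration once pygit2 is installed; the remote URL is only a placeholder:

# /etc/salt/master
fileserver_backend:
  - gitfs
gitfs_provider: pygit2   # requires the pygit2 package on the master
gitfs_remotes:
  - https://example.com/states.git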

Since @terminalmage is unable to reproduce this bug, I am removing the Blocker label from this issue. Once we have a better understanding of the issue and can reproduce it reliably, @terminalmage can take another look.

I am closing this, as I was able to upgrade to 2017.7.x in the meantime without any slowdown issues.
