Whatever backing scheduler is used for the parallelization feature appears to work only if the state has no requisites. If it does have one, for example a dependency on a pkg.installed, parallelization doesn't occur and the states are run sequentially.
Unfortunately, this strongly limits the usefulness of parallelization; whether it's formatting disks or dumping files into a directory, each of those states will have a requisite that must be satisfied first, which in turn results in the parallel section being serialized.
test-case.sls; this should parallelize down to ~10s of total wall-clock time, but doesn't:
barrier:
  test.nop

{%- for x in [1,2,3] %}
blah-{{x}}:
  cmd.run:
    - name: sleep 10
    - require:
      - barrier
    - parallel: true
    - require_in:
      - barrier2
{% endfor %}

barrier2:
  test.nop
The following are the results of the run. Note the start timestamps and the total runtime; warnings have been truncated from the output:
local:
----------
ID: barrier
Function: test.nop
Result: True
Comment: Success!
Started: 06:53:52.316003
Duration: 0.311 ms
Changes:
----------
ID: blah-1
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:53:52.317096
Duration: 10008.755 ms
Changes:
----------
pid:
13508
retcode:
0
stderr:
stdout:
----------
ID: blah-2
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:54:02.333901
Duration: 10012.71 ms
Changes:
----------
pid:
13510
retcode:
0
stderr:
stdout:
----------
ID: blah-3
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:54:12.353356
Duration: 10011.777 ms
Changes:
----------
pid:
13512
retcode:
0
stderr:
stdout:
----------
ID: barrier2
Function: test.nop
Result: True
Comment: Success!
Started: 06:54:22.371214
Duration: 0.502 ms
Changes:
Summary for local
------------
Succeeded: 5 (changed=3)
Failed: 0
------------
Total states run: 5
Total run time: 30.034 s
real 0m44.792s
user 0m3.196s
sys 0m0.657s
Contrast that with this test case. Note that there is no leading barrier, i.e. the parallelized states don't have any requisites of their own:
{%- for x in [1,2,3] %}
blah-{{x}}:
  cmd.run:
    - name: sleep 10
    - require_in:
      - barrier2
    - parallel: true
{% endfor %}

barrier2:
  test.nop
The output is below; it's 20s faster since the three sleeps now run in parallel.
local:
----------
ID: blah-1
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:56:31.062373
Duration: 10009.84 ms
Changes:
----------
pid:
13680
retcode:
0
stderr:
stdout:
----------
ID: blah-2
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:56:31.066379
Duration: 10009.478 ms
Changes:
----------
pid:
13681
retcode:
0
stderr:
stdout:
----------
ID: blah-3
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:56:31.068708
Duration: 10009.141 ms
Changes:
----------
pid:
13682
retcode:
0
stderr:
stdout:
----------
ID: barrier2
Function: test.nop
Result: True
Comment: Success!
Started: 06:56:41.084674
Duration: 0.537 ms
Changes:
Summary for local
------------
Succeeded: 4 (changed=3)
Failed: 0
------------
Total states run: 4
Total run time: 30.029 s
real 0m24.015s
user 0m1.909s
sys 0m0.612s
I've verified this behaviour with minion versions 2017.7.2 through 2018.3.2, and master version 2017.7.2. I've validated it both via master-side salt calls and via salt-call on the minion itself.
@thatch45 can you comment on this?
Thanks,
Daniel
At first I looked at this and said "yes, that sounds right", but then I tested it and it does not behave the way I think it should. The require_ins should make barrier2 wait for all of the parallel states to run, but it should not block until it gets evaluated. I thought adding an order: last to the barrier2 state would fix it, but that does not work; what works is getting rid of the shared require of barrier.
So in a nutshell, this does look odd: the barrier requisite should be resolved first and then the three sleeps should run at the same time.
So yes, this is a bug; I need to figure out what is causing it.
Ok, I took a look here and can see what is going on: the joys of the state system. The issue you have is ordering; you have set this up so that it will fully resolve barrier before it evaluates the blah-{1,2,3} states. If you change the evaluation order then it runs in 10 seconds and resolves all requisites properly:
barrier:
  test.nop:
    - order: -1

{%- for x in [1,2,3] %}
blah-{{x}}:
  cmd.run:
    - name: sleep 10
    # - require:
    #   - barrier
    - parallel: true
    - require_in:
      - barrier2
{% endfor %}

barrier2:
  test.nop
Salt reads in the definition as it is presented so that it runs the same way every time, so all you need to do is make sure that the requisite is evaluated as a result of the requisite call rather than through the natural progression.
Err... this isn't expected behaviour, and your proposed ordering fix missed my point: you removed the barrier, which needs to be executed prior to the blah-* steps; in the example I gave, barrier must run before the blah-* states. Your proposed order fix literally drops the requisite; what if barrier actually wasn't a nop and had failed?
What I posted there was an isolated test case for reproduction; what follows is a real-world snippet showing why your suggested fix isn't really viable.
cryptsetup:
  pkg.installed

{{ luks.keyfile }}:
  file.managed:
    - contents: some-secret

{% for dev in salt.pillar.get('disks:luks:devices', []) %}
{% set cryptodev = 'luks-{}'.format(salt.file.basename(dev)) %}
format-luks-{{ dev }}:
  cmd.run:
    - name: cryptsetup -q luksFormat {{ dev }} {{ luks.keyfile }}
    - unless:
      - cryptsetup isLuks {{ dev }} || findmnt {{ dev }}
    - parallel: {{ salt.pkg.version_cmp(grains['saltversion'], '2017.7.3') >= 0 }}
    - require:
      - sls: disks.preformat
      - pkg: cryptsetup
      - {{ luks.keyfile }}

{%- set discard = '--allow-discards' if salt.pillar.get('disks:luks:discard') else '' %}
open-luks-{{ dev }}:
  cmd.run:
    - name: cryptsetup open --type luks --key-file {{ luks.keyfile }} {{ discard }} {{ dev }} {{ cryptodev }}
    - onlyif:
      - cryptsetup isLuks {{ dev }}
    - unless:
      - test -b /dev/mapper/{{ cryptodev }}
    - parallel: {{ salt.pkg.version_cmp(grains['saltversion'], '2017.7.3') >= 0 }}
    - require:
      - pkg: cryptsetup
      - cmd: format-luks-{{ dev }}
      - {{ luks.keyfile }}
{% endfor %}

mdadm:
  pkg.installed

{% for name, options in salt.pillar.get('disks:raid', {}).iteritems() if options %}
{% set devices = options['devices'] %}
{{ name }}-assemble:
  raid.present:
    - name: {{ name }}
    - parallel: {{ salt.pkg.version_cmp(grains['saltversion'], '2017.7.3') >= 0 }}
    - level: {{ options['level'] }}
    - devices: {{ devices }}
    - run: true
    - failhard: true
    - require:
      - sls: disks.preformat
      - pkg: mdadm
{%- for device in devices %}
{%- if device.startswith('/dev/mapper/luks-sd') %}
      - cmd: open-luks-/dev/{{ device['/dev/mapper/luks-'|length:] }}
{%- endif %}
{%- endfor %}
{%- endfor %}

fspackages:
  pkg.installed

{% for mountpoint, options in salt.pillar.get('disks:format', {}).iteritems() %}
format-{{ options['device'] }}:
  blockdev.formatted:
    - name: {{ options['device'] }}
    - fs_type: {{ options['fstype'] }}
    - parallel: true
    - unless: findmnt {{ mountpoint }}
    - require:
      - pkg: fspackages
{%- if salt.pillar.get('disks:raid:{}'.format(options['device'])) %}
      - raid: {{ options['device'] }}
{%- endif %}
{%- if options['device'].startswith('/dev/mapper/luks-sd') %}
      - cmd: open-luks-/dev/{{ options['device']['/dev/mapper/luks-'|length:] }}
{%- endif %}
{%- endfor %}
Now, for the example above I've stripped out code, so there may be a couple of screwups in it, but the basic thrust is there: parallelize LUKS formatting (since it takes a while), parallelize LUKS opening (since it takes a while), and parallelize mkfs (because we can).
Without getting into specifics on the disk count, if parallelization worked properly here we would be reclaiming multiple minutes of execution time for our setup.
It's not viable for us to just drop requirements; each step requires the previous one.
Coming from Puppet, this sort of parallel execution is something it has natively, via continually walking its internal state graph and picking off the things that can be executed (while constraining the parallelism to whatever limit you set). In tracing the Salt code, I grok that the parallel feature is new and not fully leveraged, since it's more of an opportunistic thing in call_chunk rather than a refactoring that has the sort of dependency-graph walking I'm discussing; a toy sketch of what I mean follows.
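To make the graph-walking idea concrete, here is a toy Python sketch of the scheduling model I'm describing. It is purely illustrative, not Salt code: the state table and worker cap are made up for this example, and the "work" is just the sleeps from the test case above.

import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Hypothetical dependency graph mirroring the test case above:
# name -> (set of requisites, work function)
STATES = {
    "barrier":  (set(),                          lambda: None),
    "blah-1":   ({"barrier"},                    lambda: time.sleep(10)),
    "blah-2":   ({"barrier"},                    lambda: time.sleep(10)),
    "blah-3":   ({"barrier"},                    lambda: time.sleep(10)),
    "barrier2": ({"blah-1", "blah-2", "blah-3"}, lambda: None),
}

def run_graph(states, max_workers=4):
    done, in_flight = set(), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(done) < len(states):
            # Launch every state whose requisites have all finished.
            for name, (reqs, func) in states.items():
                if name not in done and name not in in_flight and reqs <= done:
                    in_flight[name] = pool.submit(func)
            # Wait for at least one in-flight state to finish, then mark it done.
            finished, _ = wait(in_flight.values(), return_when=FIRST_COMPLETED)
            for name, future in list(in_flight.items()):
                if future in finished:
                    future.result()
                    done.add(name)
                    del in_flight[name]

if __name__ == "__main__":
    start = time.time()
    run_graph(STATES)
    print("total: %.1fs" % (time.time() - start))  # ~10s; the three sleeps overlap

With this kind of loop the barrier requisite still runs first and the three sleeps overlap, which is the behaviour I'd expect parallel: true to give.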
For our needs at least, it's not viable to start layering explicit ordering through our Salt code; the snippet I posted above is just the steps to get a disk mounted, and there is a ton of varied code that exists above it. More specifically, I don't think it's viable to expect folks to manage 'order' like this across potentially large repos.
Pardon the long message; just trying to convey the details of why I think you're missing what I'm after, and why what you're suggesting isn't the direction I'm hoping for.
If I missed something, pardon, and feel free to correct me (despite my strong tone above).
I think I see where you are going here, and yes, I see that the ordering is not enough. Let me spend some time looking at this.
@thatch45 Did you get anywhere with this? The original ticket #39010 appears to suggest a solution where it waits for the requires and can then execute things in parallel. I have 20 simple states that require a package to be installed, and they all run one after the other (each cmd.run takes 90 seconds due to external systems) instead of all running at the same time.
Any updates?..
Having looked through the history of this issue, it seems to me that @gtmanfred "shelved" this issue somewhat prematurely based on @thatch45 's initial assessment. After a clarifying https://github.com/saltstack/salt/issues/49273#issuecomment-415600787 by @ferringb even @thatch45 then concluded in https://github.com/saltstack/salt/issues/49273#issuecomment-419213994 that his initial assessment did not address the issue properly. Unfortunately, @thatch45 never came back and the issue remains in "Blocked" state, hindering any progress.
I strongly urge the @SaltStack team to reconsider the status of this issue.
It likely manifests itself also in #55121 and appears to be a long-standing problem which severely detracts from the usefulness of Salt for orchestration.
Any update on this issue? I am facing the same problem: a state that is supposed to run in parallel runs sequentially when it has requisites. After removing the requisites, it runs in parallel.
I think I tracked it down to this line: https://github.com/saltstack/salt/blob/c157cb752a6843e58826588110bcd3c67ef8bc86/salt/state.py#L1927, because that is the only place where https://github.com/saltstack/salt/blob/c157cb752a6843e58826588110bcd3c67ef8bc86/salt/state.py#L1784 is actually called.
It seems to me that something sets low['__prereq__'] = True somewhere when requisites are defined, because I haven't found any place that would set low['parallel'] = False.
Unfortunately, I'm not entirely certain what __prereq__ actually is; it seems to be some internal flag, perhaps not just a direct translation of the prereq requisite.
Some deeper knowledge of how this flag is used in the requisite system is probably necessary, but I haven't found it described in the docs anywhere. Gaining an understanding from the source code is likely possible, but it will take quite some time, as the requisite system is quite complex. Perhaps some core Salt team member could help?
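To make the gate I'm guessing at concrete, here is a rough, hypothetical paraphrase of the dispatch I mean. It is not the actual salt/state.py code; the helper names and lambdas are made up for illustration.

# Hypothetical paraphrase of the suspected dispatch, not Salt's actual source:
# the parallel branch is only taken when 'parallel' is set on the low chunk AND
# the chunk is not flagged as a prereq, so anything that flips __prereq__ to True
# would silently force the sequential branch.
def dispatch(low, run_parallel, run_serial):
    if low.get("parallel") and not low.get("__prereq__"):
        return run_parallel(low)  # roughly where call_parallel would be invoked
    return run_serial(low)        # ordinary blocking execution

print(dispatch({"parallel": True}, lambda l: "parallel", lambda l: "serial"))                      # -> parallel
print(dispatch({"parallel": True, "__prereq__": True}, lambda l: "parallel", lambda l: "serial"))  # -> serial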
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
not stale
Thank you for updating this issue. It is no longer marked as stale.
@Oloremo I have also added the Pending Discussion label and will get it on Tom's agenda as soon as I can.
Yes, this needs to be fixed; if I can't get to it, @garethgreenaway or @dwoz should be able to resolve it.
@garethgreenaway Any updates on this?
+1. Fixing this would allow for massive speed increases in many situations. Might I also suggest adding a tunable that allows for some kind of maximum concurrency limit? I have a routine that goes through, calls the keystone-ng module, creates projects for users, and then assigns them roles. The project must be created before the role is assigned for each user, but there is no reason those two serial steps can't be executed in parallel across each of the 1000 users we have (so, 1000 separate threads that each create a project and assign a role, in that order). That said, running 1000 threads simultaneously that make REST API calls is probably a poor decision, so the ability to cap the number of parallel states that can run at any one time (a global minion config option?) would be very useful; a sketch of the idea follows.
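To illustrate the cap I mean, here is a plain Python sketch, not an existing Salt option; the keystone functions are stand-ins I've invented, and the 20-worker limit is just an example value.

import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 20  # the proposed tunable: cap on simultaneously running parallel work

def create_project(user):
    time.sleep(0.1)   # stand-in for the real keystone-ng REST call

def assign_role(user):
    time.sleep(0.1)   # stand-in for the real keystone-ng REST call

def provision(user):
    # The two steps stay ordered per user; only the users run concurrently.
    create_project(user)
    assign_role(user)

users = ["user-%d" % i for i in range(1000)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    # At most MAX_CONCURRENCY users are provisioned at any one time.
    list(pool.map(provision, users))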
Apologies for the delay on this one; I had a chance to look into it. I believe I've traced it down to the fact that when there is a requisite, we call call_chunk from within call_chunk on the requisite, here: https://github.com/saltstack/salt/blob/master/salt/state.py#L2954, and then call it again on the original chunk once the requisite has been met, here: https://github.com/saltstack/salt/blob/master/salt/state.py#L2965. That second call is the one that ends up calling call_parallel, and I think that might be the issue. Still digging in to see what the fix would be.
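Roughly, the flow I'm describing looks like the following loose paraphrase; it is illustrative only, not the actual call_chunk code, and the chunk layout is heavily simplified.

# Loose, hypothetical paraphrase of the recursion described above, not Salt's source.
def call_chunk(low, running, chunks):
    # Find requisites of this chunk that have not run yet.
    unmet = [c for c in chunks
             if c["id"] in low.get("require", []) and c["id"] not in running]
    if unmet:
        for req in unmet:
            call_chunk(req, running, chunks)     # first: recurse into the requisite
        return call_chunk(low, running, chunks)  # then: re-enter for the original chunk;
                                                 # this second call is the one that would
                                                 # hand off to call_parallel
    running[low["id"]] = "ran %s" % low["id"]    # stand-in for actually executing the state
    return running

chunks = [{"id": "barrier"}, {"id": "blah-1", "require": ["barrier"]}]
print(call_chunk(chunks[1], {}, chunks))         # {'barrier': 'ran barrier', 'blah-1': 'ran blah-1'}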
Had a realization this morning when looking at this again with fresh eyes. The issue turned out to be that when we run check_requisite, it ends up running reconcile_procs on all the running states. Using the example states above, the barrier would run first, then the first state in the loop would run and call check_requisite to check the barrier requisite. Because both barrier and the first sleep state are now in the list of running states, the while loop in reconcile_procs would continue until the first sleep state is finished. We then see similar behavior for the second and third sleep states.
The PR filters out any states that aren't in the requisites and only runs reconcile_procs on those states.
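As a rough, hypothetical paraphrase of the change (not the PR code itself; the process handling here is just a stand-in for what reconcile_procs really does), the idea is to filter the running states down to the requisites before blocking on them:

import multiprocessing
import time

def _sleep(seconds):
    time.sleep(seconds)

def reconcile_procs(procs):
    # Stand-in for Salt's reconcile_procs: block until the given states finish.
    for proc in procs.values():
        proc.join()

def wait_for_requisites(low, running):
    required = set(low.get("require", []))
    # The fix: only reconcile the running states this chunk actually requires,
    # instead of every running state (which is what serialized the sleeps).
    relevant = {tag: proc for tag, proc in running.items() if tag in required}
    reconcile_procs(relevant)

if __name__ == "__main__":
    # blah-1 is already running in parallel; the next chunk only requires barrier,
    # so checking its requisite should no longer block on blah-1.
    running = {"blah-1": multiprocessing.Process(target=_sleep, args=(10,))}
    running["blah-1"].start()
    wait_for_requisites({"require": ["barrier"]}, running)  # returns immediately
    running["blah-1"].join()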