Whatever backing scheduler is used for the parallelization feature appears to work only if the state has no requisites. If it does have one, for example a dependency on a pkg.installed, parallelization doesn't occur and the states are run sequentially.
Unfortunately, this strongly limits the usefulness of parallelization; whether it's formatting disks or dumping files into a directory, each of those states will have a requisite that must be satisfied first, which in turn results in the parallel section being serialized.
test-case.sls; this should parallelize down to ~10s of total wall-clock time, but doesn't:
barrier:
  test.nop

{%- for x in [1,2,3] %}
blah-{{x}}:
  cmd.run:
    - name: sleep 10
    - require:
      - barrier
    - parallel: true
    - require_in:
      - barrier2
{% endfor %}

barrier2:
  test.nop
The following are the results of the run. Note the start timestamps and the total runtime; warnings have been truncated from the output:
local:
----------
ID: barrier
Function: test.nop
Result: True
Comment: Success!
Started: 06:53:52.316003
Duration: 0.311 ms
Changes:
----------
ID: blah-1
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:53:52.317096
Duration: 10008.755 ms
Changes:
----------
pid:
13508
retcode:
0
stderr:
stdout:
----------
ID: blah-2
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:54:02.333901
Duration: 10012.71 ms
Changes:
----------
pid:
13510
retcode:
0
stderr:
stdout:
----------
ID: blah-3
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:54:12.353356
Duration: 10011.777 ms
Changes:
----------
pid:
13512
retcode:
0
stderr:
stdout:
----------
ID: barrier2
Function: test.nop
Result: True
Comment: Success!
Started: 06:54:22.371214
Duration: 0.502 ms
Changes:
Summary for local
------------
Succeeded: 5 (changed=3)
Failed: 0
------------
Total states run: 5
Total run time: 30.034 s
real 0m44.792s
user 0m3.196s
sys 0m0.657s
Contrast that with this test case. Note that there is no leading barrier, i.e. the parallelized states don't have any requisites of their own:
{%- for x in [1,2,3] %}
blah-{{x}}:
  cmd.run:
    - name: sleep 10
    - require_in:
      - barrier2
    - parallel: true
{% endfor %}

barrier2:
  test.nop
The output is below; it's 20s faster since the three sleeps now run in parallel.
local:
----------
ID: blah-1
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:56:31.062373
Duration: 10009.84 ms
Changes:
----------
pid:
13680
retcode:
0
stderr:
stdout:
----------
ID: blah-2
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:56:31.066379
Duration: 10009.478 ms
Changes:
----------
pid:
13681
retcode:
0
stderr:
stdout:
----------
ID: blah-3
Function: cmd.run
Name: sleep 10
Result: True
Comment: Command "sleep 10" run
Started: 06:56:31.068708
Duration: 10009.141 ms
Changes:
----------
pid:
13682
retcode:
0
stderr:
stdout:
----------
ID: barrier2
Function: test.nop
Result: True
Comment: Success!
Started: 06:56:41.084674
Duration: 0.537 ms
Changes:
Summary for local
------------
Succeeded: 4 (changed=3)
Failed: 0
------------
Total states run: 4
Total run time: 30.029 s
real 0m24.015s
user 0m1.909s
sys 0m0.612s
I've verified this behaviour with minion versions 2017.7.2 through 2018.3.2, and master version 2017.7.2. I've validated it both via master-side salt calls and via salt-call on the minion itself.
@thatch45 can you comment on this?
Thanks,
Daniel
At first I looked at this and said "yes, that sounds right", but then I tested it and it does not behave the way I think it should. The require_ins should make barrier2 wait for all of the parallel states to run, but it should not block until it gets evaluated. I thought adding an order: last to the barrier2 state would fix it, but that does not work; what works is getting rid of the shared require of barrier.
So in a nutshell, this does look odd: the barrier requisite should be resolved first and then the three sleeps should run at the same time.
So yes, this is a bug; I need to figure out what is causing it.
Ok, I took a look here and can see what is going on: the joys of the state system. The issue you have is ordering; you have set this up so that it will fully resolve barrier before it evaluates the blah-{1,2,3} states. If you change the evaluation order then it runs in 10 seconds and resolves all requisites properly:
barrier:
  test.nop:
    - order: -1

{%- for x in [1,2,3] %}
blah-{{x}}:
  cmd.run:
    - name: sleep 10
    # - require:
    #   - barrier
    - parallel: true
    - require_in:
      - barrier2
{% endfor %}

barrier2:
  test.nop
Salt reads in the definition as it is presented so that it runs the same way every time, so all you need to do is make sure that the requisite is evaluated as a result of the requisite call rather than through the natural progression.
Err... this isn't expected behaviour, and your proposed ordering fix missed my point: you removed the barrier, which needs to be executed prior to the blah-* steps; in the example I gave, barrier must run before the blah-* states. Your proposed order fix literally drops the requisite; what if barrier actually wasn't a nop and had failed?
What I posted there was an isolated test case for reproduction; what follows is a real-world snippet showing why your suggested fix isn't really viable.
cryptsetup:
  pkg.installed

{{ luks.keyfile }}:
  file.managed:
    - contents: some-secret

{% for dev in salt.pillar.get('disks:luks:devices', []) %}
{% set cryptodev = 'luks-{}'.format(salt.file.basename(dev)) %}
format-luks-{{ dev }}:
  cmd.run:
    - name: cryptsetup -q luksFormat {{ dev }} {{ luks.keyfile }}
    - unless:
      - cryptsetup isLuks {{ dev }} || findmnt {{ dev }}
    - parallel: {{ salt.pkg.version_cmp(grains['saltversion'], '2017.7.3') >= 0 }}
    - require:
      - sls: disks.preformat
      - pkg: cryptsetup
      - {{ luks.keyfile }}

{%- set discard = '--allow-discards' if salt.pillar.get('disks:luks:discard') else '' %}
open-luks-{{ dev }}:
  cmd.run:
    - name: cryptsetup open --type luks --key-file {{ luks.keyfile }} {{ discard }} {{ dev }} {{ cryptodev }}
    - onlyif:
      - cryptsetup isLuks {{ dev }}
    - unless:
      - test -b /dev/mapper/{{ cryptodev }}
    - parallel: {{ salt.pkg.version_cmp(grains['saltversion'], '2017.7.3') >= 0 }}
    - require:
      - pkg: cryptsetup
      - cmd: format-luks-{{ dev }}
      - {{ luks.keyfile }}
{% endfor %}

mdadm:
  pkg.installed

{% for name, options in salt.pillar.get('disks:raid', {}).iteritems() if options %}
{% set devices = options['devices'] %}
{{ name }}-assemble:
  raid.present:
    - name: {{ name }}
    - parallel: {{ salt.pkg.version_cmp(grains['saltversion'], '2017.7.3') >= 0 }}
    - level: {{ options['level'] }}
    - devices: {{ devices }}
    - run: true
    - failhard: true
    - require:
      - sls: disks.preformat
      - pkg: mdadm
{%- for device in devices %}
{%- if device.startswith('/dev/mapper/luks-sd') %}
      - cmd: open-luks-/dev/{{ device['/dev/mapper/luks-'|length:] }}
{%- endif %}
{%- endfor %}
{%- endfor %}

fspackages:
  pkg.installed

{% for mountpoint, options in salt.pillar.get('disks:format', {}).iteritems() %}
format-{{ options['device'] }}:
  blockdev.formatted:
    - name: {{ options['device'] }}
    - fs_type: {{ options['fstype'] }}
    - parallel: true
    - unless: findmnt {{ mountpoint }}
    - require:
      - pkg: fspackages
{%- if salt.pillar.get('disks:raid:{}'.format(options['device'])) %}
      - raid: {{ options['device'] }}
{%- endif %}
{%- if options['device'].startswith('/dev/mapper/luks-sd') %}
      - cmd: open-luks-/dev/{{ options['device']['/dev/mapper/luks-'|length:] }}
{%- endif %}
{%- endfor %}
Now, for the example above I've stripped out code, so there may be a couple of screwups in it, but the basic thrust is there: parallelize LUKS formatting (since it takes a while), parallelize LUKS opening (since it takes a while), and parallelize mkfs (because we can).
Without getting into specifics on the disk count, if parallelization worked properly here we would be reclaiming multiple minutes of execution time for our setup.
It's not viable for us to just drop requirements; each step requires the previous one.
Coming from Puppet, this sort of parallel execution is something it has natively, via continually walking its internal state graph and picking off the things that can be executed (while constraining the parallelism to whatever limit you set). In tracing the Salt code, I grok that the parallel feature is new and not fully leveraged, since it's more of an opportunistic thing in call_chunk rather than a refactoring that has the sort of dependency-graph walking I'm discussing; a toy sketch of what I mean follows.
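To make the graph-walking idea concrete, here is a toy Python sketch of the scheduling model I'm describing. It is purely illustrative, not Salt code: the state table and worker cap are made up for this example, and the "work" is just the sleeps from the test case above.

import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Hypothetical dependency graph mirroring the test case above:
# name -> (set of requisites, work function)
STATES = {
    "barrier":  (set(),                          lambda: None),
    "blah-1":   ({"barrier"},                    lambda: time.sleep(10)),
    "blah-2":   ({"barrier"},                    lambda: time.sleep(10)),
    "blah-3":   ({"barrier"},                    lambda: time.sleep(10)),
    "barrier2": ({"blah-1", "blah-2", "blah-3"}, lambda: None),
}

def run_graph(states, max_workers=4):
    done, in_flight = set(), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(done) < len(states):
            # Launch every state whose requisites have all finished.
            for name, (reqs, func) in states.items():
                if name not in done and name not in in_flight and reqs <= done:
                    in_flight[name] = pool.submit(func)
            # Wait for at least one in-flight state to finish, then mark it done.
            finished, _ = wait(in_flight.values(), return_when=FIRST_COMPLETED)
            for name, future in list(in_flight.items()):
                if future in finished:
                    future.result()
                    done.add(name)
                    del in_flight[name]

if __name__ == "__main__":
    start = time.time()
    run_graph(STATES)
    print("total: %.1fs" % (time.time() - start))  # ~10s; the three sleeps overlap

With this kind of loop the barrier requisite still runs first and the three sleeps overlap, which is the behaviour I'd expect parallel: true to give.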
For our needs at least, it's not viable to start layering explicit ordering through our Salt code; the snippet I posted above is just the steps to get a disk mounted, and there is a ton of varied code that exists above it. More specifically, I don't think it's viable to expect folks to manage 'order' like this across potentially large repos.
Pardon the long message; just trying to convey the details of why I think you're missing what I'm after, and why what you're suggesting isn't the direction I'm hoping for.
If I missed something, pardon, and feel free to correct me (despite my strong tone above).
I think I see where you are going here, and yes, I see that the ordering is not enough. Let me spend some time looking at this.
@thatch45 Did you get anywhere with this? The original ticket #39010 appears to suggest a solution where it waits for the requires and can then execute things in parallel. I have 20 simple states that require a package to be installed, and they all run one after the other (each cmd.run takes 90 seconds due to external systems) instead of all running at the same time.
Any updates?..
Having looked through the history of this issue, it seems to me that @gtmanfred "shelved" this issue somewhat prematurely based on @thatch45 's initial assessment. After a clarifying https://github.com/saltstack/salt/issues/49273#issuecomment-415600787 by @ferringb even @thatch45 then concluded in https://github.com/saltstack/salt/issues/49273#issuecomment-419213994 that his initial assessment did not address the issue properly. Unfortunately, @thatch45 never came back and the issue remains in "Blocked" state, hindering any progress.
I strongly urge the @SaltStack team to reconsider the status of this issue.
It likely manifests itself also in #55121 and appears to be a long-standing problem which severely detracts from the usefulness of Salt for orchestration.
Any update on this issue? I am facing the same problem: a state that is supposed to run in parallel runs sequentially when it has requisites. After removing the requisites, it runs in parallel.
I think I tracked it down to this line: https://github.com/saltstack/salt/blob/c157cb752a6843e58826588110bcd3c67ef8bc86/salt/state.py#L1927, because that is the only place where https://github.com/saltstack/salt/blob/c157cb752a6843e58826588110bcd3c67ef8bc86/salt/state.py#L1784 is actually called.
It seems to me that something sets low['__prereq__'] = True somewhere when requisites are defined, because I haven't found any place that would set low['parallel'] = False.
Unfortunately, I'm not entirely certain what __prereq__ actually is; it seems to be some internal flag, perhaps not just a direct translation of the prereq requisite.
Some deeper knowledge of how this flag is used in the requisite system is probably necessary, but I haven't found it described in the docs anywhere. Gaining an understanding from the source code is likely possible, but it will take quite some time, as the requisite system is quite complex. Perhaps some core Salt team member could help?
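To make the gate I'm guessing at concrete, here is a rough, hypothetical paraphrase of the dispatch I mean. It is not the actual salt/state.py code; the helper names and lambdas are made up for illustration.

# Hypothetical paraphrase of the suspected dispatch, not Salt's actual source:
# the parallel branch is only taken when 'parallel' is set on the low chunk AND
# the chunk is not flagged as a prereq, so anything that flips __prereq__ to True
# would silently force the sequential branch.
def dispatch(low, run_parallel, run_serial):
    if low.get("parallel") and not low.get("__prereq__"):
        return run_parallel(low)  # roughly where call_parallel would be invoked
    return run_serial(low)        # ordinary blocking execution

print(dispatch({"parallel": True}, lambda l: "parallel", lambda l: "serial"))                      # -> parallel
print(dispatch({"parallel": True, "__prereq__": True}, lambda l: "parallel", lambda l: "serial"))  # -> serial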
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
not stale
Thank you for updating this issue. It is no longer marked as stale.
@Oloremo I have also added the Pending Discussion label and will get it on Tom's agenda as soon as I can.
Yes, this needs to be fixed; if I can't get to it, @garethgreenaway or @dwoz should be able to resolve it.
@garethgreenaway Any updates on this?
+1. Fixing this would allow for massive speed increases in many situations. Might I also suggest adding a tunable that allows for some kind of maximum concurrency limit? I have a routine that goes through, calls the keystone-ng module, creates projects for users, and then assigns them roles. The project must be created before the role is assigned for each user, but there is no reason those two serial steps can't be executed in parallel across each of the 1000 users we have (so, 1000 separate threads that each create a project and assign a role, in that order). That said, running 1000 threads simultaneously that make REST API calls is probably a poor decision, so the ability to cap the number of parallel states that can run at any one time (a global minion config option?) would be very useful; a sketch of the idea follows.
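To illustrate the cap I mean, here is a plain Python sketch, not an existing Salt option; the keystone functions are stand-ins I've invented, and the 20-worker limit is just an example value.

import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 20  # the proposed tunable: cap on simultaneously running parallel work

def create_project(user):
    time.sleep(0.1)   # stand-in for the real keystone-ng REST call

def assign_role(user):
    time.sleep(0.1)   # stand-in for the real keystone-ng REST call

def provision(user):
    # The two steps stay ordered per user; only the users run concurrently.
    create_project(user)
    assign_role(user)

users = ["user-%d" % i for i in range(1000)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    # At most MAX_CONCURRENCY users are provisioned at any one time.
    list(pool.map(provision, users))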
Apologies for the delay on this one; I had a chance to look into it. I believe I've traced it down to the fact that when there is a requisite, we call call_chunk from within call_chunk on the requisite, here: https://github.com/saltstack/salt/blob/master/salt/state.py#L2954, and then call it again on the original chunk once the requisite has been met, here: https://github.com/saltstack/salt/blob/master/salt/state.py#L2965. That second call is the one that ends up calling call_parallel, and I think that might be the issue. Still digging in to see what the fix would be.
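Roughly, the flow I'm describing looks like the following loose paraphrase; it is illustrative only, not the actual call_chunk code, and the chunk layout is heavily simplified.

# Loose, hypothetical paraphrase of the recursion described above, not Salt's source.
def call_chunk(low, running, chunks):
    # Find requisites of this chunk that have not run yet.
    unmet = [c for c in chunks
             if c["id"] in low.get("require", []) and c["id"] not in running]
    if unmet:
        for req in unmet:
            call_chunk(req, running, chunks)     # first: recurse into the requisite
        return call_chunk(low, running, chunks)  # then: re-enter for the original chunk;
                                                 # this second call is the one that would
                                                 # hand off to call_parallel
    running[low["id"]] = "ran %s" % low["id"]    # stand-in for actually executing the state
    return running

chunks = [{"id": "barrier"}, {"id": "blah-1", "require": ["barrier"]}]
print(call_chunk(chunks[1], {}, chunks))         # {'barrier': 'ran barrier', 'blah-1': 'ran blah-1'}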
Had a realization this morning when looking at this again with fresh eyes. The issue turned out to be that when we run check_requisite, it ends up running reconcile_procs on all the running states. Using the example states above, the barrier would run first, then the first state in the loop would run and call check_requisite to check the barrier requisite. Because both barrier and the first sleep state are now in the list of running states, the while loop in reconcile_procs would continue until the first sleep state is finished. We then see similar behavior for the second and third sleep states.
The PR filters out any states that aren't in the requisites and only runs reconcile_procs on those states.
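As a rough, hypothetical paraphrase of the change (not the PR code itself; the process handling here is just a stand-in for what reconcile_procs really does), the idea is to filter the running states down to the requisites before blocking on them:

import multiprocessing
import time

def _sleep(seconds):
    time.sleep(seconds)

def reconcile_procs(procs):
    # Stand-in for Salt's reconcile_procs: block until the given states finish.
    for proc in procs.values():
        proc.join()

def wait_for_requisites(low, running):
    required = set(low.get("require", []))
    # The fix: only reconcile the running states this chunk actually requires,
    # instead of every running state (which is what serialized the sleeps).
    relevant = {tag: proc for tag, proc in running.items() if tag in required}
    reconcile_procs(relevant)

if __name__ == "__main__":
    # blah-1 is already running in parallel; the next chunk only requires barrier,
    # so checking its requisite should no longer block on blah-1.
    running = {"blah-1": multiprocessing.Process(target=_sleep, args=(10,))}
    running["blah-1"].start()
    wait_for_requisites({"require": ["barrier"]}, running)  # returns immediately
    running["blah-1"].join()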