It looks like there is some kind of race condition in the reactor system when passing pillar data to orchestration.
The setup I used to test this is based on salt-api webhooks: data is sent from the webhook into the reactor, which carries it into an orchestration that runs a saltify cloud.profile.
The reactor SLS files all rendered 100% correctly, but the orchestrations were all over the place: some repeated pillar information from other runs, and in one case no pillar information was rendered at all.
The orchestration that rendered without any pillar data came from an attempt to mitigate the issue by sleeping 1 second between sending the webhooks.
config files
/etc/salt/cloud.providers.d/saltify.conf
salty:
  driver: saltify
/etc/salt/cloud.profiles.d/saltify_host.conf
saltify_host:
  provider: salty
  force_minion_config: true
/etc/salt/master.d/*
rest_cherrypy:
  port: 8000
  webhook_disable_auth: True
  disable_ssl: True
log_level_logfile: all
reactor:
  - 'salt/netapi/hook/deploy_minion':
    - /srv/salt/reactor/deploy_minion.sls
/srv/salt/reactor/deploy_minion.sls
# {{data}}
{% set post_data = data.get('post', {}) %}
#{{post_data}}
{% set host = post_data.get('host', '') %}
#{{host}}
{#
Initiate the Orchestration State
#}
deploy_minion_runner:
  runner.state.orch:
    - mods: orch.minion.deploy
    - pillar:
        params:
          host: {{ host }}
/srv/salt/orch/minion/deploy.sls
{% set errors = [] %}
{% set params = salt['pillar.get']('params', {}) %}
{% if 'host' not in params %}
{% do errors.append('Please provide a host') %}
{% set host = '' %}
{% else %}
{% set host = params.get('host') %}
{% endif %}
{% if errors|length > 0 %}
preflight_check:
  test.configurable_test_state:
    - name: Additional information required
    - changes: False
    - result: False
    - comment: '{{ errors|join(", ") }}'
{% else %}
preflight_check:
  test.succeed_without_changes:
    - name: 'All parameters accepted'
{% endif %}
provision_minion:
  salt.runner:
    - name: cloud.profile
    - prof: saltify_host
    - instances:
      - {{ host }}
    - require:
      - preflight_check
    - vm_overrides:
        ssh_host: {{ host }}
        ssh_username: centos
        ssh_keyfile: /etc/salt/id_rsa
        minion:
          master:
            - 172.16.22.60
            - 172.16.22.115

sync_all:
  salt.function:
    - tgt: {{ host }}
    - name: saltutil.sync_all

#execute_highstate:
#  salt.state:
#    - tgt: {{ host }}
#    - highstate: True
Command run
for i in {1..8}; do curl localhost:8000/hook/deploy_minion -d host=test-${i}; done
As can be seen from the log, deploy_minion.sls rendered in this order:
test-1
test-2
test-3
test-4
test-5
test-6
test-7
test-8
But deploy.sls rendered in this order:
test-8
test-2
test-3
test-1
test-7
test-6
test-6
test-5
I have not found the cause of the issue yet.
Salt Version:
Salt: 2019.2.0
Dependency Versions:
cffi: Not Installed
cherrypy: unknown
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.5.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.5 (default, Jun 20 2019, 20:27:34)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4
System Versions:
dist: centos 7.6.1810 Core
locale: UTF-8
machine: x86_64
release: 3.10.0-862.14.4.el7.x86_64
system: Linux
version: CentOS Linux 7.6.1810 Core
This seems to happen more often if the orchestrations that the reactor runs are long-running. Moving to another setup that uses test.arg, or some other orchestration that finishes quickly, doesn't trigger the issue.
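For illustration, a minimal sketch of what such a short-lived stand-in orchestration could look like (the file name and state ID are hypothetical, and it uses test.succeed_without_changes rather than test.arg for simplicity):
# /srv/salt/orch/minion/quick.sls -- hypothetical fast-finishing stand-in for deploy.sls
{% set params = salt['pillar.get']('params', {}) %}

show_params:
  test.succeed_without_changes:
    - name: 'Received host: {{ params.get("host", "MISSING") }}'
The reactor would point - mods: at this file instead of orch.minion.deploy; the only thing it does is echo the pillar it received.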
ZD-3944
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
this is not stale
Thank you for updating this issue. It is no longer marked as stale.
@whytewolf This won't get marked stale any further; we updated the config, so this shouldn't be a problem any longer. Let me know if you see it on other tickets too, though you shouldn't.
@sagetherage Yep, I ended up getting a flood of these when the timer was adjusted, but it all looks good now.
@whytewolf Yes, we didn't communicate that very well (at all) and should have, and although we are on a better path, we still want to iterate on the config or use GH Actions. LMK if you have other ideas, love to hear it!
ZD-4993
I wanted to add that we also see this inconsistent reactor-to-orchestration behavior. The reactor sometimes fails to pass data, or will even run a server ID twice and skip others. To reproduce this, add the states below and accept 15+ minions at once.
Details of flow
1) The reactor on salt/minion/*/start calls salt://reactor_state.sls
2) reactor_state.sls uses runner.state.orchestrate to call orchestrate_state.sls
3) reactor_state.sls fails to pass event_data to orchestrate_state.sls
salt -V
Salt Version:
Salt: 2019.2.0
Dependency Versions:
cffi: 1.6.0
cherrypy: Not Installed
dateutil: 1.5
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: 0.21.1
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: Not Installed
pycparser: 2.14
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.5 (default, Sep 26 2019, 13:23:47)
python-gnupg: 2.3.1
PyYAML: 3.11
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4
System Versions:
dist: redhat 7.8 Maipo
locale: UTF-8
machine: x86_64
release: 3.10.0-1127.el7.x86_64
system: Linux
version: Red Hat Enterprise Linux Server 7.8 Maipo
reactor.conf
reactor:
reactor_state.sls
reactor_state_call:
  runner.state.orchestrate:
    - timeout: 2000
    - args:
      - mods: orchestrate_state
      - pillar:
          event_tag: {{ tag }}
          event_data: {{ data|json }}
orchestrate_state.sls
{% set tag = salt.pillar.get('event_tag',{}) %}
{% set data = salt.pillar.get('event_data',{}) %}
orchestrate_state_data:
  test.configurable_test_state:
    - name: Data and tag pillar
    - changes: False
    - result: True
    - comment: |
        event_data: {{data}}
        tag: {{tag}}

{% if 'id' not in data %}
orchestrate_state-notify-on-Data-failure:
  test.configurable_test_state:
    - name: Data or pillar Failure on Orchestration Call
    - changes: False
    - result: False
    - comment: |
        Failed to get proper pillar values for event_data in orchstandardconf orchestration
        event_data: {{data}}
{% else %}
orchestrate_state-success:
  test.configurable_test_state:
    - name: Data success on Orchestration Call
    - changes: False
    - result: True
    - comment: |
        Finished ok
        event_data: {{data}}
{% endif %}
Here is the output of a failed job and a successful job.
Failed job "20200428210139495968", from the event bus:
salt/run/20200428210508971833/ret {
"_stamp": "2020-04-28T21:05:09.741246",
"fun": "runner.jobs.list_job",
"fun_args": [
"20200428210139495968"
],
"jid": "20200428210508971833",
"return": {
"EndTime": "2020, Apr 28 21:01:50.904429",
"Error": "Cannot contact returner or no job with this jid",
"Result": {
"gcp6387testsse01.earaa6387.gcp.slb.com_master": {
"return": {
"_stamp": "2020-04-28T21:01:39.509888",
"fun": "runner.state.orchestrate",
"fun_args": [
{
"mods": "orchestrate_state",
"pillar": {
"event_data": {
"_stamp": "2020-04-28T21:01:39.212370",
"cmd": "_minion_event",
"data": "Minion gcp6387temp36 started at Tue Apr 28 21:01:39 2020",
"id": "gcp6387temp36",
"pretag": null,
"tag": "salt/minion/gcp6387temp36/start"
},
"event_tag": "salt/minion/gcp6387temp36/start"
}
}
],
"jid": "20200428210139495968",
"return": {
"data": {
"gcp6387testsse01.earaa6387.gcp.slb.com_master": {
"test_|-orchestrate_state-notify-on-Data-failure_|-Data or pillar Failure on Orchestration Call_|-configurable_test_state": {
"__id__": "orchestrate_state-notify-on-Data-failure",
"__run_num__": 1,
"__sls__": "orchestrate_state",
"changes": {},
"comment": "Failed to get proper pillar values for event_data in orchstandardconf orchestration \nevent_data: {} \n",
"duration": 2.84,
"name": "Data or pillar Failure on Orchestration Call",
"result": false,
"start_time": "21:01:50.889726"
},
"test_|-orchestrate_state_data_|-Data and tag pillar_|-configurable_test_state": {
"__id__": "orchestrate_state_data",
"__run_num__": 0,
"__sls__": "orchestrate_state",
"changes": {},
"comment": "event_data: {} \ntag: {}\n",
"duration": 3.148,
"name": "Data and tag pillar",
"result": true,
"start_time": "21:01:50.885958"
}
}
},
"outputter": "highstate",
"retcode": 1
},
"success": false,
"user": "Reactor"
}
}
},
"StartTime": "2020, Apr 28 21:01:39.495968",
"jid": "20200428210139495968"
},
"success": true,
"user": "root"
}
[root@gcp6387testsse01 ~]# salt-run jobs.list_job 20200428210139495968
EndTime:
2020, Apr 28 21:01:50.904429
Error:
Cannot contact returner or no job with this jid
Result:
----------
gcp6387testsse01.earaa6387.gcp.slb.com_master:
----------
return:
----------
_stamp:
2020-04-28T21:01:39.509888
fun:
runner.state.orchestrate
fun_args:
|_
----------
mods:
orchestrate_state
pillar:
----------
event_data:
----------
_stamp:
2020-04-28T21:01:39.212370
cmd:
_minion_event
data:
Minion gcp6387temp36 started at Tue Apr 28 21:01:39 2020
id:
gcp6387temp36
pretag:
None
tag:
salt/minion/gcp6387temp36/start
event_tag:
salt/minion/gcp6387temp36/start
jid:
20200428210139495968
return:
----------
data:
----------
gcp6387testsse01.earaa6387.gcp.slb.com_master:
----------
test_|-orchestrate_state-notify-on-Data-failure_|-Data or pillar Failure on Orchestration Call_|-configurable_test_state:
----------
__id__:
orchestrate_state-notify-on-Data-failure
__run_num__:
1
__sls__:
orchestrate_state
changes:
----------
comment:
Failed to get proper pillar values for event_data in orchstandardconf orchestration
event_data: {}
duration:
2.84
name:
Data or pillar Failure on Orchestration Call
result:
False
start_time:
21:01:50.889726
test_|-orchestrate_state_data_|-Data and tag pillar_|-configurable_test_state:
----------
__id__:
orchestrate_state_data
__run_num__:
0
__sls__:
orchestrate_state
changes:
----------
comment:
event_data: {}
tag: {}
duration:
3.148
name:
Data and tag pillar
result:
True
start_time:
21:01:50.885958
outputter:
highstate
retcode:
1
success:
False
user:
Reactor
StartTime:
2020, Apr 28 21:01:39.495968
jid:
20200428210139495968
Successful job, from the event bus:
salt/run/20200428210624555837/ret {
"_stamp": "2020-04-28T21:06:25.339035",
"fun": "runner.jobs.list_job",
"fun_args": [
"20200428210149620344"
],
"jid": "20200428210624555837",
"return": {
"EndTime": "2020, Apr 28 21:01:54.637924",
"Error": "Cannot contact returner or no job with this jid",
"Result": {
"gcp6387testsse01.earaa6387.gcp.slb.com_master": {
"return": {
"_stamp": "2020-04-28T21:01:49.636384",
"fun": "runner.state.orchestrate",
"fun_args": [
{
"mods": "orchestrate_state",
"pillar": {
"event_data": {
"_stamp": "2020-04-28T21:01:45.194576",
"cmd": "_minion_event",
"data": "Minion gcp6387temp38 started at Tue Apr 28 21:01:45 2020",
"id": "gcp6387temp38",
"pretag": null,
"tag": "salt/minion/gcp6387temp38/start"
},
"event_tag": "salt/minion/gcp6387temp38/start"
}
}
],
"jid": "20200428210149620344",
"return": {
"data": {
"gcp6387testsse01.earaa6387.gcp.slb.com_master": {
"test_|-orchestrate_state-success_|-Data success on Orchestration Call_|-configurable_test_state": {
"__id__": "orchestrate_state-success",
"__run_num__": 1,
"__sls__": "orchestrate_state",
"changes": {},
"comment": "Finished ok \nevent_data: {u'_stamp': u'2020-04-28T21:01:45.194576', u'cmd': u'_minion_event', u'data': u'Minion gcp6387temp38 started at Tue Apr 28 21:01:45 2020', u'id': u'gcp6387temp38', u'pretag': None, u'tag': u'salt/minion/gcp6387temp38/start'} \n",
"duration": 0.31,
"name": "Data success on Orchestration Call",
"result": true,
"start_time": "21:01:54.636565"
},
"test_|-orchestrate_state_data_|-Data and tag pillar_|-configurable_test_state": {
"__id__": "orchestrate_state_data",
"__run_num__": 0,
"__sls__": "orchestrate_state",
"changes": {},
"comment": "event_data: {u'_stamp': u'2020-04-28T21:01:45.194576', u'cmd': u'_minion_event', u'data': u'Minion gcp6387temp38 started at Tue Apr 28 21:01:45 2020', u'id': u'gcp6387temp38', u'pretag': None, u'tag': u'salt/minion/gcp6387temp38/start'} \ntag: salt/minion/gcp6387temp38/start\n",
"duration": 0.417,
"name": "Data and tag pillar",
"result": true,
"start_time": "21:01:54.636010"
}
}
},
"outputter": "highstate",
"retcode": 0
},
"success": true,
"user": "Reactor"
}
}
},
"StartTime": "2020, Apr 28 21:01:49.620344",
"jid": "20200428210149620344"
},
"success": true,
"user": "root"
}
[root@gcp6387testsse01 ~]# salt-run jobs.list_job 20200428210149620344
EndTime:
2020, Apr 28 21:01:54.637924
Error:
Cannot contact returner or no job with this jid
Result:
----------
gcp6387testsse01.earaa6387.gcp.slb.com_master:
----------
return:
----------
_stamp:
2020-04-28T21:01:49.636384
fun:
runner.state.orchestrate
fun_args:
|_
----------
mods:
orchestrate_state
pillar:
----------
event_data:
----------
_stamp:
2020-04-28T21:01:45.194576
cmd:
_minion_event
data:
Minion gcp6387temp38 started at Tue Apr 28 21:01:45 2020
id:
gcp6387temp38
pretag:
None
tag:
salt/minion/gcp6387temp38/start
event_tag:
salt/minion/gcp6387temp38/start
jid:
20200428210149620344
return:
----------
data:
----------
gcp6387testsse01.earaa6387.gcp.slb.com_master:
----------
test_|-orchestrate_state-success_|-Data success on Orchestration Call_|-configurable_test_state:
----------
__id__:
orchestrate_state-success
__run_num__:
1
__sls__:
orchestrate_state
changes:
----------
comment:
Finished ok
event_data: {u'_stamp': u'2020-04-28T21:01:45.194576', u'cmd': u'_minion_event', u'data': u'Minion gcp6387temp38 started at Tue Apr 28 21:01:45 2020', u'id': u'gcp6387temp38', u'pretag': None, u'tag': u'salt/minion/gcp6387temp38/start'}
duration:
0.31
name:
Data success on Orchestration Call
result:
True
start_time:
21:01:54.636565
test_|-orchestrate_state_data_|-Data and tag pillar_|-configurable_test_state:
----------
__id__:
orchestrate_state_data
__run_num__:
0
__sls__:
orchestrate_state
changes:
----------
comment:
event_data: {u'_stamp': u'2020-04-28T21:01:45.194576', u'cmd': u'_minion_event', u'data': u'Minion gcp6387temp38 started at Tue Apr 28 21:01:45 2020', u'id': u'gcp6387temp38', u'pretag': None, u'tag': u'salt/minion/gcp6387temp38/start'}
tag: salt/minion/gcp6387temp38/start
duration:
0.417
name:
Data and tag pillar
result:
True
start_time:
21:01:54.636010
outputter:
highstate
retcode:
0
success:
True
user:
Reactor
StartTime:
2020, Apr 28 21:01:49.620344
jid:
20200428210149620344
I was able to deploy 30 to 50 minions without issues by adding some randomness to the orchestration...
{% set random_start = range(1, 30) | random %}

pause:
  salt.runner:
    - name: test.sleep
    - s_time: {{ random_start }}
ZD-4955
I've been working on turning this into a minimal reproducible case to make it easier to work on, and to confirm that it's still an issue. The short answer is that this is reproducible on 3001. Resubmitting below with the bug template.
Near-concurrent reactions that call orchestrations are not reliable. Orchestrations may be dropped, or may be duplicated.
/etc/salt/master.d/reactor.conf:
reactor:
  - 'salt/job/*/ret/*':
    - '/srv/reactor/gh54045/reaction_state_finished.sls'
/srv/reactor/gh54045/reaction_state_finished.sls:
{% set i = data.get('fun_args')[1].get('pillar').get('i') %}
{% do salt.log.debug("GH54045 reactor ({})".format(i)) %}
reaction_state_finished:
  runner.state.orchestrate:
    - args:
      - mods: orch.gh54045.state_finished
      - pillar:
          data: {{ data|json }}
/srv/salt/orch/gh54045/state_finished.sls:
{% set i = salt.pillar.get('data').get('fun_args')[1].get('pillar').get('i') %}
{% do salt.log.debug("GH54045 orchestration ({})".format(i)) %}
/srv/salt/gh54045/init.sls:
{% set minion = salt.pillar.get('my_minion', 'minion1') %}
{% for i in range(10) %}
send_notification_{{ i }}:
  salt.state:
    - tgt: {{ minion }}
    - sls:
      - gh54045.succeed
    - pillar:
        i: {{ i }}
{% endfor %}
/srv/salt/gh54045/succeed.sls:
{% set i = salt.pillar.get('i', 'UNKNOWN') %}
succeeded:
  test.succeed_without_changes:
    - name: "SUCCEEDED: iteration {{ i }}"
Run:
salt-run state.orch gh54045
Watch or grep the master logs after the fact for GH54045. Here's a representative run:
2020-06-24 15:33:52,451 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (0)
2020-06-24 15:33:52,976 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (1)
2020-06-24 15:33:53,433 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (2)
2020-06-24 15:33:53,887 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (1)
2020-06-24 15:33:53,927 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (3)
2020-06-24 15:33:54,399 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (4)
2020-06-24 15:33:54,934 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (5)
2020-06-24 15:33:55,257 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (4)
2020-06-24 15:33:55,435 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (6)
2020-06-24 15:33:55,705 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (5)
2020-06-24 15:33:55,907 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (7)
2020-06-24 15:33:56,177 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (6)
2020-06-24 15:33:56,385 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (8)
2020-06-24 15:33:56,634 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (7)
2020-06-24 15:33:56,834 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (9)
2020-06-24 15:33:57,555 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (9)
All 10 reactor messages from 0-9 show up in the logs, but orchestrations 0, 2, 3, and 8 are missing in this example. Occasionally, there will be duplicate numbers in the GH54045 orchestration log messages.
If I add the following to the salt.state call in /srv/salt/gh54045/init.sls (the adjusted loop body is sketched after the log excerpt below):
    - parallel: True
    - concurrent: True
... the situation is much worse. E.g.:
2020-06-24 15:41:23,019 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (3)
2020-06-24 15:41:23,085 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (8)
2020-06-24 15:41:23,160 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (4)
2020-06-24 15:41:23,229 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (0)
2020-06-24 15:41:23,240 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (7)
2020-06-24 15:41:23,252 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (2)
2020-06-24 15:41:23,354 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (1)
2020-06-24 15:41:23,459 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (6)
2020-06-24 15:41:23,485 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (5)
2020-06-24 15:41:23,543 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 reactor (9)
2020-06-24 15:41:28,457 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (0)
2020-06-24 15:41:28,522 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (0)
2020-06-24 15:41:29,119 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (2)
2020-06-24 15:41:29,264 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (5)
2020-06-24 15:41:29,269 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (5)
2020-06-24 15:41:29,310 [salt.loaded.int.module.logmod:42 ][DEBUG ][7729] GH54045 orchestration (5)
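For completeness, a sketch of the adjusted loop body, assuming the two flags are placed on the salt.state call inside the loop (that is how I read it, since it is the only state in the file):
{% for i in range(10) %}
send_notification_{{ i }}:
  salt.state:
    - tgt: {{ minion }}
    - sls:
      - gh54045.succeed
    - pillar:
        i: {{ i }}
    - parallel: True
    - concurrent: True
{% endfor %}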
salt --versions-report
Salt Version:
Salt: 3001
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.11.1
libgit2: Not Installed
M2Crypto: 0.35.2
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.6.2
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: Not Installed
pycryptodome: Not Installed
pygit2: Not Installed
Python: 3.6.8 (default, Apr 2 2020, 13:34:55)
python-gnupg: Not Installed
PyYAML: 3.13
PyZMQ: 17.0.0
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.1.4
System Versions:
dist: centos 7 Core
locale: UTF-8
machine: x86_64
release: 5.3.0-59-generic
system: Linux
version: CentOS Linux 7 Core
ZD-5336
I think I got bitten by this bug on 2019.2.5, but it is hard to reduce my states to a minimal test case. Unfortunately, it looks like Salt reactors can't be relied upon.
When I run the steps suggested above to reproduce this, I get the attached log.
gh54045.log
@oeuftete I was able to replicate your results using your example. @Ch3LL mentioned testing with this PR: https://github.com/saltstack/salt/pull/56513 -- and it appears to solve the issue in your example. I'm seeing all reactor and orchestration logs without any duplicate or missing indices in the log messages. Would you mind checking out that PR to see if you get similar results?
@theskabeater Yes, I just tested now the current master (04a0068dc1) + a patch generated from #56513 and I'm seeing all of the reaction orchestrations happen now, once only, in the parallel and non-parallel cases. Great!
I'm also going to link in the issue at https://github.com/saltstack/salt/issues/58369 to make sure this issue is visible there.
Since https://github.com/saltstack/salt/pull/56513 is merged, I am closing this issue.
Are there PRs other than #56513 that this fix is dependent on? 3002.2 appears to have all the modifications from that PR and I'm still seeing this issue.
# salt-call -V
Salt Version:
Salt: 3002.2
Dependency Versions:
cffi: 1.11.5
cherrypy: Not Installed
dateutil: 2.6.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.10.1
libgit2: Not Installed
M2Crypto: 0.35.2
Mako: Not Installed
msgpack: 0.6.2
msgpack-pure: Not Installed
mysql-python: Not Installed
pycparser: 2.14
pycrypto: Not Installed
pycryptodome: Not Installed
pygit2: Not Installed
Python: 3.6.8 (default, Apr 16 2020, 01:36:27)
python-gnupg: 0.4.6
PyYAML: 3.12
PyZMQ: 19.0.0
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.3
System Versions:
dist: centos 8 Core
locale: UTF-8
machine: x86_64
release: 4.18.0-193.28.1.el8_2.x86_64
system: Linux
version: CentOS Linux 8 Core
@lomeroe I will reopen this. The important memory leak fixes that went into 3002.2 reverted the changes that resolved this issue. I believe that pending PR https://github.com/saltstack/salt/pull/58853 will re-resolve this issue.
@oeuftete good info, thanks!
looks like #58853 was merged and will be in the Aluminium release, closing this issue
ZD-6199: also seen on 3002.2. Reactors are not involved in that case, but multiple concurrent orchestrations using inline pillar data are.
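For illustration only, a hedged sketch of what several concurrent orchestrations with inline pillar data could look like when launched from a parent orchestration (orch.child and the loop are hypothetical, not taken from the ticket):
{% for i in range(5) %}
launch_child_{{ i }}:
  salt.runner:
    - name: state.orchestrate
    - mods: orch.child
    - pillar:
        i: {{ i }}
    - parallel: True
{% endfor %}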