In the process of writing some salt orchestration, no matter which options I used, the salt-run orchestration state.sls module/function always ignores the pass/fail return output from the state (that runs on the minion) and the salt-run orchestration is always successful.
I'm using the latest salt released version - 2015.8.3, on both the master and minion (all version info at the bottom).
I first wrote some pass and fail state code:
# /srv/salt/utils/fail.sls
cmd-run-false:
cmd.run:
- name: /bin/false
# /srv/salt/utils/pass.sls
cmd-run-true:
cmd.run:
- name: /bin/true
I next wrote some test orchestration code using state.function:
# /srv/salt/orch/test/orch-by-func-true.sls
orch-by-func-true:
salt.function:
- name: cmd.run
- tgt: mfg-test-iws1
- arg:
- 'true'
Here is the test run and output for orch-by-func-true.sls:
[root@salt test]# salt-run state.orch orch.test.orch-by-func-true
salt.corp.mutualink.net_master:
----------
ID: orch-by-func-true
Function: salt.function
Name: cmd.run
Result: True
Comment: Function ran successfully. Function cmd.run ran on mfg-test-iws1.
Started: 18:39:34.214854
Duration: 396.539 ms
Changes:
mfg-test-iws1:
Summary for salt.corp.mutualink.net_master
------------
Succeeded: 1 (changed=1)
Failed: 0
------------
Total states run: 1
Total run time: 396.539 ms
# /srv/salt/orch/test/orch-by-func-false.sls
orch-by-func-false:
salt.function:
- name: cmd.run
- tgt: mfg-test-iws1
- arg:
- 'false'
Here is the test run and output for orch-by-func-false.sls:
[root@salt test]# salt-run state.orch orch.test.orch-by-func-false
[ERROR ] {'ret': {'mfg-test-iws1': ''}, 'out': 'highstate'}
salt.corp.mutualink.net_master:
----------
ID: orch-by-func-false
Function: salt.function
Name: cmd.run
Result: False
Comment: Running function cmd.run failed on minions: mfg-test-iws1 Function cmd.run ran on mfg-test-iws1.
Started: 18:42:51.675954
Duration: 397.494 ms
Changes:
mfg-test-iws1:
Summary for salt.corp.mutualink.net_master
------------
Succeeded: 0 (changed=1)
Failed: 1
------------
Total states run: 1
Total run time: 397.494 ms
So far so good, both return the correct return value and the salt-run orchestration behaves correctly.
Next, I wrote some orchestration using state.sls:
# /srv/salt/orch/test/orch-by-state-true.sls
orch-by-state-true:
salt.state:
- name: orch-by-state-true
- tgt: mfg-test-iws1
- sls: utils.pass
Here is the test run and output for orch-by-state-true.sls:
[root@salt test]# salt-run state.orch orch.test.orch-by-state-true
salt.corp.mutualink.net_master:
----------
ID: orch-by-state-true
Function: salt.state
Result: True
Comment: States ran successfully. Updating mfg-test-iws1.
Started: 18:49:17.100715
Duration: 2622.338 ms
Changes:
mfg-test-iws1:
----------
ID: cmd-run-true
Function: cmd.run
Name: /bin/true
Result: True
Comment: Command "/bin/true" run
Started: 13:49:10.257619
Duration: 23.67 ms
Changes:
----------
pid:
3630
retcode:
0
stderr:
stdout:
Summary for mfg-test-iws1
------------
Succeeded: 1 (changed=1)
Failed: 0
------------
Total states run: 1
Total run time: 23.670 ms
Summary for salt.corp.mutualink.net_master
------------
Succeeded: 1 (changed=1)
Failed: 0
------------
Total states run: 1
Total run time: 2.622 s
True case looks good, lets try the false case...
# /srv/salt/orch/test/orch-by-state-false.sls
orch-by-state-false:
salt.state:
- name: orch-by-state-false
- tgt: mfg-test-iws1
- sls: utils.fail
Here is the test run and output for orch-by-state-false.sls:
[root@salt test]# salt-run state.orch orch.test.orch-by-state-false
salt.corp.mutualink.net_master:
----------
ID: orch-by-state-false
Function: salt.state
Result: True
Comment: States ran successfully. Updating mfg-test-iws1.
Started: 18:52:26.066099
Duration: 2721.818 ms
Changes:
mfg-test-iws1:
----------
ID: cmd-run-false
Function: cmd.run
Name: /bin/false
Result: False
Comment: Command "/bin/false" run
Started: 13:52:19.312700
Duration: 26.119 ms
Changes:
----------
pid:
3646
retcode:
1
stderr:
stdout:
Summary for mfg-test-iws1
------------
Succeeded: 0 (changed=1)
Failed: 1
------------
Total states run: 1
Total run time: 26.119 ms
Summary for salt.corp.mutualink.net_master
------------
Succeeded: 1 (changed=1)
Failed: 0
------------
Total states run: 1
Total run time: 2.722 s
This time I think there is a problem. The state fails, but the salt-run orchestration ignores this failure and just succeeds (Succeeded is 1).
The next thing to test was to try and implement a workaround: call the state.sls from the state.function.
(lets do the false case first)
# /srv/salt/orch/test/orch-by-state-false-workaround.sls
orch-by-state-false-workaround:
salt.function:
- name: state.sls
- tgt: mfg-test-iws1
- arg:
- utils.fail
[root@salt test]# salt-run state.orch orch.test.orch-by-state-false-workaround
[ERROR ] {'ret': {'mfg-test-iws1': {'cmd_|-cmd-run-false_|-/bin/false_|-run': {'comment': 'Command "/bin/false" run', 'name': '/bin/false', 'start_time': '13:59:49.792218', 'result': False, 'duration': 26.048999999999999, '__run_num__': 0, 'changes': {'pid': 3678, 'retcode': 1, 'stderr': '', 'stdout': ''}}}}, 'out': 'highstate'}
salt.corp.mutualink.net_master:
----------
ID: orch-by-state-false-workaround
Function: salt.function
Name: state.sls
Result: False
Comment: Running function state.sls failed on minions: mfg-test-iws1 Function state.sls ran on mfg-test-iws1.
Started: 18:59:56.721102
Duration: 2546.751 ms
Changes:
mfg-test-iws1:
----------
ID: cmd-run-false
Function: cmd.run
Name: /bin/false
Result: False
Comment: Command "/bin/false" run
Started: 13:59:49.792218
Duration: 26.049 ms
Changes:
----------
pid:
3678
retcode:
1
stderr:
stdout:
Summary for mfg-test-iws1
------------
Succeeded: 0 (changed=1)
Failed: 1
------------
Total states run: 1
Total run time: 26.049 ms
Summary for salt.corp.mutualink.net_master
------------
Succeeded: 0 (changed=1)
Failed: 1
------------
Total states run: 1
Total run time: 2.547 s
Here, you can see the false workaround case works! The state fails, and the salt-run orchestration does not ignore the failure (Failed is set to 1).
For good measure, I also tested the positive true workaround case:
# /srv/salt/orch/test/orch-by-state-true-workaround.sls
orch-by-state-true-workaround:
salt.function:
- name: state.sls
- tgt: mfg-test-iws1
- arg:
- utils.pass
Here are the results for the positive true workaround case:
[root@salt test]# salt-run state.orch orch.test.orch-by-state-true-workaround
salt.corp.mutualink.net_master:
----------
ID: orch-by-state-true-workaround
Function: salt.function
Name: state.sls
Result: True
Comment: Function ran successfully. Function state.sls ran on mfg-test-iws1.
Started: 19:04:29.652852
Duration: 2569.498 ms
Changes:
mfg-test-iws1:
----------
ID: cmd-run-true
Function: cmd.run
Name: /bin/true
Result: True
Comment: Command "/bin/true" run
Started: 14:04:22.743780
Duration: 23.365 ms
Changes:
----------
pid:
3716
retcode:
0
stderr:
stdout:
Summary for mfg-test-iws1
------------
Succeeded: 1 (changed=1)
Failed: 0
------------
Total states run: 1
Total run time: 23.365 ms
Summary for salt.corp.mutualink.net_master
------------
Succeeded: 1 (changed=1)
Failed: 0
------------
Total states run: 1
Total run time: 2.569 s
Is this a bug?
Version info (master):
[root@salt salt]# salt --versions-report
Salt Version:
Salt: 2015.8.3
Dependency Versions:
Jinja2: 2.2.1
M2Crypto: 0.20.2
Mako: Not Installed
PyYAML: 3.11
PyZMQ: 14.5.0
Python: 2.6.6 (r266:84292, Jul 23 2015, 15:22:56)
RAET: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.5
cffi: Not Installed
cherrypy: 3.2.2
dateutil: Not Installed
gitdb: 0.5.4
gitpython: 0.3.2 RC1
ioflo: Not Installed
libnacl: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: 0.20.3
python-gnupg: Not Installed
smmap: 0.8.1
timelib: Not Installed
System Versions:
dist: centos 6.7 Final
machine: x86_64
release: 2.6.32-573.3.1.el6.x86_64
system: CentOS 6.7 Final
Version info (mfg-test-iws1 minion):
[root@salt utils]# salt mfg-test-iws1 test.versions_report
mfg-test-iws1:
Salt Version:
Salt: 2015.8.3
Dependency Versions:
Jinja2: 2.5.5
M2Crypto: 0.21.1
Mako: Not Installed
PyYAML: 3.08
PyZMQ: 14.5.0
Python: 2.6.8 (unknown, Nov 14 2015, 01:48:45)
RAET: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.5
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
libnacl: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.5
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
python-gnupg: Not Installed
smmap: Not Installed
timelib: Not Installed
System Versions:
dist: redhat 5.2 Final
machine: i686
release: 2.6.18-92.1.22.el5
system: CentOS 5.2 Final
@paulfanelli That looks like it should be propagating the failure all the way back up, so yes. I would consider this a bug. Thanks for pointing this out and filing with the detail you did.
I'd like to add: raising an exception in a custom execution module also returns True:
test_module:
salt.function:
- name: test.hello_world
- tgt: vb2
If method hello_world in python module _modules/test.py raises an exception, the output is
salt_|-test_module|-test.hello_world_|-function:
----------
__run_num__:
1
changes:
----------
out:
highstate
ret:
----------
vb2:
The minion function caused an exception: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1071, in _thread_return
return_data = func(*args, **kwargs)
File "/var/cache/salt/minion/extmods/modules/test.py", line 36, in hello_world
raise salt.exceptions.SaltException('this is an exception')
SaltException: this is an exception
comment:
Function ran successfully. Function test.hello_world ran on vb2.
duration:
6025.589
name:
test.hello_world
result:
True
start_time:
23:58:42.278357
The exception is recognised, but the result is still True.
Is this expected behavior or is this part of the bug described here?
Hi,
Is there an ETA to fix this bug?
Thanks!
Ido
Hi,
We are facing the same problem. Is there an ETA to fix this bug?
Regards,
Aditya
This issue seems to be fixed between 2015.8.9 and 2016.3.0
I might have an integration test if anyone is interested.
Still happens as of 2016.11.2
Still happens as of 2016.11.5. A salt-run state.orch returns successfully even if the host to run a state on is not online o_0
Still happening as of 2017.7.2. Hard to believe this hasn't been fixed by now.
Here is an orchestration state:
deploy_check:
salt.function:
- name: file.file_exists
- tgt: {{ some_server }}
- arg:
- /foo/bar
- require:
- salt: some_other_state
- retry:
attempts: 5
interval: 60
And yes, the /foo/bar path above doesn't exist on the server, so this state should fail... but the output shows:
ID: deploy_check
Function: salt.function
Name: file.file_exists
Result: True
Comment: Function ran successfully. Function file.file_exists ran on server-name-xyz.
Started: 22:16:40.639715
Duration: 180.473 ms
Changes:
server-name-xyz:
False
Is there an update on this one? I'm running 2018.3.0 and am seeing the same thing. My custom execution module sets __context['retcode'] = 1 and I'm raising a CommandExecutionError but the orchestration says the step was successful. I don't know how to propagate the failure all the way up :(
The errors all show up in the minion log, but the orchestration doesn't see them.
There isn't work on it now, but i have tagged it to be a blocker for the 2017.7.7 and 2018.3.2 releases.
Daniel
@garethgreenaway Can you take a look at this one?
This should be fixed with #47843
@anitakrueger or @jessebye. Could we get some testing on the above fix, please? Thanks.
Confirmed working!
I added a fail_function as a workaround (basically just returning my False input), but with this fix, the orchestration is much more intuitive. Thanks a lot! Especially since it's only 2 lines :)
Most helpful comment
Confirmed working!
I added a fail_function as a workaround (basically just returning my
Falseinput), but with this fix, the orchestration is much more intuitive. Thanks a lot! Especially since it's only 2 lines :)