salt 🚀 - salt exit codes | bleepingcoder.com

Thanks for bringing this up, I believe the behavior you are seeing is the expected behavior. The command salt <minion> file.access is executed successfully in both cases, so the return value in both cases is 0, regardless of the result of the file.access function itself.

jfindlay on 26 Nov 2014

hey, sorry. i would expect a different behavior, like if my test goes wrong, the return code should be 1 (http://tldp.org/LDP/abs/html/exitcodes.html) regardless if my salt command runs successfully.

salt <minion> file.file_exists /some/file/that/doesn't/exists ; echo $?
   False
0

salt <minion> file.file_exists /some/file/that/exists ; echo $?
   True
0

Deshke on 26 Nov 2014

My reason for thinking a return of 0 was appropriate in this circumstance is that I am thinking of a return code as local to a system. Since salt runs successfully as intended on the master, it's return code would not be affected by the state of the remote minion. I admit that my reasoning on this is not very solid and am inclined to rather support your arguments.

A further consultation of the applicable standards as you have suggested is needed. I believe there is an issue somewhere about generally refactoring return statuses/return codes. I can't find it at the moment, but that ought to be referenced here.

jfindlay on 1 Dec 2014

Here it is: #7013.

jfindlay on 2 Dec 2014

@jfindlay , thanks for the info on #7013. This bug should be classified above "Low Severity", IMO.

tjyang on 3 Dec 2014

@tjyang, you're right. It seems that not all the return code issues were not resolved with #11337.

jfindlay on 3 Dec 2014

This is the plan for unifying and standardizing salt exit codes.

salt beryllium:
- Introduce a standard exit code enum-like data structure, probably somewhere in exitcodes.py such that integers and enums could be used interchangeably, but with enums being preferred within the source code.
- Add a configuration option to enable the use of this new exit code standard by automatically evaluating the truthiness of the return value of the execution module functions and translating it into exit codes.
salt boron: Ensure that all salt execution modules explicitly comply with
the new salt exit code standard.

@thatch45, @cachedout, @basepi, or @UtahDave may have further commentary on this issue.

jfindlay on 9 Dec 2014

+1

inthecloud247 on 13 Feb 2015

+1

simple10 on 13 Feb 2015

+1

I've just raised/tested few other similar issues particularly important for me. Searching for "zero exit" gives a list of issues related to lack of error indication for CLI.

Lack of usability

It effectively makes automation (the very similar reason why Salt is even used in the first place) around Salt commands very inconvenient. And it definitely seems pervasive for Salt (many CLI commands, many functions called through CLI, etc.).

For example, the following are some use cases which require developing custom scripts to analyze Salt output just get Failed/Succeeded result:

VM provisioning tools (such as Vagrant with all possible providers) which use Salt to configure new instance won't detect failure.
Any Jenkins (or other CI platform) jobs which apply configuration or run other Salt jobs won't detect failure.

uvsmtid on 18 Mar 2015

👍3 😕1

relates to #2417.

jfindlay on 18 Apr 2015

Hi,

+1 @uvsmtid for any jenkins jobs, and anything using state.orchestrate

ealphonse on 12 Jun 2015

I've just revisited this place due to related problems and remembered that I wrote an ad-hoc script to sniff failures from Salt output as part of Salt-based automation project. The script, however, is independent/single/standalone Python file. Some additional details in this SO answer.

uvsmtid on 2 Jul 2015

@uvsmtid, do you know if there are other github issues that document the issues you mention in that stackoverflow answer? I have some ideas to automate plugin conformation with unit and possibly integration testing. This is a big project that I've been wanting to do for a while and need to set aside time for it, but first I need to collect the scattered wisdom and user experience on the details for which no clear convention has been either implemented or documented. I think most users would agree that a consistent and a standard plugin behavior is much more important than a favorite plugin behavior.

jfindlay on 2 Jul 2015

@uvsmtid, also my stackoverflow credit is too low to comment there, but if you want to parse the output that is returned by salt, you may have more success writing a custom returner that ensures a valid json data structure is outputted. I know there is at least one open issue, which I am having trouble finding. related to this for the default returner, which prioritizes real time info over producing valid json since you don't know when or if minions will return before the timeout.

jfindlay on 2 Jul 2015

@jfindlay, I din't look up the output issues (1: more than one JSON object, 2: mixed STDOUT of JSON and non-JSON) - it was directly from my observation on the spot. The custom returner may be better, but I haven't written any so far - so, it's to be looked into.

We can split concerns at this point: (A) exit codes and (B) output format/convention/protocol.
If you create feature-issue regarding "automate plugin conformation", I'll help to find or create non-duplicate bug-issues and link them under your solution umbrella to track.

uvsmtid on 3 Jul 2015

@uvsmtid, thanks for the feedback. The stdout issues are going to have to be addressed in the returner/outputter context and you are welcome to open an issue about that because I fail at finding the other issue(s). I've dumped all my ideas about exec and state modules in #25142 and #25143, respectively.

jfindlay on 3 Jul 2015

+1

A big reason for us selecting a tool like salt is to orchestrate deployments.

@uvsmtid - thank you for your contributions to this issue

bradthurber on 8 Jul 2015

et moi +1

PredatorVI on 20 Aug 2015

@jfindlay so 2015.8 is which code name?

DanyC97 on 21 Aug 2015

2015.8.0 is codenamed Beryllium.

Boron is tentatively targeted for 2015.11.0

basepi on 21 Aug 2015

I'm amazed this was originally labelled as low priority!!!

+1

rgardam on 27 Aug 2015

@rgardam welcome to the party :) ...nothing should surprise i'd say

DanyC97 on 11 Nov 2015

Where is the proper place to cheer for this issue? Just integrated a runner into my Buildbot workflow, but would need to do a jobs.lookup_jid on an ID parsed from the output of the original runner and in turn parse the YAML output of the lookup and examine the success key.

I'd much rather have $? ...

My actual issue is that I see a return code of 0 even when my runner explodes (like a Python exception occurs) which is not very useful.

bmcorser on 16 Dec 2015

@bmcorser, you've found the right place, cheer away.

jfindlay on 17 Dec 2015

+1

immediately stopping builds when test.ping returns $? -ne 0 would be just nice

cehrig on 5 Jan 2016

+1

We're using Salt as part a continuous delivery workflow, and it's really problematic that when running salt '*' state.highstate on the master, when states fail in minions for some reason (with Result: False in the logs), the salt command still exits with 0. We'll have to resort to parsing log files to find out if the one thing that Salt does in that workflow (configuration management) actually succeeded or not.

fernandoacorreia on 14 Jan 2016

+1 exit code standard is required. Running state.* vs file.file_exists (or test.ping etc) should be treated differently. And also target should also have an impact e.g. '' vs real hostname or set of hostnames. e.g. salt '' state.highstate (soft error) vs salt 'hostname' state.highstate (hard error)
There is sort of a standard.. http://tldp.org/LDP/abs/html/exitcodes.html
If something does not go to plan it should indicate to look at the logs with an exit code.

May be an option to indicate a person is calling it as part of a script, or set of options which indicates what the executer cares about. e.g. missing hosts, or failed to run on a host, or does not care if it does not run on a host.

May be a bit based exit code for apply state, highstate

 0 - All as expected, no need to review the output
 1 - Not All Targets Reachable
 2 - Issues Applying states
 4 -
 8 - 
16 -
32 - (last one) as exit code 127 is not found

1+2= exit code 3

damon-atkins on 15 Jan 2016

+1

The whole CI+Salt setup is insanely powerful, but somewhat limited by this.

kako-nawao on 1 Mar 2016

It is also important to expose this to the salt-api.
Here is the output from a 2015.8.8 salt command invocation of cmd.run 'false 1':

# salt 'web1' cmd.run 'false 1'
web1:
ERROR: Minions returned with non-zero exit code

While the API does not include the necessary info when sending to the / endpoint:

{
    "return": [
        {
            "web1": ""
        }
    ]
}

When you lookup the job via /jobs/JID you get slightly more info, but still nothing useful for determining exit status.

{
    "info": [
        {
            "Arguments": [
                "false 1"
            ],
            "Function": "cmd.run",
            "Minions": [
                "web1"
            ],
            "Result": {
                "web1": {
                    "return": ""
                }
            },
            "StartTime": "2016, Apr 11 15:59:42.643646",
            "Target": "web1",
            "Target-type": "glob",
            "User": "jenkins",
            "jid": "20160411155942643646"
        }
    ],
    "return": [
        {
            "web1": ""
        }
    ]
}

and this same issue occurs with the service execution module. Here is an example of a service failing to restart:

{
    "info": [
        {
            "Arguments": [
                "klsdklsd"
            ],
            "Function": "service.restart",
            "Minions": [
                "web1"
            ],
            "Result": {
                "web1": {
                    "return": false
                }
            },
            "StartTime": "2016, May 27 14:08:28.970138",
            "Target": "web1",
            "Target-type": "glob",
            "User": "jenkins",
            "jid": "20160527140828970138"
        }
    ],
    "return": [
        {
            "web1": false
        }
    ]
}

mchugh19 on 11 Apr 2016

👍1

+1
How come there is still no way of knowing the exit status of a remote command (cmd.run) ?
salt-run jobs.print_job [jid] doesn't return anything status-related... :(

zerthimon on 14 Apr 2016

👍2

This not being addressed as top priority is a pure negligence. How do you think salt* usage would be scripted without this bug being resolved? How do you test your executables if your exit codes don't have a meaning??

cjelli on 14 Apr 2016

👍1

+1
This issue has caused great pain in our normal deployment process. Actually it has caused a few production live site issues already. We have to apply other ad-hoc detection of the deployment result which is really unnecessary. For state.highstate at least we have succeeded and failed summary, but for others such as state.apply or cmd.run we have to review all the output and check if the command really succeeds or not.

I can't believe this is addressed as low priority. It is opened 2 years ago and still open.

zhany on 21 Apr 2016

👍3

This is really serious compliance issue. How I can make sure without parsing output that all my states applied correctly or will applied correctly?

adubkov on 26 Apr 2016

👍1

+1

htssouza on 27 Apr 2016

👍1

:+1:

rhansen on 2 May 2016

@blacked We are using serverspec on every salt run to make sure that the states are actually applied correctly.

rgardam on 6 May 2016

@meggiebot Please include me when this is added to a sprint for Carbon. I would like to coordinate these changes with salt-api and Salt Enterprise so that all three are on the same page going forward.

whiteinge on 9 May 2016

👍1

+1

ryanchapman on 6 Jun 2016

Isn't this what --retcode-passthrough is supposed to do?

plastikos on 10 Jun 2016

I could be mistaken but I believe this is what --retcode-passthrough is supposed to _expose_.

whiteinge on 10 Jun 2016

@whiteinge , somehow I'm just not understanding what the nuances are of your term "expose". I get the feeling I'm missing something.

My understanding is that when --retcode-passthrough is specified that the aggregate exit status of the command is returned rather than the status of the salt executable itself.

As far as compliance (@blacked), it would be reasonable to query the job cache to see if any of the retcodes were non-zero. (yes, I get that it's somewhat awkward)

plastikos on 10 Jun 2016

retcode-passthrough is not generic for all salt tools:

[root@salt-master ~]# salt-cloud --retcode-passthrough
Usage: salt-cloud

salt-cloud: error: no such option: --retcode-passthrough

cjelli on 10 Jun 2016

Right, --retcode-passthrough does this, and it does not really make a lot of sense for salt cloud, because salt-cloud should just five us good retcodes.
The real issue here is that we need to create a standard retcode definition and location, then go through the execution modules and normalize the retcodes to the standard

thatch45 on 10 Jun 2016

I would expect that there would be a range of retcodes for the specific tool and then another range for failures in anything that the tool manages. For example 0-127 might represent status for the specific tool (viz. salt) where subprocesses/managed devices would have an automatic passthrough of their retcode+128. That way if all you care about is pass/fail you get it without the --retcode-passthrough option. If you want to distinguish between a failure in the invoked tool vs. managed devices then you could test the range on the retcode. One thing to keep in mind is that status of 128+SIGNUM. Often times on POSIX exit status is only 8 bits rather than the full int - although newer calls that use waitid() have access to the full int (which is most software by now).

I have no idea what Windows does.

We could certainly compress to a narrow subrange just for minion retcode pass-through.

plastikos on 10 Jun 2016

👍2

There is also an additional problem: several of the CLI tools have broken exit status handling where some of them _always_ exit with status 0 even when there is an error. I have been working through a set of fixes as well as integration tests for many of the CLI tools. Some of them are submitted and merged and others are waiting acceptance.

Examples:
* #33762
* 1001d2b9e7e630c005cb24d5c43211f5d47d1b11
* 48fc51cb7b39b2d02c5b4b7adcc904962180a54b

plastikos on 11 Jun 2016

Similar problem with salt-call... exits with 0 even when there's a serious error, e.g. "Unable to render top file". --retcode-passthrough does not remediate this.

Best solution is to tee the output to a file, and then grep for error messages.

RobotLimeLtd on 20 Jul 2016

Give me a working example and I will see what I can do

edgan on 20 Jul 2016

ZD-935

rickh563 on 6 Sep 2016

I fixed about one of these bugs, #35964 .

edgan on 6 Sep 2016

"Unable to render top file" is working for me via salt-ssh, which calls salt-call internally. This is based on a pull request I did a while ago.

Version: 2016.3.2

salt-ssh -i 'foo.acme.com' state.highstate
[ERROR   ] Unable to render top file: while parsing a block mapping
  in "<unicode string>", line 1, column 1:
    base:
    ^
expected <block end>, but found '<block sequence start>'
  in "<unicode string>", line 4, column 5:
        - base
        ^
foo.acme.com:

Summary for foo.acme.com
-----------
Succeeded: 0
Failed:   0
-----------
Total states run:    0
[ngrennan@host _ssh_master_dev]$ echo $?
20

'*' vs '*': in top.sls

edgan on 6 Sep 2016

ZD-1502 & ZD-935

rickh563 on 14 Jun 2017

Wait, what is the current status of retcodes for salt operations? It's been 4 years since the issue was created...

Oloremo on 3 Nov 2018

😕1

Exit codes have been normalized in https://github.com/saltstack/salt/pull/48361 for the upcoming Fluorine release. The following conditions will set a nonzero exit code:

An exception is raised
The code being run sets __context__['retcode'] to a nonzero value
The return data for a given function is a dictionary with either a result or success key in it, which resolves as False

file.file_exists would not trigger any of those conditions, however. The problem here is if we use something like __context__['retcode'] in file.file_exists to denote failure, then we've marked the run as failed, so if any state invokesfile.file_exists, this would make salt reutrn a nonzero exit code even if the file not existing was expected (and therefore not an error).

terminalmage on 6 Nov 2018

I'm a bit confused about the status - I've just picked up salt 2018.3.4 which has the latest fixes in for return codes for salt-call. If I pass the argument --retcode-passthrough to salt-call I now get non-zero exit codes for failures but without it I always get 0 - is this the expected functionality at the current time? Anybody know when this functionality becomes the default?

oliver-dungey on 4 Mar 2019

I think this is expected, as far as I know. To get status code from operation executed by salt-call you have to use --retcode-passthrough if nothing was changed.

adubkov on 4 Mar 2019

👍1

@oliver-dungey As explained above, the retcode changes are in the Fluorine release (2019.2.0).

terminalmage on 26 Mar 2019

👍1

@saltstack/team-core Since Fluorine was released, this can be closed.

terminalmage on 26 Mar 2019

👍2

Using salt version below, the issue with the incorrect exit code is still present when you add --batch-size and --batch-wait parameters

Salt Version:
           Salt: 2019.2.0

Dependency Versions:
           cffi: 1.6.0
       cherrypy: unknown
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: 0.26.3
        libnacl: Not Installed
       M2Crypto: 0.31.0
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.7.3
         pygit2: 0.26.4
         Python: 2.7.5 (default, Apr  9 2019, 14:30:50)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.6.1810 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-957.5.1.el7.x86_64
         system: Linux
        version: CentOS Linux 7.6.1810 Core

sudo salt -v -L minion-1,minion-2 --batch-size 50% --batch-wait 1 state.apply queue=True test-state

Even though there is an error:

jid:
    20190613124535690469
retcode:
    1
gc-euw1-salt-1:
    Data failed to compile:
----------
    Rendering SLS 'base:test-state' failed: Jinja syntax error: expected token 'end of statement block', got 'string'; line 1

salt exit status is 0.

echo $?
0

dz-pyps on 13 Jun 2019

👍2

Please open a new issue. Commenting on a closed issue doesn't do much good.

terminalmage on 13 Jun 2019

salt exit codes

Most helpful comment

All 60 comments

Lack of usability

Related issues