As well as the current failed task state, which results from any non-zero exit status of the task's script, we could support triggering off specific exit statuses. For example (using a syntax with parentheses for illustration, though I am aware that syntax may not be viable):
```
graph = """
    foo:fail => bar     # Standard failure: captures all non-zero codes
    bar:fail(1) => pub  # New: trigger pub if bar fails with exit status 1...
    bar:fail(2) => wop  # ...but trigger wop if it instead fails with exit status 2.
    # Any other bar exit status does not trigger anything.
"""
```
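To make the proposed semantics concrete, here is a toy Python model (not Cylc code) of how a scheduler might resolve such triggers once a task has exited. `Trigger` and `triggered` are hypothetical names, and `:fail(N)` is only the illustrative syntax above, not an existing Cylc feature.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Trigger:
    upstream: str
    downstream: str
    # None models a plain ":fail" (any non-zero code);
    # an int models the hypothetical ":fail(N)" qualifier.
    exit_code: Optional[int] = None


def triggered(triggers: List[Trigger], task: str, exit_code: int) -> List[str]:
    """Return the downstream tasks released when `task` exits with `exit_code`."""
    if exit_code == 0:
        return []  # success: no failure triggers fire (":succeed" is not modelled)
    released = []
    for t in triggers:
        if t.upstream != task:
            continue
        # A plain ":fail" matches any non-zero code; ":fail(N)" only matches N.
        if t.exit_code is None or t.exit_code == exit_code:
            released.append(t.downstream)
    return released


GRAPH = [
    Trigger("foo", "bar"),               # foo:fail => bar
    Trigger("bar", "pub", exit_code=1),  # bar:fail(1) => pub
    Trigger("bar", "wop", exit_code=2),  # bar:fail(2) => wop
]

print(triggered(GRAPH, "bar", 1))  # ['pub']
print(triggered(GRAPH, "bar", 2))  # ['wop']
print(triggered(GRAPH, "bar", 3))  # []
print(triggered(GRAPH, "foo", 7))  # ['bar']
```

In this sketch a plain `:fail` trigger and a `:fail(N)` trigger on the same task could both fire for code N; whether the real feature should allow that overlap is an open design question.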
This graph would distinguish between, and take a different scheduling course depending on, whether bar fails with exit code 1, 2, or any other non-zero code.
While users are perhaps unlikely to need to differentiate between exit cases set directly by the script, I raise this because with this feature exit codes would essentially become parameters, allowing greatly extended control over scheduling. Instead of only the standard "final" task states of succeeded, failed and submit-failed (and, in a sense, expired, which I understand is a final state of sorts), there would be essentially unlimited (in practice, 255) possible endpoints that users could catch in their scripts and trigger off, covering a myriad of cases arising in them. This would be a separate specification, though (e.g. the parentheses syntax); I am not suggesting the standard failure (and success) cases should go, as users would often not need this advanced flexibility.
As a superficial example, note how the various end cases of interest below can be used to branch the scheduling. Naturally, in a real case the code would be much more involved; imagine the `sys.exit(N)` calls placed at points of interest in the script's control flow, each with some chosen N = 0, ..., 255.
```
[runtime]
    [[my_task]]
        script = "failure-mode-demo.py"
```

`bin/failure-mode-demo.py`:

```python
#!/usr/bin/env python3
import sys

# ...
# ... More involved code here! 'this' variable may get set.
this = None  # placeholder: the real logic above would set this
# ...

if not this:
    sys.exit(1)  # endpoint 1: exit code 1, failure mode

try:
    import my_module
    my_module.some_operation(this)  # say this can logically hit a TypeError
except ImportError:
    sys.exit(2)  # endpoint 2: exit code 2, different failure mode
except TypeError:
    sys.exit(3)  # endpoint 3: exit code 3, different failure mode

# endpoint 4: exit code 0, success
sys.exit(0)
```
I'll just note that we can already achieve the same thing with custom task messages: translate (in job scripting) application return codes into meaningful messages, and trigger tasks off those. However, for applications that do have well-defined return codes for specific error conditions, this is a good proposal, as it reduces effort (no need to use custom task messages).
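The workaround described above can be sketched as a small Python wrapper that runs the application, maps known return codes to custom messages, and then exits with the application's own status. The mapping `EXIT_MESSAGES` and the message texts are hypothetical; `send_message` is a placeholder that only prints, standing in for however the job actually sends a Cylc task message (e.g. the `cylc message` command, whose arguments are not shown here).

```python
import subprocess
import sys
from typing import List, Optional

# Hypothetical mapping from application return codes to custom task messages.
# Each message would also need to be declared as a task output in the suite
# config so the graph can trigger off it.
EXIT_MESSAGES = {
    1: "input missing",
    2: "import failed",
    3: "type error",
}


def message_for(returncode: int) -> Optional[str]:
    """Return the custom message for a return code, or None if unmapped."""
    return EXIT_MESSAGES.get(returncode)


def send_message(message: str) -> None:
    # Placeholder: a real job would send this as a Cylc task message.
    print(f"would send task message: {message}")


def run_and_report(cmd: List[str]) -> int:
    """Run the application, send a custom message for known return codes,
    and return the application's exit status unchanged."""
    returncode = subprocess.run(cmd).returncode
    message = message_for(returncode)
    if message is not None:
        send_message(message)
    return returncode


if __name__ == "__main__":
    # Demo: a child process that exits with status 2 ("import failed").
    demo = [sys.executable, "-c", "import sys; sys.exit(2)"]
    print("exit status:", run_and_report(demo))
```

In a real job script the wrapper's result would be passed on with `sys.exit(run_and_report(cmd))`, so the task still fails (or succeeds) exactly as the application did, and the message carries the extra detail.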
Ah, nice, that's a good point! Thanks @hjoliver. I guess the crux of this issue then becomes making exit-code-specific triggering simpler and more explicit to set up, via the suite.rc instead of individual custom task messages.
It's perhaps a speculative one for the future, so I don't think there's much more to say right now!
> we can already achieve the same thing with custom task messages
Kinda, but also kinda not, as custom task messages aren't exit states, so they don't work particularly well as switches in workflows. They need to be combined with `:succeed` or whatever:
```
foo:succeed & foo:msg1 => bar
foo:succeed & foo:msg2 => baz
bar | baz => pub
```
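Why the `:succeed` term is needed can be shown with a toy model (not Cylc's implementation) that treats each conjunctive trigger as a set of required upstream outputs. The names `ready`, `BAR` and `BAZ` are illustrative only.

```python
from typing import Set


def ready(required: Set[str], completed: Set[str]) -> bool:
    """A conjunctive trigger (A & B => task) is satisfied once every
    required upstream output has been completed."""
    return required <= completed


BAR = {"foo:succeed", "foo:msg1"}  # foo:succeed & foo:msg1 => bar
BAZ = {"foo:succeed", "foo:msg2"}  # foo:succeed & foo:msg2 => baz

# foo sent msg1 and then succeeded: only bar is released.
done = {"foo:msg1", "foo:succeed"}
print(ready(BAR, done), ready(BAZ, done))  # True False

# foo sent msg1 but then failed: bar correctly stays waiting, whereas a
# message-only trigger (foo:msg1 => bar) would already have fired.
done = {"foo:msg1", "foo:fail"}
print(ready(BAR, done), ready({"foo:msg1"}, done))  # False True
```

A message is emitted as soon as the job sends it, so on its own it says nothing about how the job finished; pairing it with `:succeed` is what turns it into a reliable switch.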
This would definitely be a nice feature; I think we may have talked about it at a June meeting a couple of years back? I remember a discussion about how awkward it is to do this nicely at the moment, as `script` might not be set to a single executable but could be an inline bash script. There could also be `pre-script`, `init-script`, `env-script`, etc., any of which could have produced the non-zero return code.
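The ambiguity can be sketched with a simplified Python model that runs each job section as a separate process, stops at the first failure, and records which section produced the exit code. This is an assumption-laden illustration: the real Cylc job file runs its sections inside one bash shell, and `run_job`, `triggerable_code` and the section list are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_job(sections: List[Tuple[str, List[str]]]) -> Tuple[Optional[str], int]:
    """Run each (section_name, command) in order, stopping at the first failure.

    Returns (failed_section, exit_code); failed_section is None on success.
    """
    for name, cmd in sections:
        code = subprocess.run(cmd).returncode
        if code != 0:
            return name, code
    return None, 0


def triggerable_code(failed_section: Optional[str], exit_code: int) -> Optional[int]:
    """Pass through only an exit status that came from `script`; a failure
    in any other section is treated as a generic failure (None here)."""
    if failed_section == "script":
        return exit_code
    return None


def py(code: int) -> List[str]:
    # Helper: a command that simply exits with the given status.
    return [sys.executable, "-c", f"import sys; sys.exit({code})"]


job = [("pre-script", py(0)), ("script", py(2)), ("post-script", py(0))]
section, code = run_job(job)
print(section, code, triggerable_code(section, code))  # script 2 2
```

Tagging the code with its section is what lets a scheduler tell "the application returned 2" apart from "`pre-script` happened to return 2".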
> we can already achieve the same thing with custom task messages
>
> Kinda, but also kinda not as custom task messages aren't exit states so don't work particularly well as switches in workflows. They need to be combined with `:succeed` or whatever:
Indeed. Custom messages allow dependent tasks to be kicked off midway through execution of the triggering task, which is sometimes really useful (e.g. a polling task that waits for the forecast at successive lead times and kicks off their processing as each becomes available).
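The polling pattern mentioned above might look like the following Python generator, which yields one message per lead time as soon as its data appears. `poll_lead_times` and the `is_available` check (e.g. a file-existence test) are hypothetical; in a real task each yielded string would be sent as a custom task message so downstream processing of that lead time can start while the poller is still running.

```python
import time
from typing import Callable, Iterable, Iterator


def poll_lead_times(
    lead_times: Iterable[int],
    is_available: Callable[[int], bool],
    interval: float = 60.0,
) -> Iterator[str]:
    """Yield a message for each lead time (in hours) once its data is available,
    sleeping `interval` seconds between polls while any remain outstanding."""
    pending = list(lead_times)
    while pending:
        for lead in list(pending):  # iterate over a copy so we can remove items
            if is_available(lead):
                pending.remove(lead)
                yield f"forecast T+{lead:03d} available"
        if pending:
            time.sleep(interval)


# Usage with data that is already complete, so no sleeping happens.
for msg in poll_lead_times([0, 6, 12], lambda lead: True):
    print(msg)
```

Because the messages arrive one by one during the run, they give finer-grained triggering than any single exit status could.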
In the current set-up, the main issues are:

* `script` is not a single command, but a script fragment that can run multiple commands.

What can we do?

* `script` gets used to determine the return code: in https://github.com/cylc/cylc-flow/blob/db8872086857fd8d4ad5dff5b6765bb9c770dcb2/cylc/flow/etc/job.sh#L137-L139 we would capture the return code only when running the `script` part.

> Kinda, but also kinda not as custom task messages aren't exit states so don't work particularly well as switches in workflows. They need to be combined with `:succeed` or whatever
Kinda, but also kinda not, but also more kinda than kinda not. As I suggested, you would detect the underlying exit status in the script, then send the custom message before exiting (immediately or later; do what you need). So for this use case the custom message is more or less as good as a task exit status, and you don't need to worry about using the actual task exit status in the graph as well.
That's not to deny that proper exit statuses would be better, however! (Just saying it's easy enough to work around with current custom messages.)
@matthewrmshin's suggestion may be good (#3440 should allow us to capture the exit code from user scripts in a consistent manner).