Data which we would like available in the GraphQL schema:
- ~progress~ (do it client side, dt field)
- isHeld - #3230
- ~isRetry~ superseded by #3423 (execution retry delays or submission retry delays, not user intervention)
- status / status_msg - #3267

Note: these fields might not have a direct mapping onto data which is currently available to Cylc Flow internally. They might be awkward or not really possible at the moment.
Pull requests welcome!
This is an Open Source project - please consider contributing code yourself
(please read CONTRIBUTING.md before starting any work though).
In the past, we have avoided making too many changes to the internal representation of task state, mainly due to compatibility issues with the GUI representation. Now that the old GUI is gone, we should be in a much better position to work on this...
For held, the current internal representation is basically (status: str, hold_swap: str), so it can look like ("held", "waiting") (which turns back to ("waiting", None) on release). It would make more sense to change it to (status: str, is_held: bool), so we can get rid of the complex status swap logic.
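A minimal sketch of the two representations, using hypothetical class names (this is not Cylc's actual TaskState class). In the swap version the real status is parked in hold_swap while "held" occupies the status field, so any other status change made while the task is held has to be redirected into hold_swap; that is where the complexity comes from. With a boolean flag, holding and releasing never touch status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SwapTaskState:
    """Hold modelled by swapping the real status out (the hold_swap style)."""
    status: str = "waiting"
    hold_swap: Optional[str] = None

    def hold(self) -> None:
        # Park the real status in hold_swap and show "held" instead.
        if self.status != "held":
            self.hold_swap, self.status = self.status, "held"

    def release(self) -> None:
        # Restore the parked status and clear the swap slot.
        if self.status == "held" and self.hold_swap is not None:
            self.status, self.hold_swap = self.hold_swap, None


@dataclass
class FlagTaskState:
    """Hold modelled as an independent boolean (the proposed is_held style)."""
    status: str = "waiting"
    is_held: bool = False

    def hold(self) -> None:
        self.is_held = True  # status is left untouched

    def release(self) -> None:
        self.is_held = False


swap = SwapTaskState()
swap.hold()
print(swap.status, swap.hold_swap)   # held waiting
swap.release()
print(swap.status, swap.hold_swap)   # waiting None

flag = FlagTaskState()
flag.hold()
print(flag.status, flag.is_held)     # waiting True
```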
You are right about retry and submission retry needing more discussion. To me, they are basically ("waiting", is_held=True) status - the task is held by the next (submission) retry delay - and will be automatically released on completing the delay.
I also can't remember if retry and submission retry can be used as task outputs or not.
To me, they are basically ("waiting", is_held=True)
Presumably that's ("waiting", is_[sub_]retry=True)
I also can't remember if retry and submission retry can be used as task outputs or not.
They can't as far as I'm aware so we are safe there.
No, I did mean ("waiting", is_held=True) - the task is being held by a retry delay. The alternate view is simply a ("waiting", None) status - but now it has a new prerequisite in the form of a retry delay.
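To make the two views concrete, here is a minimal sketch with hypothetical function names (not Cylc code). It shows that a self-releasing hold and a retry-delay prerequisite gate the task in exactly the same way; they differ only in how the state is represented, and therefore in how it would be displayed.

```python
from datetime import datetime, timedelta, timezone


def runnable_via_self_releasing_hold(status: str, is_held: bool,
                                     retry_time: datetime, now: datetime) -> bool:
    """View 1: ("waiting", is_held=True), auto-released once the delay elapses."""
    if is_held and now >= retry_time:
        is_held = False  # the hold releases itself when the delay is over
    return status == "waiting" and not is_held


def runnable_via_clock_prerequisite(status: str, retry_time: datetime,
                                    now: datetime) -> bool:
    """View 2: plain ("waiting", None) plus a retry-delay prerequisite."""
    prerequisite_met = now >= retry_time
    return status == "waiting" and prerequisite_met


t0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
retry_at = t0 + timedelta(minutes=5)  # e.g. a PT5M retry delay
for now in (t0, retry_at):
    print(now.time(),
          runnable_via_self_releasing_hold("waiting", True, retry_at, now),
          runnable_via_clock_prerequisite("waiting", retry_at, now))
```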
(I am sure there are many ways to look at this problem. :smile:)
No, I did mean ("waiting", is_held=True)
I kinda get what you mean, but a held state won't make sense to the user; it might make some sense as an xtrigger, though.
This kinda comes down to data representation / UI, so I'll leak some cylc/cylc-ui stuff here. How do we represent retries? Here are five options off the top of my head; feel free to suggest others, I'm happy to mock them up:

#1 - custom icon for each retry state
+ clear separation of retry and task state
- more icons => more confusion
#2 - discrete retry symbol
+ clearer separation of retry and task state
- may be interesting to graphically represent a held retrying task (which is, of course, possible)
#3 - discrete held symbol
+ one less state to worry about
+ communicates what cylc is actually doing
- the user didn't actually hold the task and will be confused as to why it is held
- held retrying tasks...
#4 - do nothing
+ simple!
- confusing!
#5 - clock-face counting up to next retry time?
+ gives the user access to information that is otherwise hard to find
- information not available to GUI yet
- non-intuitive UI

The (submission) retry state is only applicable while the task is waiting for the clock. Once submitted, the multiple job icons should make it obvious that the task has been retried or re-triggered. Perhaps the job icons should display whether it is an automatic retry or a manual re-trigger? E.g. nothing for automatic retry and an M in the job icon for a manual re-trigger?
A little :hand: badge for manual?
Note we also discussed in Exeter (I think??) modifying the edge style in the graph view to indicate manual intervention (e.g. a task was manually triggered despite its prerequisites not being satisfied).
Perhaps the job icons should display whether it is an automatic retry or a manual re-trigger?
I think this would be good. Something else we have been asked for is whether we could display the retry number, e.g.:
1/∞ # infinite potential retries e.g. PT5M
3/4 # finite retries e.g. PT5M, 3*PT10M
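The retry counter above could be derived from the delay list with something like the following sketch. The helper names are hypothetical, and it assumes the "N*DURATION" multiplier notation used in the example expands to N retries, with None standing in for an unbounded retry count.

```python
import re
from typing import List, Optional

# Matches entries like "3*PT10M" (hypothetical parsing of the multiplier form).
_MULTIPLIER = re.compile(r"^\s*(\d+)\s*\*\s*(\S+)\s*$")


def count_retries(delays: List[str]) -> int:
    """Count retries in a delay list, expanding "N*DURATION" entries."""
    total = 0
    for delay in delays:
        match = _MULTIPLIER.match(delay)
        total += int(match.group(1)) if match else 1
    return total


def format_retry(attempt: int, max_retries: Optional[int]) -> str:
    """Render "attempt/max", where max_retries=None means unbounded."""
    limit = "∞" if max_retries is None else str(max_retries)
    return f"{attempt}/{limit}"


print(format_retry(3, count_retries(["PT5M", "3*PT10M"])))  # 3/4
print(format_retry(1, None))                                # 1/∞
```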
It would also be good to tell normally succeeded tasks apart from manually succeeded ones. This is helpful for troubleshooting operational suites in the heat of failures, where actions were taken by operators and the support team is called in after the fact. More generally, a clear visual indication at a glance of where user interaction happened in the suite (manual task trigger, succeed, insertion/deletion, etc.) would be quite useful.
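One way to support this is to record the origin of every state-changing action so the UI can badge tasks that an operator touched. The sketch below is hypothetical (the Origin enum, log class and "M" badge are illustrative, not an existing Cylc API), and the task IDs are made up.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Origin(Enum):
    AUTOMATIC = "automatic"  # normal scheduling or an automatic retry
    MANUAL = "manual"        # operator action (trigger, succeed, insert, ...)


@dataclass
class Intervention:
    task_id: str
    action: str
    origin: Origin


@dataclass
class InterventionLog:
    entries: List[Intervention] = field(default_factory=list)

    def record(self, task_id: str, action: str, origin: Origin) -> None:
        self.entries.append(Intervention(task_id, action, origin))

    def manual_actions(self, task_id: str) -> List[str]:
        # Only actions that an operator performed on this task.
        return [e.action for e in self.entries
                if e.task_id == task_id and e.origin is Origin.MANUAL]


def badge(log: InterventionLog, task_id: str) -> str:
    """Return "M" if any manual action touched the task, else no badge."""
    return "M" if log.manual_actions(task_id) else ""


log = InterventionLog()
log.record("foo.1", "succeeded", Origin.AUTOMATIC)
log.record("bar.1", "succeeded", Origin.MANUAL)  # operator set it succeeded
print(repr(badge(log, "foo.1")), repr(badge(log, "bar.1")))
```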
@dwsutherland asked for thoughts in a comment that I will cross-post to leave as a question for those following this Issue (it doesn't strictly relate to this Issue, but it seemed suitable enough for re-raising on cylc-flow, where those who know more than I do about the plans for the task/job data side can comment):
In the job pool (store of job data elements), the creation of an element happens just before job submission, so I added the "ready" state to them.
I guess this relates to TASK_STATUS_READY in the following (correct me if I am wrong, David, thanks)?
@sadielbartholomew - Correct. Jobs are usually submitted soon after creation, but there is a gap between job file creation (where/when I create the data element alongside it) and submission, so "ready" made sense.
Not sure I understand the "ready" state discussion above. The "ready" state means "ready to run" ... i.e. prerequisites satisfied and queued to the subprocess pool for job submission. If the subprocess pool is small and/or you have a bunch of long-running processes executing in it (e.g. slow event handlers) then tasks can stay in the "ready" state for a while. The moment of job file creation doesn't really have task state implications.
(The ready state was called the submitting state in the distant past.)
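The point about tasks lingering in "ready" can be illustrated with a simplified model of the subprocess pool as a fixed number of slots. This is a hypothetical sketch, not Cylc's actual pool implementation: tasks whose prerequisites are satisfied sit in a ready queue and only become "submitted" when a slot is free.

```python
from collections import deque
from typing import Deque, Dict, List


def submit_from_ready(ready: Deque[str], statuses: Dict[str, str],
                      busy_slots: int, pool_size: int) -> List[str]:
    """Move ready tasks to submitted only while the pool has free slots."""
    free = max(pool_size - busy_slots, 0)
    submitted = []
    while ready and free > 0:
        task = ready.popleft()
        statuses[task] = "submitted"
        submitted.append(task)
        free -= 1
    return submitted


# Three tasks have satisfied prerequisites and are queued for submission.
statuses = {"a": "ready", "b": "ready", "c": "ready"}
ready = deque(["a", "b", "c"])
# Slow event handlers occupy 3 of the 4 slots, so only one task gets through.
print(submit_from_ready(ready, statuses, busy_slots=3, pool_size=4))
print(statuses)
```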
Jobs have states too... Job file creation has job state implications: "ready to submit".
And we can even complicate matters by adding the (future) trigger-edit workflow to the mix:
(What's the status at the various stages?)
At the moment, the job data element is:
- ready state on job file write.

So job states are a subset of task states, although "ready" means something slightly different, I suppose.
(not saying this is how it should be of course)
Jobs have states too... Job file creation has job state implications: "ready to submit".
Hmmm. Not necessarily. I would have thought that a job does not exist until the moment it is submitted (and job file creation is something that the task does before that).
I'm really talking about task and job states that users need to be aware of. That doesn't necessarily mean we don't need job-related stuff in the back end beyond those states, but I don't think we should refer to those as "job states", in the interest of avoiding confusion.
Options for dealing with the "retry" state.
- TaskState called is_retry (similar to is_held).
- is_held logic.

I like the wallclock xtrigger idea. In that case, if a task fails and has a retry delay lined up, can we just do this:
So there's really no need for a special retry attribute or use of the "held" state (the trouble with held is, it would need to be a self-releasing hold, which is weird).
We could use a special variant of the wallclock xtrigger, that takes an absolute time instead of a cycle point offset, then we could easily tell the difference (for display purposes) between a normal clock trigger and a retry one.
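A minimal sketch of such an absolute-time clock trigger, assuming hypothetical names (AbsoluteClockTrigger, retry_trigger and the is_retry marker are illustrative, not the existing wallclock xtrigger API). The failed task simply goes back to "waiting" behind this trigger, and the marker is what would let the UI distinguish a retry delay from an ordinary clock trigger.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AbsoluteClockTrigger:
    """Wall-clock trigger keyed on an absolute time, not a cycle-point offset."""
    trigger_time: datetime
    is_retry: bool = False  # lets the UI tell retry delays from clock triggers

    def is_satisfied(self, now: datetime) -> bool:
        return now >= self.trigger_time


def retry_trigger(failed_at: datetime, delay: timedelta) -> AbsoluteClockTrigger:
    """On failure with a retry lined up, wait behind a trigger at failure + delay."""
    return AbsoluteClockTrigger(failed_at + delay, is_retry=True)


failed_at = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
trigger = retry_trigger(failed_at, timedelta(minutes=10))  # e.g. PT10M
print(trigger.is_satisfied(failed_at + timedelta(minutes=5)))   # False
print(trigger.is_satisfied(failed_at + timedelta(minutes=10)))  # True
```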
@matthewrmshin -
(The ready state was called the submitting state in the distant past.)
Ha, I'm suggesting going back to that https://github.com/cylc/cylc-admin/pull/47
On "retry" again: if we just use waiting state plus clock trigger, the new job status icons will show definitively that the task is going to retry (you'll see the previous failed job, but the task state is waiting, not failed). Nice :+1:
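With that approach, "going to retry" does not need its own state: it can be read off the combination of task status and latest job state. A hypothetical sketch (the function name is illustrative, and the "submit-failed" job state name is an assumption):

```python
from typing import List, Optional


def retry_pending(task_status: str, job_states: List[str]) -> bool:
    """True if the latest job failed but the task is back to waiting."""
    latest: Optional[str] = job_states[-1] if job_states else None
    return task_status == "waiting" and latest in ("failed", "submit-failed")


print(retry_pending("waiting", ["failed"]))  # previous job failed, will retry
print(retry_pending("failed", ["failed"]))   # failed for good, no retry lined up
print(retry_pending("waiting", []))          # never submitted yet
```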
What if the task state is ready or queued? Wouldn't you think it's misleading to have a job state of submitted? To me, submitted implies handing over a script/job to the batch system.
@dwsutherland - submitting not submitted
The idea is that once a task's prerequisites are satisfied, we go through the process of submitting it (which may take some time), after which it is indeed submitted (to the batch system).
(I think the original change of terminology from "submitting" to "ready" was because technically we are submitting the job only when running the qsub process (e.g.) which happens at the end of the "ready" state. But that is probably just splitting hairs as far as users are concerned.)
Still, do you want job state submitting while a task is queued?
?? I don't follow you. Submitting (aka ready) and queued are two different task states.
Oh, sorry, you said job state, not task state.
There is no job state until the task is submitted.
So you think the respective data element created before should have an empty state field?
I'm just talking about the official set of task and job status names that will be exposed to users, and what they mean, exactly. Presumably you already have null job states alongside other task states like "waiting", or is your question really about when the job "data element" should be created? (If the latter, then I guess it should be created when the task achieves the "submitted" state).
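A minimal sketch of the option suggested above, with hypothetical class names (not the actual job pool code): the job data element is only created when the task reaches "submitted", so the job state is simply None for every earlier task state.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobElement:
    submit_num: int
    state: str = "submitted"  # a job only exists once it has been submitted


@dataclass
class TaskElement:
    status: str = "waiting"
    jobs: List[JobElement] = field(default_factory=list)

    def set_status(self, status: str) -> None:
        self.status = status
        if status == "submitted":
            # Create the job data element at submission, not at job-file write.
            self.jobs.append(JobElement(submit_num=len(self.jobs) + 1))

    @property
    def job_state(self) -> Optional[str]:
        return self.jobs[-1].state if self.jobs else None  # None before submission


task = TaskElement()
for status in ("waiting", "queued", "ready", "submitted"):
    task.set_status(status)
    print(status, task.job_state)
```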
The requested fields have either been implemented or superseded so closing this issue.