Cylc-flow: graphql: wishlist

Created on 16 Jul 2019 · 31 Comments · Source: cylc/cylc-flow

Data which we would like available in the GraphQL schema:

[ ] ~progress~ (do it client side)
- Job progress as percent or decimal, to complement the dt field.
[x] isHeld - #3230
- When a task is held its previous state is stored; when it is un-held that state is restored.
- For GraphQL it would be better to leave the task state unchanged but add a field to show if the task is held or not.
- Simple to implement: change the current "swap" logic (see https://github.com/cylc/cylc-flow/issues/3223#issuecomment-511825969)
[ ] ~isRetry~ superseded by #3423
- Currently retrying is a strange state which a task may pass through very quickly.
- For GraphQL it would be better to leave the task state unchanged but add a field to show if the task is attempting retry.
- Note: retry relates to Cylc's execution retry delays or submission retry delays, not to user intervention.
- This one might require more discussion.
...
[x] status / status_msg - #3267
- Separate the suite status and status message

Note: these fields might not have a direct mapping onto data which is currently available to Cylc Flow internally. They might be awkward or not really possible at the moment.

Pull requests welcome!
This is an Open Source project - please consider contributing code yourself
(please read CONTRIBUTING.md before starting any work though).

Source

oliver-sanders

Most helpful comment

It would be also good to tell apart normally succeeded tasks from manually succeeded ones. This is helpful for troubleshooting operational suites in the heat of failures, where actions were taken by operators and the support team is called in after the fact. Or more generally, having a clear, visual indication at a glance of where user interaction happened in the suite (manual task trigger, succeed, insertion/deletion, etc.) would be quite useful.

TomekTrzeciak on 18 Jul 2019

👍3

All 31 comments

In the past, we have avoided making too many changes to the internal representation of task state, mainly due to compatibility issues with the GUI representation. Now that the old GUI is gone, we should be in a much better position to work on this...

For held, the current internal representation is basically (status: str, hold_swap: str) so it can look like ("held", "waiting") (which turns back to ("waiting", None) on release). It would make more sense to change it to (status: str, is_held: bool) - so we can get rid of the complex status swap logic.
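As a rough illustration of the difference between the two representations, here is a minimal sketch. The class names `SwapState` and `FlagState` and their methods are hypothetical and are not taken from the cylc-flow code base; only the tuple shapes come from the comment above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SwapState:
    """Current scheme as described above: (status, hold_swap)."""
    status: str
    hold_swap: Optional[str] = None

    def hold(self) -> None:
        if self.status != "held":
            # Stash the real status and overwrite it with "held".
            self.status, self.hold_swap = "held", self.status

    def release(self) -> None:
        if self.status == "held":
            # Restore the stashed status and clear the swap slot.
            self.status, self.hold_swap = self.hold_swap, None


@dataclass
class FlagState:
    """Proposed scheme: (status, is_held); the status is never overwritten."""
    status: str
    is_held: bool = False

    def hold(self) -> None:
        self.is_held = True

    def release(self) -> None:
        self.is_held = False


old = SwapState("waiting")
old.hold()      # old is now ("held", "waiting")
new = FlagState("waiting")
new.hold()      # new is now ("waiting", is_held=True)
```

With the flag version, a client can always read the underlying status directly, and holding or releasing never has to touch it.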

You are right about retry and submission retry needing more discussion. To me, they are basically ("waiting", is_held=True) status - the task is held by the next (submission) retry delay - and will be automatically released on completing the delay.

I also can't remember if retry and submission retry can be used as task outputs or not.

matthewrmshin on 16 Jul 2019

To me, they are basically ("waiting", is_held=True)

Presumably that's ("waiting", is_[sub_]retry=True)

I also can't remember if retry and submission retry can be used as task outputs or not.

They can't as far as I'm aware so we are safe there.

oliver-sanders on 16 Jul 2019

No, I did mean ("waiting", is_held=True) - the task is being held by a retry delay. The alternate view is simply a ("waiting", None) status - but now it has a new prerequisite in the form of a retry delay.

(I am sure there are many ways to look at this problem. :smile:)

matthewrmshin on 16 Jul 2019

No, I did mean ("waiting", is_held=True)

I kinda get what you mean but a held state won't make sense to the user, might make some sense as an xtrigger though.

This kinda comes down to data representation / UI so I'll leak some cylc/cylc-ui stuff here. How do we represent retries? Here are five options off the top of my head, feel free to suggest others, I'm happy to mock them up:


#1 - custom icon for each retry state.
- + clear separation of retry and task state.
- - more icons => more confusion.
#2 - discrete retry symbol
- + clearer separation of retry and task state
- - may be interesting to graphically represent a held retrying task (which is, of course, possible)
#3 - discrete held symbol
- + one less state to worry about
- + communicates what cylc is actually doing
- - the user didn't actually hold the task and will be confused as to why it is held
- - held retrying tasks...
#4 - do nothing
- + simple!
- - confusing!
#5 - clock-face counting up to next retry time?
- + gives user access to information, otherwise hard to find
- - information not available to GUI yet
- - non-intuitive UI

oliver-sanders on 16 Jul 2019

The (submission) retry state is only applicable while the task is waiting for the clock. Once submitted, the multiple job icons should make it obvious that the task has been retried or re-triggered. Perhaps the job icons should display whether it is an automatic retry or a manual re-trigger? E.g. nothing for automatic retry and an M in the job icon for a manual re-trigger?

matthewrmshin on 16 Jul 2019

👍1

E.g. nothing for automatic retry and an M in the job icon for a manual re-trigger?

A little :hand: badge for manual?

Note we also discussed in Exeter modifying edge style in the graph view (I think??), to indicate manual intervention (e.g. task was manually triggered despite prerequisites not being satisfied).

hjoliver on 17 Jul 2019

Perhaps the job icons should display whether it is an automatic retry or a manual re-trigger?

I think this would be good, something else we have been asked for is if we could display the retry number e.g:

1/∞  # infinite potential retries e.g. PT5M
3/4  # finite retries e.g. PT5M, 3*PT10M
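A counter label like the ones above could be produced client side along these lines. This is only a sketch: `format_retry` and its parameters are hypothetical names, and it assumes the client already knows the current try number and the total number of tries (or that there is no limit).

```python
from typing import Optional


def format_retry(attempt: int, max_tries: Optional[int]) -> str:
    """Render a try counter such as "3/4", or "1/∞" when there is no limit."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    # None is used here to mean "no upper limit on the number of tries".
    limit = "∞" if max_tries is None else str(max_tries)
    return f"{attempt}/{limit}"


label_unbounded = format_retry(1, None)
label_bounded = format_retry(3, 4)
```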

oliver-sanders on 17 Jul 2019

👍2

TomekTrzeciak on 18 Jul 2019

👍3

@dwsutherland asked for thoughts in a comment, which I am cross-posting here as a question for those following this Issue. It doesn't strictly relate to this Issue, but I was looking for a suitable enough place on cylc-flow to re-raise it, where those who know more than I do about the plans for the task/job data side can comment:

In the job pool (store of job data elements); the creation of an element happens just before job submission, so I added the "ready" state to them..

I guess this relates to TASK_STATUS_READY in the following (correct me if I am wrong, David, thanks)?

https://github.com/cylc/cylc-flow/blob/aae31121d173f0685c4fcf46a9815432d050dbf2/cylc/flow/job_pool.py#L33-L45

sadielbartholomew on 19 Jul 2019

@sadielbartholomew - Correct, jobs are usually submitted soon after creation, but there is a gap between job file creation (which is where/when I create the data element) and submission, so "ready" made sense.

dwsutherland on 19 Jul 2019

👍1

Not sure I understand the "ready" state discussion above. The "ready" state means "ready to run" ... i.e. prerequisites satisfied and queued to the subprocess pool for job submission. If the subprocess pool is small and/or you have a bunch of long-running processes executing in it (e.g. slow event handlers) then tasks can stay in the "ready" state for a while. The moment of job file creation doesn't really have task state implications.

hjoliver on 24 Jul 2019

👍1

(The ready state was called the submitting state in the distant past.)

matthewrmshin on 24 Jul 2019

Not sure I understand the "ready" state discussion above. The "ready" state means "ready to run" ... i.e. prerequisites satisfied and queued to the subprocess pool for job submission. If the subprocess pool is small and/or you have a bunch of long-running processes executing in it (e.g. slow event handlers) then tasks can stay in the "ready" state for a while. The moment of job file creation doesn't really have task state implications.

Jobs have states too... Job file creation has job state implications "ready to submit"..

dwsutherland on 24 Jul 2019

And we can even complicate matters by adding the (future) trigger-edit workflow to the mix:

Put task on hold.
Write job file.
Return job file to client.
(Client edits job file content.)
Client uploads edited job file.
Verify uploaded job file.
Release task.
Submit job.

(What's the status at the various stages?)

matthewrmshin on 24 Jul 2019

At the moment, the job data element is:

Created with the ready state on job file write.
Deleted on backout (entire job element).
State changed by the same mechanism that changes the task state (for active states).

So job states are a subset of task states, although ready means something slightly different I suppose.
(not saying this is how it should be of course)
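The lifecycle described above could be sketched roughly as follows. `JobStore`, its method names, the job key, and the exact set of states in `JOB_STATES` are all assumptions made for illustration; they are not the contents of `cylc/flow/job_pool.py`.

```python
from typing import Dict, Tuple

# Hypothetical key: (task id, submit number).
JobKey = Tuple[str, int]

# Assumed subset of task states that also apply to jobs.
JOB_STATES = {
    "ready", "submitted", "submit-failed", "running", "succeeded", "failed",
}


class JobStore:
    """Toy store of job data elements, keyed by job."""

    def __init__(self) -> None:
        self.jobs: Dict[JobKey, str] = {}

    def on_job_file_write(self, key: JobKey) -> None:
        # The element is created in the "ready" state when the job file is written.
        self.jobs[key] = "ready"

    def on_backout(self, key: JobKey) -> None:
        # Backing out deletes the whole element.
        self.jobs.pop(key, None)

    def on_task_state_change(self, key: JobKey, status: str) -> None:
        # Job state follows task state, but only for states in the job subset.
        if key in self.jobs and status in JOB_STATES:
            self.jobs[key] = status


store = JobStore()
store.on_job_file_write(("foo.1", 1))
store.on_task_state_change(("foo.1", 1), "running")
```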

dwsutherland on 25 Jul 2019

Jobs have states too... Job file creation has job state implications "ready to submit"..

Hmmm. Not necessarily. I would have thought that a job does not exist until the moment it is submitted (and job file creation is something that the task does before that).

hjoliver on 25 Jul 2019

I'm really talking about task and job states that users need to be aware of. Which doesn't necessarily mean we don't need job-related stuff in the back end beyond those states. But I don't think we should refer to those as "job states" ... in the interest of avoiding confusion.

hjoliver on 25 Jul 2019

Options for dealing with the "retry" state.

An attribute of the TaskState called is_retry (similar to is_held).
Attempt to meld the retry state into the is_held logic.
Before a retry place a wallclock xtrigger dependency on the task (which will appear in the graph).

oliver-sanders on 15 Aug 2019

I like the wallclock xtrigger idea. In that case, if a task fails and has a retry delay lined up can we just do this:

add the appropriate wallclock xtrigger
return the task to the "waiting" state

So there's really no need for a special retry attribute or use of the "held" state (the trouble with held is, it would need to be a self-releasing hold, which is weird).

We could use a special variant of the wallclock xtrigger, that takes an absolute time instead of a cycle point offset, then we could easily tell the difference (for display purposes) between a normal clock trigger and a retry one.
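To make the idea concrete, here is a minimal sketch of a failed task going back to "waiting" behind an absolute-time trigger. The `Task` class, its fields, and its methods are hypothetical stand-ins, not Cylc's task proxy or xtrigger API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Task:
    """Hypothetical stand-in for a task proxy; not Cylc's real class."""
    name: str
    retry_delays: List[timedelta]
    status: str = "waiting"
    # Absolute wall-clock times that must pass before the task can run again.
    retry_triggers: List[datetime] = field(default_factory=list)

    def on_job_failed(self, now: datetime) -> None:
        if self.retry_delays:
            # A retry is lined up: add an absolute-time trigger and
            # return the task to "waiting".
            self.retry_triggers.append(now + self.retry_delays.pop(0))
            self.status = "waiting"
        else:
            # No retries left: the failure is final.
            self.status = "failed"

    def is_ready(self, now: datetime) -> bool:
        # The task can run once every retry trigger time has passed.
        return self.status == "waiting" and all(
            trigger <= now for trigger in self.retry_triggers
        )


t0 = datetime(2019, 8, 15, tzinfo=timezone.utc)
task = Task("foo", [timedelta(minutes=5)])
task.on_job_failed(t0)  # back to "waiting", runnable again five minutes later
```

Because the trigger stores an absolute time rather than a cycle point offset, a display layer could tell it apart from an ordinary clock trigger, as suggested above.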

hjoliver on 15 Aug 2019

@matthewrmshin -

(The ready state was called the submitting state in the distant past.)

Ha, I'm suggesting going back to that https://github.com/cylc/cylc-admin/pull/47

hjoliver on 15 Aug 2019

On "retry" again: if we just use waiting state plus clock trigger, the new job status icons will show definitively that the task is going to retry (you'll see the previous failed job, but the task state is waiting, not failed). Nice :+1:

hjoliver on 15 Aug 2019

@matthewrmshin -

(The ready state was called the submitting state in the distant past.)

Ha, I'm suggesting going back to that cylc/cylc-admin#47

What if the task state is ready or queued, wouldn't you think it's misleading to have a job state submitted? To me submitted implies the handing over of a script/job to the batch system ..

dwsutherland on 15 Aug 2019

@dwsutherland - submitting not submitted

The idea is that once a task's prerequisites are satisfied, we go through the process of submitting it (which may take some time), after which it is indeed submitted (to the batch system).

hjoliver on 15 Aug 2019

(I think the original change of terminology from "submitting" to "ready" was because technically we are submitting the job only when running the qsub process (e.g.) which happens at the end of the "ready" state. But that is probably just splitting hairs as far as users are concerned.)

hjoliver on 15 Aug 2019

Still, do you want job state submitting while a task is queued?

dwsutherland on 15 Aug 2019

?? I don't follow you. Submitting (aka ready) and queued are two different task states.

hjoliver on 15 Aug 2019

Oh, sorry, you said job state, not task state.

hjoliver on 15 Aug 2019

There is no job state until the task is submitted.

hjoliver on 15 Aug 2019

There is no job state until the task is submitted.

So you think the respective data element created before should have an empty state field?

dwsutherland on 15 Aug 2019

I'm just talking about the official set of task and job status names that will be exposed to users, and what they mean, exactly. Presumably you already have null job states alongside other task states like "waiting", or is your question really about when the job "data element" should be created? (If the latter, then I guess it should be created when the task achieves the "submitted" state).

hjoliver on 15 Aug 2019

The requested fields have either been implemented or superseded so closing this issue.

oliver-sanders on 15 Jul 2020
