Cylc-flow: cli: play|pause|stop

Created on 27 Oct 2020  路  35Comments  路  Source: cylc/cylc-flow

Rename the existing workflow lifecycle commands to match the rose suite-run migration proposal and implement the other CLI changes.

I.E. do what it says in the proposal under the following sections:

  • cylc play

    • With the exception of the --re-run option - #3898

  • cylc pause
  • cylc release

All 35 comments

Not sure why this didn't occur to us before, but cylc resume makes more sense than release, as a counterpart to cylc pause.

  • cylc hold and cylc release for tasks
  • cylc pause and cylc resume for the flow

Update: the proposal is now a little out of date here is the current proposal for cylc play:

  • Rename cylc run => cylc play
  • Remove the START_POINT optional positional argument.
  • Start the flow in "restart" mode if the database is present & delete the cylc restart command.
  • Unpause the flow if already running and exit(0).

There are two loose ends to tie up:

1) The proposal suggests erroring if a flow is started that has already run to completion.

It is tricky to determine whether a flow has run to completion or not. The simplest solution is just to restart the flow anyway, if it has already run to completion it will just shut down.

2) --ignore-*-cycle-point

`cylc run` has these four options:

```
--initial-cycle-point
--final-cycle-point
--start-cycle-point
--stop-cycle-point
```

And `cylc restart` has these corresponding options:

```
--ignore-initial-cycle-point
--ignore-final-cycle-point
--ignore-start-cycle-point
--ignore-stop-cycle-point
```

Suggest removing the `--ignore*` opts and instead using `--*-cycle-point=ignore` (this is safe as it will fail for a cold-start as it is an invalid cycle point). Setting the `--final-cycle-point` on a restart should implicitly ignore the old one.

Would also suggest rationalising the many aliases for the initial and final cycle point:

```
--initial-cycle-point, --icp
--final-cycle-point, --fcp
--start-cycle-point, --stopcp
--stop-cycle-point, --startcp
```

Do we want to keep an alias run -> play or restart -> play?

Nope, but we should add a "dead end" (see cylc.flow.scripts.cylc)

What happens to warm/cold starts? Am I right in thinking the concept of warm start disappears?

Also, should the alias start -> run be updated to start -> play or removed?

The desire is to get rid of warm starts, however, I'm not sure that we can do that right now.

I think we should get rid of the aliases.

Just to note down some further info Oliver sent me:

The hold/release commands do two things, they can hold/release a flow OR tasks within a flow. These seem similar but are actually very different concepts. hold/release a flow is setting the state of the task pool. hold/release a task is adding/removing xtriggers to tasks (currently implemented a little differently).

The idea is to pull the flow and task features apart.

  • play/pause (flow state)
  • hold/release (task state)

What's the difference between

  • cylc run --warm SUITE [START_CYCLE_POINT] and
  • cylc run SUITE --startcp=START_CYCLE_POINT?

From looking at the code & docs, they seem to be identical, in which case the --warm option can be dropped?

Yep. In fact the command help says --startcp "implies --warm". The term "warm start" comes from the modeling terminology for warm starting a cycling model, of course. But in terms of Cylc and the graph, "start cycle point" probably is a better term - more generic, and descriptive of what Cylc is actually doing. (Note also spawn-on-demand will make warm start obsolete, except perhaps as a convenience where multiple tasks have to be triggered at once to start the flow).

It's a much nicer interface, should we drop the --warm option and move the warm documentation onto the --startcp argument?

Ah ok, so the concept of warm start stays but the cli option goes

Would also suggest rationalising the many aliases for the initial and final cycle point

Does that mean I should get rid of:

  • --ict (icp)
  • --initial-point (icp)
  • --until (fcp)
  • --final-point (fcp)
  • --start-point (startcp)
  • --stop-point (stopcp)

(Although from the changelog, --until was only just added in 8.0a1!)

--until goes back a while, the changelog says:

--until=POINT option is now an alias for --final-cycle-point=POINT option.

Should it be possible to hold tasks after a certain point, like it is possible to hold/pause a workflow after a certain point?

Also, what are some good concrete example(s) of cylc hold to put in the docstring, rather than cylc hold REG TASK_GLOBS ... which is identical to the ARGS?

Should it be possible to hold tasks after a certain point

This is the --after option?

Yep we want to keep that although it should be an argument of cylc pause rather than cylc hold because it is fiddling with task_pool state rather than that of individual tasks.

I've just noticed that the documentation for cylc hold and the hold mutation do not match. It is definitely a cycle point not a wallclock time (implemented in cylc.flow.scheduler.Scheduler.hold_suite).

Also, what are some good concrete example(s) of cylc hold to put in the docstring

How about this:

# hold mytask at cycle 1234 in myflow
$ cylc hold myflow mytask.1234

# hold all tasks in cycle 1234 in myflow
$ cylc hold myflow '*.1234'

# hold all active instances of mytask in myflow
$ cylc hold myflow 'mytask.*'

If a workflow has some held tasks (but not all, i.e. it is not paused), should running cylc play release the held tasks or do nothing?

Do nothing, cylc play shouldn't go messing with task states.

For cylc play, what should happen to --hold and --hold-point/--hold-after? Should they be replaced by --pause and --pause-point/--pause-after?

Also, in line with the other aliases, should --hold-point/--pause-point change to --hold-cycle-point/--pause-cycle-point and --holdcp/--pausecp?

If a workflow has some held tasks (but not all, i.e. it is not paused), should running cylc play release the held tasks or do nothing?

Also, if a workflow is paused, what should cylc release task.n do? Should it go ahead and release the task(s)? If so, wouldn't that imply the task pool is not held anymore, therefore the workflow is no longer paused?

if a workflow is paused, what should cylc release task.n do? Should it go ahead and release the task(s)?

It should release the task which shouldn't run because the scheduler is "held".

So the behaviour for the logical combinations of play/pause and hold/release would be:

$ cylc install x

# start a Scheduler pre-paused
# (a.1 won't run because Scheduler paused)
$ cylc play --pause x

# hold task a.1
# (a.1 won't run because Scheduler paused and task held)
$ cylc hold x a.1

# un-pause the Scheduler
# (a.1 won't run because task held)
$ cylc play x

#聽release task a.1
# (a.1 will run)
$ cylc release x a.1

However, as you've noticed, that's not how things work at the moment. For now it's a good start just to separate the interfaces, we can fix the internals later, however, if you fancy taking a deeper dive it should be fairly straight forward to change the behaviour of the task_pool hold mechanism.

The tasks are actually submitted from somewhere in the Scheduler, that's the bit you will need to skip if Scheduler.pool.held is True. You may need to do something to get the workflow state set correctly, not sure how that gets done nowadays.

Another thing: I suppose flow.cylc[scheduling]hold after cycle point will have to be deprecated in favour of pause after cycle point

However, as you've noticed, that's not how things work at the moment.

Yeah, it's all done by holding tasks at the moment, needs to be fixed.

Question about final cycle point: say you have the workflow

[scheduling]
    cycling mode = integer
    final cycle point = 5
    [[graph]]
        P1 = foo

Then you run:

  1. shell $ cylc run temp3 -n --fcp=2 INFO - Cold Start 1 ... INFO - Initial point: 1 INFO - Final point: 2 ... INFO - [foo.1] -triggered off [] INFO - [foo.2] -triggered off [] ... INFO - Suite shutting down - AUTOMATIC
  1. shell $ cylc restart temp3 -n --ignore-final-cycle-point INFO - LOADING suite parameters ... INFO - Initial point: 1 INFO - Final point: 5 INFO - Suite shutting down - AUTOMATIC INFO - DONE

In (2), nothing runs. Is this meant to happen?

Tested on master, d6a09baa93d386a41a7e781d9f372e1347f4d52a


P.S., in (2), in the snipped bit (...), I get this:

INFO - LOADING task action timers
INFO - + foo.1 poll_timer
WARNING - foo.1: task not found, skip
INFO - + foo.1 ['try_timers', 'submission-retry']
WARNING - foo.1: task not found, skip
INFO - + foo.1 ['try_timers', 'execution-retry']
WARNING - foo.1: task not found, skip
INFO - + foo.2 poll_timer
WARNING - foo.2: task not found, skip
INFO - + foo.2 ['try_timers', 'submission-retry']
WARNING - foo.2: task not found, skip
INFO - + foo.2 ['try_timers', 'execution-retry']
WARNING - foo.2: task not found, skip

Why so many warnings?

In Cylc7 the workflow would pick up where it left off and continue to the configured FCP (i.e. 5). I think this is an SoD quirk perhaps caused by restarting with an empty task pool?

Think that should go up as a bug.

Ok, as discussed in our meeting, if that's a bug it can be fixed independently of #4040

In Cylc7 the workflow would pick up where it left off and continue to the configured FCP (i.e. 5). I think this is an SoD quirk perhaps caused by restarting with an empty task pool?

Think that should go up as a bug.

I'm not so sure that's a bug! The workflow ran to completion at FCP=2. A restart starts from the previous recorded state, which was already "completed", so nothing will happen.

It makes some sense to restart with an extended FCP if you haven't already run to completion, but not if you have.

Also, what if you'd already run special tasks in the last cycle(s)? The only sensible way to do this kind of a "restart" would be to do warm start part-way through the new (extended) graph.

Ah sorry, miss read, I thought the flow had been stopped before the fcp. After the cylc play work is finished we would expect the "restart" to be rejected since the flow has already run to completion so I don't think this circumstance should be possible going forward?

I see, the real test of whether --fcp=ignore works is:

[scheduling]
    runahead limit = P1
    cycling mode = integer
    final cycle point = 5
    stop after cycle point = 2
    [[graph]]
        P1 = foo

Then

  1. $ cylc play temp3 -n --fcp=3 runs to cycle point 2 and shuts down
  2. $ cylc play temp3 -n --fcp=ignore restarts at cycle point 3 and runs to 5
  1. $ cylc play temp3 -n --fcp=ignore restarts at cycle point 3 and runs to 5

It wouldn't stop at point 5, because you told it to ignore the fcp.

But the ignore means ignore the fcp stored in the DB only, reverting to whatever is in flow.cylc?

From cylc restart --help:

                        Ignore the final cycle point in the suite run
                        database. If one is specified in the suite definition
                        it will be used, however.

if a workflow is paused, what should cylc release task.n do? Should it go ahead and release the task(s)?

It should release the task which shouldn't run because the scheduler is "held".

(full comment here)

Just to seek some further clarity on that - the idea of "cylc pause pauses the workflow by holding all tasks" should become "it pauses the workflow"?

Not sure where the "pauses the workflow by holding all tasks" quote comes from?

We would like cylc pause to pause the workflow but not to pause any of the tasks within it.

Not sure where the "pauses the workflow by holding all tasks" quote comes from?

That was my initial understanding of "pause" based on the pre-existing behaviour

Actually, if pause does not hold tasks, how can we pause after a particular cycle point? (E.g. cylc play --pause-after=POINT, cylc pause --after=POINT, flow.cylc[scheduling]pause after cycle point) Depending on the runahead limit, there would be nothing to stop tasks running beyond the pause point if any one task still hasn't reached it

Was this page helpful?
0 / 5 - 0 ratings