Cylc-flow: runahead limit / max active cycle points

Created on 30 Jun 2020  路  20Comments  路  Source: cylc/cylc-flow

After discussion on Riot:

We should consider re-aligning max active cycle points to match the SoD implementation (and the configuration name).

Context:

The runahead limit and max active cycle points configurations both limit the span between the oldest and youngest active cycle points.

From the docs for max active cycle points:

It allows up to N (default 3) consecutive cycle points to be active at any time, adjusted up if necessary for any future triggering.

The key word here is consecutive.

So runahead limit and max active cycle points actually both do the same thing, just that one uses integer semantics and the other, date-time.

Issues:

  1. The consecutive aspect of max active cycle points doesn't gel well with SoD. [HO: triggering a reflow in past cycles will completely stop new cycles, depending on the limit; same goes for SoS really, if you insert-and-trigger an old task]
  2. The name max active cycle points is misleading since it's nothing to do with the number of active cycle points.

Questions:

  1. Should we combine the current max active cycle points configuration in with runahead limit?

    • runahead limit = P1D

    • runahead limit = P3 (equivalent to [current consecutive] max active cycle points = 3)

    • This would prevent confusing dual-specification of two orthogonal runahead limits. yes

  2. Should we add a non-consecutive limit which lives up to the name max active cycle points? yes

    • Is there a good use case for this?

Most helpful comment

active cycle point limit?

All 20 comments

Questions

  1. yes, good idea
  2. yes, if we can't think of a good reason not to (I will see if I can find out why we changed to the consecutive version in the first place)

The use case for non-consecutive max active cycle points is pretty general, it seems to me: we should be aiming for maximum concurrency within prescribed limits; and why would we stop two tasks from running concurrently when they are (say) 3 cycle points apart but allow them to do it if (say) 2 cycle points apart?

If we allow non-consecutive max active cycle points, should we sort active and waiting(runahead) tasks into cycle point order and prioritize older points? (Older would tend to allow a reflow to get done first, and if you trigger a reflow in the first place that is probably what you want?)

should we sort active and waiting(runahead) tasks into cycle point order and prioritize older points

I would have thought so.

There's also the question of defaults, personally I would prefer runahead limit and max active cycle points to default to None. Unfortunately that might not be advisable due to graphs like this:

c[-P1] => a
@clock => a
a & b => c => d & e

If both runahead limit and max active cycle points were set to None then b would attempt run in every cycle simultaneously? To me this makes sense, is easier to explain and is a simple case of "you got what you asked for", however, for historical reasons, the change might be dangerous at this stage.

Perhaps safe defaults might be:

  • runahead limit = P5
  • max active cycle points = None

Agreed in video call, for now lets go with the following safe defaults and perhaps reconsider later:

  • runahead limit = P5
  • max active cycle points = None

We should document this "safety feature" and make it clear what the alternatives are.

Post a new issue to consider removing the default limit. (cons? xtriggers on fast spawn-ahead tasks?)

What should happen to max active cycle points in the meantime while the behaviour is moved to runahead limit? Should it be left in the config spec with a todo note?

@MetRonnie I don't think we decided to remove max active cycle points which seems to be what your question is assuming?

We just decided to distinguish it from runahead limit so they become (as originally intended) two different ways of limiting the runahead, instead of a single way with two different names.

The idea is you would use one or the other, in a workflow. Setting to None (if I understand our discussion above, in retropect) equates to "no limiting".

I guess we haven't decided what to do if a user sets both limits at the same time. Probably just fail validation.

I think Ronnie is asking what we should do with the max active cycle points configuration item until we've got a nonconsecutive implementation (since the current functionality will be configured by runahead limit).

I think it's fine to remove/comment it for now pending re-implementation but it doesn't really matter much until 8.0.0.

Ah, OK my bad :confounded: Also think we could just leave it as-is until the proper max active is implemented (which is probably much easier than runahead limit since you can simply count active cycles).

There is currently a backwards compatibility in runahead limit where if you set it to a number n, it takes that to mean PTnH. I'm guessing we should remove that and not allow numbers anymore? We could take the number to mean Pn instead, but then that's a bit of a subtle change that could confuse users?

There is currently a backwards compatibility in runahead limit where if you set it to a number n, it takes that to mean PTnH

Wow, the more you look into things, sounds like Cylc5 legacy to me 馃あ.

Looks like this may be undocumented, however, will have to check if it is used 馃槧. Might be worth checking that this functionality still works before worrying about it.

It should be pretty easy to continue to support this:

  • ^P[\d]+$ - integer cycles
  • ^P.*[TYMDW-].*$ - time interval
  • ^\d+$ - time interval (translate to PT<x>H)

We would document it as deprecated, however, if possible, yes, it would be best to remove this support.

I've just gone through MO suites and found ~a few~ a lot of cases of this old integer format, as it's such a simple translation I think we should probably continue to support it but document it as deprecated and log a deprecation warning when used.

[edit] I found a total of 145 suites using this format (without having to go to source repositories), quite a few were setting runahead limit = 0 which is curious, most of the remainder were set to either 1, 24 or 48.

I'm not sure if I've understood this correctly.

The use case for non-consecutive max active cycle points is pretty general, it seems to me: we should be aiming for maximum concurrency within prescribed limits; and why would we stop two tasks from running concurrently when they are (say) 3 cycle points apart but allow them to do it if (say) 2 cycle points apart?

Say you have the workflow

[scheduling]
    cycling mode = integer
    max active cycle points = 2
    [[graph]]
        P3 = foo

Are foo.1 and foo.4, which are 3 cycle points apart, not consecutive? Running the workflow will spawn foo.1 and foo.4, wait for them to finish, then spawn foo.7 and foo.10, etc.

Are foo.1 and foo.4, which are 3 cycle points apart, not consecutive?

These would be considered "consecutive" cycles (because there is nothing in-between).

With the renaming of the current (consecutive) max active cycle points to runahead limit we aren't expecting a change in behaviour:

[scheduling]
    cycling mode = gregorian
    # before
    runahead limit = PT1D
    # after
    runahead limit = PT1D
[scheduling]
    cycling mode = <whatever>
    # before
    max active cycle points = 2
    # after
    runahead limit = P2

Though good point, P2 would be very confusing in this example as P2 refers to a duration of two. Perhaps this should be 2 rather than P2?

Though good point, P2 would be very confusing in this example as P2 refers to a duration of two. Perhaps this should be 2 rather than P2?

The trouble is that would break backwards compatibility for being able to specify a raw number of hours. However I'm starting to realise/agree that Pn isn't a good way to specify the limit.

I'm trying to understand the difference between consecutive and non-consecutive implementations of max active cycle points.

[scheduling]
    cycling mode = integer
    max active cycle points = 3
    [[graph]]
        P1 = foo & good => bar
[runtime]
    [[foo]]
        script = if [[ "$CYLC_TASK_JOB" == '1/foo/01' ]]; then false; else true; fi

(In this workflow foo fails on the first cycle point only)

With the current consecutive implementation, this workflow has foo.1 fail leaving bar.1 waiting, but 2 and 3 succeed. But then 4 or higher are not spawned and the workflow stalls.

I take it, with a non-consecutive implementation, after 2 and 3 succeed, 4 and 5 would spawn, then after they succeed 6 and 7 would spawn etc? Is that understanding correct?

I take it, with a non-consecutive implementation, after 2 and 3 succeed, 4 and 5 would spawn, then after they succeed 6 and 7 would spawn etc? Is that understanding correct?

Yep.

https://github.com/cylc/cylc-flow/pull/3848#discussion_r498217790

The new non-consecutive implementation (TBD) needs to have a new name (not max active cycle points) because many existing flows may depend on the current consecutive implementation.

Maybe maximum active cycle points?

active cycle point limit?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kinow picture kinow  路  4Comments

oliver-sanders picture oliver-sanders  路  3Comments

kinow picture kinow  路  4Comments

sadielbartholomew picture sadielbartholomew  路  5Comments

oliver-sanders picture oliver-sanders  路  4Comments