“Hold” currently means: prevent waiting task proxies from submitting their jobs. Post-SoD we can still hold a suite, but holding individual tasks is considerably less feasible because the only waiting tasks that exist now as task proxies are those with multiple parents and partially unsatisfied prerequisites. Further, even those waiting tasks might be hidden in future: proposal and #3822. And besides, users should not have to understand the difference between abstract (graph) and concrete (task proxy in the pool) waiting tasks.
Do we really need the concept of individual task hold?
If we do, options include:
I think having the ability to hold an individual task makes sense and is useful.
If it's difficult to hold tasks before they exist in the pool we could, potentially, consider an xtrigger implementation similar to the proposed retry approach.
Yeah, or a broadcast style implementation?
n=1, would make it easier to select the non-existent also.
n=1, would make it easier to select the non-existent also.
Selection is easy, since we plan to allow n=1,2,3,... up to "nmax" which is "all tasks in the cycle point".
The question is just how to record what's been held when the corresponding task proxies don't exist yet in the scheduler. (c.f. SoS where you could only hold tasks that already have task proxies in the pool).
Note that holding only matters once a task proxy exists, so we can still "action" the hold in the old way (essentially, set a held flag in the task proxy, which tells it not to submit its job even if prereqs all satisfied).
Regarding xtrigger- or broadcast-style implementations - what's wrong with this simple approach?:
Regarding xtrigger- or broadcast-style implementations - what's wrong with this simple approach?:
Would be fine, though because the value is not stored on a per-task basis it would introduce a lot of lookups.
That list will need to go the to DB and we will need to re-work the is_held functionality as you will have to ask the task_pool whether a task is held rather than the task_proxy itself.
(BTW I don't see this as a super-high priority because suite-hold still works fine post-SoD and is OK for now as a proxy for task-hold - after all, suite-hold doesn't affect tasks that are already active, and you can still trigger individual tasks if you want to when the suite is held ... so 8.0.0 milestone will do here).
Coming back to this I'm starting to lean towards making "is_held" an XTrigger.
If we go down the XTrigger route with the retry states (#3423) and the queued state then it would make some sense to do the same with the old held state too. After all "held" is a trigger and is very much external in nature.
Unrelated note which is it:
Good idea :+1:
Unrelated note which is it:
I prefer (and have always used) "xtrigger". It's just a name that means "external trigger". IMO use of capital "X" makes it look like an acronym (e.g. short for "Xylophone Trigger") or elevates its importance unnecessarily (we don't capitalise every other concept/feature of Cylc). And xTrigger? - I hate camel case.
Looks like we'll go for option 2 in the description
- allow users to hold abstract tasks, which will require the scheduler to check a list of tasks-to-hold at spawn time
From @oliver-sanders on this PR:
n=1 matching + a new list in the task_pool caching hold requests for proxies that aren't in the pool yet backed up by a new DB table should do the trick
To quickly summarise:
This new table (tasks_to_hold?) should work in a similar way to the task pool table.
I see @hjoliver is assigned, shall I replace you?
Yes please! I'll switch the assignment over now...
To note down what was discussed today and add a bit more:
Globs (e.g. cylc hold <flow> '*', cylc hold <flow> 'foo.*', cylc hold <flow> '*.5') are a problem because either you can just store the glob in the tasks_to_hold set (but then releasing tasks becomes a problem) or you can try to expand the glob (but there isn't really a way to do this for tasks outside the n = 0 window).
However, it should be possible to handle globs and normal task identifiers separately, so normal task identifiers get put in the tasks_to_hold set while globs only get expanded in the n = 0 window.
(For the scenario of trying to do cylc hold <flow> '*.5' outside the n = 0 window, there's always cylc hold <flow> --after=5)
Agreed.
Plus, it would be hard to determine when the glob is "finished".
However, this does make globs pretty much useless in this context (n=0 tasks are already active!)
Another thing to note: cylc hold <flow> foo is the same as cylc hold <flow> 'foo.*'.
I guess we should document the fact that holding active tasks is unlikely to do anything as they have probably already submitted their jobs.
I wasn't aware of this behaviour, it doesn't appear to be documented and I don't think it's helpful so happy to drop.
The documented example is:
cylc hold my_flow 'mytask.*'
Which makes the glob explicit and is much clearer.
It isn't specific to hold, it's in the TaskPool.filter_task_proxies() method
https://github.com/cylc/cylc-flow/blob/f5802012d1bdd03af52afae205fed2a1b19f8597/cylc/flow/task_pool.py#L1375-L1377
Another issue is the status part of cylc hold <flow> 'name[.point][:status]'. Are we really going to store the name, point and status in the tasks_to_hold set? Is it actually necessary to be able to hold based on the status?
Status is really a part of the globbing system as it is used to filter a larger number of matching tasks by status.
When explicitly specifying tasks by name / cycle it serves no purpose so I wouldn't expect it to be included in this task_to_hold set.
I'm beginning to wonder if we should instead not accept globs for cylc hold? If their n-window behaviour is inconsistent with explicitly specifying the task name + point, and if they are unlikely to do anything as the tasks have probably already submitted their jobs..?
Or, we could only accept globs for the task names - this shouldn't be a problem as we know what the full set of task names are regardless of n-window.
There is one valid use case for using globs with cylc hold:
# prevent retries from taking effect
cylc hold <flow> '*:failed' # due to execution issues
cylc hold <flow> '*:submit-failed' # due to submission issues
This is a pretty handy one for operations.
I think:
cylc hold can continue to share globing behaviour with the other task pool operations.:<state>) should be included with globs (i.e. only applied over the n=0 window).Do we want to allow holding future task families e.g. cylc hold myflow FAM.4? (It will be the tasks that are members of the family that get added to the tasks_to_hold set, not the family itself)
Yes, for a consistent CLI and because it's easy to do (i.e. to expand to family members).
Would be good, however, I'm not sure how we would remove these families from the hold list, it would also infer we could do cylc hold myflow root.4 which is effectively globing.
Just store the child tasks in the hold list, not the family, right?
Could end up with some family members that don't get run left in the set forever but there shouldn't be enough of them to cause a problem. Can use recurrence aware family expansion to reduce this.