From #2528.
Resetting the state of a submitted or running task to ready without first killing the original job can lead to multiple jobs existing for the same task. This should also be handled correctly. The suite should automatically attempt to kill the original job when a user resets the state of a submitted or running task.
Somewhat related (although not a submit number issue): careless (but common) use of suicide triggers can result in removing an active task proxy. Currently we just log a warning about this; we should probably kill the active job as well.
See:
https://github.com/cylc/cylc/issues/2528#issuecomment-359076831
https://github.com/cylc/cylc/issues/2528#issuecomment-359081219
https://github.com/cylc/cylc/issues/2528#issuecomment-359081543
See also: #2199 #2394 #2506 #2618
Note that #2600 disallows manual reset to "ready" - although I suppose simply retriggering a running task, or resetting it to "waiting", will have exactly the same effect!
(It's arguable that this is a bug IMO, although I agree that attempting to kill the original job is preferable anyway).
To further improve the issue reported in #2528, the logic for job 2 submission should check that job 1 is no longer running. It can then decide to either:
To be safe, I think that if you try to trigger or reset the state of a submitted or running task then, by default, this should fail. We would then need a force mode to override this.
Perhaps a warning prompt/message could be issued/logged (just the GUI? interactive CLI?) on task reset/trigger, before killing any running/submitted job(s) found.
This could be achieved via an optional request argument, 'cancel_job=True' by default (kills the existing running/submitted job); when set to False, the response would include a warning message and the job would not be killed...
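A rough sketch of how that optional argument might behave. Everything here is hypothetical and for illustration only: `TaskProxy`, `Response`, `reset_task` and the `kill_job` callback are not Cylc's actual API, and refusing the reset when the kill fails is just one possible answer to the open question about kill failures.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# States in which a task may still have a live job (assumed names).
ACTIVE_STATES = {"submitted", "running"}


@dataclass
class TaskProxy:
    """Hypothetical stand-in for a suite's task proxy."""
    name: str
    state: str
    submit_num: int = 1


@dataclass
class Response:
    """Hypothetical reply returned to the GUI/CLI client."""
    ok: bool
    warnings: List[str] = field(default_factory=list)


def reset_task(
    task: TaskProxy,
    new_state: str,
    kill_job: Callable[[TaskProxy], bool],
    cancel_job: bool = True,
) -> Response:
    """Reset a task's state, handling any job that may still be active.

    kill_job is a caller-supplied function that attempts the kill and
    returns True on success.
    """
    response = Response(ok=True)
    if task.state in ACTIVE_STATES:
        if cancel_job:
            # Default: try to kill the original job before resetting.
            if not kill_job(task):
                # Assumption: refuse the reset if the kill fails.
                return Response(
                    ok=False,
                    warnings=[f"{task.name}: could not kill job "
                              f"{task.submit_num:02d}; state unchanged"],
                )
        else:
            # Caller opted out: reset anyway, but warn in the response.
            response.warnings.append(
                f"{task.name}: job {task.submit_num:02d} may still be "
                f"{task.state}; it was not killed"
            )
    task.state = new_state
    return response
```

With `cancel_job=False` the reset still goes through, but the caller gets the warning back and can decide whether to show a prompt.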
What to do on failure to kill job 1?
If job 1 is stuck and you continue with job 2, then messages from job 1 could still be received by the suite once job 1 becomes unstuck or is killed manually (unless this behavior has been changed since 7.5.0).
In 7.7+ messages from old jobs are ignored.
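For illustration, ignoring messages from old jobs can be thought of as a submit-number check along these lines. This is only a sketch under the assumption that each incoming message carries the submit number of the job that sent it; `JobMessage`, `is_from_current_job` and `filter_messages` are made-up names, not the actual 7.7 implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobMessage:
    """Hypothetical incoming job message."""
    task_name: str
    submit_num: int  # submit number of the job that sent the message
    text: str


def is_from_current_job(message: JobMessage, current_submit_num: int) -> bool:
    """True only if the message was sent by the task's latest job."""
    return message.submit_num == current_submit_num


def filter_messages(
    messages: List[JobMessage], current_submit_num: int
) -> List[JobMessage]:
    """Drop messages sent by superseded jobs (e.g. a stuck job 1)."""
    return [m for m in messages if is_from_current_job(m, current_submit_num)]
```

So if job 1 wakes up and reports "succeeded" after job 2 has been submitted, that message would be dropped rather than changing the task state.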
The only problem left is that job 1 may continue to occupy the same computing resource that job 2 will require - causing job 2 to fail eventually.
> The only problem left is that job 1 may continue to occupy the same computing resource that job 2 will require - causing job 2 to fail eventually.
True, but not really a Cylc issue. And given that the reset/re-trigger is done manually, the user will have to be confident that their batch system can handle the resource contention. I guess Cylc's only responsibility is to notify the user of the already running/submitted job (hence the warning prompt).
Also closed by #3515