Cylc can get into an inconsistent state if you run two trigger edits on the same task in parallel.
This is reproducible using the following trivial workflow:
[scheduling]
    [[dependencies]]
        graph = hello
[runtime]
    [[hello]]
        script = "sleep 20; exit 1"
        [[[remote]]]
            host = cylcdev
1. Let hello fail.
2. Trigger edit hello: save the file but leave the "Trigger edited task hello.1" prompt alone.
3. Trigger edit hello again: save the file and say yes to the "Trigger edited task hello.1" prompt.

This results in task hello stuck in the running state.
Relevant entries from the suite log:
2020-04-21T12:35:32+01:00 INFO - Command succeeded: dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:32+01:00 INFO - Processing 1 queued command(s)
+ dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:45+01:00 INFO - Command succeeded: dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:45+01:00 INFO - Processing 1 queued command(s)
+ dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:51+01:00 INFO - Command succeeded: trigger_tasks(['hello.1'], back_out=False)
2020-04-21T12:35:51+01:00 INFO - Processing 1 queued command(s)
+ trigger_tasks(['hello.1'], back_out=False)
2020-04-21T12:35:51+01:00 INFO - [hello.1] -submit-num=03, owner@host=cylcdev
2020-04-21T12:35:53+01:00 INFO - [hello.1] status=ready: (internal)submitted at 2020-04-21T12:35:52+01:00 for job(03)
2020-04-21T12:35:54+01:00 INFO - [hello.1] status=submitted: (received)started at 2020-04-21T12:35:53+01:00 for job(03)
2020-04-21T12:35:58+01:00 INFO - Command succeeded: trigger_tasks(['hello.1'], back_out=True)
2020-04-21T12:35:58+01:00 INFO - Processing 1 queued command(s)
+ trigger_tasks(['hello.1'], back_out=True)
2020-04-21T12:36:15+01:00 WARNING - [hello.1] status=running: (received-ignored)failed/EXIT at 2020-04-21T12:36:13+01:00 for job(03) != current job(02)
The back-out of the first trigger edit results in cylc thinking the running job is submit number 2 rather than 3.
Note that you have to be running the task on a remote host so that there are no log files being written locally - otherwise the cylc trigger can't remove the log directory and the submit number doesn't get changed.
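The sequence in the log can be modelled roughly as follows. This is a toy sketch inferred from the log lines above, not Cylc's actual code: the ToyTask class and its methods are hypothetical, and it assumes that each dry run increments the submit number while a back-out decrements it.

```python
class ToyTask:
    """Toy stand-in for a scheduler-side task (hypothetical, not Cylc's class)."""

    def __init__(self, submit_num):
        self.submit_num = submit_num
        self.status = "failed"

    def dry_run(self):
        # Assumption: a trigger edit first prepares a job file under a new
        # submit number.
        self.submit_num += 1

    def trigger(self, back_out):
        if back_out:
            # Assumption: backing out discards the dry-run job and winds the
            # counter back by one.
            self.submit_num -= 1
            return None
        # Going ahead runs the job prepared under the current number.
        self.status = "running"
        return self.submit_num

    def receive(self, message, job_num):
        # Messages from anything other than the current job are ignored.
        if job_num != self.submit_num:
            return False
        self.status = message
        return True


task = ToyTask(submit_num=1)                # job 01 has already failed
task.dry_run()                              # first trigger edit  -> 02
task.dry_run()                              # second trigger edit -> 03
running_job = task.trigger(back_out=False)  # second edit confirmed: job 03 runs
task.trigger(back_out=True)                 # first edit backed out -> 02
accepted = task.receive("failed", running_job)
print(task.submit_num, accepted, task.status)  # prints: 2 False running
```

In this model the failure message from job 03 is dropped because the counter has been wound back to 02, so the task never leaves the running state.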
The back_out functionality was introduced in #2461.
Tested with cylc 7.8.4.
This could be solved by sending the "current" submission number with all trigger requests (which would also prevent accidental re-triggering by an out-of-date client).
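A minimal sketch of that suggestion, assuming the client records the submit number it saw when it prepared its edit and sends it back with the request. ToyScheduler, StaleRequestError and the expected_submit_num parameter are all hypothetical names for illustration, not part of Cylc.

```python
class StaleRequestError(Exception):
    """Raised when a client's view of the task is out of date (hypothetical)."""


class ToyScheduler:
    """Toy scheduler holding only per-task submit numbers (hypothetical)."""

    def __init__(self, submit_nums):
        self.submit_nums = dict(submit_nums)

    def trigger(self, task_id, expected_submit_num, back_out=False):
        current = self.submit_nums[task_id]
        # Reject the request if the task has moved on since the client
        # last looked at it.
        if expected_submit_num != current:
            raise StaleRequestError(
                "%s: client expected submit number %d, scheduler has %d"
                % (task_id, expected_submit_num, current)
            )
        if back_out:
            self.submit_nums[task_id] = current - 1
        return self.submit_nums[task_id]


# The second trigger edit has already started job 03.
scheduler = ToyScheduler({"hello.1": 3})

# The first trigger edit saw submit number 2 and now tries to back out.
try:
    scheduler.trigger("hello.1", expected_submit_num=2, back_out=True)
    rejected = False
except StaleRequestError:
    rejected = True

print(rejected, scheduler.submit_nums["hello.1"])  # prints: True 3
```

Because the stale back-out is refused, the counter stays at 03 and later messages from job 03 would still match the current job.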
Trigger edit is going to have to be completely rewritten for Cylc 8 anyway, so I would suggest we leave this as a documented "known bug" for Cylc 7.
Note: This problem has not been fixed in Cylc 7 but is no longer present in Cylc 8.