Cylc-flow: Parallel cylc trigger edit problem

Created on 21 Apr 2020 · 3 Comments · Source: cylc/cylc-flow

Cylc can get into an inconsistent state if you do two trigger edits on the same task in parallel.
This is repeatable using the following trivial workflow:

[scheduling]
    [[dependencies]]
        graph = hello
[runtime]
    [[hello]]
        script = "sleep 20; exit 1"
        [[[remote]]]
            host = cylcdev

  1. Run the workflow and let task hello fail.
  2. Do a Trigger (edit run) on task hello - save the file but leave the "Trigger edited task hello.1" prompt alone.
  3. Do another Trigger (edit run) on task hello - save the file and say yes to the "Trigger edited task hello.1" prompt.
  4. Once the task is running say no to the original "Trigger edited task hello.1" prompt.

This results in task hello stuck in the running state.
Relevant entries from the suite log:

2020-04-21T12:35:32+01:00 INFO - Command succeeded: dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:32+01:00 INFO - Processing 1 queued command(s)
    +   dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:45+01:00 INFO - Command succeeded: dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:45+01:00 INFO - Processing 1 queued command(s)
    +   dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:51+01:00 INFO - Command succeeded: trigger_tasks(['hello.1'], back_out=False)
2020-04-21T12:35:51+01:00 INFO - Processing 1 queued command(s)
    +   trigger_tasks(['hello.1'], back_out=False)
2020-04-21T12:35:51+01:00 INFO - [hello.1] -submit-num=03, owner@host=cylcdev
2020-04-21T12:35:53+01:00 INFO - [hello.1] status=ready: (internal)submitted at 2020-04-21T12:35:52+01:00 for job(03)
2020-04-21T12:35:54+01:00 INFO - [hello.1] status=submitted: (received)started at 2020-04-21T12:35:53+01:00 for job(03)
2020-04-21T12:35:58+01:00 INFO - Command succeeded: trigger_tasks(['hello.1'], back_out=True)
2020-04-21T12:35:58+01:00 INFO - Processing 1 queued command(s)
    +   trigger_tasks(['hello.1'], back_out=True)
2020-04-21T12:36:15+01:00 WARNING - [hello.1] status=running: (received-ignored)failed/EXIT at 2020-04-21T12:36:13+01:00 for job(03) != current job(02)

The back-out of the first trigger edit results in cylc thinking the running job is submit number 2 rather than 3. The failure message from job 03 is therefore ignored, and the task never leaves the running state.
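
To make the sequence in the log easier to follow, here is a toy model of the submit-number bookkeeping. It is only a sketch: `ToyTask` and its methods are hypothetical names, not Cylc's real classes, and the rule that each edit run reserves the next submit number is an assumption chosen to match the numbers in the log above (job 03 running, current job 02 after the back-out).

```python
class ToyTask:
    """Hypothetical, simplified task record; not Cylc's real task proxy."""

    def __init__(self):
        self.submit_num = 1      # assumed: job 01 is the run that failed
        self.status = "failed"

    def dry_run(self):
        # Assumption: preparing an edit run reserves the next submit number.
        self.submit_num += 1

    def trigger(self, back_out=False):
        if back_out:
            # Abandoned edit run: undo one reservation, without checking
            # whether another edit run has already started a job.
            self.submit_num -= 1
            return None
        self.status = "running"
        return self.submit_num   # submit number of the job just started

    def receive(self, message, job_num):
        # Messages from a job other than the current one are ignored.
        if job_num != self.submit_num:
            return "ignored"
        self.status = message
        return "accepted"


task = ToyTask()
task.dry_run()                    # first edit run, prompt left open
task.dry_run()                    # second edit run
running_job = task.trigger()      # "yes" to the second prompt
task.trigger(back_out=True)       # "no" to the first prompt
outcome = task.receive("failed", running_job)
print(running_job, task.submit_num, outcome, task.status)
# prints: 3 2 ignored running
```

In this model the job that is actually running is number 3, the task record believes the current job is number 2, and the "failed" message is dropped, which mirrors the final WARNING line in the log.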

Note that you have to be running the task on a remote host so that there are no log files being written locally - otherwise the cylc trigger can't remove the log directory and the submit number doesn't get changed.

The back_out functionality was introduced in #2461.
Tested with cylc 7.8.4.

bug wontfix

All 3 comments

This could be solved by sending the "current" submit number with every trigger request (which would also prevent accidental re-triggering by an out-of-date client).
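
A minimal sketch of that idea, assuming the client remembers the submit number it saw when it prepared its edit run and sends it back with the trigger or back-out request. `GuardedTask`, `StaleRequestError` and `seen_submit_num` are hypothetical names for illustration only; this is not how Cylc implements it.

```python
class StaleRequestError(Exception):
    """Raised when a request carries an out-of-date submit number."""


class GuardedTask:
    """Hypothetical task record that checks the caller's submit number."""

    def __init__(self):
        self.submit_num = 1      # assumed: job 01 is the run that failed
        self.status = "failed"

    def dry_run(self):
        # Assumption: each edit run reserves the next submit number and
        # hands it to the client as a token.
        self.submit_num += 1
        return self.submit_num

    def trigger(self, seen_submit_num, back_out=False):
        # Reject any request made against a state that has since changed.
        if seen_submit_num != self.submit_num:
            raise StaleRequestError(
                "request is for job {0}, current job is {1}".format(
                    seen_submit_num, self.submit_num))
        if back_out:
            self.submit_num -= 1
        else:
            self.status = "running"


task = GuardedTask()
first = task.dry_run()       # first edit run sees submit number 2
second = task.dry_run()      # second edit run sees submit number 3
task.trigger(second)         # accepted: matches the current number
try:
    task.trigger(first, back_out=True)
    rejected = False
except StaleRequestError:
    rejected = True
print(rejected, task.submit_num, task.status)
# prints: True 3 running
```

With the check in place, the late back-out from the first edit run is refused, so the submit number still matches the running job and its eventual "failed" message would be accepted.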

Trigger edit is going to have to be completely rewritten for Cylc 8 anyway, so I would suggest we leave this as a documented "known bug" for Cylc 7.

Note: This problem has not been fixed in Cylc 7; however, it is no longer present in Cylc 8.
