Cylc-flow: Bug on master, with ':' in cycle point format?

Created on 16 Sep 2019  路  26Comments  路  Source: cylc/cylc-flow

[UPDATE: THIS EXAMPLE SUITE AND TRACEBACK IS WRONG - SEE BELOW]

[cylc]
    cycle point format = %Y-%m-%dT%H:%MZ  # ERROR
    #cycle point format = %Y%m%dT%H%M  # OK
[scheduling]
    initial cycle point = 2020
    [[dependencies]]
        [[[T00]]]
            graph = "foo"

Here's the error:

$ cylc run --no-detach five
            ._.                                                       
            | |                 The Cylc Suite Engine [8.0a0]         
._____._. ._| |_____.           Copyright (C) 2008-2019 NIWA          
| .___| | | | | .___|   & British Crown (Met Office) & Contributors.  
| !___| !_! | | !___.  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____!  This program comes with ABSOLUTELY NO WARRANTY.
      .___! |              It is free software, you are welcome to    
      !_____!             redistribute it under certain conditions;   
                        see `COPYING' in the Cylc source distribution. 

2019-09-16T11:26:14+12:00 WARNING - deprecated graph items were automatically upgraded in 'suite definition':
2019-09-16T11:26:14+12:00 WARNING -  * (8.0.0) [scheduling][dependencies][X][graph] -> [scheduling][graph][X] - for X in:
        T00
2019-09-16T11:26:14+12:00 INFO - Suite server: url=tcp://niwa-1007885.niwa.local:43082/ pid=13249
2019-09-16T11:26:14+12:00 INFO - Run: (re)start=0 log=1
2019-09-16T11:26:14+12:00 INFO - Cylc version: 8.0a0
2019-09-16T11:26:14+12:00 INFO - Run mode: live
2019-09-16T11:26:14+12:00 INFO - Initial point: 2020-01-01T00:00Z
2019-09-16T11:26:14+12:00 INFO - Final point: None
2019-09-16T11:26:14+12:00 INFO - Cold Start 2020-01-01T00:00Z
2019-09-16T11:26:14+12:00 ERROR - R/2020-01-01T12:00Z/P1D, point format %Y-%m-%dT%H:%MZ: equal adjacent points: 2020-01-01T12:00Z => 2020-01-01T12:00Z.
        Traceback (most recent call last):
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/scheduler.py", line 253, in start
            self.run()
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/scheduler.py", line 1523, in run
            if self.pool.release_runahead_tasks():
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/task_pool.py", line 284, in release_runahead_tasks
            point = sequence.get_next_point(point)
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/cycling/iso8601.py", line 544, in get_next_point
            self._check_and_cache_next_point(point, next_point)
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/cycling/iso8601.py", line 554, in _check_and_cache_next_point
            next_point, point
        cylc.flow.exceptions.SequenceDegenerateError: R/2020-01-01T12:00Z/P1D, point format %Y-%m-%dT%H:%MZ: equal adjacent points: 2020-01-01T12:00Z => 2020-01-01T12:00Z.
2019-09-16T11:26:14+12:00 CRITICAL - Suite shutting down - R/2020-01-01T12:00Z/P1D, point format %Y-%m-%dT%H:%MZ: equal adjacent points: 2020-01-01T12:00Z => 2020-01-01T12:00Z.
2019-09-16T11:26:15+12:00 INFO - DONE
SequenceDegenerateError: R/2020-01-01T12:00Z/P1D, point format %Y-%m-%dT%H:%MZ: equal adjacent points: 2020-01-01T12:00Z => 2020-01-01T12:00Z.

@dwsutherland the following does not set sub_num if submit_num is None:
https://github.com/cylc/cylc-flow/blob/master/cylc/flow/job_pool.py#L186-L188

The fact that this only seems to be triggered by a (valid) cycle point format with a : between hours and minutes suggests another error somewhere in splitting an ID perhaps?

bug

Most helpful comment

You should be able to do this:
$ cylc trigger baz foo.201909*T00:00Z<delim>failed
and it target those failed tasks, but this:
$ cylc trigger baz foo.201909*T00:00Z<delim>asdfg
or
$ cylc trigger baz foo.201909*T0000Z<delim>asdfg
and it target no tasks (not all), also this:
$ cylc trigger baz foo.201909*T00:00Z
and it target all foo in the month.

Hence can't use : for state delimiter, and it also be in datetime.

All 26 comments

The ID containing date is parsed for state separated by :
https://github.com/cylc/cylc-flow/blob/5d4db47dfe437397a38bdecfe715110f1bdccadf/cylc/flow/job_pool.py#L170-L173

Also in the GraphQL schema:
https://github.com/cylc/cylc-flow/blob/5d4db47dfe437397a38bdecfe715110f1bdccadf/cylc/flow/network/schema.py#L82-L85
https://github.com/cylc/cylc-flow/blob/5d4db47dfe437397a38bdecfe715110f1bdccadf/cylc/flow/network/schema.py#L118-L121

Same issue with the old/original one I think:
https://github.com/cylc/cylc-flow/blob/5d4db47dfe437397a38bdecfe715110f1bdccadf/cylc/flow/task_pool.py#L1345-L1348

But maybe only used for external input? so would still be a bug for CLI

We may have to chose a different separator for the CLI version (in addition to GraphQL API related)... as name.point:state wouldn't work with that format also

Simplest problem first, this code https://github.com/cylc/cylc-flow/blob/master/cylc/flow/job_pool.py#L186-L188 has special handling to set sub_num for the "not None" case, but does not set a value at all for the "None" case - hence the traceback. So that could be fixed regardless of solving the separator issue. Or if it should never be possible to get None in this function (maybe not, as every job has a submit number?) then the code should reflect that (e.g. don't need the if not None test).

Reproduced here as well.

My opinion is that we should avoid : in cycle point, because the cycle point is part of the path to ~/cylc-run/SUITE/log/job/CYCLE/TASK/SUBMIT_NUM/job, etc. With : in CYCLE, it can affect interpretation of variables such as PATH, which uses : as separator (in Unix-like OSes).

@matthewrmshin - you're right about PATH, but a) YYYY-MM-DDThh:mm is a valid ISO date-time format that we accept; b) it works as far as cylc is concerned - except now on master; and c) I've been known to use it myself if only for demo purposes because it is a more readable and recognizable date-time format if you need to go all the way to minutes.

That said, I'm not entirely averse to banning it, but I suppose we had better be reasonably sure that no one is already using it in real suites.

I agree. Banning it would be a shame, as it is clear and legitimate ISO8601 syntax. In which case, we should definitely fix this, and perhaps open a follow-on issue to improve the system for the Unix PATH problem.

Simplest problem first, this code https://github.com/cylc/cylc-flow/blob/master/cylc/flow/job_pool.py#L186-L188 has special handling to set sub_num for the "not None" case, but does not set a value at all for the "None" case - hence the traceback. So that could be fixed regardless of solving the separator issue. Or if it should never be possible to get None in this function (maybe not, as every job has a submit number?) then the code should reflect that (e.g. don't need the if not None test).

@hjoliver - Please read the above comment again, and consider what happens when passing:

2020-01-01T00:00Z/foo/01

into

 if ":" in item: 
     head, state_str = item.rsplit(":", 1) 
 else: 
     head, state_str = (item, None) 

You get an ID of 2020-01-01T00, a state of 00Z/foo/01, and [hence] a submit number None.

So the problem is bigger than job_pool.py ..The colon in the cycle point format breaks the GraphQL API, and the CLI (and perhaps PATH?)..

So if we want : in the cycle point format, then we need to at least consider changing it as a delimiter of ID and state.

@dwsutherland - sorry, I wrote the above in a bit of a hurry, and the initial description as it turns out (see below!).

I certainly agree on the wider implications (separator) and I commented to that effect. But my point just above was that the following code construct is inherently wrong regardless of that:

if foo is not None:
   sub_num = whatever
return sub_num

if sub_num is not defined higher up. The test implies foo can be None (else no need to test) in which case sub_num is undefined.

I can now see (sorry, I had a crazy busy day!) that I cut-n-pasted the wrong traceback into the description. I actually originally got an error due to sub_num being undefined.

Here's the correct traceback:

$ cylc run --no-detach bar
            ._.                                                       
            | |                 The Cylc Suite Engine [8.0a0]         
._____._. ._| |_____.           Copyright (C) 2008-2019 NIWA          
| .___| | | | | .___|   & British Crown (Met Office) & Contributors.  
| !___| !_! | | !___.  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____!  This program comes with ABSOLUTELY NO WARRANTY.
      .___! |              It is free software, you are welcome to    
      !_____!             redistribute it under certain conditions;   
                        see `COPYING' in the Cylc source distribution. 

2019-09-16T22:02:13+12:00 INFO - Suite server: url=tcp://localhost:43034/ pid=18186
2019-09-16T22:02:13+12:00 INFO - Run: (re)start=0 log=1
2019-09-16T22:02:13+12:00 INFO - Cylc version: 8.0a0
2019-09-16T22:02:13+12:00 INFO - Run mode: live
2019-09-16T22:02:13+12:00 INFO - Initial point: 2020-01-01T00:00
2019-09-16T22:02:13+12:00 INFO - Final point: None
2019-09-16T22:02:13+12:00 INFO - Cold Start 2020-01-01T00:00
2019-09-16T22:02:13+12:00 INFO - [foo.2020-01-01T00:00] -submit-num=01, owner@host=localhost
2019-09-16T22:02:13+12:00 INFO - [foo.2020-01-01T00:00] -triggered off []
2019-09-16T22:02:14+12:00 ERROR - local variable 'sub_num' referenced before assignment
        Traceback (most recent call last):
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/scheduler.py", line 253, in start
            self.run()
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/scheduler.py", line 1526, in run
            self.proc_pool.process()
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/subprocpool.py", line 182, in process
            self._proc_exit(proc, "", ctx, callback, callback_args)
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/subprocpool.py", line 173, in _proc_exit
            self._run_command_exit(ctx, callback, callback_args)
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/subprocpool.py", line 368, in _run_command_exit
            callback(ctx, *callback_args)
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/task_job_mgr.py", line 707, in _submit_task_jobs_callback
            {self.batch_sys_mgr.OUT_PREFIX_COMMAND: self._job_cmd_out_callback}
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/task_job_mgr.py", line 521, in _manip_task_jobs_callback
            callback(suite, itask, ctx, line)
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/task_job_mgr.py", line 732, in _submit_task_job_callback
            self.job_pool.set_job_attr(job_d, 'batch_sys_job_id', items[3])
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/job_pool.py", line 129, in set_job_attr
            point, name, sub_num, _ = self.parse_job_item(job_d)
          File "/home/oliverh/cylc/cylc-flow/venv/lib/python3.7/site-packages/cylc/flow/job_pool.py", line 188, in parse_job_item
            return (point_str, name_str, sub_num, state_str)
        UnboundLocalError: local variable 'sub_num' referenced before assignment
2019-09-16T22:02:14+12:00 CRITICAL - Suite shutting down - local variable 'sub_num' referenced before assignment

And the correct example suite (putting Z on the end of the format resulted in the degenerate cycle points error in the description, which is not what I was trying to highlight):

[cylc]
    cycle point format = %Y-%m-%dT%H:%M  # ERROR
    #cycle point format = %Y%m%dT%H%M  # OK
[scheduling]
    initial cycle point = 2020
    [[dependencies]]
        [[[T00]]]
            graph = "foo"

I understand that sub_num shouldn't ever be None, and the code:

if foo is not None:
   sub_num = whatever
return sub_num

needs revised, but it is a mere symptom of the problem..
It results from the : in the cycle point which breaks the formats:
point/name/sub_num:state
point/name:state
name.point:state

Which is parsed in number of places:
CLI (via task_pool.py method)
GraphQL API (via schema.py)

and in the job_pool.py which may not even need to parse for state.. so to fix the job pool method we could probably just remove the block:

 if ":" in item: 
     head, state_str = item.rsplit(":", 1) 
 else: 
     head, state_str = (item, None) 

But the CLI and GraphQL API will still be broken with : in the cycle point format.

Well we entirely agree then (see my final sentence in the description, after all) - I think you may be somewhat over-interpreting my minor point about the sub_num code block :grin: I was just commenting on that because it was the direct cause of the original error I got (if I had posted the correct traceback in the first place, we might have saved ourselves this whole conversation :grin: )

To fix, I guess we'll need to have an extra if - so if state_str is not in the list of valid task states, we should use the head, state_str = (item, None) logic (like in the else clause)?

If we will need to revisit the separator, perhaps we can simply treat the information as query parameters and URL encode everything for the transport layer? (so following RFC 3986 and with native support in Python, avoiding the need to write any code for this? I

So instead of 2020-01-01T00:00Z/foo/01, have cyclepoint=2020-01-01T00:00Z&task=foo&submitNum=01. Which could be URL encoded ascyclepoint%3D2020-01-01T00%3A00Z%26id%3Dfoo%2F01`. Less human friendly, but avoids having to parse it on the client.

And then the separator is defined only on client layer? Not sure if that makes sense as I am not completely aware of how/where/why/etc this separator is used (except I see this on the GraphQL response). But if that's possible, at least we would be using something natively supported by Python, and avoiding the need to write code to handle this...

So far I think we haven't had the need to parse it, as the same information is present in the rest of the response. The only place I know - off the top of my head - that this is used, is as ID of each HTML element created by Cylc UI for the tree component I think.

To fix, I guess we'll need to have an extra if - so if state_str is not in the list of valid task states, we should use the head, state_str = (item, None) logic (like in the else clause)?

Yes that would work, since it's a right split.. We can apply that to schema.py functions and task_pool.py method...
Again, for the job_pool.py method I don't think we need to parse for state (I think!)..

So instead of 2020-01-01T00:00Z/foo/01, have cyclepoint=2020-01-01T00:00Z&id=foo/01. Which could be URL encoded as cyclepoint%3D2020-01-01T00%3A00Z%26id%3Dfoo%2F01. Less human friendly, but avoids having to parse it on the client.

I think the parsing of this format is always server-side, but it can be provided in mutations & queries client side (owner|workflow|cycle|name|sub_num:status).

To fix, I guess we'll need to have an extra if - so if state_str is not in the list of valid task states, we should use the head, state_str = (item, None) logic (like in the else clause)?

The other thing to watch out for .. If it's not a valid state and we set it to None, then for cylc trigger that would actually trigger all foo instead of the intended (i.e miss spelt) state.. (which is not a desirable behaviour)

Which actually means.. we may have to choose a different delimiter..

@kinow The separator is currently used by various CLI commands. E.g.: you can do:

cylc trigger mysuite 'FAM1.20190916T1200Z:failed'

which will trigger all instances of failed tasks in FAM1 family in the 20190916T1200Z cycle.

The colon was used to look a bit like the syntax in suite.rc graphs, e.g. FAM1:failed => task2. I guess we (probably me) must have forgotten about the potential usage of : in time strings.

Thanks @matthewrmshin and @dwsutherland for the background on the issue :+1: will keep lurking here in the discussion :eyes:

You should be able to do this:
$ cylc trigger baz foo.201909*T00:00Z<delim>failed
and it target those failed tasks, but this:
$ cylc trigger baz foo.201909*T00:00Z<delim>asdfg
or
$ cylc trigger baz foo.201909*T0000Z<delim>asdfg
and it target no tasks (not all), also this:
$ cylc trigger baz foo.201909*T00:00Z
and it target all foo in the month.

Hence can't use : for state delimiter, and it also be in datetime.

Moved to next release after discussion on Riot, as this issue needs more discussion and we won't have time to finish it for 8.0a1.

3373 includes a removal of state parsing from the job_pool.py method.. This is a fix for only one aspect of the problem of : in the datetime.

For curiosity I tried the suite (without the Z at the end of the cycle point format), on master, and it worked fine. Is this issue still a problem?

Started jupyterhub, and can successfully visualize the workflow with no issues (no tracebacks anywhere at least).

image

image

I problem may still arise if the ID arg is parsed on a root query:

    if ':' in item:
        head, state = item.rsplit(':', 1)
    else:
        head, state = (item, None)

But that problem already exists with CLI interactions with the task pool:
https://github.com/cylc/cylc-flow/blob/5d4db47dfe437397a38bdecfe715110f1bdccadf/cylc/flow/task_pool.py#L1345-L1348

But maybe the problem in the cyclepoint got fixed in some other PR/commit?

What I'm saying is; the use of : in the API/CLI args for delimiter of state is still an issue:
https://github.com/cylc/cylc-flow/issues/3369#issuecomment-532520080

Potentially resolved by a downstream PR of #3931

Update:

  • The use of a colon in the cycle point format should not be supported in Cylc.
  • The colon is not a safe character to use in file names due to environment variable and filesystem conventions.
  • The colon conflicts with existing task globbing syntax e.g. task.cycle:failed.
  • The colon conflicts with proposed syntax e.g. workflow*glob:running (https://github.com/cylc/cylc-flow/issues/3592)

Suggest creating a validator for the cycle point format:

https://github.com/cylc/cylc-flow/blob/e3b3f320c48ccaf6c10c6ad15e7eb183b2544853/cylc/flow/unicode_rules.py#L176

(note subclasses are auto-documenting, use .. autoclass:: in cylc-doc)

Note that at least one test uses a colon in the cycle point format which will have to be changed.

@dwsutherland are you working on this one? We should be able to make progress before the universal ID stuff happens.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kinow picture kinow  路  4Comments

sadielbartholomew picture sadielbartholomew  路  4Comments

oliver-sanders picture oliver-sanders  路  3Comments

oliver-sanders picture oliver-sanders  路  3Comments

dpmatthews picture dpmatthews  路  3Comments