Cylc-flow: Replace `batch system` with `job runner`

Created on 2 Nov 2020  路  23Comments  路  Source: cylc/cylc-flow

Spun out of issue #3696 .

As per the config change proposal rename the following config items:

  • [x] [job platforms][]batch system -> [platforms][]job runner
  • [x] [job platforms][]batch submit command template -> [platforms][]job runner command template

These items do not need upgraders.

Question: Cylc has a whole infrastructure of batch system stuff, so this could be a rabbit hole - how far down should we pursue this change now? @oliver-sanders - can I have your input on that aspect of it?

Pull requests welcome!

question config change

Most helpful comment

No, that can be left the way it is.

In Cylc7 you have to configure everything on a per-task basis. The host it runs on, the batch submission method, etc. With Cylc8 we've provided platform as an abstraction which has replaced the old host, batch system, etc settings.

We need to keep back-compatability but have no need to rename these old settings so can leave as is.

All 23 comments

Cylc has a whole infrastructure of batch system stuff, so this could be a rabbit hole - how far down should we pursue this change now?

IMO we should go all the way, otherwise it adds a layer of indirection for developers working on that part of the system in future. Changing batch_system to job_runner throughout the code should be pretty straightforward (albeit mildly painful). And we don't need back compat for Cylc 7 DBs. (But have I missed something more difficult than that?)

should be pretty straightforward (albeit mildly painful)

Why I made this a separate issue.

The two settings are missing descriptions. I'm not sure what to write, can anyone offer a suggestion?

Also, while I'm nearby, the description for [platforms][X][run directory] seems wrong: https://github.com/cylc/cylc-flow/blob/fd9b04903cf1f4c8398d658b700876a03bb0b4dd/cylc/flow/cfgspec/globalcfg.py#L323-L325

@oliver-sanders shall I copy & paste the one for work directory (with suitable changes)? https://github.com/cylc/cylc-flow/blob/fd9b04903cf1f4c8398d658b700876a03bb0b4dd/cylc/flow/cfgspec/globalcfg.py#L326-L337

Another thing I've just noticed: there is [runtime][X][job]batch system (and batch submit command template) in the suite config - why is this not being renamed?

@oliver-sanders - can I have your input on that aspect of it?

Erm, I was actually against this change (just feels like renaming for the sake of it, especially as we no longer expect users to configure it). Perhaps ask @hjoliver,

  • If renaming we should rename everywhere (module names, variables, entry points), perhaps batch this with the suite=>flow and other terminology changes so we can schedule this change for a time when the repo is more quiet.
  • We may need to maintain the batch_sys_job_id event handler variable as there is a reasonable chance it is in use.
  • I would like to stick the batch_sys_handlers behind an entry point pre Cylc8.0.0 to allow easier extension so would be good if we didn't have to rename that entry point.
  • Lots of modules and variables to rename :) fun fun fun.

@MetRonnie, you can leave config descriptions for now, their nature will change with cylc install so no need to document them now.

@wxtim What is the context for 'Deprecated... "batch system" will be removed at Cylc 9 - upgrade to platform' at https://github.com/cylc/cylc-flow/blob/fd9b04903cf1f4c8398d658b700876a03bb0b4dd/cylc/flow/platforms.py#L113-L121

Should that also change to '"job runner" will be removed at Cylc 9'?

No, that can be left the way it is.

In Cylc7 you have to configure everything on a per-task basis. The host it runs on, the batch submission method, etc. With Cylc8 we've provided platform as an abstraction which has replaced the old host, batch system, etc settings.

We need to keep back-compatability but have no need to rename these old settings so can leave as is.

Erm, I was actually against this change (just feels like renaming for the sake of it, especially as we no longer expect users to configure it). Perhaps ask @hjoliver,

It's not an arbitrary change. The reasons are,

  • not all of our "batch systems" are actually batch systems (background)
  • "batch system" is a rather HPC-specific term which looks niche to those outside of our current community
  • future job submission methods (e.g. to cloud platforms) are also unlikely be batch systems

So, as per the discussion at the workshop, let's call it something more intuitive and universal: job runner.

@oliver-sanders shall I copy & paste the one for work directory (with suitable changes)?

Yes. (Even it changes later, probably better to fix an overt error like that to avoid getting confused about it again next time).

especially as we no longer expect users to configure it)

BTW if we are to get uptake outside of our current community, most "external" users will be individuals, at least initially, and so will need to configure this stuff.

Another thing I've just noticed: there is [runtime][X][job]batch system (and batch submit command template) in the suite config - why is this not being renamed?

That's deprecated stuff so no need to rename (I think that's what @oliver-sanders said above too)

Question: Cylc has a whole infrastructure of batch system stuff, so this could be a rabbit hole - how far down should we pursue this change now? @oliver-sanders - can I have your input on that aspect of it?

... perhaps ask @hjoliver

I did respond to this in my first comment on this issue. I don't think it's a rabbit hole - what is there beyond what we've discussed? (config item names, variable names in the code, docs - which all change due to platforms anyway)

We may need to maintain the batch_sys_job_id event handler variable as there is a reasonable chance it is in use.

Can this not be upgraded/deprecated in config.py to maintain backwards compatibility?

Minimal example for that:

[scheduling]
    [[dependencies]]
        graph = foo
[runtime]
    [[foo]]
       script = false
       [[[events]]]
            failed handler = echo "Process %(batch_sys_job_id)s FAILED"

(output appears in job activity log)

I guess the options are:

  • search and replace in event handler command lines :-1:
  • simply add a new template variable and keep supporting the old one as well. A deprecation message could be printed on validation where we currently detect illegal template variables :+1:

I vote for plain job_id for the new name. job_runner_job_id is a bit too much.

I've come across SuiteConfig.TASK_EVENT_TMPL_KEYS https://github.com/cylc/cylc-flow/blob/db993d5f998b3969285434afb88c107915f8eba2/cylc/flow/config.py#L130-L138 but I can't find any references to it. Can it be gotten rid of?

Is there any reason not to change what gets written to job.status? I.e.

CYLC_BATCH_SYS_NAME=background
CYLC_BATCH_SYS_JOB_ID=120353
CYLC_BATCH_SYS_JOB_SUBMIT_TIME=2020-12-07T20:22:32Z
CYLC_JOB_PID=120353
etc

to

CYLC_JOB_RUNNER_NAME=background
CYLC_JOB_ID=120353
CYLC_JOB_SUBMIT_TIME=2020-12-07T20:22:32Z
CYLC_JOB_PID=120353

I've come across SuiteConfig.TASK_EVENT_TMPL_KEYS ... but I can't find any references to it. Can it be gotten rid of?

This has been replaced by an Enum in the task_events_mgr module. So yes, we can get rid of it.

Is there any reason not to change what gets written to job.status?

No, I think we can go ahead and change that. The job,status file is for internal use, and we haven't said the content is stable (if others are using it for other purposes...).

I think it's acceptable to change the status file format between major versions (though not within major versions) as we are not providing restart compatibility anyway.

I would like to stick the batch_sys_handlers behind an entry point pre Cylc8.0.0 to allow easier extension so would be good if we didn't have to rename that entry point.

How disruptive is it to change that to job_runner_handlers?

It's not disruptive at the moment. I think @oliver-sanders is just saying we don't want to rename it again after 8.0.0 (?)

Was this page helpful?
0 / 5 - 0 ratings