Cylc-flow: future of the suite.rc.processed file

Created on 7 Aug 2020 · 8 Comments · Source: cylc/cylc-flow

At the moment when we run a workflow the configuration gets dumped in two places:

suite.rc.processed which is the output of cylc get-config --sparse.
log/suiterc/<timestamp>-(run|restart).rc which is the output of cylc get-config.

The non-sparse output contains all the default settings, which isn't really something we need to record: this information can always be reproduced at a later date by running cylc get-config with the appropriate Cylc version.
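To illustrate why the dense output carries no extra information, here is a minimal Python sketch. It does not use any Cylc API: the `DEFAULTS` table and the `expand` and `sparsify` functions are hypothetical, and the setting names and values are illustrative rather than Cylc's real defaults. The point is only that a dense config is the defaults with the sparse settings overlaid, so it can be rebuilt whenever the defaults for that version are available.

```python
from copy import deepcopy

# Hypothetical defaults for one Cylc version (illustrative, not real values).
DEFAULTS = {
    "scheduling": {"cycling mode": "gregorian", "runahead limit": "P5"},
    "runtime": {"root": {"script": ""}},
}


def expand(sparse, defaults):
    """Return the dense config: the defaults overlaid with sparse settings."""
    dense = deepcopy(defaults)
    for key, value in sparse.items():
        if isinstance(value, dict) and isinstance(dense.get(key), dict):
            # Recurse so that sibling defaults in the section are kept.
            dense[key] = expand(value, dense[key])
        else:
            dense[key] = deepcopy(value)
    return dense


def sparsify(dense, defaults):
    """Return only the settings in dense that differ from the defaults."""
    sparse = {}
    for key, value in dense.items():
        default = defaults.get(key)
        if isinstance(value, dict) and isinstance(default, dict):
            nested = sparsify(value, default)
            if nested:
                sparse[key] = nested
        elif value != default:
            sparse[key] = value
    return sparse


# The user only set one item; everything else comes from the defaults.
user = {"scheduling": {"runahead limit": "P2"}}
dense = expand(user, DEFAULTS)
```

One caveat of this sketch: a setting the user explicitly sets to its default value is indistinguishable from an unset one, so `sparsify` drops it.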

[edit] turns out that the output is slightly stranger than I first reported

suite.rc.processed which is the output of cylc view --process

log/suiterc/<timestamp>-(run|restart).rc which is the output of cylc.flow.parsec.util.printcfg(schd.config.cfg). This is the sparse config, but printed in a format that doesn't actually match the output of either cylc get-config --sparse or cylc get-config.

Questions:

[ ] Should we remove the suite.rc.processed file and only write the sparse output to log/suiterc?
[ ] Should we remove the suite.rc.processed file and only write the processed output to log/suiterc?

Pull requests welcome!

question

oliver-sanders

All 8 comments

Should we remove the suite.rc.processed file and only write the sparse output to log/suiterc?

As I recall (and as the .processed suffix suggests) the original intent for this file was to let users see the result of Jinja2-processing, if they wanted to. However, that seems kind of pointless given that we have cylc view --jinja2 for that purpose.

So on that basis I don't see any need to keep the file.

I also think keeping only the sparse output in log/suiterc (erm, log/flow.cylc) is fine. I suppose you could argue that the full (non-sparse) config provides better provenance, but really you still need to go back to the appropriate Cylc version to interpret that properly anyway.

hjoliver on 8 Aug 2020

From a provenance perspective, so long as the Cylc version used for the run is recorded (which it should be) then the dense config can be obtained at a later date using the same Cylc version.

oliver-sanders on 10 Aug 2020

👍1

Added an [edit] note to this issue as it turns out that the output wasn't as simple as I had first reported.

The suggestion from Dave is to remove the suite.rc.processed file and use the cylc view --process output for log/suiterc/(run|restart|reload)-flow.cylc. That way we would be recording the most "raw" version of the config that it is sensible for us to store.

The downside to this is that it has not been passed through cylc.flow.config, so it may have duplicate settings, etc. However, one can easily pass this file through cylc get-config --sparse (with the correct Cylc version) to obtain the "clean" config.
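As a rough illustration of the difference between the two forms, the Python sketch below shows how parsing can collapse duplicate settings. It is a toy parser for flat `[section]` and `key = value` text, not Cylc's parsec, and `parse_flat` is a hypothetical name. It assumes that a repeated key keeps its last value, and it ignores nested sections, includes, and multi-line values.

```python
def parse_flat(text):
    """Parse '[section]' headers and 'key = value' lines into nested dicts.

    Repeated sections are merged and a repeated key keeps its last value.
    """
    config = {}
    section = config  # settings before any header go to the top level
    for raw in text.splitlines():
        # Toy comment handling: everything after '#' is dropped.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = config.setdefault(line[1:-1].strip(), {})
        elif "=" in line:
            key, value = line.split("=", 1)
            section[key.strip()] = value.strip()
    return config


# Processed text in which [scheduling] appears twice.
processed = """
[meta]
    title = demo
[scheduling]
    initial cycle point = 2020
[scheduling]
    initial cycle point = 2021
"""
clean = parse_flat(processed)
```

Here `clean` holds a single `scheduling` section with one `initial cycle point`, whereas the processed text still shows both assignments exactly as the user wrote them.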

I can see the benefits of both the cylc view --process output (what you asked for) and the cylc get-config --sparse output (what you got). However, I think we should pick one and only one.

Perhaps an interesting use case to take into account is the proposed feature of editing the flow config via the GUI and then reloading the workflow with the provided diff. This would work with the latest log/suiterc file. Which would make more sense here: the cylc view or the cylc get-config offering?

oliver-sanders on 11 Aug 2020

The non-sparse output contains all the default settings, which isn't really something we need to record: this information can always be reproduced at a later date by running cylc get-config with the appropriate Cylc version.

From a provenance perspective, should we also be dumping the global config to the run directory?

hjoliver on 13 Aug 2020

👍1

Perhaps an interesting use case to take into account is the proposed feature of editing the flow config via the GUI and then reloading the workflow with the provided diff. This would work with the latest log/suiterc file. Which would make more sense here: the cylc view or the cylc get-config offering?

This probably needs to work with the parsed config rather than merely the processed text file, to allow authorization-based control over edit-runs.

hjoliver on 13 Aug 2020

From a provenance perspective, should we also be dumping the global config to the run directory?

Because the global config is constantly reloaded this does not currently make sense. However, if we recorded the global config at startup and logged any changes made throughout the life of the suite...
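A minimal Python sketch of that idea follows: keep the global config as loaded at startup, and on each reload append only the settings that changed. Nothing here is an existing Cylc API. `GlobalConfigLog`, `flatten`, and `config_changes` are hypothetical names, and the `platforms` setting is used purely for illustration.

```python
from copy import deepcopy


def flatten(config, path=()):
    """Flatten nested dicts into {(section, ..., key): value}."""
    flat = {}
    for key, value in config.items():
        if isinstance(value, dict):
            flat.update(flatten(value, path + (key,)))
        else:
            flat[path + (key,)] = value
    return flat


def config_changes(old, new):
    """Return (path, before, after) for each setting that differs.

    A setting missing on one side is reported with None for that side.
    """
    old_flat, new_flat = flatten(old), flatten(new)
    changes = []
    for path in sorted(set(old_flat) | set(new_flat)):
        before, after = old_flat.get(path), new_flat.get(path)
        if before != after:
            changes.append((path, before, after))
    return changes


class GlobalConfigLog:
    """Record the startup global config plus every later change."""

    def __init__(self, startup_config):
        self.startup = deepcopy(startup_config)
        self.current = deepcopy(startup_config)
        self.entries = []  # (timestamp, path, before, after)

    def reload(self, new_config, when):
        """Log what changed since the last load, then adopt new_config."""
        for path, before, after in config_changes(self.current, new_config):
            self.entries.append((when, path, before, after))
        self.current = deepcopy(new_config)


log = GlobalConfigLog({"platforms": {"hpc": {"hosts": "a"}}})
log.reload({"platforms": {"hpc": {"hosts": "a, b"}}}, when="2020-08-13T00:00Z")
```

With a record like this, the global config in effect at any point in the run could be rebuilt by replaying the logged entries on top of the startup copy.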

This probably needs to work with the parsed config rather than merely the processed text file, to allow authorization-based control over edit-runs.

I guess that since authorisation must happen at the UIS, which is Python and so has access to the Cylc libraries, this choice is more arbitrary.

oliver-sanders on 13 Aug 2020

The processed file allows non-owners to see the processed suite (the source is often hard to read).

hjoliver on 16 Mar 2021

👍1

Conclusion: keep the processed file in the log directory, not at the top level.

hjoliver on 16 Mar 2021

👍1
