At the moment when we run a workflow the configuration gets dumped in two places:
- suite.rc.processed, which is the output of cylc get-config --sparse.
- log/suiterc/<timestamp>-(run|restart).rc, which is the output of cylc get-config.

The non-sparse output contains all the default settings, which isn't really something we need to record; this information can always be reproduced at a later date by using cylc get-config at the appropriate Cylc version.
[edit] Turns out that the output is slightly stranger than I first reported:
- suite.rc.processed, which is the output of cylc view --process.
- log/suiterc/<timestamp>-(run|restart).rc, which is the output of cylc.flow.parsec.util.printcfg(schd.config.cfg). This is the sparse config, but output in a funny way which doesn't actually match either cylc get-config --sparse or cylc get-config.
Questions:
- Should we remove the suite.rc.processed file and only write the sparse output to log/suiterc?
- Should we remove the suite.rc.processed file and only write the processed output to log/suiterc?

Pull requests welcome!
> Should we remove the suite.rc.processed file and only write the sparse output to log/suiterc?
As I recall (and as the .processed suffix suggests), the original intent for this file was to let users see the result of Jinja2 processing, if they wanted to. However, that seems kind of pointless given that we have cylc view --jinja2 for that purpose.
So on that basis I don't see any need to keep the file.
I also think keeping only the sparse output in log/suiterc (erm, log/flow.cylc) is fine. I suppose you could argue that the full (non-sparse) config provides better provenance, but really you still need to go back to the appropriate Cylc version to interpret that properly anyway.
From a provenance perspective, so long as the Cylc version used for the run is recorded (which it should be) then the dense config can be obtained at a later date using the same Cylc version.
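To make the provenance argument concrete, here is a minimal sketch of rebuilding a dense config from a sparse one. It assumes the defaults for each Cylc release can be looked up by version string. DEFAULTS_BY_VERSION, its "8.0.0" key, its values, and the helper names are all hypothetical and illustrative only; they are not the real Cylc defaults or API (in practice you would run cylc get-config at the matching version).

```python
from copy import deepcopy

# Hypothetical, heavily trimmed defaults keyed by Cylc version.
# Values are illustrative only, NOT the real Cylc defaults.
DEFAULTS_BY_VERSION = {
    "8.0.0": {
        "scheduler": {"UTC mode": False, "allow implicit tasks": False},
        "scheduling": {"runahead limit": "P5"},
    },
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override's values layered on top."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def densify(sparse: dict, cylc_version: str) -> dict:
    """Rebuild a dense config from a sparse one plus that version's defaults.

    Raises KeyError for an unrecorded version: without the right version
    the defaults, and hence the dense config, cannot be reproduced.
    """
    return deep_merge(DEFAULTS_BY_VERSION[cylc_version], sparse)
```

Here the sparse config only records what the user set, and the recorded version selects which defaults to fill in. That is why storing the version alongside the sparse file should be enough.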
Added an [edit] note to this issue as it turns out that the output wasn't as simple as I had first reported.
The suggestion from Dave is to remove suite.rc.processed and use the cylc view --process output for log/suiterc/(run|restart|reload)-flow.cylc. That way we would be recording the most "raw" version of the config that it is sensible for us to store.
The downside to this is that the file has not been passed through cylc.flow.config, so it may contain duplicate settings, etc. However, one can easily pass this file through cylc get-config --sparse (with the correct Cylc version) to acquire the "clean" config.
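To illustrate what "duplicate settings" means in the processed text, here is a toy parser for a simplified Cylc-style format with nested [section] headings and key = value lines. It is a hypothetical stand-in, not Cylc's parsec. It assumes that repeated sections merge and that a repeated setting takes its last value. Cylc's real rules may differ (for example, for graph strings), and multi-line strings are not handled here.

```python
import re

# Matches headings like [scheduler] or [[graph]]; bracket count = depth.
HEADING = re.compile(r"^(\[+)\s*(.+?)\s*(\]+)$")


def parse_simplified(text: str) -> dict:
    """Parse a simplified config; later duplicates override earlier ones."""
    root: dict = {}
    stack = [root]  # stack[i] is the section dict at nesting depth i
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        heading = HEADING.match(line)
        if heading:
            depth = len(heading.group(1))
            # Trim back to the parent level, then reuse or create the section,
            # so a repeated [section] merges into the existing one.
            del stack[depth:]
            stack.append(stack[-1].setdefault(heading.group(2), {}))
        else:
            key, _, value = line.partition("=")
            # Assumption: a repeated key simply overwrites the earlier value.
            stack[-1][key.strip()] = value.strip()
    return root
```

The raw text can legitimately say the same thing twice. Only after parsing is there one "clean" value per setting, and that is the gap between the two candidate log formats.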
I can see the benefits of both the cylc view --process output (what you asked for) and the cylc get-config --sparse output (what you got). However, I think we should pick one and only one.
Perhaps an interesting use case to take into account is the proposed feature of editing the flow config via the GUI and then reloading the workflow with the provided diff. This would work with the latest log/suiterc file, so which would make more sense here: the cylc view or the cylc get-config offering?
> The non-sparse output contains all the default settings, which isn't really something we need to record; this information can always be reproduced at a later date by using cylc get-config at the appropriate Cylc version.
From a provenance perspective, should we also be dumping the global config to the run directory?
> Perhaps an interesting use case to take into account is the proposed feature of editing the flow config via the GUI and then reloading the workflow with the provided diff. This would work with the latest log/suiterc file, so which would make more sense here: the cylc view or the cylc get-config offering?
This probably needs to work with the parsed config rather than merely the processed text file, to allow authorization-based control over edit-runs.
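Working on the parsed config rather than the text makes a setting-level diff straightforward, and that in turn makes per-section authorisation checks possible. A minimal sketch follows, assuming configs are nested dicts. config_diff, unauthorised and the idea of allowing edits by top-level section are all hypothetical, not an existing Cylc API.

```python
def flatten(cfg: dict, prefix: tuple = ()) -> dict:
    """Map each setting's path, e.g. ('runtime', 'foo', 'script'), to its value.

    Empty sections have no settings, so they do not appear in the result.
    """
    flat = {}
    for key, value in cfg.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def config_diff(old: dict, new: dict) -> dict:
    """Return added, removed and changed settings between two parsed configs."""
    before, after = flatten(old), flatten(new)
    return {
        "added": {p: after[p] for p in after.keys() - before.keys()},
        "removed": {p: before[p] for p in before.keys() - after.keys()},
        "changed": {
            p: (before[p], after[p])
            for p in before.keys() & after.keys()
            if before[p] != after[p]
        },
    }


def unauthorised(diff: dict, allowed_sections: set) -> list:
    """List touched setting paths whose top-level section the user may not edit."""
    paths = [path for kind in diff.values() for path in kind]
    return sorted(p for p in paths if p[0] not in allowed_sections)
```

For example, a user permitted to edit only [runtime] who also changed a graph string would have that graph path reported by unauthorised(). A text diff of the processed file could not be checked this way as easily.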
> From a provenance perspective, should we also be dumping the global config to the run directory?
Because the global config is constantly reloaded this does not currently make sense. However, if we recorded the global config at startup and logged any changes made throughout the life of the suite...
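One way the "record at startup, then log changes" idea could look is sketched below. GlobalConfigRecorder is a hypothetical helper, not part of Cylc. It assumes the global config is available as a nested dict on each reload, and the platforms/hosts values in the usage are illustrative only.

```python
import copy
import datetime


def _flatten(cfg: dict, prefix: tuple = ()) -> dict:
    """Map each setting's path, e.g. ('platforms', 'hpc', 'hosts'), to its value."""
    flat = {}
    for key, value in cfg.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(_flatten(value, path))
        else:
            flat[path] = value
    return flat


class GlobalConfigRecorder:
    """Hypothetical helper: snapshot the global config at start-up, then log changes."""

    def __init__(self, initial: dict):
        self.startup = copy.deepcopy(initial)  # what the scheduler started with
        self._current = copy.deepcopy(initial)
        # Each entry: (UTC timestamp, setting path, old value, new value);
        # None stands for "setting absent" in this sketch.
        self.changes = []

    def reload(self, new: dict) -> None:
        """Record every setting that differs from the previously seen config."""
        now = datetime.datetime.now(datetime.timezone.utc).isoformat()
        old_flat, new_flat = _flatten(self._current), _flatten(new)
        for path in sorted(old_flat.keys() | new_flat.keys()):
            before, after = old_flat.get(path), new_flat.get(path)
            if before != after:
                self.changes.append((now, path, before, after))
        self._current = copy.deepcopy(new)
```

Usage would be to create the recorder once at scheduler start-up and call reload() each time the global config is re-read. The start-up snapshot plus the change list then reconstructs the config in force at any point in the run.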
> This probably needs to work with the parsed config rather than merely the processed text file, to allow authorization-based control over edit-runs.
I guess that since authorisation must happen at the UIS, which is Python and so has access to the Cylc libraries, this is more arbitrary.
The processed file allows non-owners to see the processed suite (the source is often hard to read).
Conclusion: keep the processed file in the log directory, not in the top level.
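A minimal sketch of that conclusion, writing the processed text under log/ instead of the run directory's top level. The log/suiterc/<timestamp>-<event>-flow.cylc layout follows the naming floated earlier in this thread and is an assumption, as is the timestamp format. log_processed_config is a hypothetical function, not Cylc's actual implementation.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_processed_config(run_dir: Path, processed_text: str, event: str) -> Path:
    """Write processed config text under log/ rather than the top level.

    The path layout is an assumption based on this discussion.
    """
    if event not in {"run", "restart", "reload"}:
        raise ValueError(f"unexpected event: {event!r}")
    # UTC timestamp so files from successive runs/reloads sort chronologically.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    log_dir = run_dir / "log" / "suiterc"
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{stamp}-{event}-flow.cylc"
    path.write_text(processed_text)
    return path
```

Keeping one timestamped file per run, restart or reload keeps the history, and nothing is left at the top level of the run directory.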