Flux-core: config: support config reload

Created on 27 Mar 2020 · 3 Comments · Source: flux-framework/flux-core

A strategy is needed for updating the Flux configuration on a live system, for example after configuration management software updates the system instance config in /etc/flux/system/conf.d.

triggering config reload

systemd.service(5) says:

It is strongly recommended to set ExecReload= to a command that not only triggers a configuration reload of the daemon, but also synchronously waits for it to complete.

So one option would be to have a flux config reload command that makes an RPC to the broker. The broker re-parses the config at its configured path, and returns reconfig status/error message in the response.

notification of distributed components

When a new "config object" is successfully parsed by the broker, the cached copy in flux_t handles of modules needs to be updated, and their functions for taking in the config object need to be rerun.

My first thought was to implement a streaming RPC in the broker so that interested modules could request that they receive a new config object in a response each time the config is reloaded. In their message handler, they could re-parse their config out of the object and update their inner workings as appropriate. However, see error handling below.

error handling

A TOML parse error could be returned as an error to flux config reload as described above, and presumably be propagated to config management update processes. The broker could ignore the update and continue running with the old config. If the config is not fixed before the next broker restart, then the broker would refuse to start.

Upon notification of reload, a module might find errors in its part of the config object that were not caught by the broker. It could simply log the error and ignore the update, but this may not be sufficient to get anyone's attention that the update was broken. Again, if the config is not fixed before the next broker restart, then the module would refuse to load (and the broker would fail rc1 and refuse to start).

Is that OK or do we need a "closed loop" mechanism for updating the configuration across modules so that flux config reload also fails if a module rejects the update?

(Possibly the streaming RPC could be replaced with a broker-initiated RPC to a well known method)

garlick

👍1

Great summary!

I'm not sure it is the right answer, but my opinion is that it would be better if module reconfigure were an RPC that could fail with textual error reported back to broker and finally to flux config reload.

A module that supports reconfiguration would have to register some kind of handler for a reconfig RPC anyway, so it seems like adding a response to the RPC might be easily accomplished. (Though I haven't thought long on this, so I would defer to your judgement)

> (Possibly the streaming RPC could be replaced with a broker-initiated RPC to a well known method)

This seems like it could be used to build a common config handling scheme for modules. A well known, possibly built-in, method in all modules could allow configuration updates via RPC, either by bulk update of the config object (e.g. by the broker, as in a global flux config reload), or a set of an individual parameter with flux config set table.key=value. The builtin RPC could also allow query flux config get ....

Additionally, perhaps this could be extended so that configuration specified on the module load command line uses the same callback.

If the module config is specified in a common structure, with a description of each parameter, then the system could become somewhat self-documenting, i.e. flux modinfo or another command could list the supported configuration parameters.
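To make the "common structure with a description of each parameter" idea concrete, here is a small Python sketch. Everything in it is hypothetical (the `Param` and `ModuleConfig` names, the `interval` and `verbose` parameters); it only shows how one table could drive get, set, and a parameter listing.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Param:
    default: Any
    description: str


# Hypothetical parameter table for one module.
SCHEMA = {
    "interval": Param(2.0, "seconds between heartbeats"),
    "verbose": Param(False, "log extra detail"),
}


class ModuleConfig:
    def __init__(self, schema):
        self.schema = schema
        self.values = {key: p.default for key, p in schema.items()}

    def get(self, key):
        return self.values[key]

    def set(self, key, value):
        if key not in self.schema:
            raise KeyError(f"unknown parameter: {key}")
        expected = type(self.schema[key].default)
        if not isinstance(value, expected):
            raise TypeError(f"{key} must be of type {expected.__name__}")
        self.values[key] = value

    def describe(self):
        # One line per parameter, suitable for a listing command.
        return [
            f"{key} (default {p.default!r}): {p.description}"
            for key, p in sorted(self.schema.items())
        ]


conf = ModuleConfig(SCHEMA)
conf.set("interval", 5.0)
listing = conf.describe()
```

Because the table is data, the same structure could back both a bulk update and a single-key set, which is the point of routing them through one callback.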

(Sorry if I kind of took us off the rails there)

grondo on 27 Mar 2020

👍2

Good input!

> my opinion is that it would be better if module reconfigure were an RPC that could fail with textual error reported back to broker and finally to flux config reload.

OK, agreed. How about this: a module that supports config file input registers a <service>.config-reload handler. The handler could call flux_set_conf() with the updated config object, and then call the same internal function that it used to initialize the module's config (which generates the module's _internal_ config from the combined builtin, config-file, environment, and command-line sources). On failure, it can include textual errors in its response.
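A minimal Python sketch of that handler shape, assuming the layering order builtin < config file < environment < command line (the order is my assumption, as are the `Module` class, the `BUILTIN` table, and the `interval` / `verbose` keys). It is a stand-in for the C handler and does not call flux_set_conf(); it only shows the init function being reused on reload, with the old internal config kept on failure.

```python
import errno

# Hypothetical builtin defaults for one module.
BUILTIN = {"interval": 2.0, "verbose": False}


class Module:
    def __init__(self, file_table, env_overrides=None, cmdline_overrides=None):
        self.env = dict(env_overrides or {})
        self.cmdline = dict(cmdline_overrides or {})
        # A bad config at load time is fatal: the exception propagates.
        self.internal = self._build(file_table)

    def _build(self, file_table):
        """Shared by load and reload: merge the layers, then validate."""
        conf = {**BUILTIN, **file_table, **self.env, **self.cmdline}
        unknown = sorted(set(conf) - set(BUILTIN))
        if unknown:
            raise ValueError(f"unknown key(s): {', '.join(unknown)}")
        interval = conf["interval"]
        if not isinstance(interval, (int, float)) or interval <= 0:
            raise ValueError("interval must be a positive number")
        return conf

    def config_reload(self, file_table):
        """Handler body: returns (errnum, errmsg), like an RPC response."""
        try:
            new_internal = self._build(file_table)
        except ValueError as exc:
            return errno.EINVAL, str(exc)  # old config stays in effect
        self.internal = new_internal
        return 0, ""


mod = Module({"interval": 5.0}, cmdline_overrides={"verbose": True})
rc, msg = mod.config_reload({"interval": -1})
```

Note that the command-line override survives a reload here, because the overrides are stored on the module and re-applied on every build.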

From the broker's perspective, when it receives a request to reload, it iteratively calls this method on all loaded modules, ignoring ENOSYS (module didn't register handler), and aggregating any textual errors with line breaks so that flux config reload can print them.
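The broker-side loop might look roughly like the following Python sketch. The `reload_modules` function and the module names are hypothetical, and each module is modeled as a plain callable returning `(errnum, errmsg)` rather than a real RPC.

```python
import errno


def reload_modules(modules, config):
    """Call each module's reload handler and collect textual errors.

    modules maps a module name to a callable(config) -> (errnum, errmsg).
    Returns an empty string on success, else one error per line.
    """
    errors = []
    for name, call in modules.items():
        errnum, errmsg = call(config)
        if errnum == 0 or errnum == errno.ENOSYS:
            # ENOSYS: the module did not register a handler, so skip it.
            continue
        errors.append(f"{name}: {errmsg}")
    return "\n".join(errors)


def sched_reload(config):
    return errno.EINVAL, "queue-depth must be positive"


def kvs_reload(config):
    return 0, ""


def legacy_reload(config):
    # Stands in for a module with no config-reload handler.
    return errno.ENOSYS, "Function not implemented"


def job_reload(config):
    return errno.EINVAL, "unknown key 'colour'"


report = reload_modules(
    {
        "sched": sched_reload,
        "kvs": kvs_reload,
        "legacy": legacy_reload,
        "job": job_reload,
    },
    {"sched": {"queue-depth": -1}},
)
```

One open point the sketch does not settle: modules that accepted the update keep the new config even when another module rejects it, so a failed reload can leave modules on different config versions.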

> or a set of an individual parameter with flux config set table.key=value. The builtin RPC could also allow query flux config get ....

Were you thinking of the config in this context as the representation of cached TOML config or the combined config alluded to above? Maybe the ability to get/set both would be useful? flux config ... could strictly deal with the broker's cached TOML config (triggering config-reload RPCs as described above when modified) and flux module config ... could operate directly on the combined config?

garlick on 27 Mar 2020

> How about this:

Yeah, exactly what I was thinking!

> Were you thinking of the config in this context as the representation of cached TOML config or the combined config alluded to above?

Yeah, I think we're on the same page. I was mainly thinking of using this bit of design work to make sure the combined config was supported the same for all modules.

I think I'm fine with a split between flux config ... and flux module config ..., though it might be nice to be able to dump the entire config at once somehow, and it seems like flux config dump or something might be the right thing for that. I admit that doesn't seem high on the priority list.

grondo on 27 Mar 2020

👍1
