Nixpkgs: systemd: Interactions of `restartIfChanged, reloadIfChanged, stopIfChanged` is fragile

Created on 31 Oct 2018  路  9Comments  路  Source: NixOS/nixpkgs

Issue description

The following three options describe how a unit must be reloaded on switch-to-configuration:

       systemd.user.services.<name>.reloadIfChanged
           Whether the service should be reloaded during a NixOS configuration switch if its definition has changed. If enabled, the value of
           restartIfChanged is ignored.

           Type: boolean

           Default: false

           Declared by:
               <nixpkgs/nixos/modules/system/boot/systemd.nix>


       systemd.user.services.<name>.restartIfChanged
           Whether the service should be restarted during a NixOS configuration switch if its definition has changed.

           Type: boolean

           Default: true

           Declared by:
               <nixpkgs/nixos/modules/system/boot/systemd.nix>


       systemd.services.<name>.stopIfChanged
           If set, a changed unit is restarted by calling systemctl stop in the old configuration, then systemctl start in the new one. Otherwise,
           it is restarted in a single step using systemctl restart in the new configuration. The latter is less correct because it runs the
           ExecStop commands from the new configuration.

           Type: boolean

           Default: true

           Declared by:
               <nixpkgs/nixos/modules/system/boot/systemd.nix>

Their behaviour tightly depends on eachother. E.g. restartIfChanged behaves differently than documented if stopIfChanged is true, but it's true by default, so restartIfChanged never behaves as documented.
But they are not documented in the same place (the options are not next to eachother in man configuration.nix). Also, stopIfChanged has a bit of a deceptive name. This makes these options hard to discover, and they can act a bit surprisingly.

Solution:

Lets instead replace the three options with one option:

options.onChanged.type   = types.enum [ "reload" "restart"  "stop-and-start" ]
options.onChanged.default = "stop-and-start"
enhancement nixos

Most helpful comment

I had some discussion with @aanderse about reload behaviour and also digged into the switch-to-configuration script some time, so I can provide a bit more insight on the behaviour:

As for the 3 booleans, they might still be required, but maybe the names are misleading:

  • restartIfChanged will control whether the unit will be systemctl restarted, or separately stopped, and later started during activation
  • stopIfChanged controls whether the unit should be stopped, if it isn't part of the next generation we're switching to, or should be left running
  • reloadIfChanged currently controls whether a changed unit should just be reloaded on all changes, or restarted.

Even when a unit provides a way to be reloaded, there's still cases where a restart is necessary - like when the underlying binary is exchanged due to a (security) update, or some of the other directives in the systemd unit changed (like the user a service should run as, sandboxing options etc.).

This is currently a big bug IMHO, and we should only reload when its safe, and in all other cases restart.

I'll tinker around with the semantics a bit, but my current idea could entirely deprecate the need for a reloadIfChanged option, while still allowing safe restarts, if only configuration has been updated.

All 9 comments

Any comments of the maintainer? I would love to have this fixed

Cc @edolstra as he has written most of the activation logic. I still very much would like to see it fixed. Also happy to help with implementing. Just need feedback whether I am on the right track and not missing any edge cases.

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/design-choice-of-switch-to-configuration/6016/2

I had some discussion with @aanderse about reload behaviour and also digged into the switch-to-configuration script some time, so I can provide a bit more insight on the behaviour:

As for the 3 booleans, they might still be required, but maybe the names are misleading:

  • restartIfChanged will control whether the unit will be systemctl restarted, or separately stopped, and later started during activation
  • stopIfChanged controls whether the unit should be stopped, if it isn't part of the next generation we're switching to, or should be left running
  • reloadIfChanged currently controls whether a changed unit should just be reloaded on all changes, or restarted.

Even when a unit provides a way to be reloaded, there's still cases where a restart is necessary - like when the underlying binary is exchanged due to a (security) update, or some of the other directives in the systemd unit changed (like the user a service should run as, sandboxing options etc.).

This is currently a big bug IMHO, and we should only reload when its safe, and in all other cases restart.

I'll tinker around with the semantics a bit, but my current idea could entirely deprecate the need for a reloadIfChanged option, while still allowing safe restarts, if only configuration has been updated.

So, to be a bit more specific here, the idea is to do the following:

  • If any parameter in the systemd service file changed, restart the service (via systemctl restart or systemctl start|stop, depending on stopIfChanged). (ignore the X-Re*IfChanged parameters while comparing old and new unit file)

    • Changes of the underlying binary, change of sandboxing or other systemd options should basically always restart the service, as there's no way to apply these to an already running process. We currently just never restart services with reloadIfChanged, which is a serious bug IMHO.

    • Provide some circuit breaker (neverRestartIfChanged) for critical services, that shouldn't be automatically restarted (think of display-manager.service). We should still complain during activation to tell the user what services were skipped, so they can restart them at their convenience.

Then, we could sketch a safer reloading mechanism:

Services that provide a way to reload their configuration while running, and that want to provide the reload possibility need to indirect their configuration file via environment.etc (so the ExecStart*= parameters don't change if only the configuration file is updated)

During activation of the new configuration, the activation script will notice only the monitored config files changed, and trigger a reload.

I think these semantics make reloading services much safer and more predictable.

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/design-choice-of-switch-to-configuration/6016/4

Part of this effort would also be to fix our restart behaviour of systemd's .socket units, which also helps keeping connections open during a service restart:

If a service is upgraded we can restart the service while keeping around its sockets, thus ensuring the service is continously responsive. Not a single connection is lost during the upgrade.

This requires some coorperation of the managed service, but is also something we should make use of, as it allows us to systemctl restart services without the user really noticing ;-)

I marked this as stale due to inactivity. → More info

Definitely still relevant. I guess @flokli and I should chat about this again at some point...

Was this page helpful?
0 / 5 - 0 ratings