Without turning off automatic updates, instances within the cluster - masters and nodes - update and reboot themselves at random and over multiple days. This can cause service outages.
I think that automatic updates for CoreOS should be opt-in, rather than the current behaviour, which is effectively opt-out by declaring a hook during the cluster deployment process.
Tagging @gambol99 / @KashifSaadat
kops version
Version 1.8.0
kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-09T21:51:54Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.8", GitCommit:"bc6162cc70b4a39a7f39391564e0dd0be60b39e9", GitTreeState:"clean", BuildDate:"2017-10-05T06:35:40Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
$ systemctl status update-engine.service
You will see that update-engine is enabled for all masters and nodes.
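Locksmithd is the companion service that coordinates the reboots triggered by update-engine, so it is worth checking both units; a quick check, assuming SSH access to an instance:
$ systemctl status update-engine.service locksmithd.service
$ systemctl is-enabled update-engine.service locksmithd.service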
What did you expect to happen?
If automatic updates are to be left on by default, they should be applied through a controlled process similar to kops rolling-update.
Please provide your cluster manifest. Execute
kops get --name my.example.com -oyaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
Hey @harsha-y, thanks for raising an issue.
Personally I'm not too sure about enforcing OS-specific configuration within kops. Maintaining extra settings that aren't directly related to normal operation of the Kubernetes cluster could cause more issues down the line as OS images are updated, and would likely invite requests / PRs to support similar tweaks for other operating systems.
It's fairly easy for users to cover this via kube daemonsets or a custom hook within kops, such as:
Type=oneshot
ExecStartPre=/usr/bin/systemctl mask --now update-engine.service
ExecStartPre=/usr/bin/systemctl mask --now locksmithd.service
ExecStart=/usr/bin/systemctl reset-failed update-engine.service
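For context, and to the best of my understanding of how kops renders hooks (worth verifying against the hooks docs): the manifest lines above end up as the [Service] section of a systemd unit that kops writes on each instance, with the [Unit] section generated from the hook's name and before fields. The resulting unit would look roughly like this (the Description and the Before target here are illustrative):
[Unit]
Description=Kops hook to mask CoreOS automatic updates
Before=update-engine.service

[Service]
Type=oneshot
ExecStartPre=/usr/bin/systemctl mask --now update-engine.service
ExecStartPre=/usr/bin/systemctl mask --now locksmithd.service
ExecStart=/usr/bin/systemctl reset-failed update-engine.service
Masking both units prevents them from being started again until they are explicitly unmasked.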
@chrislovecnm, what do ya think?
@harsha-y You can always turn off automatic reboot after update via config or pass this in additionalUserData to disable reboot.
locksmith:
  reboot_strategy: "off"
Or in config file /etc/coreos/update.conf, update REBOOT_STRATEGY param with off and restart locksmithd.
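To illustrate the additionalUserData route, a minimal sketch of what this could look like on a kops instance group (field names are from memory, and since the snippet above uses Container Linux Config syntax, the cloud-config equivalent is shown; double-check against the kops and CoreOS docs):
spec:
  additionalUserData:
  - name: disable-reboots.cfg   # arbitrary name, just for illustration
    type: text/cloud-config
    content: |
      #cloud-config
      coreos:
        update:
          reboot-strategy: "off"
On a node that is already running, the same effect comes from setting REBOOT_STRATEGY=off in /etc/coreos/update.conf and restarting locksmithd (sudo systemctl restart locksmithd), but that change is lost when the instance is replaced, which is why baking it into user data or a hook is preferable.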
@KashifSaadat I think that using kops should be as easy as possible. This means that it just works. In your opinion, do we need to set up CoreOS with some reasonable defaults? If so, then some OS-level config makes sense to me. Otherwise, we need to document this stuff.
If kops supports an operating system, it should provide sane defaults for it, IMHO. In this case that would mean turning off auto update and reboot so they can be done in a controlled way. In an ideal world we wouldn't have to turn off this "core"OS feature at all, and auto updates would be handled the same way kops handles IG rolling updates.
Glad we are discussing this, and I'm ok with whatever the outcome is here - either we update kops to include this behaviour for CoreOS, or we decide to go the hooks route and document that as a warning - but we should warn users about the current behaviour so that it doesn't cause outages for production clusters. Happy to submit a doc PR for this.
Might it be preferable to install an update operator rather than disabling this otherwise fairly handy feature of CoreOS?
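For reference, the operator mentioned here is presumably the CoreOS Container Linux Update Operator (github.com/coreos/container-linux-update-operator), which leaves update-engine on but coordinates the resulting reboots through Kubernetes by draining one node at a time; I believe locksmithd's own reboot strategy still needs to be set to off so the two don't compete. Deploying it is roughly a kubectl apply of the example manifests from that repo:
$ kubectl apply -f examples/deploy    # directory name is an assumption, check the project's README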
So how do you provide this setting from the command line at cluster create time? I want to disable it until I know that service availability will not be affected.
@eherot I think the update operator would be a great solution if there was a way for it to also update the kops config files. Otherwise, if new instances get added, the updates are lost.
@balboah Just ran into this as well and it seems adding a hook the way @KashifSaadat mentioned should do the trick. So the hook I am using now is:
hooks:
- name: disable-automatic-updates.service
  before:
  - update-engine.service
  manifest: |
    Type=oneshot
    ExecStartPre=/usr/bin/systemctl mask --now update-engine.service
    ExecStartPre=/usr/bin/systemctl mask --now locksmithd.service
    ExecStart=/usr/bin/systemctl reset-failed update-engine.service
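To roll this out to an existing cluster, the usual kops workflow applies (cluster name taken from the earlier example):
$ kops edit cluster my.example.com          # add the hooks section under spec
$ kops update cluster my.example.com --yes
$ kops rolling-update cluster my.example.com --yes
The rolling update replaces the instances so that every master and node comes up with the hook in place.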