Without turning off automatic updates, instances within the cluster - masters and nodes - update and reboot themselves at random and over multiple days. This can cause service outages.
I think that automatic updates for CoreOS should be opt-in, rather than the current behaviour, which is effectively opt-out by declaring a hook during the cluster deployment process.
Tagging @gambol99 / @KashifSaadat
kops version
Version 1.8.0
kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-09T21:51:54Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.8", GitCommit:"bc6162cc70b4a39a7f39391564e0dd0be60b39e9", GitTreeState:"clean", BuildDate:"2017-10-05T06:35:40Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
$ systemctl status update-engine.service
You will see that update-engine is enabled for all masters and nodes.
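Locksmithd is the companion service that coordinates the reboots triggered by update-engine, so it is worth checking both units; a quick check, assuming SSH access to an instance:
$ systemctl status update-engine.service locksmithd.service
$ systemctl is-enabled update-engine.service locksmithd.service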
What did you expect to happen?
If automatic updates are to be left on by default, they should be applied through a controlled process similar to kops rolling-update.
Please provide your cluster manifest. Execute
kops get --name my.example.com -oyaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
Hey @harsha-y, thanks for raising an issue.
Personally I'm not too sure about enforcing OS-specific configuration within kops. Maintaining extra settings that aren't directly related to normal operation of the Kubernetes cluster could cause more issues down the line as OS images are updated, and would likely invite requests / PRs to support similar tweaks for other operating systems.
It's fairly easy for users to cover this via kube daemonsets or a custom hook within kops, such as:
Type=oneshot
ExecStartPre=/usr/bin/systemctl mask --now update-engine.service
ExecStartPre=/usr/bin/systemctl mask --now locksmithd.service
ExecStart=/usr/bin/systemctl reset-failed update-engine.service
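For context, and to the best of my understanding of how kops renders hooks (worth verifying against the hooks docs): the manifest lines above end up as the [Service] section of a systemd unit that kops writes on each instance, with the [Unit] section generated from the hook's name and before fields. The resulting unit would look roughly like this (the Description and the Before target here are illustrative):
[Unit]
Description=Kops hook to mask CoreOS automatic updates
Before=update-engine.service

[Service]
Type=oneshot
ExecStartPre=/usr/bin/systemctl mask --now update-engine.service
ExecStartPre=/usr/bin/systemctl mask --now locksmithd.service
ExecStart=/usr/bin/systemctl reset-failed update-engine.service
Masking both units prevents them from being started again until they are explicitly unmasked.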
@chrislovecnm, what do ya think?
@harsha-y You can always turn off automatic reboot after update via config or pass this in additionalUserData to disable reboot.
locksmith:
  reboot_strategy: "off"
Or in config file /etc/coreos/update.conf, update REBOOT_STRATEGY param with off and restart locksmithd.
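To illustrate the additionalUserData route, a minimal sketch of what this could look like on a kops instance group (field names are from memory, and since the snippet above uses Container Linux Config syntax, the cloud-config equivalent is shown; double-check against the kops and CoreOS docs):
spec:
  additionalUserData:
  - name: disable-reboots.cfg   # arbitrary name, just for illustration
    type: text/cloud-config
    content: |
      #cloud-config
      coreos:
        update:
          reboot-strategy: "off"
On a node that is already running, the same effect comes from setting REBOOT_STRATEGY=off in /etc/coreos/update.conf and restarting locksmithd (sudo systemctl restart locksmithd), but that change is lost when the instance is replaced, which is why baking it into user data or a hook is preferable.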
@KashifSaadat I think that using kops should be as easy as possible. This means that it just works. In your opinion, do we need to set up CoreOS with some reasonable defaults? If so, then some OS-level config makes sense to me. Otherwise, we need to document this stuff.
If kops supports an operating system, it should provide sane defaults for it, IMHO. In this case that would mean turning off auto update and reboot so they can be done in a controlled way. In an ideal world we wouldn't have to turn off this "core"OS feature at all, and auto updates would be handled the same way kops handles IG rolling updates.
Glad we are discussing this, and I'm ok with whatever the outcome is here - either we update kops to include this behaviour for CoreOS, or we decide to go the hooks route and document that as a warning - but we should warn users about the current behaviour so that it doesn't cause outages for production clusters. Happy to submit a doc PR for this.
Might it be preferable to install an update operator rather than disabling this otherwise fairly handy feature of CoreOS?
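For reference, the operator mentioned here is presumably the CoreOS Container Linux Update Operator (github.com/coreos/container-linux-update-operator), which leaves update-engine on but coordinates the resulting reboots through Kubernetes by draining one node at a time; I believe locksmithd's own reboot strategy still needs to be set to off so the two don't compete. Deploying it is roughly a kubectl apply of the example manifests from that repo:
$ kubectl apply -f examples/deploy    # directory name is an assumption, check the project's README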
So how do you provide this setting from the command line at cluster create time? I want to disable it until I know that service availability will not be affected.
@eherot I think the update operator would be a great solution if there was a way for it to also update the kops config files. Otherwise, if new instances get added, the updates are lost.
@balboah Just ran into this as well and it seems adding a hook the way @KashifSaadat mentioned should do the trick. So the hook I am using now is:
hooks:
- name: disable-automatic-updates.service
  before:
  - update-engine.service
  manifest: |
    Type=oneshot
    ExecStartPre=/usr/bin/systemctl mask --now update-engine.service
    ExecStartPre=/usr/bin/systemctl mask --now locksmithd.service
    ExecStart=/usr/bin/systemctl reset-failed update-engine.service
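To roll this out to an existing cluster, the usual kops workflow applies (cluster name taken from the earlier example):
$ kops edit cluster my.example.com          # add the hooks section under spec
$ kops update cluster my.example.com --yes
$ kops rolling-update cluster my.example.com --yes
The rolling update replaces the instances so that every master and node comes up with the hook in place.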