Does this procedure also apply to AKS clusters based on VMSS? Is it enough to reboot, or do we have to do VMSS image updates (if so, how to find proper SKU/Version?).
@yossaa Thanks for the question.
@iainfoulds Please provide your comments
@yossaa We should definitely not reboot the nodes directly. Kured will cordon and drain the node (gracefully shutting down all the pods on it) and then restart it. Let's wait for the author's reply.
That's clear, we will use a process for this. Not kured, because it cannot properly handle our resources, but we will drain the node and reboot it after that. The question was whether that is enough to get all security patches. In other words: are VMSS instances patched the same way as VM instances? VMSS works differently from plain VMs: there are base images, you can re-image instances, etc.
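For readers following the drain-then-reboot approach described above, here is a minimal sketch of the commands involved, assuming `kubectl` is installed and configured for the cluster. The node name `example-node` and the helper names are hypothetical, and the reboot step itself is left out because the thread does not say how it is performed.

```python
from typing import List


def drain_command(node: str, timeout: str = "300s") -> List[str]:
    """Build the kubectl command that cordons a node and evicts its pods."""
    return [
        "kubectl", "drain", node,
        "--ignore-daemonsets",       # DaemonSet pods cannot be evicted, so skip them
        "--delete-emptydir-data",    # allow evicting pods that use emptyDir volumes
        f"--timeout={timeout}",
    ]


def uncordon_command(node: str) -> List[str]:
    """Build the kubectl command that makes the node schedulable again."""
    return ["kubectl", "uncordon", node]


def maintenance_plan(node: str) -> List[List[str]]:
    """Commands to run before and after the (unspecified) reboot step."""
    return [drain_command(node), uncordon_command(node)]


if __name__ == "__main__":
    # "example-node" is a placeholder; real names come from `kubectl get nodes`.
    for command in maintenance_plan("example-node"):
        print(" ".join(command))
```

The sketch only prints the commands. Running them, for example with `subprocess.run(command, check=True)`, and rebooting the node between the two steps is left to whatever process fits the workload.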
@iainfoulds @sauryadas Do either of you know if users can utilize VMSS Automatic OS upgrades with AKS? My assumption would be no, as an OS update would require a reimage, which would destroy the containers. But I'm not 100% sure whether this is supported or planned to be supported.
This is not supported. Directly leveraging VMSS APIs for a PUT operation when VMSS is deployed as a part of AKS can lead to unknown behavior in the cluster.
cc @jluk
Saurya is correct, we should add this note in the docs.
Thanks @sauryadas @jluk for confirming :)
I will assign to @iainfoulds to look into adding this into the doc for future reference.
@MicahMcKittrick-MSFT @jluk @sauryadas On top of documenting what is not possible to do, could it be possible to document how to properly apply OS updates?
This has arisen in the discussion around applying a security fix on AKS in #1065 and it seems users are not able to apply it with VMSS node pools (or I missed something maybe ;)
Good feedback, adding @palma21 for further discussion
I'm not sure I understand, we already document how to do it, literally on this doc. We recommend using kured.
What is not supported is leveraging the VMSS API or any of the underlying APIs for that matter.
@victornoel Jorge and I looked into this and I think the issues got a bit confusing. We do not know of any reason kured would not work on VMSS. As Jorge mentioned, what is not supported is making direct calls to the underlying VMSS APIs to manage AKS clusters.
Could you share what did not work for you when trying to use kured with a VMSS backed cluster? The OP could not use kured because it did not fit their specific workload needs, but it should still work.
I just asked if it will work, because VMSS uses a different concept than pure VMs. Pure VMs are managed by Microsoft and when MS does an OS upgrade, a file /var/run/reboot-required is created.
VMSS has a model and all VMs inside it share the same image. I didn't know how OS upgrades are deployed in that case and whether we would get the reboot-required flag. Actually, I have checked my VMSS based cluster, which has been running for 98 days, and:
So yes, in that case kured won't work, because it uses a different filename and it's not documented that we have to change it :)
@yossaa /var/run is a symbolic link to /run so the file should appear in both. Could you check if you haven't lost your symlink there?
Here's an example from a VMSS based AKS cluster, kured then proceeded to normally reboot the VMs.
Both VMSS and VMs are managed by MS, and the VMSS base image is the same as VMs.
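The check discussed above can be sketched as follows, assuming the flag file is named `reboot-required` and may appear under either `/var/run` or `/run`. The `root` parameter and the function name are hypothetical; `root` is there so the same check can be pointed at a node filesystem mounted somewhere other than `/`.

```python
import os

# The two locations mentioned in the thread; /var/run is normally a symlink to /run.
SENTINEL_PATHS = ("/var/run/reboot-required", "/run/reboot-required")


def reboot_required(root: str = "/") -> bool:
    """Return True if the reboot flag file exists under either path below root."""
    return any(
        os.path.exists(os.path.join(root, path.lstrip("/")))
        for path in SENTINEL_PATHS
    )


if __name__ == "__main__":
    print("reboot required:", reboot_required())
```

Checking `/run` directly as well as `/var/run` means the result does not depend on where the `/var/run` symlink points.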
You are right. I had mounted the VMSS node's filesystem in a k8s pod under /worker, where the symlink /worker/var/run pointed to /run; in this case it should point to /worker/run.
I guess it would be better for the symlink to target ../run instead of /run.
Beside that, it looks fine.
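The symlink behaviour described above can be reproduced with a small sketch. It builds a throwaway directory tree standing in for the mounted node filesystem (the `/worker` mount in the comment) and compares an absolute link target with a relative one. The helper names are hypothetical and the example assumes a POSIX system.

```python
import os
import tempfile


def build_fake_root(link_target: str) -> str:
    """Create a temp tree with run/reboot-required and var/run -> link_target."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "run"))
    os.makedirs(os.path.join(root, "var"))
    open(os.path.join(root, "run", "reboot-required"), "w").close()
    os.symlink(link_target, os.path.join(root, "var", "run"))
    return root


def resolves_inside(root: str) -> bool:
    """Return True if root/var/run resolves to a directory inside root."""
    real_root = os.path.realpath(root)
    real_link = os.path.realpath(os.path.join(root, "var", "run"))
    return os.path.commonpath([real_root, real_link]) == real_root


if __name__ == "__main__":
    for target in ("/run", "../run"):
        root = build_fake_root(target)
        print(f"var/run -> {target}: stays inside mounted root = {resolves_inside(root)}")
```

An absolute target is resolved against the filesystem root of whatever process follows the link, so it escapes the mounted tree, while a relative target is resolved against the directory containing the link and stays inside it.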
This issue appears to be resolved.