I'm using the autoscaler v1.2 on an Azure Kubernetes cluster deployed with ACS-Engine (VMAS).
I keep getting entries like this in the autoscaler logs:
I0404 09:37:54.025617 1 static_autoscaler.go:184] 3 unregistered nodes present
What is this supposed to mean and how do I fix it?
Thank you.
An unregistered node is a VM that Cluster Autoscaler learns about via the cloud provider API, but for which there is no corresponding node registered with the Kubernetes master (hence, an 'unregistered' node). If this appears consistently, it's most likely caused either by nodes failing to register, or by the cloud provider informing Cluster Autoscaler about some unrelated VMs. As for fixing it, I'd start by identifying those nodes.
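To make the "identify such nodes" step concrete, here is a minimal sketch of the comparison described above: take the instance identifiers the cloud provider reports and subtract the ones that appear as a providerID on a Kubernetes node. This is not Cluster Autoscaler's actual code; the function name and the sample identifiers are hypothetical. In practice the node side of the list could come from something like `kubectl get nodes -o jsonpath='{.items[*].spec.providerID}'`, and the cloud side from your provider's CLI or API.

```python
def find_unregistered(cloud_instance_ids, node_provider_ids):
    """Return cloud instance IDs that have no matching node providerID.

    Both arguments are iterables of strings in the same format.
    Empty provider IDs (nodes that never reported one) are ignored,
    so the matching VM will show up as unregistered.
    """
    registered = {pid for pid in node_provider_ids if pid}
    return sorted(i for i in cloud_instance_ids if i not in registered)


# Hypothetical sample data, for illustration only.
cloud = ["vm-a", "vm-b", "vm-c"]   # what the cloud provider API lists
nodes = ["vm-b", ""]               # spec.providerID of each Kubernetes node

unregistered = find_unregistered(cloud, nodes)
print(unregistered)
```

Any VM printed here is one the autoscaler would count as unregistered: either it never joined the cluster, or it joined with a provider ID that does not match what the cloud provider reports.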
See #776. useInstanceMetadata should be set to false for Kubernetes v1.9.3 and earlier versions (otherwise the autoscaler won't be able to recognize the node by externalID).
@feiskyer we observe the same issue with k8s 1.10.0. But it turns out that we ended up with useInstanceMetadata: false by default, even though we specified "true" in our acs-engine cluster definition.
@feiskyer Ah! This explains why my 1.10 is working even without the useInstanceMetadata. So this is a setting to use for kube 1.9.x. Thank you.
I think this issue has been resolved. Let's close it
/close
I'm experiencing this same problem running on AWS, using kubernetes 1.12. The autoscaler brings up instances, then after a short time, removes them since they are unregistered:
I0416 14:13:53.358288 1 static_autoscaler.go:206] 2 unregistered nodes present
I0416 14:13:53.358294 1 utils.go:467] Removing unregistered node aws:///<region>/i-04a694bc25af61cbf
I0416 14:13:53.393227 1 auto_scaling_groups.go:254] Terminating EC2 instance: i-04a694bc25af61cbf
The nodes are present in Kubernetes and have pods scheduled on them.
From reading, this PR sticks out: https://github.com/openshift/kubernetes-autoscaler/pull/14
Particularly:
Switching cloudprovider.NodeGroupForNode() to indexing on node.Spec.ProviderID and also returning provider ID values in cloudprovider.Nodes() means we no longer experience the case where the nodegroup/node becomes unregistered
Looking at the kubelet documentation, it appears that --provider-id is a flag that I can add.
This kubernetes cluster is being deployed on bare instances and not managed by aws.
Is the --provider-id necessary to prevent nodes from being marked as "unregistered"?
Yes. TL;DR ProviderId is the only way for Cluster Autoscaler to connect a VM to a Kubernetes Node object. If you'd like to do it differently, you can try to add an alternative implementation to GetInstanceId() in cloudprovider module.
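As a rough illustration of that link, the sketch below builds and parses an AWS-style provider ID. The `aws:///<zone>/<instance-id>` shape is inferred from the log line earlier in this thread, and the function names and the zone value are hypothetical. It is not the autoscaler's implementation, only a way to check that the value passed to the kubelet's --provider-id flag resolves to the instance ID you expect.

```python
AWS_PREFIX = "aws:///"


def provider_id_for(zone, instance_id):
    """Build a provider ID string in the shape seen in the autoscaler log."""
    return f"{AWS_PREFIX}{zone}/{instance_id}"


def instance_id_from_provider_id(provider_id):
    """Extract the EC2 instance ID from an AWS-style provider ID.

    Raises ValueError if the string does not look like
    aws:///<zone>/<instance-id>.
    """
    if not provider_id.startswith(AWS_PREFIX):
        raise ValueError(f"not an AWS provider ID: {provider_id!r}")
    parts = provider_id[len(AWS_PREFIX):].split("/")
    if len(parts) != 2 or not parts[1].startswith("i-"):
        raise ValueError(f"unexpected provider ID shape: {provider_id!r}")
    return parts[1]


# Hypothetical zone; the instance ID is the one from the log above.
pid = provider_id_for("us-east-1a", "i-04a694bc25af61cbf")
print(pid)
print(instance_id_from_provider_id(pid))
```

If a node registers without a provider ID, or with one that does not map back to the instance the cloud provider lists, there is nothing for the autoscaler to match on, which is consistent with the nodes above being treated as unregistered and removed.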
Cool, I didn't see a section in the documentation so I was wondering why my nodes were being brought up and then killed.
Is it written anywhere? If not, I would be happy to add an "If you're deploying this to a self-managed cluster" or something
I don't think it's explicitly documented anywhere. If you want to take a shot at documenting it, we'll be very happy to accept your contribution.