Is there an instruction or best practice for removing a node from an AKS cluster?
@bramvdklinkenberg For adding or removing nodes in an AKS cluster (scaling the cluster), the best practice is to use the portal UI, the CLI, or PowerShell to change the node count of the cluster.
When scaling down, Azure takes care of selecting a node, draining its containers, and deleting it.
More information is here.
An autoscaling option is also available (currently in preview).
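For illustration, a minimal example of changing the node count from a PowerShell session with the Azure CLI installed; the resource group and cluster names are placeholders:

# Scale the default node pool of a hypothetical cluster to 5 nodes
az aks scale --resource-group myResourceGroup --name myAKSCluster --node-count 5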
Hi @jakaruna-MSFT, with scaling (through the portal or CLI) you cannot choose a specific machine to remove. If you have node-0, 1 and 3 and you scale to 6 and then back down to 3, nodes 0, 1 and 2 will stay in the cluster. But I have sometimes noticed that a machine is not functioning properly, yet it does not get replaced by Azure. So sometimes you would like to "manually" delete a machine and then scale back to the configured number of nodes.
But maybe I am looking for a solution or workaround for infra issues on the Azure side here.
@bramvdklinkenberg We can look at Insights (under Monitoring) for that particular node if it is not functioning as expected. Most of the time, some containers running on those machines consume more resources than expected, and that may be the reason for the slowdown.
We also have a workaround to delete a particular node in the cluster, but that is NOT recommended.
Problems with this approach.
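As a command-line alternative to the portal Insights blade mentioned above, resource pressure on a node can also be spot-checked with kubectl; the node name below is a placeholder, and kubectl top assumes a metrics server is available in the cluster:

# Show CPU and memory usage per node (requires metrics-server)
kubectl top node
# Inspect conditions, allocatable resources, and per-pod requests on a suspect node
kubectl describe node aks-nodepool1-12345678-0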
Hi @jakaruna-MSFT,
I understand it is not recommended or preferable, but in case the self-healing of the nodes doesn't work (correctly) and you want or have to do it yourself, it would be nice to somehow be able to say "I want that machine to be replaced by a new node".
Thanks for the feedback, @bramvdklinkenberg. For product feedback suggestions like adding the option to delete nodes within the cluster as part of the CLI tooling or portal, you can submit suggestions here - https://feedback.azure.com/forums/914020-azure-kubernetes-service-aks. That helps prioritize new features within the platform.
@jakaruna-MSFT As there's no actionable updates right now in the docs, #please-close
As a workaround, an option is to:
What if we want to actually scale down?
On my 7-node cluster I did a kubectl delete node and found out the VM is not deleted, so I manually deleted the VM as well. AKS, however, still thinks my cluster has 7 nodes: when I scale down to 6, it just removes another node. When I scale up again, it adds just one node instead of 2.
So I end up with a 6-node cluster that AKS thinks is a 7-node cluster. :-s
We should be able to freely recycle nodes, just for health reasons, just like kube-monkey does for pods.
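To illustrate the mismatch described above, the node count AKS records for the agent pool can be compared with what the Kubernetes API actually reports; the resource group and cluster names are placeholders:

# Node count as recorded in the AKS agent pool profile
az aks show --resource-group myResourceGroup --name myAKSCluster --query "agentPoolProfiles[].count" -o tsv
# Nodes the Kubernetes API actually sees
kubectl get nodes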
Code snippet.
Function DeleteNode($nodeName)
{
    # Remove the node object from Kubernetes first
    kubectl delete node $nodeName

    # Then find the matching VMSS instance ID and delete that instance from the scale set.
    # $aks_infra_nodes_name (the cluster's node resource group) and $aks_vmss_name are assumed to be set elsewhere.
    $instanceIDs = (Get-AzVmssVM -ResourceGroupName $aks_infra_nodes_name -VMScaleSetName $aks_vmss_name | Select-Object InstanceID).InstanceID
    foreach ($id in $instanceIDs)
    {
        $currentComputerName = (Get-AzVmssVM -ResourceGroupName $aks_infra_nodes_name -VMScaleSetName $aks_vmss_name -InstanceId $id | Select-Object OSProfile).OsProfile.ComputerName
        if ($currentComputerName -eq $nodeName)
        {
            Remove-AzVmss -ResourceGroupName "$aks_infra_nodes_name" -VMScaleSetName "$aks_vmss_name" -InstanceId "$id" -Force
        }
    }
}
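For context, a hedged sketch of how the two variables the function relies on might be populated before the loop below runs; the resource group and cluster names are assumptions, and Connect-AzAccount plus the Az.Compute module are assumed to be in place:

# Hypothetical setup: derive the node resource group and scale set name for the cluster
$aks_infra_nodes_name = az aks show --resource-group myResourceGroup --name myAKSCluster --query nodeResourceGroup -o tsv
$aks_vmss_name = (Get-AzVmss -ResourceGroupName $aks_infra_nodes_name | Select-Object -First 1).Name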
# Pods that have finished report phase "Succeeded" (kubectl displays this as "Completed")
$podsJson = kubectl get pods --field-selector status.phase=Succeeded -n dev -o json
$podsJson = $podsJson | ConvertFrom-Json
[array]$podJsonArr = $podsJson.items
foreach ($item in $podJsonArr)
{
    $podName = $item.metadata.name
    # Skip the cronjob pods pinned to node 0 (the wildcard lets -notlike match the prefix)
    if ($podName -notlike "cronjob-node0-*")
    {
        [string]$nodeName = $item.spec.nodeName
        Write-Output "Going to remove node $nodeName"
        DeleteNode $nodeName
        Write-Output "Done removing node $nodeName"
    }
}
Then at night, before the schedule starts, I scale the nodes back and taint them so the pods get dropped onto those nodes. The entire requirement and solution is below.
https://github.com/g0pinath/az-fs-bkp
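As an illustration of the scale-back-and-taint step mentioned above (not taken from the linked repo), a hedged sketch; the cluster, resource group, node name, and taint key/value are placeholders:

# Scale the cluster back to the desired size
az aks scale --resource-group myResourceGroup --name myAKSCluster --node-count 7
# Taint a node so that only pods with a matching toleration are scheduled onto it
kubectl taint nodes aks-nodepool1-12345678-0 dedicated=cronjobs:NoSchedule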