What happened?
I performed an eksctl delete nodegroup --cluster prod-eks --name ng-1, but the drain failed because of existing DaemonSets and some local data.
I drained the nodes manually with kubectl drain -l 'alpha.eksctl.io/nodegroup-name=ng-1' --force --ignore-daemonsets --delete-local-data
I then ran eksctl delete nodegroup --cluster prod-eks --name ng-1 again and got the following error:
2019-09-11T18:20:08-05:00 [!] error getting instance role ARN for nodegroup "ng-1"
The CloudFormation stack deletion also failed, with the following events:
Timestamp | Logical ID | Status | Status reason
-- | -- | -- | --
2019-08-28 14:06:18 UTC-0500 | eksctl-mim-prod-eks-nodegroup-ng-1 | DELETE_FAILED | The following resource(s) failed to delete: [NodeInstanceRole].
2019-08-28 14:06:17 UTC-0500 | NodeInstanceRole | DELETE_FAILED | Cannot delete entity, must detach all policies first. (Service: AmazonIdentityManagement; Status Code: 409; Error Code: DeleteConflict; Request ID: e9ebc137-c9c6-11e9-a56a-e1f2488279d7)
All instances were terminated, but when running eksctl get nodegroups --cluster prod-eks I can still see:
→ eksctl get nodegroup --cluster mim-prod-eks
CLUSTER NODEGROUP CREATED MIN SIZE MAX SIZE DESIRED CAPACITY INSTANCE TYPE IMAGE ID
prod-eks ng-1 2019-08-14T16:28:19Z 1 4 3 t3.medium ami-0f2e8e5663e16b436
prod-eks ng-6 2019-09-11T19:21:31Z 1 10 4 t3.large ami-0d3998d69ebe9b214
What you expected to happen?
eksctl would no longer list the deleted node group
How to reproduce it?
Not sure why it failed tbh
Anything else we need to know?
Very standard install
Versions
Please paste in the output of these commands:
$ eksctl version
[ℹ] version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.5.3"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:36:28Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-eks-5ac0f1", GitCommit:"5ac0f1d9ab2c254ea2b0ce3534fd72932094c6e1", GitTreeState:"clean", BuildDate:"2019-08-20T22:39:46Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
Hey, have you been able to fix this issue in any way? The same thing happened to me yesterday and I can't find a way to permanently delete the node group from my EKS cluster.
I had this issue and, in my case, found a solution.
For me it was related to dangling ENIs left behind by instances being auto-scaled up and down (spot instances in my case). These ENIs were still attached to the node group security group, so the security groups could not be deleted when deleting the CloudFormation stack (initiated by eksctl).
Deleting these ENIs (they have a status of Available, are not attached to an instance, and have the node group security group listed) allowed CloudFormation to properly delete the node group stack, and the node group now appears completely deleted to eksctl (a rough CLI sketch of this cleanup follows after the links below).
Deleting these dangling ENIs every so often (depending on how quickly they build up for you) is also good policy, as they have caused other issues for me (and others) as well:
See:
https://github.com/aws/amazon-vpc-cni-k8s/issues/59
https://github.com/aws/amazon-vpc-cni-k8s/issues/608
etc
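A minimal sketch of that cleanup with the AWS CLI, assuming you know the node group's security group ID (sg-0123456789abcdef0 below is a placeholder, not a value from this thread):

```
# Sketch only: list ENIs that are unattached (status "available") and associated with
# the node group security group, then delete them. The security group ID is a placeholder.
SG_ID=sg-0123456789abcdef0

aws ec2 describe-network-interfaces \
  --filters Name=status,Values=available Name=group-id,Values="$SG_ID" \
  --query 'NetworkInterfaces[].NetworkInterfaceId' --output text |
tr '\t' '\n' |
while read -r eni; do
  echo "Deleting dangling ENI: $eni"
  aws ec2 delete-network-interface --network-interface-id "$eni"
done
```

Once the dangling ENIs are gone, retrying the CloudFormation stack deletion (or eksctl delete nodegroup) should be able to remove the security group.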
+1 faced the same issue
Facing the same issue here and not sure how to proceed.
When trying to delete the cluster, I see the following error:
eksctl delete cluster --name floral-rainbow-1574743755
eksctl version 0.10.2
using region us-east-1
deleting EKS cluster "floral-rainbow-1574743755"
cleaning up LoadBalancer services
no eksctl-managed CloudFormation stacks found for "floral-rainbow-1574743755"
I went to the AWS console and can see the EKS cluster there; trying to delete the cluster manually, I get the following error:
ResourceInUseException
Cluster has node groups attached
Drilling into the node groups, I see it listed there. I tried to manually delete the node group from the AWS console and it errored out as well with DELETE_FAILED.
With kubectl I no longer see the nodes, but I do see the following resources:
kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-77f96c54b6-c78x4 0/1 Pending 0 4d2h
kube-system pod/coredns-77f96c54b6-j8jh4 0/1 Pending 0 4d2h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 6d20h
kube-system service/kube-dns ClusterIP 10.100.0.10 <none> 53/UDP,53/TCP 6d20h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/aws-node 0 0 0 0 0 <none> 6d20h
kube-system daemonset.apps/kube-proxy 0 0 0 0 0 <none> 6d20h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 0/2 2 0 6d20h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-77f96c54b6 2 2 0 6d20h
Any help is appreciated, as I don't know of a way to clean up and remove this cluster now.
Thanks
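A minimal sketch, assuming a managed node group and working AWS CLI credentials, of how one might inspect and remove the stuck node group before retrying the cluster deletion (the node group name below is a placeholder; the delete may still fail for the same underlying reason, but the health issues usually show why):

```
# Sketch only: inspect and delete the node group that is blocking cluster deletion.
CLUSTER=floral-rainbow-1574743755
NODEGROUP=ng-example   # placeholder; take the real name from list-nodegroups

# Which node groups are still attached to the cluster?
aws eks list-nodegroups --cluster-name "$CLUSTER"

# A DELETE_FAILED node group usually reports why under health.issues
aws eks describe-nodegroup --cluster-name "$CLUSTER" --nodegroup-name "$NODEGROUP" \
  --query 'nodegroup.health.issues'

# Retry the delete; once the node group is gone, the cluster itself can be deleted
aws eks delete-nodegroup --cluster-name "$CLUSTER" --nodegroup-name "$NODEGROUP"
aws eks delete-cluster --name "$CLUSTER"
```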
Hey @ddavtian, did you manage to delete the EKS cluster? How did you go about it?
@Chidiebube I did. The issue is that there is a broken ConfigMap in the cluster, and that needs to be fixed manually first. Looking at my history of commands, try poking around with this to make sure the YAML is valid:
kubectl edit -n kube-system configmap/aws-auth
Fix it, then try to remove the cluster again using the AWS console.
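For reference, a minimal sketch of what a working aws-auth ConfigMap typically contains, in case the broken one needs to be rebuilt; the account ID and role name are placeholders, and the exact contents of this cluster's ConfigMap are not known from the thread:

```
# Sketch only: re-apply a minimal aws-auth ConfigMap mapping the node instance role.
# The role ARN is a placeholder; use your node group's NodeInstanceRole ARN.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/eksctl-prod-eks-nodegroup-ng-1-NodeInstanceRole-EXAMPLE
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
EOF
```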
+1 faced the same issue
Thanks, the dangling-ENI fix above worked for me.
+1
+1
+1
We also experienced this issue; here is what worked for us:
+1
I ran into the same problem :((
============
This step worked for me:
After that, I could delete the node group and the cluster...
Yeah!
I'll also say this is not an eksctl-specific issue. Our EKS cluster was not created or managed with eksctl, and we had the same issue with dangling ENIs.
Same issue here. Although eksctl said it deleted the node group, the CloudFormation stack had failed to delete it. The message "must detach all policies first" made me look at the node group's NodeInstanceRole in IAM. I removed the last remaining policy (CloudWatchLogsFullAccess) on that role and that worked for me.
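A minimal sketch of that fix from the CLI, assuming the role name is known (the role name below is a placeholder following eksctl's naming pattern):

```
# Sketch only: detach any policies still attached to the node instance role so the
# CloudFormation stack can delete it. The role name is a placeholder.
ROLE=eksctl-prod-eks-nodegroup-ng-1-NodeInstanceRole-EXAMPLE

aws iam list-attached-role-policies --role-name "$ROLE" \
  --query 'AttachedPolicies[].PolicyArn' --output text |
tr '\t' '\n' |
while read -r arn; do
  aws iam detach-role-policy --role-name "$ROLE" --policy-arn "$arn"
done
```

After that, retrying the stack deletion should succeed.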
Same issue. I deleted the Auto Scaling group, the NAT gateways, and the VPCs (thanks to the billing alerts), but I couldn't find any cluster to delete.
There was another way: the CloudFormation stacks were still around, so I went ahead and deleted those. That worked the second time around!
See #2172; potentially fixed by https://github.com/weaveworks/eksctl/pull/2762
@MG40 +1. I also deleted the hanging node groups by deleting the associated CloudFormation stacks.
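For completeness, a sketch of deleting the leftover node group stack directly; the stack name below is a placeholder following eksctl's eksctl-&lt;cluster&gt;-nodegroup-&lt;name&gt; convention:

```
# Sketch only: find and delete the leftover eksctl node group stack.
aws cloudformation list-stacks \
  --stack-status-filter DELETE_FAILED \
  --query 'StackSummaries[].StackName'

# The stack name is a placeholder; use the one reported above
aws cloudformation delete-stack --stack-name eksctl-prod-eks-nodegroup-ng-1
aws cloudformation wait stack-delete-complete --stack-name eksctl-prod-eks-nodegroup-ng-1
```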