eksctl: Cluster deletion does not clean up the AWS CloudFormation stack if stack creation failed

Created on 23 Mar 2020 · 7 Comments · Source: weaveworks/eksctl

Issue: eksctl has no way to delete rolled-back CloudFormation stacks when EKS cluster creation fails.

Description: The CloudFormation stack for the cluster failed to create because we hit the Elastic IP (EIP) limit on our account. The stack ended up in the "ROLLBACK_COMPLETE" state.

[✖]  AWS::IAM::Role/ServiceRole: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::InternetGateway/InternetGateway: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::VPC/VPC: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::EIP/NATIP: CREATE_FAILED – "The maximum number of addresses has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: AddressLimitExceeded; Request ID: dc797cfa-42ff-461b-c88ab5c4e291)"
[ℹ]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ]  to cleanup resources, run 'eksctl delete cluster --region=us-west-2 --name=venil'
[✖]  waiting for CloudFormation stack "eksctl-venil-cluster" to reach "CREATE_COMPLETE" status: ResourceNotReady: failed waiting for successful resource state
[✖]  failed to create cluster "venil"
Error when creating EKS cluster

As suggested in the logs, if we try to use the "eksctl delete cluster" command to clean up resources, we get the following error:

fetching cluster status to determine if it can be deleted: unable to describe cluster control plane: ResourceNotFoundException: No cluster found for name: venil
area/deletions kind/feature priority/important-longterm

All 7 comments

This also prevents us from reusing the same cluster name without first going to the AWS tooling to delete the existing ROLLBACK_COMPLETE stack.
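
For reference, the manual cleanup described here could be sketched with the AWS CLI roughly as follows. This is only a sketch under assumptions: the function names are hypothetical helpers (not part of eksctl), the stack naming follows the "eksctl-venil-cluster" name seen in the log in the issue description, and it assumes the AWS CLI is installed and has credentials for the account.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of eksctl.

# Build the cluster stack name, following the "eksctl-<cluster>-cluster"
# pattern seen in the issue's log output.
cluster_stack_name() {
  printf 'eksctl-%s-cluster\n' "$1"
}

# Delete the cluster stack only if it is stuck in ROLLBACK_COMPLETE,
# then wait until CloudFormation reports the deletion as finished.
delete_rolled_back_cluster_stack() {
  local cluster="$1" region="$2" stack status
  stack="$(cluster_stack_name "$cluster")"
  status="$(aws cloudformation describe-stacks \
    --region "$region" --stack-name "$stack" \
    --query 'Stacks[0].StackStatus' --output text)" || return 1
  if [ "$status" != "ROLLBACK_COMPLETE" ]; then
    echo "stack $stack is in state $status, not deleting" >&2
    return 1
  fi
  aws cloudformation delete-stack --region "$region" --stack-name "$stack"
  aws cloudformation wait stack-delete-complete --region "$region" --stack-name "$stack"
}

# Example (not run here):
#   delete_rolled_back_cluster_stack venil us-west-2
```

Checking the status first is a deliberate choice in this sketch, so that a healthy stack with the same name is never deleted by accident. Once the stack is gone, the cluster name should be free to reuse.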

The same happens when nodegroup creation fails. Once it has failed, you have to go to CloudFormation and delete that stack manually. Otherwise eksctl keeps looking at the failed stack, which prevents other normal operations.

eksctl get nodegroup --cluster abcd
Error: getting nodegroup stack summaries: mapping stack to nodegroup summary: error collecting Cloudformation outputs for stack eksctl-abcd-nodegroup-ng-workers-2: no output "InstanceRoleARN" in stack "eksctl-abcd-nodegroup-ng-workers-2"

@chimerab Which version of eksctl are you using? When did the creation fail?

This issue is actually fixed, for cluster stacks. https://github.com/weaveworks/eksctl/pull/2528

@michaelbeaumont

At least the following two versions have the same problem.

eksctl version
0.27.0
0.28.1

First I tried to create the nodegroup with "eksctl create nodegroup -f xxxx-ng-workers-2.yaml". The stack failed because the volume size is smaller than the image.

[ℹ] deploying stack "eksctl-abcd-nodegroup-ng-workers-2"
[✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-abcd-nodegroup-ng-workers-2"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[!] AWS::EC2::LaunchTemplate/NodeGroupLaunchTemplate: DELETE_IN_PROGRESS
[!] AWS::EC2::SecurityGroupEgress/EgressInterClusterAPI: DELETE_IN_PROGRESS
[!] AWS::EC2::SecurityGroupEgress/EgressInterCluster: DELETE_IN_PROGRESS
[!] AWS::EC2::SecurityGroupIngress/IngressInterClusterCP: DELETE_IN_PROGRESS
[!] AWS::EC2::SecurityGroupIngress/IngressInterClusterAPI: DELETE_IN_PROGRESS
[!] AWS::EC2::SecurityGroupIngress/IngressInterCluster: DELETE_IN_PROGRESS
[!] AWS::EC2::SecurityGroupIngress/SSHIPv4: DELETE_IN_PROGRESS
[✖] AWS::AutoScaling::AutoScalingGroup/NodeGroup: CREATE_FAILED – "You must use a valid fully-formed launch template. Volume of size 20GB is smaller than snapshot 'snap-abcdef', expect size >= 80GB (Service: AmazonAutoScaling; Status Code: 400; Error Code: ValidationError; Request ID: xxxxx-xxxx-xxxx-xxxxx-xxxxxxxx; Proxy: null)"
[ℹ] 1 error(s) occurred and nodegroups haven't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete nodegroup --region=ap-northeast-1 --cluster=abcd --name=' for each of the failed nodegroup
[✖] waiting for CloudFormation stack "eksctl-abcd-nodegroup-ng-workers-2": ResourceNotReady: failed waiting for successful resource state
Error: failed to create nodegroups for cluster "abcd"

After the nodegroup creation above failed, the stack in CloudFormation is in the "ROLLBACK_COMPLETE" status. No eksctl nodegroup command works properly until I manually delete the problematic stack.
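
To find such leftover nodegroup stacks without opening the console, something like the following might work. It is a rough sketch: the helper names are hypothetical, the "eksctl-<cluster>-nodegroup-" prefix is inferred from the stack name "eksctl-abcd-nodegroup-ng-workers-2" in the log above, and the AWS CLI is assumed to be installed and configured.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of eksctl.

# Prefix shared by the nodegroup stacks of one cluster (inferred from
# "eksctl-abcd-nodegroup-ng-workers-2" in the log above).
nodegroup_stack_prefix() {
  printf 'eksctl-%s-nodegroup-\n' "$1"
}

# Print the names of this cluster's nodegroup stacks that are stuck in
# ROLLBACK_COMPLETE (tab-separated, as produced by --output text).
list_rolled_back_nodegroup_stacks() {
  local cluster="$1" region="$2" prefix
  prefix="$(nodegroup_stack_prefix "$cluster")"
  aws cloudformation list-stacks --region "$region" \
    --stack-status-filter ROLLBACK_COMPLETE \
    --query "StackSummaries[?starts_with(StackName, '${prefix}')].StackName" \
    --output text
}

# Example (not run here):
#   list_rolled_back_nodegroup_stacks abcd ap-northeast-1
# Each listed stack can then be removed with:
#   aws cloudformation delete-stack --region ap-northeast-1 --stack-name <name>
```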

@chimerab This issue was about eksctl delete cluster not working. Does delete cluster not work for this stack?

@michaelbeaumont It works well.
