What happened?
Playing around on eksworkshop.com.
Created an EKS cluster with eksctl create cluster -f eksworkshop.yaml, but the managed node group stack failed with "Nodegroup nodegroup failed to stabilize: Internal Failure".
I think it is easy to reproduce.
$ aws --version
aws-cli/1.18.36 Python/3.6.10 Linux/4.14.171-105.231.amzn1.x86_64 botocore/1.15.36
$ eksctl version
0.16.0
$ cat eksworkshop.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eksworkshop-eksctl
  region: us-west-2
managedNodeGroups:
  - name: nodegroup
    desiredCapacity: 3
    iam:
      withAddonPolicies:
        albIngress: true
secretsEncryption:
  keyARN: arn:aws:kms:us-west-2:<my aws account>:key/<kms key id or name>
What you expected to happen?
I expected the managed node group with desired capacity 3 to be created successfully.
How to reproduce it?
Playing around on eksworkshop.com. Just follow the steps specified at https://eksworkshop.com/030_eksctl/launcheks/
Anything else we need to know?
What OS are you using, are you using a downloaded binary or did you compile eksctl, what type of AWS credentials are you using (i.e. default/named profile, MFA) - please don't include actual credentials though!
I am using a downloaded binary, just following the steps in https://eksworkshop.com/030_eksctl/launcheks/ and the prerequisites specified in the documentation.
Versions
Please paste in the output of these commands:
$ eksctl version
0.16.0
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.10-eks-bac369", GitCommit:"bac3690554985327ae4d13e42169e8b1c2f37226", GitTreeState:"clean", BuildDate:"2020-02-21T23:37:18Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Logs
[ℹ] deploying stack "eksctl-eksworkshop-eksctl-nodegroup-nodegroup"
[✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-nodegroup-nodegroup"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::EKS::Nodegroup/ManagedNodeGroup: CREATE_FAILED – "Nodegroup nodegroup failed to stabilize: Internal Failure"
[ℹ] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-west-2 --name=eksworkshop-eksctl'
[✖] waiting for CloudFormation stack "eksctl-eksworkshop-eksctl-nodegroup-nodegroup": ResourceNotReady: failed waiting for successful resource state
I tried this with your config, and the cluster and nodegroup creation were successful.
"Nodegroup nodegroup failed to stabilize: Internal Failure"
The error suggests it might be a temporary failure. Can you retry and see if you still get that error?
Since you're from Amazon, is it possible you're trying this from an AWS account that's whitelisted for internal/beta features and it's failing because of that?
Since this was probably a temporary failure on AWS's side, I am closing this issue, but please feel free to reopen it if the issue wasn't resolved.
I'm having this same problem consistently today (four times, I think? lost count, really). The same YAML config was working several days ago without a hitch. See the redacted file (substituting MY_SERVICE and MY_ACCT_NO) below:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: MY_SERVICE
  region: us-east-1
managedNodeGroups:
  - name: MY_SERVICE-node-group
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 2
    maxSize: 3
    labels:
      app: MY_SERVICE
      actorSystemName: MY_SERVICE
    ssh:
      allow: true
availabilityZones:
  - us-east-1a
  - us-east-1f
The relevant portion of the output:
[ℹ] deploying stack "eksctl-MY_SERVICE-nodegroup-MY_SERVICE-node-group"
[✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-MY_SERVICE-nodegroup-MY_SERVICE-node-group"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::EKS::Nodegroup/ManagedNodeGroup: CREATE_FAILED – "Nodegroup MY_SERVICE-node-group failed to stabilize: Internal Failure"
[ℹ] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-east-1 --name=MY_SERVICE'
[✖] waiting for CloudFormation stack "eksctl-MY_SERVICE-nodegroup-MY_SERVICE-node-group": ResourceNotReady: failed waiting for successful resource state
I am having the same issue and it is still not resolved. The same cluster spec worked before, and I have tried it multiple times today and hit the same issue. I have also tried different regions and different availability zones in case it was a capacity issue.
[ℹ] building cluster stack "eksctl-MY_EKS-cluster"
[ℹ] deploying stack "eksctl-MY_EKS-cluster"
[ℹ] building managed nodegroup stack "eksctl-MY_EKS-nodegroup-MY_NODEGROUP"
[ℹ] deploying stack "eksctl-MY_EKS-nodegroup-MY_NODEGROUP"
[✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-MY_EKS-nodegroup-MY_NODEGROUP"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::EKS::Nodegroup/ManagedNodeGroup: CREATE_FAILED – "Nodegroup MY_NODEGROUP failed to stabilize: Internal Failure"
[ℹ] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-east-1 --name=MY_EKS'
[✖] waiting for CloudFormation stack "eksctl-MY_EKS-nodegroup-MY_NODEGROUP": ResourceNotReady: failed waiting for successful resource state
I have the same issue: it does not work with --managed, but works fine when I remove --managed.
eksctl create cluster --ssh-access --ssh-public-key=k8s-workshop --nodegroup-name airflowworkers --node-type t3.large --asg-access --vpc-private-subnets=subnet-0d578488c9c110198,subnet-0e8746acbeaad9f9a --vpc-public-subnets=subnet-0d69219ca76278fe1,subnet-0b3cf06c6e8a70a18 --managed
I am also not able to create a node group in an existing cluster.
eksctl create nodegroup --cluster=mymir-eks --name=kafka-workers --region us-east-1 --nodes-min 3 --nodes-max 4 --ssh-access --ssh-public-key "C:/Users/desind/kafka.pub" --managed
2020-05-05T16:34:44-04:00 [✖] AWS::EKS::Nodegroup/ManagedNodeGroup: CREATE_FAILED – "Nodegroup kafka-stateful failed to stabilize: Internal Failure"
What is "Internal Failure"?
Confirmed. Fails with managed node groups, works with non-managed ones.
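For anyone who needs a stopgap, a minimal sketch of the unmanaged fallback is the same command from the earlier comment with --managed dropped (subnet IDs and key name are that commenter's; substitute your own):

# Same command as above, minus --managed: creates a self-managed node group instead
eksctl create cluster --ssh-access --ssh-public-key=k8s-workshop \
  --nodegroup-name airflowworkers --node-type t3.large --asg-access \
  --vpc-private-subnets=subnet-0d578488c9c110198,subnet-0e8746acbeaad9f9a \
  --vpc-public-subnets=subnet-0d69219ca76278fe1,subnet-0b3cf06c6e8a70a18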
I experienced the same issue on eksctl 0.16.0 and found it was no longer an issue after upgrading to 0.18.0.
I believe it was related to this PR: https://github.com/weaveworks/eksctl/pull/2002
You can check whether you are having the same problem as me by manually creating a managed node group in the EKS console with the same settings eksctl would have used. The error message about not being able to assign IP addresses in the chosen subnets is far more useful than the CloudFormation error.
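If you'd rather check from the CLI first, something like this shows whether each public subnet auto-assigns public IPv4 addresses (the subnet IDs are the ones from the earlier comment; substitute your own):

# Check the auto-assign public IPv4 setting on the public subnets
aws ec2 describe-subnets \
  --subnet-ids subnet-0d69219ca76278fe1 subnet-0b3cf06c6e8a70a18 \
  --query 'Subnets[].{Subnet:SubnetId,AutoAssignPublicIp:MapPublicIpOnLaunch}' \
  --output table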
Kind of frustrating but I had the same behavior. Failed to stabilize with 0.16 but worked with 0.18.
Huh. I didn't even know 0.18 was out.
Thanks @miguel-aguirre-iBlocks !
This is because of the new default for the network setting "AssignPublicIpOnLaunch". The setting must now be enabled on every public subnet, since it no longer takes effect when set on the node group (that was the legacy behavior). This has been the default since eksctl 0.17.0.
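If that's the cause, one way to enable the setting on each public subnet from the CLI (reusing a subnet ID from the earlier comment as a placeholder):

# Enable auto-assign public IPv4 so nodes launched in this subnet get addresses
aws ec2 modify-subnet-attribute \
  --subnet-id subnet-0d69219ca76278fe1 \
  --map-public-ip-on-launch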
@chaochn-amazon can you confirm that you can create managed nodegroups with eksctl 0.17.0 or higher?
Still facing the same issue with eksctl 0.30.0. Does anyone know of a workaround?
@shrutilamba Can you please post the exact commands or configs you're using as well as the output?
@michaelbeaumont I am using the following config:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks-managed-cluster
  region: ap-south-1
vpc:
  id: "vpc-xxxxxxxxxxxxxx"
  subnets:
    private:
      ap-south-1a:
        id: "subnet-xxxxxxxxxxxx"
      ap-south-1b:
        id: "subnet-xxxxxxxxxxxx"
      ap-south-1c:
        id: "subnet-xxxxxxxxxxxxx"
    public:
      ap-south-1a:
        id: "subnet-xxxxxxxxxxxxxx"
      ap-south-1b:
        id: "subnet-xxxxxxxxxxxxx"
      ap-south-1c:
        id: "subnet-xxxxxxxxxxxxxx"
managedNodeGroups:
It's giving me this error:
[✖] AWS::EKS::Nodegroup/ManagedNodeGroup: CREATE_FAILED – "Nodegroup hs-eks-new-automation-testing failed to stabilize: []"
@shrutilamba your overrideBootstrapCommand is using the wrong cluster name. The first argument to the bootstrap script should be the cluster name.
overrideBootstrapCommand: |
  /etc/eks/bootstrap.sh eks-managed-cluster
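For context, a minimal sketch of where that override lives in the config; the node group name and AMI ID below are placeholders, and as far as I know overrideBootstrapCommand only applies to managed node groups that specify a custom AMI:

managedNodeGroups:
  - name: managed-ng-1              # placeholder node group name
    ami: ami-0123456789abcdef0      # placeholder custom AMI ID
    overrideBootstrapCommand: |
      #!/bin/bash
      # the first argument must match metadata.name of the cluster
      /etc/eks/bootstrap.sh eks-managed-cluster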
@cPu1 thanks. Changing the cluster name worked!
It's January 16, 2021 and I am having the same struggles listed above. I attempted to create node groups in my EKS clusters both from the eksctl CLI and from the EKS console, with the same results. I managed to get one node group to work after enabling auto-assign IPv4 on my subnets. However, every other attempt seems to fail now; it even failed on the first try after I made those modifications, yet worked 6 hours later.
Is there something wrong with my region "us-east-1" when generating node groups? Every tutorial and person I talk to says this should work like 1-2-3, but it runs for 25 minutes and then just bombs out with errors for the node group.
Also having issues creating Managed NodeGroups using a YAML file:
2021-01-17T13:27:19+02:00 [▶] nodegroups = []
I am having the same issue: I can't create a managed node group in a subnet without enabling public IP addresses on the subnet. Any help is appreciated.
@jontiefer @tewner @kellen-dunham There are countless reasons node groups may fail to create. Please create separate issues and provide full configs or commands
@michaelbeaumont I have an open issue with the config and details you need: https://github.com/weaveworks/eksctl/issues/2834. The bottom line is that, as far as I know, the AWS subnet has to have "Assign public IPv4" enabled, otherwise node groups can't be created in it; and if the setting is disabled after the node group is created, the node group becomes "Degraded" with an error similar to the one referenced in these issues.
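One way to surface that "Degraded" state and its underlying reason from the CLI (cluster and node group names are placeholders):

# Show the health issues EKS reports for a managed node group
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query 'nodegroup.health.issues'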