I'm currently getting an obscure error when running a create cluster command and I'm wondering if the changes since 0.1.1 will fix it. I searched for open issues around the unexpected status "ROLLBACK_COMPLETE" but didn't see anything yet.
It looks like there have been significant changes since the 0.1.1 release 10 days ago. Can we get a 0.1.2 release cut?
(Note: the name Foo has been substituted below in place of my real names.)
$ eksctl create cluster --name=Foo --node-type=m5.large --nodes=3 --nodes-min=3 --nodes-max=5 --ssh-public-key=foo-eks --region=us-east-1 --zones=us-east-1a,us-east-1b,us-east-1d
2018-09-10T14:30:41-04:00 [ℹ] SSH public key file "foo-eks" does not exist; will assume existing EC2 key pair
2018-09-10T14:30:41-04:00 [ℹ] found EC2 key pair "foo-eks"
2018-09-10T14:30:41-04:00 [ℹ] creating EKS cluster "Foo" in "us-east-1" region
2018-09-10T14:30:41-04:00 [ℹ] creating VPC stack "EKS-Foo-VPC"
2018-09-10T14:30:41-04:00 [ℹ] creating ServiceRole stack "EKS-Foo-ServiceRole"
2018-09-10T14:31:22-04:00 [✔] created ServiceRole stack "EKS-Foo-ServiceRole"
2018-09-10T14:32:04-04:00 [✔] created VPC stack "EKS-Foo-VPC"
2018-09-10T14:32:04-04:00 [ℹ] creating ControlPlane stack "EKS-Foo-ControlPlane"
2018-09-10T14:32:30-04:00 [ℹ] an error has occurred and cluster hasn't been created properly
2018-09-10T14:32:30-04:00 [ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-east-1 --name=Foo'
2018-09-10T14:32:30-04:00 [✖] unexpected status "ROLLBACK_COMPLETE" while creating CloudFormation stack "EKS-Foo-ControlPlane"
2018-09-10T14:32:30-04:00 [ℹ] creating DefaultNodeGroup stack "EKS-Foo-DefaultNodeGroup"
Running the suggested delete command also complains, giving a warning about the kubeconfig file, and I'm not sure it's always cleaning up after itself in CloudFormation / VPC.
$ eksctl delete cluster --region=us-east-1 --name=Foo
2018-09-10T14:41:06-04:00 [ℹ] deleting EKS cluster "Foo"
2018-09-10T14:41:12-04:00 [!] as you are not using the auto-generated kubeconfig file you will need to remove the details of cluster Foo manually
2018-09-10T14:41:12-04:00 [✔] all EKS cluster "Foo" resource will be deleted (if in doubt, check CloudFormation console)
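Since the output says to check the CloudFormation console if in doubt, one way to double-check cleanup from the CLI (a sketch, assuming the AWS CLI is configured for the same account and region) would be to list the remaining stacks and confirm the EKS-Foo-* ones are gone:

$ aws cloudformation list-stacks --region us-east-1 --stack-status-filter CREATE_COMPLETE ROLLBACK_COMPLETE DELETE_FAILED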
Note: I'm not sure what's wrong with the kubeconfig file. I have an existing kubeconfig from using Docker for Mac and GKE. If eksctl is adding its own entry or otherwise modifying the file, I haven't changed anything there related to eksctl myself.
When I run kubectl config get-contexts and kubectl config get-clusters, I don't see anything related to EKS or eksctl in the output.
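If eksctl had actually written an entry, I believe it could be removed manually with something like the following (the context and cluster names here are hypothetical placeholders; nothing like this appears in my config):

$ kubectl config delete-context <eksctl-context-name>
$ kubectl config delete-cluster <eksctl-cluster-name>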
I just found this in the CloudFormation logs under "view failure event details":
15:07:45 UTC-0400 | CREATE_FAILED | AWS::EKS::Cluster | EKSCluster | Cannot create cluster '*' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1b, us-east-1c, us-east-1d (Service: AmazonEKS; Status Code: 400; Error Code: UnsupportedAvailabilityZoneException; Request ID: *)
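For reference, I believe the same failure event can also be pulled without the console, e.g. (stack name taken from the eksctl output above):

$ aws cloudformation describe-stack-events --stack-name EKS-Foo-ControlPlane --region us-east-1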
I think I'm just hitting the availability zone issue on the AWS side that's already documented (e.g. https://github.com/weaveworks/eksctl/issues/75), not something in eksctl. I misread the original docs and was hardcoding the zones string from the docs.
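For anyone else hitting this, here is a sketch of the retry based on the zones the error message suggests (same command as above, only --zones changed; the cluster and key names are still placeholders):

$ eksctl create cluster --name=Foo --node-type=m5.large --nodes=3 --nodes-min=3 --nodes-max=5 --ssh-public-key=foo-eks --region=us-east-1 --zones=us-east-1b,us-east-1c,us-east-1d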
Yes, this is an availability issue that occurs in us-east-1 but hasn't been seen in us-west-2, so I recommend using us-west-2.
The issue that we should fix is #190. What actually happened is that with the switch to CloudFormation in 0.1.1 (#126), we unintentionally hid the error message.
I think the following outstanding PRs should land for 0.1.2: #202, #201, #192.
This makes sense and sounds good. Looking forward to it!
Should we keep this issue open for the release or close it out?
Yes, let's close this, as release status is now tracked in https://github.com/weaveworks/eksctl/milestone/7.