What happened?
When creating a private cluster with a user-supplied VPC whose subnets share the same route table, cluster creation fails with the following error:
$ eksctl create cluster -f cluster.yaml
[ℹ] eksctl version 0.24.0
[ℹ] using region us-west-2
[✔] using existing VPC (vpc-XXX...XXX) and subnets (private:[subnet-XXX...XXX subnet-XXX...XXX subnet-XXX...XXX] public:[])
[!] custom VPC/subnets will be used; if resulting cluster doesn't function as expected, make sure to review the configuration of VPC/subnets
[ℹ] using Kubernetes version 1.16
[ℹ] creating EKS cluster "private-cluster" in "us-west-2" region with
[ℹ] will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s)
[ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=private-cluster'
[ℹ] CloudWatch logging will not be enabled for cluster "private-cluster" in "us-west-2"
[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=us-west-2 --cluster=private-cluster'
[ℹ] Kubernetes API endpoint access will use provided values {publicAccess=true, privateAccess=true} for cluster "private-cluster" in "us-west-2"
[ℹ] 2 sequential tasks: { create cluster control plane "private-cluster", update cluster VPC endpoint access configuration }
[ℹ] building cluster stack "eksctl-private-cluster-cluster"
[ℹ] deploying stack "eksctl-private-cluster-cluster"
[✖] unexpected status "ROLLBACK_COMPLETE" while waiting for CloudFormation stack "eksctl-private-cluster-cluster"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[!] AWS::EC2::SecurityGroup/ClusterSharedNodeSecurityGroup: DELETE_IN_PROGRESS
[!] AWS::IAM::Role/ServiceRole: DELETE_IN_PROGRESS
[✖] AWS::EC2::SecurityGroup/ClusterSharedNodeSecurityGroup: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::IAM::Role/ServiceRole: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::EC2::SecurityGroup/ControlPlaneSecurityGroup: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::EC2::VPCEndpoint/VPCEndpointS3: CREATE_FAILED – "Property RouteTableIds contains duplicate values."
[!] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-west-2 --name=private-cluster'
[✖] waiting for CloudFormation stack "eksctl-private-cluster-cluster": ResourceNotReady: failed waiting for successful resource state
Error: failed to create cluster "private-cluster"
This happens because the same route table ID is written multiple times to the RouteTableIds property of the VPCEndpointS3 resource in the generated CloudFormation template.
...
"VPCEndpointS3": {
"Type": "AWS::EC2::VPCEndpoint",
"Properties": {
"RouteTableIds": [
"rtb-AAA...AAA",
"rtb-AAA...AAA",
"rtb-AAA...AAA"
],
"ServiceName": "com.amazonaws.us-west-2.s3",
"VpcEndpointType": "Gateway",
"VpcId": "vpc-XXX...XXX"
}
},
...
What you expected to happen?
Private cluster creation should succeed when the subnets use the same route table.
How to reproduce it?
1. Prepare the configuration file
Use the following configuration file "cluster.yaml".
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: private-cluster1
  region: us-west-2
privateCluster:
  enabled: true
vpc:
  subnets:
    private:
      us-west-2a:
        id: subnet-aaaa
      us-west-2b:
        id: subnet-bbbb
      us-west-2c:
        id: subnet-cccc
Subnets (subnet-aaaa, subnet-bbbb, subnet-cccc) use the same route table.
2. Execute the following eksctl command
eksctl create cluster -f cluster.yaml
Running the above command reproduces the issue.
Versions
$ eksctl version
0.24.0
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T23:30:39Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.6-eks-4e7f64", GitCommit:"4e7f642f9f4cbb3c39a4fc6ee84fe341a8ade94c", GitTreeState:"clean", BuildDate:"2020-06-11T13:55:35Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Hi
I was impacted by this bug when I tried to build a private cluster with version 0.25.0. Is there any ETA for a fix?
I also built eksctl from the latest git clone, which produced a binary reporting version
0.26.0-dev+100e9d0b.2020-08-05T16:10:12Z
Then I tried again, since this bug is marked as closed here, but I still see the issue:
[ℹ] deploying stack "eksctl-lakedev-cdml-private-cluster-cluster"
[✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-lakedev-cdml-private-cluster-cluster"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::EC2::SecurityGroup/ClusterSharedNodeSecurityGroup: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::EC2::SecurityGroup/ControlPlaneSecurityGroup: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::EC2::VPCEndpoint/VPCEndpointS3: CREATE_FAILED – "route table rtb-09ec7cd1e8effc6cf already has a route with destination-prefix-list-id pl-68a54001 (Service: AmazonEC2; Status Code: 400; Error Code: RouteAlreadyExists; Request ID: 9815caa0-c444-46fc-9116-624b749e477a)"
[!] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-west-2 --name=lakedev-cdml-private-cluster'
[✖] waiting for CloudFormation stack "eksctl-lakedev-cdml-private-cluster-cluster": ResourceNotReady: failed waiting for successful resource state
Here are the stack events from describe-stack-events:
{
    "StackId": "arn:aws:cloudformation:us-west-2:701843826545:stack/eksctl-lakedev-cdml-private-cluster-cluster/24896a60-d737-11ea-9211-02e34c33f014",
    "EventId": "ClusterSharedNodeSecurityGroup-CREATE_FAILED-2020-08-05T16:17:31.277Z",
    "StackName": "eksctl-lakedev-cdml-private-cluster-cluster",
    "LogicalResourceId": "ClusterSharedNodeSecurityGroup",
    "PhysicalResourceId": "eksctl-lakedev-cdml-private-cluster-cluster-ClusterSharedNodeSecurityGroup-18UQ4X25KGARB",
    "ResourceType": "AWS::EC2::SecurityGroup",
    "Timestamp": "2020-08-05T16:17:31.277Z",
    "ResourceStatus": "CREATE_FAILED",
    "ResourceStatusReason": "Resource creation cancelled",
    "ResourceProperties": "{\"GroupDescription\":\"Communication between all nodes in the cluster\",\"VpcId\":\"vpc-0d02c75fe677fd1c6\",\"Tags\":[{\"Value\":\"eksctl-lakedev-cdml-private-cluster-cluster/ClusterSharedNodeSecurityGroup\",\"Key\":\"Name\"}]}"
},
{
    "StackId": "arn:aws:cloudformation:us-west-2:701843826545:stack/eksctl-lakedev-cdml-private-cluster-cluster/24896a60-d737-11ea-9211-02e34c33f014",
    "EventId": "ControlPlaneSecurityGroup-CREATE_FAILED-2020-08-05T16:17:31.212Z",
    "StackName": "eksctl-lakedev-cdml-private-cluster-cluster",
    "LogicalResourceId": "ControlPlaneSecurityGroup",
    "PhysicalResourceId": "eksctl-lakedev-cdml-private-cluster-cluster-ControlPlaneSecurityGroup-B5WD4SE2S17U",
    "ResourceType": "AWS::EC2::SecurityGroup",
    "Timestamp": "2020-08-05T16:17:31.212Z",
    "ResourceStatus": "CREATE_FAILED",
    "ResourceStatusReason": "Resource creation cancelled",
    "ResourceProperties": "{\"GroupDescription\":\"Communication between the control plane and worker nodegroups\",\"VpcId\":\"vpc-0d02c75fe677fd1c6\",\"Tags\":[{\"Value\":\"eksctl-lakedev-cdml-private-cluster-cluster/ControlPlaneSecurityGroup\",\"Key\":\"Name\"}]}"
},
{
    "StackId": "arn:aws:cloudformation:us-west-2:701843826545:stack/eksctl-lakedev-cdml-private-cluster-cluster/24896a60-d737-11ea-9211-02e34c33f014",
    "EventId": "VPCEndpointS3-CREATE_FAILED-2020-08-05T16:17:27.802Z",
    "StackName": "eksctl-lakedev-cdml-private-cluster-cluster",
    "LogicalResourceId": "VPCEndpointS3",
    "PhysicalResourceId": "",
    "ResourceType": "AWS::EC2::VPCEndpoint",
    "Timestamp": "2020-08-05T16:17:27.802Z",
    "ResourceStatus": "CREATE_FAILED",
    "ResourceStatusReason": "route table rtb-09ec7cd1e8effc6cf already has a route with destination-prefix-list-id pl-68a54001 (Service: AmazonEC2; Status Code: 400; Error Code: RouteAlreadyExists; Request ID: 9815caa0-c444-46fc-9116-624b749e477a)",
    "ResourceProperties": "{\"VpcId\":\"vpc-0d02c75fe677fd1c6\",\"RouteTableIds\":[\"rtb-09ec7cd1e8effc6cf\"],\"ServiceName\":\"com.amazonaws.us-west-2.s3\",\"VpcEndpointType\":\"Gateway\"}"
},
Our VPC has private subnets that share the same route table, and that route table already has an S3 VPC endpoint gateway as one of its routes.
Please suggest a workaround if there is one.
Thanks
Lucky
@lkr2des The commit referenced just above your comment fixed this issue a week ago.
The error message you're getting isn't the same one from the OP, so this may be a completely different bug. Can you try 0.24.0, which was released _before_ this was fixed, and see if you're getting the same error as the OP? @hiraken-w Can you confirm your fix resolved the issue for you?
Hi @michaelbeaumont
Apologies if this is not the same issue.
I tried with 0.24 and I am still seeing the same issue:
[ℹ] eksctl version 0.24.0
[ℹ] using region us-west-2
[✔] using existing VPC (vpc-0d02c75fe677fd1c6) and subnets (private:[subnet-0cdb44b4eeb37ec32 subnet-03ae58f6baa404802] public:[])
[!] custom VPC/subnets will be used; if resulting cluster doesn't function as expected, make sure to review the configuration of VPC/subnets
[ℹ] nodegroup "ng-1" will use "ami-037843f6aeb12e236" [AmazonLinux2/1.17]
[ℹ] using EC2 key pair "lakedevecs-keypair"
[ℹ] using Kubernetes version 1.17
[ℹ] creating EKS cluster "lakedev-cdml-private-cluster" in "us-west-2" region with un-managed nodes
[ℹ] 1 nodegroup (ng-1) was included (based on the include/exclude rules)
[ℹ] will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s)
[ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=lakedev-cdml-private-cluster'
[ℹ] Kubernetes API endpoint access will use provided values {publicAccess=true, privateAccess=true} for cluster "lakedev-cdml-private-cluster" in "us-west-2"
[ℹ] 2 sequential tasks: { create cluster control plane "lakedev-cdml-private-cluster", 2 sequential sub-tasks: { 3 sequential sub-tasks: { tag cluster, update CloudWatch logging configuration, update cluster VPC endpoint access configuration }, create nodegroup "ng-1" } }
[ℹ] building cluster stack "eksctl-lakedev-cdml-private-cluster-cluster"
[ℹ] deploying stack "eksctl-lakedev-cdml-private-cluster-cluster"
[✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-lakedev-cdml-private-cluster-cluster"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::EC2::SecurityGroup/ControlPlaneSecurityGroup: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::EC2::SecurityGroup/ClusterSharedNodeSecurityGroup: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::EC2::VPCEndpoint/VPCEndpointS3: CREATE_FAILED – "Property RouteTableIds contains duplicate values."
[!] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-west-2 --name=lakedev-cdml-private-cluster'
[✖] waiting for CloudFormation stack "eksctl-lakedev-cdml-private-cluster-cluster": ResourceNotReady: failed waiting for successful resource state
Basically I am trying to set up this cluster with a node group on two private subnets in us-west-2a and us-west-2b, but both subnets share the same route table, which already has an S3 endpoint as one of its defined destinations.
Here's the cluster.yml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: lakedev-cdml-private-cluster
  region: us-west-2
  version: '1.17'
  tags:
privateCluster:
  enabled: true
  additionalEndpointServices:
    - "autoscaling"
vpc:
  id: "vpc-0d02c75fe677fd1c6"
  cidr: "172.23.0.0/18"
  subnets:
    private:
      us-west-2a:
        id: "subnet-03ae58f6baa404802"
        cidr: "172.23.2.0/24"
      us-west-2b:
        id: "subnet-0cdb44b4eeb37ec32"
        cidr: "172.23.18.0/24"
iam:
  serviceRoleARN: "arn:aws:iam::701843826545:role/EKSClusterRole"
nodeGroups:
  - name: ng-1
    privateNetworking: true
    desiredCapacity: 2
    minSize: 0
    maxSize: 3
    volumeSize: 100
    volumeType: gp2
    instanceType: m5a.2xlarge
    availabilityZones: ["us-west-2a", "us-west-2b"]
    labels: {role: worker-node}
    tags:
      k8s.io/cluster-autoscaler/node-template/label/lifecycle: OnDemand
      k8s.io/cluster-autoscaler/node-template/label/aws.amazon.com/spot: "false"
      k8s.io/cluster-autoscaler/node-template/label/gpu-count: "0"
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/kubeflow-us-west-2: "owned"
    ssh:
      allow: true
      publicKeyName: 'lakedevecs-keypair'
    securityGroups:
      withShared: true
      withLocal: true
      attachIDs: ['sg-00bfbe5c440fa5eb8', 'sg-0ef52b744df998490']
    iam:
      instanceProfileARN: "arn:aws:iam::701843826545:instance-profile/EKSWorkersInstanceProfile-OZBYVL0UV03F"
      instanceRoleARN: "arn:aws:iam::701843826545:role/eksworkerinstancerole"
Thanks
Lucky
Notice you're getting a completely different error with 0.24 than in your first comment.
First error on master:
AWS::EC2::VPCEndpoint/VPCEndpointS3: CREATE_FAILED – "route table rtb-09ec7cd1e8effc6cf already has a route with destination-prefix-list-id pl-68a54001 (Service: AmazonEC2; Status Code: 400; Error Code: RouteAlreadyExists; Request ID: 9815caa0-c444-46fc-9116-624b749e477a)"
With 0.24 (same as the OP):
AWS::EC2::VPCEndpoint/VPCEndpointS3: CREATE_FAILED – "Property RouteTableIds contains duplicate values."
This would suggest the PR that fixed the OP's error didn't completely solve the problem.
Thanks @michaelbeaumont
@michaelbeaumont
Notice you're getting a completely different error with 0.24 than in your first comment.
First error on master: AWS::EC2::VPCEndpoint/VPCEndpointS3: CREATE_FAILED – "route table rtb-09ec7cd1e8effc6cf already has a route with destination-prefix-list-id pl-68a54001 (Service: AmazonEC2; Status Code: 400; Error Code: RouteAlreadyExists; Request ID: 9815caa0-c444-46fc-9116-624b749e477a)"
I could replicate the above error, too. I validated it on master.
This error occurs when the S3 VPC Endpoint Route already exists in the RouteTable that is associated with the Subnet.
My fix was not for this error.
I have the same problem:
[ℹ] eksctl version 0.25.0
[ℹ] using region us-east-1
[✔] using existing VPC (vpc-xxx) and subnets (private:[subnet-xxx subnet-xxx] public:[])
[!] custom VPC/subnets will be used; if resulting cluster doesn't function as expected, make sure to review the configuration of VPC/subnets
[ℹ] using Kubernetes version 1.17
[ℹ] creating EKS cluster "dev" in "us-east-1" region with
[ℹ] will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s)
[ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=dev'
[ℹ] CloudWatch logging will not be enabled for cluster "dev" in "us-east-1"
[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=us-east-1 --cluster=dev'
[ℹ] Kubernetes API endpoint access will use provided values {publicAccess=true, privateAccess=true} for cluster "dev" in "us-east-1"
[ℹ] 2 sequential tasks: { create cluster control plane "dev", update cluster VPC endpoint access configuration }
[ℹ] building cluster stack "eksctl-dev-cluster"
[ℹ] deploying stack "eksctl-dev-cluster"
[✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-dev-cluster"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[✖] AWS::EC2::SecurityGroup/ControlPlaneSecurityGroup: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::IAM::Role/ServiceRole: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::EC2::SecurityGroup/ClusterSharedNodeSecurityGroup: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::EC2::VPCEndpoint/VPCEndpointS3: CREATE_FAILED – "Property RouteTableIds contains duplicate values."
[!] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-east-1 --name=dev'
[✖] waiting for CloudFormation stack "eksctl-dev-cluster": ResourceNotReady: failed waiting for successful resource state
Error: failed to create cluster "dev"
This issue is fixed in 0.26.0.
Yes, it is fixed in 0.26.0, thank you.