Eksctl: Nodegroups not joining the cluster

Created on 8 Sep 2019  ·  6 comments  ·  Source: weaveworks/eksctl

What happened?

Using the current version of eksctl, 0.5.2, when creating two nodegroups from a config file, both nodegroups are created but only the first one joins the cluster. Checking the aws-auth ConfigMap shows just one nodegroup entry.

What you expected to happen?

I would expect all the nodegroups to be added to the aws-auth ConfigMap and therefore to show up in kubectl get nodes.
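A quick way to verify that expectation is to grep a dump of the aws-auth ConfigMap for each nodegroup's instance-role ARN (a sketch, not an eksctl feature; the sample dump below reproduces the broken 0.5.2 state where only ng-1's role was added, and "NodeInstanceRole-MISSING0000" is a hypothetical stand-in for ng-2's role name):

```shell
# In a real cluster, produce the dump with:
#   kubectl -n kube-system get configmap aws-auth -o yaml > aws-auth.yaml
# Here we write a sample dump mirroring the 0.5.2 run shown below, where
# only ng-1's instance role reached the ConfigMap.
cat > aws-auth.yaml <<'EOF'
mapRoles: |
  - groups:
    - system:bootstrappers
    - system:nodes
    rolearn: arn:aws:iam::930347582273:role/eksctl-development-nodegroup-ng-NodeInstanceRole-N75PLALYFBL1
    username: system:node:{{EC2PrivateDNSName}}
EOF

# Check each expected role name; a missing role means nodes in that
# group can never register with the cluster.
for role in NodeInstanceRole-N75PLALYFBL1 NodeInstanceRole-MISSING0000; do
  if grep -q "$role" aws-auth.yaml; then
    echo "$role: mapped"
  else
    echo "$role: MISSING"
  fi
done
```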

How to reproduce it?

Using a 1.13 EKS cluster named development, create the nodegroups from this config file, test.yaml, with eksctl create nodegroup -f test.yaml.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: development
  region: eu-west-1
  version: "1.13"

nodeGroups:
  - name: ng-1
    instanceType: t3.small
    desiredCapacity: 2
    amiFamily: AmazonLinux2
    privateNetworking: true
  - name: ng-2
    instanceType: t3.small
    desiredCapacity: 2
    amiFamily: AmazonLinux2
    privateNetworking: true

Anything else we need to know?

OS: macOS
eksctl: Installed using brew.

This issue seems to have been introduced between 0.4.1 and 0.5.2, as it works fine with 0.4.1.

Versions

$ eksctl version
[ℹ]  version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.5.2"}

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:36:28Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-eks-5ac0f1", GitCommit:"5ac0f1d9ab2c254ea2b0ce3534fd72932094c6e1", GitTreeState:"clean", BuildDate:"2019-08-20T22:39:46Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}

$ go version
go version go1.13 darwin/amd64

Logs

  • Using eksctl 0.5.2
$ eksctl create nodegroup -f test.yaml
[ℹ]  using region eu-west-1
[ℹ]  nodegroup "ng-1" will use "ami-0199284372364b02a" [AmazonLinux2/1.13]
[ℹ]  nodegroup "ng-2" will use "ami-0199284372364b02a" [AmazonLinux2/1.13]
[ℹ]  2 nodegroups (ng-1, ng-2) were included (based on the include/exclude rules)
[ℹ]  will create a CloudFormation stack for each of 2 nodegroups in cluster "development"
[ℹ]  2 parallel tasks: { create nodegroup "ng-1", create nodegroup "ng-2" }
[ℹ]  building nodegroup stack "eksctl-development-nodegroup-ng-1"
[ℹ]  building nodegroup stack "eksctl-development-nodegroup-ng-2"
[ℹ]  --nodes-min=2 was set automatically for nodegroup ng-2
[ℹ]  --nodes-max=2 was set automatically for nodegroup ng-2
[ℹ]  --nodes-min=2 was set automatically for nodegroup ng-1
[ℹ]  --nodes-max=2 was set automatically for nodegroup ng-1
[ℹ]  deploying stack "eksctl-development-nodegroup-ng-2"
[ℹ]  deploying stack "eksctl-development-nodegroup-ng-1"
[ℹ]  adding role "arn:aws:iam::930347582273:role/eksctl-development-nodegroup-ng-NodeInstanceRole-N75PLALYFBL1" to auth ConfigMap
[ℹ]  nodegroup "ng-1" has 0 node(s)
[ℹ]  waiting for at least 2 node(s) to become ready in "ng-1"
[ℹ]  nodegroup "ng-1" has 2 node(s)
[ℹ]  node "ip-10-3-43-18.eu-west-1.compute.internal" is ready
[ℹ]  node "ip-10-3-6-25.eu-west-1.compute.internal" is ready

$ eksctl get nodegroups --cluster development
CLUSTER     NODEGROUP   CREATED         MIN SIZE    MAX SIZE    DESIRED CAPACITY    INSTANCE TYPE   IMAGE ID
development ng-1        2019-09-08T19:41:02Z    2       2       2           t3.small    ami-0199284372364b02a
development ng-2        2019-09-08T19:41:02Z    2       2       2           t3.small    ami-0199284372364b02a

$ kubectl get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-3-43-18.eu-west-1.compute.internal   Ready    <none>   93s   v1.13.8-eks-cd3eb0
ip-10-3-6-25.eu-west-1.compute.internal    Ready    <none>   93s   v1.13.8-eks-cd3eb0

$ kubectl get configmap -n kube-system aws-auth -o yaml
apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::930347582273:role/eksctl-development-nodegroup-ng-NodeInstanceRole-N75PLALYFBL1
      username: system:node:{{EC2PrivateDNSName}}
kind: ConfigMap
metadata:
  creationTimestamp: "2019-09-08T19:45:10Z"
  name: aws-auth
  namespace: kube-system
  resourceVersion: "918"
  selfLink: /api/v1/namespaces/kube-system/configmaps/aws-auth
  uid: 2b278b81-d271-11e9-987b-0a80449407fe
  • Using eksctl 0.4.1
$ eksctl create nodegroup -f test.yaml
[ℹ]  using region eu-west-1
[ℹ]  nodegroup "ng-1" will use "ami-00ac2e6b3cb38a9b9" [AmazonLinux2/1.13]
[ℹ]  nodegroup "ng-2" will use "ami-00ac2e6b3cb38a9b9" [AmazonLinux2/1.13]
[ℹ]  2 nodegroups (ng-1, ng-2) were included
[ℹ]  will create a CloudFormation stack for each of 2 nodegroups in cluster "development"
[ℹ]  2 parallel tasks: { create nodegroup "ng-1", create nodegroup "ng-2" }
[ℹ]  building nodegroup stack "eksctl-development-nodegroup-ng-2"
[ℹ]  building nodegroup stack "eksctl-development-nodegroup-ng-1"
[ℹ]  --nodes-min=2 was set automatically for nodegroup ng-1
[ℹ]  --nodes-max=2 was set automatically for nodegroup ng-1
[ℹ]  --nodes-min=2 was set automatically for nodegroup ng-2
[ℹ]  --nodes-max=2 was set automatically for nodegroup ng-2
[ℹ]  deploying stack "eksctl-development-nodegroup-ng-1"
[ℹ]  deploying stack "eksctl-development-nodegroup-ng-2"
[ℹ]  adding role "arn:aws:iam::930347582273:role/eksctl-development-nodegroup-ng-NodeInstanceRole-2RX93EXNE729" to auth ConfigMap
[ℹ]  nodegroup "ng-1" has 0 node(s)
[ℹ]  waiting for at least 2 node(s) to become ready in "ng-1"
[ℹ]  nodegroup "ng-1" has 2 node(s)
[ℹ]  node "ip-10-3-20-98.eu-west-1.compute.internal" is ready
[ℹ]  node "ip-10-3-47-125.eu-west-1.compute.internal" is ready
[ℹ]  adding role "arn:aws:iam::930347582273:role/eksctl-development-nodegroup-ng-NodeInstanceRole-14X8P4JO00HZO" to auth ConfigMap
[ℹ]  nodegroup "ng-2" has 0 node(s)
[ℹ]  waiting for at least 2 node(s) to become ready in "ng-2"
[ℹ]  nodegroup "ng-2" has 2 node(s)
[ℹ]  node "ip-10-3-41-213.eu-west-1.compute.internal" is ready
[ℹ]  node "ip-10-3-7-52.eu-west-1.compute.internal" is ready
[✔]  created 2 nodegroup(s) in cluster "development"
[ℹ]  checking security group configuration for all nodegroups
[ℹ]  all nodegroups have up-to-date configuration

$ eksctl get nodegroups --cluster development
CLUSTER     NODEGROUP   CREATED         MIN SIZE    MAX SIZE    DESIRED CAPACITY    INSTANCE TYPE   IMAGE ID
development ng-1        2019-09-08T19:50:26Z    2       2       2           t3.small    ami-00ac2e6b3cb38a9b9
development ng-2        2019-09-08T19:50:26Z    2       2       2           t3.small    ami-00ac2e6b3cb38a9b9

$ kubectl get nodes
NAME                                        STATUS   ROLES    AGE     VERSION
ip-10-3-20-98.eu-west-1.compute.internal    Ready    <none>   2m20s   v1.13.7-eks-c57ff8
ip-10-3-41-213.eu-west-1.compute.internal   Ready    <none>   2m7s    v1.13.7-eks-c57ff8
ip-10-3-47-125.eu-west-1.compute.internal   Ready    <none>   2m19s   v1.13.7-eks-c57ff8
ip-10-3-7-52.eu-west-1.compute.internal     Ready    <none>   2m2s    v1.13.7-eks-c57ff8

$ kubectl get configmap -n kube-system aws-auth -o yaml
apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::930347582273:role/eksctl-development-nodegroup-ng-NodeInstanceRole-2RX93EXNE729
      username: system:node:{{EC2PrivateDNSName}}
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::930347582273:role/eksctl-development-nodegroup-ng-NodeInstanceRole-14X8P4JO00HZO
      username: system:node:{{EC2PrivateDNSName}}
kind: ConfigMap
metadata:
  creationTimestamp: "2019-09-08T19:45:10Z"
  name: aws-auth
  namespace: kube-system
  resourceVersion: "1951"
  selfLink: /api/v1/namespaces/kube-system/configmaps/aws-auth
  uid: 2b278b81-d271-11e9-987b-0a80449407fe
kind/bug

All 6 comments

From the logs, it appears that you terminated the command before it could add ng-2's role ARN to the aws-auth ConfigMap. Could you please confirm this?
eksctl waits for each stack to become ready and serially adds each instance role ARN to the ConfigMap, so if you terminate the command before a role ARN has been added, that nodegroup won't join the cluster.
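(Until a fix is available, one possible stop-gap, an assumption on my part rather than an official workaround, is to add the missing role by hand with kubectl edit configmap aws-auth -n kube-system, appending an entry like the one below under mapRoles. The placeholders stand for your account ID and the affected nodegroup's NodeInstanceRole name, visible in its CloudFormation stack resources.)

```yaml
# Hypothetical entry for the nodegroup that failed to register; replace
# <account-id> and <NodeInstanceRole-name> with your real values.
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::<account-id>:role/<NodeInstanceRole-name>
  username: system:node:{{EC2PrivateDNSName}}
```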

I created a new cluster using your ClusterConfig file with eksctl 0.5.2 and was able to have both node groups join the cluster (I let the command run to completion).

I am experiencing this also.

Just today I upgraded eksctl to 0.5.2; my cluster YAML defines three nodegroups. During eksctl create nodegroup all three nodegroups are "created" and their EC2 instances launch, but only the first one gets the "added to the ConfigMap" line in the logs, and it's the only one that shows up in the cluster (same as jfusterm's example above).

I'm not terminating the eksctl create nodegroup command early, but I agree it does appear to terminate early for some reason.

@cPu1 I have the same behaviour as @sponrad, I'm not terminating the command, it just stops there and I'm not getting any error whatsoever.

Running eksctl create nodegroup -f test.yaml -v 4 doesn't reveal anything either.

...
2019-09-10T06:13:39+02:00 [▶]  event = watch.Event{Type:"MODIFIED", Object:(*v1.Node)(0xc0000d4dc0)}
2019-09-10T06:13:39+02:00 [▶]  node "ip-10-3-11-232.eu-west-1.compute.internal" is ready in "ng-1"
2019-09-10T06:13:40+02:00 [▶]  event = watch.Event{Type:"MODIFIED", Object:(*v1.Node)(0xc0000d5080)}
2019-09-10T06:13:40+02:00 [▶]  node "ip-10-3-37-47.eu-west-1.compute.internal" seen in "ng-1", but not ready yet
2019-09-10T06:13:40+02:00 [▶]  node = v1.Node{ObjectMeta:v1.ObjectMeta{Name:"ip-10-3-37-47.eu-west-1.compute.internal", Labels:map[string]string{"alpha.eksctl.io/cluster-name":"development", "alpha.eksctl.io/instance-id":"i-07f5c2d4a22261c12", "alpha.eksctl.io/nodegroup-name":"ng-1", ...}, ...}, Spec:v1.NodeSpec{ProviderID:"aws:///eu-west-1c/i-07f5c2d4a22261c12", Taints:[]v1.Taint{{Key:"node.kubernetes.io/not-ready", Effect:"NoSchedule"}, {Key:"node.kubernetes.io/not-ready", Effect:"NoExecute"}}, ...}, Status:v1.NodeStatus{Conditions:[]v1.NodeCondition{..., {Type:"Ready", Status:"False", Reason:"KubeletNotReady", Message:"runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"}}, NodeInfo:v1.NodeSystemInfo{KubeletVersion:"v1.13.8-eks-cd3eb0", OSImage:"Amazon Linux 2", ...}, ...}} [full object dump truncated]
2019-09-10T06:13:45+02:00 [▶]  event = watch.Event{Type:"MODIFIED", Object:(*v1.Node)(0xc000541600)}
2019-09-10T06:13:45+02:00 [▶]  node "ip-10-3-37-47.eu-west-1.compute.internal" is ready in "ng-1"
2019-09-10T06:13:45+02:00 [ℹ]  nodegroup "ng-1" has 2 node(s)
2019-09-10T06:13:45+02:00 [ℹ]  node "ip-10-3-11-232.eu-west-1.compute.internal" is ready
2019-09-10T06:13:45+02:00 [ℹ]  node "ip-10-3-37-47.eu-west-1.compute.internal" is ready

@jfusterm @sponrad this has been fixed in 0.5.3: https://github.com/weaveworks/eksctl/releases/tag/0.5.3. Please try it out.

I initially tried your example config with eksctl create cluster and not eksctl create nodegroup, which is why I couldn't reproduce it as this bug only affected eksctl create nodegroup.

@cPu1 that fixed it for me. eksctl create nodegroup created my three nodegroups and added them all to the cluster.

Thanks so much!

Thanks @cPu1 for your quick fix! It's working right now.
