Eksctl: Cluster autoscaler cannot scale mixed nodegroup on AWS

Created on 10 Jul 2019  ·  2 Comments  ·  Source: weaveworks/eksctl

What happened?
I created a nodegroup with mixed spot instances, and tried both setting instanceType to mixed and leaving it empty.
Cluster autoscaler reports the following error:

...
Unable to build proper template node for eksctl-moon-kube-nodegroup-gpu-spot-ng-a-NodeGroup-17OCXU2R4F8ES: unable to find instance type within launch template
...

Looking at the cluster autoscaler code, I see that it requires a non-empty, valid instanceType in the LaunchTemplate and fails if one is not specified (see the error above).

Cluster autoscaler code
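
To make the failure mode concrete, here is a minimal sketch of the lookup described above. It is not the autoscaler's actual code: the class and function names are hypothetical, and the optional fallback to the mixed instances policy is only one possible way to handle the case, not something confirmed in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LaunchTemplate:
    # Hypothetical stand-in for launch template data. The instance type is
    # empty when instanceType is "mixed" or left blank, as in this report.
    instance_type: str = ""


@dataclass
class AutoScalingGroup:
    name: str
    launch_template: Optional[LaunchTemplate] = None
    # Instance types listed in the mixed instances policy, if any.
    override_instance_types: List[str] = field(default_factory=list)


def template_instance_type(asg: AutoScalingGroup, use_overrides: bool = False) -> str:
    """Return the instance type used to build a template node for the ASG."""
    if asg.launch_template and asg.launch_template.instance_type:
        return asg.launch_template.instance_type
    if use_overrides and asg.override_instance_types:
        # Assumed fallback: take the first type from the mixed instances policy.
        return asg.override_instance_types[0]
    raise ValueError(
        f"Unable to build proper template node for {asg.name}: "
        "unable to find instance type within launch template"
    )
```

With use_overrides left at False, a nodegroup whose launch template has no instance type raises the same kind of error as in the log above.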

What you expected to happen?

I expect a mixed nodegroup created with eksctl to work with cluster autoscaler.

How to reproduce it?
Create a mixed nodegroup (NG) with spot instances. Deploy a workload that forces cluster autoscaler (CA) to scale up, then check the CA logs for the error above.

Anything else we need to know?
macOS
EKS cluster
Cluster autoscaler k8s.gcr.io/cluster-autoscaler:v1.15.0

Versions
Please paste in the output of these commands:

$ eksctl version
version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.1.39"}

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-20T04:49:16Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.7-eks-c57ff8", GitCommit:"c57ff8e35590932c652433fab07988da79265d5b", GitTreeState:"clean", BuildDate:"2019-06-07T20:43:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Logs
Include the output of the command line when running eksctl. If possible, eksctl should be run with debug logs. For example:
eksctl get clusters -v 4
Make sure you redact any sensitive information before posting.
If the output is long, please consider a Gist.

area/nodegroup kind/bug

All 2 comments

This new setup is a very welcome feature! Just want to add that we ran into a similar error when trying to autoscale from 0 nodes with it (eksctl 0.1.40, Kubernetes 1.13, autoscaler 1.13.5):

W0714 03:27:33.258081       1 aws_manager.go:194] Found multiple availability zones for ASG "eksctl-pangeo-esip-nodegroup-dask-worker-NodeGroup-6UG2LS4KNTPH"; using us-west-2a
E0714 03:27:33.258106       1 utils.go:291] Unable to build proper template node for eksctl-pangeo-esip-nodegroup-dask-worker-NodeGroup-6UG2LS4KNTPH: Unable to get instance type from launch config or launch template
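
For anyone checking whether their own nodegroups are affected, the error line above can be picked out of the autoscaler logs with a small script. This is only a sketch: the regular expression is inferred from the two log lines quoted in this thread, and the function name is made up for illustration.

```python
import re
from typing import List, Tuple

# Pattern inferred from the log lines quoted in this thread; other
# autoscaler versions may word the message differently.
TEMPLATE_ERROR = re.compile(
    r"Unable to build proper template node for (?P<asg>[^:\s]+):\s*(?P<reason>.*\S)"
)


def failing_node_groups(log_text: str) -> List[Tuple[str, str]]:
    """Return unique (ASG name, reason) pairs, in order of first appearance."""
    seen: List[Tuple[str, str]] = []
    for line in log_text.splitlines():
        match = TEMPLATE_ERROR.search(line)
        if match:
            entry = (match.group("asg"), match.group("reason"))
            if entry not in seen:
                seen.append(entry)
    return seen
```

Feeding it the output of kubectl logs for the autoscaler pod gives one entry per ASG that could not be scaled up.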

node-config:

  - name: dask-worker
    minSize: 0
    maxSize: 100
    instancesDistribution:
      instanceTypes: ["r5.2xlarge", "r5a.2xlarge", "r4.2xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 10
      spotInstancePools: 3
    volumeSize: 100
    volumeType: gp2
    labels:
      node-role.kubernetes.io/role: dask-worker
      k8s.dask.org/node-purpose: worker
    taints:
      k8s.dask.org/dedicated: 'worker:NoSchedule'
    desiredCapacity: 0
    ami: auto
    amiFamily: AmazonLinux2
    ssh:
      publicKeyPath: eks-pangeo-esip-us-west-2.pub
    iam:
      withAddonPolicies:
        autoScaler: true
        efs: true
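
As a rough pre-flight check, a config like the one above could be screened for nodegroups likely to hit this error. The sketch below is an assumption-heavy illustration: it takes nodegroups as plain dicts (for example, after loading the YAML), the function name is hypothetical, and the rule it applies (mixed instances, no fixed instanceType, minSize of 0) is only what this thread reports for these particular versions.

```python
from typing import Any, Dict, List


def scale_from_zero_risks(node_groups: List[Dict[str, Any]]) -> List[str]:
    """Return names of nodegroups that match the failing setup in this thread."""
    risky = []
    for ng in node_groups:
        uses_mixed = "instancesDistribution" in ng
        # "mixed" or an empty value leaves the launch template without a type.
        has_fixed_type = ng.get("instanceType") not in (None, "", "mixed")
        if uses_mixed and not has_fixed_type and ng.get("minSize") == 0:
            risky.append(ng["name"])
    return risky
```

A nodegroup flagged here is not guaranteed to fail; the check only mirrors the combination of settings described in the reports above.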

It might be worth adding a note to the documentation about use with the autoscaler: https://eksctl.io/usage/spot-instances/.

It seems like this should work (https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#using-autoscalinggroup-mixedinstancespolicy), but there also seem to be many open issues (for example https://github.com/kubernetes/autoscaler/issues/1754, https://github.com/aws/containers-roadmap/issues/144 and others).

Closed via #1013.
