Eksctl: Cluster Autoscaler Addon: Support scaling up from 0

Created on 24 Oct 2019  路  11Comments  路  Source: weaveworks/eksctl

When performing a scale out, Cluster Autoscaler uses a live node in the ASG to build a template to launch new nodes. These nodes can have Kubernetes labels and taints defined.
When there are no live nodes, however, Cluster Autoscaler has to rely on the tags defined on the ASG as node-template keys to build a template for launching the nodes (see https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#scaling-a-node-group-to-0). These node-template keys must contain the Kubernetes labels and taints added to the corresponding node group in order for Cluster Autoscaler to add them to new nodes.

Problem:
eksctl only adds the tags required by Cluster Autoscaler for auto-discovery, but not the node-template keys.

areadd-ons kinimprovement prioritbacklog

Most helpful comment

Since I just raised #2263 to separately request something like nodeGroup.tagForAutoScaler to apply the CA discovery tags to nodes that don't have the CA Instance Role addon, perhaps limit this issue to "whenever the tags are added for the cluster auto-scaler, also add the node-template tags for labels and taints on this NodeGroup".

Although that was the same request as #1066, and it was closed with "just do it manually in the config"...

All 11 comments

@scottyhq thanks, I did see that before opening this issue. That issue is closed and suggests manually adding the tags to node groups. This issue is specifically for having eksctl add the node-template tags when the Cluster Autoscaler addon is enabled (iam.withAddonPolicies.autoScaler: true)

@scottyhq given eksctl already knows the labels and taints for a node group, then it can automatically generate the matching tags for the ASG. This saves labor over manual tagging and reduces the opportunity for mistakes and drift.

@cPu1 I would suggest not linking this to the iam.withAddonPolicies.autoScaler: true option, since that is the old instance role approach, and people using the auto scaler with IAM Service Accounts will not be enabling that IAM option.

I'd suggest this is a node group level option, since the auto scaler(s) may be in use on some node groups and not other, maybe nodeGroup.tagForAutoScaler: true?

IAM Service Account Refs:

@cPu1 I would suggest not linking this to the iam.withAddonPolicies.autoScaler: true option, since that is the old instance role approach

IAM Roles for Service Accounts (IRSA) is a relatively new feature, so we can't expect all users to have started using it. Moreover, it's not supported on EKS versions below 1.13 and not all tools accessing AWS services have been updated to use the newer AWS SDK that added support for it (by including it in the credential chain provider).

and people using the auto scaler with IAM Service Accounts will not be enabling that IAM option.

That's right, this feature is not for users of IRSA.

I'd suggest this is a node group level option, since the auto scaler(s) may be in use on some node groups and not other, maybe nodeGroup.tagForAutoScaler: true?

This _is_ a node group level option and isn't a new addition to the schema. nodeGroups[*].iam.withAddonPolicies.autoScaler already exists, this issue is for also adding the node-template tags when the autoScaler option is enabled.

Eh, there are two separate things:

  1. Adding auto-scaler policy to instance roles
  2. Adding label/taint tags to ASG to enable scale from zero

Making 1 and 2 separate options does not require people to use IRSA!
We can simple enable both options.

  • People not yet using IRSA will want both 1 and 2.
  • People using IRSA still need 2, but do not need or want 1, since that is the requirement IRSA aims to avoid.

Linking 1 and 2 to the same option makes 2 unavailable to IRSA users. Separate options supports both clusters that don鈥檛 use IRSA and those that do.

Why are you opposed to two options so both use cases can be supported @cPu1?

Ah, I misunderstood your point.

I agree that eksctl should be able to support the CA addon for users of IRSA, but, IMO, not by requiring non-IRSA users to supply an extra option (nodeGroup.tagForAutoScaler) to add the node-template tags. This is an internal detail of CA that users of eksctl need not be aware of, and setting a single option (iam.withAddonPolicies.autoScaler: true in the case of non-IRSA users) should do everything required to have a fully-functional CA. (I should note that, it might have been more appropriate to have the addons configuration outside of the iam field as eksctl adds not only the required IAM policies but also tags).

To support IRSA users, a similar level of support as iam.withAddonPolicies needs to be added by creating IAM roles that have all required policies attached, essentially automating iam.serviceAccounts.

I don't think it makes sense to tie this tagging to an IAM setting, regardless of your use of IRSA or not. Only the cluster autoscaler itself needs the elevated permissions. That might be granted with IRSA, or it might be granted to a specific nodegroup, or some other way (kiam). The scaling targets do not need any additional IAM permissions, and the best practice would be to not grant this permission where it is not needed.

Edit - Also, I run my CA pod on a nodegroup that doesn't scale. I have min == max so effectively it doesn't scale, but I could see where someone might want to grant the elevated permissions to a particular node group, but not want to have CA enabled on that group.

Since I just raised #2263 to separately request something like nodeGroup.tagForAutoScaler to apply the CA discovery tags to nodes that don't have the CA Instance Role addon, perhaps limit this issue to "whenever the tags are added for the cluster auto-scaler, also add the node-template tags for labels and taints on this NodeGroup".

Although that was the same request as #1066, and it was closed with "just do it manually in the config"...

Just to cross the streams, Managed Nodegroups may get the same behaviour, of copying all the taints and labels from a node onto the ASG. In the Managed Nodegroups case, all nodegroups are automatically tagged for the Cluster Autoscaler already, so effectively tagForAutoScaler is permanently true for those.

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

This issue was closed because it has been stalled for 5 days with no activity.

Was this page helpful?
0 / 5 - 0 ratings