Terraform-aws-eks: desired capacity update does not work for node groups

Created on 16 Apr 2020 · 18Comments · Source: terraform-aws-modules/terraform-aws-eks

I'm submitting an issue: I tried to update the min, max, and desired variables for node groups. Terraform shows min and max being changed, but desired is not updated.

[x] bug report
[ ] feature request
[ ] support request - read the FAQ first!
[ ] kudos, thank you, warm fuzzy

What is the current behavior?

Terraform code:

node_groups = {
    eks_nodegroup = {
      desired_capacity = 2
      max_capacity     = 4
      min_capacity     = 2

      instance_type = var.instance_type
      k8s_labels = {
        Environment = "sbx"
      }
      additional_tags = {
        "k8s.io/cluster-autoscaler/${local.cluster_name}" = "owned"
        "k8s.io/cluster-autoscaler/enabled" = "true"
      }
    }
  }

Output from plan and apply:

  ~ resource "aws_eks_node_group" "workers" {
.
.
.
.
.
      ~ scaling_config {
            desired_size = 1
          ~ max_size     = 3 -> 4
          ~ min_size     = 1 -> 2
        }
}

Error: error updating EKS Node Group (ce-eks-sbx:ce-eks-sbx-eks_nodegroup-lenient-blowfish) config: InvalidParameterException: Minimum capacity 2 can't be greater than desired size 1
{
ClusterName: "test-eks-sbx",
Message_: "Minimum capacity 2 can't be greater than desired size 1",
NodegroupName: "ce-eks-sbx-eks_nodegroup-lenient-blowfish"
}

I have also tried updating the desired capacity through node_groups_defaults.

If this is a bug, how to reproduce? Please include a code sample if relevant.

Change the min, max, and desired capacity of an existing node group, then run plan and apply.

What's the expected behavior?

The new scaling configuration should take effect.

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

Affected module version:
OS:
Terraform version:

Any other relevant info

amitsehgal

👍23

Most helpful comment

The practical use case I have for this is that if I set a managed node group to desired 3, max 6, min 3, cluster autoscaler will respect this. There isn't a technical reason why we can't change the min_size, nor should it be dismissed as "not a feature".

So, some concrete examples, since this has been a bit of a noisy thread.

Here's an example initial definition of a scaling config as passed through node_groups in the eks cluster module:

compute_1 = {
  desired_capacity = 1
  max_capacity     = 6
  min_capacity     = 1
}

Then update this nodegroup's minimum to:

compute_1 = {
  desired_capacity = 1
  max_capacity     = 6
  min_capacity     = 3
}

You'll get an error like:

Error: error updating EKS Node Group (eks_cluster_1:compute_1) config: InvalidParameterException: Minimum capacity 3 can't be greater than desired size 1
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: "<requestID>"
  },
  ClusterName: "eks_cluster_1",
  Message_: "Minimum capacity 3 can't be greater than desired size 1",
  NodegroupName: "compute_1"
}

Running a state show then shows the obvious:

terraform state show module.eks_cluster_1.module.node_groups.aws_eks_node_group.workers[\"compute_1\"] | grep -A4 scaling_config
    scaling_config {
        desired_size = 1
        max_size     = 6
        min_size     = 1
    }

So from Terraform's perspective, desired_capacity, which translates into scaling_config.desired_size, is effectively immutable after creation. That means desired_capacity can never be raised above its initial value, and min_capacity is effectively capped by it, while you can still happily raise max_capacity.

There are ways to work around this, such as getting the ASG ID from the module and modifying it in Terraform as part of the workflow, but that's a hack at best.
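
A rough sketch of one such out-of-band workaround follows. It is not the ASG approach mentioned above; instead it shells out to the AWS CLI's `aws eks update-nodegroup-config` from a `null_resource`, so that min, max, and desired are raised together in one API call. All variable names are hypothetical, `nodegroup_name` has to be the full generated node group name, and the AWS CLI is assumed to be installed and authenticated wherever Terraform runs. It would need to be applied before min_capacity is raised in the module, and it is still a hack.

```hcl
# Hypothetical inputs; none of these names come from the module.
variable "cluster_name" {
  type = string
}

variable "nodegroup_name" {
  type        = string
  description = "Full generated node group name, including any random suffix"
}

variable "scaling" {
  type = object({
    min     = number
    max     = number
    desired = number
  })
  default = {
    min     = 3
    max     = 6
    desired = 3
  }
}

resource "null_resource" "resize_nodegroup" {
  # Re-run the command whenever any of the three sizes changes.
  triggers = {
    min     = tostring(var.scaling.min)
    max     = tostring(var.scaling.max)
    desired = tostring(var.scaling.desired)
  }

  provisioner "local-exec" {
    # Sends all three values in a single request so min never exceeds desired.
    command = "aws eks update-nodegroup-config --cluster-name ${var.cluster_name} --nodegroup-name ${var.nodegroup_name} --scaling-config minSize=${var.scaling.min},maxSize=${var.scaling.max},desiredSize=${var.scaling.desired}"
  }
}
```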

elebertus on 23 Sep 2020

👍5

All 18 comments

It was caused by https://github.com/terraform-aws-modules/terraform-aws-eks/pull/691; it works in 8.0.0.

karlderkaefer on 25 Apr 2020

> it was caused by #691 it works in 8.0.0

@karlderkaefer
What do you mean by 8.0.0?

meysammeisam on 27 Apr 2020

I mean the git tag https://github.com/terraform-aws-modules/terraform-aws-eks/tree/v8.0.0

karlderkaefer on 27 Apr 2020

desired_capacity doesn't work for me either. The terraform-aws-eks version installed is v11.1.0.

mandeburka on 30 Apr 2020

👍4 👀1

I'm encountering this issue in v11.1.0 as well. I see the significance of #691; however, the present issue prevents node group resizing, as the author points out.

kuritonasu on 5 May 2020

https://github.com/terraform-aws-modules/terraform-aws-eks/issues/681
https://github.com/terraform-aws-modules/terraform-aws-eks/issues/678
https://github.com/terraform-aws-modules/terraform-aws-eks/issues/510#issuecomment-531700442

🙂

max-rocket-internet on 5 May 2020

👍3

TL;DR: you should be using the cluster-autoscaler. If not, you need to make the change manually.

max-rocket-internet on 5 May 2020

👎6 😕2

Consider the case where autoscaling is not desired but I still want to resize my node group, and we do not wish to resize manually through the console, for the usual reasons. I don't believe this scenario is uncommon.

Also to be clear, I don't believe _desired_ should be modified by this module by default as this could cause confusion and undesirable consequences. I am not arguing against #691, however there should be a way to override this behaviour.

kuritonasu on 6 May 2020

Sure, but the problem is that there is no way to have an optional lifecycle on resources, therefore we chose to support the most common option.
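
For context, the constraint here is that a `lifecycle` block only accepts literal values, so `ignore_changes` cannot be switched on or off with a variable. The following is a minimal sketch of the pattern under discussion, not a copy of the module's source; the variable names are hypothetical. With it in place, desired_size is sent when the node group is created and later changes to it are left out of the plan.

```hcl
# Hypothetical inputs for illustration only.
variable "cluster_name" {
  type = string
}

variable "node_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "desired_capacity" {
  type    = number
  default = 1
}

variable "max_capacity" {
  type    = number
  default = 6
}

variable "min_capacity" {
  type    = number
  default = 1
}

resource "aws_eks_node_group" "workers" {
  cluster_name  = var.cluster_name
  node_role_arn = var.node_role_arn
  subnet_ids    = var.subnet_ids

  scaling_config {
    desired_size = var.desired_capacity # used at creation, ignored afterwards
    max_size     = var.max_capacity
    min_size     = var.min_capacity
  }

  lifecycle {
    # Must be a static list; it cannot depend on a variable or expression.
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```

This is why raising min_capacity above the current desired size produces the InvalidParameterException seen earlier in the thread: the update request carries the new minimum but the old desired size.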

max-rocket-internet on 6 May 2020

> TLDR; you should be using the cluster-autoscaler. If not, you need to make the change manually.

@max-rocket-internet - Did you mean configuring cluster-autoscaler within this module? If so, how? I don't see it in the examples.

Also, I provisioned my node group with the following values:

      desired_capacity = 4
      max_capacity     = 10
      min_capacity     = 4

At what point does the capacity go beyond 4? I stood up a t2.micro instance and tried to scale pods beyond the t2.micro's capacity, but my cluster does not scale up to add more nodes.

PS: I am using v12.2.0 of this module.

dmanchikalapudi on 27 Jul 2020

👀1

Did anyone here get the nodes to scale properly? I cannot get it to work no matter what I do.

dmanchikalapudi on 29 Jul 2020

> @max-rocket-internet - Did you mean configure cluster-autoscaler within this module? If so, how? I dont see it in examples.

@dmanchikalapudi cluster-autoscaler is not connected to nor can be configured through this module.

> At what point does the capacity go beyond 4? I tried to standup a t2.micro instance and try to scale pods beyond t2.micro's capacity. My cluster does not scale up to have more nodes.

The desired_capacity value is ignored by the module. You have to modify it by hand through the console.

kuritonasu on 29 Jul 2020

Thanks for the response @kuritonasu. Doing it by hand pretty much negates the idea behind "managed" node groups. There is no point in defining the min/max node counts either. It is just an illusion of autoscaling.

My need is simple. When my replicasets scale to initialize more pods than the nodes can run, I need the nodes to scale to accommodate them (assuming there is HW capacity underneath and the count stays within the max node count). How do I go about making that happen via Terraform?

dmanchikalapudi on 29 Jul 2020

> cluster-autoscaler is not connected to nor can be configured through this module.

Correct ✅

> Doing it by hand pretty much negates the idea behind "managed" nodegroups.

Perhaps this doc might help you see what is "managed" and what is not, specifically this image:

[image: mng]

> There is no point in defining the min/max node counts either. It is just an illusion of autoscaling.

I wouldn't say it's an illusion, it's just not a "turn-key" thing. ASGs have been around for years and work very well when configured correctly 🙂

> My need is simple. When my replicasets scale to initialize more pods than the nodes can run with, I need the nodes to scale to accommodate (assuming there is HW capacity underneath and is within the max node count). How do I go about making that happen via terraform?

This is how typical autoscaling works in k8s, but this module is only for the AWS resources. The cluster-autoscaler runs in your cluster and is not supported by us or this module in any way; it's a completely separate thing. But there is some documentation here that might help you: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/autoscaling.md
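
As a hedged illustration of what running it as "a completely separate thing" can look like from Terraform, here is a sketch that installs the cluster-autoscaler Helm chart with auto-discovery enabled. It is not part of this module. It assumes a helm provider (2.x block syntax) already configured against the cluster, hypothetical variables `cluster_name` and `aws_region`, and node groups carrying the `k8s.io/cluster-autoscaler/...` tags shown in the original report. The IAM permissions the autoscaler needs are omitted.

```hcl
# Hypothetical inputs; not provided by the EKS module.
variable "cluster_name" {
  type = string
}

variable "aws_region" {
  type = string
}

resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  namespace  = "kube-system"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"

  # Discover the node groups' ASGs through their cluster-autoscaler tags.
  set {
    name  = "autoDiscovery.clusterName"
    value = var.cluster_name
  }

  set {
    name  = "awsRegion"
    value = var.aws_region
  }
}
```

Once running, the autoscaler adjusts the desired size between min_capacity and max_capacity when pods cannot be scheduled, which is the case the module's ignored desired_size is designed for.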

max-rocket-internet on 30 Jul 2020

@max-rocket-internet I'd say that "managed" in the broader sense, together with Terraform, also means scaling the worker nodes by setting the desired size, as I'm also managing the VPC configuration with Terraform (and everything in between, actually :) )

So IMHO, IF setting the desired size is possible through the API, it SHOULD be supported by this resource.

dploeger on 1 Sep 2020

One hacky workaround that I have found works: specify a different instance size, which forces a totally new node group to be created, and that new group will respect your (new, "initial") desired_capacity setting. I'm sure any other hack that forces a new node group to be created would work as well.

I agree with many of the other comments in this thread; it really feels odd that desired_capacity is not actually mutable by Terraform. That said, I do not have a clear picture of what the AWS interface is like. I'm sure it's easier said than done!
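
A sketch of that replacement trick, reusing the compute_1 example from earlier in the thread. The instance type values are hypothetical. Because the node group is recreated, its nodes are replaced as well.

```hcl
node_groups = {
  compute_1 = {
    desired_capacity = 3 # honoured because the node group is created anew
    max_capacity     = 6
    min_capacity     = 3

    # Hypothetical change, e.g. from "t3.medium"; forces a new node group.
    instance_type = "t3.large"
  }
}
```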

rsmets on 30 Nov 2020

I hacked it for now by using the desired capacity value in place of the minimum capacity. As long as that's not a problem for your design, it works.

worker_groups = [
  {
    name                 = "worker-group-1"
    key_name             = var.worker_ssh_key_name
    instance_type        = var.worker_instance_type
    asg_desired_capacity = var.worker_asg_desired_capacity
    asg_max_size         = var.worker_asg_max_size
    asg_min_size         = var.worker_asg_desired_capacity