Terraform-aws-eks: desired capacity update does not work for node groups

Created on 16 Apr 2020 · 18 Comments · Source: terraform-aws-modules/terraform-aws-eks

desired capacity update does not work for node groups

I'm submitting an issue: I tried to update the min, max, and desired variables for node groups. Terraform shows min and max being changed, but desired is not updated.

  • [ *] bug report
  • [ ] feature request
  • [ ] support request - read the FAQ first!
  • [ ] kudos, thank you, warm fuzzy

What is the current behavior?

Terraform code:

node_groups = {
  eks_nodegroup = {
    desired_capacity = 2
    max_capacity     = 4
    min_capacity     = 2

    instance_type = var.instance_type
    k8s_labels = {
      Environment = "sbx"
    }
    additional_tags = {
      "k8s.io/cluster-autoscaler/${local.cluster_name}" = "owned"
      "k8s.io/cluster-autoscaler/enabled"               = "true"
    }
  }
}

output from plan and apply:

  ~ resource "aws_eks_node_group" "workers" {
.
.
.
.
.
      ~ scaling_config {
            desired_size = 1
          ~ max_size     = 3 -> 4
          ~ min_size     = 1 -> 2
        }
}

Error: error updating EKS Node Group (ce-eks-sbx:ce-eks-sbx-eks_nodegroup-lenient-blowfish) config: InvalidParameterException: Minimum capacity 2 can't be greater than desired size 1
{
ClusterName: "test-eks-sbx",
Message_: "Minimum capacity 2 can't be greater than desired size 1",
NodegroupName: "ce-eks-sbx-eks_nodegroup-lenient-blowfish"
}

I have also tried updating the desired capacity through node_groups_defaults.

If this is a bug, how to reproduce? Please include a code sample if relevant.

Change the min, max, and desired capacity of an existing node group.

What's the expected behavior?

The new scaling configuration should take effect.

Are you able to fix this problem and submit a PR? Link here if you have already.

No

Environment details

  • Affected module version:
  • OS:
  • Terraform version:

Any other relevant info

All 18 comments

It was caused by #691. It works in 8.0.0.

@karlderkaefer
What do you mean by 8.0.0?

desired_capacity doesn't work for me either. The installed terraform-aws-eks version is v11.1.0.

I'm encountering this issue in v11.1.0 as well. I see the significance of #691; however, the present issue prevents node group resizing, as the author points out.

TL;DR: you should be using the cluster-autoscaler. If not, you need to make the change manually.

Consider the case where autoscaling is not desired but I still want to resize my node group, and I do not wish to resize manually through the console, for the usual reasons. I don't believe this scenario is uncommon.

Also to be clear, I don't believe _desired_ should be modified by this module by default as this could cause confusion and undesirable consequences. I am not arguing against #691, however there should be a way to override this behaviour.

Sure, but the problem is that there is no way to make a lifecycle block optional on a resource, therefore we chose to support the most common option.
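
For readers following along, here is a minimal sketch of the kind of lifecycle rule being discussed. It is not the module's actual source: the variable names and the node group name are hypothetical, and only the `ignore_changes` pattern on `scaling_config[0].desired_size` is the point. Judging from this thread, something along these lines is what #691 introduced.

```hcl
# Hypothetical inputs, declared only so the sketch stands on its own.
variable "cluster_name" {
  type = string
}

variable "node_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = var.cluster_name
  node_group_name = "example" # hypothetical name
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = 1 # only honored when the node group is first created
    max_size     = 4
    min_size     = 1
  }

  lifecycle {
    # After creation, Terraform no longer plans changes to desired_size,
    # so an external actor such as cluster-autoscaler can own that value.
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```

With a rule like this in place, raising min_size above the stored desired_size produces exactly the plan shown in the original report: min and max change, desired stays put, and the API rejects the update. Because lifecycle blocks accept only literal values, the rule cannot be switched on or off through a module variable.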

> TL;DR: you should be using the cluster-autoscaler. If not, you need to make the change manually.

@max-rocket-internet - Did you mean configure cluster-autoscaler within this module? If so, how? I don't see it in the examples.

Also, I provisioned my node group with the following value

      desired_capacity = 4
      max_capacity     = 10
      min_capacity     = 4

At what point does the capacity go beyond 4? I tried to stand up a t2.micro instance and scale pods beyond the t2.micro's capacity. My cluster does not scale up to have more nodes.

PS: I am using v12.2.0 of this module.

Did anyone here get the nodes to scale properly? I cannot get it to work no matter what I do.

> @max-rocket-internet - Did you mean configure cluster-autoscaler within this module? If so, how? I don't see it in the examples.

@dmanchikalapudi cluster-autoscaler is not connected to nor can be configured through this module.

> At what point does the capacity go beyond 4? I tried to stand up a t2.micro instance and scale pods beyond the t2.micro's capacity. My cluster does not scale up to have more nodes.

The desired_capacity value is ignored by the module. You have to modify it by hand through the console.

Thanks for the response @kuritonasu. Doing it by hand pretty much negates the idea behind "managed" nodegroups. There is no point in defining the min/max node counts either. It is just an illusion of autoscaling.

My need is simple. When my replicasets scale to initialize more pods than the nodes can run with, I need the nodes to scale to accommodate (assuming there is HW capacity underneath and is within the max node count). How do I go about making that happen via terraform?

> cluster-autoscaler is not connected to nor can be configured through this module.

Correct ✅

> Doing it by hand pretty much negates the idea behind "managed" nodegroups.

Perhaps this doc might help you see what is "managed" and what is not, specifically this image:

[image: mng]

> There is no point in defining the min/max node counts either. It is just an illusion of autoscaling.

I wouldn't say it's an illusion, it's just not a "turn-key" thing. ASGs have been around for years and work very well when configured correctly 🙂

> My need is simple. When my replicasets scale to initialize more pods than the nodes can run with, I need the nodes to scale to accommodate (assuming there is HW capacity underneath and is within the max node count). How do I go about making that happen via terraform?

This is how typical autoscaling works in k8s but this module is only for the AWS resources. The cluster-autoscaler runs in your cluster and is not supported by us or this module in any way, it's a completely separate thing. But there is some doc here that might help you: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/autoscaling.md

@max-rocket-internet I'd say that "managed" in the broader sense, together with Terraform, also means scaling the worker nodes by setting the desired size, as I'm also managing the VPC configuration with Terraform (and everything in between actually :) )

So IMHO, IF setting the desired size is possible through the API, it SHOULD be supported by this resource.

The practical use case that I have for this is that if I set a managed node group to desired 3, max 6, min 3, cluster autoscaler will respect this. There isn't a technical reason why we can't change the min_size, nor should it be dismissed as "not a feature"

So, some concrete examples, since this has been a bit of a noisy thread.

Here's an example initial definition of a scaling config as passed through node_groups in the eks cluster module:

compute_1 = {
  desired_capacity = 1
  max_capacity     = 6
  min_capacity     = 1
}

Then update this nodegroup's minimum to:

compute_1 = {
  desired_capacity = 1
  max_capacity     = 6
  min_capacity     = 3
}

You'll get an error like:

Error: error updating EKS Node Group (eks_cluster_1:compute_1) config: InvalidParameterException: Minimum capacity 3 can't be greater than desired size 1
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: "<requestID>"
  },
  ClusterName: "eks_cluster_1",
  Message_: "Minimum capacity 3 can't be greater than desired size 1",
  NodegroupName: "compute_1"
}

Running a state show then shows the obvious:

terraform state show module.eks_cluster_1.module.node_groups.aws_eks_node_group.workers[\"compute_1\"] | grep -A4 scaling_config
    scaling_config {
        desired_size = 1
        max_size     = 6
        min_size     = 1
    }

So this means that, from Terraform's perspective, desired_capacity (which maps to scaling_config.desired_size) is immutable after creation. It also means that desired_capacity can never be raised above its initial value through Terraform, and min_capacity is effectively capped by it, while you can still happily raise max_capacity.

There are ways to work around this, such as getting the ASG id from the module and modifying it in Terraform as part of the workflow, but that's a hack at best.

One hacky workaround that I have found works: specify a different instance size, which forces a totally new node group to be created, and the new group respects your (new, "initial") desired_capacity setting. I'm sure any other hack that forces a new node group to be created would work as well.

I agree with many of the other thread comments, it really feels odd that desired_capacity is not actually mutable by terraform. That said I do not have a clear picture of what the aws interface is like - I'm sure it's easier said than done!

I hacked it for now by using the value for the desired capacity in place of minimum capacity. At least if that's not a problem for your design, it works.

worker_groups = [
  {
    name                 = "worker-group-1"
    key_name             = var.worker_ssh_key_name
    instance_type        = var.worker_instance_type
    asg_desired_capacity = var.worker_asg_desired_capacity
    asg_max_size         = var.worker_asg_max_size
    asg_min_size         = var.worker_asg_desired_capacity
  },
]
