Terraform-aws-eks: Best way to remove asg without rebuilding all of them.

Created on 23 Aug 2019 · 20Comments · Source: terraform-aws-modules/terraform-aws-eks

When I remove group-01 it changes the indexes and forces a rebuild of group-02 and group-03. Is there a recommended way to avoid that?

  worker_groups = [
    {
      name                     = "group-01"
      instance_type            = "r5.2xlarge"
    },
    {
      name                     = "group-02"
      instance_type            = "m5.4xlarge"
    },
    {
      name                     = "group-03"
      instance_type            = "m5.8xlarge"
    }
  ]

stale

Source

sethpollack

👍2

Most helpful comment

What if we support both in parallel?
Add worker_groups_launch_template_map so that new users can start using the _map variable, while old users can remain on the old one and can try to transition in the future

jaimehrubiks on 17 Mar 2020

👍2

All 20 comments

I don't think you can do that actually. But since Terraform 0.12.6, we can now use for_each in resources. With for_each the resource reference is not an indice anymore. This let us remove specific resource within a list or a map.

More info here:

https://www.terraform.io/docs/configuration/resources.html#when-to-use-for_each-instead-of-count
https://github.com/hashicorp/terraform/issues/14275

barryib on 23 Aug 2019

Maybe this will help you as workaround for now https://github.com/hashicorp/terraform/issues/14275#issuecomment-360245405

barryib on 23 Aug 2019

👍1

Cool thanks!. Are there any plans to move to for_each?

sethpollack on 23 Aug 2019

we have an eks cluster with five worker groups (asg's). i wanted to upgrade them from the 12.7 ami to the 12.10 ami. based on https://docs.aws.amazon.com/eks/latest/userguide/migrate-stack.html, i tried these steps:

1) helm delete --purge cluster-autoscaler
2) in our eks/main.tf which uses this module, duplicate the asg's inside the worker_groups list and update the ami's on the new worker groups. (this means add 5 new asg's to the end of the list.)
3) tf apply; confirm ten asg's on 2 different launch configs
4) run ./drain-asgs.sh which is

#!/bin/bash

set -e

K8S_VERSION=1.12.7
nodes=$(kubectl  get nodes -o jsonpath="{.items[?(@.status.nodeInfo.kubeletVersion==\"v$K8S_VERSION\")].metadata.name}")
for node in ${nodes[@]}
do
    echo "Draining $node"
    kubectl drain $node --ignore-daemonsets --delete-local-data
done

5) delete original asg's from worker_groups list in eks/main.tf. (this means the list is back to its original size of 5 and all using the new 12.10 ami.)
6) tf apply
Plan: 5 to add, 5 to change, 20 to destroy.

because the worker_groups logic is index-based, a bunch of chaos ensues and the new asg's actually end up deleted.

sounds like using for_each would allow worker_groups mgmt to move away from index-based logic and will support removing groups from the list.

aeugenio on 11 Sep 2019

I would like to work on this, but it sounds like it will force everyone to move theirs resources in states.

I don't know how this can be painful for users.

Any thoughts ? @max-rocket-internet @dpiddockcmp ?

barryib on 9 Oct 2019

i had the same concerns about existing tf state, but fortunately our team is in a spot where we can rebuild all of our envs. so i forked somewhere in between 6.0.1 and 6.0.2 and implemented using a map of maps for worker_group_launch_templates instead of a list. our team also had to split the eks control plane from the worker groups as flipping bits for the eks api endpoint access vars was causing all the asg's to roll (i believe due to naming and kubeconfig/auth-map.rendered dependencies).

here's an example of how our eks-worker-groups module looks:
worker_groups_launch_template = { "${var.env}-${var.cluster_id}-infra-${data.aws_availability_zones.available.names[0]}-20191003" = { instance_type = "m5.large" asg_min_size = 1 asg_max_size = 2 key_name = var.ssh_key_name subnets = [ data.terraform_remote_state.vpc.outputs.private_subnets[0] ] kubelet_extra_args = "--node-labels=k8s-env=${local.full-cluster-name},nodegroup=infra --register-with-taints=infra=true:NoSchedule" autoscaling_enabled = true root_volume_size = 1024 termination_policies = [ "OldestInstance" ] ami_id = "ami-05d586e6f773f6abf" tags = [{ key = "worker-type" value = "infra" propagate_at_launch = true }] },

i am planning on getting a public repo up that can at least provide a starting point if someone wants to see how i handled all the for_each's. we also made a couple other adjustments that would make submitting a PR not really possible. but really want to at least contribute the concept back.

aeugenio on 9 Oct 2019

oh yeah forgot to mention that upgrading the cluster via the steps i outlined above works really nicely.

aeugenio on 9 Oct 2019

There's no way around it, moving to a map of maps and for_each will be painful for everyone upgrading.

I've done it to a few unrelated plans as part of our 0.12 upgrade efforts. You can't simply use terraform state mv resource[0] resource["key"] for the first item as TF gets confused. You have to either drop and import, manually edit the state file or let TF delete and recreate the resources. Not sure if TF > 0.12.7 has improved that. Need to test what has changed in 0.12.10

A lot of short term pain for all. Would make life much easier for people who change their worker_groups.

dpiddockcmp on 10 Oct 2019

You have to either drop and import, manually edit the state file or let TF delete and recreate the resources

Hmmm, sounds painful.

When I remove group-01 it changes the indexes and forces a rebuild of group-02 and group-03.

Is it not possible to use terraform state mv resource[1] resource[0] for this?

max-rocket-internet on 21 Oct 2019

Unfortunately no @max-rocket-internet . Take a look:

First, I tried to just move it:

$ terraform state mv module.eks.aws_autoscaling_group.workers[1] module.eks.aws_autoscaling_group.workers[0]
Acquiring state lock. This may take a few moments...

Error: Invalid target address

Cannot move to module.eks.aws_autoscaling_group.workers[0]:
there is already a resource instance at that address in the current state.

Then, I tried to remove the resource from index 0 and try to move again:

$ terraform state rm module.sc-eks.module.eks.aws_autoscaling_group.workers[0]
Removed module.sc-eks.module.eks.aws_autoscaling_group.workers[0]
Successfully removed 1 resource instance(s).

$ terraform state mv module.eks.aws_autoscaling_group.workers[1] module.eks.aws_autoscaling_group.workers[0]
Move "module.eks.aws_autoscaling_group.workers[1]" to "module.eks.aws_autoscaling_group.workers[0]"

Error: Invalid target address

Cannot move to module.eks.aws_autoscaling_group.workers[0]:
module.eks.aws_autoscaling_group.workers does not exist in the
current state.

galindro on 8 Jan 2020

I was able to workaround by moving the index 0 to index 2 and then move index 1 to 0. After that, I just ran the apply command again to destroy the old workers.

galindro on 9 Jan 2020

👍2

I wanted to try and work on this, I started with the worker_launch_template and iterate from here.

If someone have the time I'd like some feedback/help here

I'm having issues with the coalescelist and getting this error:

  on aws_auth.tf line 13, in data "template_file" "launch_template_worker_role_arns":                          
  13:         lookup(data.aws_iam_instance_profile.custom_worker_group_launch_template_iam_instance_profile[eac
h.key], "role_name", ""                                                                                        
    |----------------                                                                                          
    | data.aws_iam_instance_profile.custom_worker_group_launch_template_iam_instance_profile is object with no 
attributes                                                                                                     
    | each.key is "default-eu-west-3c"                                                                         

The given key does not identify an element in this collection value.

Despite using a lookup here, it stills tried to access the attribute on an empty map. I'm stuck here, I don't know how to have the same behavior as coalescelist but with two maps.

ArchiFleKs on 13 Feb 2020

Looking into what's been done for the node_group the best way would be to merge everything beforehand from what I understand here: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/locals.tf

ArchiFleKs on 13 Feb 2020

jaimehrubiks on 17 Mar 2020

👍2

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 15 Jun 2020

This issue has been automatically closed because it has not had recent activity since being marked as stale.

stale[bot] on 15 Jul 2020

Any updates to this issue?

I would like to have worker_groups as a map too. That way I can delete and add groups without rebuilding all others which indexes are higher. Also I can address them by names in worker_asg_names rather than indexes.

Several days ago I wanted to delete groups 0 and 1. After considering the plan Terraform gave me I just scaled them to 0 not to rebuild all other groups :)

lexsys27 on 3 Sep 2020

To treat the worker_groups as a map would need extensive changes to the module. It will also require every user of the module to recreate all worker groups when upgrading.

dpiddockcmp on 4 Sep 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 4 Dec 2020

This issue has been automatically closed because it has not had recent activity since being marked as stale.

stale[bot] on 3 Jan 2021

Was this page helpful?

0 / 5 - 0 ratings