When I remove group-01 it changes the indexes and forces a rebuild of group-02 and group-03. Is there a recommended way to avoid that?
worker_groups = [
{
name = "group-01"
instance_type = "r5.2xlarge"
},
{
name = "group-02"
instance_type = "m5.4xlarge"
},
{
name = "group-03"
instance_type = "m5.8xlarge"
}
]
I don't think you can do that actually. But since Terraform 0.12.6, we can now use for_each in resources. With for_each the resource reference is not an indice anymore. This let us remove specific resource within a list or a map.
More info here:
Maybe this will help you as workaround for now https://github.com/hashicorp/terraform/issues/14275#issuecomment-360245405
Cool thanks!. Are there any plans to move to for_each?
we have an eks cluster with five worker groups (asg's). i wanted to upgrade them from the 12.7 ami to the 12.10 ami. based on https://docs.aws.amazon.com/eks/latest/userguide/migrate-stack.html, i tried these steps:
1) helm delete --purge cluster-autoscaler
2) in our eks/main.tf which uses this module, duplicate the asg's inside the worker_groups list and update the ami's on the new worker groups. (this means add 5 new asg's to the end of the list.)
3) tf apply; confirm ten asg's on 2 different launch configs
4) run ./drain-asgs.sh which is
#!/bin/bash
set -e
K8S_VERSION=1.12.7
nodes=$(kubectl get nodes -o jsonpath="{.items[?(@.status.nodeInfo.kubeletVersion==\"v$K8S_VERSION\")].metadata.name}")
for node in ${nodes[@]}
do
echo "Draining $node"
kubectl drain $node --ignore-daemonsets --delete-local-data
done
5) delete original asg's from worker_groups list in eks/main.tf. (this means the list is back to its original size of 5 and all using the new 12.10 ami.)
6) tf apply
Plan: 5 to add, 5 to change, 20 to destroy.
because the worker_groups logic is index-based, a bunch of chaos ensues and the new asg's actually end up deleted.
sounds like using for_each would allow worker_groups mgmt to move away from index-based logic and will support removing groups from the list.
I would like to work on this, but it sounds like it will force everyone to move theirs resources in states.
I don't know how this can be painful for users.
Any thoughts ? @max-rocket-internet @dpiddockcmp ?
i had the same concerns about existing tf state, but fortunately our team is in a spot where we can rebuild all of our envs. so i forked somewhere in between 6.0.1 and 6.0.2 and implemented using a map of maps for worker_group_launch_templates instead of a list. our team also had to split the eks control plane from the worker groups as flipping bits for the eks api endpoint access vars was causing all the asg's to roll (i believe due to naming and kubeconfig/auth-map.rendered dependencies).
here's an example of how our eks-worker-groups module looks:
worker_groups_launch_template = {
"${var.env}-${var.cluster_id}-infra-${data.aws_availability_zones.available.names[0]}-20191003" = {
instance_type = "m5.large"
asg_min_size = 1
asg_max_size = 2
key_name = var.ssh_key_name
subnets = [ data.terraform_remote_state.vpc.outputs.private_subnets[0] ]
kubelet_extra_args = "--node-labels=k8s-env=${local.full-cluster-name},nodegroup=infra --register-with-taints=infra=true:NoSchedule"
autoscaling_enabled = true
root_volume_size = 1024
termination_policies = [ "OldestInstance" ]
ami_id = "ami-05d586e6f773f6abf"
tags = [{
key = "worker-type"
value = "infra"
propagate_at_launch = true
}]
},
i am planning on getting a public repo up that can at least provide a starting point if someone wants to see how i handled all the for_each's. we also made a couple other adjustments that would make submitting a PR not really possible. but really want to at least contribute the concept back.
oh yeah forgot to mention that upgrading the cluster via the steps i outlined above works really nicely.
There's no way around it, moving to a map of maps and for_each will be painful for everyone upgrading.
I've done it to a few unrelated plans as part of our 0.12 upgrade efforts. You can't simply use terraform state mv resource[0] resource["key"] for the first item as TF gets confused. You have to either drop and import, manually edit the state file or let TF delete and recreate the resources. Not sure if TF > 0.12.7 has improved that. Need to test what has changed in 0.12.10
A lot of short term pain for all. Would make life much easier for people who change their worker_groups.
You have to either drop and import, manually edit the state file or let TF delete and recreate the resources
Hmmm, sounds painful.
When I remove group-01 it changes the indexes and forces a rebuild of group-02 and group-03.
Is it not possible to use terraform state mv resource[1] resource[0] for this?
Unfortunately no @max-rocket-internet . Take a look:
First, I tried to just move it:
$ terraform state mv module.eks.aws_autoscaling_group.workers[1] module.eks.aws_autoscaling_group.workers[0]
Acquiring state lock. This may take a few moments...
Error: Invalid target address
Cannot move to module.eks.aws_autoscaling_group.workers[0]:
there is already a resource instance at that address in the current state.
Then, I tried to remove the resource from index 0 and try to move again:
$ terraform state rm module.sc-eks.module.eks.aws_autoscaling_group.workers[0]
Removed module.sc-eks.module.eks.aws_autoscaling_group.workers[0]
Successfully removed 1 resource instance(s).
$ terraform state mv module.eks.aws_autoscaling_group.workers[1] module.eks.aws_autoscaling_group.workers[0]
Move "module.eks.aws_autoscaling_group.workers[1]" to "module.eks.aws_autoscaling_group.workers[0]"
Error: Invalid target address
Cannot move to module.eks.aws_autoscaling_group.workers[0]:
module.eks.aws_autoscaling_group.workers does not exist in the
current state.
I was able to workaround by moving the index 0 to index 2 and then move index 1 to 0. After that, I just ran the apply command again to destroy the old workers.
I wanted to try and work on this, I started with the worker_launch_template and iterate from here.
If someone have the time I'd like some feedback/help here
I'm having issues with the coalescelist and getting this error:
on aws_auth.tf line 13, in data "template_file" "launch_template_worker_role_arns":
13: lookup(data.aws_iam_instance_profile.custom_worker_group_launch_template_iam_instance_profile[eac
h.key], "role_name", ""
|----------------
| data.aws_iam_instance_profile.custom_worker_group_launch_template_iam_instance_profile is object with no
attributes
| each.key is "default-eu-west-3c"
The given key does not identify an element in this collection value.
Despite using a lookup here, it stills tried to access the attribute on an empty map. I'm stuck here, I don't know how to have the same behavior as coalescelist but with two maps.
Looking into what's been done for the node_group the best way would be to merge everything beforehand from what I understand here: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/locals.tf
What if we support both in parallel?
Add worker_groups_launch_template_map so that new users can start using the _map variable, while old users can remain on the old one and can try to transition in the future
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity since being marked as stale.
Any updates to this issue?
I would like to have worker_groups as a map too. That way I can delete and add groups without rebuilding all others which indexes are higher. Also I can address them by names in worker_asg_names rather than indexes.
Several days ago I wanted to delete groups 0 and 1. After considering the plan Terraform gave me I just scaled them to 0 not to rebuild all other groups :)
To treat the worker_groups as a map would need extensive changes to the module. It will also require every user of the module to recreate all worker groups when upgrading.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity since being marked as stale.
Most helpful comment
What if we support both in parallel?
Add worker_groups_launch_template_map so that new users can start using the _map variable, while old users can remain on the old one and can try to transition in the future