Terraform-aws-eks: fails to destroy asg

Created on 21 Oct 2018 · 7 Comments · Source: terraform-aws-modules/terraform-aws-eks

I have issues

More than what will be helped here ... The terraform-aws-eks module fails to destroy.

I'm submitting a...

  • [x] bug report
  • [ ] feature request
  • [ ] support request
  • [ ] kudos, thank you, warm fuzzy

What is the current behavior?

terraform destroy exits non-zero:

* module.eks.aws_autoscaling_group.workers (destroy): 1 error(s) occurred:
* aws_autoscaling_group.workers: group still has 3 instances

If this is a bug, how to reproduce? Please include a code sample if relevant.

Apply a plan with the following module config:

module "eks" {
  source                 = "terraform-aws-modules/eks/aws"
  version                = "1.7.0"
  cluster_name           = "${local.cluster_name}"
  cluster_delete_timeout = "25m"
  config_output_path     = "${local.config_path}"
  subnets                = "${local.subnets}"
  tags                   = "${local.tags}"
  vpc_id                 = "${local.vpc_id}"

  workers_group_defaults = {
    additional_security_group_ids = ""          # A comma delimited list of additional security group ids to include in worker launch config
    asg_desired_capacity          = "3"         # Desired worker capacity in the autoscaling group.
    asg_max_size                  = "6"         # Maximum worker capacity in the autoscaling group.
    asg_min_size                  = "3"         # Minimum worker capacity in the autoscaling group.
    autoscaling_enabled           = true        # Sets whether policy and matching tags will be added to allow autoscaling.
    enable_monitoring             = true        # Enables/disables detailed monitoring.
    instance_type                 = "t3.medium" # Size of the workers instances.
    protect_from_scale_in         = true        # Prevent AWS from scaling in, so that cluster-autoscaler is solely responsible.
    public_ip                     = false       # Associate a public ip address with a worker
    root_volume_size              = "100"       # root volume size of workers instances.
    root_volume_type              = "gp2"       # root volume type of workers instances, can be 'standard', 'gp2', or 'io1'
    target_group_arns             = ""          # A comma delimited list of ALB target group ARNs to be associated to the ASG
  }
}

Then attempt to terraform destroy

What's the expected behavior?

That terraform destroy successfully deletes the asg and exits 0.

Are you able to fix this problem and submit a PR? Link here if you have already.

Willing to help, but could use some direction pinpointing the issue.

Environment details

  • Affected module version: 1.7.0
  • OS: ubuntu 18.04
  • Terraform version: 0.11.8

Any other relevant info

protect_from_scale_in = true seems to be the culprit. If I provision without it, I can successfully destroy.

All 7 comments

protect_from_scale_in = true seems to be the culprit. If I provision without it, I can successfully destroy.

Cool, then problem solved, right? If you want to enable protect_from_scale_in then what could we do in this module? You've already decided to protect your instances from termination. If you want to be able to destroy your ASG and instances then don't enable that setting 🙂

That is not very satisfying. A plan that cannot be destroyed seems defective. Deploying the cluster with it enabled, then applying an update to disable it, and then trying the destroy also fails.

If this is as designed and not a bug, I think a more descriptive explanation of the functionality and its impact would be helpful.

Thanks.
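
A possible explanation for the failed disable-then-destroy attempt, offered as a sketch rather than a confirmed diagnosis: in AWS Auto Scaling, changing the group-level protection setting applies to instances launched afterwards, while existing instances keep their own protection flag. Assuming the AWS CLI is installed and configured, a check along these lines lists the instances in a group that are still protected. The function name list_protected_instances and the ASG name in the usage line are made up for illustration.

```shell
# Print the IDs of instances in the given ASG that still have scale-in
# protection enabled, one per line.
# Assumption: the AWS CLI is installed and has credentials for the account.
list_protected_instances() {
  asg_name="$1"
  aws autoscaling describe-auto-scaling-instances \
    --query "AutoScalingInstances[?AutoScalingGroupName=='${asg_name}' && ProtectedFromScaleIn].InstanceId" \
    --output text | tr '\t' '\n'
}

# Usage (hypothetical ASG name):
#   list_protected_instances "my-cluster-workers"
```

If this prints any instance IDs after the setting has been disabled on the group, those instances would still block the ASG from being destroyed.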

Just hit this and this is not a behavior I expected.
We are planning to use this module to create ephemeral clusters and being able to delete them is vital. At the same time, we very much want to run cluster-autoscaler so we do need to enable protect_from_scale_in.

What's the best way we could do this? How are you all handling this problem?

The best way I can think of is adding a local-exec or remote-exec provisioner with when = "destroy" that just deletes the remaining instances or disables scale-in protection on the ASGs.
Do you have any other ideas? Would such a PR be accepted?

Hi,

We have another module wrapping this module and @bmihaescu created the workaround I mentioned above:

resource "null_resource" "eks-predestroy" {
  provisioner "local-exec" {
    when        = "destroy"
    interpreter = ["/bin/bash", "-c"]

    command = <<CMD
# Find every ASG whose description mentions the cluster name.
ASGName=$(aws autoscaling describe-auto-scaling-groups | grep "${local.cluster_name}" -A 100 -B 100 | grep AutoScalingGroupName | tr -d '" ,' | cut -f2 -d ":")

for asg in $ASGName; do
    # Collect the IDs of the instances currently in this ASG.
    InstanceID=$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names "$asg" | grep InstanceId | tr -d '" ,' | cut -f2 -d ":")

    for instance in $InstanceID; do
        # Remove scale-in protection so the ASG can terminate the instance on destroy.
        aws autoscaling set-instance-protection --instance-ids "$instance" --auto-scaling-group-name "$asg" --no-protected-from-scale-in
    done
done
CMD
  }
}

It's a bit crude and could be converted to use jq but it seems to work fine for us and the EKS cluster gets deleted properly on destroy.
Is there any value in upstreaming this script? Should we create a PR?
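
For what it's worth, the grep and cut parsing could probably be replaced with the AWS CLI's own --query option (JMESPath) rather than jq. The sketch below has not been tested against a real cluster and assumes the worker ASG names contain the cluster name. The function names are hypothetical and not part of the module.

```shell
# Sketch: remove scale-in protection from every instance in the ASGs whose
# name contains a given cluster name.
# Assumptions: the AWS CLI is installed and configured, and the worker ASG
# names contain the cluster name. Function names are illustrative only.

# Print the names of ASGs whose name contains the cluster name, one per line.
list_cluster_asgs() {
  cluster_name="$1"
  aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[?contains(AutoScalingGroupName, '${cluster_name}')].AutoScalingGroupName" \
    --output text | tr '\t' '\n'
}

# Print the instance IDs that belong to the given ASG, one per line.
list_asg_instances() {
  asg_name="$1"
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query "AutoScalingGroups[].Instances[].InstanceId" \
    --output text | tr '\t' '\n'
}

# Disable scale-in protection on each instance of each matching ASG.
unprotect_cluster_instances() {
  for asg in $(list_cluster_asgs "$1"); do
    for instance in $(list_asg_instances "$asg"); do
      aws autoscaling set-instance-protection \
        --instance-ids "$instance" \
        --auto-scaling-group-name "$asg" \
        --no-protected-from-scale-in
    done
  done
}

# Usage (hypothetical cluster name):
#   unprotect_cluster_instances "my-cluster"
```

Filtering on the ASG name is narrower than the grep over the whole describe output, so it would miss groups that only carry the cluster name in a tag.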

Later edit: aaaand due to https://github.com/hashicorp/terraform/issues/13549 this only works on terraform destroy and not on tf apply when the module is deleted 😞

This is totally 100% "working as intended". If you set protect-from-scale-in to true you have to manually terminate the instances. If I set protect-from-scale-in to true I am intentionally and explicitly saying I don't want anything to happen to those instances until I intentionally and explicitly tell the ASG otherwise.

Changing the aws_autoscaling_group resource to fail fast when protect-from-scale-in is true would be the right choice, as proposed here

This is totally 100% "working as intended". If you set protect-from-scale-in to true you have to manually terminate the instances.

Exactly. You can't have it both ways.
