Terraform-aws-eks: fails to destroy asg

Created on 21 Oct 2018 · 7 Comments · Source: terraform-aws-modules/terraform-aws-eks

I have issues

More than what will be helped here ... The terraform-aws-eks module fails to destroy.

I'm submitting a...

[x] bug report
[ ] feature request
[ ] support request
[ ] kudos, thank you, warm fuzzy

What is the current behavior?

terraform destroy exits non-zero:

* module.eks.aws_autoscaling_group.workers (destroy): 1 error(s) occurred:
* aws_autoscaling_group.workers: group still has 3 instances

If this is a bug, how to reproduce? Please include a code sample if relevant.

Apply a plan with the following module config:

module "eks" {
  source                 = "terraform-aws-modules/eks/aws"
  version                = "1.7.0"
  cluster_name           = "${local.cluster_name}"
  cluster_delete_timeout = "25m"
  config_output_path     = "${local.config_path}"
  subnets                = "${local.subnets}"
  tags                   = "${local.tags}"
  vpc_id                 = "${local.vpc_id}"

  workers_group_defaults = {
    additional_security_group_ids = ""          # A comma delimited list of additional security group ids to include in worker launch config
    asg_desired_capacity          = "3"         # Desired worker capacity in the autoscaling group.
    asg_max_size                  = "6"         # Maximum worker capacity in the autoscaling group.
    asg_min_size                  = "3"         # Minimum worker capacity in the autoscaling group.
    autoscaling_enabled           = true        # Sets whether policy and matching tags will be added to allow autoscaling.
    enable_monitoring             = true        # Enables/disables detailed monitoring.
    instance_type                 = "t3.medium" # Size of the workers instances.
    protect_from_scale_in         = true        # Prevent AWS from scaling in, so that cluster-autoscaler is solely responsible.
    public_ip                     = false       # Associate a public ip address with a worker
    root_volume_size              = "100"       # root volume size of workers instances.
    root_volume_type              = "gp2"       # root volume type of workers instances, can be 'standard', 'gp2', or 'io1'
    target_group_arns             = ""          # A comma delimited list of ALB target group ARNs to be associated to the ASG
  }
}

Then attempt to run terraform destroy.

What's the expected behavior?

That terraform destroy successfully deletes the ASG and exits 0.

Are you able to fix this problem and submit a PR? Link here if you have already.

Willing to help, but could use some direction pinpointing the issue.

Environment details

Affected module version: 1.7.0
OS: ubuntu 18.04
Terraform version: 0.11.8

Any other relevant info

protect_from_scale_in = true seems to be the culprit. If I provision without it, I can successfully destroy.

Source

tomdavidson

All 7 comments

protect_from_scale_in = true seems to be the culprit. If I provision without it, I can successfully destroy.

Cool, then problem solved, right? If you want to enable protect_from_scale_in then what could we do in this module? You've already decided to protect your instances from termination. If you want to be able to destroy your ASG and instances then don't enable that setting 🙂

max-rocket-internet on 23 Oct 2018

That is not very satisfying. A plan that cannot be destroyed seems defective. Deploying the cluster with it enabled, then applying an update to disable it, and then trying the destroy also fails.

If this is as designed and not a bug, I think a more descriptive explanation of the functionality and its impact would be helpful.

Thanks.

tomdavidson on 25 Oct 2018

👍6

Just hit this, and it is not the behavior I expected.
We are planning to use this module to create ephemeral clusters, so being able to delete them is vital. At the same time, we very much want to run cluster-autoscaler, so we do need to enable protect_from_scale_in.

What's the best way we could do this? How are you all handling this problem?

The best way I can think of is adding a local-exec or remote-exec provisioner with when = "destroy" that either deletes the remaining instances or disables scale-in protection on the ASGs.
Do you have any other ideas? Would such a PR be accepted?

Vlaaaaaaad on 6 Dec 2018

Hi,

We have another module wrapping this module and @bmihaescu created the workaround I mentioned above:

resource "null_resource" "eks-predestroy" {
  # Runs only on terraform destroy: strips scale-in protection so the ASG can be deleted.
  provisioner "local-exec" {
    when        = "destroy"
    interpreter = ["/bin/bash", "-c"]

    command = <<CMD
# Collect ASG names that appear within 100 lines of a cluster-name match.
ASGName=$(aws autoscaling describe-auto-scaling-groups | grep "${local.cluster_name}" -A 100 -B 100 | grep AutoScalingGroupName | tr -d '" ,' | cut -f2 -d ":")

for asg in $ASGName; do
    # Collect the instance IDs that belong to this ASG.
    InstanceID=$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names "$asg" | grep InstanceId | tr -d '" ,' | cut -f2 -d ":")

    for instance in $InstanceID; do
        # Remove scale-in protection from each instance, one call per instance.
        aws autoscaling set-instance-protection --instance-ids "$instance" --auto-scaling-group-name "$asg" --no-protected-from-scale-in
    done
done
CMD
  }
}

It's a bit crude and could be converted to use jq, but it seems to work fine for us and the EKS cluster gets deleted properly on destroy.
Is there any value in upstreaming this script? Should we create a PR?

Later edit: due to https://github.com/hashicorp/terraform/issues/13549, this only works on terraform destroy and not on terraform apply when the module is deleted 😞
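
For readers who want the less crude variant mentioned above, here is a minimal sketch that swaps the grep/tr/cut pipeline for the AWS CLI's built-in --query (JMESPath) filter instead of jq. The function name unprotect_cluster_asgs is hypothetical, and the sketch assumes the ASG names contain the cluster name, which may not hold for every setup. It is not part of the module.

```shell
# Hypothetical helper (not from the module): remove scale-in protection from
# every instance in every ASG whose name contains the given cluster name.
unprotect_cluster_asgs() {
  local cluster_name="$1"
  local asg instances

  # Assumption: ASG names contain the cluster name.
  for asg in $(aws autoscaling describe-auto-scaling-groups \
      --query "AutoScalingGroups[?contains(AutoScalingGroupName, '$cluster_name')].AutoScalingGroupName" \
      --output text); do

    # Instance IDs come back whitespace-separated with --output text.
    instances=$(aws autoscaling describe-auto-scaling-groups \
      --auto-scaling-group-names "$asg" \
      --query "AutoScalingGroups[].Instances[].InstanceId" \
      --output text)

    # Skip ASGs that have no instances left.
    [ -z "$instances" ] && continue

    # Word splitting is intended here: each ID becomes its own argument.
    aws autoscaling set-instance-protection \
      --auto-scaling-group-name "$asg" \
      --instance-ids $instances \
      --no-protected-from-scale-in
  done
}
```

Calling unprotect_cluster_asgs my-cluster before terraform destroy would be one way to use it. If the body is pasted into a Terraform heredoc instead, keep shell variables in the $name form, since Terraform interpolates the braced form itself.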

Vlaaaaaaad on 8 Jan 2019

This is totally 100% "working as intended". If you set protect-from-scale-in to true you have to manually terminate the instances. If I set protect-from-scale-in to true I am intentionally and explicitly saying I don't want anything to happen to those instances until I intentionally and explicitly tell the ASG otherwise.
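
Before terminating anything by hand, it can help to see which instances still carry the protection flag. A minimal sketch, assuming the AWS CLI is configured for the right account and region; the function name list_protected_instances is hypothetical:

```shell
# Hypothetical helper: print one "ASG name<TAB>instance ID" line for every
# instance in the account/region that still has scale-in protection enabled.
list_protected_instances() {
  aws autoscaling describe-auto-scaling-instances \
    --query "AutoScalingInstances[?ProtectedFromScaleIn].[AutoScalingGroupName,InstanceId]" \
    --output text
}
```

The output is not filtered by cluster, so on a shared account it would need narrowing to the ASGs in question.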

RothAndrew on 8 Jan 2019

Changing the aws_autoscaling_group resource to fail fast if protect-from-scale-in is true should be the right choice, as proposed here

RothAndrew on 8 Jan 2019

This is totally 100% "working as intended". If you set protect-from-scale-in to true you have to manually terminate the instances.

Exactly. You can't have it both ways.

max-rocket-internet on 9 Jan 2019
