Terraform-provider-aws: aws_ecs_cluster with capacity_providers cannot be destroyed

Created on 22 Dec 2019 · 5 Comments · Source: hashicorp/terraform-provider-aws

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.12.18

provider.aws v2.43.0

Affected Resource(s)

aws_ecs_cluster
aws_ecs_capacity_provider
aws_autoscaling_group

Terraform Configuration Files

resource "aws_ecs_cluster" "indestructable" { 
  name = "show_tf_cp_flaw"

  capacity_providers = [aws_ecs_capacity_provider.cp.name]

  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.cp.name
  }
}

resource "aws_ecs_capacity_provider" "cp" {
  name = "show_tf_cp_flaw"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.asg.arn

    managed_scaling {
      status          = "ENABLED"
      target_capacity = 80
    }
  }
}

resource "aws_autoscaling_group" "asg" {
  min_size = 2
  ....
}

Expected Behavior

terraform destroy should be able to destroy an aws_ecs_cluster which has capacity_providers set.

Actual Behavior

Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.

The problem is that the new capacity_providers property on aws_ecs_cluster introduces a new dependency chain:
aws_ecs_cluster
depends on aws_ecs_capacity_provider
depends on aws_autoscaling_group

This causes Terraform to destroy the ECS cluster before the autoscaling group, which is the wrong way around: the autoscaling group must be destroyed first, because the cluster must contain zero container instances before it can be deleted.
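To make the ordering concrete, here is a minimal sketch that feeds the dependency chain above into the standard tsort utility and reverses the result. It only models the idea that destroy order is the reverse of creation order; it is not how Terraform computes its graph internally, and the function names are made up for this illustration.

```shell
#!/bin/sh
# Each line is "prerequisite dependent": the left resource must exist
# before the right one can be created.
edges_current() {
  cat <<'EOF'
aws_autoscaling_group aws_ecs_capacity_provider
aws_ecs_capacity_provider aws_ecs_cluster
EOF
}

# Print the input lines in reverse order (portable stand-in for tac).
reverse_lines() {
  awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}

# tsort yields a valid creation order; reversing it gives the destroy order.
destroy_order_current() {
  edges_current | tsort | reverse_lines
}

destroy_order_current
```

The cluster comes out first and the autoscaling group last, which matches the failure: the cluster is deleted while its instances are still registered.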

A possible solution may be to introduce a new resource type representing the attachment of a capacity provider to a cluster (inspired by aws_iam_role_policy_attachment which is the attachment of an IAM policy to a role).

This would allow the following dependency graph which would work beautifully:
aws_ecs_capacity_provider_cluster_attachment
depends on aws_ecs_cluster and aws_ecs_capacity_provider;
aws_ecs_capacity_provider
depends on aws_autoscaling_group
depends on aws_launch_template
depends on aws_ecs_cluster (e.g. via the user_data property which needs to set the ECS_CLUSTER environment variable to the name of the cluster).
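As a rough check of the proposed graph, the same tsort trick can be applied to it. This is only a sketch: aws_ecs_capacity_provider_cluster_attachment is the hypothetical resource suggested above, not an existing resource type, and the helper names are invented for the example.

```shell
#!/bin/sh
# Each line is "prerequisite dependent" for the proposed layout.
# The attachment resource is hypothetical (it is the one suggested above).
edges_proposed() {
  cat <<'EOF'
aws_ecs_cluster aws_launch_template
aws_launch_template aws_autoscaling_group
aws_autoscaling_group aws_ecs_capacity_provider
aws_ecs_capacity_provider aws_ecs_capacity_provider_cluster_attachment
aws_ecs_cluster aws_ecs_capacity_provider_cluster_attachment
EOF
}

# Print the input lines in reverse order (portable stand-in for tac).
reverse_lines() {
  awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}

# Reverse of the creation order is the destroy order.
destroy_order_proposed() {
  edges_proposed | tsort | reverse_lines
}

destroy_order_proposed
```

With these edges the attachment is destroyed first and the cluster last, so the autoscaling group (and its instances) would be gone before the cluster is deleted.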

Steps to Reproduce

terraform apply
terraform destroy

References

The problematic capacity_providers field on aws_ecs_cluster was added recently in #11150
Using aws_ecs_capacity_provider with managed_termination_protection = "ENABLED" requires that the aws_autoscaling_group has protect_from_scale_in enabled, which has a separate issue with destroy: #5278

new-resource service/ecs

Source

lukedd

👍58 🚀1

All 5 comments

Meanwhile here is a nasty workaround using a destroy provisioner, that worked for me to allow the aws_ecs_cluster to be destroyed:

resource "aws_ecs_cluster" "cluster" {
  name = local.cluster_name

  capacity_providers = [aws_ecs_capacity_provider.cp.name]

  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.cp.name
  }

  # We need to terminate all instances before the cluster can be destroyed.
  # (Terraform would handle this automatically if the autoscaling group depended
  #  on the cluster, but we need to have the dependency in the reverse
  #  direction due to the capacity_providers field above).
  provisioner "local-exec" {
    when = destroy

    command = <<CMD
      # Get the list of capacity providers associated with this cluster
      CAP_PROVS="$(aws ecs describe-clusters --clusters "${self.arn}" \
        --query 'clusters[*].capacityProviders[*]' --output text)"

      # Now get the list of autoscaling groups from those capacity providers
      # ($CAP_PROVS is deliberately unquoted so that several provider names
      #  are passed as separate arguments)
      ASG_ARNS="$(aws ecs describe-capacity-providers \
        --capacity-providers $CAP_PROVS \
        --query 'capacityProviders[*].autoScalingGroupProvider.autoScalingGroupArn' \
        --output text)"

      if [ -n "$ASG_ARNS" ] && [ "$ASG_ARNS" != "None" ]
      then
        for ASG_ARN in $ASG_ARNS
        do
          ASG_NAME=$(echo $ASG_ARN | cut -d/ -f2-)

          # Set the autoscaling group size to zero
          aws autoscaling update-auto-scaling-group \
            --auto-scaling-group-name "$ASG_NAME" \
            --min-size 0 --max-size 0 --desired-capacity 0

          # Remove scale-in protection from all instances in the asg
          # (skipped when the group has no instances, because the call
          #  fails on an empty --instance-ids list)
          INSTANCES="$(aws autoscaling describe-auto-scaling-groups \
            --auto-scaling-group-names "$ASG_NAME" \
            --query 'AutoScalingGroups[*].Instances[*].InstanceId' \
            --output text)"
          if [ -n "$INSTANCES" ]
          then
            aws autoscaling set-instance-protection --instance-ids $INSTANCES \
              --auto-scaling-group-name "$ASG_NAME" \
              --no-protected-from-scale-in
          fi
        done
      fi
CMD
  }
}
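
The ARN-to-name step in the provisioner above can be tried on its own without touching AWS. The sketch below uses a made-up ARN (the account ID and UUID are placeholders) and a hypothetical helper name; it only shows what cut -d/ -f2- extracts.

```shell
#!/bin/sh
# Hypothetical ARN for illustration only: account ID and UUID are placeholders.
ASG_ARN="arn:aws:autoscaling:eu-west-1:123456789012:autoScalingGroup:11111111-2222-3333-4444-555555555555:autoScalingGroupName/show_tf_cp_flaw"

asg_name_from_arn() {
  # Everything after the first "/" is the group name; -f2- (rather than -f2)
  # keeps the remainder intact even if it contains further "/" characters.
  printf '%s\n' "$1" | cut -d/ -f2-
}

asg_name_from_arn "$ASG_ARN"
```

Running it prints show_tf_cp_flaw, the name that the update-auto-scaling-group and set-instance-protection calls expect.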

lukedd on 22 Dec 2019

👍11

Any updates here? This is terribly annoying to deal with. (The workaround does not work in my particular case)

kkost on 17 Jun 2020

Any news? I'm still waiting for this issue to be fixed.

carlitos081 on 30 Jun 2020

If you don't link the cluster to the capacity provider as a dependency and just use the name as a string, does that fix the issue? It's not great, but as long as you can delete a capacity provider while the ASG it's linked to still has instances, that would work.

tomelliff on 13 Aug 2020

@tomelliff Unfortunately this has another issue during construction:
Error: InvalidParameterException: The specified capacity provider 'XXX' is not in an ACTIVE state. Specify a valid capacity provider and try again.