Terraform v0.12.18
resource "aws_ecs_cluster" "indestructable" {
  name = "show_tf_cp_flaw"

  capacity_providers = [aws_ecs_capacity_provider.cp.name]

  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.cp.name
  }
}

resource "aws_ecs_capacity_provider" "cp" {
  name = "show_tf_cp_flaw"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.asg.arn

    managed_scaling {
      status          = "ENABLED"
      target_capacity = 80
    }
  }
}

resource "aws_autoscaling_group" "asg" {
  min_size = 2
  ....
}
terraform destroy should be able to destroy an aws_ecs_cluster which has capacity_providers set.
Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.
The problem is that the new capacity_providers property on aws_ecs_cluster introduces a new dependency chain: aws_ecs_cluster depends on aws_ecs_capacity_provider, which depends on aws_autoscaling_group.
This causes terraform to destroy the ECS cluster before the autoscaling group, which is the wrong way around: the autoscaling group must be destroyed first because the cluster must contain zero instances before it can be destroyed.
A possible solution may be to introduce a new resource type representing the attachment of a capacity provider to a cluster (inspired by aws_iam_role_policy_attachment, which is the attachment of an IAM policy to a role).
This would allow the following dependency graph which would work beautifully:
aws_ecs_capacity_provider_cluster_attachment depends on aws_ecs_cluster and aws_ecs_capacity_provider; aws_ecs_capacity_provider depends on aws_autoscaling_group, which depends on aws_launch_template, which depends on aws_ecs_cluster (e.g. via the user_data property, which needs to set the ECS_CLUSTER environment variable to the name of the cluster).
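To make the proposal concrete, here is a rough sketch of how a configuration might look with such an attachment resource. The resource type aws_ecs_capacity_provider_cluster_attachment is only the name proposed in this issue, and its argument names (cluster_name, capacity_providers, default_capacity_provider_strategy) are assumptions made for illustration, as are the resource labels "attach" and "lt".

```hcl
# The cluster no longer references the capacity provider, so nothing
# forces it to be destroyed before the autoscaling group.
resource "aws_ecs_cluster" "cluster" {
  name = "show_tf_cp_flaw"
}

# The launch template can now depend on the cluster, e.g. to write
# ECS_CLUSTER into the ECS agent configuration via user_data.
resource "aws_launch_template" "lt" {
  # other arguments omitted
  user_data = base64encode(<<EOT
#!/bin/bash
echo "ECS_CLUSTER=${aws_ecs_cluster.cluster.name}" >> /etc/ecs/ecs.config
EOT
  )
}

# HYPOTHETICAL resource type and arguments: this is the attachment
# proposed above, not an existing provider resource.
resource "aws_ecs_capacity_provider_cluster_attachment" "attach" {
  cluster_name       = aws_ecs_cluster.cluster.name
  capacity_providers = [aws_ecs_capacity_provider.cp.name]

  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.cp.name
  }
}
```

With this layout, terraform destroy would remove the attachment first, then the capacity provider and autoscaling group, and the cluster last, once it contains no instances.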
terraform apply
terraform destroy
The problematic capacity_providers field on aws_ecs_cluster was added recently in #11150.
Using aws_ecs_capacity_provider with managed_termination_protection = "ENABLED" requires that the aws_autoscaling_group has protect_from_scale_in enabled, which has a separate issue with destroy: #5278
Meanwhile, here is a nasty workaround using a destroy provisioner that worked for me and allowed the aws_ecs_cluster to be destroyed:
resource "aws_ecs_cluster" "cluster" {
  name = local.cluster_name

  capacity_providers = [aws_ecs_capacity_provider.cp.name]

  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.cp.name
  }

  # We need to terminate all instances before the cluster can be destroyed.
  # (Terraform would handle this automatically if the autoscaling group depended
  # on the cluster, but we need to have the dependency in the reverse
  # direction due to the capacity_providers field above).
  provisioner "local-exec" {
    when = destroy

    command = <<CMD
# Get the list of capacity providers associated with this cluster
CAP_PROVS="$(aws ecs describe-clusters --clusters "${self.arn}" \
  --query 'clusters[*].capacityProviders[*]' --output text)"

# Now get the list of autoscaling groups from those capacity providers.
# $CAP_PROVS is left unquoted so that each name is passed as its own argument.
ASG_ARNS="$(aws ecs describe-capacity-providers \
  --capacity-providers $CAP_PROVS \
  --query 'capacityProviders[*].autoScalingGroupProvider.autoScalingGroupArn' \
  --output text)"

if [ -n "$ASG_ARNS" ] && [ "$ASG_ARNS" != "None" ]
then
  for ASG_ARN in $ASG_ARNS
  do
    # The group name is everything after the first "/" in the ARN
    ASG_NAME=$(echo "$ASG_ARN" | cut -d/ -f2-)

    # Set the autoscaling group size to zero
    aws autoscaling update-auto-scaling-group \
      --auto-scaling-group-name "$ASG_NAME" \
      --min-size 0 --max-size 0 --desired-capacity 0

    # Remove scale-in protection from all instances in the asg
    INSTANCES="$(aws autoscaling describe-auto-scaling-groups \
      --auto-scaling-group-names "$ASG_NAME" \
      --query 'AutoScalingGroups[*].Instances[*].InstanceId' \
      --output text)"

    # Skip the call when the group has no instances; it fails with an empty list
    if [ -n "$INSTANCES" ] && [ "$INSTANCES" != "None" ]
    then
      aws autoscaling set-instance-protection --instance-ids $INSTANCES \
        --auto-scaling-group-name "$ASG_NAME" \
        --no-protected-from-scale-in
    fi
  done
fi
CMD
  }
}
Any updates here? This is terribly annoying to deal with. (The workaround does not work in my particular case)
Any news? I am still waiting for this issue to be fixed.
If you don't link the cluster to the capacity provider as a dependency and just use the name as a string, does that fix the issue? It's not great, but as long as you can delete a capacity provider while the ASG it's linked to has instances, that would work.
@tomelliff Unfortunately this has another issue during construction:
Error: InvalidParameterException: The specified capacity provider 'XXX' is not in an ACTIVE state. Specify a valid capacity provider and try again.
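For reference, the string-name suggestion above would look roughly like the sketch below, reusing the names from the original example. Because the cluster no longer references aws_ecs_capacity_provider.cp, Terraform has no reason to create the capacity provider first, which is presumably why the cluster creation can hit the "not in an ACTIVE state" error.

```hcl
resource "aws_ecs_cluster" "cluster" {
  name = "show_tf_cp_flaw"

  # Plain strings instead of aws_ecs_capacity_provider.cp.name, so Terraform
  # sees no dependency between the cluster and the capacity provider.
  capacity_providers = ["show_tf_cp_flaw"]

  default_capacity_provider_strategy {
    capacity_provider = "show_tf_cp_flaw"
  }
}
```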