_This issue was originally opened by @jaloren as hashicorp/terraform#18263. It was migrated here as a result of the provider split. The original body of the issue is below._
I am using the `aws_cloudformation_stack` resource to provision an AWS Elastic Container Service (ECS) cluster and one or more services in that cluster. I used `terraform graph -type=plan-destroy` to verify that I successfully set up a dependency relationship in Terraform between the resource that creates the service and the resource that creates the ECS cluster.
According to graphviz, the service is a child node of the ECS cluster node. Given that, I am expecting Terraform to delete the service and then delete the cluster. However, the deletes seem to happen out of order, which causes the delete of the ECS cluster to fail, since you can't delete a cluster that still has services in it.
Terraform version: Terraform v0.11.8

Expected behavior: Terraform successfully deletes the AWS ECS cluster and its associated services.

Actual behavior: Terraform successfully deleted the service in the ECS cluster but failed to delete the ECS cluster itself, with the following error:
```
* aws_cloudformation_stack.ecs-cluster: DELETE_FAILED: ["The following resource(s) failed to delete: [ECSCluster]. " "The Cluster cannot be deleted while Services are active. (Service: AmazonECS; Status Code: 400; Error Code: ClusterContainsServicesException; Request ID: 7bcbeae4-70ab-11e8-bd0b-3d3254c7f7d3)"]
```
Steps to reproduce:

```
terraform init
terraform plan
terraform apply
```
This should work if we stop the ECS services and then try deleting the ECS cluster.
@avengers009 you're right, but ideally Terraform should be able to schedule these actions accordingly where possible, or if not possible, the user should be able to hint Terraform via `depends_on`. TL;DR: users shouldn't need to manually touch the infrastructure in order to run `apply` or `destroy` successfully.
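To illustrate the kind of hint meant here, a minimal sketch with hypothetical names (note that referencing an attribute of the cluster already creates an implicit dependency; `depends_on` just makes the ordering explicit):

```hcl
resource "aws_ecs_cluster" "example" {
  name = "example"
}

resource "aws_ecs_service" "example" {
  name    = "example"
  cluster = "${aws_ecs_cluster.example.id}" # implicit dependency on the cluster
  # task_definition and other required arguments omitted for brevity

  # Redundant given the reference above, but spells out the intent:
  # on destroy, the service is removed before the cluster.
  depends_on = ["aws_ecs_cluster.example"]
}
```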
@jaloren Do you mind sharing the configs with us, so we can understand the relationships between resources and reproduce the problem?
Thanks.
I am also seeing this issue:
```
Error: Error applying plan:

1 error(s) occurred:

* aws_ecs_cluster.ecs (destroy): 1 error(s) occurred:

* aws_ecs_cluster.ecs: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.
	status code: 400, request id: 30e1e812-854c-11e8-bec1-397064633d2b
```
Here is my configuration:
ecs_service

```hcl
resource "aws_ecs_service" "authenticator" {
  name            = "authenticator"
  cluster         = "${aws_ecs_cluster.ecs.id}"
  task_definition = "${aws_ecs_task_definition.authenticator.arn}"
  desired_count   = 2

  load_balancer {
    target_group_arn = "${aws_lb_target_group.authenticator.arn}"
    container_name   = "authenticator"
    container_port   = 3030
  }
}
```
ecs_cluster

```hcl
resource "aws_ecs_cluster" "ecs" {
  name = "${local.safe_name_prefix}"
}
```
@Kartstig is that error occurring for you after 10 minutes or so of trying?
Yes, it does. I usually attempt the destroy twice to account for any timeouts.
I'm seeing very similar behavior with Terraform 0.11.7/AWS provider 1.19. I am frequently (but not every time) seeing this behavior:
```
00:12:27.512 aws_ecs_cluster.ecs_cluster: Still destroying... (ID: arn:aws:ecs:us-east-1:<MYACCOUNT>:cluster/my-service, 9m50s elapsed)
00:12:36.041
00:12:36.042 Error: Error applying plan:
00:12:36.043
00:12:36.044 1 error(s) occurred:
00:12:36.045
00:12:36.045 * aws_ecs_cluster.ecs_cluster (destroy): 1 error(s) occurred:
00:12:36.046
00:12:36.046 * aws_ecs_cluster.ecs_cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.
00:12:36.047 	status code: 400, request id: b920a9e3-8b45-11e8-8e1a-0751c6fe0d1a
```
@radeksimko I am not sure how much of the configs you would like to see. It's a little involved, but here's the key part of the main.tf in the root module.
Each module is nothing but a wrapper around a CloudFormation template. So by referring to the output of one module as an input to another, I am establishing a dependency between the two resources encapsulated in those modules. Ergo, on a destroy I am expecting the cluster to be deleted after the service, since the service depends on the cluster.
module "public_load_balancer" {
source = "../../modules/aws/network/load_balancer/alb"
environment = "${var.environment}"
security_group = "${module.network_acls.load_balancer_security_group}"
vpc = "${module.network.vpc}"
subnets = "${module.network.public_one_subnet},${module.network.public_two_subnet}"
}
module "ecs_cluster" {
source = "../../modules/aws/ecs/cluster"
environment = "${var.environment}"
}
module "log_group" {
source = "../../modules/aws/logs/log_group"
environment = "${var.environment}"
log_retention = 3
}
module "ecs_application" {
source = "../../modules/aws/ecs/services/ecsapp"
subnets = "${module.network.ecs_traffic_one},${module.network.ecs_traffic_two}"
target_group = "${module.public_load_balancer.enrollment_api_target_group}"
environment = "${var.environment}"
security_group = "${module.network_acls.container_security_group}"
vpc = "${module.network.vpc}"
tag = "v1.0.0"
log_group = "${module.log_group.id}"
cluster_name = "${module.ecs_cluster.name}"
}
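For reference, the cluster module surfaces the name roughly like this (paraphrased; the stack name and output key here are guesses, not the exact wrapper):

```hcl
# Inside modules/aws/ecs/cluster (approximate): the CloudFormation stack's
# output becomes a module output, and referencing module.ecs_cluster.name
# elsewhere is what creates the dependency edge in the graph.
output "name" {
  value = "${aws_cloudformation_stack.ecs-cluster.outputs["ClusterName"]}"
}
```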
Any update on this issue? Is there a plan to fix it? Or, at least, to provide a machine-readable list of the services that must be destroyed before destroying the instances?
I think Terraform should stop/terminate the instances as part of the destroy process; right now you have to manually terminate instances in order for the destroy action to finish.
Hey, we are trying to automate this destruction of instances instead of doing it manually. Is there a recommended way to automate it? Our application code is in Java.
One way could be to parse the plan generated by the `terraform destroy` command. Can you help us find a way to parse the Terraform plan to identify which instances/clusters need to be destroyed?
You can prevent that situation by splitting your Terraform project into at least two projects; you can use remote state for that. If you put the ECS cluster and the service creation in two different projects, then when you want to destroy, you first run the destroy of the service project, and the ECS cluster can then be destroyed without any problem.
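A rough sketch of what I mean (backend type, bucket, key, and output names are placeholders):

```hcl
# In the service project: look up the cluster created by the other project
# via its remote state instead of declaring it locally.
data "terraform_remote_state" "ecs_cluster" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"
    key    = "ecs-cluster/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_ecs_service" "app" {
  name    = "app"
  cluster = data.terraform_remote_state.ecs_cluster.outputs.cluster_id
  # task_definition and other required arguments omitted
}
```

With this split, destroying the service project first leaves the cluster project free to destroy cleanly afterwards.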
Is there any solution here? Terraform was working great for me, and now I'm hitting the same error ("The Cluster cannot be deleted while Services are active") and don't understand why I need to manually stop/terminate the instances...
I am seeing this with 0.12.7 in my company's production environment, intermittently. Is there any way to specify a `depends_on` or "teardown first" relationship that works for teardown?
I am still seeing this on the latest version...
I'm here for the same issue - has anyone found a workaround? Or can anyone confirm that this _sometimes_ works (even after n retries)? Otherwise, it seems the `aws_ecs_service` resource is broken. The core promise is that `terraform apply` followed by `terraform destroy` will just work.
Hoping to better understand if this _never_ works or if it's just a retry/interim issue or an issue particular to a set of configs.
UPDATE: In my particular instance, I can confirm upon retry that `terraform destroy` does not list the ECS cluster as something to be destroyed - meaning the destroy of the ECS service failed at some point but was logged as destroyed anyway. (Or conversely, I guess, it could have been created and not correctly confirmed as created.) I will post back here if I have additional test results.
+1 having the same issue here. Latest version on Terraform Cloud
I have the same issue with Terraform 0.12.19.
Hey everyone, I'm using AWS CloudFormation and I'm experiencing this issue as well. I'm currently suspecting that it's not an issue with either CloudFormation or Terraform, but possibly with the underlying EC2 AMI. I'm using the Amazon Linux 2 AMI, while an example I'm referencing uses Amazon Linux 1, and the latter deletes fine while mine does not (even with an explicit DependsOn and Refs sprinkled throughout). There were a good number of changes in Amazon Linux 2, which I'm guessing may have included a change to `cfn-bootstrap` that might impact `/opt/aws/cfn-signal` behavior. I haven't tested this out though.
Not sure if this is the right place to complain, but probably the same issue here:
```
Error: Error draining autoscaling group: Group still has 1 instances

Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining
```
Surprisingly, both errors appear at once. Terraform v0.12.20, and the following code is being used:
data "aws_ami" "amazon2_ecs_optimized" {}
resource "aws_launch_template" "this" {}
resource "aws_autoscaling_group" "this" {}
resource "aws_ecs_task_definition" "this" {}
resource "aws_ecs_service" "default" {
# ...
depends_on = [
# consider note at https://www.terraform.io/docs/providers/aws/r/ecs_service.html
aws_iam_role_policy.ecs_service
]
# ...
}
resource "aws_ecs_cluster" "application" {}
p.s. will try to build a workaround with a `null_resource` and a `local-exec` provisioner with a `when = destroy` strategy, running the `aws` CLI to find and deregister ECS EC2 instances (see the sketch below)... but it's sad in terms of "reliable" cloud services.
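Something along these lines (an untested sketch; destroy-time provisioners can only reference `self`, hence capturing the cluster name in `triggers`):

```hcl
resource "null_resource" "drain_ecs_cluster" {
  # Captured at create time; referencing the cluster here also makes this
  # resource depend on it, so on destroy this provisioner runs before the
  # cluster itself is deleted.
  triggers = {
    cluster_name = aws_ecs_cluster.application.name
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<-EOT
      for arn in $(aws ecs list-container-instances \
          --cluster ${self.triggers.cluster_name} \
          --query 'containerInstanceArns[]' --output text); do
        aws ecs deregister-container-instance \
          --cluster ${self.triggers.cluster_name} \
          --container-instance "$arn" --force
      done
    EOT
  }
}
```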
I have also faced the exact same issue as raised by @mikalai-t.
@mikalai-t, would you like to share what steps you followed as a workaround?
Still haven't implemented a workaround, but... I noticed that sometimes even the termination process took a while, so I assumed our application becomes unresponsive and consumes too much CPU, and therefore the EC2 instance failed to respond in time.
I just configured `t3a.small` instead of `t3a.micro`, and the issue hasn't appeared since then. Not sure if this is a final solution, but you can start by analyzing your application's behavior on a different instance type.
Also, I would recommend checking the current instances' "protect from scale-in" setting. I had a similar issue when I stopped using an ECS Capacity Provider and forgot to set this setting to `false`.
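Roughly, the relevant fragment looks like this (a minimal sketch; other required ASG arguments omitted):

```hcl
resource "aws_autoscaling_group" "this" {
  # ...
  # Should be false once no capacity provider is managing the instances;
  # otherwise scale-in protection blocks instance termination on destroy.
  protect_from_scale_in = false
}
```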
btw... Even with a capacity provider configured in the cluster, I faced timeouts when destroying the ASG, but after a couple of repeated attempts it was always successful.