Terraform-provider-aws: Failing to delete an already provisioned subnet if it was used for an Autoscaling Group (that created some EC2 instances)

Created on 25 Jul 2019 · 9 Comments · Source: hashicorp/terraform-provider-aws

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.12.5

  • provider.aws v2.20.0
  • provider.template v2.1.2

Terraform Configuration Files

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  enable_dns_hostnames = true
}
resource "aws_subnet" "subnet1" {
  vpc_id = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}
# Read all subnet ids for this vpc/region.
data "aws_subnet_ids" "all_subnets" {
  vpc_id = aws_vpc.main.id
  # Wait for the subnets to be actually created, not just the VPC
  depends_on = [
    aws_subnet.subnet1
  ]
}
resource "aws_autoscaling_group" "ecs_cluster_spot" {
  name_prefix = "ecs_cluster_spot"
  termination_policies = ["OldestInstance"]
  max_size = local.max_spot_instances
  min_size = local.min_spot_instances
  launch_configuration = aws_launch_configuration.ecs_config_launch_config_spot.name
  lifecycle {
    create_before_destroy = true
  }
  # This is the important part:
  # We attach the subnets of the VPC to the autoscaling group
  vpc_zone_identifier = data.aws_subnet_ids.all_subnets.ids
}

I've truncated my configuration to the bare minimum. I later add ECS task definitions and services to the ECS cluster, but I don't think these matter for the issue. I could just as well launch them from the AWS console instead of with Terraform, and I assume the effect would be the same.

Debug Output

...
aws_subnet.subnet1: Still destroying... [id=subnet-0a7d3066014860a8e, 18m10s elapsed]
aws_subnet.subnet1: Still destroying... [id=subnet-0a7d3066014860a8e, 18m20s elapsed]
aws_subnet.subnet1: Still destroying... [id=subnet-0a7d3066014860a8e, 18m30s elapsed]
aws_subnet.subnet1: Still destroying... [id=subnet-0a7d3066014860a8e, 18m40s elapsed]
aws_subnet.subnet1: Still destroying... [id=subnet-0a7d3066014860a8e, 18m50s elapsed]
aws_subnet.subnet1: Still destroying... [id=subnet-0a7d3066014860a8e, 19m0s elapsed]
aws_subnet.subnet1: Still destroying... [id=subnet-0a7d3066014860a8e, 19m10s elapsed]
aws_subnet.subnet1: Still destroying... [id=subnet-0a7d3066014860a8e, 19m20s elapsed]

After 19 minutes, the subnet is still not destroyed.

Expected Behavior

subnet1 is destroyed

Actual Behavior

Destroying subnet1 hangs. If I attempt to manually remove the resource from the AWS console, I get this:
[image]

I assume this is the same reason why Terraform fails to delete the subnet and hangs.

Steps to Reproduce

  1. terraform apply
  2. Launch some services in your ECS cluster and instances (I don't think this makes a difference)
  3. Remove the "subnet1" definition, add another subnet definition, and run terraform apply again

Important Factoids

I removed the "subnet1" definition from my terraform files and added another subnet definition, causing "subnet1" to be marked for destruction. On my attempt to "apply" the changes, I encountered this hang in deletion.
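As a hedged sketch of one variation on the configuration above, the Auto Scaling Group can reference the replacement subnet directly instead of reading subnet IDs through the aws_subnet_ids data source, so the subnet IDs it receives do not depend on when the data source is read. The name subnet2 and its CIDR block are hypothetical; the launch configuration and locals are the ones from the truncated configuration above. Whether this alone avoids the hang is not confirmed in this thread, since instances already running in the old subnet still have to be terminated before AWS allows the subnet to be deleted.

```hcl
# Hypothetical replacement subnet (name and CIDR are assumptions).
resource "aws_subnet" "subnet2" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.2.0/24"
}

resource "aws_autoscaling_group" "ecs_cluster_spot" {
  name_prefix          = "ecs_cluster_spot"
  termination_policies = ["OldestInstance"]
  max_size             = local.max_spot_instances
  min_size             = local.min_spot_instances
  launch_configuration = aws_launch_configuration.ecs_config_launch_config_spot.name

  lifecycle {
    create_before_destroy = true
  }

  # Direct reference to the subnet resource instead of
  # data.aws_subnet_ids.all_subnets.ids, so Terraform sees the
  # dependency on the subnet itself.
  vpc_zone_identifier = [aws_subnet.subnet2.id]
}
```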

needs-triage service/ec2

Most helpful comment

Hi folks 👋 If you are seeing DependencyViolation errors on EC2 Subnet deletions or long delays in EC2 Subnet deletion, the causes for these will be very specific to your environment and sometimes caused by AWS not properly cleaning up its own infrastructure. Some pointers that may help:

  • Ensure no infrastructure creating ENIs within the Subnet exists outside Terraform
  • For AWS resources that manage ENIs automatically (including but not limited to Network Load Balancers, Lambda Functions, or EKS Node Groups that automatically provision/delete ENIs), if they require IAM Role permissions to perform these actions, ensure that the Terraform resource with the IAM Role reference also includes explicit depends_on to the aws_iam_role_policy/aws_iam_role_policy_attachment resources so those permissions remain until the AWS resource that needs those permissions is deleted properly first
  • Check for lingering ENIs in the Subnet either via the web console (EC2 > Network Interfaces) or via the AWS CLI, e.g. aws ec2 describe-network-interfaces --filters Name=subnet-id,Values=subnet-XXXXXXXXX -- these should help narrow down AWS/Terraform resources that are causing the long deletion delays or DependencyViolation errors.
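The depends_on pointer above can be sketched as follows, using a Lambda function attached to a VPC as the ENI-managing resource. This is only an illustration under assumptions: the resource names, the function settings (handler, runtime, filename), and the use of the VPC's default security group are placeholders, and aws_subnet.subnet1 and aws_vpc.main are taken from the configuration in the original report.

```hcl
resource "aws_iam_role" "lambda" {
  name = "example-lambda-role" # placeholder name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

# Grants the permissions Lambda needs to create and delete ENIs in the VPC.
resource "aws_iam_role_policy_attachment" "lambda_vpc_access" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
}

resource "aws_lambda_function" "example" {
  function_name = "example"       # placeholder
  role          = aws_iam_role.lambda.arn
  handler       = "index.handler" # placeholder
  runtime       = "python3.8"     # placeholder
  filename      = "lambda.zip"    # placeholder

  vpc_config {
    subnet_ids         = [aws_subnet.subnet1.id]
    security_group_ids = [aws_vpc.main.default_security_group_id]
  }

  # Explicit dependency: on destroy, the function is removed before the
  # policy attachment, so the ENI cleanup permissions are still in place.
  depends_on = [aws_iam_role_policy_attachment.lambda_vpc_access]
}
```

Without the explicit depends_on, Terraform only knows that the function depends on the role, not on the attachment, so the two may be destroyed in either order.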

All 9 comments

I've just experienced this same thing, believe it or not.

aws_subnet.subnet_ovpn: Still destroying... [id=subnet-0e676ec34db23e1d7, 1m40s elapsed]
aws_subnet.subnet_ovpn: Still destroying... [id=subnet-0e676ec34db23e1d7, 1m50s elapsed]
aws_subnet.subnet_ovpn: Still destroying... [id=subnet-0e676ec34db23e1d7, 2m0s elapsed]
aws_subnet.subnet_ovpn: Still destroying... [id=subnet-0e676ec34db23e1d7, 2m10s elapsed]
aws_subnet.subnet_ovpn: Still destroying... [id=subnet-0e676ec34db23e1d7, 2m20s elapsed]
aws_subnet.subnet_ovpn: Still destroying... [id=subnet-0e676ec34db23e1d7, 2m30s elapsed]
aws_subnet.subnet_ovpn: Still destroying... [id=subnet-0e676ec34db23e1d7, 2m40s elapsed]
aws_subnet.subnet_ovpn: Still destroying... [id=subnet-0e676ec34db23e1d7, 2m50s elapsed]
...

Unfortunate that this is still an open issue.

Any updates on this issue? I bumped into https://aws.amazon.com/blogs/compute/update-issue-affecting-hashicorp-terraform-resource-deletions-after-the-vpc-improvements-to-aws-lambda/ but I can't delete subnets even using provider v2.41.

The orphaned ENI issue is also being worked here:

https://github.com/aws/amazon-vpc-cni-k8s/issues/608#issuecomment-571938279
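For checking lingering ENIs from within Terraform rather than the console or CLI, a minimal sketch could use the aws_network_interfaces data source with a subnet-id filter. This assumes your provider version includes that data source; the names in_subnet1 and subnet1_eni_ids are hypothetical, and aws_subnet.subnet1 is the subnet from the original report.

```hcl
# Lists the IDs of all network interfaces currently in the subnet.
data "aws_network_interfaces" "in_subnet1" {
  filter {
    name   = "subnet-id"
    values = [aws_subnet.subnet1.id]
  }
}

# Expose the IDs so they show up after terraform apply or terraform refresh.
output "subnet1_eni_ids" {
  value = data.aws_network_interfaces.in_subnet1.ids
}
```

Any IDs listed here can then be looked up under EC2 > Network Interfaces to see which resource owns them.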

I'm having the same issue as the OP when trying to change the availability zone of a subnet. Terraform wanted to update the Auto Scaling Group in place instead of destroying and recreating it. This made the subnet deletion fail, as the subnet still had resources in it. There seems to be similar behavior for load balancers and RDS instances, which Terraform also wants to update in place.

I ended up destroying pretty much the entire infrastructure and recreating from scratch, that was the only workaround I could find.

I have the same issue. In my setup, I create a VPC, an EKS cluster, multiple ASGs, etc. The good thing is that Terraform destroys the ASGs (and the EC2 instances, which are the costly resources). The bad thing is that the Internet Gateway, subnets, and network interfaces are left dangling.

I have noticed that they eventually get cleaned up, likely by a background cleanup job that AWS runs to deallocate dangling resources.

Same problem for me. In my TF script I'm trying to remove one availability zone and all the resources belonging to it. It's not possible because TF tries to remove the subnet, which can't be deleted while it still has resources in it. Any ideas how to solve this problem? Any suggestions other than destroying everything?

I tried refreshing my credentials (access key and secret key).
This helped resolve my issue.

For anyone who got here because of Jenkins X on EKS, I had this issue too. The terraform destroy was stuck and couldn't delete the subnets or the internet gateway.

I manually deleted the NLB that had been created, and then re-ran the terraform destroy and then the project was deleted successfully.
