_This issue was originally opened by @brikis98 as hashicorp/terraform#11047. It was migrated here as a result of the provider split. The original body of the issue is below._
Terraform v0.8.2
This is part of a larger configuration, but I think the relevant parts are as follows.
Under `modules/webserver-cluster/main.tf`, I define a module with the following code:
resource "aws_autoscaling_group" "example" {
launch_configuration = "${aws_launch_configuration.example.id}"
availability_zones = ["${data.aws_availability_zones.all.names}"]
load_balancers = ["${aws_elb.example.name}"]
health_check_type = "ELB"
min_size = 2
max_size = 10
}
resource "aws_launch_configuration" "example" {
image_id = "ami-40d28157"
instance_type = "t2.micro"
security_groups = ["${aws_security_group.instance.id}"]
lifecycle {
create_before_destroy = true
}
}
resource "aws_security_group" "instance" {
name = "my-security-group"
lifecycle {
create_before_destroy = true
}
}
resource "aws_security_group_rule" "allow_http_inbound" {
type = "ingress"
security_group_id = "${aws_security_group.instance.id}"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
data "aws_availability_zones" "all" {}
resource "aws_elb" "example" {
name = "my-example-elb"
availability_zones = ["${data.aws_availability_zones.all.names}"]
security_groups = ["${aws_security_group.elb.id}"]
listener {
lb_port = 80
lb_protocol = "http"
instance_port = 80
instance_protocol = "http"
}
health_check {
healthy_threshold = 2
unhealthy_threshold = 2
timeout = 3
interval = 30
target = "HTTP:80/"
}
}
resource "aws_security_group" "elb" {
name = "elb"
}
resource "aws_security_group_rule" "allow_http_inbound" {
type = "ingress"
security_group_id = "${aws_security_group.elb.id}"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "allow_all_outbound" {
type = "egress"
security_group_id = "${aws_security_group.elb.id}"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
output "elb_security_group_id" {
value = "${aws_security_group.elb.id}"
}
In a separate folder, I use this module in the usual way, but also add a custom security group rule:
module "webserver_cluster" {
source = "modules/webserver-cluster"
# ... pass various parameters ...
}
resource "aws_security_group_rule" "allow_testing_inbound" {
type = "ingress"
security_group_id = "${module.webserver_cluster.elb_security_group_id}"
from_port = 12345
to_port = 12345
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
I expect to be able to run `terraform apply` and `terraform destroy` without errors. `terraform apply` works fine. Occasionally, `terraform destroy` fails with the following error:

aws_security_group.elb: DependencyViolation: resource sg-344baa48 has a dependent object
It's an intermittent issue, so I can't be sure, but I don't think this error happened with Terraform 0.7.x.
I have run into this issue with Terraform 0.10.6.
+ module.infrastructure.aws_security_group.sg
id: <computed>
description: "Allow traffic to sg from client security groups"
egress.#: <computed>
ingress.#: "1"
ingress.522618655.cidr_blocks.#: "0"
ingress.522618655.from_port: "1234"
ingress.522618655.ipv6_cidr_blocks.#: "0"
ingress.522618655.protocol: "tcp"
ingress.522618655.security_groups.#: "1"
ingress.522618655.security_groups.980544208: "sg-175fa66a"
ingress.522618655.self: "false"
ingress.522618655.to_port: "1234"
name: "sg_ingress_ydqxa4"
owner_id: <computed>
vpc_id: "vpc-63741921"
The delete was retried multiple times before failing with:
* aws_security_group.sg: DependencyViolation: resource sg-234bb25e has a dependent object
status code: 400, request id: bd64a44d-3e84-4ac4-a2c9-4e392f7c88a3
Terraform v0.9.9
Same issue
Terraform v0.10.7
Same issue. Is the only workaround to delete the SG manually and then recreate it via TF?
Same issue. For me, I'm 99% sure it's because there's an EC2 instance that isn't being changed and is still using the security group. So right now it looks like I have to make this change manually.
Yes, that's indeed the case. I don't have access to the Web UI (managed by the client), so I had to resolve it manually. I created an empty security group, replaced the existing one with that empty SG and then reran the Terraform command. That worked fine.
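In Terraform terms, that stopgap looks roughly like the following sketch (the resource names are illustrative, and `var.vpc_id` is assumed to be defined elsewhere):

resource "aws_security_group" "placeholder" {
  # Deliberately empty SG; it exists only so instances can be moved off
  # the problematic group before destroying it.
  name   = "placeholder-empty"
  vpc_id = "${var.vpc_id}"
}

resource "aws_instance" "example" {
  # ... (other params omitted) ...

  # Temporarily reference only the empty SG so the old group has no
  # dependent objects left and can be deleted.
  vpc_security_group_ids = ["${aws_security_group.placeholder.id}"]
}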
Same issue for us. Has anyone tested whether v0.10.8 fixes this?
I'm on v0.10.8 and have experienced this.
Got it again...
* module.network.module.aws_vpc.aws_security_group.default_private (destroy): 1 error(s) occurred:
* aws_security_group.default_private: DependencyViolation: resource sg-cd40e2b6 has a dependent object
status code: 400, request id: 6da496ce-b444-4a5c-b85d-c4f2bbadf842
* module.network.module.aws_vpc.aws_subnet.public[0] (destroy): 1 error(s) occurred:
* aws_subnet.public.0: Error deleting subnet: timeout while waiting for state to become 'destroyed' (last state: 'pending', timeout: 10m0s)
Hey all –
It sounds like, because this is intermittent, the times it's failing are when `aws_security_group_rule.allow_testing_inbound` is set to be destroyed _after_ the security group itself... I believe because the rule is dependent on the _output_ of the module and not the _group itself_. But I could be wrong.

As a workaround for that, version 1.2.0 of the AWS provider shipped with a new attribute on `aws_security_group` called `revoke_rules_on_delete`. Adding that to the security group in the module will likely work around this.
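Applied to the `elb` security group from the module above, that would look something like this sketch:

resource "aws_security_group" "elb" {
  name = "elb"

  # Ask AWS to revoke all of this group's rules at delete time, so
  # lingering rule references don't block the destroy.
  revoke_rules_on_delete = true
}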
There was other mention of an instance still using the group, can anyone provide a configuration that triggers this with instances?
Thanks!
@catsby I just tried setting `revoke_rules_on_delete` to `true` on my Security Group, but I still get the exact same `aws_security_group: DependencyViolation: resource sg-XXX has a dependent object` error on destroy.
The code doesn't seem to be doing anything very complicated. The simplified version is as follows. I have a module called `single-server`:
resource "aws_instance" "instance" {
ami = "${var.ami}"
instance_type = "${var.instance_type}"
vpc_security_group_ids = ["${aws_security_group.instance.id}"]
user_data = "${var.user_data}"
# ... (other params omitted) ...
}
resource "aws_security_group" "instance" {
name = "${var.name}"
description = "Security Group for ${var.name}"
vpc_id = "${var.vpc_id}"
# This workaround, unfortunately, did not help
revoke_rules_on_delete = true
}
resource "aws_security_group_rule" "allow_outbound_all" {
type = "egress"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
security_group_id = "${aws_security_group.instance.id}"
}
resource "aws_security_group_rule" "allow_inbound_ssh_from_cidr" {
count = "${signum(var.allow_ssh_from_cidr)}"
type = "ingress"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["${var.allow_ssh_from_cidr_list}"]
security_group_id = "${aws_security_group.instance.id}"
}
resource "aws_security_group_rule" "allow_inbound_ssh_from_security_group" {
count = "${signum(var.allow_ssh_from_security_group)}"
type = "ingress"
from_port = 22
to_port = 22
protocol = "tcp"
source_security_group_id = "${var.allow_ssh_from_security_group_id}"
security_group_id = "${aws_security_group.instance.id}"
}
I'm using this module in some code that creates a server and two EBS volumes for it:
module "example" {
source = "../../modules/single-server"
name = "example"
instance_type = "t2.micro"
ami = "${var.ami}"
allow_ssh_from_cidr_list = ["0.0.0.0/0"]
vpc_id = "${data.aws_vpc.default.id}"
subnet_id = "${data.aws_subnet.selected.id}"
# Script that attaches and mounts the two EBS volumes
user_data = "${data.template_file.user_data.rendered}"
}
resource "aws_ebs_volume" "example_1" {
availability_zone = "${data.aws_subnet.selected.availability_zone}"
type = "gp2"
size = 5
}
resource "aws_ebs_volume" "example_2" {
availability_zone = "${data.aws_subnet.selected.availability_zone}"
type = "gp2"
size = 5
}
We have automated tests that run against this code and do the following:

1. Run `apply`.
2. Update the `name` parameter and re-run `apply` to force a redeploy.
3. Run `destroy`.

All of this works until `destroy`, where we get the `aws_security_group: DependencyViolation: resource sg-XXX has a dependent object` error. It's happening fairly consistently lately, even though nothing in the code anywhere (including the Terraform code, User Data script, or test code) touches the security group in any way, so I'm quite stumped as to what could possibly be triggering this problem.
I'm on 0.10.8 and am currently experiencing this issue. One tip for figuring out what the "dependent object" is: type the name of the SG into the search box on the EC2 Network Interfaces page (https://serverfault.com/a/866203/223606). It appears that an attached ENI is preventing Terraform from deleting my SG.
@brikis98 are you able to do as @elektron9 mentioned above and determine what the dependency is? The `revoke_rules_on_delete` parameter will only help here if the dependency is due to a security group rule that has caused a dependency loop with another security group. Perhaps yours is something else?
I'll have to check next time I'm working on this code, but I'm pretty sure it's not an ENI, as there are no ENIs being created in that code.
Update: we eventually determined that this issue was being caused by a security group that had an inbound rule for another security group. After we manually removed the inbound rule, Terraform was able to proceed with the destruction of the security group that was causing this issue.
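The problematic shape was roughly the following sketch (resource names here are illustrative, not our real config): an ingress rule on one group referencing another group.

resource "aws_security_group" "app" {
  name = "app"
}

resource "aws_security_group" "client" {
  name = "client"
}

# AWS refuses to delete a group that is referenced by a rule in another
# group, so until this rule is revoked, deleting "client" fails with
# the DependencyViolation error above.
resource "aws_security_group_rule" "app_from_client" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.app.id}"
  source_security_group_id = "${aws_security_group.client.id}"
}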
We did toggle the `revoke_rules_on_delete` setting to `true`, but the Terraform deploy of that change was blocked by this issue.
@elektron9 Can you confirm that `revoke_rules_on_delete` fixes the issue of being unable to delete a security group that had an inbound rule for another security group?
@CamelCaseNotation yes, our issue was resolved after setting `revoke_rules_on_delete` to `true`.
Spent several hours on various configurations trying to work around this. The `DependencyViolation ... has a dependent object` error occurs after the 5-minute timeout in every scenario. The bottom line is that the network interface does not get assigned to the new security group if a new SG resource must be created (e.g. on an SG name change). The new security group is created as desired (if `lifecycle { create_before_destroy = true }` is set) alongside the existing SG that is assigned to the ENI, but the ENI is never reassigned to the new SG.
While Terraform is waiting, I can go into the AWS console and do "Change Security Groups" on either the network interface or the EC2 instance itself, and Terraform will immediately continue its process and remove the old SG before completing.
I also tried several iterations using `aws_network_interface_sg_attachment` without a security group block on the `aws_instance`. This deploys fine, but it relies on the default VPC security group to initially launch the EC2 instance with, which is a security issue for us and leaves the problem of removing it after deployment. Anyway, the idea was to see if a more specific dependency on the ENI would cause Terraform to make the change on AWS (reassigning the ENI to the new SG). It did not work.
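The shape of that attempt was roughly the following sketch (the resource names are illustrative, and `var.ami` is assumed):

resource "aws_instance" "example" {
  ami           = "${var.ami}"
  instance_type = "t2.micro"

  # No vpc_security_group_ids here, so the instance initially launches
  # with the default security group.
}

# Attach the SG to the instance's primary ENI as a separate resource,
# hoping the explicit ENI dependency makes Terraform reassign it.
resource "aws_network_interface_sg_attachment" "example" {
  security_group_id    = "${aws_security_group.instance.id}"
  network_interface_id = "${aws_instance.example.primary_network_interface_id}"
}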
Is there no explicit way of triggering the "Change Security Groups" functionality? It seems this should be done under the hood by Terraform whenever it recognizes that the SG for an instance or ENI will change.
OK, I finally had some time to go back and dig into this, and I think I've figured out what's happening! The code looks roughly like this:
resource "aws_instance" "example" {
# ... (other params omitted) ...
vpc_security_group_ids = ["${aws_security_group.example.id}"]
tags {
Name = "${var.name}"
}
}
resource "aws_security_group" "example" {
name = "${var.name}"
revoke_rules_on_delete = true
# ... (other params omitted) ...
}
In our test code, we are updating `var.name` and running `terraform apply`. Changing the name of a security group means deleting the old one and replacing it with a new one... But Terraform can't do that because `aws_instance.example` still depends on it! That's why we are getting the `DependencyViolation: resource sg-XXX has a dependent object` error.
I think all we really need is a `create_before_destroy = true` on `aws_security_group.example`. I'll try that and report back.
OK, it looks like adding `create_before_destroy = true` and using `name_prefix` instead of `name` fixed this issue. I can't believe it took me this long to figure it out!
resource "aws_security_group" "example" {
name_prefix = "${var.name}"
# ... (other params omitted) ...
lifecycle {
create_before_destroy = true
}
}
Sweet! I tested `create_before_destroy = true` with `name_prefix` instead of `name`, and it fixed it for me! Thank you @brikis98.
@brikis98 @ura718 You can also use `revoke_rules_on_delete = true`.
I am experiencing the issue where security groups are not deleted, with errors referencing dependent objects, because the groups are attached to lingering ENIs.
The ENIs seem to be coming from an `aws_launch_template`/`aws_autoscaling_group` combo, and since I did not experience this behaviour when I was using `aws_launch_configuration`, I suspect that `aws_launch_template` is somehow the cause.
I have tried to solve the problem via `revoke_rules_on_delete`, `lifecycle`, and `name_prefix`, but none of them has any effect since the root cause is the lingering ENIs.
As of 0.11.7 it was fixed by `lifecycle { create_before_destroy = true }`.
@martinbokmankewill I've been running into the same issue recently as well. I noticed the lingering ENIs were almost always previously attached to an ELB. Are you still running into the issue?
I am not running into the issue anymore.
I traced it to not having set `delete_on_termination = true` in the `network_interfaces` block of the `aws_launch_template` resource I was using.
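In config form, that fix looks roughly like this sketch (the `name_prefix` value and `var.ami` are illustrative, and the `instance` security group is assumed from the earlier module):

resource "aws_launch_template" "example" {
  name_prefix   = "example-"
  image_id      = "${var.ami}"
  instance_type = "t2.micro"

  network_interfaces {
    security_groups = ["${aws_security_group.instance.id}"]

    # Without this, ENIs created from the template can linger after
    # instance termination, keeping the SG "in use" and blocking deletion.
    delete_on_termination = true
  }
}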
Nothing I try works. You can try it in this repo. Just make sure you have DEBUG enabled. Does anyone know the solution for this repo?
None of the above works for me; I have to change the SG name in Terraform.
Encountered this today as well.

lifecycle {
  create_before_destroy = true
}

fixed it for me as well.