Terraform: Security group --> vpc_security_group_ids change bug

Created on 29 Apr 2016  Â·  16Comments  Â·  Source: hashicorp/terraform

Terraform Version

0.6.15 ( Works in 0.6.14 )

Affected Resource(s)

  • aws_instance

    Terraform Configuration Files

    security_groups
    vpc_security_group_ids

    Info

In Terraform 0.6.14 security_groups was valid from custom VPC, in 0.6.15 it takes it as a value and works but then upon a refresh it tries to re-create the resource.

This means from upgrading to 0.6.14 to 0.6.15 the it re-creates these resources each time:

security_groups.#:          "0" => "2" (forces new resource)
security_groups.3326659650: "" => "sg-6b56000f" (forces new resource)
security_groups.3522483665: "" => "sg-74560010" (forces new resource)

Even though the resource has these 2 security groups attached and working. This either needs documenting or some more verbose error to say that this security group is not valid.

bug provideaws

Most helpful comment

Hey friends – we identified the regression that's causing this to start happening, it was introduced in v0.6.15. I have a pull request open here https://github.com/hashicorp/terraform/pull/6664 that will revert the regression.

That said, after discussion we'll likely reestablish this behavior with an upcoming release (probably v0.7.0). The difference between security_groups and vpc_security_group_ids is subtle on the surface, but has bigger implications. In EC2 Classic, security groups were referred to by name, and changing the security groups that an instance belonged to required the instance to be stopped first. Because Terraform does not support stopping and restarting an Instance, it was simply destroyed and recreated.

EC2-VPC allows the security groups to be modified on a running Instance, without stopping/destroying it. A while back, a new config attribute was added, vpc_security_group_ids, to allow users to add/remove security groups to an instance without triggering the need to destroy and recreate the instance, while still providing backwards compatibly for users on EC2 Classic.

The attributes are defined as follows:

• `security_groups` - (Optional) A list of security group names to associate with. If you are within a non-default VPC, you'll need to use vpc_security_group_ids instead.
• `vpc_security_group_ids` - (Optional) A list of security group IDs to associate with.

What was happening here is Terraform allowing users to provide Security Group IDs in the security_groups attribute. When we READ the resource, we determined if we were in a VPC or not, and only saved to state the groups for either security_groups or vpc_security_group_ids. Because these attributes are “computed” (meaning TF is allowed to read them from the API and not complain about them not being in the configuration), Terraform simply put what was in the configuration into state as-is. What would
happen is, you would use security_groups, but we would save them as vpc_security_group_ids; you would have both populated in the state, and Terraform wouldn’t complain because they are both “computed”.

With the change here, security_groups or vpc_security_group_ids are being “zeroed out” in the state based on if we’re in a VPC or not. This is where the change appears, because users were being allowed to use security_groups, thus Terraform displays the diff because only vpc_security_group_ids were getting written to state. To make matters worse, Terraform wanted to remove the security_groups, which forces destruction of the resource.

Ultimately we believe the behavior is correct now; vpc_security_group_ids should be used for Security Groups and Instances in a VPC, and security_groups should only be used in EC2-Classic. That said, the hard-break was unintentional and unannounced, and we’re going to revert to the old behavior for now.

In the near future we’ll reestablish this requirement, and possibly even deprecate security_groups in favor of something more explicit, like security_group_names, but when we do we’ll be upfront about it in the CHANGELOG and ideally add runtime validation for it as well.

I apologize for the trouble here! Thank you for your patience as we sorted it out. #6664 should be included in an upcoming maintenance release and restore the expected behavior.

Please let me know if you have any questions.
Thanks!
Clint

All 16 comments

Have the same issue, forced to revert back to 0.6.14, not fixed in current master either.

Sorry for the trouble, folks!

The referenced issue #6369 has a workaround for this, which appears to allow you to get around the problem by adjusting the config to match what Terraform is reading back from AWS.

It sounded like @catsby wanted to use this ticket to investigate this and either implement a fix or document the workaround, but hopefully that workaround is useful in the mean time.

thanks @apparentlymart , that worked.
I think it is safe to close this one :+1:

In #6585 @radeksimko is retroactively adding a note to the changelog about this.

@catsby, did you want to add something to the main aws_instance docs about this, or perhaps change the provider implementation to give better feedback for this case?

(A potential fix for this was merged in #5193 but then reverted because it broke for EC2-Classic users.)

I have closed several other reports of this same issue to consolidate discussion here. Let's not close this one until we're sure we've got a handle on the plan for this problem.

Hey friends – we identified the regression that's causing this to start happening, it was introduced in v0.6.15. I have a pull request open here https://github.com/hashicorp/terraform/pull/6664 that will revert the regression.

That said, after discussion we'll likely reestablish this behavior with an upcoming release (probably v0.7.0). The difference between security_groups and vpc_security_group_ids is subtle on the surface, but has bigger implications. In EC2 Classic, security groups were referred to by name, and changing the security groups that an instance belonged to required the instance to be stopped first. Because Terraform does not support stopping and restarting an Instance, it was simply destroyed and recreated.

EC2-VPC allows the security groups to be modified on a running Instance, without stopping/destroying it. A while back, a new config attribute was added, vpc_security_group_ids, to allow users to add/remove security groups to an instance without triggering the need to destroy and recreate the instance, while still providing backwards compatibly for users on EC2 Classic.

The attributes are defined as follows:

• `security_groups` - (Optional) A list of security group names to associate with. If you are within a non-default VPC, you'll need to use vpc_security_group_ids instead.
• `vpc_security_group_ids` - (Optional) A list of security group IDs to associate with.

What was happening here is Terraform allowing users to provide Security Group IDs in the security_groups attribute. When we READ the resource, we determined if we were in a VPC or not, and only saved to state the groups for either security_groups or vpc_security_group_ids. Because these attributes are “computed” (meaning TF is allowed to read them from the API and not complain about them not being in the configuration), Terraform simply put what was in the configuration into state as-is. What would
happen is, you would use security_groups, but we would save them as vpc_security_group_ids; you would have both populated in the state, and Terraform wouldn’t complain because they are both “computed”.

With the change here, security_groups or vpc_security_group_ids are being “zeroed out” in the state based on if we’re in a VPC or not. This is where the change appears, because users were being allowed to use security_groups, thus Terraform displays the diff because only vpc_security_group_ids were getting written to state. To make matters worse, Terraform wanted to remove the security_groups, which forces destruction of the resource.

Ultimately we believe the behavior is correct now; vpc_security_group_ids should be used for Security Groups and Instances in a VPC, and security_groups should only be used in EC2-Classic. That said, the hard-break was unintentional and unannounced, and we’re going to revert to the old behavior for now.

In the near future we’ll reestablish this requirement, and possibly even deprecate security_groups in favor of something more explicit, like security_group_names, but when we do we’ll be upfront about it in the CHANGELOG and ideally add runtime validation for it as well.

I apologize for the trouble here! Thank you for your patience as we sorted it out. #6664 should be included in an upcoming maintenance release and restore the expected behavior.

Please let me know if you have any questions.
Thanks!
Clint

The issue is still present with v.0.7.0-rc1

still present in 0.10.8!

It happened with me when I used variables on cdir_blocks.

it's happening to me with security_groups

$ terraform --version
Terraform v0.11.7
+ provider.aws v1.15.0

Solution:

Similar rules (to_port, from_port, protocol) should be clubbed within a single ingress block

@idlecool are you having issues? Can you share your configuration that causes it?

@catsby

Version that didn't work:

resource "aws_security_group" "efs" {
  name        = "efs"
  vpc_id      = "${aws_vpc.main.id}"
  description = "EFS mount target security group"

  ingress {
    protocol  = -1
    from_port = 0
    to_port   = 0
    security_groups = [
      "${aws_security_group.client_1.id}",
    ]
    description = "${aws_security_group.client_1.name}"
  }

  ingress {
    protocol  = -1
    from_port = 0
    to_port   = 0
    security_groups = [
      "${aws_security_group.client_2.id}"
    ]
    description = "${aws_security_group.client_2.name}"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = -1
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Version that did work:

resource "aws_security_group" "efs" {
  name        = "efs"
  vpc_id      = "${aws_vpc.main.id}"
  description = "EFS mount target security group"

  ingress {
    protocol  = -1
    from_port = 0
    to_port   = 0
    security_groups = [
      "${aws_security_group.client_1.id}",
      "${aws_security_group.client_2.id}"
    ]
    description = "${aws_security_group.client_1.name}, ${aws_security_group.client_2.name}"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = -1
    cidr_blocks = ["0.0.0.0/0"]
  }
}

@catsby it happened to me before when I was trying to add multiple subnets into a single security group ingress rule using count

I had to use list instead.

Something that always works:

resource "aws_security_group" "sg1" {
    ingress {
        security_groups = [sg[0], sg[2], sg[3]]
    }
}

Something that doesn't always work:

resource "aws_security_group" "sg1" {
    ingress {
        count = "${length(sg)}"
        security_groups = ["${element(sg, count.index)}"]
    }
}

I think this is happening because AWS doesn't assign a ID to the ingress/egress rules. Like it does for other resources. And terraform is having a hard time identifying the rules it has previously written.

I am able to reproduce on last version please look at: https://github.com/hashicorp/terraform/issues/13388

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

darron picture darron  Â·  3Comments

ronnix picture ronnix  Â·  3Comments

rjinski picture rjinski  Â·  3Comments

zeninfinity picture zeninfinity  Â·  3Comments

larstobi picture larstobi  Â·  3Comments