Terraform: Terraform apply not idempotent for security groups

Created on 26 Apr 2017 · 19 comments · Source: hashicorp/terraform

Terraform Version

0.9.3

Affected Resource(s)

aws_security_group

Terraform Configuration Files

resource "aws_security_group" "cassandra" {
  name        = "prod"
  description = "Security group cassandra"
  vpc_id      = "${aws_vpc.main.id}"

  // allows traffic from the SG itself for tcp
  ingress {
    from_port = 0
    to_port   = 65535
    protocol  = "tcp"
    self      = true
  }

  // allows traffic from the SG itself for udp
  ingress {
    from_port = 0
    to_port   = 65535
    protocol  = "udp"
    self      = true
  }

  // allow traffic for TCP 9042 (Cassandra clients)
  ingress {
    from_port   = 9042
    to_port     = 9042
    protocol    = "tcp"
    cidr_blocks = ["${data.aws_subnet.public.*.cidr_block}"]
  }

  // allow traffic for TCP 9160 (Cassandra Thrift clients)
  ingress {
    from_port   = 9160
    to_port     = 9160
    protocol    = "tcp"
    cidr_blocks = ["${data.aws_subnet.public.*.cidr_block}"]
  }

  // allow traffic for TCP 7199 (JMX)
  ingress {
    from_port   = 7199
    to_port     = 7199
    protocol    = "tcp"
    cidr_blocks = ["${data.aws_subnet.public.*.cidr_block}"]
  }

  depends_on = ["data.aws_subnet.public"]

  tags {
    Name        = "prod-sg-cassandra"
    Environment = "prod"
    Type        = "cassandra"
  }
}

Debug Output

https://gist.github.com/SanchitBansal/2683c645360b8ee31978cfa75e4d7abe

Panic Output

https://gist.github.com/SanchitBansal/3c034d8380ed4e0f6f7d089cf3164979

Expected Behavior

On the first "terraform apply", it launched the complete infrastructure, and I expected the second "terraform apply" to just refresh the state. In other words, Terraform should run cleanly across repeated "terraform apply" runs.

Actual Behavior

The first run succeeded, but the second failed with an error saying the security group diffs did not match.

Steps to Reproduce

terraform apply
terraform apply

bug core

Source

SanchitBansal

All 19 comments

Hi @SanchitBansal,

Sorry you're having a problem here, but I'm not able to reproduce this issue with the config you've provided with Terraform 0.9.3 or the latest build.

It may be related to the definition of data.aws_subnet.public. Can you provide a more complete configuration that reproduces this?

Also, though I don't think it affects the issue, you don't need to add depends_on = ["data.aws_subnet.public"] when you are already referencing data.aws_subnet.public in the resource. You should rarely need depends_on at all, and adding it to a data source can affect how that data source works.
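As an illustration, here is a trimmed sketch of the cassandra group with the explicit depends_on removed, assuming data.aws_subnet.public and aws_vpc.main are declared elsewhere exactly as in the original configuration. Only one ingress rule is kept; the interpolation in cidr_blocks is by itself enough for Terraform to read the data source before creating the group.

```hcl
resource "aws_security_group" "cassandra" {
  name        = "prod"
  description = "Security group cassandra"
  vpc_id      = "${aws_vpc.main.id}"

  // Referencing data.aws_subnet.public here creates an implicit
  // dependency, so no depends_on is needed on this resource.
  ingress {
    from_port   = 9042
    to_port     = 9042
    protocol    = "tcp"
    cidr_blocks = ["${data.aws_subnet.public.*.cidr_block}"]
  }

  tags {
    Name = "prod-sg-cassandra"
  }
}
```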

jbardin on 30 Apr 2017

@jbardin: I am also facing this issue with:
Terraform version: Terraform v0.9.4
Resource: aws_security_group_rule

shamimgeek on 30 Apr 2017

@jbardin Sharing the required configuration below:

data "aws_subnet" "elb" {
  vpc_id = "${var.vpc_id}"
  filter {
    name = "tag:role"
    values = ["elb"]
  }
  filter {
    name = "tag:az"
    values = ["ap-south-1a"]
  }
  count = "${length(var.availability_zones)}"
  depends_on = ["aws_subnet.public"]
}

resource "aws_subnet" "public" {
  vpc_id            = "${var.vpc_id}"
  cidr_block        = "192.168.0.1/28"
  availability_zone = "ap-south-1a"

  tags {
    Name = "dev-elb-public-1a-1"
    role = "elb"
    az   = "ap-south-1a"
  }
}

SanchitBansal on 1 May 2017

Thanks @SanchitBansal, I was able to reproduce the error with the help of the added config.

What's causing the error is actually the depends_on value in the data.aws_subnet.public data source. Adding depends_on to a data source prevents it from being loaded early, because Terraform has no way to know why you've added depends_on, so it has to wait until apply. If the data source really does depend on the resource (though I'm not sure why you have data sources for resources that already exist in your config), you could reference an attribute via interpolation, like:

data "aws_subnet" "elb" {
  vpc_id = "${var.vpc_id}"
  filter {
    name = "tag:role"
    values = ["elb"]
  }
  filter {
    name = "tag:az"
    values = ["${aws_subnet.public.availability_zone}"]
  }
}

Your cassandra config above also does not need the depends_on block, since you're already referencing the same data source in the ingress rules.

This is still a bug in terraform, as terraform apply should complete without error, but the fact that the plan can't resolve the data source because of depends_on is expected.

jbardin on 1 May 2017

I am using Terraform v0.9.4 with the configuration below, and I see an idempotency issue with the resource:

provider "aws" {
  access_key = ""
  secret_key = ""
  insecure  = true
  skip_credentials_validation = true
  skip_region_validation = true
  region = "eucalyptus"
  endpoints {
    ec2 = "xxxxxxxxxxxxxxxxxxxxxxx"
    iam = "xxxxxxxxxxxxxxxxxxxxxxx"
    elb = "xxxxxxxxxxxxxxxxxxxx"
  }
}

resource "aws_security_group" "mesos-masters-sakhtar2" {
  name        = "mesos-masters-sakhtar2"
  description = "Security Group for mesos masters of PaaS sakhtar2"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "mesos-slaves-sakhtar2" {
  name        = "mesos-slaves-sakhtar2"
  description = "Security Group for mesos slaves of PaaS sakhtar2"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group_rule" "allow53tcp" {
    type = "ingress"
    from_port = 53
    to_port = 53
    protocol = "tcp"
    security_group_id = "${aws_security_group.mesos-masters-sakhtar2.id}"
    source_security_group_id = "${aws_security_group.mesos-slaves-sakhtar2.name}"

}

resource "aws_security_group_rule" "allow53udp" {
    type = "ingress"
    from_port = 53
    to_port = 53
    protocol = "udp"
    security_group_id = "${aws_security_group.mesos-masters-sakhtar2.id}"
    source_security_group_id = "${aws_security_group.mesos-slaves-sakhtar2.name}"

}

command output:
https://gist.github.com/shamimgeek/2b11da238795f195f7568ab0a8780775

shamimgeek on 1 May 2017

Hi @shamimgeek,

This is a different issue from the original attribute mismatch error.
Can you file a new issue with the example provided? I thought there was an open issue already, but I don't see it offhand.

jbardin on 1 May 2017

@jbardin: Sure, I have opened a new issue:

https://github.com/hashicorp/terraform/issues/14124

shamimgeek on 1 May 2017

@jbardin I tried removing the depends_on block and it is working fine for now. In a few cases Terraform was not picking up the references by itself, so I started defining dependencies in all configurations :)
Thanks a lot for your help. I will let you know if the same error comes back even without depends_on.

SanchitBansal on 2 May 2017


@SanchitBansal,

Glad it works! I'm actually going to keep this open because it led me to a reproduction case with a "diffs didn't match" error.

jbardin on 2 May 2017

@jbardin I'm having a related issue, and it seems to currently be by design. Every time I run terraform apply, my VPC security groups force a new resource. Here's the config:

resource "aws_security_group" "master" {

  ingress {
    from_port   = "80"
    to_port     = "80"
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = "443"
    to_port     = "443"
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = "0"
    to_port     = "65535"
    protocol    = "tcp"
    cidr_blocks = ["${data.terraform_remote_state.networking.vpc_cidr_block}"]
  }

  ingress {
    from_port   = "22"
    to_port     = "22"
    protocol    = "tcp"
    cidr_blocks = ["${data.terraform_remote_state.axis.public_ip}/32"]
  }

  egress {
    from_port   = "0"
    to_port     = "0"
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  vpc_id = "${data.terraform_remote_state.networking.vpc_id}"

  tags {
    Name        = "${data.terraform_remote_state.networking.environment}-master"
    Description = "ports required for the DF Master instance"
    Environment = "${data.terraform_remote_state.networking.environment}" 
  }
}

resource "aws_security_group" "slave" {

  ingress {
    from_port   = "0"
    to_port     = "65535"
    protocol    = "tcp"
    cidr_blocks = ["${data.terraform_remote_state.networking.vpc_cidr_block}"]
  }

  ingress {
    from_port   = "22"
    to_port     = "22"
    protocol    = "tcp"
    cidr_blocks = ["${data.terraform_remote_state.axis.public_ip}/32"]
  }

  egress {
    from_port   = "0"
    to_port     = "0"
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  vpc_id = "${data.terraform_remote_state.networking.vpc_id}"

  tags {
    Name        = "${data.terraform_remote_state.networking.environment}-slave"
    Description = "ports required for the DF Master instance" 
    Environment = "${data.terraform_remote_state.networking.environment}"
  }
}

Based on the docs it looks like name, description, and vpc_id all force a new resource for security groups, which leads to my instances being terminated. I removed the name and description directives, but obviously get an error when removing the vpc_id directive.

Can you offer any insight into what the correct way to configure idempotent security groups would be?

akio-outori on 22 Jul 2017

I believe I am seeing this in Terraform v0.10.7. I have two aws_security_group_rule resources attached to an aws_security_group, each providing a description. I can create everything from scratch just fine, but if I run plan a second time, Terraform wants to update/swap the description fields for some reason.

resource "aws_security_group" "ec2_access" {
    name_prefix = "ec2-"
    description = "Controls access to the EC2 instances"
    vpc_id      = "${var.vpc_id}"
    tags {
        Name        = "EC2 Access"
        Project     = "${var.project}"
        Purpose     = "Controls access to the EC2 instances"
        Creator     = "${var.creator}"
        Environment = "${var.environment}"
        Freetext    = "${var.freetext}"
    }
    lifecycle {
        create_before_destroy = true
    }
}

resource "aws_security_group_rule" "ec2_ingress_bastion" {
    type                     = "ingress"
    from_port                = 0
    protocol                 = "all"
    security_group_id        = "${aws_security_group.ec2_access.id}"
    source_security_group_id = "${aws_security_group.bastion_access.id}"
    to_port                  = 65535
    description              = "Only allow traffic from the Bastion boxes"
    lifecycle {
        create_before_destroy = true
    }
}

resource "aws_security_group_rule" "ec2_ingress_alb" {
    type                     = "ingress"
    from_port                = 0
    protocol                 = "all"
    security_group_id        = "${aws_security_group.ec2_access.id}"
    source_security_group_id = "${aws_security_group.alb_access.id}"
    to_port                  = 65535
    description              = "Only allow traffic from the load balancers"
    lifecycle {
        create_before_destroy = true
    }
}

Plan wants to make this change:

terraform show debug/proposed-changes.plan
  ~ module.security-group.aws_security_group_rule.ec2_ingress_alb
      description: "Only allow traffic from the Bastion boxes" => "Only allow traffic from the load balancers"

If I start fresh, commenting out the description attributes, I can run plan as many times as I want and Terraform rightfully thinks that no changes have to be applied.
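The workaround described here can be sketched as follows, reusing the ec2_ingress_alb rule from above with only its description commented out. This is a stopgap under the assumption that the spurious diff comes from the description field alone; the trade-off is that the rule carries no description in AWS.

```hcl
resource "aws_security_group_rule" "ec2_ingress_alb" {
    type                     = "ingress"
    from_port                = 0
    to_port                  = 65535
    protocol                 = "all"
    security_group_id        = "${aws_security_group.ec2_access.id}"
    source_security_group_id = "${aws_security_group.alb_access.id}"
    # description left out: with it set, repeated plans reported a
    # spurious description change on this rule
    # description            = "Only allow traffic from the load balancers"
    lifecycle {
        create_before_destroy = true
    }
}
```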

kurron on 20 Oct 2017


@kurron,

That's an interesting error too, which may be a provider issue, but I'll leave this here for now until we can investigate further.

Extra notes: not only does the diff somehow get the wrong description field, but running apply again fails with an error that from-port isn't allowed, and destroy fails on the first attempt with "rule does not exist".

jbardin on 20 Oct 2017

Terraform v0.10.8

I can confirm behavior similar to what @kurron described.

After my first plan and apply, with no changes to my TF files or the state of the resources in AWS:

Terraform plan wants to rename some of my aws_security_group_rule descriptions
Terraform plan wants to rename these descriptions incorrectly
Terraform plan always uses the same source description for the destination description
The apply succeeds
After the apply, no visible description changes are made

Snippet from plan:

  ~ module.table.aws_security_group_rule.ec2_admin_rdp_theirco_cidr01
      description:                       "MYCo: OFFICE1 IP block" => "TheirCo: 100.xxx.xxx.xxx"

  ~ module.table.aws_security_group_rule.ec2_admin_rdp_theirco_cidr02
      description:                       "MYCo: OFFICE1 IP block" => "TheirCo: 111.xxx.xxx.xxx"

  ~ module.table.aws_security_group_rule.ec2_admin_rdp_theirco_cidr03
      description:                       "MyCo: OFFICE1 IP block" => "TheirCo: 222.xxx.xxx.xxx"

talbright on 2 Nov 2017

Terraform v0.10.8

I can also confirm this behavior. I have several aws_security_group_rule resources, and Terraform wants to update the description fields for all but one of the resources on each plan/apply.

It seems to be in the logic that writes the .tfstate on apply. While the description field on each inbound rule is correctly applied in AWS, each resource gets the same description value written to the .tfstate, so on the next plan/apply Terraform needs to change them. Terraform then incorrectly writes the same description to all the resources in the .tfstate again.

battenworks on 6 Nov 2017


There is a PR that addresses the state problem in the AWS provider.

pf-curtis-mitchell on 9 Nov 2017

The error I reopened this for has since been fixed, so closing it back out once and for all.

jbardin on 16 Nov 2017

Hello all,

Sorry to reopen this case, but I think I can help a bit more with reproducing this error.

I have the same problem here. It seems related to ingress descriptions: it happened when I tried to append multiple ingress rules with the same description.

For some reason, when I retried terraform apply, the tfstate didn't read the previous change correctly.

Hope it helps,

rickkbarbosa on 22 Feb 2018

This happens to me when a resource with the same name already exists but was not created by Terraform. Instead of proceeding, Terraform fails. I have an "aws_security_group" that already exists, but if it doesn't, it needs to be created. Is this the correct behavior for Terraform? Or can I flag it to "skip if exists" somehow?

Dmitry1987 on 9 Aug 2018

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.