Terraform: Recreating lost resources does not work when resources are referenced by other resources using array element syntax

Created on 16 May 2017 · 14 Comments · Source: hashicorp/terraform

Terraform Version

Terraform v0.9.6-dev
Terraform v0.9.5

Affected Resource(s)

  • openstack_compute_instance_v2
  • template_file
    (probably a core issue)

Terraform Configuration Files

variable "auth_url" {}
variable "domain_name" {}
variable "tenant_name" {}
variable "region" {}
variable "node_flavor" {}
variable "worker_node_flavor" {}
variable "coreos_image" {}
variable "user_name" {}
variable "password" {}
variable "network" {}

variable "worker_count" { default = 4 }

provider "openstack" {
    auth_url = "${var.auth_url}"
    domain_name = "${var.domain_name}"
    tenant_name = "${var.tenant_name}"
    user_name = "${var.user_name}"
    password = "${var.password}"
}

resource "openstack_compute_instance_v2" "worker" {
    count = "${var.worker_count}"
    name = "worker-${count.index}"
    region = "${var.region}"
    image_id = "${var.coreos_image}"
    flavor_name = "${var.worker_node_flavor}"
    network {
        uuid = "${var.network}"
    }
}

data "template_file" "workers_ansible" {
    template = "$${name} ansible_host=$${ip}"
    count = "${var.worker_count}"
    vars {
        name  = "${openstack_compute_instance_v2.worker.*.name[count.index]}"
        ip = "${openstack_compute_instance_v2.worker.*.access_ip_v4[count.index]}"
#        name  = "${element(openstack_compute_instance_v2.worker.*.name,count.index)}"
#        ip = "${element(openstack_compute_instance_v2.worker.*.access_ip_v4,count.index)}"
    }
}

output "inventory" {
    value = "${join("\n", data.template_file.workers_ansible.*.rendered)}"
}

Debug Output

https://gist.github.com/sigmunau/0c3c698bb26ec7835f146b5e49b34c3b

Expected Behavior

terraform refresh should succeed.
terraform plan should indicate that the missing node will be recreated.

Actual Behavior

Both terraform refresh and terraform plan fail with the following error:

openstack_compute_instance_v2.worker.1: Refreshing state... (ID: 73810b14-2041-4ff3-b3bb-5166ad870e51)
openstack_compute_instance_v2.worker.3: Refreshing state... (ID: 4b326211-0892-4783-9820-435fdc8eb749)
openstack_compute_instance_v2.worker.2: Refreshing state... (ID: 596dafd4-2641-4067-b15e-eb03900d82f4)
openstack_compute_instance_v2.worker.0: Refreshing state... (ID: be4a57ac-1cf1-4b59-864c-c80a5a29e67f)
data.template_file.workers_ansible.0: Refreshing state...
data.template_file.workers_ansible.1: Refreshing state...
data.template_file.workers_ansible.2: Refreshing state...
Error refreshing state: 1 error(s) occurred:

* data.template_file.workers_ansible: 1 error(s) occurred:

* data.template_file.workers_ansible[3]: index 3 out of range for list openstack_compute_instance_v2.worker.*.access_ip_v4 (max 3) in:

${openstack_compute_instance_v2.worker.*.access_ip_v4[count.index]}

Steps to Reproduce

  1. terraform apply
  2. delete one of the instances using openstack portal
  3. run terraform refresh or terraform plan

Important Factoids

Replacing the array-index syntax with the element() function (as shown in the commented-out lines in the configuration above) gives the desired behaviour. Changing worker_count does not trigger the problem.
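
For reference, the working variant of the data source (the commented-out lines above, used in place of the index syntax) looks like this:

data "template_file" "workers_ansible" {
    template = "$${name} ansible_host=$${ip}"
    count = "${var.worker_count}"
    vars {
        # element() wraps around the list instead of failing on an out-of-range index
        name = "${element(openstack_compute_instance_v2.worker.*.name, count.index)}"
        ip   = "${element(openstack_compute_instance_v2.worker.*.access_ip_v4, count.index)}"
    }
}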

References

  • GH-3449 Issue originally mentioned in the discussion; this problem was most likely caused by the changes made for that issue
  • GH-14521 Possibly related issue

Labels: bug, config

All 14 comments

Hi @sigmunau! Sorry for the problems here and thanks for reporting this.

This seems similar to #14521, so for the moment I'm going to proceed under the assumption that they are the same root cause, though I'll definitely circle back here once I have a theory over there and see if it holds up.

Hi again @sigmunau! Sorry for the delay in getting back to you here.

There were some fixes in this area included in 0.9.6, but looking at this again with the existing fixes in mind I'm suspecting that this is something different than what we fixed already. If you're able, it'd be useful if you could retry this with the official 0.9.6 release and let me know if it's still broken and, if it is, whether there are any differences in the error messages produced. (It's possible that changes may have affected exactly how this manifests, even if they didn't fix it.)

Hi,

This appears to be affecting me on Terraform 0.9.6 official release.

* module.test_alb_target_alternative.aws_alb_target_group_attachment.scope: 2 error(s) occurred:

* module.test_alb_target_alternative.aws_alb_target_group_attachment.scope[1]: index 1 out of range for list var.target_instances (max 1) in:

${var.target_instances[count.index]}
* module.test_alb_target_alternative.aws_alb_target_group_attachment.scope[2]: index 2 out of range for list var.target_instances (max 1) in:

${var.target_instances[count.index]}

Changing to an element() lookup works fine, and it DOES select the correct items rather than repeating the first one.

Also the same code, using array syntax, worked fine in Terraform 0.9.3.
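
For illustration, the failing and working forms look roughly like this; the module's other arguments aren't shown in this issue, so the count expression and the target_group_arn variable below are assumptions:

resource "aws_alb_target_group_attachment" "scope" {
  count            = "${length(var.target_instances)}"   # assumed count expression
  target_group_arn = "${var.target_group_arn}"           # hypothetical variable

  # Array-index syntax that fails with "index N out of range":
  # target_id = "${var.target_instances[count.index]}"

  # element() lookup that works and picks the correct item for each index:
  target_id = "${element(var.target_instances, count.index)}"
}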

Thanks for the confirmation, @rlees85! I'll see if I can repro this and get it fixed.

Thanks for the reply. I see this still seems to be an issue in Terraform 0.9.11. Below is a really cut down example of how to re-create the issue:

###################################################################################################

variable "ami_id"    { default = "ami-af455dc9"    }
variable "az_name"   { default = "eu-west-1c"      }
variable "ssh_key"   { default = "yoursshkeyhere"  }
variable "subnet_id" { default = "subnet-00000000" }

###################################################################################################

variable "instance_type"                     { default = "t2.micro"  }
variable "instance_count"                    { default = 3           }
variable "instance_data_volume_name"         { default = "/dev/xvdb" }
variable "instance_data_volume_size"         { default = 1           }
variable "instance_data_volume_force_detach" { default = true        }
variable "instance_data_volume_skip_destroy" { default = false       }

###################################################################################################

resource "aws_instance" "instance" {
  count         = "${var.instance_count}"
  ami           = "${var.ami_id        }"
  key_name      = "${var.ssh_key       }"
  subnet_id     = "${var.subnet_id     }"
  instance_type = "${var.instance_type }"
}

###################################################################################################

resource "aws_ebs_volume" "volume_1" {
  count             = "${var.instance_count           }"
  size              = "${var.instance_data_volume_size}"
  availability_zone = "${var.az_name                  }"
}

###################################################################################################

resource "aws_volume_attachment" "scope" {
  count        = "${var.instance_count                                }"
  volume_id    = "${element(aws_ebs_volume.volume_1.*.id, count.index)}"
  device_name  = "${var.instance_data_volume_name                     }"
  instance_id  = "${element(aws_instance.instance.*.id, count.index)  }"
  force_detach = "${var.instance_data_volume_force_detach             }"
  skip_destroy = "${var.instance_data_volume_skip_destroy             }"
}

###################################################################################################

Steps:

  • Change the first four lines to match your AWS environment.
  • Run Terraform Apply against this script
  • Log into AWS, manually destroy ONE instance from the 3 created
  • Run Terraform Plan

Observe that it wants to re-create all 3 volume attachments (for all 3 instances). This would cause a disruption in service: if one server got terminated, every other production server would have a filesystem ripped out and re-attached.

Running a Terraform Apply does indeed rip out the attachments and re-create them for ALL instances.

Terraform 0.9.3 worked fine...

Output from plan:

+ aws_instance.instance.1
    ami:                          "ami-af455dc9"
    associate_public_ip_address:  "<computed>"
    availability_zone:            "<computed>"
    ebs_block_device.#:           "<computed>"
    ephemeral_block_device.#:     "<computed>"
    instance_state:               "<computed>"
    instance_type:                "t2.micro"
    ipv6_address_count:           "<computed>"
    ipv6_addresses.#:             "<computed>"
    key_name:                     "<<removed by me>>"
    network_interface.#:          "<computed>"
    network_interface_id:         "<computed>"
    placement_group:              "<computed>"
    primary_network_interface_id: "<computed>"
    private_dns:                  "<computed>"
    private_ip:                   "<computed>"
    public_dns:                   "<computed>"
    public_ip:                    "<computed>"
    root_block_device.#:          "<computed>"
    security_groups.#:            "<computed>"
    source_dest_check:            "true"
    subnet_id:                    "<<removed by me>>"
    tenancy:                      "<computed>"
    volume_tags.%:                "<computed>"
    vpc_security_group_ids.#:     "<computed>"

-/+ aws_volume_attachment.scope.0
    device_name:  "/dev/xvdb" => "/dev/xvdb"
    force_detach: "true" => "true"
    instance_id:  "i-08d60cd9a9ef5889f" => "${element(aws_instance.instance.*.id, count.index)  }" (forces new resource)
    skip_destroy: "false" => "false"
    volume_id:    "vol-0d0555687a55fc584" => "vol-0d0555687a55fc584"

+ aws_volume_attachment.scope.1
    device_name:  "/dev/xvdb"
    force_detach: "true"
    instance_id:  "${element(aws_instance.instance.*.id, count.index)  }"
    skip_destroy: "false"
    volume_id:    "vol-09e95c5fa89a0650a"

-/+ aws_volume_attachment.scope.2
    device_name:  "/dev/xvdb" => "/dev/xvdb"
    force_detach: "true" => "true"
    instance_id:  "i-0ded09a87a5047c23" => "${element(aws_instance.instance.*.id, count.index)  }" (forces new resource)
    skip_destroy: "false" => "false"
    volume_id:    "vol-027145aa2fb5e1555" => "vol-027145aa2fb5e1555"


Plan: 4 to add, 0 to change, 2 to destroy.

I've encountered similar behavior using Terraform 0.9.11 and openstack_compute_volume_attach_v2. Simply by tainting a node or changing the count of the compute resource, _every_ volume attachment would have to be recreated, and thus every node is impacted by the change. I think the ideal behavior would be to confine the changes to only the nodes that are "supposed" to change.
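
A minimal sketch of the kind of configuration described, assuming an openstack_compute_instance_v2.node resource that uses the same count (the resource and variable names here are hypothetical):

resource "openstack_blockstorage_volume_v2" "data" {
  count = "${var.node_count}"
  name  = "data-${count.index}"
  size  = 10
}

resource "openstack_compute_volume_attach_v2" "attach" {
  count       = "${var.node_count}"
  instance_id = "${element(openstack_compute_instance_v2.node.*.id, count.index)}"
  volume_id   = "${element(openstack_blockstorage_volume_v2.data.*.id, count.index)}"
}

Tainting one openstack_compute_instance_v2.node or changing var.node_count makes the plan show every openstack_compute_volume_attach_v2.attach as forcing a new resource, not just the one that actually changed.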

This is impacting me quite badly when the creation of a resource fails: subsequent applies hit this index-out-of-range error. It still seems to be an issue in TF 0.10.

https://github.com/hashicorp/terraform/issues/16110

Also related.

Hitting me quite hard again now too, even with resources that don't use a count. Data sources run against the current state, not the target state, EVEN if there is a direct input into the module from the prerequisite module.

i.e.

module "my_security_group" {
  source = "git::blah"
  name  = "my_sg"
..
}

module "my_security_group_rule" {
  source = "git::blah-blah"
  security_group_name = "${module.my_security_group.name}"
}

my_security_group_rule has a data source that resolves the security group ID from the name, so the two modules stay decoupled.

The first run works; if you then change the input name = "my_sg", it breaks on the second apply, because the data source in my_security_group_rule searches against the current state, not the target state.
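
As an illustration of the pattern described, the lookup inside my_security_group_rule presumably looks something like this (the AWS resource and variable names here are hypothetical):

variable "security_group_name" {}

data "aws_security_group" "selected" {
  # Resolves the group ID from the name passed in by the calling module.
  # After the input name changes, this lookup still runs against the current
  # state rather than the target state, which is where the second apply breaks.
  filter {
    name   = "group-name"
    values = ["${var.security_group_name}"]
  }
}

resource "aws_security_group_rule" "ingress" {
  security_group_id = "${data.aws_security_group.selected.id}"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_blocks       = ["0.0.0.0/0"]
}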

Are there any workarounds to this? I am on Terraform v0.11.1. I do not use modules at all.

Help! I can't figure this out. I removed the .terraform folder by accident and started receiving the same error. I've never had this issue before and have not modified any code related to the VPC, and now I can't update simple stuff in my production account. Any help would be greatly appreciated. Thanks.

To give more detail: I am outputting this from the VPC module

value = "${module.vpc_platform.private_route_table_ids}"

and ingesting it like this:

route_table_id = "${module.platform_virginia_v1.private_route_table_ids[0]}"

Using element() fixed my issue.
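
Concretely, the fix was to swap the index syntax for an element() lookup, i.e. replacing

route_table_id = "${module.platform_virginia_v1.private_route_table_ids[0]}"

with

route_table_id = "${element(module.platform_virginia_v1.private_route_table_ids, 0)}"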

Hello! :robot:

This issue relates to an older version of Terraform that is no longer in active development, and because the area of Terraform it relates to has changed significantly since the issue was opened we suspect that the issue is either fixed or that the circumstances around it have changed enough that we'd need an updated issue report in order to reproduce and address it.

If you're still seeing this or a similar issue in the latest version of Terraform, please do feel free to open a new bug report! Please be sure to include all of the information requested in the template, even if it might seem redundant with the information already shared in _this_ issue, because the internal details relating to this problem are likely to be different in the current version of Terraform.

Thanks!

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
