Terraform: Recreating lost resources does not work when resources are referenced by other resources using array element syntax

Created on 16 May 2017 · 14 Comments · Source: hashicorp/terraform

Terraform Version

Terraform v0.9.6-dev
Terraform v0.9.5

Affected Resource(s)

  • openstack_compute_instance_v2
  • template_file
    (probably a core issue)

Terraform Configuration Files

variable "auth_url" {}
variable "domain_name" {}
variable "tenant_name" {}
variable "region" {}
variable "node_flavor" {}
variable "worker_node_flavor" {}
variable "coreos_image" {}
variable "user_name" {}
variable "password" {}
variable "network" {}

variable "worker_count" { default = 4 }

provider "openstack" {
    auth_url = "${var.auth_url}"
    domain_name = "${var.domain_name}"
    tenant_name = "${var.tenant_name}"
    user_name = "${var.user_name}"
    password = "${var.password}"
}

resource "openstack_compute_instance_v2" "worker" {
    count = "${var.worker_count}"
    name = "worker-${count.index}"
    region = "${var.region}"
    image_id = "${var.coreos_image}"
    flavor_name = "${var.worker_node_flavor}"
    network {
        uuid = "${var.network}"
    }
}

data "template_file" "workers_ansible" {
    template = "$${name} ansible_host=$${ip}"
    count = "${var.worker_count}"
    vars {
        name  = "${openstack_compute_instance_v2.worker.*.name[count.index]}"
        ip = "${openstack_compute_instance_v2.worker.*.access_ip_v4[count.index]}"
#        name  = "${element(openstack_compute_instance_v2.worker.*.name,count.index)}"
#        ip = "${element(openstack_compute_instance_v2.worker.*.access_ip_v4,count.index)}"
    }
}

output "inventory" {
    value = "${join("\n", data.template_file.workers_ansible.*.rendered)}"
}

Debug Output

https://gist.github.com/sigmunau/0c3c698bb26ec7835f146b5e49b34c3b

Expected Behavior

terraform refresh should succeed.
terraform plan should indicate that the missing node will be recreated.

Actual Behavior

Both terraform refresh and terraform plan fail with the following error:

openstack_compute_instance_v2.worker.1: Refreshing state... (ID: 73810b14-2041-4ff3-b3bb-5166ad870e51)
openstack_compute_instance_v2.worker.3: Refreshing state... (ID: 4b326211-0892-4783-9820-435fdc8eb749)
openstack_compute_instance_v2.worker.2: Refreshing state... (ID: 596dafd4-2641-4067-b15e-eb03900d82f4)
openstack_compute_instance_v2.worker.0: Refreshing state... (ID: be4a57ac-1cf1-4b59-864c-c80a5a29e67f)
data.template_file.workers_ansible.0: Refreshing state...
data.template_file.workers_ansible.1: Refreshing state...
data.template_file.workers_ansible.2: Refreshing state...
Error refreshing state: 1 error(s) occurred:

* data.template_file.workers_ansible: 1 error(s) occurred:

* data.template_file.workers_ansible[3]: index 3 out of range for list openstack_compute_instance_v2.worker.*.access_ip_v4 (max 3) in:

${openstack_compute_instance_v2.worker.*.access_ip_v4[count.index]}

Steps to Reproduce

  1. terraform apply
  2. delete one of the instances using openstack portal
  3. run terraform refresh or terraform plan

Important Factoids

Replacing the array-index syntax with the element() function (as shown in the commented-out lines in the configuration above) gives the desired behaviour. Changing worker_count does not trigger the problem.
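
For reference, the working variant of the data source (the commented-out lines above, used in place of the index syntax) looks like this:

data "template_file" "workers_ansible" {
    template = "$${name} ansible_host=$${ip}"
    count = "${var.worker_count}"
    vars {
        # element() wraps around the list instead of failing on an out-of-range index
        name = "${element(openstack_compute_instance_v2.worker.*.name, count.index)}"
        ip   = "${element(openstack_compute_instance_v2.worker.*.access_ip_v4, count.index)}"
    }
}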

References

  • GH-3449 Issue originally mentioned in the discussion; this problem was most likely caused by the changes made for that issue
  • GH-14521 Possibly related issue

Labels: bug, config

All 14 comments

Hi @sigmunau! Sorry for the problems here and thanks for reporting this.

This seems similar to #14521, so for the moment I'm going to proceed under the assumption that they are the same root cause, though I'll definitely circle back here once I have a theory over there and see if it holds up.

Hi again @sigmunau! Sorry for the delay in getting back to you here.

There were some fixes in this area included in 0.9.6, but looking at this again with the existing fixes in mind I'm suspecting that this is something different than what we fixed already. If you're able, it'd be useful if you could retry this with the official 0.9.6 release and let me know if it's still broken and, if it is, whether there are any differences in the error messages produced. (It's possible that changes may have affected exactly how this manifests, even if they didn't fix it.)

Hi,

This appears to be affecting me on Terraform 0.9.6 official release.

* module.test_alb_target_alternative.aws_alb_target_group_attachment.scope: 2 error(s) occurred:

* module.test_alb_target_alternative.aws_alb_target_group_attachment.scope[1]: index 1 out of range for list var.target_instances (max 1) in:

${var.target_instances[count.index]}
* module.test_alb_target_alternative.aws_alb_target_group_attachment.scope[2]: index 2 out of range for list var.target_instances (max 1) in:

${var.target_instances[count.index]}

Changing to an element() lookup works fine, and it DOES select the correct items rather than repeating the first one.

Also the same code, using array syntax, worked fine in Terraform 0.9.3.
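
For illustration, the failing and working forms look roughly like this; the module's other arguments aren't shown in this issue, so the count expression and the target_group_arn variable below are assumptions:

resource "aws_alb_target_group_attachment" "scope" {
  count            = "${length(var.target_instances)}"   # assumed count expression
  target_group_arn = "${var.target_group_arn}"           # hypothetical variable

  # Array-index syntax that fails with "index N out of range":
  # target_id = "${var.target_instances[count.index]}"

  # element() lookup that works and picks the correct item for each index:
  target_id = "${element(var.target_instances, count.index)}"
}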

Thanks for the confirmation, @rlees85! I'll see if I can repro this and get it fixed.

Thanks for the reply. I see this still seems to be an issue in Terraform 0.9.11. Below is a really cut down example of how to re-create the issue:

###################################################################################################

variable "ami_id"    { default = "ami-af455dc9"    }
variable "az_name"   { default = "eu-west-1c"      }
variable "ssh_key"   { default = "yoursshkeyhere"  }
variable "subnet_id" { default = "subnet-00000000" }

###################################################################################################

variable "instance_type"                     { default = "t2.micro"  }
variable "instance_count"                    { default = 3           }
variable "instance_data_volume_name"         { default = "/dev/xvdb" }
variable "instance_data_volume_size"         { default = 1           }
variable "instance_data_volume_force_detach" { default = true        }
variable "instance_data_volume_skip_destroy" { default = false       }

###################################################################################################

resource "aws_instance" "instance" {
  count         = "${var.instance_count}"
  ami           = "${var.ami_id        }"
  key_name      = "${var.ssh_key       }"
  subnet_id     = "${var.subnet_id     }"
  instance_type = "${var.instance_type }"
}

###################################################################################################

resource "aws_ebs_volume" "volume_1" {
  count             = "${var.instance_count           }"
  size              = "${var.instance_data_volume_size}"
  availability_zone = "${var.az_name                  }"
}

###################################################################################################

resource "aws_volume_attachment" "scope" {
  count        = "${var.instance_count                                }"
  volume_id    = "${element(aws_ebs_volume.volume_1.*.id, count.index)}"
  device_name  = "${var.instance_data_volume_name                     }"
  instance_id  = "${element(aws_instance.instance.*.id, count.index)  }"
  force_detach = "${var.instance_data_volume_force_detach             }"
  skip_destroy = "${var.instance_data_volume_skip_destroy             }"
}

###################################################################################################

Steps:

  • Change the first four lines to match your AWS environment.
  • Run Terraform Apply against this script
  • Log into AWS, manually destroy ONE instance from the 3 created
  • Run Terraform Plan

Observe that it wants to re-create all 3 volume attachments (for all 3 instances). This would cause a disruption in service: if one server got terminated, every other production server would have a filesystem ripped out and re-attached.

Running a Terraform Apply does indeed rip out the attachments and re-create them for ALL instances.

Terraform 0.9.3 worked fine...

Output from plan:

+ aws_instance.instance.1
    ami:                          "ami-af455dc9"
    associate_public_ip_address:  "<computed>"
    availability_zone:            "<computed>"
    ebs_block_device.#:           "<computed>"
    ephemeral_block_device.#:     "<computed>"
    instance_state:               "<computed>"
    instance_type:                "t2.micro"
    ipv6_address_count:           "<computed>"
    ipv6_addresses.#:             "<computed>"
    key_name:                     "<<removed by me>>"
    network_interface.#:          "<computed>"
    network_interface_id:         "<computed>"
    placement_group:              "<computed>"
    primary_network_interface_id: "<computed>"
    private_dns:                  "<computed>"
    private_ip:                   "<computed>"
    public_dns:                   "<computed>"
    public_ip:                    "<computed>"
    root_block_device.#:          "<computed>"
    security_groups.#:            "<computed>"
    source_dest_check:            "true"
    subnet_id:                    "<<removed by me>>"
    tenancy:                      "<computed>"
    volume_tags.%:                "<computed>"
    vpc_security_group_ids.#:     "<computed>"

-/+ aws_volume_attachment.scope.0
    device_name:  "/dev/xvdb" => "/dev/xvdb"
    force_detach: "true" => "true"
    instance_id:  "i-08d60cd9a9ef5889f" => "${element(aws_instance.instance.*.id, count.index)  }" (forces new resource)
    skip_destroy: "false" => "false"
    volume_id:    "vol-0d0555687a55fc584" => "vol-0d0555687a55fc584"

+ aws_volume_attachment.scope.1
    device_name:  "/dev/xvdb"
    force_detach: "true"
    instance_id:  "${element(aws_instance.instance.*.id, count.index)  }"
    skip_destroy: "false"
    volume_id:    "vol-09e95c5fa89a0650a"

-/+ aws_volume_attachment.scope.2
    device_name:  "/dev/xvdb" => "/dev/xvdb"
    force_detach: "true" => "true"
    instance_id:  "i-0ded09a87a5047c23" => "${element(aws_instance.instance.*.id, count.index)  }" (forces new resource)
    skip_destroy: "false" => "false"
    volume_id:    "vol-027145aa2fb5e1555" => "vol-027145aa2fb5e1555"


Plan: 4 to add, 0 to change, 2 to destroy.

I've encountered similar behavior using Terraform 0.9.11 and openstack_compute_volume_attach_v2. Simply by tainting a node or changing the count of the compute resource, _every_ volume attachment would have to be recreated, and thus every node is impacted by the change. I think the ideal behavior would be to confine the changes to only the nodes that are "supposed" to change.
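
A minimal sketch of the kind of configuration described, assuming an openstack_compute_instance_v2.node resource that uses the same count (the resource and variable names here are hypothetical):

resource "openstack_blockstorage_volume_v2" "data" {
  count = "${var.node_count}"
  name  = "data-${count.index}"
  size  = 10
}

resource "openstack_compute_volume_attach_v2" "attach" {
  count       = "${var.node_count}"
  instance_id = "${element(openstack_compute_instance_v2.node.*.id, count.index)}"
  volume_id   = "${element(openstack_blockstorage_volume_v2.data.*.id, count.index)}"
}

Tainting one openstack_compute_instance_v2.node or changing var.node_count makes the plan show every openstack_compute_volume_attach_v2.attach as forcing a new resource, not just the one that actually changed.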

This is impacting me quite badly when the creation of a resource fails: subsequent applies hit this index-out-of-range error. It still seems to be an issue in TF 0.10.

https://github.com/hashicorp/terraform/issues/16110

Also related.

Hitting me quite hard again now too, even with resources that don't use a count. Data sources run against the current state, not the target state, EVEN if there is a direct input into the module from the prerequisite module.

i.e.

module "my_security_group" {
  source = "git::blah"
  name  = "my_sg"
..
}

module "my_security_group_rule" {
  source = "git::blah-blah"
  security_group_name = "${module.my_security_group.name}"
}

my_security_group_rule has a data source that resolves the security group ID from the name, so the two modules stay decoupled.

The first run works; if you then change the input name = "my_sg", it breaks on the second apply, because the data source in my_security_group_rule searches against the current state, not the target state.
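
As an illustration of the pattern described, the lookup inside my_security_group_rule presumably looks something like this (the AWS resource and variable names here are hypothetical):

variable "security_group_name" {}

data "aws_security_group" "selected" {
  # Resolves the group ID from the name passed in by the calling module.
  # After the input name changes, this lookup still runs against the current
  # state rather than the target state, which is where the second apply breaks.
  filter {
    name   = "group-name"
    values = ["${var.security_group_name}"]
  }
}

resource "aws_security_group_rule" "ingress" {
  security_group_id = "${data.aws_security_group.selected.id}"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_blocks       = ["0.0.0.0/0"]
}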

Are there any workarounds to this? I am on Terraform v0.11.1. I do not use modules at all.

Help! I can't figure this out. I removed the .terraform folder by accident and started receiving the same error. I've never had this issue before and have not modified any code related to the VPC, and now I can't update simple stuff in my production account. Any help would be greatly appreciated. Thanks.

To give more detail: I am outputting this from the VPC module

value = "${module.vpc_platform.private_route_table_ids}"

and ingesting it like this:

route_table_id = "${module.platform_virginia_v1.private_route_table_ids[0]}"

Using element() fixed my issue.
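
Concretely, the fix was to swap the index syntax for an element() lookup, i.e. replacing

route_table_id = "${module.platform_virginia_v1.private_route_table_ids[0]}"

with

route_table_id = "${element(module.platform_virginia_v1.private_route_table_ids, 0)}"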

Hello! :robot:

This issue relates to an older version of Terraform that is no longer in active development, and because the area of Terraform it relates to has changed significantly since the issue was opened we suspect that the issue is either fixed or that the circumstances around it have changed enough that we'd need an updated issue report in order to reproduce and address it.

If you're still seeing this or a similar issue in the latest version of Terraform, please do feel free to open a new bug report! Please be sure to include all of the information requested in the template, even if it might seem redundant with the information already shared in _this_ issue, because the internal details relating to this problem are likely to be different in the current version of Terraform.

Thanks!

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
