I'm trying to set up a multi-node cluster with attached EBS volumes. An example below:
resource "aws_instance" "nodes" {
instance_type = "${var.model}"
key_name = "${var.ec2_keypair}"
ami = "${lookup(var.zk_amis, var.region)}"
count = "${var.node_count}"
vpc_security_group_ids = ["${aws_security_group.default.id}"]
subnet_id = "${lookup(var.subnet_ids, element(keys(var.subnet_ids), count.index))}"
associate_public_ip_address = true
user_data = "${file("cloud_init")}"
tags {
Name = "${var.cluster_name}-${count.index}"
}
}
resource "aws_ebs_volume" "node-ebs" {
count = "${var.node-count}"
availability_zone = "${element(keys(var.subnet_ids), count.index)}"
size = 100
tags {
Name = "${var.cluster_name}-ebs-${count.index}"
}
}
resource "aws_volume_attachment" "node-attach" {
count = "${var.node_count}"
device_name = "/dev/xvdh"
volume_id = "${element(aws_ebs_volume.node-ebs.*.id, count.index)}"
instance_id = "${element(aws_instance.nodes.*.id, count.index)}"
}
If a change happens to a single node (for instance if a single ec2 instance is terminated) ALL of the aws_volume_attachments are recreated.
Clearly we would not want volume attachments to be removed in a production environment. Worse than that, in conjunction with #2957 you first must unmount these attachments before they can be recreated. This has the effect of making volume attachments only viable on brand new clusters.
Confirmed. We have run into this issue as well. I think it has to do with dependencies not taking the "count" into account.
I think this comes down to the fact that the state does not track which specific instance a resource depends on, only the resource as a whole. Here is an example:
"aws_volume_attachment.db_persistent_volume_attachment.0": {
"type": "aws_volume_attachment",
"depends_on": [
"aws_ebs_volume.db_volume",
"aws_instance.db_instance"
],
"primary": {
"id": "vai-1795886726",
"attributes": {
"device_name": "/dev/sdb",
"id": "vai-1795886726",
"instance_id": "i-bb16c319",
"volume_id": "vol-cca36821"
}
}
}
When removing an aws_instance, you would have to find all aws_volume_attachments which happen to share the same "instance_id" attribute. But that would be provider, and perhaps even resource, specific.
However, this is not specific to AWS. It will occur any time you have two resources with count parameters, where one resource depends on the other. The right abstraction would be to depend on "aws_instance.db_instance.0" in this case. I don't know what the implications of that would be, though.
Turns out I was wrong. The "depends_on" attribute in the state file has nothing to do with this. Consider this diff:
-/+ aws_volume_attachment.persistent_volume_attachment.0
device_name: "/dev/sdb" => "/dev/sdb"
force_detach: "" => "<computed>"
instance_id: "i-deb76479" => "${element(aws_instance.my_instance.*.id, count.index)}" (forces new resource)
volume_id: "vol-9ba36878" => "vol-9ba36878"
It seems like changing one element of the aws_instance.my_instance.*.id causes the entire "element" expression to be considered changed.
Our current workaround is to duplicate the "aws_volume_attachment" resources, rather than using the element function.
I dug further into this. It seems the expected behaviour broke with commit 7735847579e777160664088b830624d0cde876e6, which was introduced to fix issue #2744.
To me, it seems like you want the treatment of unknown values in splats to behave differently depending on the interpolation context. When you use formatlist, you want to treat the entire list as unknown if it contains any unknown value, but for element, you only care about if a specific value in the list is unknown or not.
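For illustration, a sketch of the two situations, reusing the resource names from the config at the top of this thread (nothing here is from the actual patch):
```
# formatlist() really does consume every element of the splat, so a single
# unknown instance id should make the whole joined result unknown (the #2744 case):
output "node_ids" {
  value = "${join(",", formatlist("id=%s", aws_instance.nodes.*.id))}"
}
```
By contrast, the aws_volume_attachment resources above only ever read the single entry at count.index via element(), so an unknown id elsewhere in the splat ideally should not mark them as computed.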
I did a test where I introduced a new splat operator with the only difference being how it is treated if the list contains unknown values. It solves the problem, but having two splat operators is kind of confusing.
@mitchellh: Ideas?
Thanks for the report @kklipsch and thanks for taking the time to do a deep dive, @danabr!
To me, it seems like you want the treatment of unknown values in splats to behave differently depending on the interpolation context. When you use formatlist, you want to treat the entire list as unknown if it contains any unknown value, but for element, you only care about if a specific value in the list is unknown or not.
Yep, I think this is the key insight. Switching this to core since it's not a provider-level bug.
In the meantime, duplicating aws_volume_attachments to avoid the usage of splat / element() is a valid workaround.
Ok thanks. Unfortunately, for our use case that very quickly becomes unwieldy as we are doing 10s of nodes currently but want to be able to scale up to hundreds.
@kklipsch: If you are OK with running a patched terraform for a while, and you don't rely on the formatlist behavior anywhere, you can just comment out the three lines at https://github.com/hashicorp/terraform/blob/master/terraform/interpolate.go#L466, and compile terraform yourself.
@danabr and @phinze
I tried the workaround using Terraform constructs in the following manner, but it did not help. Could you please share more details on the workaround mentioned above ("In the meantime, duplicating aws_volume_attachments to avoid the usage of splat / element() is a valid workaround")?
resource "aws_instance" "appnodes" {
instance_type = "${var.flavor_name}"
ami = "${var.image_name}"
key_name = "${var.key_name}"
security_groups = ["${split(",", var.security_groups)}"]
availability_zone = "${var.availability_zone}"
user_data = "${file("mount.sh")}"
tags {
Name = "${var.app_name}-${format("%02d", 1)}"
}
}
resource "aws_volume_attachment" "ebsatt" {
device_name = "/dev/sdh"
volume_id = "${aws_ebs_volume.ebsvolumes.id}"
instance_id = "${aws_instance.appnodes.id}"
}
resource "aws_ebs_volume" "ebsvolumes" {
availability_zone = "${var.availability_zone}"
size = "${var.ebs_size}"
type = "${var.ebs_type}"
}
resource "aws_instance" "app-nodes" {
instance_type = "${var.flavor_name}"
ami = "${var.image_name}"
key_name = "${var.key_name}"
security_groups = ["${split(",", var.security_groups)}"]
availability_zone = "${var.availability_zone}"
user_data = "${file("mount.sh")}"
tags {
Name = "${var.app_name}-${format("%02d", 1)}"
}
}
resource "aws_volume_attachment" "ebs_att" {
device_name = "/dev/sdh"
volume_id = "${aws_ebs_volume.ebs-volumes.id}"
instance_id = "${aws_instance.app-nodes.id}"
}
resource "aws_ebs_volume" "ebs-volumes" {
availability_zone = "${var.availability_zone}"
size = "${var.ebs_size}"
type = "${var.ebs_type}"
}
@pdakhane: Just take kklipsch's example, but instead of using a "count" attribute on the aws_volume_attachment resource, create multiple aws_volume_attachment resources referring directly to the instances and volumes. For example, if you have three instances:
resource "aws_volume_attachment" "persistent_volume_attachment_0" {
device_name = "/dev/sdb"
instance_id = "${aws_instance.instance.0.id}"
volume_id = "${aws_ebs_volume.volume.0.id}"
}
resource "aws_volume_attachment" "persistent_volume_attachment_1" {
device_name = "/dev/sdb"
instance_id = "${aws_instance.instance.1.id}"
volume_id = "${aws_ebs_volume.volume.1.id}"
}
resource "aws_volume_attachment" "persistent_volume_attachment_2" {
device_name = "/dev/sdb"
instance_id = "${aws_instance.instance.2.id}"
volume_id = "${aws_ebs_volume.volume.2.id}"
}
This only works if you have a small number of nodes, though, and are OK to use the same number of instances in all environments.
@phinze pointed to this issue as potentially related to mine.
Here is my config (redacted for readability):
resource "aws_instance" "cockroach" {
tags {
Name = "${var.key_name}-${count.index}"
}
count = "${var.num_instances}"
...
}
resource "null_resource" "cockroach-runner" {
count = "${var.num_instances}"
connection {
...
host = "${element(aws_instance.cockroach.*.public_ip, count.index)}"
}
triggers {
instance_ids = "${element(aws_instance.cockroach.*.id, count.index)}"
}
provisioner "remote-exec" {
....
}
}
The basic idea is that every instance gets a "runner" attached that does binary deployment and other things. I'm using a null_resource to break a dependency cycle with ELB addresses used by the runner.
The first time I bring up an instance, everything works fine: each instance gets created, then the null_resource runs properly on each.
However, when I terminate an arbitrary instance through the EC2 console (eg: destroying instance 1), all null_resources get rerun.
Here's the log of terraform plan after terminating an instance:
~ aws_elb.elb
instances.#: "" => "<computed>"
+ aws_instance.cockroach.1
ami: "" => "ami-1c552a76"
availability_zone: "" => "us-east-1b"
ebs_block_device.#: "" => "<computed>"
ephemeral_block_device.#: "" => "<computed>"
instance_type: "" => "t2.medium"
key_name: "" => "cockroach-marc"
placement_group: "" => "<computed>"
private_dns: "" => "<computed>"
private_ip: "" => "<computed>"
public_dns: "" => "<computed>"
public_ip: "" => "<computed>"
root_block_device.#: "" => "<computed>"
security_groups.#: "" => "1"
security_groups.2129892981: "" => "cockroach-marc-security-group"
source_dest_check: "" => "1"
subnet_id: "" => "<computed>"
tags.#: "" => "1"
tags.Name: "" => "cockroach-marc-1"
tenancy: "" => "<computed>"
vpc_security_group_ids.#: "" => "<computed>"
-/+ null_resource.cockroach-runner.0
triggers.#: "1" => "<computed>" (forces new resource)
triggers.instance_ids: "i-21867290" => ""
-/+ null_resource.cockroach-runner.1
triggers.#: "1" => "<computed>" (forces new resource)
triggers.instance_ids: "i-fd85714c" => ""
-/+ null_resource.cockroach-runner.2
triggers.#: "1" => "<computed>" (forces new resource)
triggers.instance_ids: "i-20867291" => ""
I was expecting only "null_resource.cockroach-runner.1" to be updated, but it seems that 0 and 2 changed as well.
Re-titling this to indicate the nature of the core issue here. We'll get this looked at soon!
Just pinging here since we just ran into this issue as well.
Okay just consolidated a few other issue threads that were expressions of this bug into this one.
My apologies to all the users who have been hitting this - this is now in my list of top priority core bugs to get fixed soon.
As I alluded to with the re-title, this issue comes down to the fact that Terraform core is currently unaware that ${element(some.splat.*.reference)} is a dependency on a single element from the splat. It simply sees "there is a splat reference in this interpolation" and therefore believes--incorrectly--that it needs to consider every element in the list when calculating whether or not the overall value of the interpolation is computed.
The most direct solution would be to "just make it work for element()". In other words, add special-casing into Terraform's interpolation handling of splats that would look for a surrounding element() and use different logic for computed calculations if it is found. This is probably not the right way to go, as it is (a) difficult to implement "context-awareness" into that part of the codebase, and (b) a brittle solution that sets a bad precedent of special-casing certain functions in the core.
Because of this, the core team thinks the best way forward is to add first-class list indexing into the interpolation language. This would promote the behavior of element() to a language feature (likely square-bracket notation) and give the core interpolation code a rich enough context to be able to implement the correct computed-value scoping we need to fix this bug.
I've got a spike of first-class-indexing started, and I'll keep this thread updated with my progress.
:100:
@phinze thank you so much for the detailed response and the ongoing effort! :tada:
Thanks for the report @phinze - is there a WIP branch available to follow along?
Keen to see this one resolved. Quite limiting for those of us using count with instances and template_file to generate user_data.
Does anyone know of a workaround?
@jen20 has WIP to address this issue. Stay tuned! :grinning:
:+1:
@jkinred I haven't tried, but off the top of my head, the only workaround is to use provisioners for now.
@jkinred it's case-specific, but we've been working around this issue by not using count on the template_file, and instead embedding node-specific vars in metadata and querying them at launch time using scripts that curl the cloud provider's metadata service. A lot of this sort of thing:
runcmd:
- curl -s http://169.254.169.254/latest/meta-data/role >> /etc/environment
@jkinred I've come up with a workaround. We set the count on our user-data.cfg template files to a much higher number than we expect to have instances, and then only change the count of the aws_instance resource, not the count of the template files. The count of the template files never changes, so the dependent aws_instances are not recreated.
resource "aws_instance" "web" {
instance_type = "${var.instance_type}"
ami = "${var.aws_ami}"
count = "${var.instance_count}"
user_data = "${element(template_file.user-data-web.*.rendered, count.index)}"
}
resource "template_file" "user-data-web" {
template = "${file("templates/user-data.cfg")}"
vars { fqdn = "${var.short-env}-web-${count.index}.${var.vpc_domain}" }
count = "20"
}
This may or may not help in your situation, depending on what's in your user_data.
@billputer I think your workaround is awesome!
@jen20 if it is possible, could you please keep backward compatibility in mind? I mean, will it be possible to increment/decrement the size without resources being destroyed and created again after your patch is applied?
Thanks for all the suggestions! @billputer, great workaround and suitable for our use case.
Workaround, TODO and bug reference added!
Hi @phinze, any progress on this issue? The workaround is fine for template_file but not for EBS volumes. It doesn't make sense to overbuild storage as a workaround. We need to be able to use count to build instances and attach EBS volumes.
It is very possible this is fixed in 0.7 (as of writing, 0.7 is the "master" branch of Terraform). At the very least, we're a lot closer. 0.7 introduces first-class lists so you no longer need element() (we're keeping it for backwards compat, though). In this case, you can now directly do listvar[count.index].
The internals of list access are very different from element(), which is a function that simply mimicked list access. Because it was a function, core was unable to realize you're referencing a single list element, as @phinze pointed out above. It's still very possible we don't make that distinction and this is still a bug, but the right architecture is in place now where we can fix it.
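For concreteness, a sketch of that rewrite applied to the volume-attachment example from the top of this thread (whether it actually avoids the spurious diff is exactly what still needs verifying):
```
resource "aws_volume_attachment" "node-attach" {
  count       = "${var.node_count}"
  device_name = "/dev/xvdh"

  # 0.6 style: element() is an opaque function call as far as core is concerned
  # volume_id = "${element(aws_ebs_volume.node-ebs.*.id, count.index)}"

  # 0.7 style: first-class index operator
  volume_id   = "${aws_ebs_volume.node-ebs.*.id[count.index]}"
  instance_id = "${aws_instance.nodes.*.id[count.index]}"
}
```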
I'm going to keep this open, as we should be in a strong place now where we can add a core test case to verify whether or not this works, to prevent regressions. For anyone who wants to help: we'd need a test in context_apply_test.go that reproduces this bug on 0.6 (check out a tag to get a confirmed failure). Whether it passes or fails, having the test would be helpful. We can write it in time otherwise.
With the test in place, the actual work can begin to fix it if it is failing, or if it's passing it will prevent any regressions in the future.
Apologies this has taken so long, bringing first class lists/maps into Terraform has been a huge undertaking and has taken months of full time work.
Hello,
I'm trying to apply the workaround provided by @billputer
It seems very logical, but I don't know why it doesn't work for me.
When I try to scale by increasing the nb_instance variable, all my instances are recreated.
Do you know why?
Thank you,
Pierre
variable "nb_instance" {default = "3"}
variable "build_number" {default = "16"}
resource "template_file" "config_filebeat" {
template = "${file("${path.module}/templates/provisioner/filebeat.cfg.tpl")}"
vars {
laas_project = "123456789"
}
}
resource "template_file" "config" {
count = 10
template = "${file("${path.module}/templates/provisioner/chef.cfg.tpl")}"
vars {
instance_name = "${format("myappone_terraform_pierre_%s_%02d",var.build_number,count.index + 1)}"
#for yaml convention, we need to add some tabulation (4 spaces) in front of each line of the validation key.
validationkey = "${replace(replace(file("${path.module}/keys/validation.pem"), "/^/" , " "), "\n", "\n ")}"
secret_key = "${file("${path.module}/keys/chef_secret")}"
filebeat_config_file = "${base64encode(template_file.config_filebeat.rendered)}"
}
}
resource "template_cloudinit_config" "init_server" {
count = 10
gzip = true
base64_encode = false
part {
filename = "init.cfg"
content_type = "text/cloud-config"
content = "${element(template_file.config.*.rendered, count.index)}"
}
}
# Create a web server
resource "openstack_compute_instance_v2" "myapp" {
count = "${var.nb_instance}"
name = "${format("myappAS_%02d", count.index + 1)}"
flavor_name = "m1.medium"
image_name = "RED_HAT_6.2"
key_pair = "pierre"
security_groups = [
"op-default",
"myapp_as"
]
network {
name = "Internal_Internet_Protected"
}
network {
name = "Internal_Network_protected"
}
user_data = "${element(template_cloudinit_config.init_server.*.rendered, count.index)}"
}
Here's another repro case w/ EFS resources (i.e. the bug still exists in 0.7 ddc0f4cdb0c5b5fb848ac4856e9bcf32cc55ec0f):
https://gist.github.com/radeksimko/869c266bc8572c8f190059e65f12dee3
@radeksimko Thanks for the repro, I think I know where this might be.
Is using ignore_changes for lifecycle on the EBS and EBS attachment a reasonable solution? The below example seems to work (increasing and decreasing the nodes count). I just wanted a second opinion (pros/cons) before moving forward with it.
Example:
resource "aws_instance" "database" {
ami = "${var.amis}"
instance_type = "${var.instance_type}"
subnet_id = "${element(split(",", var.private_subnet_ids), count.index)}"
key_name = "${var.key_name}"
vpc_security_group_ids = ["${aws_security_group.database.id}"]
disable_api_termination = true
count = "${var.nodes}"
tags { Name = "${var.name}${format("%02d", count.index + 1)}" }
lifecycle { create_before_destroy = true }
}
resource "aws_ebs_volume" "database_mysql_vol" {
availability_zone = "${element(aws_instance.database.*.availability_zone, count.index)}"
iops = 1000
size = 500
type = "io1"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["availability_zone"]
}
tags { Name = "${var.name}-mysql" }
}
resource "aws_ebs_volume" "database_binlog_vol" {
availability_zone = "${element(aws_instance.database.*.availability_zone, count.index)}"
size = 50
type = "gp2"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["availability_zone"]
}
tags { Name = "${var.name}-binlog" }
}
resource "aws_volume_attachment" "mysql_vol_attachment" {
device_name = "/dev/sdf"
instance_id = "${element(aws_instance.database.*.id, count.index)}"
volume_id = "${element(aws_ebs_volume.database_mysql_vol.*.id, count.index)}"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["instance_id", "volume_id"]
}
}
resource "aws_volume_attachment" "mysql_binlog_attachment" {
device_name = "/dev/sdg"
instance_id = "${element(aws_instance.database.*.id, count.index)}"
volume_id = "${element(aws_ebs_volume.database_binlog_vol.*.id, count.index)}"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["instance_id", "volume_id"]
}
}
I ran into this issue when bumping count (using OpenStack), running Terraform v0.7.1-dev. How can I test if the new list interpolation syntax fixes this? How am I to replace the element(foo.*.id, count.index) here?
@theanalyst the new syntax means using ${foo.*.id[count.index]} instead of element() in your case. The new syntax works, but the result is the same: resources still get destroyed.
@serjs yeah I figured, I changed the stuff to new syntax and still ran into issues
+1: I changed to the new list interpolation syntax and the issue persists, still forcing a new resource.
resource "aws_volume_attachment" "SLRS_Data" {
count = 3
device_name = "/dev/sdf"
volume_id = "${aws_ebs_volume.SLRS_Data._.id[count.index]}"
instance_id = "${aws_instance.SLRS._.id[count.index]}"
}
+1 I have the issue using OpenStack with both 'element' and '[]' syntax.
+1 I have the same issue; I tried ignore_changes, ${foo.*.id[count.index]}, and element(). They did not work.
Same issue - new syntax didn't work for me either.
I've submitted a fix and a test for the bug which causes a resource with an attribute containing ${list[idx]} to be re-created when list contains an uncomputed element, even if idx references a known element.
Note that, as of now, this does not fix the closely related ${element(list, idx)} bug.
Any update on this issue? I think requiring the bracket notation/indexing format is a fair and logical fix, but it'd be great to know a timeline around this as the current bug can have some relatively harmful results.
Bracket notation/indexing format is not compatible with math operations, e.g. something.*.id[count.index % 3].
@matti for this behaviour, at the moment you can use the element function - ${element(something.*.id, count.index)} - this will wrap.
@jen20 but that does not work because of the bug that's open in this thread...? Not sure what you are saying.
I was referring to the [] workaround that is coming (?), which is not compatible with other stuff.
@matti interesting! Was unaware of that limitation. I tried and got something like this (formatting mine):
* node doesn't support evaluation: *ast.Arithmetic{
Op:2,
Exprs: []ast.Node{
*ast.VariableAccess{
Name:"var.count",
Posx:ast.Pos{Column:20, Line:1}
},
*ast.VariableAccess{
Name:"count.index",
Posx:ast.Pos{Column:31, Line:1}
}
},
Posx:ast.Pos{Column:20, Line:1}
} in: ${var.instance_ids[var.count - count.index]}-baz
@jen20 @matti do you know if is there an open issue anywhere for this particular problem? I searched for terms in the message above both in hashicorp/terraform and hashicorp/hil but couldn't find one.
I'm not sure if that can be solved with the current templating engine. I think that the current templating engine is the root of all evil and results in the clever workarounds summarized here: https://blog.gruntwork.io/terraform-tips-tricks-loops-if-statements-and-gotchas-f739bbae55f9#.ub1tfxsmg
As variables can not contain interpolations (https://github.com/hashicorp/terraform/issues/4084) and there are no intermediate variables, I'm starting to feel very cornered here.
The point of having count in resources is currently very broken when it's not working with splats (it deletes ALB attachments, EBS volumes, etc.).
Hey @matti, there certainly are a number of issues related to count, but I don't know that I agree with your view that the templating engine - do you mean HIL? - is the root in all cases. The issue here, for instance, is caused by Terraform making certain decisions about how changes in state should be treated, and, as such, the solution lies in Terraform-land, not HIL-land. The fix I've offered doesn't make any changes to the templating engine.
In any case, I think the issue you brought up - not being able to do math operations inside of array index operations - is a problem distinct from this issue, and deserves its own issue. Do you know if there's an open GH issue for that particular problem? Do you get an error similar to the one I posted above, or a different one?
@maxenglander, Sorry I might be wrong about it. I'm just really tired of trying to do stuff nicely and then finding out waaay later that you have cornered yourself.
On the error: I get the same error. I don't think there is an issue anywhere, I tried to look for one.
@matti I opened a separate ticket for the issue you brought up, and a fix has been submitted.
Cool. So currently does [count.index] addressing work? Because for my ALB attachments it's sometimes failing..
@matti I can't say without knowing the specifics of your setup. However, once the math fix makes it into Terraform, you'll be able to do myresource[var.count - count.index - 1].
Providing a minimal in-memory test case to work off:
resource "null_resource" "foo" {
count = 2
}
resource "null_resource" "bar" {
count = 2
triggers {
key = "${element(null_resource.foo.*.id, count.index)}"
}
}
output "bar_0" {
value = "${null_resource.bar.0.id}"
}
after the following:
$ terraform apply
$ terraform taint null_resource.foo.1
a plan shows null_resource.bar.0 being replaced, and an apply shows the output of bar_0 has changed.
@jbardin could you verify that the same happens when changing to
triggers {
key = "${null_resource.foo.*.id[count.index]}"
}
ref @mitchellh's comment from May 13th?
@sigmunau,
Yes, it's the same behavior with direct indexing as well. While it might be a little easier to calculate which node is referenced with the index operation, we still need to add a way to get at this reference when building the graph.
While this isn't yet implemented, it is in principle possible for the interpolation language to deal with this properly for indexing with [ ... ], because it has more context to work with.
Since the language evaluator doesn't know what a function does internally, it just assumes that if any function argument is "unknown" then the function can't be called, and so its result is also "unknown". This then combines with the rule that any list containing one computed element is itself entirely computed, creating the behavior folks are hitting here.
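To see those two rules combine on the example from the top of this thread, suppose count was just bumped from 2 to 3 (the instance ids below are made up):
```
# At plan time the splat is effectively ["i-aaa", "i-bbb", <unknown>]. The one
# unknown element makes the whole list computed, and a computed argument makes
# element() computed, so all three attachments get a "(forces new resource)"
# diff instead of just the new one.
resource "aws_volume_attachment" "node-attach" {
  count       = 3
  device_name = "/dev/xvdh"
  volume_id   = "${element(aws_ebs_volume.node-ebs.*.id, count.index)}"
  instance_id = "${element(aws_instance.nodes.*.id, count.index)}"
}
```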
But the evaluator does know what [ ... ] does, so it can in principle have its own special behavior for "unknown" where it considers only the value of the specific element being addressed, rather than checking the whole list for unknown values. Terraform's responsibility then is to just make sure that only the appropriate list elements are marked as "computed".
The part that makes this a little tricky is that we mark a partially-unknown list as being entirely unknown at variable-access time, so by the time we get to the index evaluator we've already lost all of the detail about what exactly was unknown. To implement this will require re-organizing this logic a little, I think as follows:
1. Change VariableAccess to return a partially-unknown collection, rather than marking it as entirely unknown like today.
2. Relax the rule in visit that says that encountering any unknown value at all immediately terminates evaluation. Having unknown values pass around should be fine as long as the above constraints are enforced.

@apparentlymart I'd actually go about it a different way, since by the time [in Terraform] you're evaluating, graph construction (and therefore ordering) is too late.
First, you're absolutely right this will only work with [...]. I had no intentions of making this work with any function calls (including element()).
However, I wouldn't modify evaluation at all. The logic I want to spike out is doing an early pass during graph construction (a new transform, probably) that does static analysis over the ASTs of all interpolations and looks for ast.Index operations. You then want to check for a very specific case:
1. The Node (in node[key] syntax) in the Index is a resource reference. We can check this.
2. The Key (in node[key]) is _not_ computed. We can check this by trying to evaluate it and making sure it goes to something other than TypeUnknown.

If this is true, we evaluate Key, and use that to create the proper reference to the resource referenced in Node. This can happen at graph construction time and solves a huge variety of issues (since, for example, count.index is currently not allowed to be computed).
There still remains the question of what to do for computed index access, and I think a core feature for this isn't quite available yet, but I'm curious about building it in. What I'd like to introduce is some concept of a runtime graph edge that dag understands, so that Walk just does the right thing. I'm not sure if that's possible with the current API but I'm looking into it.
The idea is: the above static analysis is also done one final time during the Eval step (so there should be no TypeUnknowns anymore, unless it's plan, in which case it should be unknown anyway, probably). If we detect any NEW references we didn't have previously, introduce a _runtime edge_ into the graph that connects the two references. Then call some special Wait-type function to make sure your dependencies are once again checked if they're evaluated. If they are, Wait is basically a no-op. If they aren't, then you just block until they are.
I think this solves everything but clearly requires a lot of new work, hence we're pushing it off to 0.9.
I think I must be missing the case where having Terraform deal with this is necessary, vs. just adjusting the HIL behavior as I mentioned.
Considering the following config as an example:
variable "instance_count" {}
resource "aws_instance" "foo" {
count = "${var.instance_count}"
# etc, etc...
}
resource "null_resource" "foo" {
count = "${var.instance_count}"
triggers {
instance_id = "${aws_instance.foo.*.id[count.index]}"
}
provisioner "remote-exec" {
# ...
}
}
For the sake of what we're discussing, I think the desired behavior is that I should be able to change the value of var.instance_count and have Terraform understand to only run the null_resource provisioner for any instances that were added.
Let's assume that in our initial state we have instance_count set to 2 and the two instances already exist, so aws_instance.foo.*.id evaluates to ["i-abc", "i-123"] and we also already have the two corresponding null_resource instances.
Now I increment instance_count to 3 and run terraform plan. During the walk, Terraform will visit the aws_instance.foo node, dynamic-expand it, and then notice that aws_instance.foo.2 needs to be created, and so aws_instance.foo.*.id is written into the scope as ["i-abc", "i-123", <unknown>].
Continuing the plan walk, Terraform will then visit null_resource.foo and dynamic-expand that. Terraform will notice that two of the three desired instances already exist in state, but will evaluate ${aws_instance.foo.*.id[count.index]} for them and find concrete values that are unchanged from what's in the state, and thus generate no diff, due to the adjusted HIL evaluation rules I posted before.
It will then generate the "create" diff for null_resource.foo.2 and evaluate ${aws_instance.foo.*.id[count.index]}. For this it will get "unknown" and thus Terraform will mark this attribute as <computed> in the create diff.
During apply, things proceed as expected: there's no diff for the pre-existing aws_instance or null_resource instances, so Terraform leaves them alone. The apply walk will visit aws_instance.foo.2 before it visits null_resource.foo.2, so the previously-unknown value will be a concrete string by the time we get to creating the null resource.
That seems to get us what we need as far as I can tell. We just need to make sure Terraform can set a partially-unknown value at index 2 of aws_instance.foo.*.id, and then the rest just falls out logically as a result of the adjusted HIL evaluation rules, without the need for any static analysis.
I didn't cover it in my ruleset above, but computed indices can also be handled by a further rule that if the i in n[i] is unknown then the result is also unknown.
I feel like I must be missing an edge-case here. :grinning:
@apparentlymart That's right, too, actually. I went down that path way back and wasn't able to get it to work, so I think there is a missing edge case, but on the surface it's worth trying too. It probably doesn't even need much HIL change, since HIL only returns unknown if it encounters an unknown. If we just build the list out (rather than making the whole list TypeUnknown), then it probably just works for many cases.
I have recently encountered this issue. I have a set of hosts that require fixed IPs, and certain flavor sizes, so I created a list to define them statically:
static_infra = [
{ "ip" = "10.1.1.55", "service" = "av1-service-030-srv01", "type" = "small" },
{ "ip" = "10.1.1.56", "service" = "av1-service-050-srv01", "type" = "small" },
{ "ip" = "10.1.1.57", "service" = "av1-service-070-srv01", "type" = "medium" },
{ "ip" = "10.1.1.58", "service" = "av1-service-090-srv01", "type" = "large" },
]
fyi, "type" just allows me to look up a flavor name in a map, so not important here but figure i'd explain why i wrote it in the comment.
Anyway, my .tf file looks something like:
resource "openstack_compute_instance_v2" "instances" {
count = "${length(var.static_infra)}"
region = "${var.region}"
name = "${format("${var.hostname_prefix}-%s", lookup(var.static_infra[count.index], "service"))}"
flavor_name = "${lookup(var.flavors, lookup(var.static_infra[count.index], "type"))}"
image_name = "${var.image}"
user_data = "${data.template_file.cloud-config.rendered}"
network {
name = "${var.network_name}"
fixed_ip_v4 = "${lookup(var.static_infra[count.index], "ip")}"
}
}
If I remove one of the entries in the middle of the static_infra list, then various things get recomputed or new resources are forced, munging hostnames and IPs in the state file, because the count iteration throws off the whole thing, since the count gets put onto the end of the resource name/key, like "openstack_compute_instance_v2.instances.3".
Maybe an interim workaround, until there is a real fix, would be to allow an empty item in the list to be skipped over while count is incremented to the next number, thereby skipping that resource and keeping the resulting list the same length. I would hope the 'missing' resource would thereby be marked for destruction, with the count var iterating correctly and not changing other resources' state.
like this:
static_infra = [
{ "ip" = "10.1.1.55", "service" = "av1-service-030-srv01", "type" = "small" },
{ "ip" = "10.1.1.56", "service" = "av1-service-050-srv01", "type" = "small" },
{ },
{ "ip" = "10.1.1.58", "service" = "av1-service-090-srv01", "type" = "large" },
]
Not sure how this could be implemented, however, or if it would in reality even work the way I am imagining.
I suppose this would still fail when adding a new resource in the middle of the list, though, so any additions could only be appended onto the list. Over time, you'd potentially build up cruft of empty { } scattered throughout the list depending on where/when you need to remove entries.
This is a core issue, I wish it had more priority than adding new features :(
Hello, I am currently using terraform version 0.8.6.
I took the Azure example and added a count parameter to the objects that needed them
Here is the .tf file
variable "counts" {}
provider "azurerm" {
subscription_id = "<removed>"
client_id = "<removed>"
client_secret = "<removed>"
tenant_id = "<removed>"
}
resource "azurerm_resource_group" "test" {
name = "acctestrg"
location = "West US"
}
resource "azurerm_virtual_network" "test" {
name = "acctvn"
address_space = ["10.0.0.0/16"]
location = "West US"
resource_group_name = "${azurerm_resource_group.test.name}"
}
resource "azurerm_subnet" "test" {
name = "acctsub"
resource_group_name = "${azurerm_resource_group.test.name}"
virtual_network_name = "${azurerm_virtual_network.test.name}"
address_prefix = "10.0.2.0/24"
}
resource "azurerm_network_interface" "test" {
count = "${var.counts}"
name = "acctni${count.index}"
location = "West US"
resource_group_name = "${azurerm_resource_group.test.name}"
ip_configuration {
name = "testconfiguration1"
subnet_id = "${azurerm_subnet.test.id}"
private_ip_address_allocation = "dynamic"
}
}
resource "azurerm_storage_account" "test" {
count = "${var.counts}"
name = "accsai${count.index}"
resource_group_name = "${azurerm_resource_group.test.name}"
location = "westus"
account_type = "Standard_LRS"
tags {
environment = "staging"
}
}
resource "azurerm_storage_container" "test" {
count = "${var.counts}"
name = "vhds"
resource_group_name = "${azurerm_resource_group.test.name}"
storage_account_name = "${azurerm_storage_account.test.*.name[count.index]}"
container_access_type = "private"
}
resource "azurerm_virtual_machine" "test" {
count = "${var.counts}"
name = "acctvm${count.index}"
location = "West US"
resource_group_name = "${azurerm_resource_group.test.name}"
network_interface_ids = ["${azurerm_network_interface.test.*.id[count.index]}"]
vm_size = "Standard_A0"
storage_image_reference {
publisher = "Canonical"
offer = "UbuntuServer"
sku = "14.04.2-LTS"
version = "latest"
}
storage_os_disk {
name = "myosdisk1"
vhd_uri = "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd"
caching = "ReadWrite"
create_option = "FromImage"
}
os_profile {
computer_name = "hostname${count.index}"
admin_username = "testadmin"
admin_password = "Password1234!"
}
os_profile_linux_config {
disable_password_authentication = false
}
tags {
environment = "staging"
}
}
I started with count = 1 and got all 7 resources created without issues:
Apply complete! Resources: 7 added, 0 changed, 0 destroyed.
After that, I increase the count to 2.
The plan shows that I will get the following: Plan: 5 to add, 0 to change, 1 to destroy.
It seems that the os_disk is at fault here, more specifically the vhd_uri. I am using the new bracketing format instead of the element one. This is causing us issues in production where we are losing VMs when we try to augment the capacity of a cluster.
For the Azure or Terraform experts: is there anything I could do until this issue gets resolved to prevent my resources from being deleted? I need a storage account per VM, so reusing the same one is not an option.
storage_os_disk.#: "1" => "1"
storage_os_disk.730729623.create_option: "FromImage" => ""
storage_os_disk.730729623.disk_size_gb: "0" => "0"
storage_os_disk.730729623.image_uri: "" => ""
storage_os_disk.730729623.name: "myosdisk1" => ""
storage_os_disk.730729623.os_type: "" => ""
storage_os_disk.730729623.vhd_uri: "https://accsai0.blob.core.windows.net/vhds/myosdisk1.vhd" => "" (forces new resource)
storage_os_disk.~4275591411.caching: "" => "ReadWrite"
storage_os_disk.~4275591411.create_option: "" => "FromImage"
storage_os_disk.~4275591411.disk_size_gb: "" => ""
storage_os_disk.~4275591411.image_uri: "" => ""
storage_os_disk.~4275591411.name: "" => "myosdisk1"
storage_os_disk.~4275591411.os_type: "" => ""
storage_os_disk.~4275591411.vhd_uri: "" => "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd" (forces new resource)
The full output
+ azurerm_network_interface.test.1
applied_dns_servers.#: "<computed>"
dns_servers.#: "<computed>"
enable_ip_forwarding: "false"
internal_dns_name_label: "<computed>"
internal_fqdn: "<computed>"
ip_configuration.#: "1"
ip_configuration.2250438086.load_balancer_backend_address_pools_ids.#: "<computed>"
ip_configuration.2250438086.load_balancer_inbound_nat_rules_ids.#: "<computed>"
ip_configuration.2250438086.name: "testconfiguration1"
ip_configuration.2250438086.private_ip_address: "<computed>"
ip_configuration.2250438086.private_ip_address_allocation: "dynamic"
ip_configuration.2250438086.public_ip_address_id: "<computed>"
ip_configuration.2250438086.subnet_id: "/subscriptions/f0dc697c-673c-4fb0-8852-7f47533b4dd6/resourceGroups/acctestrg/providers/Microsoft.Network/virtualNetworks/acctvn/subnets/acctsub"
location: "westus"
mac_address: "<computed>"
name: "acctni1"
network_security_group_id: "<computed>"
private_ip_address: "<computed>"
resource_group_name: "acctestrg"
tags.%: "<computed>"
virtual_machine_id: "<computed>"
+ azurerm_storage_account.test.1
access_tier: "<computed>"
account_kind: "Storage"
account_type: "Standard_LRS"
location: "westus"
name: "accsai1"
primary_access_key: "<computed>"
primary_blob_endpoint: "<computed>"
primary_file_endpoint: "<computed>"
primary_location: "<computed>"
primary_queue_endpoint: "<computed>"
primary_table_endpoint: "<computed>"
resource_group_name: "acctestrg"
secondary_access_key: "<computed>"
secondary_blob_endpoint: "<computed>"
secondary_location: "<computed>"
secondary_queue_endpoint: "<computed>"
secondary_table_endpoint: "<computed>"
tags.%: "1"
tags.environment: "staging"
+ azurerm_storage_container.test.1
container_access_type: "private"
name: "vhds"
properties.%: "<computed>"
resource_group_name: "acctestrg"
storage_account_name: "accsai1"
-/+ azurerm_virtual_machine.test.0
availability_set_id: "" => "<computed>"
delete_data_disks_on_termination: "false" => "false"
delete_os_disk_on_termination: "false" => "false"
license_type: "" => "<computed>"
location: "westus" => "westus"
name: "acctvm0" => "acctvm0"
network_interface_ids.#: "1" => "<computed>"
os_profile.#: "1" => "1"
os_profile.2123949718.admin_password: "" => "Password1234!"
os_profile.2123949718.admin_username: "testadmin" => "testadmin"
os_profile.2123949718.computer_name: "hostname0" => "hostname0"
os_profile.2123949718.custom_data: "" => "<computed>"
os_profile_linux_config.#: "1" => "1"
os_profile_linux_config.2972667452.disable_password_authentication: "false" => "false"
os_profile_linux_config.2972667452.ssh_keys.#: "0" => "0"
resource_group_name: "acctestrg" => "acctestrg"
storage_image_reference.#: "1" => "1"
storage_image_reference.1807630748.offer: "UbuntuServer" => "UbuntuServer"
storage_image_reference.1807630748.publisher: "Canonical" => "Canonical"
storage_image_reference.1807630748.sku: "14.04.2-LTS" => "14.04.2-LTS"
storage_image_reference.1807630748.version: "latest" => "latest"
storage_os_disk.#: "1" => "1"
storage_os_disk.730729623.create_option: "FromImage" => ""
storage_os_disk.730729623.disk_size_gb: "0" => "0"
storage_os_disk.730729623.image_uri: "" => ""
storage_os_disk.730729623.name: "myosdisk1" => ""
storage_os_disk.730729623.os_type: "" => ""
storage_os_disk.730729623.vhd_uri: "https://accsai0.blob.core.windows.net/vhds/myosdisk1.vhd" => "" (forces new resource)
storage_os_disk.~4275591411.caching: "" => "ReadWrite"
storage_os_disk.~4275591411.create_option: "" => "FromImage"
storage_os_disk.~4275591411.disk_size_gb: "" => ""
storage_os_disk.~4275591411.image_uri: "" => ""
storage_os_disk.~4275591411.name: "" => "myosdisk1"
storage_os_disk.~4275591411.os_type: "" => ""
storage_os_disk.~4275591411.vhd_uri: "" => "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd" (forces new resource)
tags.%: "1" => "1"
tags.environment: "staging" => "staging"
vm_size: "Standard_A0" => "Standard_A0"
+ azurerm_virtual_machine.test.1
availability_set_id: "<computed>"
delete_data_disks_on_termination: "false"
delete_os_disk_on_termination: "false"
license_type: "<computed>"
location: "westus"
name: "acctvm1"
network_interface_ids.#: "<computed>"
os_profile.#: "1"
os_profile.1736693719.admin_password: "Password1234!"
os_profile.1736693719.admin_username: "testadmin"
os_profile.1736693719.computer_name: "hostname1"
os_profile.1736693719.custom_data: "<computed>"
os_profile_linux_config.#: "1"
os_profile_linux_config.2972667452.disable_password_authentication: "false"
os_profile_linux_config.2972667452.ssh_keys.#: "0"
resource_group_name: "acctestrg"
storage_image_reference.#: "1"
storage_image_reference.1807630748.offer: "UbuntuServer"
storage_image_reference.1807630748.publisher: "Canonical"
storage_image_reference.1807630748.sku: "14.04.2-LTS"
storage_image_reference.1807630748.version: "latest"
storage_os_disk.#: "1"
storage_os_disk.~4275591411.caching: "ReadWrite"
storage_os_disk.~4275591411.create_option: "FromImage"
storage_os_disk.~4275591411.disk_size_gb: ""
storage_os_disk.~4275591411.image_uri: ""
storage_os_disk.~4275591411.name: "myosdisk1"
storage_os_disk.~4275591411.os_type: ""
storage_os_disk.~4275591411.vhd_uri: "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd"
tags.%: "1"
tags.environment: "staging"
vm_size: "Standard_A0"
@djsly I had a somewhat similar issue and was able to use a lifecycle block in the resource with ignore_changes to work around it. Maybe that would work for you here if you set it to storage_os_disk? https://www.terraform.io/docs/configuration/resources.html
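Roughly, that would mean adding something like this to the azurerm_virtual_machine resource above (a sketch only, with the other arguments elided; as noted further down, it also silences legitimate changes):
```
resource "azurerm_virtual_machine" "test" {
  # ... existing arguments as in the config above ...

  lifecycle {
    ignore_changes = ["storage_os_disk"]
  }
}
```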
Thanks @bpoland, this indeed worked perfectly. I also added the os_profile section, since the admin_password was also detected as a change. Nothing major, but cleaner.
lifecycle {
ignore_changes = ["storage_os_disk", "os_profile"]
}
thanks a lot!
No worries. Just be careful -- if those ever change and it SHOULD cause the resources to get deleted and recreated, it won't happen.
And let's be clear, this is a workaround not a solution to the problem in this thread. It seems like there was some discussion late last year about possible fixes, would love to see this get solved correctly.
@mitchellh In #8595 you said you had plans for having a fix ready for 0.9. Does this mean this is an incompatible change? Also, is there some branch with the work in progress we could look at?
You also said that #8595 isn't complete and has unintended side effects. Can you elaborate on that? The incompleteness previously mentioned in that PR is about modules interactions and it seems to have been fixed later.
I care to ask because this can be very problematic when trying to increase the cluster size behind a load balancer, and having the load balancer reconstructed and thus unavailable for a relatively long period of time (#8684). We absolutely need this fixed and are available to help to fix that issue. Workarounds don't work well in our case because we have a fully automated process.
@jen20 it seems my last comment unassigned you from this issue. This was unintended, please fix if you can.
This issue is a huge blocker for creating a cluster of nodes that are backed by EBS. Right now, I see people copying and pasting resources multiple times, which is error-prone and totally defeats the point of having the count attribute. Given Terraform's lack of proper "looping", it's really important for the count construct to work without this bug. In the end, we had to write our own pre-processor to generate TF files to keep things DRY.
Note that since Terraform 0.8.x, the workaround of manually duplicating resources has an additional caveat. If you destroy a resource manually, you will run into an error the next time you run terraform plan.
I guess it is an improvement that terraform now has better dependency tracking. It does make this issue more painful to work around, though.
I've encountered another issue with the workaround of duplicating resources in 0.8.8 and 0.9.2.
Consider the following setup.
````
resource "aws_instance" "node" {
count = 3
...
}
resource "aws_ebs_volume" "node-ebs" {
count = 3
...
}
resource "aws_volume_attachment" "node-attach-0" {
device_name = "/dev/xvdh"
volume_id = "${aws_ebs_volume.node-ebs.0.id}"
instance_id = "${aws_instance.node.0.id}"
}
resource "aws_volume_attachment" "node-attach-1" {
device_name = "/dev/xvdh"
volume_id = "${aws_ebs_volume.node-ebs.1.id}"
instance_id = "${aws_instance.node.1.id}"
}
resource "aws_volume_attachment" "node-attach-2" {
device_name = "/dev/xvdh"
volume_id = "${aws_ebs_volume.node-ebs.2.id}"
instance_id = "${aws_instance.node.2.id}"
}
````
Now, if I terraform taint aws_instance.node.0, terraform taint aws_instance.node.1, and terraform taint aws_instance.node.2, terraform will only recreate the corresponding volume attachments for the first instance. The plan will be:
-/+ aws_instance.node.0 (tainted)
-/+ aws_instance.node.1 (tainted)
-/+ aws_instance.node.2 (tainted)
-/+ aws_instance.node-ebs.0
instance_id: "i-aaaabbbbccccdddde" => "${aws_instance.node.0.id}" (forces new resource)
Hence, we are missing the recreation of aws_instance.node-ebs.1 and aws_instance.node-ebs.2.
With all the issues related to count, I start to think that perhaps the sanest option is to go in the same direction as @kishorenc and stop using count and instead use a preprocessor.
This has been an issue for me in 0.9.2 managing my aws_s3_buckets (I tried to remove the 1st one, so all were going to be replaced; see #13724).
What I'm wondering is if there's a way for Terraform core to use the id instead of the count index in the extension?
For example, instead of calling the resource aws_s3_bucket.mybuckets.${count.index}, calling it aws_s3_bucket.mybuckets.${id}?
The id attribute is unique inside a resource type and is used as a reference when calling other resources, so it could work as an identifier here.
Hi @aerostitch!
The issue of the count indices becoming "misaligned" when you add/remove items from the middle of your list is a separate but related issue to this one. The likely solution to this will be a foreach attribute which can be used instead of count to get a result like you're looking for, but we need to make some foundational configuration language changes first before that will be usable. This is likely to arrive in a future version, but we need to do some more internal design work first.
OK. Thanks for the answer @apparentlymart. I thought that leveraging the id attribute could have been an easier fix, but a foreach could become handy indeed! :)
Should I reopen my original issue in this case?
@aerostitch the problem and intended (high-level) solution is already known, so I think we can leave your other issue closed though I do appreciate you taking the time to open it, and to hunt out this other similar issue!
Is there any progress update on this issue? I am experiencing this problem with a combination of aws_instance and aws_alb_target_group_attachment, whereby all target group attachments are destroyed when only two AWS instances need creation.
As you can imagine, removing all web nodes from a production environment's load balancer is a pretty bad outcome when you're trying to make a partial change.
Partial plan output:
-/+ aws_alb_target_group_attachment.web_tga.0
port: "80" => "80"
target_group_arn: "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq" => "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq"
target_id: "i-aaaa" => "${aws_instance.web.*.id[count.index]}" (forces new resource)
-/+ aws_alb_target_group_attachment.web_tga.1
port: "80" => "80"
target_group_arn: "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq" => "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq"
target_id: "i-bbbb" => "${aws_instance.web.*.id[count.index]}" (forces new resource)
-/+ aws_alb_target_group_attachment.web_tga.2
port: "80" => "80"
target_group_arn: "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq" => "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq"
target_id: "i-cccc" => "${aws_instance.web.*.id[count.index]}" (forces new resource)
-/+ aws_alb_target_group_attachment.web_tga.3
port: "80" => "80"
target_group_arn: "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq" => "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq"
target_id: "i-dddd" => "${aws_instance.web.*.id[count.index]}" (forces new resource)
+ aws_instance.web.1
[...omitted]
+ aws_instance.web.3
[...omitted]
Hi everyone! Sorry this one sat here for so long.
Since this has been open for a long time and things have changed a bunch throughout its life, I want to add some context to close out this great, multi-year discussion:
When this issue was originally filed, Terraform's support for lists was rather rudimentary, and we had a function called element that served to extract a particular element from a list. From the perspective of the interpolation language, this was just a function like any other and so the language was conservative and assumed that any unknown values in the list had to produce an unknown result.
This interacted poorly with how the splat syntax deals with new instances created when count is increased, since the elements for the new instances were marked as unknown until the apply completed, and thus caused the whole set to be treated as computed.
Back in 0.7 we added a first-class indexing operator using brackets, like var.foo[1], which then gave the interpolation language an awareness of indexing. This wasn't enough to solve the problem, because there were still assumptions about lists either being wholly known or unknown, but it gave us an important building block to fix this.
In #14135 I reorganized how the interpolation language deals with unknowns so that partially-unknown collections (lists and maps) can be passed around _within a single interpolated string, with the final "partially unknown becomes fully unknown" mapping now done at the end, before the final result is returned to Terraform.
To get the benefit of this fix, it will be necessary to rewrite any existing configs using this pattern:
some_attr = "${element(some_other_resource.foo.*.some_attr, count.index)}"
The new form, with the first-class indexing operator, would be the following:
some_attr = "${some_other_resource.foo.*.some_attr[count.index]}"
Great addition! @ahl this should solve your problem.
After changing my terraform files to use this syntax it becomes impossible to recreate resources that are referenced by other resources when they disappear. I get this sort of error message:
Error refreshing state: 2 error(s) occurred:
* data.template_file.workers_ansible[3]: index 3 out of range for list openstack_compute_instance_v2.worker.*.name (max 3) in:
${openstack_compute_instance_v2.worker.*.name[count.index]}
* data.template_file.workers_ansible: 1 error(s) occurred:
* data.template_file.workers_ansible[3]: index 3 out of range for list openstack_compute_instance_v2.worker.*.name (max 3) in:
${openstack_compute_instance_v2.worker.*.name[count.index]}
That is, I have an openstack_compute_instance_v2 resource and an output resource to generate an Ansible inventory. This works fine, but if I delete one of the instances in the OpenStack portal, Terraform fails to refresh state and subsequently to recreate the node. Using element() allows recreating the instance.
Hi @sigmunau,
I think what you're seeing there is the bug that was fixed in #14098. It's merged in master but hasn't made it into a release yet.
I'm actually seeing the same issue that @sigmunau is experiencing using 0.9.6-dev (installed via go get -u github.com/hashicorp/terraform). I'll try to isolate this.
Hmm sorry on second read it does look a bit different. I'll take a look.
@kreisys if you dig up some info it'd be cool to have a new top-level issue for this! Thanks
@apparentlymart I managed to isolate this: #14521
From the description of #14098, my reported issue seems to be just that. In particular, my case does not involve modules or the [] syntax that @kreisys reports in #14521, so it doesn't seem that my issue is that one. I'll see if I can reproduce using 0.9.6-dev.
With 0.9.6-dev I get this:
Error refreshing state: 1 error(s) occurred:
* data.template_file.workers_ansible: 1 error(s) occurred:
* data.template_file.workers_ansible[3]: index 3 out of range for list openstack_compute_instance_v2.worker.*.name (max 3) in:
${openstack_compute_instance_v2.worker.*.name[count.index]}
So the error message seems the same, but no longer repeated.
I created a new issue #14536 with details of my problem
I have a similar issue passing a splat list to a module, even if I access the elements inside the module with [] syntax. Does passing a list into a module impact whether it's considered 'unknown' as a whole?
@apparentlymart I have the same issue as @dannytrigo :'(
Here is my sample:
.
├── inputs.tfvars
├── instance
│   └── main.tf
└── main.tf
main.tf
```.tf
variable "shortnames" {
type = "list"
}
module "generic_linux" {
source = "./instance/"
shortnames = "${var.shortnames}"
}
resource "aws_ebs_volume" "data" {
count = "${length(var.shortnames)}"
availability_zone = "eu-west-1a"
size = "5"
type = "gp2"
}
resource "aws_volume_attachment" "data_ebs_att" {
count = "${length(var.shortnames)}"
device_name = "/dev/sdc"
volume_id = "${aws_ebs_volume.data.*.id[count.index]}"
instance_id = "${module.generic_linux.instances_ids[count.index]}"
}
```
My module code is:
instance/main.tf
```.tf
variable "shortnames" {
type = "list"
description = "list of shortname"
}
resource "aws_instance" "instances" {
count = "${length(var.shortnames)}"
instance_type = "t2.micro"
key_name = "formation-hpc"
ami = "ami-xxxxxxxx"
vpc_security_group_ids = ["sg-xxxxxxxx"]
subnet_id = "subnet-xxxxxxxx"
tags {
Name = "${var.shortnames[count.index]}-${count.index}"
}
}
output "instances_ids" {
value = "${aws_instance.instances.*.id}"
}
```
Usage of instance_id = "${module.generic_linux.instances_ids[count.index]}" forces a new resource:
-/+ aws_volume_attachment.data_ebs_att[0] (new resource required)
id: "vai-764663169" => <computed> (forces new resource)
device_name: "/dev/sdc" => "/dev/sdc"
force_detach: "" => <computed>
instance_id: "i-02759cd6c3590764f" => "${module.generic_linux.instances_ids[count.index]}" (forces new resource)
skip_destroy: "" => <computed>
volume_id: "vol-096815f03a512625c" => "vol-096815f03a512625c"
Is there a workaround to add nodes to a cluster of undefined size, based on a generic instance module, without recreating each dependent resource?
I have the same issue. It's a bit shocking that this issue has been open for 2 years and hasn't been fixed yet. It'd be great if someone had a workaround for this.
I just happened to find a workaround-ish using lifecycle ignore_changes. So, in @jnahelou's example, the amended Terraform script would look like:
resource "aws_volume_attachment" "data_ebs_att" {
count = "${length(var.shortnames)}"
device_name = "/dev/sdc"
volume_id = "${aws_ebs_volume.data.*.id[count.index]}"
instance_id = "${module.generic_linux.instances_ids[count.index]}"
lifecycle {
ignore_changes = ["instance_id"]
}
}
I've done something similar, @loalf - but it feels as though that really shouldn't be necessary. Given that Terraform is intentionally declarative, I can see how it's ended up being this way.
In my case, I dynamically allocate instances in round-robin fashion to whatever variable number of subnets I have. BUT, when you change the number of subnets you have provisioned in a given VPC, it can dangerously trigger the recreation of your EC2's, so I've done something similar to what you have.
Check this:
resource "aws_instance" "ec2_instance" {
ami = "${lookup(var.aws_machine_images, "${var.ubuntu_version},${var.aws_region}")}"
instance_type = "${var.instance_type}"
count = "${var.total_instances}"
disable_api_termination = "${var.enable_instance_protection}"
# TODO: Fix this!
# Changing the number of subnets will trigger resource recreation
# ergo, the lifecycle manager
subnet_id = "${element(var.subnet_ids, count.index)}"
key_name = "${var.key_pair_id}"
vpc_security_group_ids = ["${var.security_group_ids}"]
associate_public_ip_address = "${var.associate_public_ip_address}"
root_block_device {
volume_type = "${var.root_volume_type}"
volume_size = "${var.root_volume_size_gb}"
delete_on_termination = "${var.storage_delete_on_termination}"
}
tags {
Name = "${var.total_instances > 1 ? format("%s-%02d-%s", var.instance_name, (count.index + 1), var.environment) : format("%s-%s", var.instance_name, var.environment)}"
ServerGroup = "${var.instance_name}-${var.environment}"
ServerName = "${var.instance_name}${count.index}"
Environment = "${var.environment}"
}
lifecycle {
ignore_changes = ["subnet_id"]
}
}
@armenr would your solution create additional ec2 instances or remove extra ones when the count of subnets changes?
Or is this designed to always keep the number of ec2 instances static after initial creation?
Please check this option:
https://github.com/hashicorp/terraform/issues/14357
Instead of "element" use the [] option.
@misham - Good question! It will KEEP existing instances in the subnets where they reside, and add instances when you add a subnet to your list of subnets.
From what I recall, if I issue a destroy on a specific subnet, the EC2's get destroyed also.
Recently upgraded terraform
Terraform v0.11.7
+ provider.aws v1.18.0
+ provider.template v1.0.0
I've tried the syntax suggested by @apparentlymart:
# Create AWS Instances
resource "aws_instance" "web" {
count = "${var.count}"
ami = "${var.aws_ami}"
instance_type = "${var.aws_instance_type}"
associate_public_ip_address = "${var.aws_public_ip}"
...
}
# Attach Instances to Application Load Balancer
resource "aws_alb_target_group_attachment" "web" {
count = "${var.count}"
target_group_arn = "${var.aws_alb_target_group_arn}"
# target_id = "${element(aws_instance.web.*.id, count.index)}"
target_id = "${aws_instance.web.*.id[count.index]}"
port = "${var.aws_alb_target_group_port}"
}
However when I issue the command:
terraform plan --destroy --var-file=staging.tfvars -target=aws_alb_target_group_attachment.web[2] -target=aws_instance.web[2]
or just
terraform plan --destroy --var-file=staging.tfvars -target=aws_instance.web[2]
Terraform wants to destroy all aws_alb_target_group_attachments:
Terraform will perform the following actions:
- aws_alb_target_group_attachment.web[0]
- aws_alb_target_group_attachment.web[1]
- aws_alb_target_group_attachment.web[2]
- aws_alb_target_group_attachment.web[3]
- aws_instance.web[2]
Plan: 0 to add, 0 to change, 5 to destroy.
I can properly remove just the aws_alb_target_group_attachment:
terraform plan --destroy --var-file=staging.tfvars -target=aws_alb_target_group_attachment.web[2]
However, if I follow that up with a destroy of the instance, it will still want to remove all of the other remaining target group attachments.
Is the approach wrong or is there still a bug here?
I still have the same problem.
Example:
resource "aws_instance" "masters" {
count = "3"
ami = "${var.ami}"
}
resource "null_resource" "outindex" {
count = "3"
triggers {
cluster_instance = "${aws_instance.masters.*.id[count.index]}"
}
provisioner "local-exec" {
command = "date"
}
lifecycle { create_before_destroy = true }
}
When I try to update the first instance with a new AMI, it first wants to replace ALL instances, then starts executing the null resource.
$ terraform plan -target="null_resource.outindex[0]"
-/+ aws_instance.masters[0] (new resource required)
ami: "ami-xxxx" => "ami-yyy" (forces new resource)
-/+ aws_instance.masters[1] (new resource required)
ami: "ami-xxxx" => "ami-yyy" (forces new resource)
-/+ aws_instance.masters[2] (new resource required)
ami: "ami-xxxx" => "ami-yyy" (forces new resource)
+ null_resource.outindex[0]
I expected to see changes only for the first instance.
Environment:
$ terraform version
Terraform v0.11.11
+ provider.aws v1.57.0
+ provider.external v1.0.0
+ provider.local v1.1.0
+ provider.null v2.0.0
+ provider.template v2.0.0
OS: macOS
UPDATE:
Currently to fix this I do:
$ terraform plan -target="null_resource.outindex[0]" -target="aws_instance.masters[0]"
-/+ aws_instance.masters[0] (new resource required)
ami: "ami-xxxx" => "ami-yyy" (forces new resource)
+ null_resource.outindex[0]
I see that this is closed but I'm still experiencing the same issue in v0.11.10. Is this expected?
I noticed this issue appears to still be happening in Terraform v0.11.14. Could this be because we are using a module under the hood to create the EC2 instances? Incrementing our count from 7 => 8 causes all volume attachments 1-7 to be re-attached.
module "elk-elasticsearch-node" {
source = "./app-cluster-static"
}
# ./app-cluster-static/main.tf
module "this" {
source = "terraform-aws-modules/ec2-instance/aws"
version = "~> 1.19.0"
...
}
-/+ aws_volume_attachment.elk-elasticsearch-node[7] (new resource required)
id: "vai-2048368739" => <computed> (forces new resource)
device_name: "/dev/sdf" => "/dev/sdf"
instance_id: "i-08493c7837712a7ea" => "${module.elk-elasticsearch-node.instance_ids[count.index]}" (forces new resource)
volume_id: "vol-0c9a19f4f4ce4dfd6" => "vol-0c9a19f4f4ce4dfd6"
+ aws_volume_attachment.elk-elasticsearch-node[8]
id: <computed>
device_name: "/dev/sdf"
instance_id: "${module.elk-elasticsearch-node.instance_ids[count.index]}"
volume_id: "${aws_ebs_volume.elk-elasticsearch-node.*.id[count.index]}"
$ terraform -v
Terraform v0.11.14
+ provider.aws v2.13.0
+ provider.azuread v0.3.1
+ provider.null v2.1.0
+ provider.random v2.1.0
+ provider.template v2.1.0
+ provider.tls v2.0.1
I'm going to lock this issue because it has been closed for _30 days_. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Most helpful comment
Hi everyone! Sorry this one sat here for so long.
Since this has been open for a long time and things have changed a bunch throughout its life, I want to add some context to close out this great, multi-year discussion:
When this issue was originally filed, Terraform's support for lists was rather rudimentary, and we had a function called `element` that served to extract a particular element from a list. From the perspective of the interpolation language, this was just a function like any other, and so the language was conservative and assumed that any unknown values in the list had to produce an unknown result. This interacted poorly with how the splat syntax deals with new instances created when `count` is increased, since the elements for the new instances were marked as unknown until the `apply` completed, and thus caused the whole set to be treated as computed.
Back in 0.7 we added a first-class indexing operator using brackets, like `var.foo[1]`, which gave the interpolation language an awareness of indexing. This wasn't enough to solve the problem, because there were still assumptions about lists being either wholly known or wholly unknown, but it gave us an important building block to fix this.
In #14135 I reorganized how the interpolation language deals with unknowns so that partially-unknown collections (lists and maps) can be passed around _within a single interpolated string_, with the final "partially unknown becomes fully unknown" mapping now done at the end, before the final result is returned to Terraform.
To get the benefit of this fix, it will be necessary to rewrite any existing configs using this pattern:
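A minimal sketch of that old pattern, using hypothetical resource and variable names (the original comment's examples are not reproduced here):
```.tf
# Old pattern: element() over a splat list. If any element of the splat
# is unknown (for example after increasing count), the whole expression
# is treated as unknown, which forces replacement of every attachment.
resource "aws_volume_attachment" "attach" {
  count       = "${var.instance_count}"
  device_name = "/dev/xvdh"
  volume_id   = "${element(aws_ebs_volume.data.*.id, count.index)}"
  instance_id = "${element(aws_instance.node.*.id, count.index)}"
}
```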
The new form, with the first-class indexing operator, would be the following:
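A corresponding sketch with the first-class index operator, using the same hypothetical names:
```.tf
# New pattern: first-class [] indexing. Only the specific indexed element
# matters, so attachments whose instance and volume are already known are
# no longer treated as unknown when other elements of the splat change.
resource "aws_volume_attachment" "attach" {
  count       = "${var.instance_count}"
  device_name = "/dev/xvdh"
  volume_id   = "${aws_ebs_volume.data.*.id[count.index]}"
  instance_id = "${aws_instance.node.*.id[count.index]}"
}
```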