Terraform: Increasing Count of Elements causes Plan with destruction of all previous.

Created on 30 Dec 2017 · 11 Comments · Source: hashicorp/terraform

Terraform Version

Terraform v0.11.1
+ provider.azurerm v0.3.3
+ provider.external v1.0.0
+ provider.local v0.1.0
+ provider.null v0.1.0
+ provider.template v1.0.0

Terraform Configuration Files

https://github.com/becoinc/DCOS-Azure

Debug Output

Crash Output

Expected Behavior

Terraform generates a plan which adds one additional VM.

Actual Behavior

Terraform generates a plan which wants to destroy all existing VMs.

Steps to Reproduce

Important Factoids

Use a count field with dependencies on templates and external disks.

Adding a VM (or really any element) to an existing set is a basic operation. We can't afford to redeploy an entire cluster to add one node.

References

Related to: #5054, #15789, #3885, #3449

Most importantly: #15789

bug config

Source

jzampieron

Most helpful comment

Hi @jzampieron!

Since your configuration is large I'm not sure I'm looking at the resource you're talking about, but I think you're talking about azurerm_virtual_machine.master here, where primary_network_interface_id is set to "${element(azurerm_network_interface.master.*.id, count.index)}" and azurerm_network_interface.master has count = "${var.master_count}" .

When var.master_count is increased, Terraform will plan to create one or more new instances for azurerm_network_interface.master, which means that azurerm_network_interface.master.*.id has some new values that cannot be known until after apply, and thus Terraform marks them as <computed>.

Since element is a function, Terraform doesn't know that it will only look at one element of the given list, and so it pessimistically assumes that the result must always be <computed> if any of the elements of that list are <computed>, which then causes Terraform to see a potential change to primary_network_interface_id, which forces a new resource.

This can be addressed by accessing the list element using Terraform's list index syntax rather than the element function. Since this syntax is a fundamental part of the language, Terraform "knows" that only one element is accessed and thus it can see correctly that the existing indexes in azurerm_network_interface.master.*.id already have known values:

  primary_network_interface_id  = "${azurerm_network_interface.master.*.id[count.index]}"

In general I'd recommend _always_ using the list index syntax unless you are depending on the special "wrap-around" behavior that the element function provides.
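
To make the difference concrete, here is a minimal sketch in Terraform 0.11 syntax. The null_resource names and the instance_count variable are hypothetical stand-ins for the azurerm resources in the original configuration. Going by the explanation above, raising instance_count from 1 to 2 would be expected to plan a replacement of via_element[0], while via_index[0] is left alone:

```hcl
# Hypothetical stand-in configuration, Terraform 0.11 syntax.
variable "instance_count" {
  default = 1
}

resource "null_resource" "source" {
  count = "${var.instance_count}"
}

# element() is a function call, so 0.11 treats its result as <computed>
# whenever any item of the list is <computed>.
resource "null_resource" "via_element" {
  count = "${var.instance_count}"

  triggers = {
    source_id = "${element(null_resource.source.*.id, count.index)}"
  }
}

# The index operator is part of the language, so only the requested
# item has to be known.
resource "null_resource" "via_index" {
  count = "${var.instance_count}"

  triggers = {
    source_id = "${null_resource.source.*.id[count.index]}"
  }
}
```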

Sorry for this current quirk in the language. We're currently working on improvements to the configuration language that allow smarter analysis of <computed> values, and that should allow us to make the element function behave as expected in the long run. We should have an opt-in experimental version of this new config language implementation coming soon, where we can see about changing the behavior here to do what you expected.

apparentlymart on 4 Jan 2018

👍3

All 11 comments

Thank you for the thoughtful and complete reply.

Did I completely miss the list-index syntax or is this something that was introduced in a more recent version of terraform?

I certainly was not aware of this particular quirk until recently, so thank you for the piece of tribal knowledge. Perhaps adding a "warning" box to the language docs on the doc site would help other users avoid a similar situation.

I'm going to work to incorporate this ASAP and I'll leave feedback on the behavior.

jzampieron on 4 Jan 2018

Hi again @jzampieron!

This list index syntax was introduced in Terraform 0.7, as part of the more thorough support for lists and maps that was added in that version. The element function existed before that and was originally the only way to access list elements, since prior to 0.7 lists in Terraform were actually just delimited strings with some special helper functions. It's retained today because its wraparound behavior is useful in some situations.
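
As a small illustration of the wrap-around behavior, here is a sketch in Terraform 0.11 syntax. The zones variable and the output name are hypothetical:

```hcl
# Hypothetical names, Terraform 0.11 syntax.
variable "zones" {
  default = ["a", "b", "c"]
}

output "wrapped_zone" {
  # Index 4 is past the end of the three-item list. element() wraps
  # around by taking the index modulo the list length (4 % 3 = 1),
  # so this evaluates to "b".
  value = "${element(var.zones, 4)}"
}

# By contrast, "${var.zones[4]}" does not wrap and is rejected with an
# "index out of range" error.
```

Wrap-around is handy for spreading a larger number of instances across a shorter list, which is one situation where element remains the better fit.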

You're right that this situation is currently under-documented. Unfortunately the current format of the function reference doesn't lend itself well to block structures like warnings, but in the short term we could add some additional words to the existing paragraph to talk about this. We will most probably do a significant rework of the language-related documentation as part of releasing the improvements I mentioned, so we'll have an opportunity to improve the presentation to allow more per-function discussion.

apparentlymart on 4 Jan 2018

One thing that's still an issue with the list syntax is that truly computed things, i.e. rendered fields of template files, still have this problem.

Using the element syntax with them causes the same destroy/restart plan, and using the list syntax doesn't work because the requested index doesn't exist:

* module.dcos.azurerm_virtual_machine.dcosPrivateAgent[6]: index 6 out of range for list data.template_file.coreos_private_ignition.*.rendered (max 6) in:

I'm wondering if I can work around this by first putting the rendered output into a local file and then pulling that file back in.

jzampieron on 4 Jan 2018

I had the same issue using element. I switched to the index syntax as indicated above, but I am now facing the same "index out of range for list" issue.
Please suggest an alternative workaround, if there is one, until this bug is fixed.

mukund1989 on 16 Jul 2018

Hi @jzampieron and @mukund1989,

First, sorry I didn't see @jzampieron's question here sooner.

It seems like what you've both hit there is a separate issue, though it might be related. If one of you wouldn't mind opening a new issue and filling in the details about that particular problem in the template we can hopefully dig in and figure out what's going on there too.

The original problem described in this issue -- that passing a partially-unknown list to element causes the result to always be unknown -- will be fixed in Terraform v0.12. It's possible that the other issue will be too, but it'll be easier to answer that with a full reproduction case.

apparentlymart on 16 Jul 2018

👍1

Thanks for the response @apparentlymart. I have opened #18470

mukund1989 on 16 Jul 2018

It appears that this issue also affects module outputs of list type. If I have resources that rely on the output of a module and I increase the count, this also results in a plan that destroys the previous instances.

For example, I have inside my module:

output "private_ips" {
  value = "${oci_core_instance.server.*.private_ip}"
}

And outside of my module, I reference this output as: module.mymodule.private_ips[count.index].

Increasing the count results in the same behavior as using element, but I'm not sure there's a workaround for this, since it's already using the list-index syntax.
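
For reference, the calling side described above might look roughly like the following sketch in Terraform 0.11 syntax. The module source path, the instance_count variable, and the null_resource consumer are hypothetical; only the private_ips output and the module.mymodule reference come from the description:

```hcl
# Hypothetical caller, Terraform 0.11 syntax.
variable "instance_count" {
  default = 1
}

module "mymodule" {
  # Assumed local path; the real module source is not given.
  source         = "./modules/mymodule"
  instance_count = "${var.instance_count}"
}

resource "null_resource" "consumer" {
  count = "${var.instance_count}"

  triggers = {
    # List-index syntax applied to a module output rather than to a
    # resource splat expression.
    private_ip = "${module.mymodule.private_ips[count.index]}"
  }
}
```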

alexng-canuck on 6 Oct 2018

I was encountering errors increasing the instance count on an existing deployment but am not sure if this is the same issue:

Terraform Version

Terraform v0.11.8
+ provider.azurerm v1.13.0

Terraform Configuration Files

Many resources and attributes deleted for brevity:

resource "azurerm_managed_disk" "data" {
  count = "${var.Count}"
  name = "test${count.index}data"
  create_option = "empty"
  disk_size_gb  = "${var.DataVolumeSize}"
}

resource "azurerm_virtual_machine" "default" {
  name = "test${count.index}"
  count = "${var.Count}"

  delete_os_disk_on_termination = true
  //delete_data_disks_on_termination = false

  storage_os_disk {
    name = "test${count.index}os"
    managed_disk_type = "${var.AccountTier}_${var.AccountReplicationType}"
    disk_size_gb = "${var.OSVolumeSize}"
    create_option = "FromImage"
    os_type = "linux"
  }

  storage_data_disk {
    name = "${azurerm_managed_disk.data.*.name[count.index]}"
    managed_disk_id = "${azurerm_managed_disk.data.*.id[count.index]}"
    disk_size_gb  = "${azurerm_managed_disk.data.*.disk_size_gb[count.index]}"
    create_option = "Attach"
    lun = 0
  }
}

Expected Behavior

Plan + apply with Count=1
Creates virtual machine "test0" with disks "test0os" and "test0data"
Plan + apply with Count=2
Creates virtual machine "test1" with disks "test1os" and "test1data"
Does not modify virtual machine "test0"

Actual Behavior

Plan + apply with Count=1
Creates virtual machine "test0" with disks "test0os" and "test0data"
Plan + apply with Count=2
Errors encountered trying to delete disk "test0data" from virtual machine "test0"

* azurerm_managed_disk.data (destroy): 1 error(s) occurred:

* azurerm_managed_disk.data: compute.DisksClient#Delete: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Disk test0data is attached to VM /subscriptions/1234567890-1234-5678-abcd-efghijkmnop/resourceGroups/test/providers/Microsoft.Compute/virtualMachines/test0"

Things I tried

Based on this comment, changed from _element_ syntax:
managed_disk_id = "${element(azurerm_managed_disk.data.*.id,count.index)}"
to array syntax:
managed_disk_id = "${azurerm_managed_disk.data.*.id[count.index]}"
Based on this comment, added the following to the azurerm_managed_disk resource:

  lifecycle {
    ignore_changes = [ "azurerm_virtual_machine", "storage_os_disk", "storage_data_disk", "os_profile" ]
  }

Thanks for any insight you can provide.

Update

Discovered the problem occurs even when the count remains unchanged; filed a new issue, "Terraform apply not idempotent for managed disks".

lubars on 8 Nov 2018

Hi all! Sorry for the long silence here.

The root cause of the original problem here was that using the element function with a list containing unknown values would always produce an unknown result, even if the specific element being requested was not unknown.

For the forthcoming Terraform v0.12.0 release we've rationalized the handling of the various parts of the configuration language at play here, which has a few different consequences for this issue.

The most interesting, I think, is that it's now possible to avoid using the splat syntax altogether for the simple situation of referencing between resources with count set, since Terraform will now accept a more intuitive direct indexing syntax:

  source_id = null_resource.source[count.index].id

This works because null_resource.source is itself now a list value, and so it can have the index operator applied to it like any other list.

However, a consequence more directly related to the original problem here is that the new language allows for functions to opt in to handling unknown values themselves, rather than the language engine just forcing any unknown input to produce an unknown output. As a result, the element function in Terraform v0.12 will behave in the same way as the index operator, returning an unknown value only if the specific requested index is unknown, and letting existing known values pass through unmodified.

To verify this, I used the following contrived configuration with the v0.12.0-alpha2 prerelease build:

variable "instance_count" {
  default = 1
}

resource "null_resource" "source" {
  count = var.instance_count
}

resource "null_resource" "new" {
  count = var.instance_count

  triggers = {
    source_id = null_resource.source[count.index].id
  }
}

resource "null_resource" "old_index" {
  count = var.instance_count

  triggers = {
    source_id = null_resource.source.*.id[count.index]
  }
}

resource "null_resource" "old_element" {
  count = var.instance_count

  triggers = {
    source_id = element(null_resource.source.*.id, count.index)
  }
}

I applied this first letting the default count value of 1 be used:

$ terraform apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # null_resource.new[0] will be created
  + resource "null_resource" "new" {
      + id       = (known after apply)
      + triggers = (known after apply)
    }

  # null_resource.old_element[0] will be created
  + resource "null_resource" "old_element" {
      + id       = (known after apply)
      + triggers = (known after apply)
    }

  # null_resource.old_index[0] will be created
  + resource "null_resource" "old_index" {
      + id       = (known after apply)
      + triggers = (known after apply)
    }

  # null_resource.source[0] will be created
  + resource "null_resource" "source" {
      + id = (known after apply)
    }

Plan: 4 to add, 0 to change, 0 to destroy.

...

After letting Terraform "create" all of those instances, I then ran it again with a higher count:

$ terraform apply -var="instance_count=2"
null_resource.source[0]: Refreshing state... [id=1449592651005192622]
null_resource.old_element[0]: Refreshing state... [id=989759912955237738]
null_resource.old_index[0]: Refreshing state... [id=1974769490544349180]
null_resource.new[0]: Refreshing state... [id=6701798703690438555]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # null_resource.new[1] will be created
  + resource "null_resource" "new" {
      + id       = (known after apply)
      + triggers = (known after apply)
    }

  # null_resource.old_element[1] will be created
  + resource "null_resource" "old_element" {
      + id       = (known after apply)
      + triggers = (known after apply)
    }

  # null_resource.old_index[1] will be created
  + resource "null_resource" "old_index" {
      + id       = (known after apply)
      + triggers = (known after apply)
    }

  # null_resource.source[1] will be created
  + resource "null_resource" "source" {
      + id = (known after apply)
    }

Plan: 4 to add, 0 to change, 0 to destroy.

....

All three permutations now behave in the same way: the zeroth instance of null_resource.source is unchanged, so accessing that instance's id produces a known value, even though the new element 1 is not yet known.

For completeness, I also tried applying again to reduce it back down to one instance per resource:

$ terraform apply
null_resource.source[0]: Refreshing state... [id=1449592651005192622]
null_resource.source[1]: Refreshing state... [id=8893888605842410816]
null_resource.old_element[1]: Refreshing state... [id=4945510547004764516]
null_resource.old_element[0]: Refreshing state... [id=989759912955237738]
null_resource.new[1]: Refreshing state... [id=7018838688027719423]
null_resource.old_index[0]: Refreshing state... [id=1974769490544349180]
null_resource.new[0]: Refreshing state... [id=6701798703690438555]
null_resource.old_index[1]: Refreshing state... [id=27026890569823466]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  # null_resource.new[1] will be destroyed
  - resource "null_resource" "new" {
      - id       = "7018838688027719423" -> null
      - triggers = {
          - "source_id" = "8893888605842410816"
        } -> null
    }

  # null_resource.old_element[1] will be destroyed
  - resource "null_resource" "old_element" {
      - id       = "4945510547004764516" -> null
      - triggers = {
          - "source_id" = "8893888605842410816"
        } -> null
    }

  # null_resource.old_index[1] will be destroyed
  - resource "null_resource" "old_index" {
      - id       = "27026890569823466" -> null
      - triggers = {
          - "source_id" = "8893888605842410816"
        } -> null
    }

  # null_resource.source[1] will be destroyed
  - resource "null_resource" "source" {
      - id = "8893888605842410816" -> null
    }

Plan: 0 to add, 0 to change, 4 to destroy.

This worked before anyway, but it's good to verify that it still works correctly after the changes to the configuration language interpreter.

More complex situations may still exhibit this problem if the list of values is accessed via a function that _cannot_ handle unknowns in this way, such as join (which must produce a wholly-unknown string if any of its inputs are unknown). Direct accesses like this will work fine, though, and most of the list-related functions have been updated to be unknown-element-aware for v0.12, so this should take care of most situations.
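
A sketch of the join case, in Terraform v0.12 syntax and reusing the instance_count and null_resource.source names from the test configuration. The joined resource is a hypothetical addition that was not part of that test. When instance_count is raised, all_ids would be expected to plan as (known after apply), because the new instance's id is not yet known:

```hcl
variable "instance_count" {
  default = 1
}

resource "null_resource" "source" {
  count = var.instance_count
}

# Hypothetical resource, not part of the test configuration.
resource "null_resource" "joined" {
  triggers = {
    # join needs every item to build its result, so a single unknown id
    # makes the whole string unknown during planning.
    all_ids = join(",", null_resource.source.*.id)
  }
}
```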

We'll improve on this further in a later release by resolving #17179; we laid the groundwork for this already but it'll take some more work to fully complete it.

Since this is now fixed in the master branch and ready for inclusion in the forthcoming v0.12.0 release, I'm going to close this out. Thanks for reporting this, and sorry for the delay in getting it fixed.

apparentlymart on 20 Nov 2018

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.