Terraform v0.11.0
+ provider.null v1.0.0
Your version of Terraform is out of date! The latest version
is 0.11.1. You can update by downloading from www.terraform.io/downloads.html
main.tf:
module "a" {
source = "foo"
enabled = false
}
resource "null_resource" "d" {
triggers {
a = "${module.a.deps}"
}
}
foo/main.tf:
variable "enabled" {
default = true
}
resource "null_resource" "nothing" {
count = "${var.enabled?0:1}"
}
output "deps" {
value = "${null_resource.nothing.count}"
}
I would have expected this to proceed with a value of zero for the count.
Plan:
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
null_resource.nothing: Refreshing state... (ID: 3587990660641692538)
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
+ null_resource.d
id: <computed>
triggers.%: "1"
triggers.a: "1"
Plan: 1 to add, 0 to change, 0 to destroy.
------------------------------------------------------------------------
This plan was saved to: tf.out
To perform exactly these actions, run the following command to apply:
terraform apply "tf.out"
Apply (using the out file):
module.a.null_resource.nothing: Creating...
module.a.null_resource.nothing: Creation complete after 0s (ID: 8193429725265536686)
Error: Error applying plan:
1 error(s) occurred:
* module.a.output.deps: Error reading null_resource.nothing count: strconv.ParseInt: parsing "${var.enabled?0:1}": invalid syntax
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
terraform init
terraform plan -out tf.out
terraform apply tf.out
The out file is critical.
There are a few other issues that mention various ParseInt errors, but I did not see one covering my specific case. This is obviously a contrived example, but it represents a minimal error case.
Hi @davedash! Sorry for this strange behavior, and thanks for the detailed reproduction case.
It's indeed strange that this would occur only if you use a separate plan file. I expect this is an unfortunate interaction between the output being re-evaluated at apply time and the count normally being dealt with during plan, so the interpolation may not have what it needs at apply time. We'll have to dig in and see exactly what's going on here.
In the meantime, you may be able to get what you need using a different approach here:
output "deps" {
value = "${length(null_resource.nothing.*.id)}"
}
This way of getting the length uses the number of instances present rather than the value specified in the config. These should have the same value at apply time, but counting the number of ids can be done without re-evaluating the count expression so it should be able to bypass the bug here.
Wow, thanks for that. That worked with both the fictional example and my actual issue.
I also see the same issue on Terraform Enterprise after upgrading from 0.10.8 to 0.11.1.
resource "aws_instance" "instance" {
count = "${var.instance_count}"
ami = "${var.ami_id}"
instance_type = "${var.instance_type}"
subnet_id = "${element(var.subnet_ids, count.index)}"
}
resource "aws_ebs_volume" "vol1" {
count = "${var.ebs_vol1_size_in_GB != "0" ? var.instance_count : 0}"
availability_zone = "${element(var.azs, count.index)}"
size = 40
encrypted = true
}
resource "aws_volume_attachment" "vol1_att" {
count = "${aws_ebs_volume.vol1.count}"
device_name = "/dev/xvdb"
volume_id = "${element(aws_ebs_volume.vol1.*.id, count.index)}"
instance_id = "${element(aws_instance.instance.*.id, count.index)}"
}
resource "null_resource" "instance_vol1_provisioner" {
count = "${aws_volume_attachment.vol1_att.count}"
# ...
}
* Error reading null_resource.instance_vol1_provisioner count: strconv.ParseInt: parsing "${aws_volume_attachment.vol1_att.count}": invalid syntax
I still get the same error on Terraform Enterprise's terraform apply step, even when replacing count with a length interpolation over the actual resource id values. Any idea what the issue is, @apparentlymart? Wondering if you're able to replicate it on TFE? I can run terraform apply locally just fine, with or without the workaround above.
* Error reading null_resource.instance_vol1_provisioner count: strconv.ParseInt: parsing "${length(aws_volume_attachment.vol1_att.*.id)}": invalid syntax
Hi @ktham!
Unfortunately I wasn't able to reproduce the error you saw here, after adjusting your config slightly so I could apply it in my dev environment:
provider "aws" {
region = "us-west-2"
}
variable "azs" {
default = ["us-west-2a", "us-west-2b"]
}
resource "aws_vpc" "foo" {
cidr_block = "10.123.0.0/16"
}
resource "aws_subnet" "subnets" {
count = "${length(var.azs)}"
vpc_id = "${aws_vpc.foo.id}"
availability_zone = "${var.azs[count.index]}"
cidr_block = "${cidrsubnet(aws_vpc.foo.cidr_block, 8, count.index)}"
}
resource "aws_instance" "instance" {
count = 4
ami = "ami-6eef3616"
instance_type = "t2.micro"
subnet_id = "${element(aws_subnet.subnets.*.id, count.index)}"
}
resource "aws_ebs_volume" "vol1" {
count = 4
availability_zone = "${element(aws_subnet.subnets.*.availability_zone, count.index)}"
size = 40
encrypted = true
}
resource "aws_volume_attachment" "vol1_att" {
count = "${aws_ebs_volume.vol1.count}"
device_name = "/dev/xvdb"
volume_id = "${element(aws_ebs_volume.vol1.*.id, count.index)}"
instance_id = "${element(aws_instance.instance.*.id, count.index)}"
}
resource "null_resource" "instance_vol1_provisioner" {
count = "${aws_volume_attachment.vol1_att.count}"
# ...
}
I tried this both by running terraform apply directly and by running terraform plan -out=tfplan && terraform apply tfplan; both successfully created everything. I also tried an initial apply using Terraform 0.10.8 and then switched to 0.11.1 to try to update it, but didn't run into any errors.
Could you possibly try the above yourself and see if it works? If it does, then we can try to figure out if one of the changes I made to complete the config has changed the outcome.
Thanks @apparentlymart for the follow-up! The above does not give an error, but after I add the following provisioner to null_resource.instance_vol1_provisioner:
provisioner "local-exec" {
command = "echo ${self.count == 0 ? "" : element(aws_volume_attachment.volume1_att.*.device_name, count.index)}"
}
Or
provisioner "remote-exec" {
inline = [
"sudo mkfs -t ext4 ${self.count == 0 ? "" : element(aws_volume_attachment.vol1_att.*.device_name, count.index)}"
# format disk
]
}
provisioner "remote-exec" {
inline = [
"sudo mkfs -t ext4 ${self.count == 0 ? "" : aws_volume_attachment.vol1_att.*.device_name[count.index]}"
# format disk
]
}
It will produce this:
4 error(s) occurred:
* null_resource.instance_vol1_provisioner[2]: 1 error(s) occurred:
* Error reading null_resource.instance_vol1_provisioner count: strconv.ParseInt: parsing "${aws_volume_attachment.vol1_att.count}": invalid syntax
* null_resource.instance_vol1_provisioner[3]: 1 error(s) occurred:
* Error reading null_resource.instance_vol1_provisioner count: strconv.ParseInt: parsing "${aws_volume_attachment.vol1_att.count}": invalid syntax
* null_resource.instance_vol1_provisioner[0]: 1 error(s) occurred:
* Error reading null_resource.instance_vol1_provisioner count: strconv.ParseInt: parsing "${aws_volume_attachment.vol1_att.count}": invalid syntax
* null_resource.instance_vol1_provisioner[1]: 1 error(s) occurred:
* Error reading null_resource.instance_vol1_provisioner count: strconv.ParseInt: parsing "${aws_volume_attachment.vol1_att.count}": invalid syntax
@apparentlymart Were you able to reproduce this error? If so, is there any workaround for this issue?
We're seeing a similar issue on 0.11.1. I've been trying to create a simple tf config to reproduce it, but haven't been able to so far.
I'll try to explain the behaviour I see and how I interpret it:
We have a main tf config which imports several other modules. One of these modules (a) takes in a list variable (i.e. clusters), uses it to create resources per list entry, and zipmaps those resources into output maps (i.e. cidr_map, rtb_map, vpc_map, ...) so that other modules can use specific list indices to retrieve the resources created.
terraform plan / apply both run without error.
terraform plan -target module.x gives an error:
Error: Error refreshing state: 1 error(s) occurred:
* module.a.output.rtb_map: strconv.ParseInt: parsing "${length(var.clusters)}": invalid syntax
If module.x uses one of module.a's output maps (i.e. cidr_map), the error is not with the map it uses, but with another map (i.e. rtb_map).
If I also include -target module.y, where module.y uses the output map that gives the error (i.e. module.y takes in rtb_map), then
terraform plan -target module.x -target module.y
will not give an error.
Note that nowhere do we call length directly; this is done by zipmap internally.
TL;DR: if plan is run with a target of module X, which uses a zipmap-ed output of module A, and this somehow causes other modules Y (which use other zipmap-ed outputs of module A) to be evaluated, then the count evaluation triggered by modules Y fails unless module Y is explicitly included as a target.
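For illustration, here is a minimal sketch of the module shape being described (a hypothetical cut-down module a; only clusters and rtb_map come from the comment above, while the aws_route_table resource and the vpc_id variable are assumptions made for the example):

variable "clusters" {
  type = "list"
}

variable "vpc_id" {}

resource "aws_route_table" "cluster" {
  # This count expression matches the one shown in the error above.
  count  = "${length(var.clusters)}"
  vpc_id = "${var.vpc_id}"
}

output "rtb_map" {
  # Evaluating this output under -target appears to force the resource's
  # count expression to be re-read, which is where the error arises.
  value = "${zipmap(var.clusters, aws_route_table.cluster.*.id)}"
}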
Hey @apparentlymart, apologies for the repeated nag: were you able to reproduce the issue? This is a blocker for us right now since we are trying to use Terraform through Terraform Enterprise (TFE). If there's anything else you need, let me know, or if there's a workaround we can use to get around the bug.
Hi @ktham! Sorry for the delay in getting back to you here; the holidays and then illness slowed things down, but I'm trying to get caught back up now.
I have a simple reproduction case for this now, so I'm going to dig in and try to figure out what's going wrong here.
Hi again, everyone!
I have found the root cause of the issue here, and indeed it _is_ only working in the one-shot terraform apply case because we share the configuration structures between the plan and apply steps in that scenario.
(This section contains some details about Terraform's guts which are probably not interesting to non-maintainers; feel free to skip to the next section for discussion about workarounds and next steps.)
In most of Terraform's graph walks, Terraform starts off with a single graph node per resource or data block and then, _during the walk_, "dynamic-expands" that single node into a subgraph containing a number of nodes corresponding to the count. When Terraform visits the initial _aggregate_ graph node, the first step is to interpolate the count expression, whose result is then retained inside the configuration object so that the dynamic-expand operation knows how many instances to create.
The trick for this issue is that when we generate a plan we've _already_ dynamic-expanded the aggregate nodes and so the plan result contains one diff for each instance. Because of this, when we build the graph for the _apply_ walk we don't need to dynamic-expand (we know statically what nodes are required) and so the count interpolation never executes.
Unfortunately, an interpolation like aws_instance.foo.count still needs to be able to access that interpolated result, and due to some fallback logic in the configuration model it ends up obtaining the raw source expression in the absence of a resolved value. It then tries to parse that result as an integer, resulting in the error we see here.
This works when running both plan and apply in the same command because both the plan and apply graphs point to the same configuration instance, and so the interpolation result from the plan step is still present in the object when we get to apply. But when we're applying from a plan that was serialized to disk we only have the config _source_ saved, and so the loaded configuration object is missing that result.
This issue would therefore arise in any situation where a count interpolation is resolved during the apply walk. Provisioners and connection information are the main example of this because provisioning steps are not included in the plan and thus they are resolved entirely during the apply phase.
I am going to try to make a targeted patch for this issue by finding a safe and suitable place to introduce an additional call to interpolate the count expression during the apply walk. I'm not yet totally sure that a simple fix is possible here, since the apply step doesn't have a single aggregation point at which to deal with per-configuration-block tasks, but I will investigate further after writing this message.
I believe that this bug may be fixed automatically or may be easier to fix after we have fully introduced the new configuration parser/interpreter, since that uses a different internal model that doesn't rely on shared mutable state. However, that change will also switch to using expressions like length(aws_instance.foo) instead of this count pseudo-attribute and thus it will become generally moot at that point anyway.
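As a rough sketch of what that future expression style would look like (hypothetical post-0.11 syntax, assuming an existing aws_instance.foo resource with count set; not valid in the 0.11 language used elsewhere in this thread):

resource "null_resource" "example" {
  # In the new language a resource with count is a list of instances,
  # so length(aws_instance.foo) takes the place of aws_instance.foo.count.
  count = length(aws_instance.foo)
}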
For the moment I would suggest a workaround of using named local values to avoid referring directly to counts while still avoiding repetition of the count expressions.
By this, I mean to move the count expressions out into locals like this:
locals {
  ebs_volume_count = "${var.ebs_vol1_size_in_GB != "0" ? var.instance_count : 0}"
}
...and then use this local value each time the number of EBS instances is needed, rather than referring directly to the resource's count pseudo-attribute:
resource "aws_ebs_volume" "vol1" {
count = "${local.ebs_volume_count}"
availability_zone = "${element(var.azs, count.index)}"
size = 40
encrypted = true
}
resource "aws_volume_attachment" "vol1_att" {
count = "${local.ebs_volume_count}"
device_name = "/dev/xvdb"
volume_id = "${aws_ebs_volume.vol1.*.id[count.index]}"
instance_id = "${aws_instance.instance.*.id[count.index]}"
}
resource "null_resource" "instance_vol1_provisioner" {
count = "${local.ebs_volume_count}"
provisioner "local-exec" {
command = "echo ${local.ebs_volume_count == 0 ? "" : aws_volume_attachment.volume1_att.*.device_name[count.index]}"
}
}
Less-disruptive patterns may be possible in some situations where the count is already available somewhere else without using a .count pseudo-attribute. For example, in the above situation we know that there's always the same number of null_resource.instance_vol1_provisioner as aws_ebs_volume.vol1, and so in that provisioner you could refer to length(aws_ebs_volume.vol1.*.id) instead of self.count. However, factoring the count out into a named local value is a general solution that should work for all scenarios, though I understand it's not ideal and will hurt readability in many cases.
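As a concrete (hypothetical) sketch of that less-disruptive variant, keeping the original count argument from the earlier configuration and changing only the provisioner's reference:

resource "null_resource" "instance_vol1_provisioner" {
  count = "${aws_volume_attachment.vol1_att.count}"

  provisioner "local-exec" {
    # Counting the volume ids that already exist avoids self.count, which is
    # what fails when it must be re-evaluated during the apply walk.
    command = "echo ${length(aws_ebs_volume.vol1.*.id) == 0 ? "" : aws_volume_attachment.vol1_att.*.device_name[count.index]}"
  }
}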
Thank you @apparentlymart !
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.