Terraform: Interpolated count values differ after the initial apply

Created on 22 Apr 2017 · 10 Comments · Source: hashicorp/terraform

Creating a resource with an interpolated count works on the very first apply. Subsequent plans and applies that modify the resource change its name. This can leave references to resources that Terraform doesn't believe exist, even though they do exist, just under a slightly different name.

From my debugging, it appears that EvalCountFixZeroOneBoundary is executed in two passes. After modification of a resource containing a count, the first pass always returns a count of 1 (and count appears in the RawConfig's unknownKeys). The second pass has the correct value, but by this point the resource name has already been replaced with a string that has the ".0" trimmed from the end.
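To make the sequence concrete, here is a minimal sketch of the behaviour described above. It is not Terraform's actual code: `trimZeroSuffix` is a hypothetical helper that models only the rename direction reported here (a pass that believes count is 1 stores the first instance under the bare name), and the state is reduced to a plain map of names to IDs.

```go
package main

import "fmt"

// trimZeroSuffix is a hypothetical model, not Terraform's implementation.
// When a pass believes count is 1, the "<name>.0" instance is moved to the
// bare "<name>" key. Any other count leaves the state untouched.
func trimZeroSuffix(state map[string]string, name string, count int) {
	if count != 1 {
		return
	}
	if id, ok := state[name+".0"]; ok {
		delete(state, name+".0")
		state[name] = id
	}
}

func main() {
	name := "aws_autoscaling_group.asg"
	state := map[string]string{
		name + ".0": "id-a",
		name + ".1": "id-b",
		name + ".2": "id-c",
	}

	// First pass: count is still unknown and is treated as 1
	// (assumption taken from the report above).
	trimZeroSuffix(state, name, 1)

	// Second pass: count is now correctly 3, but the rename already happened.
	trimZeroSuffix(state, name, 3)

	_, hasIndexed := state[name+".0"]
	_, hasBare := state[name]
	fmt.Println(hasIndexed, hasBare) // prints: false true
}
```

The state ends up with asg, asg.1 and asg.2, which matches the listing under Actual Behavior below.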

Terraform Version

Terraform v0.9.3

Affected Resource(s)

Affects core

Terraform Configuration Files

resource "aws_autoscaling_group" "asg" {
  ...
  count = "${length(data.aws_availability_zones.available.names)}"
  ...
}

Expected Behavior

Resource names should be consistent as per the initial apply.

aws_autoscaling_group.asg.0: Refreshing state... (ID: ...)
aws_autoscaling_group.asg.1: Refreshing state... (ID: ...)
aws_autoscaling_group.asg.2: Refreshing state... (ID: ...)

Actual Behavior

Resource names are modified on later plans and applies. The ".0" suffix is removed:

aws_autoscaling_group.asg: Refreshing state... (ID: ...)
aws_autoscaling_group.asg.1: Refreshing state... (ID: ...)
aws_autoscaling_group.asg.2: Refreshing state... (ID: ...)

Steps to Reproduce


terraform apply
terraform plan or terraform apply, after a modification that causes an update to the resource with a count

References

https://github.com/hashicorp/terraform/pull/11482

bug config

Source

saracen

All 10 comments

I attempted to fix this by ignoring a rename if there's another resource in the state that ends in ".1" (basically, assume that a resource.Count() value of 1 must be incorrect).

However, this means that if we actually do change the count to 1, the other resources are deleted and one resource is created without the ".0" suffix (aws_autoscaling_group.asg). The apply then errors, because the .0 resource (aws_autoscaling_group.asg.0) hasn't been deleted.
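A rough sketch of that guard and its blind spot, under the same simplifying assumptions (state as a plain map, and `guardedTrim` as a hypothetical name rather than anything in the Terraform source):

```go
package main

import "fmt"

// guardedTrim is a hypothetical model of the attempted fix, not Terraform's code.
// It refuses to rename "<name>.0" to "<name>" while a "<name>.1" sibling is
// still in state, on the assumption that a count of 1 must then be wrong.
// It reports whether a rename took place.
func guardedTrim(state map[string]string, name string, count int) bool {
	if count != 1 {
		return false
	}
	if _, hasSibling := state[name+".1"]; hasSibling {
		return false
	}
	id, ok := state[name+".0"]
	if !ok {
		return false
	}
	delete(state, name+".0")
	state[name] = id
	return true
}

func main() {
	name := "aws_autoscaling_group.asg"
	state := map[string]string{
		name + ".0": "id-a",
		name + ".1": "id-b",
		name + ".2": "id-c",
	}

	// The count really has been reduced to 1 this time, but the guard cannot tell.
	renamed := guardedTrim(state, name, 1)
	_, stillIndexed := state[name+".0"]
	fmt.Println(renamed, stillIndexed) // prints: false true
}
```

The guard cannot distinguish a spurious count of 1 from a genuine one, so the ".0" entry stays in state while the new single instance is planned under the bare name.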

This renaming business seems error-prone. Maybe we could just always have a .0 whenever a count is defined, even if it happens to equal 1.

saracen on 22 Apr 2017

Sounds like another case possibly resolved by #13793.

@saracen - if you are hacking the source directly - can you see if adding the transformer to the plan graph builder fixes the issue for you?

Also, I think that having count imply a list by default, regardless of its value, is a good idea. One consideration, though: it is currently a common pattern to use count to toggle singular resources in the graph (i.e. ${var.enabled ? 1 : 0}). Those configs may rely on something like aws_autoscaling_group.foo.id (without splat) and might break under that change (not 100% sure on that one, just something to check).
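The concern can be sketched as follows. This is purely illustrative: `keyAlwaysIndexed` is a hypothetical naming rule for the "always .0" proposal, and the idea that a non-splat reference looks up the bare resource name is an assumption, not confirmed behaviour.

```go
package main

import (
	"fmt"
	"strconv"
)

// keyAlwaysIndexed is a hypothetical naming rule for the proposal above:
// any resource that sets count gets an index suffix, even when count is 1.
func keyAlwaysIndexed(name string, index int, hasCount bool) string {
	if !hasCount {
		return name
	}
	return name + "." + strconv.Itoa(index)
}

func main() {
	name := "aws_autoscaling_group.foo"

	// count = "${var.enabled ? 1 : 0}" with enabled = true yields one instance.
	state := map[string]string{keyAlwaysIndexed(name, 0, true): "id-a"}

	// Assumption: a non-splat reference such as aws_autoscaling_group.foo.id
	// looks up the bare name, which no longer exists under this rule.
	_, found := state[name]
	fmt.Println(found) // prints: false
}
```

If references worked this way, the proposal would need either a fallback from the bare name to index 0 or a migration of existing configs.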

vancluever on 22 Apr 2017

@vancluever Early indications are that it's working great! I added a bunch of weird workarounds and static values to my configs because I've got a deadline, but I'll be gradually removing more of those today and testing this out.

Thank you so much for this solution and spotting this issue at a weekend!

saracen on 23 Apr 2017


@saracen no problem and happy to hear things are working for you!

vancluever on 23 Apr 2017

@vancluever Something still seems to be a problem somewhere, but I haven't tracked down what's causing it yet.

I can do a bunch of updates, and then I eventually hit a cycle error. When listing the state, I see that an instance of a resource without the count suffix has been introduced:

resource.dynamic
resource.dynamic[0]
resource.dynamic[1]
resource.dynamic[2]

Removing it fixes the cycle problem. When I'm free I'll try to come up with something that's repeatable.

saracen on 24 Apr 2017

@saracen there definitely seems to be a case where this is still happening (see #13828 for a repro). Looking into it a bit more I'd imagine that there's still something a little amiss where on refreshes the resource does not exist for interpolation like it should... I'm still in the process of tracking it down but probably won't be able to look that much more into it until the evening.

vancluever on 24 Apr 2017

@saracen I think the stuff in #13828 might be due to a different issue actually, but you might want to check the updates I put in that ticket to see if applying that stuff helps with the cycles. Cheers!

vancluever on 25 Apr 2017

I just got bit badly by this bug. It was while developing a plugin which made it worse because I kept assuming it was a bug in the plugin...

It seems it can manifest itself in 2 ways:

Error reading aws_instance.web count: strconv.ParseInt: parsing "${length(var.clusters) * var.total_workers_per_cluster}": invalid syntax

or with the .0 disappearing, which in my case resulted in

cabot_check_graphite.web_disk_critical.0: diffs didn't match during apply. This is a bug with Terraform and should be reported as a GitHub Issue.

I don't have much to add except that I hope it's fixed soon.

frankh on 26 Apr 2017

Hi folks,
I am going to close this issue because we approach this in a very different way in Terraform 0.12. The relevant code base has changed significantly.
Anyone experiencing a similar-looking issue should please open a new GitHub issue and fill out the issue template in its entirety.
Thank you!

teamterraform on 31 Jul 2019


I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.