When using template_file for user_data in aws_instance, the hash shown for the new user_data in plan is always the same one.
(I have updated this issue to better describe the situation.)
Step1 - Run plan (there is existing resource):
user_data: "5f0f80cd55ecdabd2e01f8b75efbab033bbfdca0" => "52af9502d532f588235a22618e42a3ce3c395fd4" (forces new resource)
Step2 - Then apply:
user_data: "" => "5f0f80cd55ecdabd2e01f8b75efbab033bbfdca0"
You can see that the hash shown for the new user_data in plan is not the one that apply actually produces.
Step3 - When I add a new line to the user_data template file, and run the plan again:
user_data: "5f0f80cd55ecdabd2e01f8b75efbab033bbfdca0" => "52af9502d532f588235a22618e42a3ce3c395fd4" (forces new resource)
It shows the same hash for the new user_data as in Step 1, even though the template content changed.
Step4 - I revert my change to the user_data template (back to the original), and run the plan again:
user_data: "5f0f80cd55ecdabd2e01f8b75efbab033bbfdca0" => "52af9502d532f588235a22618e42a3ce3c395fd4" (forces new resource)
It has the same output as the previous step; the hash for user_data doesn't change back.
So the main problem caused by this issue is that if I accidentally change my user_data and run plan, there is no way back: I have to recreate all instances even though there is no actual change. This is a critical bug in terms of workflow, and I cannot safely manage the infrastructure until it is fixed.
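For context, the relevant pieces of my configuration look roughly like this (simplified; the AMI, path, and names are placeholders rather than my real values):
resource "template_file" "user_data" {
  filename = "user_data.tpl"   # placeholder path to the template
}

resource "aws_instance" "web" {
  ami           = "ami-xxxxxxxx"   # placeholder AMI
  instance_type = "t2.micro"
  user_data     = "${template_file.user_data.rendered}"
}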
Just bumped into the _opposite_ problem - I'm modifying template_file and terraform plan ignores my change.
I have an aws_launch_configuration which depends on a template_file for its user_data field. When I modify the template file (that is, the file contents, not the template_file resource definition), terraform plan does not register this change. Expected behavior is modification or recreation of the LC.
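Roughly, the shape of my setup is as follows (heavily simplified, not a reliable repro; names and values are placeholders):
resource "template_file" "user_data" {
  filename = "user_data.tpl"   # placeholder path to the template
}

resource "aws_launch_configuration" "app" {
  image_id      = "ami-xxxxxxxx"   # placeholder AMI
  instance_type = "t2.micro"
  user_data     = "${template_file.user_data.rendered}"
}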
I'm running Terraform v0.6.1-dev (ab0a7d8096df5ef45e3d7fa995cccf399220264f).
Hi @killercentury and @pikeas - thanks for the report.
Can you provide some config that reproduces this behavior? That would be a big help to us in debugging.
Unfortunately, this seems to be intermittent. If and when I find a simple repro, I'll share a test case!
I have some configs that will do it:
provider "aws" {
region = "us-east-1"
}
variable "count" {
default = 2
}
resource "template_file" "example" {
filename = "test.txt"
count = "${var.count}"
vars = {
name = "file-${count.index+1}"
}
}
resource "aws_instance" "servers" {
count = "${var.count}"
ami = "ami-e0efab88"
instance_type = "t2.micro"
user_data = "${element(template_file.example.*.rendered, count.index)}"
}
output "templates" {
value = "${join(\",\",template_file.example.*.rendered)}"
}
With a template file of this:
${name}
First time you apply with count set to 2, it builds 2 instances. If you change count to 3, it will properly show only 1 template file will be added, but all the user_data hashes will be wrong and cause all instances to be destroyed/recreated, when only 1 needs to be created.
I use this as a way to build slaves for different things quickly, and it allows me to scale them slowly (while also inserting data into each to make them unique). The file here is just an example; I'm usually passing in a hostname to set, which allows each host in this set to get a unique name from Terraform (thanks to cloud-init).
_EDIT_: The ami above is for the Debian provided wheezy instance, but any other AMI will show the same issue. Just wanted to tell people what it was without them having to hunt things down themselves.
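Concretely, the hostname-passing piece looks roughly like this (resource and variable names here are illustrative, not my real config):
resource "template_file" "cloud_init" {
  filename = "cloud-init.tpl"   # illustrative template path
  count    = "${var.count}"

  vars = {
    hostname = "slave-${count.index+1}"   # gives each host a unique name
  }
}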
@phinze Any chance to look at this soon? I'm working around it for now by reworking the user_data file to not be a rendered template but a static file that does its own data lookups, and versioning that file manually, but I'd love for this to be as simple as it should be.
+1
mlrobinson's scenario is almost identical to mine. I'm using a Terraform-templated cloud-init as user-data passed to our AWS instances. I'm also using count to allow us to manage the number of AWS instances in a resource pool. Our cloud-init config is fairly simple (hostname, provisioning user, SSH allowed key).
- count to 4 works as expected
- terraform apply
- count back to 5 results in one rendered template for the new resource (template_file.cloud-init.4), seen when running terraform plan -module-depth=-1

@mikelaws I have the exact same issue. As a workaround, I tried to run terraform plan using the -target option, only targeting the specific new instance, but that doesn't work. Terraform just claims there's nothing to be done.
Also tried working around this using the lifecycle parameter but that caused my Terraform run to fail.
For instance:
resource "aws_instance" "foo" {
count = "${var.count}"
lifecycle {
ignore_changes = ["user_data"]
}
# more config
# ...
}
caused:
* aws_instance.foo.1: diffs didn't match during apply. This is a bug with Terraform and should be reported.
* aws_instance.foo.0: diffs didn't match during apply. This is a bug with Terraform and should be reported.
* aws_instance.foo.2: diffs didn't match during apply. This is a bug with Terraform and should be reported.
The reason for this happening is this issue https://github.com/hashicorp/terraform/issues/3864
Meanwhile I've worked around it by using autoscaling groups instead of instances with counts. This does introduce some new issues regarding create_before_destroy and dependency cycles when destroying, but I can work around those fairly easily.
@bennycornelissen sounds like a viable option till this bug is fixed. But
@bennycornelissen another workaround would be to roll back PR #2788. If you roll it back and recompile Terraform, then everything should be fine.
@Fodoj what exactly did you mean by the second point? I'm not sure I quite understand. I've been testing the autoscaling groups for about a week now, and it works exactly the way I wanted. One thing worth noting, and that might be what you were getting at, is that whenever I update the launch configuration, it doesn't rebuild the instances already running in the ASG. In my specific use case, however, that is actually a good thing (I can manage 'rolling' upgrades myself).
But I can see how it could cause problems for other people.
@bennycornelissen well, Terraform is a tool to describe your infrastructure via templates. Your template would say "1 instance", but then autoscaling will create 9 more, and now your infrastructure template says "1" while in fact there are 10 of them. But that applies only if you are still using the aws_instance resource. If you are using only the ASG resource in the template, then I am wrong :)
I replaced the aws_instance resource with the aws_autoscaling_group and aws_launch_configuration resources
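For anyone curious, a minimal sketch of that setup, assuming a single rendered template shared by all instances (resource names, AMI, and availability zone are placeholders):
resource "aws_launch_configuration" "workers" {
  image_id      = "ami-xxxxxxxx"   # placeholder AMI
  instance_type = "t2.micro"
  user_data     = "${template_file.cloud_config.rendered}"   # hypothetical template_file resource

  # No explicit name: Terraform generates a unique one, which lets
  # create_before_destroy bring up the replacement LC before removing the old one.
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "workers" {
  name                 = "workers"
  launch_configuration = "${aws_launch_configuration.workers.name}"
  availability_zones   = ["us-east-1a"]   # placeholder AZ
  min_size             = "${var.count}"
  max_size             = "${var.count}"
  desired_capacity     = "${var.count}"
}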
I am also encountering this issue, but I am seeing this right from the start and not after changing the count.
I run terraform apply on a clean project and it gives me this error on the aws_instance resource: "diffs didn't match during apply. This is a bug with Terraform and should be reported." I am passing the user_data argument on it. When I run it a second time it creates the aws_instance resources, but then user_data does not use the ID generated by template_file.
Edit: The user_data was actually loaded onto the machine; I had cloud-init misconfigured so it wasn't running properly. However, using template_file inside a module does give me that error the first time, exactly as described in #3732.
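The layout is roughly the following (module name, paths, and output are hypothetical; the module is assumed to expose the rendered template as an output called rendered):
# modules/userdata/main.tf: render the template inside the module and expose it.
resource "template_file" "cloud_config" {
  filename = "${path.module}/cloud-config.tpl"   # hypothetical path
}

output "rendered" {
  value = "${template_file.cloud_config.rendered}"
}

# main.tf (root configuration): consume the module output as user_data.
module "userdata" {
  source = "./modules/userdata"   # hypothetical module path
}

resource "aws_instance" "app" {
  ami           = "ami-xxxxxxxx"   # placeholder AMI
  instance_type = "t2.micro"
  user_data     = "${module.userdata.rendered}"
}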
I've been fighting this for the last few hours just to add a new server. I eventually built a new terraform binary with #2788 reverted. But I still get the same new user_data hash no matter what I put in my template.
It seems that this bug has been here for a while with no fix, and my impression is that it is really important. Is there any forecast for its resolution?
I am not sure whether this is a related issue or not, but I am encountering a problem where the plan says my user_data (which comes from a template_file) has changed and thus forces resource recreation.
However, the plan does not indicate that my template_file.cloud-init has been modified.
FYI: Still an issue in Terraform v0.6.11
Yea, this is annoying. The hash of the user data is _not_ idempotent. We are not able to resize a cluster of machines by changing count.
Hey folks, sorry for all the trouble here. The core problem with changing counts is described over in #3449 - the fix requires that we introduce first class list indexing into Terraform's config language. This is work that @jen20 has been pushing forward, and we expect it to land with 0.7, our next major release.
So for everybody on this thread reporting problems when count is changing, that will address your issue.
It sounds like a few of you are seeing behavior unrelated to count changing - in those cases it'd be great to see some steps to reproduce so we can investigate.
Thanks @phinze! Maybe not the topic here (my apologies in advance), but when do you think 0.7 will be released?
@jordiclariana We're still in the thick of it, so no precise timeline quite yet - should get a better handle on the expected timeline in the coming week or two. We'll continue with the relatively high patch release cadence in the meantime. :+1:
We also ran into this 👍
Looks like it's not fixed in v0.7.0
Such a pity
Hi everyone! Thanks for all the great discussion here and sorry for the lack of movement on this issue for a while.
The good news is that some core changes have been made in the intervening time that address this problem. These don't make the existing configurations work, but they provide new features of the configuration language that mitigate the root causes here.
First, some background on what's going on here: sometimes the result of an interpolation expression can't be resolved during the plan phase because its value depends on the result of an action that won't be executed until apply. In this case, Terraform puts a placeholder value in the plan, which renders in the plan output as <computed>. Unfortunately, when such a value appears in a "forces new resource" attribute, Terraform is forced to assume that the value is going to change, because it can't prove during plan that the resulting value will match the current state of the resource, and so we get the problem described here where instances get recreated unnecessarily. (The "magic value" of user_data shown in these diffs is a leaky abstraction where Terraform is hashing the placeholder value used to represent a computed attribute.)
The resource "template_file" block in the earliest example in this discussion is problematic because although we (via human intuition) know that rendering a template is safe to do during plan, Terraform just assumes that all resources can't be created until apply. In 0.7 we introduced a new feature called data sources which allows us to express to Terraform that certain operations are safe to execute during the plan phase, and template_file was recast as one to enable the use-case shown in that first example:
provider "aws" {
region = "us-east-1"
}
variable "count" {
default = 2
}
data "template_file" "example" {
filename = "test.txt"
count = "${var.count}"
vars = {
name = "file-${count.index+1}"
}
}
resource "aws_instance" "servers" {
count = "${var.count}"
ami = "ami-e0efab88"
instance_type = "t2.micro"
user_data = "${data.template_file.example[count.index]}"
}
output "templates" {
value = "${join(",", data.template_file.example.*.rendered)}"
}
With template_file now a data source, Terraform can render the template during plan and notice that its result is the same as what's already in the state for user_data and thus avoid replacing the aws_instance resources.
The other thing that has changed in the meantime is the introduction of the list indexing syntax via the [ .. ] operator, which has replaced the element function in the user_data interpolation. (There are some remaining limitations of this which are captured in #3449.)
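For clarity, here is the change to the user_data interpolation in isolation, shown inside a minimal resource block so it stands on its own (adapted from the examples above):
resource "aws_instance" "servers" {
  count         = "${var.count}"
  ami           = "ami-e0efab88"
  instance_type = "t2.micro"

  # 0.6-era form: template_file as a resource, indexed with the element() function.
  # user_data = "${element(template_file.example.*.rendered, count.index)}"

  # 0.7 form: template_file as a data source, indexed with the [ .. ] operator.
  user_data = "${data.template_file.example.*.rendered[count.index]}"
}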
With these two changes to configurations it is possible to avoid the issue described here.
There are still some situations this doesn't cover where replacing with a data source is not appropriate and these should eventually get addressed once we've done the foundational work described in #4149. Thus I'm going to close this issue with the recommendation to switch to the template_file data source as a solution for the common case described in this issue, and anticipate a more complete solution to follow after #4149.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.