I have been quietly working on a really sweet terraform + provisioner setup. I've now run into a pretty difficult issue I can't seem to resolve.
See the code below; this is my EC2 module.
resource "aws_instance" "ec2_instance" {
ami = "${lookup(var.aws_machine_images, "${var.ubuntu_version},${var.aws_region}")}"
instance_type = "${var.instance_type}"
count = "${var.total_instances}"
# Need to fix this
# changing the number of subnets causes instances to be recreated
subnet_id = "${element(var.subnet_ids, count.index)}"
key_name = "${var.ssh_key-id}"
vpc_security_group_ids = ["${var.security_group_ids}"]
associate_public_ip_address = "${var.public_ip}"
root_block_device {
volume_type = "${var.root_volume_type}"
volume_size = "${var.root_volume_size}"
delete_on_termination = "${var.storage_delete_on_termination}"
}
tags {
Name = "${var.instance_name}"
Environment = "${var.environment}"
}
}
# Taint this to re-run provision on the host
resource "null_resource" "provision_run" {
count = "${var.total_instances}"
# TODO: This trigger causes ansible to run on every single node when you
# add a new node. Figure out more efficient triggering.
# Without this trigger, you can taint an instance and recreate it, but
# ansible provisioner won't run until you taint your null resource too.
triggers {
cluster_instance = "${ var.total_instances > 1 ? element(aws_instance.ec2_instance.*.id, count.index) : aws_instance.ec2_instance.*.id[0] }"
}
provisioner "ExampleThing" {
connection {
user = "${var.provision_user}"
private_key = "${var.ssh_key}"
host = "${aws_instance.ec2_instance.*.public_ip[count.index]}"
type = "ssh"
}
}
This is an ec2 module I wrote on top of the ec2 resource.
In my main.tf, I reference this module to create a "web_servers" module instance and pass it a count of 1, 2, 3, etc.
When an instance is created, the null_resource immediately provisions it. Once that succeeds, a corresponding null_resource.ansible_run[idx] also shows up in state.
What's really neat is that it generates a null_resource for each of the instances that are created. So, for 3 instances, I get 3 null resources.
1) If I want to reprovision instance 0, I can simply taint its corresponding null_resource.ansible_run.0 and run an apply. It will fire off another round of provisioning on that one host.
2) If I want to modify my server count and add another instance in my web_servers module, a new instance is created along with a new null_resource that corresponds to it, and then that node gets provisioned. Again, super neat.

The problems:

1) Without triggers, when I taint an instance, I also have to run a second command to taint its corresponding null_resource (see the commands sketched right after this list). This is not intuitive or fun. If you taint an EC2 instance without also tainting its null_resource in state, the instance will be recreated, but the provisioner won't run again.
2) When I add a new instance and run plan, the two existing null resources are forced to be recreated, even though I've tagged each of them with a unique "triggers.cluster_instance" value that should tell Terraform they're the same and don't need recreation.
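To make problem 1 concrete, this is roughly the two-command dance I mean whenever I want to rebuild and reprovision instance 0 without the triggers in place (module path taken from the terraform show output below; adjust for your own layout):

# taint the instance itself so the next apply recreates it
terraform taint -module=pritunl_server aws_instance.ec2_instance.0

# without the trigger, the matching null_resource ALSO has to be tainted by hand,
# otherwise the provisioner never re-runs on the rebuilt host
terraform taint -module=pritunl_server null_resource.provision_run.0

# rebuild and reprovision
terraform apply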
With a total_instances value of 2, when I run "terraform show" I get this:
module.pritunl_server.null_resource.provision_run.0:
  id = 918071134436281956
  triggers.% = 1
  triggers.cluster_instance = EC2_INSTANCE_ID_REDACTED
module.pritunl_server.null_resource.provision_run.1:
  id = 4594341873744931197
  triggers.% = 1
  triggers.cluster_instance = EC2_INSTANCE_ID_REDACTED
When I bump the value of total_instances to 3 and run a plan, I get this:
-/+ module.pritunl_server.null_resource.ansible_run[0] (new resource required)
      id:                        "918071134436281956" => <computed> (forces new resource)
      triggers.%:                "1" => <computed> (forces new resource)
      triggers.cluster_instance: "REDACTED" => "" (forces new resource)

-/+ module.pritunl_server.null_resource.ansible_run[1] (new resource required)
      id:                        "4594341873744931197" => <computed> (forces new resource)
      triggers.%:                "1" => <computed> (forces new resource)
      triggers.cluster_instance: "REDACTED" => "" (forces new resource)

  + module.pritunl_server.null_resource.ansible_run[2]
      id:         <computed>
      triggers.%: <computed>
I'm guessing this is because of the way I'm grabbing the instance IDs, since they show as "computed."
Is there some smarter way for me to do this without hard-coding the values in a map or locals? That wouldn't be great.
Happy to use locals too, or lifecycle hooks, or ANY other method that can get me where I'm trying to go.
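For example, the only lifecycle-based idea I've come up with so far is to just ignore trigger changes on the null_resource, which I suspect defeats the purpose of having the trigger at all. A rough sketch of what I mean, not something I've actually settled on:

resource "null_resource" "provision_run" {
  count = "${var.total_instances}"

  triggers {
    cluster_instance = "${element(aws_instance.ec2_instance.*.id, count.index)}"
  }

  # Hypothetical: keep changes to the trigger from forcing recreation of the
  # existing null_resources when the count changes. The obvious downside is
  # that a replaced instance would no longer re-run its provisioner
  # automatically, which is exactly what the trigger is there for.
  lifecycle {
    ignore_changes = ["triggers"]
  }

  # provisioner / connection block same as above
}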
I could also try splitting the null_resource "provision_run" into its own module, nest that inside the ec2 module, and maybe grab its outputs?
I've been at this for 2 days, so I'm losing my mind a bit.
Any help would be greatly appreciated. Thanks!
Hi @armenr! Sorry for this weird behavior.
If you use the square-bracket indexing syntax instead of the element function then Terraform should be able to understand better what's going on here:
triggers {
  cluster_instance = "${ aws_instance.ec2_instance.*.id[count.index] }"
}
The reason for this different behavior is that Terraform just sees element as any other function and doesn't understand that only one element of aws_instance.ec2_instance.*.id is being accessed. By using the first-class index syntax, Terraform has more information and can understand that only one element is relevant to the decision of whether this value is "computed".
It's also not necessary to use the conditional on var.total_instances because count.index still produces 0 when count = 1.
I hope this helps!
@apparentlymart - After I posted this, I figured out the solution because of a comment you'd left for someone else about first-class index access elsewhere. YOU, my friend, are a gentleman.
I appreciate your response, but I actually did manage to figure it out before checking back here.
The problem was that I had to move my null_provisioner module BACK into the module with my EC2 resource. I couldn't use the square-bracket index accessor on output variables that I was passing up from the ec2 module into the module in my main.tf and then back down again into my ansinull module.
When I moved my ansinull module code back into the EC2 module, I switched over to the bracket notation and found that it worked by directly referencing aws_instance.ec2_instance.*.id[count.index].
Once I had that figured out, I realized there was no need to coerce the interpolation with a ternary, just as you mentioned.
That was just an hour before your response. I appreciate your feedback so much, because it tells me my head's in the right place.
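In case it helps anyone who lands here later, this is roughly what the working null_resource looks like now that it lives back inside the EC2 module (provisioner details trimmed):

resource "null_resource" "provision_run" {
  count = "${var.total_instances}"

  # First-class index access instead of element(), and no ternary needed:
  # count.index is 0 when count = 1 anyway.
  triggers {
    cluster_instance = "${aws_instance.ec2_instance.*.id[count.index]}"
  }

  provisioner "ExampleThing" {
    connection {
      type        = "ssh"
      user        = "${var.provision_user}"
      private_key = "${var.ssh_key}"
      host        = "${aws_instance.ec2_instance.*.public_ip[count.index]}"
    }
  }
}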
I've been on a non-stop Terraform rampage for the last week, and went from surface-level knowledge to really understanding the power that Terraform's API exposes. Along the way, I found the GitHub issues and your responses to many of them to be even more useful than the Terraform docs themselves.
Thanks for everything and keep up the great work!
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.