I'm trying to set up a multi-node cluster with attached EBS volumes. An example below:
resource "aws_instance" "nodes" {
instance_type = "${var.model}"
key_name = "${var.ec2_keypair}"
ami = "${lookup(var.zk_amis, var.region)}"
count = "${var.node_count}"
vpc_security_group_ids = ["${aws_security_group.default.id}"]
subnet_id = "${lookup(var.subnet_ids, element(keys(var.subnet_ids), count.index))}"
associate_public_ip_address = true
user_data = "${file("cloud_init")}"
tags {
Name = "${var.cluster_name}-${count.index}"
}
}
resource "aws_ebs_volume" "node-ebs" {
count = "${var.node-count}"
availability_zone = "${element(keys(var.subnet_ids), count.index)}"
size = 100
tags {
Name = "${var.cluster_name}-ebs-${count.index}"
}
}
resource "aws_volume_attachment" "node-attach" {
count = "${var.node_count}"
device_name = "/dev/xvdh"
volume_id = "${element(aws_ebs_volume.node-ebs.*.id, count.index)}"
instance_id = "${element(aws_instance.nodes.*.id, count.index)}"
}
If a change happens to a single node (for instance if a single ec2 instance is terminated) ALL of the aws_volume_attachments are recreated.
Clearly we would not want volume attachments to be removed in a production environment. Worse than that, in conjunction with #2957 you first must unmount these attachments before they can be recreated. This has the effect of making volume attachments only viable on brand new clusters.
Confirmed. We have run into this issue as well. I think it has to do with dependencies not taking the "count" into account.
I think this comes down to the fact that the state does not track which specific instance a resource depends on, only the resource as a whole. Here is an example:
"aws_volume_attachment.db_persistent_volume_attachment.0": {
"type": "aws_volume_attachment",
"depends_on": [
"aws_ebs_volume.db_volume",
"aws_instance.db_instance"
],
"primary": {
"id": "vai-1795886726",
"attributes": {
"device_name": "/dev/sdb",
"id": "vai-1795886726",
"instance_id": "i-bb16c319",
"volume_id": "vol-cca36821"
}
}
}
When removing an aws_instance, you would have to find all aws_volume_attachments which happen to share the same "instance_id" attribute. But that would be provider, and perhaps even resource, specific.
However, this is not specific to AWS. It will occur any time you have two resources with count parameters, where one resource depends on the other. The right abstraction would be to depend on "aws_instance.db_instance.0" in this case. I don't know what the implications of that would be, though.
Turns out I was wrong. The "depends_on" attribute in the state file has nothing to do with this. Consider this diff:
-/+ aws_volume_attachment.persistent_volume_attachment.0
device_name: "/dev/sdb" => "/dev/sdb"
force_detach: "" => "<computed>"
instance_id: "i-deb76479" => "${element(aws_instance.my_instance.*.id, count.index)}" (forces new resource)
volume_id: "vol-9ba36878" => "vol-9ba36878"
It seems like changing one element of the aws_instance.my_instance.*.id causes the entire "element" expression to be considered changed.
Our current workaround is to duplicate the "aws_volume_attachment" resources, rather than using the element function.
I dug further into this. It seems the expected behaviour broke with commit 7735847579e777160664088b830624d0cde876e6, which was introduced to fix issue #2744.
To me, it seems like you want the treatment of unknown values in splats to behave differently depending on the interpolation context. When you use formatlist, you want to treat the entire list as unknown if it contains any unknown value, but for element, you only care about if a specific value in the list is unknown or not.
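For illustration, a sketch of the two situations, reusing the resource names from the config at the top of this thread (nothing here is from the actual patch):
```
# formatlist() really does consume every element of the splat, so a single
# unknown instance id should make the whole joined result unknown (the #2744 case):
output "node_ids" {
  value = "${join(",", formatlist("id=%s", aws_instance.nodes.*.id))}"
}
```
By contrast, the aws_volume_attachment resources above only ever read the single entry at count.index via element(), so an unknown id elsewhere in the splat ideally should not mark them as computed.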
I did a test where I introduced a new splat operator with the only difference being how it is treated if the list contains unknown values. It solves the problem, but having two splat operators is kind of confusing.
@mitchellh: Ideas?
Thanks for the report @kklipsch and thanks for taking the time to do a deep dive, @danabr!
To me, it seems like you want the treatment of unknown values in splats to behave differently depending on the interpolation context. When you use formatlist, you want to treat the entire list as unknown if it contains any unknown value, but for element, you only care about if a specific value in the list is unknown or not.
Yep, I think this is the key insight. Switching this to core since it's not a provider-level bug.
In the meantime, duplicating aws_volume_attachments to avoid the usage of splat / element() is a valid workaround.
Ok thanks. Unfortunately, for our use case that very quickly becomes unwieldy as we are doing 10s of nodes currently but want to be able to scale up to hundreds.
@kklipsch: If you are OK with running a patched terraform for a while, and you don't rely on the formatlist behavior anywhere, you can just comment out the three lines at https://github.com/hashicorp/terraform/blob/master/terraform/interpolate.go#L466, and compile terraform yourself.
@danabr and @phinze
I tried the workaround using Terraform constructs in the following manner, but it did not help. Could you please share more details on the workaround mentioned above ("In the meantime, duplicating aws_volume_attachments to avoid the usage of splat / element() is a valid workaround")?
resource "aws_instance" "appnodes" {
instance_type = "${var.flavor_name}"
ami = "${var.image_name}"
key_name = "${var.key_name}"
security_groups = ["${split(",", var.security_groups)}"]
availability_zone = "${var.availability_zone}"
user_data = "${file("mount.sh")}"
tags {
Name = "${var.app_name}-${format("%02d", 1)}"
}
}
resource "aws_volume_attachment" "ebsatt" {
device_name = "/dev/sdh"
volume_id = "${aws_ebs_volume.ebsvolumes.id}"
instance_id = "${aws_instance.appnodes.id}"
}
resource "aws_ebs_volume" "ebsvolumes" {
availability_zone = "${var.availability_zone}"
size = "${var.ebs_size}"
type = "${var.ebs_type}"
}
resource "aws_instance" "app-nodes" {
instance_type = "${var.flavor_name}"
ami = "${var.image_name}"
key_name = "${var.key_name}"
security_groups = ["${split(",", var.security_groups)}"]
availability_zone = "${var.availability_zone}"
user_data = "${file("mount.sh")}"
tags {
Name = "${var.app_name}-${format("%02d", 1)}"
}
}
resource "aws_volume_attachment" "ebs_att" {
device_name = "/dev/sdh"
volume_id = "${aws_ebs_volume.ebs-volumes.id}"
instance_id = "${aws_instance.app-nodes.id}"
}
resource "aws_ebs_volume" "ebs-volumes" {
availability_zone = "${var.availability_zone}"
size = "${var.ebs_size}"
type = "${var.ebs_type}"
}
@pdakhane: Just take kklipsch's example, but instead of using a "count" attribute on the aws_volume_attachment resource, create multiple aws_volume_attachment resources referring directly to the instances and volumes. For example, if you have three instances:
resource "aws_volume_attachment" "persistent_volume_attachment_0" {
device_name = "/dev/sdb"
instance_id = "${aws_instance.instance.0.id}"
volume_id = "${aws_ebs_volume.volume.0.id}"
}
resource "aws_volume_attachment" "persistent_volume_attachment_1" {
device_name = "/dev/sdb"
instance_id = "${aws_instance.instance.1.id}"
volume_id = "${aws_ebs_volume.volume.1.id}"
}
resource "aws_volume_attachment" "persistent_volume_attachment_2" {
device_name = "/dev/sdb"
instance_id = "${aws_instance.instance.2.id}"
volume_id = "${aws_ebs_volume.volume.2.id}"
}
This only works if you have a small number of nodes, though, and are OK to use the same number of instances in all environments.
@phinze pointed to this issue as potentially related to mine.
Here is my config (redacted for readability):
resource "aws_instance" "cockroach" {
tags {
Name = "${var.key_name}-${count.index}"
}
count = "${var.num_instances}"
...
}
resource "null_resource" "cockroach-runner" {
count = "${var.num_instances}"
connection {
...
host = "${element(aws_instance.cockroach.*.public_ip, count.index)}"
}
triggers {
instance_ids = "${element(aws_instance.cockroach.*.id, count.index)}"
}
provisioner "remote-exec" {
....
}
}
The basic idea is that every instance gets a "runner" attached that does binary deployment and other things. I'm using a null_resource to break a dependency cycle with ELB addresses used by the runner.
The first time I bring up an instance, everything works fine: each instance gets created, then the null_resource runs properly on each.
However, when I terminate an arbitrary instance through the EC2 console (eg: destroying instance 1), all null_resources get rerun.
Here's the log of terraform plan after terminating an instance:
~ aws_elb.elb
instances.#: "" => "<computed>"
+ aws_instance.cockroach.1
ami: "" => "ami-1c552a76"
availability_zone: "" => "us-east-1b"
ebs_block_device.#: "" => "<computed>"
ephemeral_block_device.#: "" => "<computed>"
instance_type: "" => "t2.medium"
key_name: "" => "cockroach-marc"
placement_group: "" => "<computed>"
private_dns: "" => "<computed>"
private_ip: "" => "<computed>"
public_dns: "" => "<computed>"
public_ip: "" => "<computed>"
root_block_device.#: "" => "<computed>"
security_groups.#: "" => "1"
security_groups.2129892981: "" => "cockroach-marc-security-group"
source_dest_check: "" => "1"
subnet_id: "" => "<computed>"
tags.#: "" => "1"
tags.Name: "" => "cockroach-marc-1"
tenancy: "" => "<computed>"
vpc_security_group_ids.#: "" => "<computed>"
-/+ null_resource.cockroach-runner.0
triggers.#: "1" => "<computed>" (forces new resource)
triggers.instance_ids: "i-21867290" => ""
-/+ null_resource.cockroach-runner.1
triggers.#: "1" => "<computed>" (forces new resource)
triggers.instance_ids: "i-fd85714c" => ""
-/+ null_resource.cockroach-runner.2
triggers.#: "1" => "<computed>" (forces new resource)
triggers.instance_ids: "i-20867291" => ""
I was expecting only "null_resource.cockroach-runner.1" to be updated, but it seems that 0 and 2 changed as well.
Re-titling this to indicate the nature of the core issue here. We'll get this looked at soon!
Just pinging here since we just ran into this issue as well.
Okay just consolidated a few other issue threads that were expressions of this bug into this one.
My apologies to all the users who have been hitting this - this is now in my list of top priority core bugs to get fixed soon.
As I alluded to with the re-title, this issue comes down to the fact that Terraform core is currently unaware that ${element(some.splat.*.reference)} is a dependency on a single element from the splat. It simply sees "there is a splat reference in this interpolation" and therefore believes--incorrectly--that it needs to consider every element in the list when calculating whether or not the overall value of the interpolation is computed.
The most direct solution would be to "just make it work for element()". In other words, add special-casing into Terraform's interpolation handling of splats that would look for a surrounding element() and use different logic for computed calculations if it is found. This is probably not the right way to go, as it is (a) difficult to implement "context-awareness" into that part of the codebase, and (b) a brittle solution that sets a bad precedent of special-casing certain functions in the core.
Because of this, the core team thinks the best way forward is to add first-class list indexing into the interpolation language. This would promote the behavior of element() to a language feature (likely square-bracket notation) and give the core interpolation code a rich enough context to be able to implement the correct computed-value scoping we need to fix this bug.
I've got a spike of first-class-indexing started, and I'll keep this thread updated with my progress.
:100:
@phinze thank you so much for the detailed response and the ongoing effort! :tada:
Thanks for the report @phinze - is there a WIP branch available to follow along?
Keen to see this one resolved. Quite limiting for those of us using count with instances and template_file to generate user_data.
Does anyone know of a workaround?
@jen20 has WIP to address this issue. Stay tuned! :grinning:
:+1:
@jkinred I haven't tried, but off the top of my head, the only workaround is to use provisioners for now.
@jkinred it's case-specific, but we've been working around this issue by not using count on the template_file, and instead embedding node-specific vars in metadata and querying them at launch time using scripts that curl the cloud provider's metadata service. A lot of this sort of thing:
runcmd:
- curl -s http://169.254.169.254/latest/meta-data/role >> /etc/environment
@jkinred I've come up with a workaround. We set the count on our user-data.cfg template files to a much higher number than we expect to have instances, and then only change the count of the aws_instance resource, not the count of the template files. The count of the template files never changes, so the dependent aws_instances are not recreated.
resource "aws_instance" "web" {
instance_type = "${var.instance_type}"
ami = "${var.aws_ami}"
count = "${var.instance_count}"
user_data = "${element(template_file.user-data-web.*.rendered, count.index)}"
}
resource "template_file" "user-data-web" {
template = "${file("templates/user-data.cfg")}"
vars { fqdn = "${var.short-env}-web-${count.index}.${var.vpc_domain}" }
count = "20"
}
This may or may not help in your situation, depending on what's in your user_data.
@billputer I think your workaround is awesome!
@jen20 if it is possible, could you please keep backward compatibility in mind? I mean, will it be possible to increment/decrement the size without resources being destroyed and created again after your patch is applied?
Thanks for all the suggestions! @billputer, great workaround and suitable for our use case.
Workaround, TODO and bug reference added!
Hi @phinze, any progress on this issue? The workaround is fine for template_file but not for EBS volumes. It doesn't make sense to overbuild storage as a workaround. We need to be able to use count to build instances and attach EBS volumes.
It is very possible this is fixed in 0.7 (as of writing, 0.7 is the "master" branch of Terraform). At the very least, we're a lot closer. 0.7 introduces first-class lists so you no longer need element() (we're keeping it for backwards compat, though). In this case, you can now directly do listvar[count.index].
The internals of list access are very different from element(), which is a function that simply mimicked list access. Because it was a function, core was unable to realize you're referencing a single list element, as @phinze pointed out above. It's still very possible we don't make that distinction and this is still a bug, but the right architecture is in place now where we can fix it.
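For concreteness, a sketch of that rewrite applied to the volume-attachment example from the top of this thread (whether it actually avoids the spurious diff is exactly what still needs verifying):
```
resource "aws_volume_attachment" "node-attach" {
  count       = "${var.node_count}"
  device_name = "/dev/xvdh"

  # 0.6 style: element() is an opaque function call as far as core is concerned
  # volume_id = "${element(aws_ebs_volume.node-ebs.*.id, count.index)}"

  # 0.7 style: first-class index operator
  volume_id   = "${aws_ebs_volume.node-ebs.*.id[count.index]}"
  instance_id = "${aws_instance.nodes.*.id[count.index]}"
}
```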
I'm going to keep this open, as we should be in a strong place now where we can add a core test case to verify whether or not this works, to prevent regressions. For anyone who wants to help: we'd need a test in context_apply_test.go that reproduces this bug on 0.6 (check out a tag to get a confirmed failure). Whether it passes or fails, having the test would be helpful. We can write it in time otherwise.
With the test in place, the actual work can begin to fix it if it is failing, or if it's passing it will prevent any regressions in the future.
Apologies this has taken so long, bringing first class lists/maps into Terraform has been a huge undertaking and has taken months of full time work.
Hello,
I'm trying to apply the workaround provided by @billputer
It seems very logical, but I don't know why it doesn't work for me.
When I try to scale by increasing the nb_instance variable, all my instances are recreated.
Do you know why?
Thank you,
Pierre
variable "nb_instance" {default = "3"}
variable "build_number" {default = "16"}
resource "template_file" "config_filebeat" {
template = "${file("${path.module}/templates/provisioner/filebeat.cfg.tpl")}"
vars {
laas_project = "123456789"
}
}
resource "template_file" "config" {
count = 10
template = "${file("${path.module}/templates/provisioner/chef.cfg.tpl")}"
vars {
instance_name = "${format("myappone_terraform_pierre_%s_%02d",var.build_number,count.index + 1)}"
#for yaml convention, we need to add some tabulation (4 spaces) in front of each line of the validation key.
validationkey = "${replace(replace(file("${path.module}/keys/validation.pem"), "/^/" , " "), "\n", "\n ")}"
secret_key = "${file("${path.module}/keys/chef_secret")}"
filebeat_config_file = "${base64encode(template_file.config_filebeat.rendered)}"
}
}
resource "template_cloudinit_config" "init_server" {
count = 10
gzip = true
base64_encode = false
part {
filename = "init.cfg"
content_type = "text/cloud-config"
content = "${element(template_file.config.*.rendered, count.index)}"
}
}
# Create a web server
resource "openstack_compute_instance_v2" "myapp" {
count = "${var.nb_instance}"
name = "${format("myappAS_%02d", count.index + 1)}"
flavor_name = "m1.medium"
image_name = "RED_HAT_6.2"
key_pair = "pierre"
security_groups = [
"op-default",
"myapp_as"
]
network {
name = "Internal_Internet_Protected"
}
network {
name = "Internal_Network_protected"
}
user_data = "${element(template_cloudinit_config.init_server.*.rendered, count.index)}"
}
Here's another repro case w/ EFS resources (i.e. the bug still exists in 0.7 ddc0f4cdb0c5b5fb848ac4856e9bcf32cc55ec0f):
https://gist.github.com/radeksimko/869c266bc8572c8f190059e65f12dee3
@radeksimko Thanks for the repro, I think I know where this might be.
Is using ignore_changes for lifecycle on the EBS and EBS attachment a reasonable solution? The below example seems to work (increasing and decreasing the nodes count). I just wanted a second opinion (pros/cons) before moving forward with it.
Example:
resource "aws_instance" "database" {
ami = "${var.amis}"
instance_type = "${var.instance_type}"
subnet_id = "${element(split(",", var.private_subnet_ids), count.index)}"
key_name = "${var.key_name}"
vpc_security_group_ids = ["${aws_security_group.database.id}"]
disable_api_termination = true
count = "${var.nodes}"
tags { Name = "${var.name}${format("%02d", count.index + 1)}" }
lifecycle { create_before_destroy = true }
}
resource "aws_ebs_volume" "database_mysql_vol" {
availability_zone = "${element(aws_instance.database.*.availability_zone, count.index)}"
iops = 1000
size = 500
type = "io1"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["availability_zone"]
}
tags { Name = "${var.name}-mysql" }
}
resource "aws_ebs_volume" "database_binlog_vol" {
availability_zone = "${element(aws_instance.database.*.availability_zone, count.index)}"
size = 50
type = "gp2"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["availability_zone"]
}
tags { Name = "${var.name}-binlog" }
}
resource "aws_volume_attachment" "mysql_vol_attachment" {
device_name = "/dev/sdf"
instance_id = "${element(aws_instance.database.*.id, count.index)}"
volume_id = "${element(aws_ebs_volume.database_mysql_vol.*.id, count.index)}"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["instance_id", "volume_id"]
}
}
resource "aws_volume_attachment" "mysql_binlog_attachment" {
device_name = "/dev/sdg"
instance_id = "${element(aws_instance.database.*.id, count.index)}"
volume_id = "${element(aws_ebs_volume.database_binlog_vol.*.id, count.index)}"
count = "${var.nodes}"
lifecycle {
ignore_changes = ["instance_id", "volume_id"]
}
}
I ran into this issue when bumping count (using OpenStack), running Terraform v0.7.1-dev. How can I test if the new list interpolation syntax fixes this? How am I to replace the element(foo.*.id, count.index) here?
@theanalyst the new syntax means using ${foo.*.id[count.index]} instead of element() in your case. The new syntax works, but the result is the same: resources still get destroyed.
@serjs yeah I figured, I changed the stuff to new syntax and still ran into issues
+1: I changed to the new list interpolation syntax and the issue persists, still forcing a new resource.
resource "aws_volume_attachment" "SLRS_Data" {
count = 3
device_name = "/dev/sdf"
volume_id = "${aws_ebs_volume.SLRS_Data._.id[count.index]}"
instance_id = "${aws_instance.SLRS._.id[count.index]}"
}
+1 I have the issue using OpenStack with both 'element' and '[]' syntax.
+1 I have the same issue; I tried ignore_changes, ${foo.*.id[count.index]}, and element(). They did not work.
Same issue - new syntax didn't work for me either.
I've submitted a fix and a test for the bug which causes a resource with an attribute containing ${list[idx]} to be re-created when list contains an uncomputed element, even if idx references a known element.
Note that, as of now, this does not fix the closely related ${element(list, idx)} bug.
Any update on this issue? I think requiring the bracket notation/indexing format is a fair and logical fix, but it'd be great to know a timeline around this as the current bug can have some relatively harmful results.
Bracket notation/indexing format is not compatible with math operations, e.g. something.*.id[count.index % 3].
@matti for this behaviour, at the moment you can use the element function - ${element(something.*.id, count.index)} - this will wrap.
@jen20 but that does not work because of the bug that's open in this thread...? Not sure what you are saying.
I was referring to the [] workaround that is coming (?), which is not compatible with other stuff.
@matti interesting! Was unaware of that limitation. I tried and got something like this (formatting mine):
* node doesn't support evaluation: *ast.Arithmetic{
Op:2,
Exprs: []ast.Node{
*ast.VariableAccess{
Name:"var.count",
Posx:ast.Pos{Column:20, Line:1}
},
*ast.VariableAccess{
Name:"count.index",
Posx:ast.Pos{Column:31, Line:1}
}
},
Posx:ast.Pos{Column:20, Line:1}
} in: ${var.instance_ids[var.count - count.index]}-baz
@jen20 @matti do you know if is there an open issue anywhere for this particular problem? I searched for terms in the message above both in hashicorp/terraform and hashicorp/hil but couldn't find one.
I'm not sure if that can be solved with the current templating engine. I think that the current templating engine is the root of all evil and results in the clever workarounds summarized here: https://blog.gruntwork.io/terraform-tips-tricks-loops-if-statements-and-gotchas-f739bbae55f9#.ub1tfxsmg
As variables can not contain interpolations (https://github.com/hashicorp/terraform/issues/4084) and there are no intermediate variables, I'm starting to feel very cornered here.
The point of having count in resources is currently very broken when it's not working with splats (it deletes ALB attachments, EBS volumes, etc.).
Hey @matti, there certainly are a number of issues related to count, but I don't know that I agree with your view that the templating engine - do you mean HIL? - is the root in all cases. The issue here, for instance, is caused by Terraform making certain decisions about how changes in state should be treated, and, as such, the solution lies in Terraform-land, not HIL-land. The fix I've offered doesn't make any changes to the templating engine.
In any case, I think the issue you brought up - not being able to do math operations inside of array index operations - is a problem distinct from this issue, and deserves its own issue. Do you know if there's an open GH issue for that particular problem? Do you get an error similar to the one I posted above, or a different one?
@maxenglander, Sorry I might be wrong about it. I'm just really tired of trying to do stuff nicely and then finding out waaay later that you have cornered yourself.
On the error: I get the same error. I don't think there is an issue anywhere, I tried to look for one.
@matti I opened a separate ticket for the issue you brought up, and a fix has been submitted.
Cool. So currently does [count.index] addressing work? Because for my ALB attachments it's sometimes failing..
@matti I can't say without knowing the specifics of your setup. However, once the math fix makes it into Terraform, you'll be able to do myresource[var.count - count.index - 1].
Providing a minimal in-memory test case to work off:
resource "null_resource" "foo" {
count = 2
}
resource "null_resource" "bar" {
count = 2
triggers {
key = "${element(null_resource.foo.*.id, count.index)}"
}
}
output "bar_0" {
value = "${null_resource.bar.0.id}"
}
after the following:
$ terraform apply
$ terraform taint null_resource.foo.1
a plan shows null_resource.bar.0 being replaced, and an apply shows the output of bar_0 has changed.
@jbardin could you verify that the same happens when changing to
triggers {
key = "${null_resource.foo.*.id[count.index]}"
}
ref @mitchellh's comment from May 13th?
@sigmunau,
Yes, it's the same behavior with direct indexing as well. While it might be a little easier to calculate which node is referenced with the index operation, we still need to add a way to get at this reference when building the graph.
While this isn't yet implemented, it is in principle possible for the interpolation language to deal with this properly for indexing with [ ... ], because it has more context to work with.
Since the language evaluator doesn't know what a function does internally, it just assumes that if any function argument is "unknown" then the function can't be called, and so its result is also "unknown". This then combines with the rule that any list containing one computed element is itself entirely computed, creating the behavior folks are hitting here.
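To see those two rules combine on the example from the top of this thread, suppose count was just bumped from 2 to 3 (the instance ids below are made up):
```
# At plan time the splat is effectively ["i-aaa", "i-bbb", <unknown>]. The one
# unknown element makes the whole list computed, and a computed argument makes
# element() computed, so all three attachments get a "(forces new resource)"
# diff instead of just the new one.
resource "aws_volume_attachment" "node-attach" {
  count       = 3
  device_name = "/dev/xvdh"
  volume_id   = "${element(aws_ebs_volume.node-ebs.*.id, count.index)}"
  instance_id = "${element(aws_instance.nodes.*.id, count.index)}"
}
```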
But the evaluator does know what [ ... ] does, so it can in principle have its own special behavior for "unknown" where it considers only the value of the specific element being addressed, rather than checking the whole list for unknown values. Terraform's responsibility then is to just make sure that only the appropriate list elements are marked as "computed".
The part that makes this a little tricky is that we mark a partially-unknown list as being entirely unknown at variable-access time, so by the time we get to the index evaluator we've already lost all of the detail about what exactly was unknown. To implement this will require re-organizing this logic a little, I think as follows:
1. Change VariableAccess to return a partially-unknown collection, rather than marking it as entirely unknown like today.
2. Relax the rule in visit that says that encountering any unknown value at all immediately terminates evaluation. Having unknown values pass around should be fine as long as the above constraints are enforced.

@apparentlymart I'd actually go about it a different way, since by the time [in Terraform] you're evaluating, graph construction (and therefore ordering) is too late.
First, you're absolutely right this will only work with [...]. I had no intentions of making this work with any function calls (including element()).
However, I wouldn't modify evaluation at all. The logic I want to spike out is doing an early pass during graph construction (a new transform, probably) that does static analysis over the ASTs of all interpolations and looks for ast.Index operations. You then want to check for a very specific case:
1. The Node (in node[key] syntax) in the Index is a resource reference. We can check this.
2. The Key (in node[key]) is _not_ computed. We can check this by trying to evaluate it and making sure it goes to something other than TypeUnknown.

If this is true, we evaluate Key, and use that to create the proper reference to the resource referenced in Node. This can happen at graph construction time and solves a huge variety of issues (since, for example, count.index is currently not allowed to be computed).
There still remains the question of what to do for computed index access, and I think a core feature for this isn't quite available yet, but I'm curious about building it in. What I'd like to introduce is some concept of a runtime graph edge that dag understands, so that Walk just does the right thing. I'm not sure if that's possible with the current API but I'm looking into it.
The idea is: the above static analysis is also done one final time during the Eval step (so there should be no TypeUnknowns anymore, unless it's plan, in which case it should be unknown anyway, probably). If we detect any NEW references we didn't have previously, introduce a _runtime edge_ into the graph that connects the two references. Then call some special Wait-type function to make sure your dependencies are once again checked if they're evaluated. If they are, Wait is basically a no-op. If they aren't, then you just block until they are.
I think this solves everything but clearly requires a lot of new work, hence we're pushing it off to 0.9.
I think I must be missing the case where having Terraform deal with this is necessary, vs. just adjusting the HIL behavior as I mentioned.
Considering the following config as an example:
variable "instance_count" {}
resource "aws_instance" "foo" {
count = "${var.instance_count}"
# etc, etc...
}
resource "null_resource" "foo" {
count = "${var.instance_count}"
triggers {
instance_id = "${aws_instance.foo.*.id[count.index]}"
}
provisioner "remote-exec" {
# ...
}
}
For the sake of what we're discussing, I think the desired behavior is that I should be able to change the value of var.instance_count and have Terraform understand to only run the null_resource provisioner for any instances that were added.
Let's assume that in our initial state we have instance_count set to 2 and the two instances already exist, so aws_instance.foo.*.id evaluates to ["i-abc", "i-123"] and we also already have the two corresponding null_resource instances.
Now I increment instance_count to 3 and run terraform plan. During the walk, Terraform will visit the aws_instance.foo node, dynamic-expand it, and then notice that aws_instance.foo.2 needs to be created, and so aws_instance.foo.*.id is written into the scope as ["i-abc", "i-123", <unknown>].
Continuing the plan walk, Terraform will then visit null_resource.foo and dynamic-expand that. Terraform will notice that two of the three desired instances already exist in state, but will evaluate ${aws_instance.foo.*.id[count.index]} for them and find concrete values that are unchanged from what's in the state, and thus generate no diff, due to the adjusted HIL evaluation rules I posted before.
It will then generate the "create" diff for null_resource.foo.2 and evaluate ${aws_instance.foo.*.id[count.index]}. For this it will get "unknown" and thus Terraform will mark this attribute as <computed> in the create diff.
During apply, things proceed as expected: there's no diff for the pre-existing aws_instance or null_resource instances, so Terraform leaves them alone. The apply walk will visit aws_instance.foo.2 before it visits null_resource.foo.2, so the previously-unknown value will be a concrete string by the time we get to creating the null resource.
That seems to get us what we need as far as I can tell. We just need to make sure Terraform can set a partially-unknown value at index 2 of aws_instance.foo.*.id, and then the rest just falls out logically as a result of the adjusted HIL evaluation rules, without the need for any static analysis.
I didn't cover it in my ruleset above, but computed indices can also be handled by a further rule that if the i in n[i] is unknown then the result is also unknown.
I feel like I must be missing an edge-case here. :grinning:
@apparentlymart That's right, too, actually. I went down that path way back and wasn't able to get it to work, so I think there is a missing edge case, but on the surface it's worth trying too. It probably doesn't even need much HIL change, since HIL only returns unknown if it encounters an unknown. If we just build the list out (rather than making the whole list TypeUnknown), then it probably just works for many cases.
I have recently encountered this issue. I have a set of hosts that require fixed IPs, and certain flavor sizes, so I created a list to define them statically:
static_infra = [
{ "ip" = "10.1.1.55", "service" = "av1-service-030-srv01", "type" = "small" },
{ "ip" = "10.1.1.56", "service" = "av1-service-050-srv01", "type" = "small" },
{ "ip" = "10.1.1.57", "service" = "av1-service-070-srv01", "type" = "medium" },
{ "ip" = "10.1.1.58", "service" = "av1-service-090-srv01", "type" = "large" },
]
fyi, "type" just allows me to look up a flavor name in a map, so not important here but figure i'd explain why i wrote it in the comment.
Anyway, my .tf file looks something like:
resource "openstack_compute_instance_v2" "instances" {
count = "${length(var.static_infra)}"
region = "${var.region}"
name = "${format("${var.hostname_prefix}-%s", lookup(var.static_infra[count.index], "service"))}"
flavor_name = "${lookup(var.flavors, lookup(var.static_infra[count.index], "type"))}"
image_name = "${var.image}"
user_data = "${data.template_file.cloud-config.rendered}"
network {
name = "${var.network_name}"
fixed_ip_v4 = "${lookup(var.static_infra[count.index], "ip")}"
}
}
If I remove one of the entries in the middle of the static_infra list, then various things get recomputed or new resources are forced, munging hostnames and IPs in the state file, because the count iteration throws off the whole thing, since the count gets put onto the end of the resource name/key, like "openstack_compute_instance_v2.instances.3".
Maybe an interim workaround, until there is a real fix, would be to allow an empty item in the list to be skipped over while count is incremented to the next number, thereby skipping that resource and keeping the resulting list the same length. I would hope the 'missing' resource would thereby be marked for destruction, with the count var iterating correctly and not changing other resources' state.
like this:
static_infra = [
{ "ip" = "10.1.1.55", "service" = "av1-service-030-srv01", "type" = "small" },
{ "ip" = "10.1.1.56", "service" = "av1-service-050-srv01", "type" = "small" },
{ },
{ "ip" = "10.1.1.58", "service" = "av1-service-090-srv01", "type" = "large" },
]
Not sure how this could be implemented, however, or if it would in reality even work the way I am imagining.
I suppose this would still fail when adding a new resource in the middle of the list, though, so any additions could only be appended onto the list. Over time, you'd potentially build up cruft of empty { } scattered throughout the list depending on where/when you need to remove entries.
This is a core issue, I wish it had more priority than adding new features :(
Hello, I am currently using terraform version 0.8.6.
I took the Azure example and added a count parameter to the objects that needed them
Here is the .tf file
variable "counts" {}
provider "azurerm" {
subscription_id = "<removed>"
client_id = "<removed>"
client_secret = "<removed>"
tenant_id = "<removed>"
}
resource "azurerm_resource_group" "test" {
name = "acctestrg"
location = "West US"
}
resource "azurerm_virtual_network" "test" {
name = "acctvn"
address_space = ["10.0.0.0/16"]
location = "West US"
resource_group_name = "${azurerm_resource_group.test.name}"
}
resource "azurerm_subnet" "test" {
name = "acctsub"
resource_group_name = "${azurerm_resource_group.test.name}"
virtual_network_name = "${azurerm_virtual_network.test.name}"
address_prefix = "10.0.2.0/24"
}
resource "azurerm_network_interface" "test" {
count = "${var.counts}"
name = "acctni${count.index}"
location = "West US"
resource_group_name = "${azurerm_resource_group.test.name}"
ip_configuration {
name = "testconfiguration1"
subnet_id = "${azurerm_subnet.test.id}"
private_ip_address_allocation = "dynamic"
}
}
resource "azurerm_storage_account" "test" {
count = "${var.counts}"
name = "accsai${count.index}"
resource_group_name = "${azurerm_resource_group.test.name}"
location = "westus"
account_type = "Standard_LRS"
tags {
environment = "staging"
}
}
resource "azurerm_storage_container" "test" {
count = "${var.counts}"
name = "vhds"
resource_group_name = "${azurerm_resource_group.test.name}"
storage_account_name = "${azurerm_storage_account.test.*.name[count.index]}"
container_access_type = "private"
}
resource "azurerm_virtual_machine" "test" {
count = "${var.counts}"
name = "acctvm${count.index}"
location = "West US"
resource_group_name = "${azurerm_resource_group.test.name}"
network_interface_ids = ["${azurerm_network_interface.test.*.id[count.index]}"]
vm_size = "Standard_A0"
storage_image_reference {
publisher = "Canonical"
offer = "UbuntuServer"
sku = "14.04.2-LTS"
version = "latest"
}
storage_os_disk {
name = "myosdisk1"
vhd_uri = "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd"
caching = "ReadWrite"
create_option = "FromImage"
}
os_profile {
computer_name = "hostname${count.index}"
admin_username = "testadmin"
admin_password = "Password1234!"
}
os_profile_linux_config {
disable_password_authentication = false
}
tags {
environment = "staging"
}
}
I started with count = 1 and got all 7 resources created without issues:
Apply complete! Resources: 7 added, 0 changed, 0 destroyed.
After that, I increase the count to 2.
The plan shows that I will get the following: Plan: 5 to add, 0 to change, 1 to destroy.
It seems that the os_disk is at fault here, more specifically the vhd_uri. I am using the new bracketing format instead of the element one. This is causing us issues in production where we are losing VMs when we try to augment the capacity of a cluster.
For the Azure or Terraform experts: is there anything I could do until this issue gets resolved to prevent my resources from being deleted? I need a storage account per VM, so reusing the same one is not an option.
storage_os_disk.#: "1" => "1"
storage_os_disk.730729623.create_option: "FromImage" => ""
storage_os_disk.730729623.disk_size_gb: "0" => "0"
storage_os_disk.730729623.image_uri: "" => ""
storage_os_disk.730729623.name: "myosdisk1" => ""
storage_os_disk.730729623.os_type: "" => ""
storage_os_disk.730729623.vhd_uri: "https://accsai0.blob.core.windows.net/vhds/myosdisk1.vhd" => "" (forces new resource)
storage_os_disk.~4275591411.caching: "" => "ReadWrite"
storage_os_disk.~4275591411.create_option: "" => "FromImage"
storage_os_disk.~4275591411.disk_size_gb: "" => ""
storage_os_disk.~4275591411.image_uri: "" => ""
storage_os_disk.~4275591411.name: "" => "myosdisk1"
storage_os_disk.~4275591411.os_type: "" => ""
storage_os_disk.~4275591411.vhd_uri: "" => "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd" (forces new resource)
The full output
+ azurerm_network_interface.test.1
applied_dns_servers.#: "<computed>"
dns_servers.#: "<computed>"
enable_ip_forwarding: "false"
internal_dns_name_label: "<computed>"
internal_fqdn: "<computed>"
ip_configuration.#: "1"
ip_configuration.2250438086.load_balancer_backend_address_pools_ids.#: "<computed>"
ip_configuration.2250438086.load_balancer_inbound_nat_rules_ids.#: "<computed>"
ip_configuration.2250438086.name: "testconfiguration1"
ip_configuration.2250438086.private_ip_address: "<computed>"
ip_configuration.2250438086.private_ip_address_allocation: "dynamic"
ip_configuration.2250438086.public_ip_address_id: "<computed>"
ip_configuration.2250438086.subnet_id: "/subscriptions/f0dc697c-673c-4fb0-8852-7f47533b4dd6/resourceGroups/acctestrg/providers/Microsoft.Network/virtualNetworks/acctvn/subnets/acctsub"
location: "westus"
mac_address: "<computed>"
name: "acctni1"
network_security_group_id: "<computed>"
private_ip_address: "<computed>"
resource_group_name: "acctestrg"
tags.%: "<computed>"
virtual_machine_id: "<computed>"
+ azurerm_storage_account.test.1
access_tier: "<computed>"
account_kind: "Storage"
account_type: "Standard_LRS"
location: "westus"
name: "accsai1"
primary_access_key: "<computed>"
primary_blob_endpoint: "<computed>"
primary_file_endpoint: "<computed>"
primary_location: "<computed>"
primary_queue_endpoint: "<computed>"
primary_table_endpoint: "<computed>"
resource_group_name: "acctestrg"
secondary_access_key: "<computed>"
secondary_blob_endpoint: "<computed>"
secondary_location: "<computed>"
secondary_queue_endpoint: "<computed>"
secondary_table_endpoint: "<computed>"
tags.%: "1"
tags.environment: "staging"
+ azurerm_storage_container.test.1
container_access_type: "private"
name: "vhds"
properties.%: "<computed>"
resource_group_name: "acctestrg"
storage_account_name: "accsai1"
-/+ azurerm_virtual_machine.test.0
availability_set_id: "" => "<computed>"
delete_data_disks_on_termination: "false" => "false"
delete_os_disk_on_termination: "false" => "false"
license_type: "" => "<computed>"
location: "westus" => "westus"
name: "acctvm0" => "acctvm0"
network_interface_ids.#: "1" => "<computed>"
os_profile.#: "1" => "1"
os_profile.2123949718.admin_password: "" => "Password1234!"
os_profile.2123949718.admin_username: "testadmin" => "testadmin"
os_profile.2123949718.computer_name: "hostname0" => "hostname0"
os_profile.2123949718.custom_data: "" => "<computed>"
os_profile_linux_config.#: "1" => "1"
os_profile_linux_config.2972667452.disable_password_authentication: "false" => "false"
os_profile_linux_config.2972667452.ssh_keys.#: "0" => "0"
resource_group_name: "acctestrg" => "acctestrg"
storage_image_reference.#: "1" => "1"
storage_image_reference.1807630748.offer: "UbuntuServer" => "UbuntuServer"
storage_image_reference.1807630748.publisher: "Canonical" => "Canonical"
storage_image_reference.1807630748.sku: "14.04.2-LTS" => "14.04.2-LTS"
storage_image_reference.1807630748.version: "latest" => "latest"
storage_os_disk.#: "1" => "1"
storage_os_disk.730729623.create_option: "FromImage" => ""
storage_os_disk.730729623.disk_size_gb: "0" => "0"
storage_os_disk.730729623.image_uri: "" => ""
storage_os_disk.730729623.name: "myosdisk1" => ""
storage_os_disk.730729623.os_type: "" => ""
storage_os_disk.730729623.vhd_uri: "https://accsai0.blob.core.windows.net/vhds/myosdisk1.vhd" => "" (forces new resource)
storage_os_disk.~4275591411.caching: "" => "ReadWrite"
storage_os_disk.~4275591411.create_option: "" => "FromImage"
storage_os_disk.~4275591411.disk_size_gb: "" => ""
storage_os_disk.~4275591411.image_uri: "" => ""
storage_os_disk.~4275591411.name: "" => "myosdisk1"
storage_os_disk.~4275591411.os_type: "" => ""
storage_os_disk.~4275591411.vhd_uri: "" => "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd" (forces new resource)
tags.%: "1" => "1"
tags.environment: "staging" => "staging"
vm_size: "Standard_A0" => "Standard_A0"
+ azurerm_virtual_machine.test.1
availability_set_id: "<computed>"
delete_data_disks_on_termination: "false"
delete_os_disk_on_termination: "false"
license_type: "<computed>"
location: "westus"
name: "acctvm1"
network_interface_ids.#: "<computed>"
os_profile.#: "1"
os_profile.1736693719.admin_password: "Password1234!"
os_profile.1736693719.admin_username: "testadmin"
os_profile.1736693719.computer_name: "hostname1"
os_profile.1736693719.custom_data: "<computed>"
os_profile_linux_config.#: "1"
os_profile_linux_config.2972667452.disable_password_authentication: "false"
os_profile_linux_config.2972667452.ssh_keys.#: "0"
resource_group_name: "acctestrg"
storage_image_reference.#: "1"
storage_image_reference.1807630748.offer: "UbuntuServer"
storage_image_reference.1807630748.publisher: "Canonical"
storage_image_reference.1807630748.sku: "14.04.2-LTS"
storage_image_reference.1807630748.version: "latest"
storage_os_disk.#: "1"
storage_os_disk.~4275591411.caching: "ReadWrite"
storage_os_disk.~4275591411.create_option: "FromImage"
storage_os_disk.~4275591411.disk_size_gb: ""
storage_os_disk.~4275591411.image_uri: ""
storage_os_disk.~4275591411.name: "myosdisk1"
storage_os_disk.~4275591411.os_type: ""
storage_os_disk.~4275591411.vhd_uri: "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd"
tags.%: "1"
tags.environment: "staging"
vm_size: "Standard_A0"
@djsly I had a somewhat similar issue and was able to use a lifecycle block in the resource with ignore_changes to work around it. Maybe that would work for you here if you set it to storage_os_disk? https://www.terraform.io/docs/configuration/resources.html
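Roughly, that would mean adding something like this to the azurerm_virtual_machine resource above (a sketch only, with the other arguments elided; as noted further down, it also silences legitimate changes):
```
resource "azurerm_virtual_machine" "test" {
  # ... existing arguments as in the config above ...

  lifecycle {
    ignore_changes = ["storage_os_disk"]
  }
}
```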
Thanks @bpoland, this indeed worked perfectly. I also added the os_profile section, since the admin_password was also detected as a change. Nothing major, but cleaner.
lifecycle {
ignore_changes = ["storage_os_disk", "os_profile"]
}
thanks a lot!
No worries. Just be careful -- if those ever change and it SHOULD cause the resources to get deleted and recreated, it won't happen.
And let's be clear, this is a workaround not a solution to the problem in this thread. It seems like there was some discussion late last year about possible fixes, would love to see this get solved correctly.
@mitchellh In #8595 you said you had plans for having a fix ready for 0.9. Does this mean this is an incompatible change? Also, is there some branch with the work in progress we could look at?
You also said that #8595 isn't complete and has unintended side effects. Can you elaborate on that? The incompleteness previously mentioned in that PR is about modules interactions and it seems to have been fixed later.
I care to ask because this can be very problematic when trying to increase the cluster size behind a load balancer, and having the load balancer reconstructed and thus unavailable for a relatively long period of time (#8684). We absolutely need this fixed and are available to help to fix that issue. Workarounds don't work well in our case because we have a fully automated process.
@jen20 it seems my last comment unassigned you from this issue. This was unintended, please fix if you can.
This issue is a huge blocker for creating a cluster of nodes that are backed by EBS. Right now, I see people copying and pasting resources multiple times, which is error-prone and totally defeats the point of having the count attribute. Given Terraform's lack of proper "looping", it's really important for the count construct to work without this bug. In the end, we had to write our own pre-processor to generate TF files to keep things DRY.
Note that since Terraform 0.8.x, the workaround of manually duplicating resources has an additional caveat. If you destroy a resource manually, you will run into an error the next time you run terraform plan.
I guess it is an improvement that terraform now has better dependency tracking. It does make this issue more painful to work around, though.
I've encountered another issue with the workaround of duplicating resources in 0.8.8 and 0.9.2.
Consider the following setup.
````
resource "aws_instance" "node" {
count = 3
...
}
resource "aws_ebs_volume" "node-ebs" {
count = 3
...
}
resource "aws_volume_attachment" "node-attach-0" {
device_name = "/dev/xvdh"
volume_id = "${aws_ebs_volume.node-ebs.0.id}"
instance_id = "${aws_instance.node.0.id}"
}
resource "aws_volume_attachment" "node-attach-1" {
device_name = "/dev/xvdh"
volume_id = "${aws_ebs_volume.node-ebs.1.id}"
instance_id = "${aws_instance.node.1.id}"
}
resource "aws_volume_attachment" "node-attach-2" {
device_name = "/dev/xvdh"
volume_id = "${aws_ebs_volume.node-ebs.2.id}"
instance_id = "${aws_instance.node.2.id}"
}
````
Now, if I terraform taint aws_instance.node.0, terraform taint aws_instance.node.1, and terraform taint aws_instance.node.2, terraform will only recreate the corresponding volume attachments for the first instance. The plan will be:
-/+ aws_instance.node.0 (tainted)
-/+ aws_instance.node.1 (tainted)
-/+ aws_instance.node.2 (tainted)
-/+ aws_instance.node-ebs.0
instance_id: "i-aaaabbbbccccdddde" => "${aws_instance.node.0.id}" (forces new resource)
Hence, we are missing the recreation of aws_instance.node-ebs.1 and aws_instance.node-ebs.2.
With all the issues related to count, I start to think that perhaps the sanest option is to go in the same direction as @kishorenc and stop using count and instead use a preprocessor.
This has been an issue for me in 0.9.2 managing my aws_s3_buckets (I tried to remove the 1st one, so all were going to be replaced; see #13724).
What I'm wondering is if there's a way for Terraform core to use the id instead of the count index in the extension?
For example, instead of calling the resource aws_s3_bucket.mybuckets.${count.index}, calling it aws_s3_bucket.mybuckets.${id}?
The id attribute is unique inside a resource type and is used as a reference when calling other resources, so it could work as an identifier here.
Hi @aerostitch!
The issue of the count indices becoming "misaligned" when you add/remove items from the middle of your list is a separate but related issue to this one. The likely solution to this will be a foreach attribute which can be used instead of count to get a result like you're looking for, but we need to make some foundational configuration language changes first before that will be usable. This is likely to arrive in a future version, but we need to do some more internal design work first.
OK. Thanks for the answer @apparentlymart. I thought that leveraging the id attribute could have been an easier fix, but a foreach could become handy indeed! :)
Should I reopen my original issue in this case?
@aerostitch the problem and intended (high-level) solution is already known, so I think we can leave your other issue closed though I do appreciate you taking the time to open it, and to hunt out this other similar issue!
Is there any progress update on this issue? I am experiencing this problem with a combination of aws_instance and aws_alb_target_group_attachment, whereby all target group attachments are destroyed when only two AWS instances need creation.
As you can imagine, removing all web nodes from a production environment's load balancer is a pretty bad outcome when you're trying to make a partial change.
Partial plan output:
-/+ aws_alb_target_group_attachment.web_tga.0
port: "80" => "80"
target_group_arn: "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq" => "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq"
target_id: "i-aaaa" => "${aws_instance.web.*.id[count.index]}" (forces new resource)
-/+ aws_alb_target_group_attachment.web_tga.1
port: "80" => "80"
target_group_arn: "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq" => "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq"
target_id: "i-bbbb" => "${aws_instance.web.*.id[count.index]}" (forces new resource)
-/+ aws_alb_target_group_attachment.web_tga.2
port: "80" => "80"
target_group_arn: "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq" => "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq"
target_id: "i-cccc" => "${aws_instance.web.*.id[count.index]}" (forces new resource)
-/+ aws_alb_target_group_attachment.web_tga.3
port: "80" => "80"
target_group_arn: "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq" => "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/production-alb-web-tg/qqqq"
target_id: "i-dddd" => "${aws_instance.web.*.id[count.index]}" (forces new resource)
+ aws_instance.web.1
[...omitted]
+ aws_instance.web.3
[...omitted]
Hi everyone! Sorry this one sat here for so long.
Since this has been open for a long time and things have changed a bunch throughout its life, I want to add some context to close out this great, multi-year discussion:
When this issue was originally filed, Terraform's support for lists was rather rudimentary, and we had a function called element that served to extract a particular element from a list. From the perspective of the interpolation language, this was just a function like any other and so the language was conservative and assumed that any unknown values in the list had to produce an unknown result.
This interacted poorly with how the splat syntax deals with new instances created when count is increased, since the elements for the new instances were marked as unknown until the apply completed, and thus caused the whole set to be treated as computed.
Back in 0.7 we added a first-class indexing operator using brackets, like var.foo[1], which then gave the interpolation language an awareness of indexing. This wasn't enough to solve the problem, because there were still assumptions about lists either being wholly known or unknown, but it gave us an important building block to fix this.
In #14135 I reorganized how the interpolation language deals with unknowns so that partially-unknown collections (lists and maps) can be passed around _within a single interpolated string, with the final "partially unknown becomes fully unknown" mapping now done at the end, before the final result is returned to Terraform.
To get the benefit of this fix, it will be necessary to rewrite any existing configs using this pattern:
some_attr = "${element(some_other_resource.foo.*.some_attr, count.index)}"
The new form, with the first-class indexing operator, would be the following:
some_attr = "${some_other_resource.foo.*.some_attr[count.index]}"
Great addition! @ahl this should solve your problem.
After changing my terraform files to use this syntax it becomes impossible to recreate resources that are referenced by other resources when they disappear. I get this sort of error message:
Error refreshing state: 2 error(s) occurred:
* data.template_file.workers_ansible[3]: index 3 out of range for list openstack_compute_instance_v2.worker.*.name (max 3) in:
${openstack_compute_instance_v2.worker.*.name[count.index]}
* data.template_file.workers_ansible: 1 error(s) occurred:
* data.template_file.workers_ansible[3]: index 3 out of range for list openstack_compute_instance_v2.worker.*.name (max 3) in:
${openstack_compute_instance_v2.worker.*.name[count.index]}
That is, I have an openstack_compute_instance_v2 resource and an output resource to generate an Ansible inventory. This works fine, but if I delete one of the instances in the OpenStack portal, Terraform fails to refresh state and subsequently to recreate the node. Using element() allows recreating the instance.
Hi @sigmunau,
I think what you're seeing there is the bug that was fixed in #14098. It's merged in master but hasn't made it into a release yet.
I'm actually seeing the same issue that @sigmunau is experiencing using 0.9.6-dev (installed via go get -u github.com/hashicorp/terraform). I'll try to isolate this.
Hmm sorry on second read it does look a bit different. I'll take a look.
@kreisys if you dig up some info it'd be cool to have a new top-level issue for this! Thanks
@apparentlymart I managed to isolate this: #14521
From the description of #14098, my reported issue seems to be just that. In particular, my case does not involve modules or the [] syntax that @kreisys reports in #14521, so it doesn't seem that my issue is that one. I'll see if I can reproduce using 0.9.6-dev.
With 0.9.6-dev I get this:
Error refreshing state: 1 error(s) occurred:
* data.template_file.workers_ansible: 1 error(s) occurred:
* data.template_file.workers_ansible[3]: index 3 out of range for list openstack_compute_instance_v2.worker.*.name (max 3) in:
${openstack_compute_instance_v2.worker.*.name[count.index]}
So the error message seems the same, but no longer repeated.
I created a new issue #14536 with details of my problem
I have a similar issue passing a splat list to a module, even if I access the elements inside the module with [] syntax. Does passing a list into a module impact whether it's considered 'unknown' as a whole?
@apparentlymart I have the same issue as @dannytrigo :'(
Here is my sample:
.
├── inputs.tfvars
├── instance
│   └── main.tf
└── main.tf
main.tf
```.tf
variable "shortnames" {
type = "list"
}
module "generic_linux" {
source = "./instance/"
shortnames = "${var.shortnames}"
}
resource "aws_ebs_volume" "data" {
count = "${length(var.shortnames)}"
availability_zone = "eu-west-1a"
size = "5"
type = "gp2"
}
resource "aws_volume_attachment" "data_ebs_att" {
count = "${length(var.shortnames)}"
device_name = "/dev/sdc"
volume_id = "${aws_ebs_volume.data.*.id[count.index]}"
instance_id = "${module.generic_linux.instances_ids[count.index]}"
}
```
My module code is:
instance/main.tf
```.tf
variable "shortnames" {
type = "list"
description = "list of shortname"
}
resource "aws_instance" "instances" {
count = "${length(var.shortnames)}"
instance_type = "t2.micro"
key_name = "formation-hpc"
ami = "ami-xxxxxxxx"
vpc_security_group_ids = ["sg-xxxxxxxx"]
subnet_id = "subnet-xxxxxxxx"
tags {
Name = "${var.shortnames[count.index]}-${count.index}"
}
}
output "instances_ids" {
value = "${aws_instance.instances.*.id}"
}
```
Usage of instance_id = "${module.generic_linux.instances_ids[count.index]}" forces a new resource:
-/+ aws_volume_attachment.data_ebs_att[0] (new resource required)
id: "vai-764663169" => <computed> (forces new resource)
device_name: "/dev/sdc" => "/dev/sdc"
force_detach: "" => <computed>
instance_id: "i-02759cd6c3590764f" => "${module.generic_linux.instances_ids[count.index]}" (forces new resource)
skip_destroy: "" => <computed>
volume_id: "vol-096815f03a512625c" => "vol-096815f03a512625c"
Is there a workaround to add nodes to a cluster of undefined size, based on a generic instance module, without recreating each dependent resource?
I have the same issue. It's a bit shocking that this issue has been open for 2 years and hasn't been fixed yet. It'd be great if someone had a workaround for this.
I just happened to find a workaround-ish using lifecycle ignore_changes. So, in @jnahelou's example, the amended Terraform script would look like:
resource "aws_volume_attachment" "data_ebs_att" {
count = "${length(var.shortnames)}"
device_name = "/dev/sdc"
volume_id = "${aws_ebs_volume.data.*.id[count.index]}"
instance_id = "${module.generic_linux.instances_ids[count.index]}"
lifecycle {
ignore_changes = ["instance_id"]
}
}
I've done something similar, @loalf - but it feels as though that really shouldn't be necessary. Given that Terraform is intentionally declarative, I can see how it's ended up being this way.
In my case, I dynamically allocate instances in round-robin fashion to whatever variable number of subnets I have. BUT, when you change the number of subnets you have provisioned in a given VPC, it can dangerously trigger the recreation of your EC2's, so I've done something similar to what you have.
Check this:
resource "aws_instance" "ec2_instance" {
ami = "${lookup(var.aws_machine_images, "${var.ubuntu_version},${var.aws_region}")}"
instance_type = "${var.instance_type}"
count = "${var.total_instances}"
disable_api_termination = "${var.enable_instance_protection}"
# TODO: Fix this!
# Changing the number of subnets will trigger resource recreation
# ergo, the lifecycle manager
subnet_id = "${element(var.subnet_ids, count.index)}"
key_name = "${var.key_pair_id}"
vpc_security_group_ids = ["${var.security_group_ids}"]
associate_public_ip_address = "${var.associate_public_ip_address}"
root_block_device {
volume_type = "${var.root_volume_type}"
volume_size = "${var.root_volume_size_gb}"
delete_on_termination = "${var.storage_delete_on_termination}"
}
tags {
Name = "${var.total_instances > 1 ? format("%s-%02d-%s", var.instance_name, (count.index + 1), var.environment) : format("%s-%s", var.instance_name, var.environment)}"
ServerGroup = "${var.instance_name}-${var.environment}"
ServerName = "${var.instance_name}${count.index}"
Environment = "${var.environment}"
}
lifecycle {
ignore_changes = ["subnet_id"]
}
}
@armenr would your solution create additional ec2 instances or remove extra ones when the count of subnets changes?
Or is this designed to always keep the number of ec2 instances static after initial creation?
Please check this option:
https://github.com/hashicorp/terraform/issues/14357
Instead of "element" use the [] option.
@misham - Good question! It will KEEP existing instances in the subnets where they reside, and add instances when you add a subnet to your list of subnets.
From what I recall, if I issue a destroy on a specific subnet, the EC2's get destroyed also.
Recently upgraded terraform
Terraform v0.11.7
+ provider.aws v1.18.0
+ provider.template v1.0.0
I've tried the syntax suggested by @apparentlymart:
# Create AWS Instances
resource "aws_instance" "web" {
count = "${var.count}"
ami = "${var.aws_ami}"
instance_type = "${var.aws_instance_type}"
associate_public_ip_address = "${var.aws_public_ip}"
...
}
# Attach Instances to Application Load Balancer
resource "aws_alb_target_group_attachment" "web" {
count = "${var.count}"
target_group_arn = "${var.aws_alb_target_group_arn}"
# target_id = "${element(aws_instance.web.*.id, count.index)}"
target_id = "${aws_instance.web.*.id[count.index]}"
port = "${var.aws_alb_target_group_port}"
}
However when I issue the command:
terraform plan --destroy --var-file=staging.tfvars -target=aws_alb_target_group_attachment.web[2] -target=aws_instance.web[2]
or just
terraform plan --destroy --var-file=staging.tfvars -target=aws_instance.web[2]
Terraform wants to destroy all aws_alb_target_group_attachments:
Terraform will perform the following actions:
- aws_alb_target_group_attachment.web[0]
- aws_alb_target_group_attachment.web[1]
- aws_alb_target_group_attachment.web[2]
- aws_alb_target_group_attachment.web[3]
- aws_instance.web[2]
Plan: 0 to add, 0 to change, 5 to destroy.
I can properly remove just the aws_alb_target_group_attachment:
terraform plan --destroy --var-file=staging.tfvars -target=aws_alb_target_group_attachment.web[2]
However, if I follow that up with a destroy of the instance, it will still want to remove all of the other remaining target group attachments.
Is the approach wrong or is there still a bug here?
I still have the same problem.
Example:
resource "aws_instance" "masters" {
count = "3"
ami = "${var.ami}"
}
resource "null_resource" "outindex" {
count = "3"
triggers {
cluster_instance = "${aws_instance.masters.*.id[count.index]}"
}
provisioner "local-exec" {
command = "date"
}
lifecycle { create_before_destroy = true }
}
When I try to update the first instance with a new AMI, it first wants to replace ALL instances, then starts executing the null resource.
$ terraform plan -target="null_resource.outindex[0]"
-/+ aws_instance.masters[0] (new resource required)
ami: "ami-xxxx" => "ami-yyy" (forces new resource)
-/+ aws_instance.masters[1] (new resource required)
ami: "ami-xxxx" => "ami-yyy" (forces new resource)
-/+ aws_instance.masters[2] (new resource required)
ami: "ami-xxxx" => "ami-yyy" (forces new resource)
+ null_resource.outindex[0]
I expected to see changes only for the first instance.
Environment:
$ terraform version
Terraform v0.11.11
+ provider.aws v1.57.0
+ provider.external v1.0.0
+ provider.local v1.1.0
+ provider.null v2.0.0
+ provider.template v2.0.0
OS: macOS
UPDATE:
Currently to fix this I do:
$ terraform plan -target="null_resource.outindex[0]" -target="aws_instance.masters[0]"
-/+ aws_instance.masters[0] (new resource required)
ami: "ami-xxxx" => "ami-yyy" (forces new resource)
+ null_resource.outindex[0]
I see that this is closed but I'm still experiencing the same issue in v0.11.10. Is this expected?
I noticed this issue appears to still be happening in Terraform v0.11.14. Could this be because we are using a module under the hood to create the EC2 instances? Incrementing our count from 7 => 8 causes all volume attachments 1-7 to be re-attached.
module "elk-elasticsearch-node" {
source = "./app-cluster-static"
}
# ./app-cluster-static/main.tf
module "this" {
source = "terraform-aws-modules/ec2-instance/aws"
version = "~> 1.19.0"
...
}
-/+ aws_volume_attachment.elk-elasticsearch-node[7] (new resource required)
id: "vai-2048368739" => <computed> (forces new resource)
device_name: "/dev/sdf" => "/dev/sdf"
instance_id: "i-08493c7837712a7ea" => "${module.elk-elasticsearch-node.instance_ids[count.index]}" (forces new resource)
volume_id: "vol-0c9a19f4f4ce4dfd6" => "vol-0c9a19f4f4ce4dfd6"
+ aws_volume_attachment.elk-elasticsearch-node[8]
id: <computed>
device_name: "/dev/sdf"
instance_id: "${module.elk-elasticsearch-node.instance_ids[count.index]}"
volume_id: "${aws_ebs_volume.elk-elasticsearch-node.*.id[count.index]}"
$ terraform -v
Terraform v0.11.14
+ provider.aws v2.13.0
+ provider.azuread v0.3.1
+ provider.null v2.1.0
+ provider.random v2.1.0
+ provider.template v2.1.0
+ provider.tls v2.0.1
I'm going to lock this issue because it has been closed for _30 days_. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Most helpful comment
Hi everyone! Sorry this one sat here for so long.
Since this has been open for a long time and things have changed a bunch throughout its life, I want to add some context to close out this great, multi-year discussion:
When this issue was originally filed, Terraform's support for lists was rather rudimentary, and we had a function called `element` that served to extract a particular element from a list. From the perspective of the interpolation language, this was just a function like any other, and so the language was conservative and assumed that any unknown values in the list had to produce an unknown result. This interacted poorly with how the splat syntax deals with new instances created when `count` is increased, since the elements for the new instances were marked as unknown until the `apply` completed, and thus caused the whole set to be treated as computed.
Back in 0.7 we added a first-class indexing operator using brackets, like `var.foo[1]`, which gave the interpolation language an awareness of indexing. This wasn't enough to solve the problem, because there were still assumptions about lists being either wholly known or wholly unknown, but it gave us an important building block to fix this.
In #14135 I reorganized how the interpolation language deals with unknowns so that partially-unknown collections (lists and maps) can be passed around _within a single interpolated string_, with the final "partially unknown becomes fully unknown" mapping now done at the end, before the final result is returned to Terraform.
To get the benefit of this fix, it will be necessary to rewrite any existing configs using this pattern:
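A minimal sketch of that old pattern, using hypothetical resource and variable names (the original comment's examples are not reproduced here):
```.tf
# Old pattern: element() over a splat list. If any element of the splat
# is unknown (for example after increasing count), the whole expression
# is treated as unknown, which forces replacement of every attachment.
resource "aws_volume_attachment" "attach" {
  count       = "${var.instance_count}"
  device_name = "/dev/xvdh"
  volume_id   = "${element(aws_ebs_volume.data.*.id, count.index)}"
  instance_id = "${element(aws_instance.node.*.id, count.index)}"
}
```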
The new form, with the first-class indexing operator, would be the following:
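A corresponding sketch with the first-class index operator, using the same hypothetical names:
```.tf
# New pattern: first-class [] indexing. Only the specific indexed element
# matters, so attachments whose instance and volume are already known are
# no longer treated as unknown when other elements of the splat change.
resource "aws_volume_attachment" "attach" {
  count       = "${var.instance_count}"
  device_name = "/dev/xvdh"
  volume_id   = "${aws_ebs_volume.data.*.id[count.index]}"
  instance_id = "${aws_instance.node.*.id[count.index]}"
}
```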