I'm attempting to create a module to package up a common pattern: deploying a NAT instance into a subnet. I'd like to be able to do the following:
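Roughly, the usage I'm after looks like this (a sketch only; the exact module interface here is my assumption, not code from the linked repo):

# Hypothetical usage: hand the module a comma-joined list of subnet IDs
# and let it create one NAT instance per subnet.
module "nat" {
  source        = "./nat"
  aws_subnet_id = "${join(",", aws_subnet.pub_subnet.*.id)}"
}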
Unfortunately, my attempts at doing so have been met with unexpected behavior in Terraform. I'm not sure if what I'm seeing is intended behavior or a bug. Repro cases are in the following repo:
https://github.com/mpage/tf_module_issues
This is against Terraform 0.5.3
The _missing_attr_ error looks like the one described in #1997 (see example).
The missing attribute error should be fixed by a PR shortly.
I've narrowed the cycle down to the following minimal case. It isn't valid Terraform, but it fails nonetheless, and it requires no existing state. A key factor seems to be the count: if I change the count to 1 in the main module, the cycle goes away.
main.tf:
resource "aws_subnet" "pub_subnet" {
count = "2"
}
module "nat" {
source = "./nat"
aws_subnet_id = "${join(",", aws_subnet.pub_subnet.*.id)}"
}
nat/main.tf:
variable "aws_subnet_id" {
description = "Subnet that will contain the nat instance"
}
resource "aws_instance" "nat" {
count = "${length(split(",", var.aws_subnet_id))}"
}
Updates
Ah, a count other than "1" forces the destroy nodes to stick around, which causes the cycle.
Ah, so I've narrowed the cycle down to this. I've only pasted the relevant bit. Read the comment.
// GraphNodeDestroyEdgeInclude impl.
func (n *GraphNodeConfigVariable) DestroyEdgeInclude(v dag.Vertex) bool {
    // Only include this variable in a destroy edge if the source vertex
    // "v" has a count dependency on this variable.
    cv, ok := v.(GraphNodeCountDependent)
    if !ok {
        return false
    }
    ...
It requires some thinking to see why this is the case. In cases of real cycles like this I always ask myself "what would a human do [to break the cycle]?" In this case, you'd probably want the "destroy" NAT instance node to use a _cached_ value of var.aws_subnet_id if available, and if none is available, to assume a count of 0.
This would require some pretty serious graph majiggery.
A workaround for now is to just make the count a separate, more static variable.
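For the minimal NAT example above, that looks roughly like this (a sketch; the subnet_count variable is one I'm introducing for illustration, not something from the repro):

main.tf:
variable "subnet_count" {
  default = "2"
}

resource "aws_subnet" "pub_subnet" {
  count = "${var.subnet_count}"
}

module "nat" {
  source        = "./nat"
  subnet_count  = "${var.subnet_count}"
  aws_subnet_id = "${join(",", aws_subnet.pub_subnet.*.id)}"
}

nat/main.tf:
variable "subnet_count" {}

variable "aws_subnet_id" {
  description = "Subnet that will contain the nat instance"
}

resource "aws_instance" "nat" {
  # count now depends only on a plain variable, not on a computed resource attribute.
  count     = "${var.subnet_count}"
  subnet_id = "${element(split(",", var.aws_subnet_id), count.index)}"
}

The idea is that count no longer references aws_subnet.pub_subnet through the module variable, so the count dependency that was pulling the destroy nodes into the cycle goes away.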
I believe I am also hitting this issue while doing automation through a bastion host. Here is the code snippet:
resource "aws_instance" "deployment-host-nodes" {
ami = "${var.source_ami}"
availability_zone = "${element(split(",",var.availability_zones),count.index)}"
instance_type = "${var.deployment_host_type}"
count = "${var.deployment_host_count}"
#vpc_security_group_ids = ["${aws_security_group.deployment-host.id}", "${aws_vpc.main.default_security_group_id}"]
vpc_security_group_ids = ["${aws_security_group.deployment-host.id}", "${aws_security_group.internal.id}"]
key_name = "${aws_key_pair.deployer.key_name}"
associate_public_ip_address=false
#disable_api_termination = true
#Create a list of subnets from , separated strings
subnet_id = "${element(aws_subnet.deployment_host._.id,count.index)}"
root_block_device {
delete_on_termination = true
}
tags {
Name = "${var.short_name}-deployment-host-${format("%02d", count.index+1)}"
sshUser = "${var.ssh_username}"
role = "deployment-host"
keyPath = "${var.ssh_private_key}"
dc = "${var.datacenter}"
}
provisioner "remote-exec" {
_depends_on = ["${element(aws_instance.bastion_host..id,count.index)}"]*
connection {
bastion_host = "${element(aws_instance.bastion_host..public_ip,count.index)}"
bastion_port = 22
bastion_user = "${var.ssh_username}"
bastion_key_file = "${var.ssh_private_key}"
user = "${var.ssh_username}"
host = "${self.public_ip}"
key_file = "${var.ssh_private_key}"
}
inline = [
"ls -l",
"sleep 1"
]
}
}
Although I am using depends_on explicitly, I get the following error:
terraform plan
Error configuring: 2 error(s) occurred:
Curious to know if it's the same issue or a new one.
Terraform version is the current master:
terraform --version
Terraform v0.6.7-dev (965e59843741f3de587bcfe61b588a304262c734+CHANGES)
I believe I am seeing the same issue with EBS volumes and instance changes.
The EBS volumes cannot be destroyed, because planning both a change to the instances and a destroy of the EBS volumes produces a cycle error:
The change I made was adding count.index + 1, which causes an update to the instances:
EC2 MODULE:
variable "num_servers" {}
resource "aws_instance" "aws-ec2" {
ami = "${var.ami}"
count = "${var.num_servers}"
tags {
Name = "${format(var.instance_tag, count.index + 1)}"
}
}
output "aws_instance_ids" {
value = "${join(",", aws_instance.aws-ec2.*.id)}"
}
EBS MODULE:
variable "aws_instance_ids" {}
resource "aws_ebs_volume" "ebs-volume" {
count = "${length(split(",", var.aws_instance_ids))}"
availability_zone = "${var.availability}"
size = 100
tags {
Name = "${format("ebs-volume-%d", count.index)}"
}
}
resource "aws_volume_attachment" "ebs-att" {
count = "${length(split(",", var.aws_instance_ids))}"
device_name = "/dev/sdh"
volume_id = "${element(aws_ebs_volume.ebs-volume.*.id, count.index)}"
instance_id = "${element(split(",", var.aws_instance_ids), count.index)}"
force_detach = true
}
Error:
Error configuring: 1 error(s) occurred:
* Cycle: module.ebs.aws_ebs_volume.ebs-volume (destroy), module.ebs.var.aws_instance_ids, module.ebs.aws_volume_attachment.ebs-att (destroy), module.ec2.aws_instance.aws-ec2 (destroy), module.ec2.aws_instance.aws-ec2, module.ec2.output.aws_instance_ids
Just adding that I've seen this as well - in my case, in one module I had a resource count that depended on two variables: one supplied statically and one provided via an output from another module, which was in turn computed. In fact, my use case was almost exactly what @mpage laid out in the original post.
I'm not sure if this is being addressed as part of #4961, but if count remains a directive whose variable chain needs to be static, then perhaps a specific error should be raised for it, so that people don't try to chase down cycles that don't really exist.
I was bitten by this issue while trying to separate a bunch of aws_ebs_volume resources into their own module. I'm glad this bug was filed; I don't think I would have figured out the answer myself.
In my case, the solution was to generate a count from the source data and pass it between modules using an input and an output. In this way, the count is dependent on a variable rather than a resource, avoiding a dependency loop.
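As a concrete sketch of that approach, applied to the EBS example above (the num_volumes input and num_servers output are names I'm introducing for illustration):

EC2 MODULE:
variable "num_servers" {}

# Re-export the count so other modules can consume it as a plain value.
output "num_servers" {
  value = "${var.num_servers}"
}

EBS MODULE:
variable "num_volumes" {}
variable "aws_instance_ids" {}

resource "aws_ebs_volume" "ebs-volume" {
  # availability_zone, size, tags, etc. stay as in the snippet above;
  # only the count expression changes.
  count = "${var.num_volumes}"
}

ROOT MODULE:
module "ec2" {
  source      = "./ec2"
  num_servers = 2
}

module "ebs" {
  source           = "./ebs"
  num_volumes      = "${module.ec2.num_servers}"
  aws_instance_ids = "${module.ec2.aws_instance_ids}"
}

Since module.ec2.num_servers resolves to a plain variable rather than a computed resource attribute, the count is known up front and the destroy nodes no longer drag the two modules into a cycle.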
Thanks @cbarbour, your solution worked for me also.
Here's my repro in case anyone is interested...
test.tf
provider "aws" {
region = "ap-southeast-2"
}
module "vpc" {
source = "./vpc"
cidrs = ["10.0.0.0/16", "10.1.0.0/16"]
}
module "igw" {
source = "./igw"
vpc_ids = "${module.vpc.vpc_ids}"
}
vpc/vpc.tf
variable "cidrs" { type = "list" }
resource "aws_vpc" "management" {
count = "${length(var.cidrs)}"
cidr_block = "${element(var.cidrs, count.index)}"
}
output "vpc_ids" {
value = ["${aws_vpc.management.*.id}"]
}
igw/igw.tf
variable "vpc_ids" { type = "list" }

resource "aws_internet_gateway" "internet_gateway" {
  count  = "${length(var.vpc_ids)}"
  vpc_id = "${element(var.vpc_ids, count.index)}"
}
If I replace:
count = "${length(var.vpc_ids)}"
with
count = 2
it works without an error.
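The workaround @cbarbour described also applies here: derive the count from the statically known cidrs list and feed it to the igw module as its own input, instead of taking length() of the computed vpc_ids. A sketch (the vpc_count variable is one I'm adding for illustration):

test.tf:
variable "cidrs" {
  type    = "list"
  default = ["10.0.0.0/16", "10.1.0.0/16"]
}

module "vpc" {
  source = "./vpc"
  cidrs  = "${var.cidrs}"
}

module "igw" {
  source    = "./igw"
  vpc_count = "${length(var.cidrs)}"
  vpc_ids   = "${module.vpc.vpc_ids}"
}

igw/igw.tf:
variable "vpc_count" {}
variable "vpc_ids" { type = "list" }

resource "aws_internet_gateway" "internet_gateway" {
  # count comes from a plain variable, so it is known at plan time.
  count  = "${var.vpc_count}"
  vpc_id = "${element(var.vpc_ids, count.index)}"
}

This keeps the gateway count in sync with the VPC count without hard-coding 2.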
Any chance we'll see this one squashed soon? 😄
Cheers
Fotis
An update: this isn't quite ready to be fixed yet. We're heading in that direction with some major graph changes coming in for apply and destroy, but this requires major changes to plan which aren't on the docket for 0.8 yet. Still on my radar though.
Thanks so much @mitchellh, can totally appreciate the complexity of this issue 😄
Any update on the fix? I hit this bug today. I'm running 0.8.3.
As of 0.9 the situation has changed a little here:
In 0.9 we finished the graph-builder refactoring and changed the rules for count so that resource attributes can be referenced as long as they aren't computed at the time when the count is evaluated. As a side-effect of these changes, Mitchell's earlier repro case no longer causes a cycle and instead fails with a new error:
Error running plan: 1 error(s) occurred:
* module.nat.null_resource.nat: null_resource.nat: value of 'count' cannot be computed
(note that I replaced the AWS resources in his example with null_resource blocks just to make the repro faster)
This error, while annoying, is the expected behavior for this situation, since we can't currently allow computed values in count. The full solution for computed counts is #4149, which is still on our radar to implement in a future version of Terraform now that the graph refactoring is complete.
For now, the workaround is to comment out the resources whose counts are computed and apply to get their dependencies created, then restore the commented resources and apply once more; after that, the counts should be populated as expected. #4149 will eventually handle this two-step process automatically, without the need for manual workarounds, so I'm going to close this issue now (accepting the current, limited behavior) in anticipation of partial apply later making this work more smoothly.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.