I'm attempting to create a module to package up a common pattern: deploying a NAT instance into a subnet. I'd like to be able to do the following:
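Roughly, the usage I'm after looks like this (a sketch only; the exact module interface here is my assumption, not code from the linked repo):

# Hypothetical usage: hand the module a comma-joined list of subnet IDs
# and let it create one NAT instance per subnet.
module "nat" {
  source        = "./nat"
  aws_subnet_id = "${join(",", aws_subnet.pub_subnet.*.id)}"
}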
Unfortunately, my attempts at doing so have been met with unexpected behavior in Terraform. I'm not sure if what I'm seeing is intended behavior or a bug. Repro cases are in the following repo:
https://github.com/mpage/tf_module_issues
This is against Terraform 0.5.3
The _missing_attr_ error looks like the one described in #1997 (see example).
The missing attribute error should be fixed by a PR shortly.
I've narrowed the cycle down to the following minimal case. It isn't valid Terraform, but it fails nonetheless, and it requires no existing state. A key factor seems to be the count: if I change the count to 1 in the main module, the cycle goes away.
main.tf:
resource "aws_subnet" "pub_subnet" {
count = "2"
}
module "nat" {
source = "./nat"
aws_subnet_id = "${join(",", aws_subnet.pub_subnet.*.id)}"
}
nat/main.tf:
variable "aws_subnet_id" {
description = "Subnet that will contain the nat instance"
}
resource "aws_instance" "nat" {
count = "${length(split(",", var.aws_subnet_id))}"
}
Updates
Ah, a count other than "1" forces the destroy nodes to stick around, which causes the cycle.
Ah, so I've narrowed the cycle down to this. I've only pasted the relevant bit. Read the comment.
// GraphNodeDestroyEdgeInclude impl.
func (n *GraphNodeConfigVariable) DestroyEdgeInclude(v dag.Vertex) bool {
    // Only include this variable in a destroy edge if the source vertex
    // "v" has a count dependency on this variable.
    cv, ok := v.(GraphNodeCountDependent)
    if !ok {
        return false
    }
    ...
It requires some thinking to see why this is the case. In cases of real cycles like this I always ask myself "what would a human do [to break the cycle]?" In this case, you'd probably want the "destroy" NAT instance node to use a _cached_ value of var.aws_subnet_id if available, and if none is available, to assume a count of 0.
This would require some pretty serious graph majiggery.
A workaround for now is to just make the count a separate, more static variable.
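For the minimal NAT example above, that looks roughly like this (a sketch; the subnet_count variable is one I'm introducing for illustration, not something from the repro):

main.tf:
variable "subnet_count" {
  default = "2"
}

resource "aws_subnet" "pub_subnet" {
  count = "${var.subnet_count}"
}

module "nat" {
  source        = "./nat"
  subnet_count  = "${var.subnet_count}"
  aws_subnet_id = "${join(",", aws_subnet.pub_subnet.*.id)}"
}

nat/main.tf:
variable "subnet_count" {}

variable "aws_subnet_id" {
  description = "Subnet that will contain the nat instance"
}

resource "aws_instance" "nat" {
  # count now depends only on a plain variable, not on a computed resource attribute.
  count     = "${var.subnet_count}"
  subnet_id = "${element(split(",", var.aws_subnet_id), count.index)}"
}

The idea is that count no longer references aws_subnet.pub_subnet through the module variable, so the count dependency that was pulling the destroy nodes into the cycle goes away.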
I believe I am also hitting this issue while doing automation through a bastion host. Here is the code snippet:
resource "aws_instance" "deployment-host-nodes" {
ami = "${var.source_ami}"
availability_zone = "${element(split(",",var.availability_zones),count.index)}"
instance_type = "${var.deployment_host_type}"
count = "${var.deployment_host_count}"
#vpc_security_group_ids = ["${aws_security_group.deployment-host.id}", "${aws_vpc.main.default_security_group_id}"]
vpc_security_group_ids = ["${aws_security_group.deployment-host.id}", "${aws_security_group.internal.id}"]
key_name = "${aws_key_pair.deployer.key_name}"
associate_public_ip_address=false
#disable_api_termination = true
#Create a list of subnets from , separated strings
subnet_id = "${element(aws_subnet.deployment_host._.id,count.index)}"
root_block_device {
delete_on_termination = true
}
tags {
Name = "${var.short_name}-deployment-host-${format("%02d", count.index+1)}"
sshUser = "${var.ssh_username}"
role = "deployment-host"
keyPath = "${var.ssh_private_key}"
dc = "${var.datacenter}"
}
provisioner "remote-exec" {
_depends_on = ["${element(aws_instance.bastion_host..id,count.index)}"]*
connection {
bastion_host = "${element(aws_instance.bastion_host..public_ip,count.index)}"
bastion_port = 22
bastion_user = "${var.ssh_username}"
bastion_key_file = "${var.ssh_private_key}"
user = "${var.ssh_username}"
host = "${self.public_ip}"
key_file = "${var.ssh_private_key}"
}
inline = [
"ls -l",
"sleep 1"
]
}
}
Although I am using depends_on explicitly, I get the following error:
terraform plan
Error configuring: 2 error(s) occurred:
Curious to know if it's the same issue or a new one.
Terraform version is the current master:
terraform --version
Terraform v0.6.7-dev (965e59843741f3de587bcfe61b588a304262c734+CHANGES)
I believe I am seeing the same issue with EBS volumes and instance changes.
The EBS volumes cannot be destroyed, because planning both a change to the instances and a destroy of the EBS volumes produces a cycle error:
The change I made was adding count.index + 1, which causes an update to the instances:
EC2 MODULE:
variable "num_servers" {}
resource "aws_instance" "aws-ec2" {
ami = "${var.ami}"
count = "${var.num_servers}"
tags {
Name = "${format(var.instance_tag, count.index + 1)}"
}
}
output "aws_instance_ids" {
value = "${join(",", aws_instance.aws-ec2.*.id)}"
}
EBS MODULE:
variable "aws_instance_ids" {}
resource "aws_ebs_volume" "ebs-volume" {
count = "${length(split(",", var.aws_instance_ids))}"
availability_zone = "${var.availability}"
size = 100
tags {
Name = "${format("ebs-volume-%d", count.index)}"
}
}
resource "aws_volume_attachment" "ebs-att" {
count = "${length(split(",", var.aws_instance_ids))}"
device_name = "/dev/sdh"
volume_id = "${element(aws_ebs_volume.ebs-volume.*.id, count.index)}"
instance_id = "${element(split(",", var.aws_instance_ids), count.index)}"
force_detach = true
}
Error:
Error configuring: 1 error(s) occurred:
* Cycle: module.ebs.aws_ebs_volume.ebs-volume (destroy), module.ebs.var.aws_instance_ids, module.ebs.aws_volume_attachment.ebs-att (destroy), module.ec2.aws_instance.aws-ec2 (destroy), module.ec2.aws_instance.aws-ec2, module.ec2.output.aws_instance_ids
Just adding that I've seen this as well - in my case, in one module I had a resource count that depended on two variables: one supplied statically and one provided via an output from another module, which was in turn computed. In fact, my use case was almost exactly what @mpage laid out in the original post.
I'm not sure if this is being addressed as part of #4961, but if count remains a directive whose variable chain needs to be static, then perhaps a specific error should be raised for it, so that people don't try to chase down cycles that don't really exist.
I was bitten by this issue while trying to separate a bunch of aws_ebs_volume resources into their own module. I'm glad this bug was filed; I don't think I would have figured out the answer myself.
In my case, the solution was to generate a count from the source data and pass it between modules using an input and an output. In this way, the count is dependent on a variable rather than a resource, avoiding a dependency loop.
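As a concrete sketch of that approach, applied to the EBS example above (the num_volumes input and num_servers output are names I'm introducing for illustration):

EC2 MODULE:
variable "num_servers" {}

# Re-export the count so other modules can consume it as a plain value.
output "num_servers" {
  value = "${var.num_servers}"
}

EBS MODULE:
variable "num_volumes" {}
variable "aws_instance_ids" {}

resource "aws_ebs_volume" "ebs-volume" {
  # availability_zone, size, tags, etc. stay as in the snippet above;
  # only the count expression changes.
  count = "${var.num_volumes}"
}

ROOT MODULE:
module "ec2" {
  source      = "./ec2"
  num_servers = 2
}

module "ebs" {
  source           = "./ebs"
  num_volumes      = "${module.ec2.num_servers}"
  aws_instance_ids = "${module.ec2.aws_instance_ids}"
}

Since module.ec2.num_servers resolves to a plain variable rather than a computed resource attribute, the count is known up front and the destroy nodes no longer drag the two modules into a cycle.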
Thanks @cbarbour, your solution worked for me also.
Here's my repro in case anyone is interested...
test.tf
provider "aws" {
region = "ap-southeast-2"
}
module "vpc" {
source = "./vpc"
cidrs = ["10.0.0.0/16", "10.1.0.0/16"]
}
module "igw" {
source = "./igw"
vpc_ids = "${module.vpc.vpc_ids}"
}
vpc/vpc.tf
variable "cidrs" { type = "list" }
resource "aws_vpc" "management" {
count = "${length(var.cidrs)}"
cidr_block = "${element(var.cidrs, count.index)}"
}
output "vpc_ids" {
value = ["${aws_vpc.management.*.id}"]
}
igw/igw.tf
variable "vpc_ids" { type = "list" }

resource "aws_internet_gateway" "internet_gateway" {
  count  = "${length(var.vpc_ids)}"
  vpc_id = "${element(var.vpc_ids, count.index)}"
}
If I replace:
count = "${length(var.vpc_ids)}"
with
count = 2
it works without an error.
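The workaround @cbarbour described also applies here: derive the count from the statically known cidrs list and feed it to the igw module as its own input, instead of taking length() of the computed vpc_ids. A sketch (the vpc_count variable is one I'm adding for illustration):

test.tf:
variable "cidrs" {
  type    = "list"
  default = ["10.0.0.0/16", "10.1.0.0/16"]
}

module "vpc" {
  source = "./vpc"
  cidrs  = "${var.cidrs}"
}

module "igw" {
  source    = "./igw"
  vpc_count = "${length(var.cidrs)}"
  vpc_ids   = "${module.vpc.vpc_ids}"
}

igw/igw.tf:
variable "vpc_count" {}
variable "vpc_ids" { type = "list" }

resource "aws_internet_gateway" "internet_gateway" {
  # count comes from a plain variable, so it is known at plan time.
  count  = "${var.vpc_count}"
  vpc_id = "${element(var.vpc_ids, count.index)}"
}

This keeps the gateway count in sync with the VPC count without hard-coding 2.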
Any chance we'll see this one squashed soon? 😄
Cheers
Fotis
An update: this isn't quite ready to be fixed yet. We're heading in that direction with some major graph changes coming in for apply and destroy, but this requires major changes to plan which aren't on the docket for 0.8 yet. Still on my radar though.
Thanks so much @mitchellh, can totally appreciate the complexity of this issue 😄
Any update on the fix? I hit this bug today. I'm running 0.8.3.
As of 0.9 the situation has changed a little here:
In 0.9 we finished the graph-builder refactoring and changed the rules for count so that resource attributes can be referenced as long as they aren't computed at the time when the count is evaluated. As a side-effect of these changes, Mitchell's earlier repro case no longer causes a cycle and instead fails with a new error:
Error running plan: 1 error(s) occurred:
* module.nat.null_resource.nat: null_resource.nat: value of 'count' cannot be computed
(note that I replaced the AWS resources in his example with null_resource blocks just to make the repro faster)
This error, while annoying, is the expected behavior for this situation, since we can't currently allow computed values in count. The full solution for computed counts is #4149, which is still on our radar to implement in a future version of Terraform now that the graph refactoring is complete.
For now, the workaround is to comment out the resources whose counts are computed and apply to get their dependencies created, then restore the commented resources and apply once more; after that, the counts should be populated as expected. #4149 will eventually handle this two-step process automatically, without the need for manual workarounds, so I'm going to close this issue now (accepting the current, limited behavior) in anticipation of partial apply later making this work more smoothly.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.