I have a plan that is failing to apply, citing a cycle/dependency error.
With v0.4.2, I would get the cycle error from terraform plan. I built Terraform from master as of Sunday (this topic originally started in #1475), and reviewing the plan now succeeds:
/src/terraform-consul-aws-asg/test ᐅ tfp
Refreshing Terraform state prior to plan...
...
+ module.cleader-inbound-sg
1 resource(s)
+ module.cleaders
9 resource(s)
+ module.cleaders.cleader-sg
1 resource(s)
+ module.cminion-inbound-sg
1 resource(s)
+ module.cminions-a
7 resource(s)
+ module.cminions-a.cminion-sg
1 resource(s)
...but errors out when _applying_ the plan:
/src/terraform-consul-aws-asg/test ᐅ tfa
Error applying plan:
1 error(s) occurred:
* Cycle: aws_subnet.c (destroy), aws_subnet.c, module.cleader-sg (expanded), aws_elb.main (destroy), aws_subnet.a (destroy), aws_subnet.a
The code is currently in a private repo, but I have requested @phinze have access. See here and here. The high-level bit is replicated here:
module "cleaders" {
source = "../leaders"
max_nodes = 5
min_nodes = 3
desired_capacity = 7
key_name = "${var.key_name}"
key_file = "${var.key_file}"
ssh_pubkey = "${var.ssh_pubkey}"
access_key = "${var.access_key}"
secret_key = "${var.secret_key}"
region = "${var.region}"
consul_secret_key = "${var.consul_secret_key}"
cidr_prefix_a = "${var.cidr_prefix_leader_a}"
cidr_prefix_c = "${var.cidr_prefix_leader_c}"
vpc_id = "${var.vpc_id}"
route_table_id = "${var.route_table_id}"
inbound_security_group = "${module.cleader-inbound-sg.id}"
}
module "cminions-a" {
source = "../minions"
max_nodes = 5
min_nodes = 3
desired_capacity = 7
key_name = "${var.key_name}"
key_file = "${var.key_file}"
ssh_pubkey = "${var.ssh_pubkey}"
access_key = "${var.access_key}"
secret_key = "${var.secret_key}"
region = "${var.region}"
consul_secret_key = "${var.consul_secret_key}"
cidr_minions_a = "${var.cidr_minions_a}"
cidr_minions_c = "${var.cidr_minions_c}"
vpc_id = "${var.vpc_id}"
route_table_id = "${var.route_table_id}"
leader_dns = "${module.cleaders.leader_dns}"
}
# allow minion to leader
module "cleader-inbound-sg" {
source = "../leader_sg"
name = "inbound-${var.name}"
vpc_id = "${var.vpc_id}"
region = "${var.region}"
access_key = "${var.access_key}"
secret_key = "${var.secret_key}"
cidr_blocks = "${var.cidr_minions_a},${var.cidr_minions_c}"
}
# and allow leader to minion
module "cminion-inbound-sg" {
source = "../agent_sg"
name = "inbound-${var.name}"
vpc_id = "${var.vpc_id}"
region = "${var.region}"
access_key = "${var.access_key}"
secret_key = "${var.secret_key}"
cidr_blocks = "${module.cleaders.subnet-a-cidr_block},${module.cleaders.subnet-a-cidr_block}"
}
I include this reference because, unless I am mistaken, it confirms there should be no cyclical dependency issue/error here. The cminions-a and cminion-inbound-sg modules depend on the cleaders module, which in turn depends on cleader-inbound-sg. This last module is simple and only uses variables (see the end of the code snippet above). Here is the core of the leader sg module:
# Security group to cover Consul Leaders
resource "aws_security_group" "main" {
  name = "consul-leaders-${var.name}-${var.region}"
  vpc_id = "${var.vpc_id}"
  description = "Allow TCP and UDP ports to consul leaders in ${var.name}"

  tags {
    Name = "consul-leader-${var.name}-${var.region}"
  }

  # Server RPC, used by servers to handle incoming requests from other agents. TCP only.
  ingress {
    from_port = 8300
    to_port = 8300
    protocol = "tcp"
    cidr_blocks = ["${split(",", var.cidr_blocks)}"]
  }

  ...
}
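For completeness, that module's interface is just a handful of string variables; a minimal sketch of what its variables.tf presumably looks like (variable names inferred from the snippets above, not copied from the private repo):
# variables.tf for the leader sg module (sketch; names inferred from
# the resource above and the module call earlier in this issue)
variable "name" {}
variable "region" {}
variable "vpc_id" {}
variable "access_key" {}
variable "secret_key" {}

# comma-joined string of CIDR blocks, split() back into a list inside
# the aws_security_group resource
variable "cidr_blocks" {}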
Nothing depends on cminion-inbound-sg or cminions-a. When trying to apply the plan (with Terraform built from master this past Sunday), the plan fails:
ᐅ tfa
Error applying plan:
1 error(s) occurred:
* Cycle: aws_subnet.a (destroy), aws_subnet.a, aws_subnet.c (destroy), aws_subnet.c, module.cleader-sg (expanded), aws_elb.main (destroy)
If you have _0.4.2_ available, terraform plan should fail in the same way.
I've been seeing this as well. Additionally, if I do a terraform plan -out=plan.out and then try to graph that plan with terraform graph plan.out, the graph command errors out with a cycle.
Interesting. @johnrengelman, could you share any of the TF code that errors out? Maybe @mitchellh, @phinze or others can determine if this is the same issue or different.
I'll try and replicate it today.
provider "aws" {
region = "us-east-1"
}
provider "consul" {
address = "consul:80"
datacenter = "internal-east"
}
resource "consul_keys" "es_ami" {
key {
name = "ami_id"
path = "amis/elasticsearch/1.4.4-11/us-east-1"
}
}
resource "aws_launch_configuration" "es_conf" {
lifecycle {
create_before_destroy = true
}
image_id = "${consul_keys.es_ami.var.ami_id}"
instance_type = "m3.medium"
key_name = "peoplenet-load"
root_block_device {
volume_type = "gp2"
volume_size = 8
}
associate_public_ip_address = true
}
resource "aws_autoscaling_group" "es_cluster" {
availability_zones = ["us-east-1b"]
name = "elasticsearch-cluster"
max_size = 3
min_size = 2
desired_capacity = 3
launch_configuration = "${aws_launch_configuration.es_conf.id}"
health_check_type = "EC2"
tag {
key = "Name"
value = "elastic-search"
propagate_at_launch = true
}
}
This will fail to graph after outputting the plan.
test git:(master) ✗ terraform plan -out plan.out
Refreshing Terraform state prior to plan...
The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed.
Your plan was also saved to the path below. Call the "apply" subcommand
with this plan file and Terraform will exactly execute this execution
plan.
Path: plan.out
+ aws_autoscaling_group.es_cluster
availability_zones.#: "" => "1"
availability_zones.1305112097: "" => "us-east-1b"
default_cooldown: "" => "<computed>"
desired_capacity: "" => "3"
force_delete: "" => "<computed>"
health_check_grace_period: "" => "<computed>"
health_check_type: "" => "EC2"
launch_configuration: "" => "${aws_launch_configuration.es_conf.id}"
max_size: "" => "3"
min_size: "" => "2"
name: "" => "elasticsearch-cluster"
tag.#: "" => "1"
tag.1615826712.key: "" => "Name"
tag.1615826712.propagate_at_launch: "" => "1"
tag.1615826712.value: "" => "elastic-search"
termination_policies.#: "" => "<computed>"
vpc_zone_identifier.#: "" => "<computed>"
+ aws_launch_configuration.es_conf
associate_public_ip_address: "" => "1"
ebs_block_device.#: "" => "<computed>"
ebs_optimized: "" => "<computed>"
image_id: "" => "${consul_keys.es_ami.var.ami_id}"
instance_type: "" => "m3.medium"
key_name: "" => "peoplenet-load"
name: "" => "<computed>"
root_block_device.#: "" => "1"
root_block_device.0.delete_on_termination: "" => "1"
root_block_device.0.iops: "" => "<computed>"
root_block_device.0.volume_size: "" => "8"
root_block_device.0.volume_type: "" => "gp2"
+ consul_keys.es_ami
datacenter: "" => "<computed>"
key.#: "" => "1"
key.3999287073.default: "" => ""
key.3999287073.delete: "" => "0"
key.3999287073.name: "" => "ami_id"
key.3999287073.path: "" => "amis/elasticsearch/1.4.4-11/us-east-1"
key.3999287073.value: "" => "<computed>"
var.#: "" => "<computed>"
test git:(master) ✗ terraform graph plan.out
Error creating graph: 1 error(s) occurred:
* Cycle: aws_launch_configuration.es_conf, aws_autoscaling_group.es_cluster, aws_launch_configuration.es_conf (destroy), consul_keys.es_ami (destroy), consul_keys.es_ami
However, if I add a create_before_destroy = true to the consul_keys resource, it works.
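That change amounts to a one-block addition to the config above; a sketch of it (only the lifecycle block is new):
# consul_keys resource from the config above, with the
# create_before_destroy workaround added
resource "consul_keys" "es_ami" {
  lifecycle {
    create_before_destroy = true
  }

  key {
    name = "ami_id"
    path = "amis/elasticsearch/1.4.4-11/us-east-1"
  }
}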
I know this is a different issue, but I would expect that if a plan can be generated, then I could graph it, so I can see the cycles.
@johnrengelman, your message is cut off at the end. FWIW, @phinze noted we should remove create_before_destroy and test - see here for details.
@ketzacoatl yeah, I accidentally hit
I would expect that if a plan can be generated, then I could graph it, so I can see the cycles.
Filed this officially as https://github.com/hashicorp/terraform/issues/1651 - stay tuned on that thread for a PR with my recent work here once I finish it up.
Great, that should help us all figure out what is what here. I am in over my head on it atm and will need to wait for further guidance from the wizards of Terraform.
Working with @ketzacoatl, I was able to determine that this is another expression of the same Terraform bug we've been kicking around across several issues: Terraform needs to change the way it treats modules in the execution graph so that modules are only a UX-layer abstraction; the execution graph needs to be flattened to a single layer to prevent these situations from being problematic.
Terraform generates "destroy nodes" for each resource and places them at the logical point in the graph where the destroys should happen. But TF can't do this for a module, because it has an unknown number of resources inside of it.
The cycle happens as Terraform tries to figure out how to perform its operations in the proper order, assuming a worst-case destroy-then-create. If you think of the module as just a security group, you'd want the destroy and the create of that security group to be separate steps in the graph, so the resources that reference it can be ordered around each step independently.
But the problem is that, from this view, the graph only has a single "handle module" operation standing in for everything inside the module, so all of those orderings get forced through that one node.
And hence the cycle.
To work around the issue in the meantime, I'd recommend structuring your config to avoid cross-module references. So instead of directly accessing an output of one module from inside another, set it up as an input parameter instead and wire everything together at the top level.
Tagging this as a core bug and leaving it open for now - I may try and consolidate the "stuff that will be fixed when we flatten the module graph" issues eventually. :ok_hand:
Wow, that is intense. It is also confusing to me that Terraform defaults to a _destroy-then-create_ action. Even in a worst-case scenario.. it is not always what the user would want, and it is not always necessary (given the circumstances imposed by the current state of the environment and the provider's APIs).
That aside.. @phinze, I am confused what you mean by _cross-module references_. If I understand your suggestion, I should use an output from one module as an input to another, is that right?
If so, I believe I am doing that with:
cidr_blocks = "${module.cleaders.subnet-a-cidr_block},${module.cleaders.subnet-a-cidr_block}"
and
inbound_security_group = "${module.cleader-inbound-sg.id}"
as examples from the code I pasted in this issue. Forgive me if I am not understanding correctly.
I do not intend to reference one module from inside another, except through the variable/output interfaces - do you see a place in my code where I am not doing this? I read the description of the problem/solution as removing the namespacing the modules do - I don't think that is the way to go, but maybe I do not understand what you are describing.
@mitchellh, maybe you have a few spare cycles you can send our way to confirm we are on the right track?
It is also confusing to me that Terraform defaults to a destroy-then-create action. Even in a worst-case scenario.. it is not always what the user would want, and it is not always necessary
It's true that in _many_ cases create_before_destroy would be a preferable default, but unfortunately a lot of the resources Terraform manages would generate errors (name conflicts, etc) if a clone of them were to appear, so we have to assume the worst case scenario by default as we structure the graph.
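A minimal sketch of the per-resource opt-in being described (illustrative only, not tied to the configs in this issue):
# Opt in to create-before-destroy only where a short-lived duplicate
# cannot collide on a unique name or similar API constraint
resource "aws_launch_configuration" "example" {
  # name is omitted so a unique one is generated, letting the old and
  # new launch configurations coexist briefly
  image_id = "ami-00000000" # placeholder
  instance_type = "m3.medium"

  lifecycle {
    create_before_destroy = true
  }
}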
@phinze, I am confused what you mean by cross-module references. If I understand your suggestion, I should use an output from one module as an input to another, is that right?
Sorry about the unclear explanation here - in your example the references causing the cycle are from the ELB and the LC in the "leaders" module to the id output of the "leader-sg" module.
Here's a better way of describing my suggested workaround: avoid direct references from a resource block to a module. I think the easiest way to do this is by flattening out the module declarations at the top level. So in your case I'd try pulling the "leader-sg" module declaration out to "main.tf" and piping in the SG id as a variable to the "leaders" module instead of nesting them.
It's definitely not ideal, but it should help you avoid cycles until we get the module graph fixed up in core.
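A rough sketch of that shape (module and variable names here are assumptions, meant only to show the wiring, not the actual repo layout):
# main.tf: declare the SG module at the top level...
module "cleader-sg" {
  source = "../leader_sg"
  name = "${var.name}"
  vpc_id = "${var.vpc_id}"
  # ... region, credentials, cidr_blocks as in the original config ...
}

# ...and pipe its id into the leaders module as a plain variable, so
# no resource inside "leaders" references a module directly
module "cleaders" {
  source = "../leaders"
  leader_sg_id = "${module.cleader-sg.id}" # assumed variable name
  # ... remaining inputs as in the original config ...
}
Inside the leaders module, the ELB and launch configuration would then reference "${var.leader_sg_id}" rather than a nested module's output.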
Ok, I understand now. I will shuffle around the SG modules so they are at the top level with the other modules.
In regards to create_before_destroy, I guess my confusion stems from the way Terraform will delete resources to recreate them, when I could have made the same changes through the AWS console without removing the resource. To work around TF wanting to rm instances on me, I end up making the updates in the AWS console, updating TF's resource definition, and then having TF sync its snapshot of the state.
EDIT: Thank you @phinze for ensuring I understand what is going on and how to work around the limitation for now!
Ah, now I remember why I did this to begin with.. I have two modules, leaders and minions, each of which creates the resources needed to spin up a cluster running consul as either a cluster of consul leaders or followers (which I refer to as _minions_). In each module we create a group of subnets for that cluster. We then need a default security group for the cluster, and that group needs to allow traffic from those subnets. Thus.. I cannot move the SG module to the top level: it needs the CIDR blocks from the subnets created in the leader module, and the leader module would need the SG ID as input if I moved it, so we end up with a circular dependency by that route.
For now, I think my work around is to put the security group resources directly in the leader/minion modules, rather than using a module to the SG. I will figure it out and report back. I guess it makes sense to close this issue if Terraform will be addressing multiple issues like this with structural changes in core.
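That inlining might look roughly like this inside the leaders module (names are assumptions; the point is that cidr_blocks can reference the module's own subnets directly):
# Security group defined directly in the leaders module, so it can
# consume the CIDR blocks of the subnets created alongside it
resource "aws_security_group" "leaders" {
  name = "consul-leaders-${var.name}-${var.region}"
  vpc_id = "${var.vpc_id}"

  # Server RPC from the leader subnets
  ingress {
    from_port = 8300
    to_port = 8300
    protocol = "tcp"
    cidr_blocks = ["${aws_subnet.a.cidr_block}", "${aws_subnet.c.cidr_block}"]
  }
}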
I've been able to work around the issue here by not calling a module from within a module. All modules include resources, no modules. This is less than ideal, but workable. @phinze, do you have plans for consolidating one or more of these issues?
+1 to perhaps making a meta issue, there are a fair amount of module "bugs / feature requests" floating about.
Fixed by #1781
@mitchellh FYI, experiencing this in 0.7.1 & 0.7.2. The code to reproduce is very similar to this issue, basically a top level module that does some CIDR math and generates an output map using null_data_source (admittedly a hack on my part).
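For reference, the kind of null_data_source hack being described looks roughly like this (illustrative names and CIDR math, not the reporter's actual code):
# Top-level module doing CIDR math and exposing the results as a map
variable "vpc_cidr" {
  default = "10.0.0.0/16"
}

data "null_data_source" "cidrs" {
  inputs = {
    private_a = "${cidrsubnet(var.vpc_cidr, 8, 1)}"
    private_b = "${cidrsubnet(var.vpc_cidr, 8, 2)}"
  }
}

output "subnet_cidrs" {
  value = "${data.null_data_source.cidrs.outputs}"
}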
I am still having this issue in 0.8.2. Modules are all at the top level, using input parameters as suggested by @phinze. Here is the main.tf that wires all the modules together:
provider "aws" {
region = "ap-southeast-2"
}
module "my_elb" {
source = "../modules/elb"
subnets = ["subnet-481d083f", "subnet-303cd454"]
security_groups = ["sg-e8ac308c"]
}
module "my_lc" {
source = "../modules/lc"
subnets = ["subnet-481d083f", "subnet-303cd454"]
security_groups = ["sg-e8ac308c"]
snapshot_id = "snap-00d5e8ef70d1b3e24"
}
module "my_asg" {
source = "../modules/asg"
subnets = ["subnet-481d083f", "subnet-303cd454"]
my_asg_name = "my_asg_${module.my_lc.my_lc_name}"
my_lc_id = "${module.my_lc.my_lc_id}"
my_elb_name = "${module.my_elb.my_elb_name}"
}
Using this, I can do the initial apply and a destroy, but when I try to change the lc, it gives a cycle error.